CN103997515A

CN103997515A - Distributed cloud computing center selection method and application thereof

Info

Publication number: CN103997515A
Application number: CN201410172326.2A
Authority: CN
Inventors: 沈玉龙; 宗旋; 张琪; 姜晓鸿; 裴庆祺; 张华庆
Original assignee: Xidian University; Kunshan Innovation Institute of Xidian University
Current assignee: Xidian University; Kunshan Innovation Institute of Xidian University
Priority date: 2014-04-25
Filing date: 2014-04-25
Publication date: 2014-08-20
Anticipated expiration: 2034-04-25
Also published as: CN103997515B

Abstract

The invention discloses a method for selecting computing centers in a distributed cloud, belongs to the technical field of cloud platform data processing, and addresses the high communication cost of computing center selection in the prior art. In the method, the computing centers are selected as a subgraph that is judged by the length of its longest path, where the longest path passes through every vertex of the subgraph exactly once. The method overcomes the technical defects of the random strategy and the greedy strategy in computing center selection, provides the virtual machines required by users, improves the performance and reliability required for massive data processing in the cloud, reduces the communication cost between computing centers, and reduces unnecessary overall resource consumption.

Description

A method for selecting computing centers in a distributed cloud and application thereof

Technical field

The invention belongs to the technical field of cloud platform data processing and relates to a method for selecting computing centers in a distributed cloud and an application thereof, and in particular to such a method and application that reduce the communication cost between computing centers.

Background art

A distributed cloud computing system is made up of sub computing centers dispersed across different locations. Each sub computing system consists of local computing, storage and network resources, and provides users with services such as computing and storage. The computing centers located at different geographic positions are connected through a wide area network.

A distributed cloud computing system has many advantages over a centralized one. First, the distributed mode depends less on network bandwidth: data is processed and stored within the local cloud computing subsystem, so the amount of customer data each subsystem has to serve is greatly reduced, and each subsystem only needs to process the service requests sent by clients near it. This effectively reduces the network load and improves the response time of task processing. Second, the distributed mode is highly flexible: each subsystem can independently handle a user's specific demand, and subsystems can also cooperate to handle tasks that require a large amount of complex computation. Third, it is robust and scalable: each subsystem is deployed independently and does not affect the operation of the other subsystems. When a task with a huge computational load is processed, the large workload can be allocated preferentially to idle computing centers, which balances the load of the whole system. Even if a subsystem fails, its tasks can be migrated dynamically to sub computing centers that are operating normally, so the whole cloud computing system is not paralyzed and the service is not interrupted for users. Finally, when a distributed system is built, each subsystem can reuse existing computing and storage resources, which reduces the construction cost.

For an application to obtain optimal performance on a cloud platform and to reduce unnecessary overall resource consumption, the placement of the virtual machines that serve the application is a key factor. When a user request arrives, a single computing center may not have enough resources to support the virtual machines the user needs. This is especially common for tasks that process big data or massive data, where the cloud platform needs multiple computing centers to serve the user request jointly. Reducing the overall communication cost between computing centers helps the user program obtain better performance, so how to reduce the overall communication between computing centers becomes a key issue.

When a user requests a certain number of virtual machines, the cloud platform usually handles the request with one of two strategies. The first is the random strategy: a computing center is selected at random in the distributed cloud system to process the user request and allocate the virtual machines the user needs. The second is the greedy strategy: the computing center with the largest capacity in the distributed cloud system is selected to process the user request, and if its capacity is insufficient, the computing center with the largest capacity among the remaining ones is selected next. However, in order to provide the number of virtual machines the user demands, both the random strategy and the greedy strategy incur excessive network communication cost. The root cause is that the probability of selecting the optimal computing centers is very low. Therefore, distributed cloud computing urgently needs a computing center selection method that reduces the network communication cost as far as possible, to make up for the defects of these two selection strategies.
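The two baseline strategies can be sketched as follows. This is a minimal illustration, not code from the patent: the function names, the `capacities` dictionary and the sample numbers are all hypothetical, and capacity is counted simply as a number of virtual machines per computing center.

```python
import random

def _take_until_covered(order, capacities, demand):
    """Walk the centers in the given order until their capacity covers the demand."""
    chosen, total = [], 0
    for center in order:
        if total >= demand:
            break
        chosen.append(center)
        total += capacities[center]
    if total < demand:
        raise ValueError("total capacity is insufficient for the request")
    return chosen

def random_strategy(capacities, demand, rng):
    """Random strategy: pick centers in a uniformly random order."""
    order = list(capacities)
    rng.shuffle(order)
    return _take_until_covered(order, capacities, demand)

def greedy_strategy(capacities, demand):
    """Greedy strategy: pick centers in descending order of capacity."""
    order = sorted(capacities, key=lambda c: capacities[c], reverse=True)
    return _take_until_covered(order, capacities, demand)

# Hypothetical capacities (VMs per computing center) and a request for 150 VMs.
capacities = {"A": 40, "B": 120, "C": 70, "D": 30}
greedy_pick = greedy_strategy(capacities, 150)
random_pick = random_strategy(capacities, 150, random.Random(0))
```

Neither function looks at where the centers are located, which is the weakness the text points out: the chosen centers may be far apart even though their capacity is sufficient.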

Summary of the invention

To overcome the technical problem that the existing random strategy and greedy strategy easily cause excessive network communication cost when selecting computing centers, the present invention addresses the case in which the user submits the required number of virtual machines, and provides a method for selecting computing centers in a distributed cloud and an application thereof. On the premise that the user can obtain the cloud service, the method reduces the network communication cost and the unnecessary waste of network resources.

The technical solution adopted by the present invention is as follows:

A method for selecting computing centers in a distributed cloud comprises the following steps:

1) Define the undirected graph G=(V, E) of the computing centers in the distributed cloud, where V is the set of vertices of G and represents the computing centers in the distributed cloud, and E is the set of edges of G and represents the connections between different computing centers;

2) In the undirected graph G, randomly choose a vertex set V'; the computing capacity required of the computing centers represented by V' is preset by the user. Taking the source point v0 in V' as the starting point, add to V' the vertex around v0 that is nearest to v0. If the computing capacity of the computing centers in V' reaches the preset computing capacity, stop adding vertices; otherwise, continue to add to V' the next nearest vertex around v0;

3) For the set V' for which vertex addition has stopped, take the sum of the lengths of the connecting edges between two vertices in V' as the internal longest line segment between the two vertices, and take this internal longest line segment as the first line segment;

4) Form the vertex set V1 from the two vertices of the first line segment in V', and form another vertex set V2 from all the other vertices of V'. Compute the external longest line segment from the vertices in V2 to the vertices in V1 and the external longest path length; the two vertices of the external longest line segment form the set U. Add the two vertices in U to V1 and remove the two vertices of the external longest line segment from V2. Then add the length of the computed external longest line segment to the external longest path length, and use the resulting sum as the external longest path length for the next iteration;

5) Step 4) is an iterative process that terminates when the vertex set V2 is empty. When V2 meets this termination condition, the external longest path length and the optimal subgraph G', in which the optimal computing centers for selection are distributed, are obtained.
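One possible reading of steps 2) to 5) for a single source point is sketched below. Several things here are assumptions rather than statements of the patent: the computing centers are given as points in the plane with Euclidean distances (as in the simulation of Embodiment 5), the first line segment is taken to be the farthest pair of vertices in V', and in each iteration only the V2 endpoint of the external longest segment is moved into V1 (its other endpoint is already there). All function and variable names are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def grow_subgraph(positions, capacities, source, demand):
    """Step 2: add the centers nearest to the source until capacity reaches demand."""
    order = sorted(positions, key=lambda v: dist(positions[source], positions[v]))
    chosen, total = [], 0
    for v in order:                      # the source itself comes first (distance 0)
        chosen.append(v)
        total += capacities[v]
        if total >= demand:
            return chosen
    raise ValueError("total capacity is insufficient for the request")

def longest_path_length(positions, chosen):
    """Steps 3 to 5: accumulate the longest segments until V2 is empty."""
    if len(chosen) < 2:
        return 0.0
    # Step 3 (assumed reading): the first segment joins the farthest pair in V'.
    a, b = max(((u, v) for i, u in enumerate(chosen) for v in chosen[i + 1:]),
               key=lambda e: dist(positions[e[0]], positions[e[1]]))
    length = dist(positions[a], positions[b])
    v1 = [a, b]
    v2 = [v for v in chosen if v not in (a, b)]
    # Steps 4 and 5: take the longest segment from V2 to V1, add its length,
    # move its V2 endpoint into V1, and repeat until V2 is empty.
    while v2:
        u, w = max(((u, w) for u in v2 for w in v1),
                   key=lambda e: dist(positions[e[0]], positions[e[1]]))
        length += dist(positions[u], positions[w])
        v1.append(u)
        v2.remove(u)
    return length

# Hypothetical layout: four centers with 50 VMs each, request for 120 VMs.
positions = {"s": (0, 0), "a": (3, 0), "b": (0, 4), "c": (10, 10)}
capacities = {"s": 50, "a": 50, "b": 50, "c": 50}
chosen = grow_subgraph(positions, capacities, "s", 120)
length = longest_path_length(positions, chosen)
```

In this toy layout the subgraph is `s`, `a`, `b`; the first segment is `a`-`b` (length 5) and the only external segment is `s`-`b` (length 4), so the accumulated length is 9.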

The technical solution adopted by the present invention further comprises:

An application of the method for selecting computing centers in a distributed cloud: according to the requirements of the task that the user hosts on the cloud platform, the method for selecting computing centers in a distributed cloud is used to screen out the optimal subgraph G' in the cloud platform, and the optimal computing centers in G' provide the virtual machines that meet the user's demand;

Suppose the resource demand that the cloud platform must provide to satisfy the application or task is (θ_1, θ_2, ..., θ_n), and the cloud platform offers m different types of virtual machine, where a virtual machine of type k occupies resources (c_1k, c_2k, ..., c_nk) and costs p_k;

On the premise that the user obtains enough resources, the optimal computing centers are selected in the optimal subgraph G', and the minimum payment for the resources occupied by the virtual machines that the optimal computing centers provide is computed as follows:

\text{Minimize} \quad \sum_{i=1}^{m} x_{i} p_{i}

where i denotes a virtual machine type available on the cloud platform, with i ranging over [1, m]; x_i denotes the number of virtual machines of type i that are required; and p_i denotes the payment for a virtual machine of type i;

The formula for the minimum payment is subject to the following constraints:

where (x_1, x_2, ..., x_m) denotes the number of virtual machines required of each type.
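A small exhaustive search illustrates the minimum payment objective. The constraint formula itself does not appear in this text, so the sketch assumes one plausible form: for every resource j, the resources supplied by the chosen virtual machines must cover the demand, that is, c_j1 x_1 + ... + c_jm x_m ≥ θ_j. This constraint, the upper bound `max_count`, the function name and the sample figures are all assumptions made for illustration.

```python
from itertools import product

def cheapest_vm_mix(theta, c, p, max_count):
    """Search VM counts x_1..x_m that minimise sum(x_i * p_i).

    theta[j] : required amount of resource j
    c[j][k]  : amount of resource j occupied by one VM of type k
    p[k]     : payment for one VM of type k
    ASSUMPTION: a mix is feasible when sum_k c[j][k] * x[k] >= theta[j] for all j.
    """
    m = len(p)
    best_cost, best_x = None, None
    for x in product(range(max_count + 1), repeat=m):
        covered = all(sum(c[j][k] * x[k] for k in range(m)) >= theta[j]
                      for j in range(len(theta)))
        if not covered:
            continue
        cost = sum(x[k] * p[k] for k in range(m))
        if best_cost is None or cost < best_cost:
            best_cost, best_x = cost, x
    return best_cost, best_x

# Hypothetical demand: 8 CPU cores and 16 GB of memory.
theta = [8, 16]
c = [[2, 4],     # cores per VM of type 0 and type 1
     [4, 16]]    # memory (GB) per VM of type 0 and type 1
p = [3, 7]       # payment per VM of type 0 and type 1
cost, counts = cheapest_vm_mix(theta, c, p, max_count=5)
```

Exhaustive search is only practical for a handful of types and small counts; an integer programming solver would be the usual choice for realistic sizes.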

Further, suppose the optimal subgraph G' contains n computing centers whose capacities are d_1, d_2, ..., d_n respectively; the distributed task requested by the user consists of m subtasks, which require g_1, g_2, ..., g_m virtual machines respectively; the distributed task requested by the user needs N virtual machines in total, with g_1 + g_2 + ... + g_m = N;

The variable associated with placing a number of virtual machines in a computing center is p_j(y_j), j ∈ {1, 2, ..., n}; p_j(y_j) denotes the available bandwidth of computing center j when y_j virtual machines are currently placed in it, and p_j(y_j) decreases as the number of placed virtual machines increases;

The variable x_ijk, i ∈ {1, 2, ..., N}, j ∈ {1, 2, ..., n}, k ∈ {1, 2, ..., m}, indicates that virtual machine i serves subtask k and is placed in computing center j;

The objective that maximizes the sum of the available bandwidth between computing centers is:

\text{Maximize} \quad \sum_{i=1}^{N} \sum_{j=1}^{n} \sum_{k=1}^{m} x_{ijk} \, p_{j}(y_{j})

This objective is subject to the following constraints:

Constraint one:

\sum_{j=1}^{n} \sum_{k=1}^{m} x_{ijk} = 1, \quad \forall i = 1, 2, \ldots, N

Constraint two:

\sum_{i=1}^{N} \sum_{k=1}^{m} x_{ijk} = y_{j} \leq d_{j}, \quad \forall j = 1, 2, \ldots, n

Constraint three:

\sum_{i=1}^{N} \sum_{j=1}^{n} x_{ijk} = g_{k}, \quad \forall k = 1, 2, \ldots, m

where the variable x_ijk ∈ {0, 1} is an integer variable, and constraint one means that a given virtual machine serves exactly one subtask and is placed in exactly one computing center; constraint two means that the number of virtual machines assigned to a computing center is less than or equal to the capacity of that computing center; and constraint three means that the total number of virtual machines that all computing centers provide for a subtask equals the number of virtual machines that subtask requires.
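The objective and the three constraints can be checked for a candidate placement with a few lines of code. In this sketch a placement is stored as one `(j, k)` pair per virtual machine, which is an assumed encoding of x_ijk that satisfies constraint one by construction. The bandwidth function passed in as `bandwidth` is hypothetical; the text only requires that p_j(y_j) decreases as y_j grows. Indices are 0-based here.

```python
from collections import Counter

def evaluate_placement(assign, d, g, bandwidth):
    """Return the objective value of a placement, or None if it is infeasible.

    assign[i] = (j, k) : VM i is placed in center j and serves subtask k
    d[j]               : capacity of center j
    g[k]               : number of VMs required by subtask k
    bandwidth(j, y)    : hypothetical p_j(y), available bandwidth of center j
    """
    if len(assign) != sum(g):                      # N must equal g_1 + ... + g_m
        return None
    y = Counter(j for j, _ in assign)              # y_j: VMs placed in center j
    served = Counter(k for _, k in assign)         # VMs serving subtask k
    if any(y[j] > d[j] for j in y):                # constraint two: y_j <= d_j
        return None
    if any(served[k] != g[k] for k in range(len(g))):   # constraint three
        return None
    # sum_i sum_j sum_k x_ijk * p_j(y_j) collapses to sum_j y_j * p_j(y_j)
    return sum(y[j] * bandwidth(j, y[j]) for j in y)

# Hypothetical instance: two centers, two subtasks, three VMs.
d = [2, 3]
g = [2, 1]
linear_bandwidth = lambda j, y: 100 - 10 * y       # assumed decreasing p_j(y)
feasible = evaluate_placement([(0, 0), (0, 0), (1, 1)], d, g, linear_bandwidth)
overloaded = evaluate_placement([(0, 0), (0, 0), (0, 1)], d, g, linear_bandwidth)
```

The first placement puts two virtual machines in center 0 and one in center 1, giving 2 × 80 + 1 × 90 = 250; the second exceeds the capacity of center 0 and is rejected.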

Beneficial effects of the present invention:

For an application to obtain the best performance on a cloud platform, the virtual machines that serve the application must be placed rationally so as to reduce unnecessary overall resource consumption. The present invention addresses the case in which the user submits the required number of virtual machines and provides a method for selecting computing centers in a distributed cloud. The method establishes a criterion for judging which subgraph of the cloud platform is suitable for selecting computing centers. It not only provides the user with the required number of virtual machines but also reduces the communication cost between computing centers, effectively overcoming the technical shortcomings of the existing random strategy and greedy strategy when selecting computing centers. The present invention improves the performance and reliability required for massive data processing, so that massive data can be processed accurately and quickly in a distributed cloud.

The present invention is described in further detail below with reference to the embodiments.

Description of drawings

Fig. 1 is a flow chart of finding a suitable subgraph and its longest path in an undirected graph.

Fig. 2 is a schematic diagram of the longest path in a subgraph.

Fig. 3 shows the longest path length when processing a single request for 1000 virtual machines.

Fig. 4 shows the processing of large requests that demand 50 to 100 virtual machines.

Fig. 5 shows the processing of small requests that demand 10 to 20 virtual machines.

Fig. 6 shows the performance comparison of the different algorithms for a user request that demands 100 virtual machines.

Embodiment

Embodiment 1:

In research on cloud computing platform services, a key topic is the rational allocation and dynamic scheduling of resources. When a developer hosts an application service, the cloud platform must provide concrete virtual machines to support the running of the application. This is the typical and currently most mature cloud computing service model, Infrastructure as a Service (IaaS). The user can apply for the service in two ways: the user either submits the number of virtual machines required and waits for the cloud platform to respond, or submits the application to be deployed and hosted on the cloud platform, and the cloud platform automatically allocates the virtual machines needed to run the user program.

For the case in which the user submits the required number of virtual machines, the cloud computing platform needs to select for the user a set of computing centers that satisfies the user's resource demand and, at the same time, has the shortest mutual communication distance. To reduce the communication cost between computing centers as far as possible, an optimal subgraph must be selected to serve the user, so the criterion for judging the quality of a selected subgraph is critical. Here the criterion is the length of the longest path in the subgraph, where the longest path passes through every vertex of the subgraph exactly once. For data or requests of the same scale, executing the tasks serially takes longer than executing them in a distributed manner. In the worst case, a serial task traverses the longest path in the subgraph, and its processing time is the upper bound on the time the subgraph needs to process any task of the same scale. The length of this longest path is the upper bound on the communication distance within the subgraph.

Based on the above idea, the present embodiment provides a method for selecting computing centers in a distributed cloud which, as shown in Fig. 1, comprises the following steps:

3) For the set V' for which vertex addition has stopped, take the sum of the lengths of the connecting edges between two vertices in V' as the internal longest line segment between the two vertices, take this internal longest line segment as the first line segment, and compute the length of the first line segment in V';

4) Form the vertex set V1 from the two vertices of the first line segment in V', and form another vertex set V2 from all the other vertices of V'. Compute the external longest line segment from the vertices in V2 to the vertices in V1 and the external longest path length; the two vertices of the external longest line segment form the set U. Add the two vertices in U to V1 and remove the two vertices of the external longest line segment from V2. Then add the length of the computed external longest line segment to the external longest path length, and use the resulting sum as the external longest path length for the next iteration;

The above steps find a subgraph whose capacity meets the required number of virtual machines. All the vertices of this subgraph lie as close as possible to the source point v0 and form a topology centered on v0. The steps also yield the length of the longest path through all the vertices of the subgraph, which is the upper bound on the communication distance within the subgraph. The set V1 initially contains the two vertices that form the longest line segment, and the set V2 contains all the vertices of the required subgraph other than those in V1.

In this way, a subgraph clustered around a source point is found, together with the longest path that connects all the vertices of the subgraph and passes through each vertex exactly once. All the vertices of the original graph are traversed as source points, and among the subgraphs obtained, the one with the shortest longest path is the optimal subgraph. The performance of a task executed serially along the longest path of this subgraph is the lower bound on the performance of the computing centers in this subgraph for all other tasks. The performance of the computing centers in the other subgraphs is lower than in the optimal subgraph.
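The outer selection described above, trying every vertex as a source point and keeping the subgraph whose longest path is shortest, can be sketched as follows. The two callables `grow` and `path_length` stand for the subgraph construction and the longest path computation; they, the toy data and all names are hypothetical placeholders. The toy data puts the centers on a line so that the result is easy to verify by hand.

```python
def pick_optimal_subgraph(sources, grow, path_length):
    """Try each source and keep the subgraph with the shortest longest path."""
    best = None
    for source in sources:
        subgraph = grow(source)
        score = path_length(subgraph)
        if best is None or score < best[0]:
            best = (score, source, subgraph)
    return best

# Toy data: centers on a line, each with capacity 1; the request needs 2 VMs.
coords = {"a": 0, "b": 3, "c": 5, "d": 6, "e": 20}

def grow(source):
    # The two centers nearest to the source (the source itself comes first).
    return sorted(coords, key=lambda v: abs(coords[v] - coords[source]))[:2]

def path_length(subgraph):
    # On a line, a path through all vertices spans from the minimum to the maximum.
    xs = [coords[v] for v in subgraph]
    return max(xs) - min(xs)

best_score, best_source, best_subgraph = pick_optimal_subgraph(coords, grow, path_length)
```

Here the sources `c` and `d` both produce the pair `c`, `d` with length 1; the strict comparison keeps the first one found.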

Embodiment 2:

On a cloud platform, enterprise-level users or developers can directly submit to the cloud platform the applications and complex tasks that need to be hosted. The cloud platform must first satisfy the resource demand of the application or task while reducing the user's cost as much as possible.

According to the requirements of the task that the user hosts on the cloud platform, the method for selecting computing centers in a distributed cloud described in Embodiment 1 is used to screen out the optimal subgraph G' in the cloud platform, and the optimal computing centers in G' provide the virtual machines that meet the user's demand;

\text{Minimize} \quad \sum_{i=1}^{m} x_{i} p_{i}

where (x_1, x_2, ..., x_m) denotes the number of virtual machines required of each type. The constraints of the present embodiment restrict every resource that is demanded: according to the resources the client needs, virtual machines of different types are provided, and the number of virtual machines of each type lies within a range. The number of virtual machines required of each type, (x_1, x_2, ..., x_m), can be computed with a LINGO program.

Embodiment 3:

The method described in Embodiment 1 is also applicable to the execution of distributed tasks. Suppose every vertex of the two subgraphs has the same capacity. As shown in Fig. 2, where each grid cell has a length of 1/3, the longest distance between two points in the first subgraph (A) is smaller than the longest distance between two points in the second subgraph (B), while the longest path through all the vertices of the second subgraph is shorter. A shorter longest path indicates that the computing centers are densely clustered. Placing virtual machines that interact frequently in such a cluster of computing centers reduces the overall bandwidth the task needs while running. The second subgraph, which is the one selected by the longest path length criterion, is therefore also more helpful for executing distributed tasks.

Embodiment 4:

Because the volume of data that cloud computing has to process is enormous, the computing and storage resources of a single machine cannot meet the performance and reliability requirements of massive data processing. How to process massive data accurately and quickly in a distributed data system is therefore one of the key problems that cloud computing urgently needs to solve.

The MapReduce distributed programming model suits data processing in a cloud computing environment and has received wide attention. The MapReduce programming model is still being improved: many teams and companies that support open source have improved its execution efficiency, optimized its internal strategies, and integrated it with existing excellent open source systems. Applications based on MapReduce are increasingly common and are gradually becoming the main content hosted on cloud platforms. Distributed tasks have become the main tasks processed by cloud computing.

The purpose of the method described in Embodiment 1 is to select the optimal subgraph that serves a user task or application. Suppose the task is a mainstream distributed task in cloud computing that consists of m subtasks, where communication between the virtual machine clusters serving different subtasks is very rare and communication between the virtual machines serving the same subtask is frequent. To reduce the communication cost between computing centers and the overall communication cost as far as possible, clusters of computing centers must be partitioned rationally among the different subtasks within the subgraph G', so that the virtual machines serving the same subtask are gathered as much as possible in the same computing center or in the computing centers that are closest to each other.

Suppose the optimal subgraph G' screened out by the method for selecting computing centers in a distributed cloud described in Embodiment 1 contains n computing centers whose capacities are d_1, d_2, ..., d_n respectively; the distributed task requested by the user consists of m subtasks, which require g_1, g_2, ..., g_m virtual machines respectively; the distributed task requested by the user needs N virtual machines in total, with g_1 + g_2 + ... + g_m = N;

\text{Maximize} \quad \sum_{i=1}^{N} \sum_{j=1}^{n} \sum_{k=1}^{m} x_{ijk} \, p_{j}(y_{j})

Constraint one:

\sum_{j=1}^{n} \sum_{k=1}^{m} x_{ijk} = 1, \quad \forall i = 1, 2, \ldots, N

Constraint two:

\sum_{i=1}^{N} \sum_{k=1}^{m} x_{ijk} = y_{j} \leq d_{j}, \quad \forall j = 1, 2, \ldots, n

Constraint three:

\sum_{i=1}^{N} \sum_{j=1}^{n} x_{ijk} = g_{k}, \quad \forall k = 1, 2, \ldots, m

Embodiment 5:

The present embodiment implements, on the CloudSim platform, the method described in Embodiment 1 and the partitioning of computing center clusters for distributed tasks, and verifies the soundness of the proposed virtual machine placement strategy. In the simulation, every host in the CloudSim simulation platform is configured with 5 GB of memory, 2 TB of external storage and 1 Gbps of bandwidth; each host has a single-core processor with a speed of 1000, 2000 or 3000 MIPS. A node consumes 170 W at 0% CPU utilization and 260 W at full CPU load. Each virtual machine in the CloudSim simulation platform has 1 GB of memory, 100 GB of external storage and 250 Mbps of bandwidth, and the CPU processing speed each virtual machine requires is 250, 500, 750 or 1000 MIPS.

To test the performance of the method described in Embodiment 1, the present embodiment creates a 1000 × 1000 grid and random user requests, processes them with the method described in Embodiment 1, and measures the longest path length of the selected subgraph in the output. The computing centers are placed at random positions on the 1000 × 1000 grid, and the distance between computing centers is the Euclidean distance between their points on the grid. Five distributed cloud scenarios are designed, containing 100, 75, 50, 25 and 10 computing centers respectively. Every distributed cloud has the same total number of servers, so the number of servers per computing center is inversely proportional to the number of computing centers in the cloud. In the cloud with 100 computing centers, each computing center has between 50 and 100 servers. In the cloud with 50 computing centers, the number of servers per computing center is randomly distributed between 100 and 200.
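A test scenario of this kind could be generated along the following lines. This is a standalone sketch rather than CloudSim code: the function names are hypothetical, positions are drawn as real-valued coordinates (the text does not say whether they are restricted to integer grid points), and the server counts are drawn uniformly from the stated range.

```python
import math
import random

def make_scenario(num_centers, servers_low, servers_high, rng, grid=1000):
    """Place centers at random positions on a grid x grid area.

    Returns a list of ((x, y), servers) tuples, one per computing center.
    """
    centers = []
    for _ in range(num_centers):
        position = (rng.uniform(0, grid), rng.uniform(0, grid))
        servers = rng.randint(servers_low, servers_high)   # inclusive bounds
        centers.append((position, servers))
    return centers

def euclidean(p, q):
    """Distance between two computing centers on the grid."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# The scenario with 100 centers and 50 to 100 servers per center.
scenario = make_scenario(100, 50, 100, random.Random(42))
first, second = scenario[0][0], scenario[1][0]
gap = euclidean(first, second)
```

Passing a seeded `random.Random` instance keeps a scenario reproducible, which matters when different strategies are compared on the same layout.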

In the present embodiment, the method described in Embodiment 1 is called the Approx strategy. The Approx strategy is compared with the random strategy (Random) and the greedy strategy (Greedy). The Random strategy selects a computing center at random to process the user request; if one computing center cannot provide the service alone, the next computing center is selected at random to process the request jointly. The Greedy strategy selects the computing center with the largest capacity to serve the user request; if one computing center cannot complete the task alone, the computing center with the next largest capacity is selected to process the request jointly, and so on until the capacity of the selected set of computing centers meets the user's demand.

The first experiment tests a single request for 1000 virtual machines. Fig. 3 shows the result for each strategy. The Approx strategy is clearly better than the other two strategies in all five scenarios, by as much as 75%. The Random strategy and the Greedy strategy perform similarly. Fig. 3 also shows that the longest path length of the subgraph decreases as the number of computing centers decreases. This is because each computing center has more physical hosts when there are fewer computing centers.

Next is a comparison of the strategies under different request sets. In the first request set, the user sends 100 consecutive requests, each for between 50 and 100 virtual machines, which yields 100 subgraphs; the present embodiment calls these large requests. The mean of the longest path lengths of all the subgraphs is computed. In the second request set, the user sends 500 consecutive requests, each for between 10 and 20 virtual machines; these are called small requests. The mean of the longest path lengths of all the subgraphs is computed in the same way. Fig. 4 and Fig. 5 give the mean longest path length for large requests and small requests respectively. As Fig. 4 and Fig. 5 show, the Greedy strategy is better than the Random strategy, by about 33% and about 66% respectively, and the Approx strategy is better than the Greedy strategy, by about 83% and about 86% respectively. For the same strategy, the longest path length for large requests is greater than for small requests. This is because the virtual machines of a large request may be placed in multiple computing centers, which increases the length of the longest path.

In the experiment that measures the total communication volume between computing centers, a virtual machine allocation request for 100 virtual machines is submitted to the simulation platform. Under distributed cloud scenarios with 2, 3, 4, 5, 6, 7 and 8 computing centers, the total communication between the computing centers in the subgraph obtained by the Approx strategy, the Greedy strategy and the Random strategy is measured. In each scenario, the total number of virtual machines the computing centers have is a random number between 100 and 200. For each distributed cloud scenario, the virtual machine allocation request is simulated 100 times and the average total communication is obtained. As Fig. 6 shows, for all strategies the communication cost between computing centers grows as the number of computing centers increases. This is because, as the number of computing centers increases, the number of virtual machines available per computing center decreases, so the virtual machines allocated to the user are placed across multiple computing centers and the communication cost increases accordingly. The Greedy strategy performs better than the Random strategy, and the Approx strategy performs better than the Greedy strategy; the gap widens as the number of computing centers increases.

The present invention has been further described above with reference to the embodiments, but it is not limited to the above embodiments. Within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the spirit of the present invention.

Claims

1. A method for selecting computing centers in a distributed cloud, characterized by comprising the following steps:

2. An application of the method for selecting computing centers in a distributed cloud, characterized in that: according to the requirements of the task that the user hosts on the cloud platform, the optimal subgraph G' is screened out in the cloud platform, and the virtual machines that meet the user's demand are provided by the optimal computing centers in the optimal subgraph G';

\text{Minimize} \quad \sum_{i=1}^{m} x_{i} p_{i}

3. The application of the method for selecting computing centers in a distributed cloud according to claim 2, characterized in that:

Suppose the optimal subgraph G' contains n computing centers whose capacities are d_1, d_2, ..., d_n respectively; the distributed task requested by the user consists of m subtasks, which require g_1, g_2, ..., g_m virtual machines respectively; the distributed task requested by the user needs N virtual machines in total, with g_1 + g_2 + ... + g_m = N;

The variable x_ijk, i ∈ {1, 2, ..., N}, j ∈ {1, 2, ..., n}, k ∈ {1, 2, ..., m}, indicates that virtual machine i serves subtask k and is placed in computing center j;

\text{Maximize} \quad \sum_{i=1}^{N} \sum_{j=1}^{n} \sum_{k=1}^{m} x_{ijk} \, p_{j}(y_{j})

Constraint one:

\sum_{j=1}^{n} \sum_{k=1}^{m} x_{ijk} = 1, \quad \forall i = 1, 2, \ldots, N

Constraint two:

\sum_{i=1}^{N} \sum_{k=1}^{m} x_{ijk} = y_{j} \leq d_{j}, \quad \forall j = 1, 2, \ldots, n

Constraint three:

\sum_{i=1}^{N} \sum_{j=1}^{n} x_{ijk} = g_{k}, \quad \forall k = 1, 2, \ldots, m