CN109447264B

CN109447264B - Virtual machine placement genetic optimization method based on VHAM-R model in cloud computing environment

Info

Publication number: CN109447264B
Application number: CN201811079838.9A
Authority: CN
Inventors: 陆佳炜; 赵伟; 李�杰; 吴涵; 肖刚; 高燕煦
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2018-09-17
Filing date: 2018-09-17
Publication date: 2021-11-23
Anticipated expiration: 2038-09-17
Also published as: CN109447264A

Abstract

A virtual machine placement genetic optimization method based on a VHAM-R model in a cloud computing environment comprises the following steps. First step: give a formal description of the virtual machine placement problem, as follows: 1.1 define the placement environment; 1.2 define the resource state; 1.3 define host availability; 1.4 calculate the power consumption; 1.5 define virtual machine placement. Second step: set the constraint conditions and optimization targets for virtual machine placement. Third step: create a model: based on the constraint conditions and optimization targets given in the second step, establish a virtual hierarchical structure model, VHAM-R, based on the Rendezvous hash algorithm, and use it to optimize and decide how a virtual machine selects a host. Fourth step: improve the genetic algorithm operations based on the VHAM-R model. The invention improves the execution efficiency of the algorithm and the quality of the finally obtained solution set.

Description

Virtual machine placement genetic optimization method based on VHAM-R model in cloud computing environment

Technical Field

The invention relates to the field of virtual machine placement in a cloud computing environment, and in particular to optimizing virtual machine placement by improving the encoding, selection, crossover and mutation operations of a genetic algorithm on the basis of a VHAM-R model.

Background

Cloud computing is derived from distributed computing and grid computing. It is a computing model based entirely on the Internet that provides users with low-cost, highly reliable and scalable computing resources and services on a pay-as-you-go basis. The basic idea of cloud computing is to provide physical equipment through huge data centers distributed around the world and, on the basis of virtualization technology, to deliver high-quality computing and storage services to users over the Internet at a low price. Virtual Machine Placement is a bin-packing problem between the virtual machines and the physical hosts of a cloud data center, and an important part of resource management and allocation in a cloud computing environment. The essence of the problem is to place each virtual machine on a suitable physical node through a reasonable allocation method while meeting the resource requirements and specific constraint conditions for running the virtual machine; it is an NP-hard problem. A good virtual machine placement strategy can effectively improve the resource utilization of each physical host of the cloud data center, reduce the overall energy consumption of the data center, and ensure the availability required by users.

Z. Zhang, C. C. Hsu et al. took into account the scale of the data center, the workload of the hosts and the changing demand for computing resources, and proposed an energy-saving framework for placing virtual machines on hosts, which effectively addresses the problem of allocating virtual machines with minimum resource waste and minimum energy consumption. X. Li, Z. Qian et al. studied the problem of selecting a suitable physical host on which to deploy a virtual machine at run time. Because physical host resources are multidimensional and unbalanced use of multidimensional resources causes resource waste, they proposed a multidimensional resource partitioning model that balances the utilization of the multidimensional resources and reduces the number of running physical hosts, thereby reducing the energy consumption of the data center. In China, Li Qiang et al. proposed a model based on long-term load performance for the virtual machine placement problem in the cloud computing environment and, by combining it with a multi-objective genetic algorithm, effectively reduced the number of physical host nodes in use.

The Genetic Algorithm (GA) was proposed by Holland in 1975 with reference to natural selection in the theory of biological evolution and to the evolutionary process in genetics; it is an algorithm that searches for the optimal solution of an NP-hard problem by simulating the process of biological evolution. Compared with the virtual machine placement algorithms commonly used in cloud computing based on ANSYS cluster finite element analysis, including classic algorithms such as First Come First Served, First Fit, First Fit Decreasing, Best Fit, Best Fit Decreasing and Greedy Placement, the genetic algorithm is better able to find the global optimal solution.

Disclosure of Invention

The aim is to place a group of virtual machine placement requests on server nodes and, on the premise of meeting the placement, resource and communication reachability constraints, to achieve the optimization targets of high placement availability with fewer working hosts and low data center energy consumption. To this end, the invention provides an improved genetic algorithm for the virtual machine placement problem, in which the encoding mode, selection, mating process and mutation mode are improved on the basis of a VHAM-R model, so that the execution efficiency of the algorithm is improved and the finally obtained solution set is optimized.

The invention provides the following technical scheme for solving the technical problems:

A virtual machine placement genetic optimization method based on a VHAM-R model in a cloud computing environment comprises the following steps:

First step: give the following formal description of the virtual machine placement problem:

1.1 Define the placement environment. The data center has a set of physical hosts PM = {pm₁, pm₂, …, pm_n}, where n is the number of hosts, and a set of virtual machines to be placed VM = {vm₁, vm₂, …, vm_m}, where m is the number of virtual machines; it is assumed that m is greater than or equal to n. Define the set of virtual machine placement groups P = {p₁, p₂, …, p_h}, where h is the number of placement groups;

1.2 Define the resource state. For a given virtual machine vm_i, the CPU resources it requires and the memory resources it requires are defined; V_i-pes is the CPU utilization of vm_i and W_i-ram is its memory utilization. For a given host pm_j, its current free CPU resources and free memory resources are defined; U_j-pes is the CPU utilization of pm_j and U_j-ram is its memory utilization. The resource utilization U_j of host pm_j is then defined as:

U_j＝αU_j-pes+βU_j-ram

where 0 < α < 1, 0 < β < 1, and α + β = 1;

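Define Tag_ij to indicate whether, at the current time t, host pm_j satisfies the resource requirements of virtual machine vm_i; that is, Tag_ij is true when the free CPU and free memory of pm_j are both no less than the CPU and memory required by vm_i, and false otherwise;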
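The two definitions in 1.2 can be illustrated with a minimal Python sketch. The function and parameter names are hypothetical, the default α = 0.5 is only an example value, and the feasibility test assumes that Tag_ij compares required resources against free resources for both CPU and memory.

```python
def resource_utilization(cpu_util, ram_util, alpha=0.5):
    """U_j = alpha * U_j-pes + beta * U_j-ram, with alpha + beta = 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    beta = 1.0 - alpha
    return alpha * cpu_util + beta * ram_util


def tag(vm_cpu_req, vm_ram_req, host_free_cpu, host_free_ram):
    """Tag_ij (assumed form): True when host pm_j can currently hold vm_i."""
    return host_free_cpu >= vm_cpu_req and host_free_ram >= vm_ram_req
```

For instance, a host at 60% CPU and 20% memory utilization with α = β = 0.5 has a combined utilization of 0.4.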

1.3 Host availability. The availability of a node is the probability that the node is working at any given time during the whole service time. For any network component i, its availability A_i is obtained by the following formula:

$$A_i = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$$

where MTTF is the mean time to failure and MTTR is the mean time to repair; the availability value of each server is assumed to be known, and the availabilities of different servers are assumed to be mutually independent;
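As a small worked example of 1.3, the following Python sketch computes availability from MTTF and MTTR using the standard steady-state relation A = MTTF / (MTTF + MTTR). The function name and the choice of hours as the unit are assumptions made for illustration.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability A = MTTF / (MTTF + MTTR)."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)
```

A host that runs 999 hours between failures on average and takes 1 hour to repair has an availability of 0.999.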

1.4 Calculate the power consumption. In a cloud data center with n running physical hosts, for any physical host pm_j ∈ PM, the power consumption at a given time t is given by the following formula:

where c_j denotes the static energy consumption, f_j(t) is the CPU frequency of host pm_j at time t, U_j-pes(t) is its CPU utilization at time t, and k is a constant coefficient; that is, the power consumption is, to a certain extent, a linear model of the CPU utilization;

1.5 Define virtual machine placement. For a placement group p_k ∈ P, each virtual machine in the set VM selects a host from the physical host set to complete the placement mapping, satisfying the constraint conditions of the placement process as far as possible. Define the virtual machine placement matrix M_k[i][j]: M_k[i][j] = 1 denotes that, in placement group p_k, virtual machine j is placed on physical host i; otherwise M_k[i][j] = 0 denotes that, in placement group p_k, virtual machine j is not placed on physical host i;

Second step: set the constraint conditions and optimization targets for virtual machine placement, as follows:

2.1 The virtual machine placement problem in the cloud environment must not only meet the resource requirements of the virtual machines but also reduce the energy consumption of the data center and use resources efficiently; in addition, the availability of the placement request has to be considered. The goals to be taken into account are therefore: the smallest possible number of server nodes in use, minimum energy consumption, balanced load, and high availability of the placement request. The following constraint conditions are proposed:

2.1.1 Placement constraint: within the same placement group, any virtual machine vm_i can be placed on one and only one server node.

The constraint is expressed as: for every virtual machine vm_i ∈ VM and every placement group p_k ∈ P, exactly one entry of M_k belonging to vm_i equals 1.

That is, within the same placement group, a single virtual machine is considered to be deployed and run on only one server node;
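The placement constraint can be checked mechanically on a placement matrix. The sketch below is illustrative only: it assumes the matrix is a list of rows, one row per physical host and one column per virtual machine (the indexing stated in 1.5), and the function name is hypothetical.

```python
def is_valid_placement(matrix):
    """Return True when every virtual machine (column) sits on exactly one host (row)."""
    if not matrix or not matrix[0]:
        return False
    num_vms = len(matrix[0])
    return all(sum(row[j] for row in matrix) == 1 for j in range(num_vms))
```

With two hosts and three virtual machines, `[[1, 0, 1], [0, 1, 0]]` satisfies the constraint, while `[[1, 0], [1, 0]]` does not, because the first virtual machine appears on two hosts and the second on none.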

2.1.2 Resource constraint: for any server node, the consumption of each resource type should not exceed its upper limit. The CPU capacity and the memory capacity of server pm_j are defined.

The constraint is expressed as: for every server pm_j ∈ PM, the total CPU demand of the virtual machines placed on pm_j does not exceed r times its CPU capacity, and their total memory demand does not exceed r times its memory capacity.

The parameter r is a constant coefficient with r ≤ 1; the server node needs to reserve part of its resources to ensure its own normal operation, and r usually takes the value 0.8;
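A minimal Python sketch of the resource constraint follows. It assumes that each virtual machine's demand is given as a (CPU, memory) pair and that the reserve coefficient r scales both capacities; the function name and data layout are hypothetical.

```python
def satisfies_resource_constraint(vm_demands, cpu_capacity, ram_capacity, r=0.8):
    """Check that the VMs on one host stay within r times its CPU and memory capacity."""
    if not 0.0 < r <= 1.0:
        raise ValueError("r must satisfy 0 < r <= 1")
    total_cpu = sum(cpu for cpu, _ in vm_demands)
    total_ram = sum(ram for _, ram in vm_demands)
    return total_cpu <= r * cpu_capacity and total_ram <= r * ram_capacity
```

On a host with 8 cores and 16 GB of memory, two virtual machines needing (2, 4) and (4, 8) fit under r = 0.8, whereas two virtual machines needing (4, 4) each exceed the CPU limit of 6.4 cores.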

2.1.3 Reachability constraint: define a function F(m, n, D) that represents the reachability of communication between nodes. For any link (m, n) ∈ L, if the communication delay between points m and n is at most D, the function F(m, n, D) returns 1; otherwise it returns 0;

2.2 The virtual machine placement problem has many optimization targets; here it is optimized with respect to two of them, availability and energy consumption;

2.2.1 Availability optimization

Assume that a user request consists of n different virtual machines together with the communication requirements between VM pairs. Placing a virtual machine on the same server node pm_j more than once cannot improve the availability of the placement, because when pm_j fails, all virtual machines placed on pm_j fail simultaneously; it is therefore desirable to place vm_i on different nodes to increase availability. Let H_i denote the maximum number of server nodes on which virtual machine vm_i can be placed, and define H as the maximum of the H_i, so that each of the n virtual machines is placed on at most H nodes;

the availability definition and calculation of virtual machine placement can be divided into three categories: single placement, fully protected placement, and partially protected placement;

2.2.1.1 Single Placement

Single placement means that each virtual machine is placed on only one server node, i.e. H = 1. Under single placement, if the availabilities of the n server nodes are A₁, A₂, …, A_n and k virtual machines (n ≤ k) are placed on these n nodes, the availability of the virtual machine placement scheme, denoted A_p, is defined as follows:

$$A_p = \prod_{i=1}^{n} A_i$$

Since the request contains k virtual machines, the probability that all k virtual machines are running has to be considered when calculating the availability;

2.2.1.2 Full protection placement

Full protection placement means that every virtual machine vm_i ∈ VM is placed by the placement groups p_i (1 ≤ i ≤ H) on H different nodes. A full protection placement scheme P can thus be regarded as composed of H single placement schemes, and within each single placement scheme the virtual machine pairs should satisfy the placement, resource and communication reachability constraints;

The availability of a full protection placement scheme is the probability that at least one placement group is working within the life cycle of the service. Writing A_{p_i} for the single-placement availability of placement group p_i, it is calculated as follows:

$$A_P = 1 - \prod_{i=1}^{H} \left(1 - A_{p_i}\right)$$
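The two availability calculations described so far can be sketched in Python. This is a sketch under stated assumptions: single placement availability is taken as the product of the availabilities of the nodes used, and full protection availability as the probability that at least one of H independent placement groups works. The function names and the list-of-lists input are hypothetical.

```python
import math


def single_placement_availability(node_availabilities):
    """All nodes used by the placement must be up: product of their availabilities."""
    return math.prod(node_availabilities)


def full_protection_availability(groups):
    """At least one placement group works: 1 minus the product of group failure probabilities.

    `groups` holds one list of node availabilities per placement group; the
    groups are assumed to use disjoint nodes, so their failures are independent.
    """
    all_groups_fail = math.prod(
        1.0 - single_placement_availability(group) for group in groups
    )
    return 1.0 - all_groups_fail
```

Two groups that each use two nodes of availability 0.9 give a single-placement availability of 0.81 per group and a full protection availability of 1 − 0.19² = 0.9639.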

2.2.1.3 Partial protection placement

Partial protection placement means that there exists a virtual machine vm_i ∈ VM that is placed on fewer than H different nodes, i.e. two or more placement groups place vm_i on the same node, and there exists some virtual machine vm_j ∈ VM such that H is greater than 1. Under partial protection placement, a virtual machine placed on fewer than H nodes can be regarded as being placed jointly by several placement groups. The availability cannot be calculated directly with the formula in 2.2.1.2, because the availability of a server node hosting a shared virtual machine would be counted twice; an operator is therefore redefined to handle this kind of placement. Suppose there are n nodes pm₁, pm₂, …, pm_n with availabilities A₁, A₂, …, A_n; for a node pm_x with availability A_x, the operator is defined as follows:

Then, according to the formula above, define

for operations between different sets; the availability of partial protection placement is calculated by the following formula:

2.2.2 Energy consumption optimization

According to the formula in 1.4, the total energy consumption of physical host pm_j during the period T is obtained by integrating its power consumption over that period.

The total server energy consumption E^T of the data center during the period T is therefore the sum of the energy consumption of all running servers:
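As an illustration of 2.2.2, the sketch below approximates the energy of one host by integrating sampled power values over the period T with the trapezoidal rule, then sums over all running hosts. The power samples are taken as given inputs because the power formula of 1.4 is not reproduced here; the function names, the fixed sampling step and the trapezoidal approximation are all assumptions.

```python
def host_energy(power_samples_watts, step_seconds):
    """Approximate the energy (joules) of one host from equally spaced power samples."""
    pairs = zip(power_samples_watts, power_samples_watts[1:])
    return sum((p0 + p1) / 2.0 * step_seconds for p0, p1 in pairs)


def datacenter_energy(samples_per_host, step_seconds):
    """Total energy over the period: sum of the energy of every running host."""
    return sum(host_energy(samples, step_seconds) for samples in samples_per_host)
```

A host drawing a constant 100 W over three samples taken 10 s apart consumes 2000 J; adding a second host whose power ramps from 50 W to 150 W over the same interval gives 4000 J in total.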

Third step: create a model

Based on the constraint conditions and optimization targets of virtual machine placement given in the second step, a virtual hierarchical structure model (VHAM-R) based on the Rendezvous hash algorithm is established to optimize and decide how a virtual machine selects a host, as follows:

3.1 Initialize the host set PM, the virtual machine set VM, the maximum number H of placement groups of the virtual machines, and the availability set A of the host nodes; if the number of hosts n is less than 4, go to step 3.2, otherwise go to step 3.3;

3.2 When the number of hosts n is less than 4, the minimum two-level virtual structure of 2 × 2 hosts cannot be constructed. For virtual machine vm_i, define the set of weight scores over the hosts, W_i = {w_i1, w_i2, …, w_ik}, where k ≥ n, and define w_ij as the weight score of virtual machine vm_i on host pm_j, w_ij = h(vm_i, pm_j), where the function h(vm, pm) is the hash function agreed in the Rendezvous hash algorithm; through h(vm_i, pm_j), virtual machine vm_i is assigned to the host pm_j with the largest weight w_ij. If the capacity of a host pm_k is h times that of the other hosts, pm_k is divided equally into h shares, so that the probability of a virtual machine being assigned to this host is h times that of the other hosts;
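Step 3.2 is a direct use of Rendezvous (highest random weight) hashing, which can be sketched as follows. The hash function here is a stand-in (SHA-256 truncated to 64 bits) chosen only because it is available in the Python standard library; the text itself leaves the agreed hash function open. The `shares` mapping, which splits a larger host into several virtual shares, and all function names are hypothetical.

```python
import hashlib


def h(vm_id, host_id):
    """Stand-in for the agreed hash function (assumption: 64 bits of SHA-256)."""
    digest = hashlib.sha256(f"{vm_id}|{host_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_host(vm_id, hosts, shares=None):
    """Assign the VM to the host whose weight h(vm, host) is largest.

    A host listed in `shares` with value s is treated as s equal shares, so it
    is s times as likely to win as a host with a single share.
    """
    shares = shares or {}
    best_host, best_weight = None, -1
    for host in hosts:
        for part in range(shares.get(host, 1)):
            weight = h(vm_id, f"{host}#{part}")
            if weight > best_weight:
                best_host, best_weight = host, weight
    return best_host
```

A useful property of this scheme is that removing a host that was not selected leaves the assignment of a virtual machine unchanged, since the maximum weight is still held by the same host.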

3.3 When the number of hosts n is greater than or equal to 4, the hosts are divided into clusters. First select a constant z, the number of hosts in each cluster; the number of host clusters is then c = ceiling(n/z), where the ceiling function rounds the value of n divided by z up to the nearest integer. The clusters are C₀ = {cpm₁, cpm₂, …, cpm_z}, C₁ = {cpm_z+1, cpm_z+2, …, cpm_2z}, and so on, until every host belongs to a cluster; each cluster is a lowest-level node of the virtual hierarchy;
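The cluster division of step 3.3 amounts to slicing the host list into consecutive groups of z hosts. A minimal sketch, with a hypothetical function name, is:

```python
import math


def partition_hosts(hosts, z):
    """Split the host list into c = ceil(n / z) consecutive clusters of at most z hosts."""
    if z < 1:
        raise ValueError("z must be at least 1")
    c = math.ceil(len(hosts) / z)
    return [hosts[x * z:(x + 1) * z] for x in range(c)]
```

Ten hosts with z = 4 give three clusters, the last holding the two remaining hosts; 108 hosts with z = 4 give 27 clusters.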

Fourth step: improve the genetic algorithm operations based on the VHAM-R model, as follows:

4.1 For the process of placing virtual machines on server nodes, a grouping encoding based on host node clusters is proposed: a placement group P corresponds to an individual, a host cluster C_i corresponds to a chromosome, and each host in a host cluster corresponds to a gene;

4.2 The selection operation uses a fitness function to select the individuals in the population that contain good genes, taking the energy consumption of the data center and the placement availability of the virtual machines as the optimization targets;

4.3 The crossover operation of the genetic algorithm simulates the mating process between individuals in the biological world: gene recombination between the parents is achieved through mating, so that the offspring obtain new chromosomes containing good genes and better offspring are produced;

4.4 The mutation operation of the genetic algorithm.

Further, in step 3.3, the process of dividing the host clusters is as follows:

3.3.1 Determine the virtual leaf node sectors and the depth of the virtual hierarchy. Select the number of leaves f of each child node sector in the virtual hierarchy, where f is an integer; choosing suitable values of f and z brings the resulting algorithm efficiency, degree of load balance and so on close to expectations. From the number of leaves f per node sector and the number of host clusters c, the depth d of the virtual hierarchy is obtained:

f^d≥c

where d is the smallest positive integer for which the above formula holds;

3.3.2 Number the virtual leaf node sectors uniformly with natural numbers starting from 0: 0, 1, 2, …, f-1;

3.3.3 For a given virtual machine vm_i and any virtual node s there is a corresponding weight w_is = h(vm_i, s), where h(vm_i, s) is an agreed hash function such as hash32 or hash64. At each level of the virtual hierarchy, the weight of each virtual node is calculated through h(vm_i, s) and the node with the highest score is selected to continue downwards, level by level, until a real host node cluster C_x at the bottom layer is selected;
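Steps 3.3.1 to 3.3.3 can be sketched as a depth computation followed by a level-by-level descent. The code below is a sketch under several assumptions: the hash is a stand-in (SHA-256 truncated to 64 bits), virtual nodes are identified by their level and index, and sectors whose subtree contains no real cluster (possible when f^d is larger than c) are skipped, a case the text does not address. All function names are hypothetical.

```python
import hashlib


def h(vm_id, node_id):
    """Stand-in for the agreed hash function (assumption: 64 bits of SHA-256)."""
    digest = hashlib.sha256(f"{vm_id}|{node_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hierarchy_depth(f, c):
    """Smallest positive integer d such that f**d >= c."""
    if f < 2 or c < 1:
        raise ValueError("need f >= 2 and c >= 1")
    d, leaves = 1, f
    while leaves < c:
        leaves *= f
        d += 1
    return d


def select_cluster(vm_id, f, c):
    """Descend the virtual hierarchy and return the index x of the chosen cluster C_x."""
    d = hierarchy_depth(f, c)
    node = 0  # index of the current virtual node within its level
    for level in range(1, d + 1):
        span = f ** (d - level)  # number of bottom-level slots under each child
        # Assumption: children whose subtree holds no real cluster are skipped.
        children = [node * f + s for s in range(f) if (node * f + s) * span < c]
        node = max(children, key=lambda child: h(vm_id, f"L{level}/{child}"))
    return node
```

With f = 3 and c = 27 clusters the hierarchy has depth 3, so each virtual machine needs 3 × 3 = 9 hash evaluations to reach a cluster instead of one per cluster.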

3.3.4 After virtual machine vm_i has selected a real host node cluster C_x, real node selection is carried out. Every host node cpm_xz+j in the real node cluster C_x has a corresponding weight score W_i(xz+j) = H(vm_i, cpm_xz+j) * Tag_i(xz+j), where Tag takes the value 0 if false and 1 if true. Supposing virtual machine vm_i is assigned to host cpm_xz+j, H(vm_i, cpm_xz+j) is the sum of three weighted terms: the ratio of E_old to the total energy consumption of the real host node cluster C_x after vm_i is assigned, over the same period T; the difference between 1 and the resource utilization U_xz+j of host cpm_xz+j; and the availability of the host:

where E_xz+j is the energy consumption of host cpm_xz+j in the period T; E_old is the energy consumption of the real host node cluster C_x in the same period T when no new virtual machine is assigned; α, β and γ are the weights of the three terms; and A_xz+j is the availability of host cpm_xz+j;

3.3.5 Loop through steps 3.3.3 to 3.3.4 for all virtual machines vm_i, finally selecting the host node with the highest weight score W_i(xz+j) to complete the allocation.
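The host score of step 3.3.4 can be illustrated as follows. Because the exact formula is not reproduced in this text, the sketch assumes the form H = α·(E_old / E_new) + β·(1 − U) + γ·A, where E_new stands for the cluster energy after the virtual machine is assigned; this form, the example weights and all names are assumptions drawn from the verbal description only.

```python
def host_score(e_old, e_new, utilization, availability, tag, alpha, beta, gamma):
    """Weight score W = H * Tag, with H assumed to be a weighted sum of three terms."""
    if not tag:  # the host cannot satisfy the VM's resource requirements
        return 0.0
    energy_term = e_old / e_new          # closer to 1 means less added energy
    headroom_term = 1.0 - utilization    # difference between 1 and utilization
    return alpha * energy_term + beta * headroom_term + gamma * availability


def best_host(candidates, alpha=0.4, beta=0.3, gamma=0.3):
    """Return the name of the candidate host with the highest weight score."""
    def score(c):
        return host_score(c["e_old"], c["e_new"], c["utilization"],
                          c["availability"], c["tag"], alpha, beta, gamma)
    return max(candidates, key=score)["name"]
```

Under these assumed weights, a feasible host with low added energy and low utilization outranks both a busier feasible host and any infeasible host, whose score is zero.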

Still further, in step 4.2, the selection operation proceeds as follows:

4.2.1 When H = 1, each individual represents one virtual machine placement, so its availability is calculated as a single placement; that is, for an individual P containing n genes, the placement availability is calculated with the single placement formula in 2.2.1.1;

4.2.2 When H > 1, the availability of the H individuals needs to be handled using the calculation for protected placement;

4.2.2.1 If, for every virtual machine vm_i ∈ VM, each placement group p_i (1 ≤ i ≤ H) places the virtual machine on H different nodes, i.e. across the H individuals the virtual machines on each gene are all different, the availability of the H placement groups is calculated as a full protection placement;

4.2.2.2 If there exists a virtual machine vm_i ∈ VM that two or more individuals place on the same gene, the availability of the H placement groups is calculated as a partial protection placement;

4.2.3 Combining data center energy consumption optimization, the fitness function of the genetic algorithm is given as follows:

where E_min is the minimum energy consumption of the data center in the period T, and the per-individual energy term is the energy consumption of a single individual. When H = 1, the availability is calculated as a single placement; when H > 1, the availability calculation is divided into full protection placement and partial protection placement, and x is a group of H individuals. For an individual or individual group x, the lower the energy consumption of the data center and the higher the availability, the larger the value of the fitness function and the higher the probability that it is selected and its good genes are passed on to the next generation.

Furthermore, in step 4.3, the crossover operation proceeds as follows:

4.3.1 First select two parents to be mated, named X and Y. Randomly select a node cluster containing one or more genes in parent X as the part to be crossed, and insert this node cluster, i.e. all the genes in it, at the crossover point of parent Y, producing a new offspring that contains genes of both X and Y;

4.3.2 After the gene insertion, because the invention uses chromosome grouping encoding based on host clusters, the same host cluster may appear twice; in that case the inserted genes are merged into the original host cluster;

4.3.3 If the same virtual machine exists on two different host nodes, the host nodes that previously contained that virtual machine are temporarily removed from the chromosome encoding;

4.3.4 The temporarily removed host nodes may contain virtual machines that are not deployed on any other host; these virtual machines need to be re-encoded into host nodes, preferentially choosing the genes in the chromosome that satisfy the constraint conditions and have the lowest energy consumption and highest availability;

4.3.5 If no gene meets the requirements, a new gene segment is generated according to the VHAM-R model. The two parent individuals are then interchanged and the crossover process is repeated to generate the second offspring individual.
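The crossover steps 4.3.1 to 4.3.4 can be sketched on a simple data structure. Everything about the representation is an assumption made for illustration: an individual is a dictionary mapping a cluster id to a dictionary of host id to the set of virtual machines on that host, and the choice of the best target gene in step 4.3.4 is delegated to a caller-supplied `reassign` function (hypothetical) that returns a (cluster id, host id) pair. A host of Y that also appears in the inserted cluster is treated like a conflicting host so that no virtual machine is silently dropped.

```python
import copy


def crossover(parent_x, parent_y, cluster_id, reassign):
    """Build one child by inserting cluster `cluster_id` of X into a copy of Y."""
    child = copy.deepcopy(parent_y)
    inserted = copy.deepcopy(parent_x[cluster_id])
    inserted_vms = set().union(*inserted.values())

    # 4.3.3: temporarily remove hosts of Y that conflict with the inserted genes.
    orphans = set()
    for cid, hosts in child.items():
        for host in list(hosts):
            duplicated = hosts[host] & inserted_vms
            same_host = cid == cluster_id and host in inserted
            if duplicated or same_host:
                orphans |= hosts[host] - inserted_vms
                del hosts[host]

    # 4.3.1 / 4.3.2: insert the genes, merging into an existing cluster if present.
    child.setdefault(cluster_id, {}).update(inserted)

    # 4.3.4: re-place virtual machines that lost their host.
    for vm in sorted(orphans):
        cid, host = reassign(vm, child)
        child.setdefault(cid, {}).setdefault(host, set()).add(vm)
    return child
```

Calling the function a second time with the parents swapped yields the second offspring described in step 4.3.5; regenerating a gene segment from the VHAM-R model when no host qualifies would be the responsibility of `reassign` in this sketch.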

In step 4.4, the mutation operation proceeds as follows:

4.4.1 The individual chromosome gene to be mutated is determined by the mutation function, as shown in the following formula:

where U_j-pes and U_j-ram are the CPU utilization and memory utilization of the host respectively,

and the remaining coefficients are preset parameters;

4.4.2 Select the gene with the smaller f_c(j) for deletion, so that the gene with lower utilization is deleted each time;

4.4.3 Insert the virtual machines on that gene into other genes using the crossover method of 4.3.

The beneficial effects of the invention are as follows: the encoding mode, selection, mating process and mutation mode are improved on the basis of the VHAM-R model, thereby improving the execution efficiency of the algorithm and the quality of the finally obtained solution set.

Drawings

FIG. 1 is a VHAM-R model of the invention.

FIG. 2 shows the host cluster-based chromosomal coding of the present invention.

FIG. 3 is a schematic diagram of the crossover operation of the improved genetic algorithm of the present invention.

FIG. 4 is a schematic diagram of the mutation operation of the improved genetic algorithm of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to fig. 1 to 4, a virtual machine placement genetic optimization method based on a VHAM-R model in a cloud computing environment includes the following steps:

1.1 Define the placement environment. The data center has a set of physical hosts PM = {pm₁, pm₂, …, pm_n}, where n is the number of hosts, and a set of virtual machines to be placed VM = {vm₁, vm₂, …, vm_m}, where m is the number of virtual machines; it is assumed that m is greater than or equal to n. Define the set of virtual machine placement groups P = {p₁, p₂, …, p_h}, where h is the number of placement groups.

1.2 Define the resource state. For a given virtual machine vm_i, the CPU resources it requires and the memory resources it requires are defined; V_i-pes is the CPU utilization of vm_i and W_i-ram is its memory utilization. For a given host pm_j, its current free CPU resources and free memory resources are defined; U_j-pes is the CPU utilization of pm_j and U_j-ram is its memory utilization. The resource utilization U_j of host pm_j is then defined as:

U_j＝αU_j-pes+βU_j-ram

where 0 < α < 1, 0 < β < 1, and α + β = 1.

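1.3 Host availability. The availability of a node is the probability that the node is working at any given time during the whole service time. For any network component i, its availability A_i can be calculated by the following formula:

$$A_i = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$$

where MTTF is the mean time to failure and MTTR is the mean time to repair.

1.4 Calculate the power consumption. In a cloud data center with n running physical hosts, the power consumption of any physical host pm_j ∈ PM at a given time t is given by a formula in which c_j denotes the static energy consumption, f_j(t) is the CPU frequency of host pm_j at time t, U_j-pes(t) is its CPU utilization at time t, and k is a constant coefficient; that is, the power consumption is, to a certain extent, a linear model of the CPU utilization;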

1.5 Define virtual machine placement. For a placement group p_k ∈ P, each virtual machine in the set VM selects a host from the physical host set to complete the placement mapping, satisfying the constraint conditions of the placement process as far as possible. Define the virtual machine placement matrix M_k[i][j]: M_k[i][j] = 1 denotes that, in placement group p_k, virtual machine j is placed on physical host i; otherwise M_k[i][j] = 0 denotes that, in placement group p_k, virtual machine j is not placed on physical host i.

2.1 The virtual machine placement problem in the cloud environment must not only meet the resource requirements of the virtual machines but also reduce the energy consumption of the data center and use resources efficiently; in addition, the availability of the placement request has to be considered. The main goals to be taken into account are therefore: the smallest possible number of server nodes in use, minimum energy consumption, balanced load, and high availability of the placement request. For these points, the following constraint conditions are proposed:

2.1.1 Placement constraint: within the same placement group, any virtual machine vm_i can be placed on one and only one server node.

The constraint is expressed as: for every virtual machine vm_i ∈ VM and every placement group p_k ∈ P, exactly one entry of M_k belonging to vm_i equals 1.

Note: in general, within the same placement group a single virtual machine is considered to be deployed and run on only one server node;

2.1.2 Resource constraint: for any server node, the consumption of each resource type should not exceed its upper limit. Server resources are generally divided into CPU, memory, network bandwidth, disk resources and so on; here the CPU and memory are considered, and the CPU capacity and memory capacity of server pm_j are defined.

The constraint is expressed as: for every server pm_j ∈ PM, the total CPU demand of the virtual machines placed on pm_j does not exceed r times its CPU capacity, and their total memory demand does not exceed r times its memory capacity.

Note: the parameter r (r ≤ 1) is a constant coefficient; the server node needs to reserve part of its resources to ensure its own normal operation, and r usually takes the value 0.8;

2.2 The virtual machine placement problem has many optimization targets, typically including energy consumption optimization, network traffic optimization, resource allocation optimization, availability optimization and performance optimization; here two of them, availability and energy consumption, are selected for the optimization study of the virtual machine placement problem;

2.2.1 Availability optimization

Suppose a user request consists of n different virtual machines together with the communication requirements between VM pairs (communication reachability is considered). Placing a virtual machine on the same server node (say pm_j) more than once cannot improve the availability of the placement, because when pm_j fails, all virtual machines placed on pm_j fail at the same time; it is therefore necessary to place vm_i on different nodes as far as possible to increase availability. Let H_i denote the maximum number of server nodes on which virtual machine vm_i can be placed, and define H as the maximum of the H_i, so that each of the n virtual machines is placed on at most H nodes;

the availability definition and calculation of virtual machine placement can be divided into three categories: single placement, fully protected placement, partially protected placement;

2.2.1.1 Single Placement

2.2.1.2 Full protection placement

Full protection placement means that every virtual machine vm_i ∈ VM is placed by the placement groups p_i (1 ≤ i ≤ H) on H different nodes. A full protection placement scheme P can therefore be regarded as composed of H single placement schemes, and within each single placement scheme the placement, resource and communication reachability constraints should be satisfied between pairs of virtual machines;

2.2.1.3 Partial protection placement

Partial protection placement means that there exists a virtual machine vm_i ∈ VM that is placed on fewer than H different nodes (i.e. two or more placement groups place vm_i on the same node), and there exists some virtual machine vm_j ∈ VM such that H is greater than 1. Under partial protection placement, a virtual machine placed on fewer than H nodes can be regarded as being placed jointly by several placement groups. Its availability cannot be calculated directly with the formula in 2.2.1.2, because the availability of a server node hosting a shared virtual machine would be counted twice. To handle this kind of placement, an operator is redefined. Suppose there are n nodes pm₁, pm₂, …, pm_n with availabilities A₁, A₂, …, A_n; for a node pm_x with availability A_x, the operator is defined as follows:

Then, according to the above formula, define

for operations between different sets; the availability of partial protection placement can be calculated by the following formula:

2.2.2 energy consumption optimization

In a cloud computing environment, the energy consumed by a data center is mainly that of its devices, including servers, storage devices and network communication devices. Servers account for the vast majority of this consumption, so virtual machine placement is optimized from the energy perspective mainly by reducing the energy consumption of server equipment; the optimization target can be reached by directly or indirectly reducing the number of servers that are powered on and running;

Expressed as:

therefore, as the following formula shows, the total server energy consumption E^T of the data center over the period T is the sum of the energy consumption of each running server;

the third step: creating a model

Based on the constraint conditions and the optimization target of virtual machine placement given in the second step, a virtual hierarchical structure model (VHAM-R) based on the Rendezvous hash algorithm is established to optimize and decide how a virtual machine selects a host. The basic idea of the Rendezvous hash algorithm is that a weight can be computed for every site S_j and every object O_i with an agreed hash function; for each site S_j, the object O_m with the greatest weight is selected and O_m is assigned to site S_j;

3.1 Initialize the host set PM, the virtual machine set VM, the maximum number H of placement groups for the virtual machines, and the availability set A of the host nodes. If the number n of hosts is less than 4, go to step 3.2; otherwise go to step 3.3. Referring to fig. 1, the number of hosts n in the figure is 108, so step 3.3 is entered;

3.2 When the number n of hosts is less than 4, i.e. the minimum two-level virtual structure of 2 × 2 cannot be constructed, define for virtual machine vm_i a set of weight scores, one per host: W_i = {w_i1, w_i2, …, w_ik} (where k ≥ n). Define w_ij as the weight score of virtual machine vm_i on host pm_j, w_ij = h(vm_i, pm_j). The function h(vm, pm) is the hash function agreed on in the Rendezvous hash algorithm and contains a conventional hash function such as hash32 or hash64. Through h(vm_i, pm_j), virtual machine vm_i is assigned to the host pm_j with the largest weight w_ij. If the capacity of host pm_k is h times that of the other hosts, pm_k is divided equally into h parts, e.g. pm_k1, pm_k2, …, pm_kh. The probability that a virtual machine is allocated to this host is then h times that of the other hosts;
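The assignment rule of step 3.2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the hash `h` is a hypothetical stand-in built from SHA-256 (the text only names hash32 or hash64 as examples), and the host names, the relative capacities and the helper names `expand_hosts` and `assign` are assumptions made for the example.

```python
import hashlib


def h(vm: str, pm: str) -> int:
    """Hypothetical stand-in for the agreed hash h(vm, pm): 64 bits of SHA-256."""
    digest = hashlib.sha256(f"{vm}|{pm}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def expand_hosts(capacity: dict[str, int]) -> list[tuple[str, str]]:
    """Split a host whose relative capacity is h into h parts pm_k1 .. pm_kh.

    Returns (part name, real host name) pairs.
    """
    parts = []
    for pm, times in capacity.items():
        if times == 1:
            parts.append((pm, pm))
        else:
            parts.extend((f"{pm}_{r}", pm) for r in range(1, times + 1))
    return parts


def assign(vm: str, capacity: dict[str, int]) -> str:
    """Return the real host owning the part with the largest weight w_ij."""
    _part, host = max(expand_hosts(capacity), key=lambda part: h(vm, part[0]))
    return host


# Assumed example: three hosts (n < 4); pm3 has twice the capacity of the others.
hosts = {"pm1": 1, "pm2": 1, "pm3": 2}
placement = {f"vm{i}": assign(f"vm{i}", hosts) for i in range(1, 9)}
```

Because every part of a split host competes independently for the maximum weight, a host divided into h parts wins with h times the probability of an undivided host, provided the hash is close to uniform.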

3.3 When the number n of hosts is greater than or equal to 4, the hosts are divided into clusters. First, a constant z is selected as the number of hosts in each cluster; in fig. 1, z = 4. The number of host clusters is then c = ceil(n/z) = ceil(108/4) = 27 (where the ceil function rounds the value of n divided by z up to the nearest integer). C₀ = {cpm₁, cpm₂, …, cpm_z}, C₁ = {cpm_z+1, cpm_z+2, …, cpm_2z}, and so on, until each host belongs to a cluster. Each cluster is a bottommost node in the virtual hierarchical structure;

3.3.1 Determine the virtual leaf node sectors and the depth of the virtual hierarchy. Select the number of leaves f of each child node sector in the virtual hierarchical structure; f is generally a single-digit integer, and a suitable choice of f and z brings the resulting algorithm efficiency, load balance and so on closer to expectation. As shown in fig. 1, each child node sector has 3 leaves; from the leaf number f = 3 and the number of host clusters c = 27, the depth d of the virtual hierarchical structure is obtained from:

f^d≥c

where d is the smallest positive integer satisfying the inequality; since 3^3 = 27 ≥ 27, d = 3;
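The clustering of step 3.3 and the depth calculation of step 3.3.1 can be checked with a short sketch using the values of fig. 1 (n = 108, z = 4, f = 3). The function names `partition_hosts` and `hierarchy_depth` are illustrative assumptions, and the sketch assumes f ≥ 2.

```python
import math


def partition_hosts(n: int, z: int) -> list[list[str]]:
    """Group hosts cpm1 .. cpm_n into consecutive clusters of at most z hosts."""
    hosts = [f"cpm{j}" for j in range(1, n + 1)]
    return [hosts[k:k + z] for k in range(0, n, z)]


def hierarchy_depth(c: int, f: int) -> int:
    """Smallest positive integer d with f**d >= c (integer arithmetic, f >= 2)."""
    d = 1
    while f ** d < c:
        d += 1
    return d


clusters = partition_hosts(108, 4)   # C0, C1, ..., C26
c = math.ceil(108 / 4)               # number of clusters
d = hierarchy_depth(c, 3)            # depth of the virtual hierarchy
```

Integer exponentiation is used instead of a floating-point logarithm so that exact powers such as 3^3 = 27 are not affected by rounding.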

3.3.2 Number each virtual leaf node sector. Generally, the sectors are numbered uniformly with the natural numbers 0, 1, 2, …, f-1;

3.3.3 For a given virtual machine vm_i and any virtual node s, there is a corresponding weight w_is = h(vm_i, s), where h(vm_i, s) contains a conventional hash function such as hash32 or hash64. At each level of leaf sectors of the virtual hierarchy, the weights of the virtual nodes are computed with h(vm_i, s). As in fig. 1, starting from the root node, because h(vm_i, 0) > max{h(vm_i, 1), h(vm_i, 2)}, node (0)₁ is selected and the descent continues; among the three child nodes on the second level, because h(vm_i, 00) > max{h(vm_i, 01), h(vm_i, 02)}, node (00)₁ is selected and the descent continues; among the three child nodes on the third level, because h(vm_i, 000) > max{h(vm_i, 001), h(vm_i, 002)}, node (000)₁ is selected, i.e. C₀, whose real position is a host node cluster, and the next selection step follows;
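The level-by-level descent of step 3.3.3 can be sketched as below. The hash `h` is again a hypothetical SHA-256 stand-in, node labels are strings of sector digits as in the example ("0", "00", "000"), and the sketch assumes that f^d equals the number of clusters, as in fig. 1 (3^3 = 27); how unused leaves are handled when f^d exceeds c is not covered here.

```python
import hashlib


def h(vm: str, node: str) -> int:
    """Hypothetical stand-in for the agreed hash h(vm, s): 64 bits of SHA-256."""
    digest = hashlib.sha256(f"{vm}|{node}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def select_cluster(vm: str, f: int, d: int) -> tuple[str, int]:
    """Descend d levels, keeping the child sector with the largest weight.

    Returns the label of the chosen leaf and the index x of cluster C_x,
    obtained by reading the label as a base-f number (assumes 2 <= f <= 10).
    """
    path = ""
    for _ in range(d):
        children = [path + str(sector) for sector in range(f)]
        path = max(children, key=lambda node: h(vm, node))
    return path, int(path, f)


label, x = select_cluster("vm1", f=3, d=3)
```

Only f hash evaluations are needed per level, so a virtual machine reaches its cluster after f·d evaluations instead of one evaluation per host.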

3.3.4 After virtual machine vm_i has selected the real host node cluster C₀, a real node is selected. Every host node cpm_j (1 ≤ j ≤ z) in the real node cluster C₀ has a corresponding weight score W_ij = H(vm_i, cpm_j) * Tag_ij (Tag is 0 if false and 1 if true). With virtual machine vm_i assigned to host cpm_j, H(vm_i, cpm_j) is the sum of three products, each with its corresponding weight constant: the ratio, over the same period T, of E_old to the total energy consumption of the real host node cluster C₀ after vm_i is assigned; the difference between the resource utilization U_j of host pm_j and 1; and the host availability:

where E_j is the energy consumption of host cpm_j in the period T, and E_old is the energy consumption of the real host node cluster C₀ in the same period T when no new virtual machine is allocated. α, β and γ are the weights of the three terms, and A_j is the availability of host pm_j. The node labeled 2 in fig. 1 is the node finally selected;

3.3.5 Repeat steps 3.3.3 to 3.3.4 until every virtual machine vm_i has selected the host node with the highest weight score W_i(xz+j), which completes the allocation;

the fourth step: genetic algorithm operation improvement based on VHAM-R model

An improved genetic algorithm is provided for the virtual machine placement problem. Its encoding, selection, crossover and mutation operations are improved on the basis of the VHAM-R model, which raises the execution efficiency of the algorithm and the quality of the final solution set;

4.1 The chromosome encoding has an important influence on the search effect and the efficiency of a genetic algorithm; encoding is the process of mapping a problem solution onto a chromosome. For the process of placing virtual machines on server nodes, the invention provides a grouping encoding based on host node clusters. A placement group P represents an individual, a host cluster C_i corresponds to a chromosome, and the hosts in each host cluster correspond to genes. FIG. 2 is a specific example of chromosome encoding;

The data center in fig. 2 contains two placement groups p1 and p2; the placement request contains 8 virtual machines and several physical hosts. Each placement group consists of several host node clusters, each host node cluster contains a different number of host nodes, and each host node holds a different number of virtual machines. The left half of the figure contains host clusters c1, c2 and c4, with host node E in host cluster c1, host node F in host cluster c2, and host node D in host cluster c4. Virtual machines 2, 5, 4 are placed in host node E, virtual machines 3, 6, 7 are placed in host node F, and virtual machines 1, 8 are placed in host node D; this placement group corresponds to a chromosome with 3 genes, i.e. chromosome EFD. The right half of the figure contains host clusters C0, C3 and C5, with host nodes A and B in host cluster C0, host node G in host cluster C3, and host node C in host cluster C5. Virtual machines 7, 6 are placed in host node A, virtual machines 2, 4 are placed in host node B, virtual machines 1, 5 are placed in host node G, and virtual machines 3, 8 are placed in host node C; this placement group corresponds to a chromosome with 4 genes, i.e. chromosome ABGC. With the chromosome grouping encoding based on host clusters, the object of the genetic operations changes from a single virtual machine to a host cluster containing a group of virtual machines;
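One possible in-memory form of the grouping encoding of fig. 2 is a nested mapping from host cluster to host node (gene) to virtual machine numbers. The data below is taken from the description of the figure; the nested-dictionary layout and the helper names `chromosome` and `placed_vms` are assumptions made for illustration.

```python
# placement group -> host cluster -> host node (gene) -> virtual machine numbers
p1 = {
    "c1": {"E": [2, 5, 4]},
    "c2": {"F": [3, 6, 7]},
    "c4": {"D": [1, 8]},
}
p2 = {
    "C0": {"A": [7, 6], "B": [2, 4]},
    "C3": {"G": [1, 5]},
    "C5": {"C": [3, 8]},
}


def chromosome(group: dict[str, dict[str, list[int]]]) -> str:
    """Concatenate the genes (host nodes) of a placement group in cluster order."""
    return "".join(host for cluster in group.values() for host in cluster)


def placed_vms(group: dict[str, dict[str, list[int]]]) -> list[int]:
    """All virtual machine numbers encoded in a placement group, sorted."""
    return sorted(vm for cluster in group.values()
                  for vms in cluster.values() for vm in vms)
```

Each of the 8 virtual machines appears exactly once in each placement group, which is the placement constraint that the crossover and mutation operations must preserve.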

4.2 The selection operation of the genetic algorithm simulates survival of the fittest in nature: it selects individuals with higher fitness and ensures that better genes are passed on to the next generation;

The selection operation uses a fitness function to select the individuals in the population that contain good genes, with the energy consumption of the data center and the placement availability of the virtual machines as the optimization targets;

Wherein the individual comprises n genes (host nodes);

4.2.2 When H > 1, the availability of the H individuals (placement groups) is calculated as a protected placement;

4.2.2.1 If every virtual machine vm_i ∈ VM is placed by each placement group p_i (1 ≤ i ≤ H) on H different nodes, i.e. across the H individuals the virtual machines on each gene are all different, the availability of the H placement groups is calculated as a full protection placement;

4.2.2.2 If there exists a virtual machine vm_i ∈ VM that two or more individuals (placement groups) place on the same gene, the availability of the H placement groups is calculated as a partial protection placement;

energy consumption for a single individual; when H = 1, availability is calculated as a single placement; when H > 1, the availability calculation is either a full protection placement or a partial protection placement, where x is a group of H individuals. For an individual or an individual group x, the lower the energy consumption of the data center and the higher the availability, the larger the value of the fitness function and the higher the probability that its good genes are selected and passed on to the next generation;

4.3 crossover operation in genetic algorithm is to simulate the process of mating between individuals in the biological world, and achieve recombination of genes between parents through mating, so that offspring can obtain new chromosomes containing excellent genes to generate more excellent offspring, which is specifically described with reference to fig. 3;

4.3.1 First pick the two parents to be mated, named X and Y in fig. 3. Randomly pick a node cluster containing one or more genes in parent X as the part to be crossed, and insert this node cluster, i.e. all genes in it, at the crossover point of parent Y; this generates a new offspring containing genes of both parents X and Y;

4.3.3 If the same virtual machine now exists on different host nodes, temporarily remove from the chromosome code the host nodes that previously contained it. In fig. 3, virtual machines with repeated numbers appear on B, E and G, so hosts B and G are removed;

4.3.4 The temporarily removed host nodes may contain virtual machines that are not deployed on any other host; such virtual machines must be re-encoded into host nodes. In fig. 3, virtual machine No. 1 needs to be re-encoded into a host node, and gene A is selected according to the constraint conditions to complete the allocation;

4.3.5 If no gene meets the requirements, a new gene fragment is regenerated according to the VHAM-R model. The two parent individuals are then interchanged and the crossover process is repeated to generate a second offspring individual;
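The crossover steps 4.3.1, 4.3.3 and 4.3.4 can be sketched on a flattened individual that maps each host node (gene) to its virtual machine numbers. Everything specific in this sketch is an assumption: the toy parents are invented and do not reproduce fig. 3, the `fits` callback is a hypothetical stand-in for the constraint check, and orphaned virtual machines go to the first fitting gene rather than to the gene with the lowest energy consumption and highest availability. The fallback of step 4.3.5 is only signalled by an exception.

```python
from typing import Callable

Individual = dict[str, list[int]]  # host node (gene) -> virtual machine numbers


def crossover(x_cluster: Individual, y: Individual,
              fits: Callable[[str, list[int], int], bool]) -> Individual:
    """Insert the genes of one cluster of parent X into a copy of parent Y."""
    child = {host: list(vms) for host, vms in y.items()}
    incoming = {vm for vms in x_cluster.values() for vm in vms}

    # 4.3.3: temporarily remove Y's hosts that duplicate an incoming VM
    # (a host that also occurs in the inserted cluster is replaced as well).
    removed = {host: vms for host, vms in child.items()
               if host in x_cluster or incoming & set(vms)}
    for host in removed:
        del child[host]

    # 4.3.1: insert all genes of the chosen X cluster.
    for host, vms in x_cluster.items():
        child[host] = list(vms)

    # 4.3.4: re-encode VMs that were only present on removed hosts.
    placed = {vm for vms in child.values() for vm in vms}
    orphans = sorted({vm for vms in removed.values() for vm in vms} - placed)
    for vm in orphans:
        target = next((host for host, vms in child.items()
                       if fits(host, vms, vm)), None)
        if target is None:
            # 4.3.5 would regenerate a gene fragment with the VHAM-R model.
            raise LookupError(f"no existing gene can take VM {vm}")
        child[target].append(vm)
    return child


# Invented toy data: each host may hold at most 3 virtual machines.
parent_y = {"A": [1, 2], "B": [3, 4], "C": [5, 6]}
cluster_from_x = {"E": [3, 5]}
child = crossover(cluster_from_x, parent_y, lambda host, vms, vm: len(vms) < 3)
```

In this toy run, hosts B and C are removed because they duplicate virtual machines 3 and 5, and the orphaned virtual machines 4 and 6 are re-encoded into A and E respectively.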

4.4 The mutation operation in the genetic algorithm also simulates the evolution process in nature. Mutation is an important step of the genetic algorithm: with a small probability, it produces genes (both high-quality and poor-quality) that never appeared in the parent individuals. The mutation operation is described in detail with reference to fig. 4;

4.4.1 determining the individual chromosome gene to be mutated by the mutation function, as shown in the following formula.

Wherein U is_j-pes、U_j-ramRespectively the CPU and the memory utilization rate of the host;

4.4.2 selection of f_c(j) Deleting smaller genes to ensure that each time of deletion is poorer genes with lower utilization rate, wherein the variation function of the gene C in the figure 4 is smaller, and the gene C is removed;

4.4.3 the virtual machine No. 3 and the virtual machine No. 8 on the gene C in the figure 4 are coded again by a 4.3 cross operation method, the virtual machine No. 3 is inserted into the gene A, and the virtual machine No. 8 is inserted into the gene G.
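The mutation steps 4.4.1 to 4.4.3 can be sketched as follows. The mutation function f_c(j) itself is not reproduced here; as an assumed stand-in, the sketch scores a gene with the weighted utilization α·U_j-pes + β·U_j-ram. The utilization values, the weights and the `fits` callback are invented for illustration, and the removed gene's virtual machines are re-encoded into the first fitting gene.

```python
from typing import Callable

Individual = dict[str, list[int]]  # host node (gene) -> virtual machine numbers


def mutation_score(u_pes: float, u_ram: float,
                   alpha: float = 0.5, beta: float = 0.5) -> float:
    """Assumed stand-in for f_c(j): weighted CPU and memory utilization."""
    return alpha * u_pes + beta * u_ram


def mutate(individual: Individual,
           utilization: dict[str, tuple[float, float]],
           fits: Callable[[str, list[int], int], bool]) -> tuple[str, Individual]:
    """Delete the gene with the smallest score and re-encode its VMs."""
    worst = min(individual, key=lambda host: mutation_score(*utilization[host]))
    child = {host: list(vms) for host, vms in individual.items() if host != worst}
    for vm in individual[worst]:
        target = next((host for host, vms in child.items()
                       if fits(host, vms, vm)), None)
        if target is None:
            raise LookupError(f"no remaining gene can take VM {vm}")
        child[target].append(vm)
    return worst, child


# Genes A, G and C with invented (CPU, memory) utilization values.
individual = {"A": [7, 6], "G": [1, 5], "C": [3, 8]}
utilization = {"A": (0.6, 0.5), "G": (0.5, 0.4), "C": (0.2, 0.1)}
removed, mutated = mutate(individual, utilization,
                          lambda host, vms, vm: len(vms) < 3)
```

With these invented numbers gene C has the lowest score and is removed, and its virtual machines 3 and 8 end up on genes A and G.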

Claims

1. A virtual machine placement genetic optimization method based on a VHAM-R model in a cloud computing environment is characterized by comprising the following steps:

1.1 Define the placement environment: the data center has a physical host set PM = {pm₁, pm₂, ..., pm_n}, where the number of hosts is n, and a virtual machine set VM = {vm₁, vm₂, ..., vm_m} to be placed, where the number of virtual machines is m; assuming that m is greater than or equal to n, define a set of virtual machine placement groups P = {p₁, p₂, ..., p_h}, where h is the number of placement groups;

1.2 Define the resource states: for a given virtual machine vm_i, define

as the CPU resources required by virtual machine vm_i,

as the currently idle CPU resources of host pm_j,

U_j＝αU_j-pes+βU_j-ram

where 0 < α < 1, 0 < β < 1, and α + β = 1;

define Tag_ij to indicate whether, at the current time t, host pm_j satisfies the resource requirements of virtual machine vm_i, i.e.

1.5 Define virtual machine placement: for the virtual machine set VM, each placement group p_k ∈ P selects hosts in the corresponding physical host set to complete the placement mapping, subject to the constraints of the placement process. Define the virtual machine placement matrix M_k[i][j]: M_k[i][j] = 1 denotes that in placement group p_k virtual machine j is placed on physical host i; otherwise M_k[i][j] = 0 denotes that in placement group p_k virtual machine j is not placed on physical host i;

2.1.1 Placement constraint: within the same placement group, any virtual machine vm_i can be placed on only one server node;

the constraints represent:

for the

In which a group p is placed_k∈P；

2.1.2 Resource constraint: for any server node, the consumption of each resource type must not exceed an upper limit. The CPU capacity and the memory capacity of server pm_j are denoted by

and

respectively;

the constraints represent:

for the

Is provided with

The parameter r is a constant coefficient: the server node must reserve part of its resources to ensure its own normal operation, so r ≤ 1;

2.2.1 usability optimization

The node number of the n virtual machines is H at most;

2.2.1.1 Single placement

Single placement means that each virtual machine is placed on only one server node, i.e. H = 1. In the case of single placement, if the availabilities of n server nodes are A₁, A₂, ..., A_n and k virtual machines (n ≤ k) are placed on these n nodes, then the availability of the virtual machine placement scheme is denoted A_p and defined as follows:

2.2.1.2 Full protection placement

Full protection placement means that every virtual machine vm_i ∈ VM is placed by each placement group p_i (1 ≤ i ≤ H), so that it resides on H different nodes; a full protection placement solution P can therefore be regarded as composed of H single placement solutions, and within each single placement solution the placement, resource and communication reachability constraints between virtual machine pairs must be satisfied;

2.2.1.3 Partial protection placement

Partial protection placement means that there exists a virtual machine vm_i ∈ VM that is placed on fewer than H different nodes, i.e. two or more placement groups place vm_i on the same node, and there exists some virtual machine vm_j ∈ VM such that H > 1; under partial protection placement, a virtual machine placed on fewer than H nodes can be regarded as being placed jointly by several placement groups; its availability cannot be calculated directly by the formula in 2.2.1.2, because the availability of the server node hosting the shared virtual machine would be counted twice; to handle this type of placement, an operator is redefined; suppose there are n nodes pm₁, pm₂, ..., pm_n with availabilities A₁, A₂, ..., A_n; for node pm_x with availability A_x, the operator is defined as follows:

then according to the formula in the above, define

For operations between different sets, the availability of partial protection placement is calculated by the following formula:

2.2.2 energy consumption optimization

Expressed as:

the third step: creating a model

Based on the constraint conditions and the optimization target of virtual machine placement given in the second step, a virtual hierarchical structure model VHAM-R based on the Rendezvous hash algorithm is established to optimize and decide how a virtual machine selects a host, and the steps are as follows:

3.2 When the number n of hosts is less than 4, i.e. the minimum two-level virtual structure of 2 × 2 cannot be constructed, define for virtual machine vm_i a set of weight scores, one per host: W_i = {w_i1, w_i2, ..., w_ik}, where k ≥ n; define w_ij as the weight score of virtual machine vm_i on host pm_j, w_ij = h(vm_i, pm_j), where the function h(vm, pm) is the hash function agreed on in the Rendezvous hash algorithm; through h(vm_i, pm_j), virtual machine vm_i is assigned to the host pm_j with the largest weight w_ij; if the capacity of host pm_k is h times that of the other hosts, pm_k is divided equally into h parts, and the probability that a virtual machine is allocated to this host is h times that of the other hosts;

3.3 When the number n of hosts is greater than or equal to 4, the hosts are divided into clusters: first, a constant z is selected as the number of hosts in each cluster, so that the number of host clusters is c = ceil(n/z), where the ceil function rounds the value of n divided by z up to the nearest integer; C₀ = {cpm₁, cpm₂, ..., cpm_z}, C₁ = {cpm_z+1, cpm_z+2, ..., cpm_2z}, and so on, until each host belongs to a cluster; each cluster is a bottommost node in the virtual hierarchical structure;

4.1 For the process of placing virtual machines on server nodes, a grouping encoding based on host node clusters is provided: a placement group P represents an individual, a host cluster C_i corresponds to a chromosome, and the hosts in each host cluster correspond to genes;

4.2, selecting individuals containing good genes in the population through a fitness function set by the selection operation, and taking the energy consumption of the data center and the placement availability of the virtual machines as optimization targets; the process of the selection operation is as follows:

Wherein the individual comprises n genes;

energy consumption for a single individual; when H = 1, availability is calculated as a single placement; when H > 1, the availability calculation is either a full protection placement or a partial protection placement, where x is a group of H individuals; for an individual or an individual group x, the lower the energy consumption of the data center and the higher the availability, the larger the value of the fitness function and the higher the probability that its good genes are selected and passed on to the next generation;

4.3 crossover operation in genetic algorithm is to simulate the mating process between individuals in the biological world, and achieve gene recombination between parents through mating, so that the offspring obtains a new chromosome containing an excellent gene and generates more excellent offspring; the process of the crossover operation is as follows:

4.3.4 The temporarily removed host nodes may contain virtual machines that are not deployed on any other host; in this case, these virtual machines must be re-encoded into host nodes, and the genes in the chromosome that satisfy the constraint conditions with the lowest energy consumption and the highest availability are selected for the allocation;

4.3.5 if all genes do not meet the requirements, regenerating a new gene segment according to the VHAM-R model, interchanging two parent individuals, and repeating the crossing process to generate a second filial generation individual;

4.4 mutation operation in genetic algorithm, the process of mutation operation is:

is a set parameter;

2. The method for genetic optimization of placement of virtual machines based on VHAM-R model in cloud computing environment according to claim 1, wherein in said step 3.3, the process of clustering the hosts is:

3.3.1 determining virtual leaf node sectors and the depth of a virtual hierarchical structure, selecting the leaf number f of each sub-node sector in the virtual hierarchical structure, wherein f is an integer, and selecting proper f and z can enable the obtained algorithm benefit and load balance degree to be approximate to the expectation; according to the leaf number f of the node sector and the number c of the host clusters, the depth d of the virtual hierarchical structure can be obtained:

f^d≥c

3.3.2 Number each virtual leaf node sector, the sectors being numbered uniformly with the natural numbers 0, 1, 2, ..., f-1;

3.3.3 For a given virtual machine vm_i and any virtual node s, there is a corresponding weight w_is = h(vm_i, s), where h(vm_i, s) contains an agreed hash function; at each level of the virtual hierarchy, the weight of each virtual node is computed with h(vm_i, s), and the node with the highest score is selected to continue downward, level by level, until a real host node cluster C_x on the bottom level is selected;

3.3.4 After virtual machine vm_i has selected the real host node cluster C_x, a real node is selected; every host node cpm_xz+j in the real node cluster C_x has a corresponding weight score W_i(xz+j) = H(vm_i, cpm_xz+j) * Tag_i(xz+j), where Tag is 0 if false and 1 if true; with virtual machine vm_i assigned to host cpm_xz+j, H(vm_i, cpm_xz+j) is the sum of three products, each with its corresponding weight constant: the ratio, over the same period T, of E_old to the total energy consumption of the real host node cluster C_x after vm_i is assigned; the difference between the resource utilization U_xz+j of host pm_xz+j and 1; and the host availability:

where E_xz+j is the energy consumption of host cpm_xz+j in the period T, and E_old is the energy consumption of the real host node cluster C_x in the same period T when no new virtual machine is allocated; α, β and γ are the weights of the three terms; A_xz+j is the availability of host cpm_xz+j;