CN112306641B - Training method for virtual machine migration model - Google Patents

Training method for virtual machine migration model

Info

Publication number
CN112306641B
Authority
CN
China
Prior art keywords
virtual machine
migration
training
model
migration model
Prior art date
Legal status: Active
Application number
CN202011293834.8A
Other languages
Chinese (zh)
Other versions
CN112306641A (en)
Inventor
余显
李振宇
孙胜
张广兴
刁祖龙
谢高岗
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Application filed by Institute of Computing Technology of CAS
Priority to CN202011293834.8A
Publication of CN112306641A
Application granted
Publication of CN112306641B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 Hypervisors; Virtual machine monitors
    • G06F 9/45558 Hypervisor-specific management and integration aspects
    • G06F 2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a training method for a virtual machine migration model, comprising the following steps: S1, constructing an initial migration model and randomly initializing its parameters; S2, acquiring initial environment states and representing the initial environment state corresponding to each virtual machine as a tensor; S3, performing round training on the initialized migration model, taking all tensor representations of the initial environment state as the starting point, until a preset number of training rounds is reached; S4, dynamically acquiring the virtual machine environment state information of the data center and training the round-trained migration model online until it converges. The invention abstracts the multi-objective virtual machine dynamic migration optimization problem into the training and inference process of a reinforcement learning model; an optimized virtual machine dynamic migration model is obtained through training, realizing the long-term goals of low energy consumption and high quality of service for the data center, and the resulting migration model can be applied flexibly to a wide range of general cloud data center virtual machine management systems and business environments.

Description

Training method for virtual machine migration model
Technical Field
The invention relates to the field of cloud computing, in particular to virtual machine integration in cloud data center scenarios, and more particularly to a reinforcement-learning-based training method for a virtual machine migration model, a virtual machine migration method, and related apparatus.
Background
Virtual machine integration is a very popular virtual machine resource management method: by sensing the runtime load of host nodes and virtual machines in real time and applying virtual machine migration technology, more idle hosts can be obtained, and overall energy consumption can then be reduced effectively by shutting those hosts down.
For convenience of research, most current work splits the virtual machine dynamic resource integration problem into four sub-problems and studies them separately: host overload detection, host underload detection, virtual machine selection, and virtual machine reassignment (or virtual machine migration).
1. Host overload detection refers to determining when a host is considered overloaded. For example, a fixed threshold may be defined, and the host is considered overloaded when its total resource occupancy exceeds that threshold (e.g., 100%). However, an unreasonable or static threshold may cause performance loss, because virtual machine load changes dynamically.
2. Host underload detection is the exact opposite of host overload detection, i.e., determining when a host is considered underloaded. The purpose of underload detection is to identify low-load hosts as candidates for shutdown.
3. Virtual machine selection refers to deciding which virtual machines on an overloaded host should be migrated, thereby reducing the impact of host resource starvation on quality of service. For energy-saving purposes, underloaded hosts need to be shut down, so all virtual machines on a host judged underloaded will be migrated.
4. Virtual machine reassignment reselects a host position for each virtual machine to be migrated. An unreasonable migration mechanism is not only unfavorable for energy saving but can also generate huge migration overhead and cause quality-of-service problems.
The key to virtual machine integration is how to efficiently guarantee user quality of service (Quality of Service, QoS). QoS can be further expressed as a service level agreement (Service Level Agreement, SLA), which broadly describes features such as maximum throughput, minimum response time, system processing delay, and failure time/frequency. The main purpose of virtual machine resource integration is therefore to guarantee the SLA while reducing system energy consumption as much as possible. To achieve this goal, most existing work focuses on one specific sub-problem (such as host overload detection or virtual machine reassignment) of the dynamic resource integration process. Considering a single sub-problem in isolation cannot account for the interactions among the sub-problems and cannot achieve optimal virtual machine resource integration. Moreover, cloud system administrators need highly specialized knowledge to configure an optimal strategy for each sub-problem separately, and the configured strategies may have to change at any time as business scenarios change, incurring huge labor and time costs. A virtual machine resource integration method with low energy consumption, high quality of service, and strong self-adaptation capability is therefore still lacking.
Existing virtual machine allocation methods are mainly static or dynamic. Static allocation mainly addresses the initial deployment of virtual machines; it does not consider runtime load and cannot be applied directly to the dynamic management of virtual machines. Existing dynamic allocation methods, which for research convenience split the problem into the four sub-problems of host overload detection, host underload detection, virtual machine selection, and virtual machine reassignment, cannot achieve optimal energy efficiency or quality of service with any single sub-solution, and are difficult to apply flexibly to different business scenarios, which undoubtedly increases management and configuration difficulty for administrators. Although some work studies the dynamic management of virtual machines from a theoretical point of view, those methods are either too complex to solve, giving poor practical availability, or rely on greedy or approximate algorithms and therefore struggle to achieve the long-term optimization of energy consumption and quality of service.
In summary, existing virtual machine dynamic allocation methods struggle to achieve the long-term optimization of energy consumption and quality of service, are limited to specific application scenarios, and suffer from poor self-adaptation capability and high configuration management costs.
Disclosure of Invention
It is therefore an object of the present invention to overcome the above-mentioned drawbacks of the prior art and to provide a virtual machine migration solution with high self-adaptation capability and low configuration management cost.
According to a first aspect of the present invention, there is provided a training method for a virtual machine migration model, the training method comprising the following steps:
S1, constructing an initial migration model and randomly initializing its model parameters, wherein the initial migration model is a neural network.
S2, acquiring initial environment states and representing the initial environment state corresponding to each virtual machine as a tensor, wherein each tensorized environment state comprises the virtual machine information, the host position distribution of the virtual machine, and the resource occupancy of the virtual machine on its host.
S3, performing round training on the initialized migration model, taking all tensor representations corresponding to the initial environment state as the starting point, until a preset number of training rounds is reached, where the preset number of rounds is at least 500. Each round of training comprises iteratively training the migration model until the migration model converges or a preset number of iterations is reached, and each iteration comprises: S31, taking all tensor representations corresponding to the environment state at the current moment as the input of the model to obtain the migration actions required for all virtual machines; S32, exploring new actions with the current exploration probability to replace the virtual machine migration actions obtained in step S31, where, preferably, exploration proceeds in one of the following ways: the first, randomly generating a new migration action; the second, fine-tuning the action generated by the migration model according to a preset distribution probability to obtain a new migration action; S33, obtaining the environment state at the next moment, after the migration actions have been executed, from the virtual machine migration actions at the current moment; S34, calculating a comprehensive reward value for the migration actions executed at the current moment according to the environment states at the current and next moments; and S35, updating the parameters of the migration model and the exploration probability based on the comprehensive reward value, so as to perform the next iteration of training.
S4, dynamically acquiring the virtual machine environment state information of the data center and training the round-trained migration model online until the migration model converges.
Preferably, the initial environment state acquired in step S2 is either an artificially simulated initial environment state or a real-time online initial environment state. In some embodiments of the present invention, the artificially simulated initial environment state is obtained by: S21, randomly initializing the resource sizes of all resource dimensions of each host in the current environment; S22, randomly initializing the load sizes of all resource dimensions of each virtual machine; S23, randomly assigning each virtual machine to a host such that the resources required by the virtual machines on a host do not exceed that host's resource upper limit.
Preferably, the comprehensive reward value of the migration model comprises at least a weighted sum of an energy consumption reward value, an overload reward value, and a migration reward value, wherein the energy consumption reward value r_power(s_t, a_t) is computed from the active-host ratio idle_rate, the overload reward value r_overload(s_t, a_t) from the overloaded-host ratio over_rate, and the migration reward value r_migrs(s_t, a_t) from the number of virtual machine migrations migrs. The comprehensive reward value is calculated as follows:

r_t(s_t, a_t) = α·r_power(s_t, a_t) + β·r_overload(s_t, a_t) + γ·r_migrs(s_t, a_t)

where idle_rate, over_rate, and migrs denote the active-host ratio, the overloaded-host ratio, and the number of virtual machine migrations, respectively; α, β, and γ are the weights of the energy consumption reward value, the overload reward value, and the migration reward value; and α + β + γ = 1.
In each round of training, the current round ends when the preset number of iterations is reached or the migration model converges;
wherein the preset number of iterations is set to 200 or more;
and migration model convergence means that the cumulative average of the comprehensive reward value no longer increases after a training step, or increases by less than 0.1% relative to its value before that step.
In step S35, the exploration probability is updated as follows:

ε_{t+1} = Δ·ε_t

where ε_t denotes the exploration probability at time t, ε_{t+1} the updated exploration probability at time t+1, and Δ is a decay factor. Preferably, once the exploration probability has decayed to a preset minimum exploration probability, it is no longer updated.
Preferably, step S4 comprises sampling the environment state information of the data center's virtual machines at regular intervals and executing the following steps at each sampling: S41, using the migration model to obtain virtual machine migration actions from the currently sampled virtual machine environment state information; S42, live-migrating the virtual machines in the data center according to the migration actions obtained in step S41; S43, before the next sampling, measuring the overall energy consumption change of the current data center hosts and the quality of service of the services in the virtual machines; S44, calculating the comprehensive reward value corresponding to the live migration from the overall host energy consumption change and the service quality, and updating the parameters of the migration model according to this comprehensive reward value.
According to a second aspect of the present invention, there is provided a data center virtual machine dynamic migration method, the migration method comprising: q1, collecting environment state information of a virtual machine of a data center; q2, calculating migration actions of the virtual machine by adopting a migration model trained by the method according to the first aspect of the invention; and Q3, executing the migration action corresponding to the virtual machine calculated in the step Q2 on each virtual machine.
According to a third aspect of the present invention, there is provided a data center comprising a host, a virtual machine, and a controller, wherein the controller comprises a virtual machine migration model trained by the method according to the first aspect of the present invention.
Compared with the prior art, the invention has the following advantages: the multi-objective virtual machine dynamic migration optimization problem is abstracted into the training and inference process of a reinforcement learning model; an optimized virtual machine dynamic migration model is obtained through training, realizing the long-term goals of low energy consumption and high quality of service for the data center; and the resulting migration model can be applied flexibly to a wide range of general cloud data center virtual machine management systems and business environments.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a virtual machine migration model training process framework according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an offline training process of a virtual machine migration model according to an embodiment of the present invention;
FIG. 3 is a schematic representation of tensor representation during virtual machine migration model training according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a round training process of a virtual machine migration model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an online training process of a virtual machine migration model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a data center framework according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail by means of specific examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides an adaptive, reinforcement-learning-based virtual machine integration method and system for the dynamic management of virtual machine resources, aiming to improve quality of service, reduce energy consumption, enhance the flexibility and self-adaptation capability of virtual machine management, and save administrators management and configuration costs.
Reinforcement learning (Reinforcement Learning, RL) is a distinctive machine learning method that has been widely applied to resource scheduling, service optimization, energy consumption management, and similar problems. In reinforcement learning, an agent perceives the environment and selects one action in each state; after each action is performed, the agent receives the feedback the environment gives for that action. The agent's ultimate goal is to learn a strategy that selects the best action among all possible actions. Benefiting from excellent automated learning and expressive power, reinforcement learning can handle some very complex control and decision problems. Some prior work has applied reinforcement learning to host overload detection, virtual machine selection, and the selection of host power-saving states, but it either ignores quality of service or studies only a single sub-problem of the virtual machine resource integration process, failing to achieve low energy consumption, high quality of service, and high flexibility simultaneously. The invention therefore targets the dynamic management of virtual machines and provides an end-to-end, reinforcement-learning-based virtual machine resource integration method. In this method, the agent directly decides which virtual machines to migrate, without concern for the specific details of each sub-problem of the virtual machine resource integration problem. While interacting with the cloud system, the agent learns the influence of each migration action on system performance (energy consumption, SLA, etc.), so that optimal virtual machine resource integration can be achieved even as virtual machine load changes dynamically. In addition, the method can accelerate training with the service characteristics of the application scenario and automatically derives the optimal virtual machine dynamic migration scheme as the business scenario or requirements change, without considering the intermediate steps of virtual machine resource integration. It therefore has higher application flexibility.
While researching the dynamic management of virtual machine resources, the inventors found that much existing work simplifies the problem by analyzing several different virtual machine dynamic allocation methods and abstracting it into the four sub-problems of host overload detection, host underload detection, virtual machine selection, and virtual machine reassignment, with the strategies for these sub-problems executed in sequence to dynamically manage virtual machine resources in the actual environment. The inventors' analysis shows that the strategies of these sub-problems interact strongly, and the outcome after the final virtual machine reassignment affects the effectiveness of every strategy at the next time node. Existing methods do not take this into account, so it is difficult for them to reach the optimal target, and this sub-problem-oriented research style also increases the administrator's management burden and reduces the flexibility of dynamic virtual machine management. The inventors also analyzed virtual machine dynamic allocation methods based on theoretical models such as integer programming, convex optimization, and Markov decision processes; however, because the cloud data centers in which virtual machines actually run are very complex, these theoretical models are difficult to solve directly to determine migration positions, so most of that work approaches the optimization target with approximate algorithms built on the theoretical models.
Considering that a reinforcement learning model is a typical end-to-end model, solving the virtual machine dynamic allocation problem with reinforcement learning requires no concern for the individual sub-steps of the virtual machine integration process, which greatly reduces the administrator's configuration and management difficulty. The action-feedback mechanism of reinforcement learning solves the problem that a dynamic allocation strategy cannot perceive the result of a migration decision, and, combined with the exploitation-exploration mechanism used during training, ensures the optimization target is approached as closely as possible. In addition, a reinforcement learning model completes the agent-environment interaction and training process automatically, further strengthening the self-adaptation capability and flexibility of the dynamic virtual machine allocation method.
The invention therefore mainly considers how to apply a reinforcement learning model (in particular one based on deterministic policy gradients) to the virtual machine dynamic allocation problem, i.e., how to solve the virtual machine dynamic migration problem with a reinforcement learning model, aiming to improve the energy efficiency and quality of service of the data center while reducing management costs.
According to an embodiment of the present invention, the virtual machine dynamic migration problem is solved by reinforcement learning as shown in fig. 1, which divides into three stages: offline training, online training, and online use of the virtual machine dynamic migration model (hereinafter, the migration model). First, an initialized virtual machine dynamic migration model is obtained by offline training on simulated data center host and virtual machine load states, virtual machine migration actions, and migration rewards. The model is then placed in the production environment of an actual data center, where reinforcement learning on online host and virtual machine load data and real energy consumption and quality-of-service data strengthens its adaptability to virtual machine dynamic migration in the data center's actual environment. Once online training reaches convergence, the model is put into use. Each stage is described in detail below.
The first stage: the offline training process of the migration model.
To shorten the gap between initialization and actual use of the migration model and to reduce the impact on the online environment, the invention first obtains an initialized migration model through offline simulation training. As shown in fig. 2, the offline training process comprises the following steps:
step 11: the network structure for constructing the migration model is initialized by using a neural network (the neural network can be a convolutional neural network, a cyclic neural network, a fully-connected neural network and the like, and the specific network structure is not limited), and the parameter value of the network structure is initialized (the initialization method can adopt random initialization, normal distribution initialization and the like).
Step 12: simulate and construct the host states; the resource size of every resource dimension of each host can be randomly initialized (for example, a host's CPU capacity may be initialized to 10 GHz and its memory capacity to 32 GB).
Step 13: simulate and construct the virtual machine states; the load of every resource dimension of each virtual machine can be randomly initialized (for example, a virtual machine's CPU load may be initialized to 1.2 GHz and its memory load to 2.5 GB).
Step 14: simulate the host position distribution of the virtual machines; the virtual-machine-to-host mapping can be established randomly, i.e., each virtual machine is randomly assigned to some host, while ensuring that the resources required by all virtual machines on a host do not exceed that host's resource upper limit.
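As a purely illustrative aid, the following Python sketch shows one way steps 12-14 could be realized. The fleet sizes, capacity and load ranges, and the retry-based placement loop are assumptions of the sketch, not values prescribed by this embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_environment(num_hosts=10, num_vms=40, num_dims=3):
    """Steps 12-14: random host capacities, random VM loads, and a random
    but feasible placement; all ranges here are illustrative assumptions."""
    host_capacity = rng.uniform(8.0, 32.0, size=(num_hosts, num_dims))
    vm_load = rng.uniform(0.5, 4.0, size=(num_vms, num_dims))
    placement = np.empty(num_vms, dtype=int)
    used = np.zeros((num_hosts, num_dims))
    for v in range(num_vms):
        for p in rng.permutation(num_hosts):  # try hosts until the VM fits
            if np.all(used[p] + vm_load[v] <= host_capacity[p]):
                placement[v] = p
                used[p] += vm_load[v]
                break
        else:
            raise RuntimeError("no feasible host; regenerate the environment")
    return host_capacity, vm_load, placement
```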
Step 15: tensorized environment state representation, i.e., the initial environment state corresponding to each virtual machine is represented as a tensor, where each tensorized environment state comprises the virtual machine information, the host position distribution of the virtual machine, and the occupancy of one of the virtual machine's resources on its host. In this embodiment, a three-dimensional tensorization is taken as an example, representing one resource state of one virtual machine on a host, so that multiple resource states correspond to multiple tensor representations; in practical applications, however, the representation is not limited to three dimensions, and any required number of dimensions may be used to represent all resource states of a virtual machine on its host. Taking the three-dimensional tensorization as an example: first, the percentage of host resources occupied by each virtual machine is calculated, which represents the load state of the virtual machine on its current host (for example, the CPU consumption of virtual machine A on host 1 is 50%, i.e., the CPU resources required by the virtual machine amount to 50% of the host's CPU capacity); then, the current environment state is constructed from the virtual machine load states and the corresponding host allocation positions. Fig. 3 shows the three-dimensional tensorized representation structure of the environment state, in which the X, Y, and Z axes correspond to the virtual machine number, the resource dimension number, and the host number, respectively, and any point S_{p,v,d} denotes the amount of the d-th dimension resource occupied by virtual machine v on host p. The environment state representation thus captures the host position distribution and the corresponding resource occupancy of all virtual machines in the data center: one resource state of a virtual machine corresponds to one three-dimensional tensor representation, and different resource states correspond to multiple tensorized representations. For example, if the environment state of a virtual machine v on host p comprises three resources (d = 1, 2, 3), namely the CPU as the first-dimension resource, the network as the second, and the memory as the third, then the environment state tensor representation of virtual machine v comprises S_{p,v,1}, S_{p,v,2}, and S_{p,v,3}, denoting the CPU, network, and memory occupancy of virtual machine v on host p, respectively.
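Continuing the same assumed data layout, a minimal sketch of the tensorization of step 15 could be:

```python
import numpy as np

def build_state_tensor(host_capacity, vm_load, placement):
    """Step 15: build S[p, v, d], the fraction of host p's d-th resource
    occupied by virtual machine v (zero where v is not on p), as in fig. 3."""
    num_hosts, num_dims = host_capacity.shape
    num_vms = vm_load.shape[0]
    state = np.zeros((num_hosts, num_vms, num_dims))
    for v, p in enumerate(placement):
        state[p, v, :] = vm_load[v] / host_capacity[p]  # occupancy as a share of capacity
    return state
```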
Step 16: the initialized environment state obtained by simulation is taken as the initial input to the round training process, starting the offline training of the migration model.
Step 17: ending the current round of training.
Step 18: check whether the current number of training rounds exceeds the specified number; if so, end the offline training process and output the migration model; otherwise, go to step 19. Note that the predetermined number of training rounds is generally set from experimental experience; for example, it may be set to 500 or more.
Step 19: the new round of training continues.
It should be noted that the initialized environment states of steps 12-14 may instead be environment state data actually collected from the data center.
The round training process trains an optimal migration model by taking the initialized environment state as input and simulating how the environment state changes under different virtual machine migration actions. According to one embodiment of the present invention, as shown in fig. 4, the round training process comprises the following steps:
step 161: the round training starts.
Step 162: the current environment state at time t is input, represented as the tensorized environment state of fig. 3.
Step 163: calculate the migration actions to be executed in this state, i.e., the environment state at the current time t is taken as the input of the migration model, which computes through its network structure the dynamic migration actions required by all virtual machines, that is, determines to which new host each virtual machine should be migrated.
Step 164: new actions are explored with a certain probability ε (the current exploration probability), so that the migration actions finally executed are determined jointly by the actions calculated in step 163 and the exploration actions. According to an embodiment of the present invention, action exploration is not limited to randomly generating a new migration action that directly replaces the action computed by the model; it may also apply some fine-tuning to the model-generated action according to a given distribution. For example, if the model computes that virtual machine A should be migrated to host 5, action exploration may let the final migration target fluctuate among the hosts numbered near host 5 according to a normal distribution.
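The two exploration modes of step 164 might be sketched as follows; the even split between the modes and the standard deviation sigma are assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng()

def explore_actions(model_actions, epsilon, num_hosts, sigma=1.5):
    """Step 164: with probability epsilon, either replace a model-chosen target
    host at random or perturb it to a nearby host number via a normal distribution."""
    actions = np.asarray(model_actions).copy()
    for v in range(actions.shape[0]):
        if rng.random() < epsilon:
            if rng.random() < 0.5:
                actions[v] = rng.integers(num_hosts)         # mode 1: random new action
            else:
                jitter = int(round(rng.normal(0.0, sigma)))  # mode 2: fine-tune nearby
                actions[v] = int(np.clip(actions[v] + jitter, 0, num_hosts - 1))
    return actions
```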
Step 165: simulate the virtual machine loads at the new time t+1. According to one embodiment of the invention, the load of a virtual machine can be generated randomly or estimated from its historical load distribution. For example, the CPU utilization of a virtual machine may be generated from a uniform distribution over 0-1; or, if analysis of the virtual machine's historical CPU utilization shows that it essentially obeys a normal distribution, its CPU utilization is randomly generated according to that normal distribution.
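The second option of step 165 could be sketched as below, assuming the historical CPU utilization is approximately normal:

```python
import numpy as np

rng = np.random.default_rng()

def sample_next_cpu_load(history):
    """Step 165, second option: fit a normal distribution to the VM's historical
    CPU utilization and draw the next load from it, clipped to the valid range."""
    mu = float(np.mean(history))
    sigma = float(np.std(history)) + 1e-8
    return float(np.clip(rng.normal(mu, sigma), 0.0, 1.0))
```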
Step 166: calculate the new environment state, i.e., generate the tensor representation corresponding to the environment state at the new time t+1, using the same method as step 15 of fig. 2.
Step 167: calculate the comprehensive reward value of the action according to the environment state at time t+1, the environment state at time t, and the migration action simulated at time t. The underlying principle for assigning rewards is: if the executed action reduces the number of active hosts or the overall power consumption, the reward is positive, and vice versa; if the executed migration action improves quality of service, the reward is positive, otherwise negative. Specifically, the comprehensive reward value of an action comprises three components, namely an energy consumption reward value, an overload reward value, and a migration reward value, which according to one embodiment of the present invention are defined as follows:
1) an energy consumption reward value r_power(s_t, a_t), computed from the active-host ratio idle_rate;
2) an overload reward value r_overload(s_t, a_t), computed from the overloaded-host ratio over_rate;
3) a migration reward value r_migrs(s_t, a_t), computed from the number of virtual machine migrations migrs;
where idle_rate, over_rate, and migrs denote the active-host ratio, the overloaded-host ratio, and the number of virtual machine migrations, respectively. The final comprehensive reward value may be obtained by, but is not limited to, weighting the rewards of the various objectives listed in the present invention (such as energy consumption and quality of service, the latter covering overload conditions and virtual machine migration, as well as other unlisted performance and quality-of-service objectives). From the three classes of reward values defined above, the final reward value of the current action is calculated as:

r_t(s_t, a_t) = α·r_power(s_t, a_t) + β·r_overload(s_t, a_t) + γ·r_migrs(s_t, a_t), with α + β + γ = 1

where α, β, and γ are the weights of the energy consumption reward value, the overload reward value, and the migration reward value, respectively.
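Because the three component formulas appear only as images in the original text, the sketch below uses assumed negated-ratio forms that merely respect the stated reward principle of step 167; only the weighted combination itself is given by the text:

```python
def comprehensive_reward(idle_rate, over_rate, migrs, num_vms,
                         alpha=0.4, beta=0.4, gamma=0.2):
    """Weighted comprehensive reward r_t = alpha*r_power + beta*r_overload + gamma*r_migrs.
    The component forms below are assumptions: fewer active hosts, fewer
    overloaded hosts, and fewer migrations all yield a higher reward."""
    r_power = -idle_rate                 # assumed form of the energy consumption reward
    r_overload = -over_rate              # assumed form of the overload reward
    r_migrs = -migrs / max(num_vms, 1)   # assumed form, normalized by the VM count
    assert abs(alpha + beta + gamma - 1.0) < 1e-9  # the weights must sum to 1
    return alpha * r_power + beta * r_overload + gamma * r_migrs
```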
Step 168: check whether the migration model has converged; if so, exit the entire offline training process and output the trained migration model; otherwise, go to step 169. The convergence criterion is that the cumulative average comprehensive reward value remains essentially unchanged or varies by less than a preset threshold, i.e., the change in the running average of the action rewards accumulated over training falls below a specified threshold (for example an absolute value of 1, the specific value depending on the magnitude of the rewards in actual training, or for example a relative change of less than 0.1%). The model is then considered converged: the cumulative average reward no longer improves, and further training is unnecessary.
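A small sketch of this convergence test, using the relative 0.1% criterion mentioned above:

```python
import numpy as np

def has_converged(reward_history, rel_tol=0.001):
    """Step 168: converged when the cumulative average reward stops improving
    or improves by less than rel_tol (0.1%) relative to its previous value."""
    if len(reward_history) < 2:
        return False
    prev_avg = float(np.mean(reward_history[:-1]))
    curr_avg = float(np.mean(reward_history))
    if curr_avg <= prev_avg:  # the cumulative average is no longer rising
        return True
    return (curr_avg - prev_avg) / (abs(prev_avg) + 1e-8) < rel_tol
```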
Step 169: check whether the number of iterations in the current round has been exceeded. According to one embodiment of the invention, each round comprises at least 200 iterations. If the count is exceeded, end the current training round and enter a new one; otherwise, go to step 1610.
Step 1610: updating the network parameters of the model according to the action rewards.
Step 1611: update the exploration probability. According to one embodiment of the invention, the exploration probability is updated as follows. At the start of round training, define an exploration probability decay factor Δ (0 ≤ Δ ≤ 1); if the exploration probability is denoted ε, the updated exploration probability is Δ·ε. The exploration probability thus decreases after each update, until it is less than or equal to a minimum exploration probability ε_min, after which it is no longer updated, i.e., it remains at ε_min for the rest of training. During this process, the decay factor Δ may also be varied dynamically within a specified range according to actual requirements.
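This decay schedule reduces to a one-liner; the values of the decay factor and the floor are illustrative:

```python
def update_epsilon(epsilon, delta=0.995, epsilon_min=0.01):
    """Step 1611: multiply epsilon by the decay factor delta after each update
    and hold it at epsilon_min once it reaches that floor."""
    return max(delta * epsilon, epsilon_min)
```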
Step 1612: continue the current round of training and execute a new training iteration.
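Putting the pieces together, a round-training driver in the spirit of fig. 4 could look like the following. It composes the sketches above, uses the uniform-load option of step 165, derives the host metrics directly from the state tensor, and assumes a hypothetical model object exposing predict and update methods:

```python
import numpy as np

rng = np.random.default_rng(0)

def offline_round_training(model, num_rounds=500, max_iters=200,
                           epsilon=1.0, delta=0.995, epsilon_min=0.01):
    """Round training of fig. 4, composed from the earlier sketches; `model`
    is an assumed interface, not a structure prescribed by the patent."""
    host_cap, vm_load, placement = simulate_environment()
    num_hosts, num_vms = host_cap.shape[0], vm_load.shape[0]
    for _ in range(num_rounds):                                       # steps 18/19
        rewards = []
        for _ in range(max_iters):                                    # step 169
            state = build_state_tensor(host_cap, vm_load, placement)  # step 162
            actions = model.predict(state)                            # step 163
            actions = explore_actions(actions, epsilon, num_hosts)    # step 164
            migrs = int(np.sum(actions != placement))                 # migration count
            placement = actions  # (feasibility of explored placements not re-checked here)
            vm_load = rng.uniform(0.5, 4.0, size=vm_load.shape)       # step 165
            used = build_state_tensor(host_cap, vm_load, placement).sum(axis=1)
            idle_rate = float(np.mean(used.sum(axis=1) > 0))          # active-host ratio
            over_rate = float(np.mean((used > 1.0).any(axis=1)))      # overloaded-host ratio
            reward = comprehensive_reward(idle_rate, over_rate, migrs, num_vms)  # step 167
            model.update(state, actions, reward)                      # step 1610
            rewards.append(reward)
            epsilon = update_epsilon(epsilon, delta, epsilon_min)     # step 1611
            if has_converged(rewards):                                # step 168
                return model
    return model
```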
The second stage: the online training process of the migration model.
To further close the gap between the offline simulation environment and the online environment and to improve the inference accuracy of the virtual machine dynamic migration model in actual application, the migration model obtained from offline training is further trained in the actual application environment. According to one embodiment of the invention, fig. 6 illustrates the layered, centrally controlled host-virtual machine architecture of a data center. The bottom layer is the physical infrastructure, i.e., physical hosts or physical servers; above it is the virtual machine layer, i.e., the virtual machines hosted on the different physical hosts through virtualization; the top layer is the controller layer, which supervises all virtual machine and host states (including resource capacity, actual resource usage, load, etc.), collects these data uniformly into corresponding databases, and dynamically determines the host position of each virtual machine according to a given virtual machine migration model.
The virtual machine migration model obtained from offline training is deployed at the controller layer, where it is responsible for providing the virtual machine dynamic migration scheme. According to one embodiment of the present invention, as shown in fig. 5, the online training process comprises:
step 21: data center environmental state acquisition, the data center state is acquired through a controller at regular time (for example, the acquisition time interval is set to be 5 minutes), and the acquisition content comprises: host resource capacity, resource occupation information; dynamic load size of each resource dimension of the virtual machine; the host position of the virtual machine; all data are stored in a designated database in a unified way.
Step 22: the virtual machine migration model calculates the virtual machine migration actions from the currently sampled environment state.
Step 23: the controller invokes the underlying virtual machine migration engine to execute virtual machine live migration according to the migration scheme generated by the migration model.
Step 24: after the virtual machine live migration finishes, the controller waits for the next sampling (e.g., if the sampling period is 5 min, it continues to wait 5 min).
Step 25: during the current waiting interval, the controller continuously measures the overall host energy consumption change and the quality of service of the services in the virtual machines.
step 26: when the next sampling time point arrives, the controller calculates the comprehensive rewarding value of the virtual machine migration action of the last sampling point according to the energy consumption change situation and the service quality change situation obtained through statistics, and the setting of the comprehensive rewarding value only needs to follow the principle of step 167.
Step 27: the controller updates the migration model according to the calculated comprehensive reward and the corresponding state and action information, and judges whether the migration model has converged; this judgment may be made, for example but not exclusively, from the change in the cumulative average reward value.
Step 29: if the migration model has converged, end the online training process; otherwise go to step 210.
Step 210: by now the next sampling point has arrived; continue sampling to obtain the environment state, and loop through steps 21 to 29 until the model converges.
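An online-training loop in the spirit of steps 21-210 is sketched below, reusing the reward and convergence sketches above; the controller and model interfaces (collect_state, live_migrate, measure_metrics, predict, update) are assumed names for illustration, not an existing API:

```python
import time

def online_training_loop(model, controller, num_vms, sample_interval_s=300):
    """Steps 21-210: sample, migrate, wait, measure, reward, update, repeat."""
    rewards = []
    state = controller.collect_state()                 # step 21: timed state collection
    while True:
        actions = model.predict(state)                 # step 22: compute migration actions
        controller.live_migrate(actions)               # step 23: execute live migration
        time.sleep(sample_interval_s)                  # step 24: wait one sampling period
        idle_rate, over_rate, migrs = controller.measure_metrics()          # step 25
        reward = comprehensive_reward(idle_rate, over_rate, migrs, num_vms)  # step 26
        model.update(state, actions, reward)           # step 27: update the migration model
        rewards.append(reward)
        if has_converged(rewards):                     # step 29: stop once converged
            break
        state = controller.collect_state()             # step 210: next sampling point
```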
The third stage: the online use of the migration model.
step 31: the data center state is collected at regular time through the controller, and the collection content comprises: host resource capacity, resource occupation information; dynamic load size of each resource dimension of the virtual machine; the host position of the virtual machine; all data are stored in a designated database in a unified way.
Step 32: the virtual machine migration model calculates the virtual machine migration actions from the currently sampled environment state.
Step 33: the controller invokes the underlying virtual machine migration engine to execute virtual machine live migration according to the migration scheme generated by the migration model.
Step 34: after the virtual machine live migration finishes, the controller waits for the next sampling.
The online use of the migration model thus corresponds to steps 21-24 of the online training process: the controller periodically collects the relevant host and virtual machine states and dynamically determines whether virtual machines require live migration.
As can be seen from the above embodiments, the invention uses reinforcement learning to realize the dynamic integration of virtual machines: the action exploration mechanism covers as many possible virtual machine dynamic migration schemes as possible, and the reward feedback mechanism evaluates the quality of every migration scheme. Through this explore-and-reward learning mode, the model learns how virtual machine dynamic migration should be performed in any environment state, effectively solving the problem that the dynamically changing virtual machine load makes it difficult to optimize the data center's quality of service and energy efficiency over the long term; the reinforcement learning technique is, moreover, not limited to the specific reinforcement learning model used here. An initialized virtual machine dynamic migration model is first obtained by training on offline-simulated data center host and virtual machine load states and virtual machine migration actions; the model is then placed in a real data center environment for online reinforcement training, improving the accuracy of its inference. The invention abstracts the environment state and the virtual machine migration actions into tensor and vector representations, respectively, and learns the relationship between states and actions with a neural network, which effectively improves the learning efficiency of the virtual machine dynamic migration model. The invention also derives the distribution of a virtual machine's load from its historical load information and uses this distribution to estimate the load at the next moment, ensuring that the virtual machine dynamic migration model converges quickly and stably. The invention can therefore abstract the multi-objective virtual machine dynamic migration optimization problem into the training and inference process of a reinforcement learning model; an optimized virtual machine dynamic migration model is obtained through training, realizing the long-term goals of low energy consumption and high quality of service for the data center, and the resulting migration model can be applied flexibly to a wide range of general cloud data center virtual machine management systems and business environments.
It should be noted that, although the steps are described above in a specific order, it is not meant to necessarily be performed in the specific order, and in fact, some of the steps may be performed concurrently or even in a changed order, as long as the required functions are achieved.
The present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical encoding devices such as punch cards or in-groove raised structures having instructions stored thereon, and any suitable combination of the foregoing.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. A training method for a virtual machine migration model, the training method comprising the steps of:
s1, constructing an initial migration model and randomly initializing model parameters, wherein the initial migration model is a neural network;
s2, acquiring initial environment states, and tensorizing the initial environment states corresponding to each virtual machine, wherein the environment states of each tensorizing representation comprise virtual machine information, host position distribution information of the virtual machine and resource occupation conditions of the virtual machine in a host;
s3, performing round training on the initialized migration model by taking all tensor representations corresponding to the initial environment state as starting points until a preset training round is reached, wherein each round training comprises performing iterative training on the migration model for a plurality of times until the migration model converges or reaches a preset iteration number, and each iterative training comprises:
s31, taking all tensor representations corresponding to the environmental state at the current moment as the input of the model,
obtaining migration actions required to be carried out by all virtual machines;
s32, searching a new action by using the current searching probability to replace the migration action of the virtual machine obtained in the step S31;
s33, obtaining the environment state of the next moment after the migration action is executed based on the migration action of the virtual machine at the current moment;
s34, calculating a comprehensive rewarding value of the migration action executed at the current moment according to the environmental states at the current moment and the next moment; wherein the integrated rewards value of the migration model comprises a weighted sum of the energy consumption rewards value, the overload rewards value and the migration rewards value, wherein:
the energy consumption reward value r_power(s_t, a_t) is computed from the active-host ratio idle_rate, the overload reward value r_overload(s_t, a_t) from the overloaded-host ratio over_rate, and the migration reward value r_migrs(s_t, a_t) from the number of virtual machine migrations migrs; the comprehensive reward value is calculated as follows:

r_t(s_t, a_t) = α·r_power(s_t, a_t) + β·r_overload(s_t, a_t) + γ·r_migrs(s_t, a_t)

wherein s_t represents the data center virtual machine environment state information, a_t represents the migration actions of the virtual machines, idle_rate, over_rate, and migrs respectively denote the active-host ratio, the overloaded-host ratio, and the number of virtual machine migrations, α, β, and γ are the weights of the energy consumption reward value, the overload reward value, and the migration reward value, respectively, and α + β + γ = 1;
S35, updating the parameters of the migration model and the exploration probability based on the comprehensive reward value of the migration model, so as to perform the next iteration of training;
s4, dynamically acquiring virtual machine environment state information of the data center, and carrying out online training on the migration model after round training until the migration model converges.
2. The training method for a virtual machine migration model according to claim 1, wherein the initial environment state acquired in step S2 is one of an artificially simulated initial environment state and a real-time online initial environment state.
3. The training method for a virtual machine migration model according to claim 2, wherein the artificially simulated initial environment state is obtained by:
S21, randomly initializing the resource sizes of all resource dimensions of each host in the current environment;
S22, randomly initializing the load sizes of all resource dimensions of each virtual machine;
S23, randomly assigning each virtual machine to a host such that the resources required by the virtual machines do not exceed the resource upper limit of the host on which they are placed.
4. The training method for a virtual machine migration model according to claim 3, wherein step S32 explores new actions in one of the following ways:
first, randomly generating a new migration action;
second, fine-tuning the action generated by the migration model according to a preset distribution probability to obtain a new migration action.
5. The training method for a virtual machine migration model according to claim 4, wherein in each round of training, the current round ends when a preset number of iterations is reached or the migration model converges;
wherein the preset number of iterations is set to 200;
and wherein migration model convergence means that the cumulative average of the comprehensive reward value no longer increases after a training step, or increases by less than 0.1% relative to its value before that step.
6. The training method for a virtual machine migration model according to claim 5, wherein in step S35 the exploration probability is updated as follows:

ε_{t+1} = Δ·ε_t

wherein ε_t denotes the exploration probability at time t, ε_{t+1} denotes the updated exploration probability at time t+1, and Δ is a decay factor.
7. The method according to claim 6, wherein once the exploration probability has been updated down to a preset minimum exploration probability, it is no longer updated.
8. The method according to claim 1, wherein step S4 comprises sampling the data center virtual machine environment state information at predetermined sampling intervals and performing the following steps at each sampling:
S41, using the migration model to obtain virtual machine migration actions from the currently sampled virtual machine environment state information;
S42, live-migrating the virtual machines in the data center according to the migration actions obtained in step S41;
S43, before the next sampling, measuring the overall energy consumption change of the current data center hosts and the quality of service of the services in the virtual machines;
S44, calculating the comprehensive reward value corresponding to the live migration from the overall host energy consumption change and the service quality, and updating the parameters of the migration model according to the comprehensive reward value.
9. A data center virtual machine dynamic migration method, the migration method comprising:
q1, collecting environment state information of a virtual machine of a data center;
q2, calculating migration actions of the virtual machine by adopting a migration model trained by the method according to any one of claims 1-8;
and Q3, executing the migration action corresponding to the virtual machine calculated in the step Q2 on each virtual machine.
10. A data center comprising a host, a virtual machine, and a controller, wherein the controller comprises a virtual machine migration model trained using the method of any one of claims 1-8.
11. A computer readable storage medium having embodied thereon a computer program executable by a processor to perform the steps of the method of any of claims 1 to 9.
12. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to perform the steps of the method of any of claims 1-9.
CN202011293834.8A 2020-11-18 2020-11-18 Training method for virtual machine migration model Active CN112306641B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011293834.8A CN112306641B (en) 2020-11-18 2020-11-18 Training method for virtual machine migration model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011293834.8A CN112306641B (en) 2020-11-18 2020-11-18 Training method for virtual machine migration model

Publications (2)

Publication Number Publication Date
CN112306641A CN112306641A (en) 2021-02-02
CN112306641B (en) 2023-07-21

Family

ID=74335247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011293834.8A Active CN112306641B (en) 2020-11-18 2020-11-18 Training method for virtual machine migration model

Country Status (1)

Country Link
CN (1) CN112306641B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8595737B2 (en) * 2009-03-17 2013-11-26 Hitachi, Ltd. Method for migrating a virtual server to physical server according to a variation ratio, a reference execution time, a predetermined occupied resource amount and a occupancy amount

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016062117A1 (en) * 2014-10-24 2016-04-28 中兴通讯股份有限公司 Virtual machine migration processing method and apparatus
CN110703766A (en) * 2019-11-07 2020-01-17 南京航空航天大学 Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network
CN111290831A (en) * 2020-01-18 2020-06-16 重庆邮电大学 Virtual machine migration method based on reinforcement learning for cloud computing
CN111858009A (en) * 2020-07-30 2020-10-30 航天欧华信息技术有限公司 Task scheduling method of mobile edge computing system based on migration and reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Service function chain adjustment method for multi-virtual-machine dynamic migration scenarios; Gu Yinghan, Yi Peng; Journal of Chinese Computer Systems (小型微型计算机系统), No. 05; full text *

Also Published As

Publication number Publication date
CN112306641A (en) 2021-02-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant