CN113641462B - Virtual network hierarchical distributed deployment method and system based on reinforcement learning - Google Patents
- Publication number
- CN113641462B (application number CN202111195085A)
- Authority
- CN
- China
- Prior art keywords
- virtual network
- physical host
- module
- action
- deployment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5077—Logical partitioning of resources; Management or configuration of virtualized resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45595—Network integration; Enabling network access in virtual machine instances
Abstract
The invention discloses a reinforcement-learning-based hierarchical distributed deployment method and system for virtual networks. Oriented to network simulation based on Docker containerized virtual networks, and aiming at optimizing the block-cutting and deployment of the virtual network, the invention designs, through a reinforcement learning framework, a reward that balances the resource consumption of the physical hosts against the cross-host communication performance loss caused by cutting the virtual network into blocks, in a distributed environment of physical hosts with limited computing, network and storage resources. According to the reward, the long-term benefit Q(s, a) of each action a taken by each physical host under each resource supply state s is continuously calculated, so that the algorithm, guided by Q(s, a), dynamically and autonomously keeps learning the optimized virtual network block size and the appropriate deployment timing, while a randomness strategy introduces a certain dynamic randomness to avoid the over-consumption of resources or over-fragmentation of blocks caused by rigidly selecting the action with the maximum Q(s, a), thereby achieving the hierarchical, distributed, optimized deployment of the virtual network.
Description
Technical Field
The invention relates to the technical field of network virtualization, in particular to a virtual network hierarchical distributed deployment method and system based on reinforcement learning.
Background
Network simulation is a key support for research on computer network architectures, protocols and algorithms. Because a Docker container retains an essentially complete TCP/IP protocol stack and has higher startup efficiency and lower performance overhead than a virtual machine, containerization technology has gradually become popular and offers a new approach to network simulation: virtual network elements (such as virtual routers, virtual switches and virtual end systems) are constructed with Docker containers at the core, and technologies such as veth-pair, OVS (Open vSwitch) and VxLAN (Virtual eXtensible Local Area Network) are used to generate the virtual links connecting them, so that a virtual network is formed and deployed on physical hosts for simulation. A Docker container runs the TCP/IP protocol stack of the Linux kernel, is efficient and low in consumption, and has open programming interfaces, so Docker-based network simulation offers both high fidelity and easy programmability. On physical hosts with limited computing, network and storage resources, large-scale deployment of containerized virtual networks based on technologies such as Docker and OVS requires reasonable, automatic and efficient mapping, and where necessary block-cutting, between the virtual network and the computing cluster of physical hosts, so that the resource demand of the virtual network and the resource supply of the physical hosts stay relatively balanced in a distributed setting and the performance of the network simulation is improved. Distributed optimized deployment of virtual networks is therefore the key to network simulation based on Docker technology.
The academic community has studied the closely related problem of Virtual Network Embedding (VNE), whose solution has relatively high complexity and is in general NP-hard. Early researchers solved the problem with purely heuristic methods, which, however, tend to get stuck in local optima; meta-heuristic solving can effectively mitigate this. For example, FAJJARI et al. propose a scalable mapping strategy based on an ant colony meta-heuristic algorithm; ARAÚJO et al. propose a hybrid algorithm incorporating meta-heuristics, together with an online policy that takes the execution speed of the virtual network mapping into account to ensure minimal latency, providing fast solutions in multi-domain environments. However, the mapping algorithms in the existing literature are mainly designed for virtual-machine-based application scenarios, chiefly from the perspectives of virtual machine resource allocation efficiency and mapping success rate.
When the virtual network is constructed with Docker technology, besides resource allocation efficiency and mapping success rate, the mapping algorithm needs to be designed and optimized around the technical characteristics of Docker: (1) the virtual network elements simulated by Docker containers appear as low-overhead processes on the host; their granularity is finer and their temporal variation more pronounced, so the mapping algorithm must be more dynamic and adaptive, and sensitive and agile toward resource consumption and change; (2) as a lightweight virtualization technology, Docker on the one hand makes it feasible to construct larger-scale virtual networks, and on the other hand is expected to be deployed on multiple low-end x86 hosts; both aspects require full consideration of the resource limitations of the hosts and flexible, automatic optimization, block-cutting and deployment of the virtual network; (3) after the virtual network is cut into blocks deployed on multiple hosts, cross-host communication must be achieved through mechanisms such as OVS + VxLAN so that a unified virtual network is presented transparently to the user; the mapping algorithm therefore needs to be co-optimized with the technical characteristics of virtual switches such as OVS and VxLAN tunnels, reducing, under the constraint of host resource consumption, the number of virtual network cuts as much as possible to control the performance loss caused by cross-host communication between blocks.
Specifically, fig. 1 shows the macro process of block-cutting, mapping and deploying a virtual network, which includes the following steps:
1. Topology description of the virtual network: assume a user wants to deploy the virtual network shown at the top of fig. 1 (the virtual network may be large; for convenience of illustration, the embodiment only draws a topology of 2 end systems and 2 routers). The virtual network can be described in JSON file format.
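As a concrete illustration of such a description (the patent does not specify the JSON schema, so all field names below, such as "nodes", "links" and the per-element resource fields, are hypothetical), a topology of 2 end systems and 2 routers like the one in fig. 1 could be built and serialized roughly as follows:

```python
import json

# Hypothetical JSON description of the fig. 1 topology: 2 end systems, 2 routers.
# Field names ("nodes", "links", "type", "cpu", ...) are illustrative only.
topology = {
    "name": "demo-net",
    "nodes": [
        {"id": "h1", "type": "end_system", "cpu": 0.5, "ram_mb": 256, "disk_mb": 512},
        {"id": "h2", "type": "end_system", "cpu": 0.5, "ram_mb": 256, "disk_mb": 512},
        {"id": "r1", "type": "router", "cpu": 1.0, "ram_mb": 512, "disk_mb": 1024},
        {"id": "r2", "type": "router", "cpu": 1.0, "ram_mb": 512, "disk_mb": 1024},
    ],
    "links": [
        {"endpoints": ["h1", "r1"]},
        {"endpoints": ["r1", "r2"]},
        {"endpoints": ["r2", "h2"]},
    ],
}

print(json.dumps(topology, indent=2))
```

A per-block file produced by the cutting algorithm would then simply be a subset of the "nodes" and "links" arrays.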
2. Block-cutting and mapping of the virtual network: (1) algorithm input: the JSON file serves as the input of the algorithm; after reading the topology of the virtual network, the algorithm decides, according to the remaining computing, network and storage resources of the existing physical hosts, whether to cut the virtual network into blocks or to map it directly (if a single host can accommodate the whole virtual network, no cutting or mapping is needed). (2) Algorithm output: if cutting and mapping are required, several per-block JSON files are generated, as shown in the "block A" and "block B" portions of fig. 1.
3. Deployment of the virtual network: each physical host receives its block JSON file and generates the various virtual network elements with technologies such as Docker and OVS according to the JSON description. This involves network virtualization in two main aspects: node virtualization and link virtualization. Node virtualization: Docker containers simulate devices such as end systems and routers, while the OVS technique simulates layer-2 switching devices. Link virtualization: the veth-pair technique connects the various virtual network elements obtained by node virtualization.
4. Reconnection of the virtual network: after block-cutting and mapping, different blocks are deployed on different hosts, and the original topology of the virtual network is broken on some links. The original topology therefore has to be reconnected across hosts, mainly through OVS + VxLAN tunneling, as shown by the OVS + VxLAN link between "block A" and "block B" in fig. 1. Considering that an OVS + VxLAN tunnel causes a certain network performance loss, to give the virtual network higher fidelity the algorithm design keeps the simultaneously deployed virtual network as compact as possible and reduces the number of links crossing the underlying physical hosts; that is, the original virtual network should not be cut into pieces too readily.
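The link-virtualization and reconnection steps above ultimately reduce to standard `ip` and `ovs-vsctl` invocations. A minimal sketch that only composes the command strings (it executes nothing, and the names `br0`, `vx0`, the remote IP and the VNI are illustrative):

```python
def veth_pair_cmds(a: str, b: str) -> list[str]:
    """Commands to create and bring up a veth pair linking two virtual network elements."""
    return [f"ip link add {a} type veth peer name {b}",
            f"ip link set {a} up",
            f"ip link set {b} up"]

def vxlan_reconnect_cmds(bridge: str, port: str, remote_ip: str, vni: int) -> list[str]:
    """Commands to reconnect two blocks across hosts via an OVS VxLAN tunnel port."""
    return [f"ovs-vsctl --may-exist add-br {bridge}",
            f"ovs-vsctl add-port {bridge} {port} -- set interface {port} "
            f"type=vxlan options:remote_ip={remote_ip} options:key={vni}"]

print(veth_pair_cmds("vethA", "vethB")[0])
print(vxlan_reconnect_cmds("br0", "vx0", "10.0.0.2", 42)[1])
```

In a real deployment agent these strings would be handed to a process runner on each physical host after the block JSON is received.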
In summary, the key to the problem lies in how to optimally cut and deploy the virtual network, which faces the following technical difficulties. (a) If a block is too large, it easily exhausts the resources of the physical host it is deployed on; after long-term operation the remaining resources of some physical hosts become too scarce for any virtual network or block to be deployed effectively. Supply among the physical hosts becomes unbalanced, and when a new virtual network needs to be deployed, balanced deployment is difficult. (b) If the blocks are too small, the virtual network is easily over-fragmented: the number of blocks becomes too large, and when they are distributed over many physical hosts (especially when the hosts must communicate over multiple hops), cross-host communication based on OVS + VxLAN tunnels causes too much performance loss, hurting the simulation effect and fidelity. The block-cutting and deployment method therefore needs to learn dynamically and autonomously, adapt to the resource consumption demand of the virtual network and the resource supply situation of the physical hosts, form virtual network blocks of suitable scale, and perform hierarchical, distributed, optimized deployment based on these blocks. Existing work, however, mostly abstracts distributed virtual network deployment into a mathematical programming problem (generally NP-hard) and uses heuristics to balance solving efficiency against optimization quality; heuristic methods have no advantage in dynamics and timeliness, and their adaptive capability and capacity to learn and evolve in complex network environments are weak.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a reinforcement-learning-based hierarchical distributed deployment method and system for virtual networks. Oriented to network simulation based on Docker containerized virtual networks, and aiming at optimizing the block-cutting and deployment of the virtual network, the method designs, through a reinforcement learning framework, a reward that balances the resource consumption of the physical hosts against the cross-host communication performance loss caused by cutting the virtual network into blocks, in a distributed environment of physical hosts with limited computing, network and storage resources. According to the reward, the long-term benefit Q(s, a) of each action a taken by each physical host under each resource supply state s is continuously calculated, so that the algorithm, guided by Q(s, a), dynamically and autonomously keeps learning the optimized virtual network block size and the appropriate deployment timing, while a randomness strategy introduces a certain dynamic randomness to avoid the over-consumption of resources or over-fragmentation of blocks caused by rigidly selecting the action with the maximum Q(s, a), thereby achieving the hierarchical, distributed, optimized deployment of the virtual network.
The specific technical scheme of the invention is as follows:
a virtual network hierarchical distributed deployment method based on reinforcement learning comprises the following steps:
Step 1: for each physical host h_r^p, establish an action-value function Q(s, a), forming an action-value function table;
where h_r^p denotes a physical host, the superscript p standing for "physical" and the subscript r, with value range 1 ≤ r ≤ R, being the index of the physical host, R being the total number of physical hosts.
Step 2: wait for a new virtual network deployment request; when a new request arrives, jump to step 3;
Step 3: based on the observation s of the resource supply of each physical host, find the physical host with the largest resource supply;
Step 4: judge whether that physical host can accommodate the entire virtual network:
if it can, jump to step 5,
if it cannot, jump to step 6;
Step 5: directly deploy the virtual network, set the current action as the deployment action, and jump to step 8;
Step 6: for block deployment, select an action according to the action-value function table: if the selected action is the deployment action, jump to step 8; if the selected action is the expansion action, jump to step 7;
Step 7: taking the virtual network element n_i^l with the largest out-degree in the undeployed part of the virtual network as the center, expand the block, gradually constructing the set of virtual network elements of the virtual network block b_m; jump to step 8;
where n_i^l denotes a virtual network element, the superscript l standing for "local" and the subscript i, with value range 1 ≤ i ≤ I, being the index of the virtual network element, I being the total number of virtual network elements; b_m denotes a virtual network block, the superscript b standing for "block" and the subscript m being the index of the virtual network block; the total number of blocks is not fixed in advance but is determined dynamically by the algorithm according to the resource supply of the physical hosts and other conditions.
Step 8: calculate the reward r;
where n_t is the number of virtual network blocks deployed at time t on the physical host selected in step 3, c(b_m) is the sum of the multi-dimensional resources consumed by the virtual network block b_m, and s_t is the observation of the resource supply of the physical host with the largest supply;
Step 9: according to the reward r, update the action-value function Q(s, a) in the current action-value function table;
Step 10: judge whether the current action is the deployment action: if yes, jump to step 11; if not, jump to step 3;
Step 11: deploy the virtual network element n_i^l of the current virtual network (or the set of virtual network elements of the virtual network block b_m) onto the currently selected physical host, and update the state s of the physical host according to its attribute values;
Step 12: judge whether the virtual network has been completely deployed: if yes, jump to step 2; if not, jump to step 3.
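The steps above can be sketched as a tabular Q-learning deployment loop. This is a simplified illustration, not the patented algorithm itself: the reward function below is a stand-in (the patent's reward formula balances host resource consumption against block fragmentation but is not reproduced here), states are coarsely discretized, block expansion is reduced to taking two elements at a time, and all numeric values are illustrative:

```python
import random
from collections import defaultdict

ACTIONS = ("deploy", "expand")          # the two actions of step 6
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount rate, exploration rate

Q = defaultdict(float)  # action-value table: (host, state, action) -> long-term benefit

def coarse_state(free: float) -> str:
    """Discretize a host's remaining resource supply into a coarse state s."""
    return "high" if free > 0.6 else "mid" if free > 0.3 else "low"

def choose_action(host: int, s: str) -> str:
    """Randomness strategy: explore with probability EPSILON, else pick argmax Q."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(host, s, a)])

def reward(free_after: float, blocks_on_host: int) -> float:
    """Stand-in reward: favor residual supply, penalize fragmentation (illustrative)."""
    return free_after - 0.1 * blocks_on_host

def deploy(demands: list[float], supply: list[float]) -> list[list[int]]:
    """Hierarchically place per-element resource demands onto physical hosts."""
    placement = [[] for _ in supply]
    pending = list(range(len(demands)))
    while pending:
        h = max(range(len(supply)), key=lambda r_: supply[r_])  # step 3: largest supply
        s = coarse_state(supply[h])
        a = choose_action(h, s)                                 # steps 4-6 (simplified)
        take = 1 if a == "deploy" else min(2, len(pending))     # step 7: grow the block
        block = pending[:take]
        cost = sum(demands[i] for i in block)
        if cost > supply[h]:                                    # shrink if it cannot fit
            block, cost = pending[:1], demands[pending[0]]
        supply[h] -= cost
        placement[h].extend(block)
        pending = pending[len(block):]
        r = reward(supply[h], len(placement[h]))                # step 8: reward
        s2 = coarse_state(supply[h])
        best_next = max(Q[(h, s2, a2)] for a2 in ACTIONS)       # step 9: Q update
        Q[(h, s, a)] += ALPHA * (r + GAMMA * best_next - Q[(h, s, a)])
    return placement
```

Repeated calls with successive deployment requests keep refining the Q table, which is what gives the method its autonomous dynamic learning character.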
Preferably, in step 7, taking the virtual network element n_i^l with the largest out-degree in the undeployed part of the virtual network as the center, the block is expanded using breadth-first search to gradually construct the set of virtual network elements of the virtual network block b_m, and the method jumps to step 8.
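A minimal sketch of this preferred expansion: pick the undeployed element with the largest out-degree as the center and grow the block breadth-first. The capacity-budget stop condition is an assumption for illustration; in the patent the block size emerges from the action-value function rather than a fixed budget:

```python
from collections import deque

def expand_block(adj: dict[str, list[str]], undeployed: set[str],
                 demand: dict[str, float], budget: float) -> list[str]:
    """Grow a virtual network block by BFS around the max-out-degree element."""
    center = max(undeployed, key=lambda v: sum(1 for n in adj[v] if n in undeployed))
    block, used = [], 0.0
    queue, seen = deque([center]), {center}
    while queue:
        v = queue.popleft()
        if used + demand[v] > budget:   # skip elements the block can no longer afford
            continue
        block.append(v)
        used += demand[v]
        for n in adj[v]:
            if n in undeployed and n not in seen:
                seen.add(n)
                queue.append(n)
    return block
```

Because BFS adds neighbors before more distant elements, the resulting block is topologically compact, which is exactly what limits cross-host links after deployment.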
Preferably, in step 9, the action-value function in the current action-value function table is updated according to the following formula:
Q(s, a) ← Q(s, a) + α[r + γ·max_a' Q(s', a') − Q(s, a)]
where the reward r represents the short-term benefit of the action a taken in the current state s; max_a' Q(s', a') represents the maximum long-term benefit obtainable among all optional actions a' in the new state s' reached after action a is selected, max denoting taking the maximum value; r + γ·max_a' Q(s', a') is the sum of the short-term and long-term benefits, i.e. the maximum subsequent benefit obtainable from the current state, where the discount rate γ represents the influence of the long-term benefit on the benefit of the current state: the closer γ is to 1, the more the long-term benefit is emphasized, and conversely the more the short-term benefit is emphasized; r + γ·max_a' Q(s', a') − Q(s, a) is the return gain of this iteration between the newly selected action and the original action, where the learning rate α represents the speed of reinforcement learning: the closer α is to 1, the faster the learning, and vice versa. By iteratively calculating the return gain, the whole formula continuously updates the long-term benefit Q(s, a) obtainable from each action a taken in each state s, so that the system can autonomously select the optimal action through learning.
Preferably, in step 8, s_t denotes the multi-dimensional resources provided by the physical host at time t, mainly including processor (CPU) resources, memory (RAM) resources and disk (DISK) resources.
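The observation s_t can be represented as a simple multi-dimensional vector. The sketch below (field names and the normalization constants are assumptions) also shows the step-3 selection of the host with the largest supply, using a plain normalized sum as the scalar "size" of the supply; the patent does not fix a particular scalarization:

```python
from dataclasses import dataclass

@dataclass
class Supply:
    """Multi-dimensional resource supply of one physical host at time t."""
    cpu: float      # free CPU cores
    ram_mb: float   # free memory, MB
    disk_mb: float  # free disk, MB

    def size(self) -> float:
        # Illustrative scalarization: normalize each dimension, then sum.
        return self.cpu / 8 + self.ram_mb / 16384 + self.disk_mb / 262144

def largest_supply_host(hosts: dict[str, Supply]) -> str:
    """Step 3: pick the physical host with the largest observed resource supply."""
    return max(hosts, key=lambda h: hosts[h].size())
```

Any other norm over the resource vector would slot into `size()` without changing the surrounding algorithm.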
A reinforcement learning based hierarchical distributed deployment system for a virtual network, comprising:
the action-value function table building module: for establishing an action-value function Q(s, a) for each physical host h_r^p, forming an action-value function table;
the virtual network deployment request processing module: for waiting for a new virtual network deployment request and, when a new request arrives, sending a signal to control the maximum-resource-supply physical host search module to work;
the maximum-resource-supply physical host search module, connected with the virtual network deployment request processing module: for finding the physical host with the largest resource supply according to the observation s of the resource supply of each physical host;
the first judgment module, connected with the maximum-resource-supply physical host search module: for judging whether the physical host with the largest resource supply can accommodate the virtual network:
if it can, controlling the direct deployment module to work,
if it cannot, controlling the block deployment module to work;
the direct deployment module, connected with the first judgment module: for directly deploying the virtual network, setting the current action as the deployment action, and sending a signal to control the calculation module to start working;
the block deployment module, connected with the first judgment module: for selecting an action according to the action-value function; if the selected action is the deployment action, sending a signal to control the calculation module to start working; if the selected action is the expansion action, sending a signal to control the virtual network element set construction module to start working;
the virtual network element set construction module, connected with the block deployment module: for expanding the block with the virtual network element n_i^l with the largest out-degree in the undeployed part of the virtual network as the center, gradually constructing the set of virtual network elements of the virtual network block b_m, and sending a signal to control the calculation module to start working;
the calculation module, connected with the direct deployment module and the virtual network element set construction module: for calculating the reward r;
where n_t is the number of virtual network blocks deployed at time t on the physical host with the largest resource supply, c(b_m) is the sum of the multi-dimensional resources consumed by the virtual network block b_m, and s_t is the observation of the resource supply of the physical host with the largest supply;
the updating module, connected with the action-value function table building module: for updating the action-value function Q(s, a) in the current action-value function table according to the reward r;
the second judgment module, connected with the action-value function table building module: for judging whether the current action is the deployment action: if yes, sending a signal to control the deployment processing module to start working; if not, sending a signal to control the maximum-resource-supply physical host search module to start working;
the deployment processing module, connected with the second judgment module: for deploying the virtual network element n_i^l of the current virtual network (or the set of virtual network elements of the virtual network block b_m) onto the currently selected physical host, and updating the state s of the physical host according to its attribute values;
the third judgment module, connected with the deployment processing module, the virtual network deployment request processing module and the maximum-resource-supply physical host search module: for judging whether the virtual network has been completely deployed: if yes, sending a signal to control the virtual network deployment request processing module to work; if not, sending a signal to control the maximum-resource-supply physical host search module to start working.
Preferably, the maximum-resource-supply physical host search module finds the physical host with the largest resource supply according to the observation s of the resource supply of each physical host.
Preferably, the updating module is configured to update the action-value function in the current action-value function table according to the following formula:
Q(s, a) ← Q(s, a) + α[r + γ·max_a' Q(s', a') − Q(s, a)]
where the reward r represents the short-term benefit of the action a taken in the current state s; max_a' Q(s', a') represents the maximum long-term benefit obtainable among all optional actions a' in the new state s' reached after action a is selected, max denoting taking the maximum value; r + γ·max_a' Q(s', a') is the sum of the short-term and long-term benefits, i.e. the maximum subsequent benefit obtainable from the current state, where the discount rate γ represents the influence of the long-term benefit on the benefit of the current state: the closer γ is to 1, the more the long-term benefit is emphasized, and conversely the more the short-term benefit is emphasized; r + γ·max_a' Q(s', a') − Q(s, a) is the return gain of this iteration between the newly selected action and the original action, where the learning rate α represents the speed of reinforcement learning: the closer α is to 1, the faster the learning, and vice versa. By iteratively calculating the return gain, the whole formula continuously updates the long-term benefit Q(s, a) obtainable from each action a taken in each state s, so that the system can autonomously select the optimal action through learning.
Advantageous effects:
The invention is oriented to network simulation based on Docker containerized virtual networks, aims at optimizing the block-cutting and deployment of the virtual network, and achieves hierarchical, distributed, optimized deployment of the virtual network through a reinforcement learning framework in a distributed environment of physical hosts with limited computing, network and storage resources. The beneficial effects mainly include:
Adaptation to lightweight virtualization technology: according to the low-consumption, fine-grained characteristics of Docker and the lightweight characteristics of network virtualization technologies such as OVS and VxLAN, the deployment algorithm effectively adapts the reinforcement learning framework to hierarchical, distributed virtual network deployment scenarios.
Autonomous dynamic learning: the block size of the virtual network is determined mainly by the long-term benefit Q(s, a) that the system learns from states and actions, essentially without a pre-designed algorithm, so there is little subjective interference from manual algorithm design; the system has good dynamic, autonomous learning capability, dynamically matching the resource supply of the physical hosts with the resource consumption demand of the virtual network to achieve optimal block-cutting and deployment.
Resource consumption balancing: the design of the reinforcement learning reward considers both the resource consumption of the physical hosts and the cross-host communication performance loss of the virtual network blocks, controlling on the one hand the number of virtual network blocks and on the other hand the scale of each block. Macroscopically, the scales of the virtual network blocks do not differ too much, so the resource consumption of the physical hosts is balanced.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
Fig. 1 is a schematic view of a hierarchical distributed deployment of a virtual network in the present invention;
FIG. 2 is a framework of a virtual network hierarchical distributed deployment system based on reinforcement learning in the present invention;
FIG. 3 is a flow of hierarchical distributed deployment of a reinforcement learning-based virtual network in the present invention;
fig. 4 shows schematic diagrams of two virtual networks to be deployed in the present invention; in fig. 4, (a) shows a smaller-scale virtual network to be deployed and (b) shows a larger-scale virtual network to be deployed.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that all the directional indicators (such as upper, lower, left, right, front and rear … …) in the embodiment of the present invention are only used to explain the relative position relationship between the components, the movement situation, etc. in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indicator is changed accordingly.
In addition, the descriptions related to "first", "second", etc. in the present invention are only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention will now be further described with reference to the accompanying drawings.
In order to facilitate the subsequent introduction of the specific technical scheme, the concepts used in the invention are first defined, and the hierarchical distributed deployment problem of the virtual network is modeled.
Logical virtual network: an overall logical description of the virtual network to be deployed, input by the user and directly facing the user. It is an undirected graph $G^l=(N^l, E^l)$, where $N^l$ denotes the set of virtual network elements and $E^l$ the set of virtual links; a link in $E^l$ may also be written $e^l_{i,j}$, representing a virtual link between virtual network elements $n^l_i$ and $n^l_j$. As shown at the top of fig. 1.
Virtual network dicing: when the logical virtual network cannot be deployed on one host as a whole, it is cut by a certain algorithm into a number of virtual network blocks; a virtual network block is thus a local description of the logical virtual network after cutting and is transparent to the user. The m-th virtual network block is an undirected graph $G^b_m=(N^b_m, E^b_m)$, where $N^b_m$ denotes its set of virtual network elements and $E^b_m$ its set of virtual links; a link in $E^b_m$ may also be written $e^b_{i,j}$, representing a virtual link between virtual network elements $n^b_i$ and $n^b_j$. As shown in the middle "cut A" and "cut B" of fig. 1.
Physical network: the network formed by the physical hosts on which virtual networks are deployed. It is an undirected graph $G^p=(N^p, E^p)$, where $N^p$ denotes the set of physical hosts and $E^p$ the set of physical links; a link in $E^p$ may also be written $e^p_{r,k}$, representing the physical link between physical hosts $n^p_r$ and $n^p_k$. As shown at the bottom of fig. 1.
Additional auxiliary network: a virtual network block may contain a number of additional virtual network elements and virtual links that are transparent to the user and serve purposes such as cross-host communication between blocks, so the union of all blocks contains more than the logical virtual network, i.e. $\bigcup_m N^b_m = N^l \cup N^a$ and $\bigcup_m E^b_m = E^l \cup E^a$. The additional auxiliary network $G^a=(N^a, E^a)$ is an auxiliary virtual network connecting the virtual network blocks, transparent to the user and generated automatically by the algorithm on demand, where $N^a$ denotes the set of additional virtual network elements and $E^a$ the set of additional virtual links. In fig. 1, OVS1 and OVS2 are additional virtual network elements, and veth-pair2, veth-pair3 and VxLAN are additional virtual links.
Resource supply: at time t, the multidimensional resources provided by a physical host $n^p_r$, mainly processor CPU, memory RAM and DISK, are denoted $C^p_r(t)$, and the multidimensional resources provided by a physical link $e^p_{r,k}$, mainly bandwidth BW, are denoted $B^p_{r,k}(t)$.
Resource consumption: at time t, the sum of the multidimensional resources consumed by the virtual network element set $N^b_m$ of a virtual network block, mainly processor CPU, memory RAM and DISK, is denoted $D^b_m(t)$, and the sum of the multidimensional resources consumed by its virtual link set $E^b_m$, mainly bandwidth BW, is denoted $B^b_m(t)$.
Distributed deployment of virtual networks: this can be modeled as a 0-1 programming problem. The optimization goal is to minimize the number M of virtual network blocks so as to reduce the performance loss caused by cross-host communication; the constraints are that every virtual network block of the logical virtual network is mapped and deployed onto some physical host, and that resource supply and resource consumption are correctly matched. The solved variable $x_{m,r}$ is a 0-1 variable determining whether virtual network block $G^b_m$ is deployed on physical host $n^p_r$.
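The formula referenced here did not survive extraction. Under the assumption that $x_{m,r}$ denotes the 0-1 deployment variable, $D^b_m(t)$ the resource consumption of block m and $C^p_r(t)$ the resource supply of host r (this notation is an assumption, not necessarily the patent's), a standard form of such a 0-1 program might read:

```latex
\begin{aligned}
\min\ & M \\
\text{s.t.}\ & \textstyle\sum_{r=1}^{R} x_{m,r} = 1, && m = 1,\dots,M \\
 & \textstyle\sum_{m=1}^{M} x_{m,r}\, D^b_m(t) \le C^p_r(t), && r = 1,\dots,R \\
 & x_{m,r} \in \{0,1\}
\end{aligned}
```

The first constraint maps every block to exactly one host; the second matches consumption against supply on each host, as described in the text.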
The invention models and designs the reinforcement learning quintuple $(S, A, P, R, \gamma)$ according to the specific requirements of hierarchical and distributed deployment of the virtual network, where:
S: the finite set of states, here observations of the resource supply of the physical hosts. Because each dimension of the supply attributes is a continuous space, one option is to fit it with a deep reinforcement learning framework such as DQN (Deep Q-Network) combined with a convolutional neural network; another is to discretize the continuous multidimensional attribute space and construct finite states from the attribute segments, which can then be solved with a lightweight Q-learning algorithm. The invention adopts the second, more lightweight method; the discretization is detailed in step 1 below.
A: the finite set of actions, consisting essentially of two actions. (1) Deployment action: deploy the entire logical virtual network, or the current virtual network block, onto the physical host with the maximum resource supply, and set the corresponding 0-1 variable to 1. (2) Expansion action: continue to take the virtual network element with the largest out-degree in the undeployed part of the virtual network as the center and expand by breadth-first search to construct the virtual network block.
P: the finite set of transition probabilities between states. The algorithm is model-free reinforcement learning, so P is not involved and can be ignored.
R: the set of rewards corresponding to each action; the concrete modeling and calculation are given in step 8 below.
$\gamma$: the discount factor, indicating the degree to which the rewards of subsequent actions influence the current action.
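The expansion action described above, growing a block by breadth-first search around the undeployed element with the largest out-degree, can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function and variable names, and the size limit, are assumptions.

```python
from collections import deque

def expand_block(adj, deployed, limit):
    """Grow a virtual-network block by breadth-first search around the
    undeployed element with the largest out-degree.
    adj maps element -> list of neighbours; deployed is the set of
    already-deployed elements; limit caps the block size."""
    undeployed = [n for n in adj if n not in deployed]
    if not undeployed:
        return set()
    center = max(undeployed, key=lambda n: len(adj[n]))  # max out-degree
    block, queue = {center}, deque([center])
    while queue and len(block) < limit:
        for nb in adj[queue.popleft()]:
            if nb not in deployed and nb not in block:
                block.add(nb)
                queue.append(nb)
                if len(block) >= limit:
                    break
    return block
```

For a star-shaped topology the hub is chosen as the center, since it has the largest out-degree, and its neighbours are absorbed breadth-first until the limit is reached.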
The method system framework constructed in this way is shown in fig. 2, and comprises 4 core modules and 1 core database: the system comprises a virtual network analysis module, a cutting judgment module, an optimized cutting module, an optimized deployment module and a Q table database.
The specific technical scheme and flow of the virtual network hierarchical distributed deployment based on reinforcement learning are as follows, and refer to fig. 3.
Step 1: each physical host $n^p_r$ establishes an independent action value function table (i.e. a Q table, whose structure is shown in Table 1), with states as rows and actions as columns. Each row state represents the resource supply of the physical host and comprises a number of sub-rows, i.e. a state is a linear combination of attributes. Because each dimension attribute is a continuous state space, the attributes are discretized according to a certain rule (for example, the memory RAM can be segmented in steps of 4 GB), and finite states are constructed from the attribute segments. Each cell holds the corresponding action value function $Q(s,a)$, initialized to 0.
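Step 1 can be sketched as follows: discretize the continuous RAM supply into a small number of states (the 4 GB segmentation is from the text; the number of levels, action names and function names are assumptions) and build one zero-initialized Q table per host.

```python
def ram_state(free_ram_gb, step=4, levels=4):
    """Discretize a continuous RAM supply into one of `levels` states.
    The 4 GB step follows the text's example; `levels` is assumed."""
    return min(int(free_ram_gb // step), levels - 1)

# two actions per the patent's action set A: deploy and expand
ACTIONS = ("deploy", "expand")

def new_q_table(levels=4):
    """One Q table per physical host: state -> {action: value}, all 0."""
    return {s: {a: 0.0 for a in ACTIONS} for s in range(levels)}
```

A host with 16 GB free then starts in the highest state, and its Q values are all zero until learning updates them.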
And 2, entering a main loop of the virtual network deployment algorithm, and waiting for a new virtual network deployment request. And when a new virtual network deployment request arrives, skipping to the step 3.
Step 3: find the physical host with the maximum resource supply and switch to its Q table.
Step 4: judge whether the physical host can accommodate the virtual network. If it can, jump to step 5; if not, jump to step 6.
Step 5: start the direct deployment process, set the current action to the deployment action, deploy, and jump to step 8.
Step 6: start the dicing deployment process, and select the action with the $\varepsilon$-greedy algorithm according to the Q table, where $\varepsilon$ is a small value, which encourages the network block to be expanded as much as possible and reduces the number of blocks. If the selected action is the deployment action, deploy and jump to step 8; if it is the expansion action, expand and jump to step 7. Set the current action according to the selection result.
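The ε-greedy selection of step 6 can be sketched as below; the concrete ε value in the patent was lost in extraction, so the default here is only an assumed example.

```python
import random

def epsilon_greedy(q_row, epsilon=0.1):
    """Pick an action from one row of the Q table: with probability
    epsilon explore uniformly at random, otherwise exploit the action
    with the largest Q value. epsilon=0.1 is an assumed example value."""
    if random.random() < epsilon:
        return random.choice(list(q_row))
    return max(q_row, key=q_row.get)
```

With a small ε the agent mostly follows the learned Q values but occasionally tries the other action, which is what allows the deployment action to be picked even when its Q value is not yet the maximum, as in the worked example later in the text.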
Step 7: continue to take the virtual network element with the largest out-degree in the undeployed part of the virtual network as the center, expand the block by breadth-first search, gradually constructing the virtual network block, and jump to step 8.
Step 8 calculates the reward for the current action by a formula with two parts. a) The left part can be understood as the dicing deployment reward: it encourages each block to be as large as possible, so that the number of blocks is as small as possible and the extra performance loss of cross-host communication is reduced. This part depends on the number of virtual network blocks already deployed on the physical host at time t. It is a positive number, so successful deployment is rewarded, but the more virtual network blocks are deployed, the faster the reward decays, which suppresses an excessive number of blocks.
b) The right part can be understood as the block expansion reward: it encourages each block to be as small as possible and ensures that the physical host consumes as few resources as possible, so as to accommodate the deployment of subsequent virtual networks. The more resources a virtual network block consumes, the faster the reward decays. When a block over-expands, i.e. its resource consumption exceeds the host's resource supply, deployment fails and the reward becomes negative, inhibiting over-expansion.
The two parts of the reward restrict each other; through continuous operation of the reinforcement learning framework, the system learns block sizes and deployment modes matched with the existing system resource supply.
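The patent's exact two-part formula did not survive extraction, so the sketch below only captures the described behavior under assumed functional forms: a deployment reward that decays with the number of blocks already on the host, minus an expansion penalty that grows with the block's share of the host's resources and turns the total negative when demand exceeds supply.

```python
def reward(blocks_on_host, block_demand, host_supply):
    """Illustrative two-part reward in the spirit of the description;
    the concrete decay functions are assumptions, not the patent's formula.
    blocks_on_host: blocks already deployed on the host at time t.
    block_demand / host_supply: resource consumption vs. supply."""
    deploy_part = 1.0 / (1.0 + blocks_on_host)   # fewer blocks -> larger reward
    expand_part = block_demand / host_supply     # larger block -> larger penalty
    return deploy_part - expand_part             # negative when demand > supply
```

This reproduces the stated properties: successful small-count deployments earn a positive reward, the reward shrinks as more blocks accumulate, and over-expansion (demand exceeding supply) yields a negative reward.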
Step 9: according to the Q-learning algorithm, update the current Q table with the following formula:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[\, r + \gamma \max_{a'} Q(s',a') - Q(s,a) \,\right]$$
where $r + \gamma \max_{a'} Q(s',a')$ is the maximum gain obtainable in the current state, often called the target Q, and $Q(s,a)$ is the currently accumulated reward; subtracting the two yields the return gain, i.e. the TD error (temporal difference error). $\alpha$ is the learning rate, indicating the degree to which the return gain influences $Q(s,a)$.
Specifically, the reward $r$ represents the short-term benefit of the action taken in the current state $s$; $\max_{a'} Q(s',a')$ represents the maximum long-term benefit obtainable over all optional actions $a'$, where $s'$ denotes the new state reached after the action is selected and max denotes taking the maximum value; $r + \gamma \max_{a'} Q(s',a')$ sums the short-term and long-term benefits and is the subsequent maximum benefit obtainable in the current state, where the discount rate $\gamma$ represents the influence of the long-term benefit on the benefit in the current state (the closer to 1, the more the long-term benefit is emphasized; conversely, the more the short-term benefit is emphasized); $r + \gamma \max_{a'} Q(s',a') - Q(s,a)$ is the return gain of this iteration between the newly selected action and the original one, where the learning rate $\alpha$ represents the speed of reinforcement learning (the closer to 1, the faster the learning, and vice versa). The formula as a whole iteratively computes the return gain to continuously update the long-term benefit $Q(s,a)$ obtainable by each action $a$ taken in each state $s$, so that the system learns to autonomously select the optimal action.
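The update of step 9 is the standard tabular Q-learning rule described term by term above; a minimal sketch (the α and γ defaults are assumptions for illustration):

```python
def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.8):
    """Tabular Q-learning update:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    q is a nested dict: state -> {action: value}."""
    target = r + gamma * max(q[s_next].values())   # target Q
    td_error = target - q[s][a]                    # TD (temporal difference) error
    q[s][a] += alpha * td_error
    return q[s][a]
```

Repeated calls propagate the long-term benefit of each state-action pair back through the table, which is exactly the mechanism the text relies on for autonomous action selection.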
Step 10: judge whether the current action is the deployment action. If yes, jump to step 11; if not, jump to step 3.
Step 11: deploy the current virtual network, or the current virtual network block, onto the currently selected physical host, and update the state S of the physical host according to the attribute values.
Step 12 determines whether the virtual network has been completely deployed. If yes, skipping to the step 2; if not, skipping to the step 3.
This example deploys the two virtual networks topo1 and topo2 of fig. 4 onto the physical hosts H1 and H2. To simplify the discussion, the multidimensional index is reduced to the one-dimensional RAM index. H1 and H2 are identically configured physical hosts with a RAM supply of 16 GB. The RAM is segmented into 4 gears, constructing four states; H1 and H2 each form a Q table as shown in Table 2, with Q values initialized to 0. Each virtual network element in the virtual networks consumes 0.5 GB of RAM. Under this configuration, topo1, deployed first, may need no cutting, while topo2, deployed next, must be cut. Fixed values of the learning rate, discount rate and $\varepsilon$ are used for the Q-learning calculation.
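The capacity arithmetic behind "topo1 needs no cutting, topo2 must be cut" can be checked directly; the 16 GB supply and 0.5 GB per element are from the text, while the topology sizes used in the test are assumptions, since the element counts of topo1 and topo2 are not given numerically here.

```python
def fits(num_elements, ram_gb=16.0, per_element_gb=0.5):
    """Can a virtual network of num_elements elements fit on one host?
    16 GB supply and 0.5 GB per element follow the example's figures."""
    return num_elements * per_element_gb <= ram_gb
```

One host can therefore hold at most 32 elements; any topology larger than that must be cut into blocks.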
TABLE 2Q-Table Structure of physical hosts H1 or H2
First, topo1 is deployed: the host with the largest resource supply is found to be H1, in its initial state (see Table 2); it can accommodate the virtual network, so the deployment action is selected according to step 5; the algorithm jumps to step 8 to obtain the corresponding reward, computes $Q(s,a)$ according to step 9, fills the Q table, and finally deploys directly onto H1. After deployment, the number of virtual network blocks on the host is updated, as shown in Table 3, line 1.
Second, topo2 is next deployed.
a) Deploying the first virtual network block: the host with the largest resource supply is now H2 (see Table 2), which cannot accommodate the whole virtual network, so the algorithm jumps to step 6. According to the $\varepsilon$-greedy algorithm, the expansion action is selected with high probability, and the existing network block is expanded by breadth-first search around the virtual network element R3, which has the maximum out-degree. The algorithm then jumps to step 8 to obtain the corresponding reward and computes $Q(s,a)$ according to step 9 to fill the Q table. Since nothing is actually deployed yet, the resource consumption is as shown in Table 3, line 2. Subsequent rounds (note that the Q values change gradually as the rewards of each round accumulate) repeatedly select the expansion action and keep enlarging the existing network block, similar to Table 3 line 2, with the specific changes shown in Table 3, lines 3-11. A later round may, due to the randomness of $\varepsilon$-greedy selection, choose the deployment action even though its Q value is not the maximum; the algorithm jumps to step 8 to obtain the corresponding reward, computes $Q(s,a)$ according to step 9, fills the Q table, and finally deploys onto H2. After deployment, the number of virtual network blocks on the host is updated as shown in Table 3, line 10, completing the deployment of the first virtual network block. It should be noted that although a virtual network block of 10 virtual network elements is constructed and deployed here (Table 3, lines 3-12), how many elements a particular block contains is determined jointly by the Q values (reflecting the long-term benefit of a particular action in a particular state) and by $\varepsilon$ (introducing randomness to avoid rigid action selection); in a specific implementation, the number of virtual network elements in the block is not necessarily 10, which serves here only as an example.
b) Deploying the second virtual network block: after the first block is deployed, the whole of topo2 is not yet deployed, so the algorithm continues to run. The host with the largest resource supply is now H1 (see Table 2), which cannot accommodate the remaining virtual network, so the algorithm jumps to step 6. According to the $\varepsilon$-greedy algorithm, the expansion action is selected with high probability, and the existing network block is expanded by breadth-first search around the virtual network element R3 with the maximum out-degree. The algorithm then jumps to step 8 to obtain the corresponding reward and computes $Q(s,a)$ according to step 9 to fill the Q table. Since nothing is actually deployed yet, the resource consumption is as shown in Table 3, line 13. Subsequent rounds (the Q values changing gradually as the rewards of each round accumulate) repeatedly select the expansion action and keep enlarging the existing network block, similar to Table 3 line 13, with the specific changes shown in Table 3, lines 14-18. A later round may, due to the randomness of $\varepsilon$-greedy selection, choose the deployment action even though its Q value is not the maximum; the algorithm jumps to step 8 to obtain the corresponding reward, computes $Q(s,a)$ according to step 9, fills the Q table, and finally deploys onto H1. After deployment, the number of virtual network blocks on the host is updated as shown in Table 3, line 19, completing the deployment of the second virtual network block.
It should be noted that although a virtual network block of 7 virtual network elements is constructed and deployed here (Table 3, lines 13-19), how many elements a particular block contains is determined jointly by the Q values (reflecting the long-term benefit of a particular action in a particular state) and by $\varepsilon$ (introducing randomness to avoid rigid action selection); in a specific implementation, the number of virtual network elements in the block is not necessarily 7, which serves here only as an example.
c) Transition of state: after the second virtual network block is deployed, two blocks (17 virtual network elements) of topo2 have been deployed; since deployment is not yet complete, the algorithm continues to run. The host with the largest resource supply is again found to be H2. Because the resource consumption after actually deploying the virtual network reaches the critical condition of a state transition, the host switches from its original state to a new one (see Table 2), and the Q values of the new state are iteratively computed based on how the first and second virtual network blocks were deployed, providing the numerical basis for optimally selecting the deployment or expansion action in subsequent processing, as shown in Table 3, lines 20-21. Limited by space, not all subsequent iteration steps are listed here.
Table 3 deployment virtual network topo1 and topo2 examples
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.
Claims (8)
1. A virtual network hierarchical distributed deployment method based on reinforcement learning is characterized by comprising the following steps:
step 1: for each physical host $n^p_r$, establish an independent action value function table, in whose cells the corresponding action value function $Q(s,a)$ is stored and initialized to 0, where $n^p_r$ represents a physical host, the superscript p represents physical, the subscript r represents the number of the physical host with value range $1,\dots,R$, and R is the total number of physical hosts; s represents a state in reinforcement learning, a represents an action in reinforcement learning, and the action value function $Q(s,a)$ represents the long-term benefit of taking action a in state s;
step 2: waiting for a new virtual network deployment request, and jumping to the step 3 when the new virtual network deployment request arrives;
and step 3: based on the observations $C^p_r(t)$ of the resource supply of the physical hosts, find the physical host with the largest resource supply, where $C^p_r(t)$ represents the multidimensional resources provided by physical host $n^p_r$ at time t;
and step 4: judge whether the physical host can accommodate the virtual network; if it can, jump to step 5; if it cannot, jump to step 6;
step 6: in the block deployment process, select an action according to the action value function; if the action is the deployment action, deploy and jump to step 8; if the action is the expansion action, expand and jump to step 7;
and step 7: take the virtual network element $n^l_i$ with the maximum out-degree in the undeployed part of the virtual network as the center, perform block expansion, gradually construct the virtual network element set $N^b_m$ in the virtual network block, and jump to step 8; where $n^l_i$ is a virtual network element in the virtual network, the superscript l represents the logical network, the subscript i represents the number of the virtual network element with value range $1,\dots,I$, and I is the total number of virtual network elements; $G^b_m$ represents a virtual network block, the superscript b represents the block, the subscript m represents the number of the virtual network block with value range $1,\dots,M$, the total number of blocks being determined dynamically during execution;
In the formula, the first quantity is the number of virtual network blocks deployed at time t on the physical host with the maximum resource supply, the second is the sum of the multidimensional resources consumed at time t by the virtual network element set $N^b_m$ in the virtual network block, and the third is the observed resource supply of the largest physical host;
and step 9: according to the reward r, update the action value function $Q(s,a)$ in the current action value function table;
Step 10: judge whether the current action is the deployment action; if yes, jump to step 11; if not, jump to step 3;
step 11: deploy the current entire virtual network, or the virtual network element set $N^b_m$ in the virtual network block, onto the currently selected physical host, and update the state of the physical host according to the attribute values;
step 12: judging whether the virtual network is completely deployed or not, and if so, skipping to the step 2; if not, skipping to the step 3.
3. The method according to claim 1, wherein in step 7, the virtual network element $n^l_i$ with the maximum out-degree in the undeployed part of the virtual network is taken as the center, where the subscript i has value range $1,\dots,I$ and I is the total number of virtual network elements; block expansion is performed by breadth-first search, the virtual network element set $N^b_m$ in the virtual network block is constructed gradually, and the method jumps to step 8.
4. The method according to claim 1, wherein in step 9 the action value function in the current action value function table is updated according to the following formula:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[\, r + \gamma \max_{a'} Q(s',a') - Q(s,a) \,\right]$$

wherein the reward r represents the short-term benefit of the action a taken in the current state s; $\max_{a'} Q(s',a')$ represents the maximum long-term benefit obtainable over all optional actions $a'$, where $s'$ denotes the new state reached after action a is selected and max denotes taking the maximum value; $r + \gamma \max_{a'} Q(s',a')$ sums the short-term and long-term benefits and is the subsequent maximum benefit obtainable in the current state, where the discount rate $\gamma$ represents the influence of the long-term benefit on the benefit in the current state (the closer to 1, the more the long-term benefit is emphasized; conversely, the more the short-term benefit is emphasized); $r + \gamma \max_{a'} Q(s',a') - Q(s,a)$ is the return gain of this iteration between the newly selected action and the original one, where the learning rate $\alpha$ represents the speed of reinforcement learning (the closer to 1, the faster the learning, and vice versa); the formula as a whole iteratively computes the return gain to continuously update the long-term benefit $Q(s,a)$ obtainable by each action a taken in each state s, so that the system autonomously selects the optimal action by learning.
6. A virtual network hierarchical distributed deployment system based on reinforcement learning is characterized by comprising:
an action value function table building module: for each physical host $n^p_r$, establishing an independent action value function table in whose cells the corresponding action value function $Q(s,a)$ is stored and initialized to 0, where $n^p_r$ represents a physical host, the superscript p represents physical, the subscript r represents the number of the physical host with value range $1,\dots,R$, and R is the total number of physical hosts; s represents a state in reinforcement learning, a represents an action in reinforcement learning, and the action value function $Q(s,a)$ represents the long-term benefit of taking action a in state s;
a virtual network deployment request processing module: used for sending, when a new virtual network deployment request arrives, a signal controlling the maximum-resource-supply physical host search module to work;
a maximum-resource-supply physical host search module connected with the virtual network deployment request processing module: used for finding, according to the observations $C^p_r(t)$ of the resource supply of the physical hosts, the physical host with the largest resource supply, where $C^p_r(t)$ represents the multidimensional resources provided by physical host $n^p_r$ at time t;
a first judgment module connected with the maximum-resource-supply physical host search module: used for judging whether the physical host with the maximum resource supply can accommodate the virtual network; if it can, the direct deployment module is controlled to work; if it cannot, the block deployment module is controlled to work;
a direct deployment module connected with the first judgment module: used for directly deploying the virtual network, setting the current action to the deployment action, and sending a signal to control the calculation module to start working;
a block deployment module connected with the first judgment module: used for selecting an action according to the action value function; if the action is the deployment action, a signal is sent to control the calculation module to start working; if the action is the expansion action, a signal is sent to control the virtual network element set building module to start working;
a virtual network element set building module connected with the block deployment module: used for taking the virtual network element $n^l_i$ with the maximum out-degree in the undeployed part of the virtual network as the center, performing block expansion, gradually constructing the virtual network element set $N^b_m$ in the virtual network block, and sending a signal to control the calculation module to start working;
a calculation module connected with the direct deployment module and the virtual network element set building module: used for calculating the reward according to the formula, in which the first quantity is the number of virtual network blocks deployed at the current moment on the physical host with the maximum resource supply, the second is the sum of the multidimensional resources consumed by the virtual network block, and the third is the observed resource supply of the largest physical host;
an updating module connected with the action value function table building module: used for updating, according to the reward r, the action value function $Q(s,a)$ in the current action value function table;
a second judgment module connected with the action value function table building module: used for judging whether the current action is the deployment action; if yes, a signal is sent to control the deployment processing module to start working; if not, a signal is sent to control the maximum-resource-supply physical host search module to start working;
a deployment processing module connected with the second judgment module: used for deploying the current entire virtual network, or the virtual network element set $N^b_m$ in the virtual network block, onto the currently selected physical host, and updating the state S of the physical host according to the attribute values;
a third judgment module connected with the deployment processing module, the virtual network deployment request processing module and the maximum-resource-supply physical host search module: used for judging whether the virtual network has been completely deployed; if yes, a signal is sent to control the virtual network deployment request processing module to work; if not, a signal is sent to control the maximum-resource-supply physical host search module to start working.
8. The reinforcement-learning-based virtual network hierarchical distributed deployment system according to claim 6, wherein the updating module is configured to update the action value function in the current action value function table according to the following formula:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[\, r + \gamma \max_{a'} Q(s',a') - Q(s,a) \,\right]$$

wherein the reward r represents the short-term benefit of the action a taken in the current state s; $\max_{a'} Q(s',a')$ represents the maximum long-term benefit obtainable over all optional actions $a'$, where $s'$ denotes the new state reached after action a is selected and max denotes taking the maximum value; $r + \gamma \max_{a'} Q(s',a')$ sums the short-term and long-term benefits and is the subsequent maximum benefit obtainable in the current state, where the discount rate $\gamma$ represents the influence of the long-term benefit on the benefit in the current state (the closer to 1, the more the long-term benefit is emphasized; conversely, the more the short-term benefit is emphasized); $r + \gamma \max_{a'} Q(s',a') - Q(s,a)$ is the return gain of this iteration between the newly selected action and the original one, where the learning rate $\alpha$ represents the speed of reinforcement learning (the closer to 1, the faster the learning, and vice versa); the formula as a whole iteratively computes the return gain to continuously update the long-term benefit $Q(s,a)$ obtainable by each action a taken in each state s, so that the system autonomously selects the optimal action by learning.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111195085.XA CN113641462B (en) | 2021-10-14 | 2021-10-14 | Virtual network hierarchical distributed deployment method and system based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113641462A CN113641462A (en) | 2021-11-12 |
CN113641462B (en) | 2021-12-21
Family
ID=78426774
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111195085.XA Active CN113641462B (en) | 2021-10-14 | 2021-10-14 | Virtual network hierarchical distributed deployment method and system based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113641462B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114827783B (en) * | 2022-07-01 | 2022-10-14 | 西南民族大学 | Aggregation tree-based bandwidth scheduling method for cross-domain distributed machine learning |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106411749A (en) * | 2016-10-12 | 2017-02-15 | 国网江苏省电力公司苏州供电公司 | Path selection method for software defined network based on Q learning |
CN109947567A (en) * | 2019-03-14 | 2019-06-28 | 深圳先进技术研究院 | A kind of multiple agent intensified learning dispatching method, system and electronic equipment |
CN110022230A (en) * | 2019-03-14 | 2019-07-16 | 北京邮电大学 | The parallel dispositions method of service chaining and device based on deeply study |
CN110365514A (en) * | 2019-05-24 | 2019-10-22 | 北京邮电大学 | SDN multistage mapping method of virtual network and device based on intensified learning |
CN110365568A (en) * | 2019-06-18 | 2019-10-22 | 西安交通大学 | A kind of mapping method of virtual network based on deeply study |
CN110995619A (en) * | 2019-10-17 | 2020-04-10 | 北京邮电大学 | Service quality aware virtual network mapping method and device |
CN111147307A (en) * | 2019-12-30 | 2020-05-12 | 重庆邮电大学 | Service function chain reliable deployment method based on deep reinforcement learning |
JP2020127182A (en) * | 2019-02-06 | 2020-08-20 | 日本電信電話株式会社 | Control device, control method, and program |
Non-Patent Citations (6)
Title |
---|
Low-Latency and Resource-Efficient Service Function Chaining Orchestration in Network Function Virtualization;Gang Sun et al.;《IEEE Internet of Things Journal》;20190823;Vol. 7 No. 7;pp. 5760-5772 *
MUVINE: Multi-Stage Virtual Network Embedding in Cloud Data Centers Using Reinforcement Learning-Based Predictions;Hiren Kumar Thakkar et al.;《IEEE Journal on Selected Areas in Communications》;20200408;Vol. 38 No. 6;pp. 1058-1074 *
Optimizing NFV Chain Deployment in Software-Defined Cellular Core;Jiaqi Zheng et al.;《IEEE Journal on Selected Areas in Communications》;20191231;Vol. 38 No. 2;pp. 248-262 *
Design and Implementation of a Virtual Network Mapping Algorithm Based on Reinforcement Learning and QoS Awareness;Li Meng;《China Masters' Theses Full-text Database, Information Science and Technology》;20210515;I139-8 *
Virtual Network Function Deployment Optimization Algorithm Based on Improved Deep Reinforcement Learning;Tang Lun et al.;《Journal of Electronics & Information Technology》;20210615;Vol. 43 No. 6;pp. 1724-1732 *
Containerized Virtual Network Mapping Algorithm Based on Time-Varying Resources;Deng Weijian et al.;《https://kns.cnki.net/kcms/detail/51.1307.TP.20210628.1335.017.html》;20210628;pp. 1-9 *
Also Published As
Publication number | Publication date |
---|---|
CN113641462A (en) | 2021-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111147307B (en) | Service function chain reliable deployment method based on deep reinforcement learning | |
US10025892B2 (en) | Simulation systems and methods | |
US10878146B2 (en) | Handover techniques for simulation systems and methods | |
CN112600717B (en) | Satellite network management and control protocol semi-physical test device based on SDN | |
EP2629490A1 (en) | Optimizing traffic load in a communications network | |
CN114050961B (en) | Large-scale network simulation system and resource dynamic scheduling and distributing method | |
CN113341712B (en) | Intelligent hierarchical control selection method for unmanned aerial vehicle autonomous control system | |
CN105515987A (en) | SDN framework based virtual optical network oriented mapping method | |
CN113641462B (en) | Virtual network hierarchical distributed deployment method and system based on reinforcement learning | |
Lu et al. | A cluster-tree-based energy-efficient routing protocol for wireless sensor networks with a mobile sink | |
Bouzidi et al. | Dynamic clustering of software defined network switches and controller placement using deep reinforcement learning | |
WO2024077881A1 (en) | Scheduling method and system for neural network training, and computer-readable storage medium | |
CN115329985B (en) | Unmanned cluster intelligent model training method and device and electronic equipment | |
WO2023179180A1 (en) | Network virtualization system structure and virtualization method | |
WO2023089350A1 (en) | An architecture for a self-adaptive computation management in edge cloud | |
CN117707795B (en) | Graph-based model partitioning side collaborative reasoning method and system | |
Afrasiabi et al. | Reinforcement learning-based optimization framework for application component migration in NFV cloud-fog environments | |
Tyagi et al. | GM-WOA: a hybrid energy efficient cluster routing technique for SDN-enabled WSNs | |
CN115879543A (en) | Model training method, device, equipment, medium and system | |
CN106709597A (en) | Parallel TSP problem optimizing method and device based on artificial bee colony algorithm | |
Shooshtarian et al. | A maximally robustness embedding algorithm in virtual data centers with multi-attribute node ranking based on TOPSIS | |
Adewale | Adaptive and Scalable Controller Placement in Software-Defined Networking | |
CN112234599A (en) | Advanced dynamic self-adaptive partitioning method and system for multi-element complex urban power grid | |
Sun et al. | Wireless sensor network path optimization based on hybrid algorithm | |
Steffenel et al. | A framework for adaptive collective communications for heterogeneous hierarchical computing systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||