CN113641462B - Virtual network hierarchical distributed deployment method and system based on reinforcement learning - Google Patents
- Publication number
- CN113641462B (application number CN202111195085A)
- Authority
- CN
- China
- Prior art keywords
- virtual network
- physical host
- module
- action
- deployment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5077—Logical partitioning of resources; Management or configuration of virtualized resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45595—Network integration; Enabling network access in virtual machine instances
Abstract
The invention discloses a reinforcement-learning-based hierarchical distributed deployment method and system for virtual networks. Oriented to network simulation based on Docker containerized virtual networks, and aiming at optimizing the block-cutting and deployment of the virtual network, the invention designs, through a reinforcement learning framework, a reward that balances the resource consumption of the physical hosts against the cross-host communication performance loss caused by cutting the virtual network into blocks, in a distributed environment of physical hosts with limited computing, network and storage resources. According to the reward, the long-term benefit Q(s, a) of each action a taken by each physical host under each resource supply state s is continuously calculated, so that the algorithm, guided by Q(s, a), dynamically and autonomously keeps learning the optimized virtual network block size and the appropriate deployment timing, while a randomness strategy introduces a certain dynamic randomness to avoid the over-consumption of resources or over-fragmentation of blocks caused by rigidly selecting the action with the maximum Q(s, a), thereby achieving the hierarchical, distributed, optimized deployment of the virtual network.
Description
Technical Field
The invention relates to the technical field of network virtualization, in particular to a virtual network hierarchical distributed deployment method and system based on reinforcement learning.
Background
Network simulation is a key support for research on computer network architectures, protocols and algorithms. Because a Docker container retains an essentially complete TCP/IP protocol stack and has higher startup efficiency and lower performance overhead than a virtual machine, containerization technology has gradually become popular and offers a new approach to network simulation: virtual network elements (such as virtual routers, virtual switches and virtual end systems) are constructed with Docker containers at the core, and technologies such as veth-pair, OVS (Open vSwitch) and VxLAN (Virtual eXtensible Local Area Network) are used to generate the virtual links connecting them, so that a virtual network is formed and deployed on physical hosts for simulation. A Docker container runs the TCP/IP protocol stack of the Linux kernel, is efficient and low in consumption, and has open programming interfaces, so Docker-based network simulation offers both high fidelity and easy programmability. On physical hosts with limited computing, network and storage resources, large-scale deployment of containerized virtual networks based on technologies such as Docker and OVS requires reasonable, automatic and efficient mapping, and where necessary block-cutting, between the virtual network and the computing cluster of physical hosts, so that the resource demand of the virtual network and the resource supply of the physical hosts stay relatively balanced in a distributed setting and the performance of the network simulation is improved. Distributed optimized deployment of virtual networks is therefore the key to network simulation based on Docker technology.
The academic community has studied the closely related problem of Virtual Network Embedding (VNE), whose solution has relatively high complexity and is in general NP-hard. Early researchers solved the problem with purely heuristic methods, which, however, tend to get stuck in local optima; meta-heuristic solving can effectively mitigate this. For example, FAJJARI et al. propose a scalable mapping strategy based on an ant colony meta-heuristic algorithm; ARAÚJO et al. propose a hybrid algorithm incorporating meta-heuristics, together with an online policy that takes the execution speed of the virtual network mapping into account to ensure minimal latency, providing fast solutions in multi-domain environments. However, the mapping algorithms in the existing literature are mainly designed for virtual-machine-based application scenarios, chiefly from the perspectives of virtual machine resource allocation efficiency and mapping success rate.
When the virtual network is constructed with Docker technology, besides resource allocation efficiency and mapping success rate, the mapping algorithm needs to be designed and optimized around the technical characteristics of Docker: (1) the virtual network elements simulated by Docker containers appear as low-overhead processes on the host; their granularity is finer and their temporal variation more pronounced, so the mapping algorithm must be more dynamic and adaptive, and sensitive and agile toward resource consumption and change; (2) as a lightweight virtualization technology, Docker on the one hand makes it feasible to construct larger-scale virtual networks, and on the other hand is expected to be deployed on multiple low-end x86 hosts; both aspects require full consideration of the resource limitations of the hosts and flexible, automatic optimization, block-cutting and deployment of the virtual network; (3) after the virtual network is cut into blocks deployed on multiple hosts, cross-host communication must be achieved through mechanisms such as OVS + VxLAN so that a unified virtual network is presented transparently to the user; the mapping algorithm therefore needs to be co-optimized with the technical characteristics of virtual switches such as OVS and VxLAN tunnels, reducing, under the constraint of host resource consumption, the number of virtual network cuts as much as possible to control the performance loss caused by cross-host communication between blocks.
Specifically, fig. 1 shows the macro process of block-cutting, mapping and deploying a virtual network, which includes the following steps:
1. Topology description of the virtual network: assume a user wants to deploy the virtual network shown at the top of fig. 1 (the virtual network may be large; for convenience of illustration, the embodiment only draws a topology of 2 end systems and 2 routers). The virtual network can be described in JSON file format.
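As a concrete illustration of such a description (the patent does not specify the JSON schema, so all field names below, such as "nodes", "links" and the per-element resource fields, are hypothetical), a topology of 2 end systems and 2 routers like the one in fig. 1 could be built and serialized roughly as follows:

```python
import json

# Hypothetical JSON description of the fig. 1 topology: 2 end systems, 2 routers.
# Field names ("nodes", "links", "type", "cpu", ...) are illustrative only.
topology = {
    "name": "demo-net",
    "nodes": [
        {"id": "h1", "type": "end_system", "cpu": 0.5, "ram_mb": 256, "disk_mb": 512},
        {"id": "h2", "type": "end_system", "cpu": 0.5, "ram_mb": 256, "disk_mb": 512},
        {"id": "r1", "type": "router", "cpu": 1.0, "ram_mb": 512, "disk_mb": 1024},
        {"id": "r2", "type": "router", "cpu": 1.0, "ram_mb": 512, "disk_mb": 1024},
    ],
    "links": [
        {"endpoints": ["h1", "r1"]},
        {"endpoints": ["r1", "r2"]},
        {"endpoints": ["r2", "h2"]},
    ],
}

print(json.dumps(topology, indent=2))
```

A per-block file produced by the cutting algorithm would then simply be a subset of the "nodes" and "links" arrays.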
2. Block-cutting and mapping of the virtual network: (1) algorithm input: the JSON file serves as the input of the algorithm; after reading the topology of the virtual network, the algorithm decides, according to the remaining computing, network and storage resources of the existing physical hosts, whether to cut the virtual network into blocks or to map it directly (if a single host can accommodate the whole virtual network, no cutting or mapping is needed). (2) Algorithm output: if cutting and mapping are required, several per-block JSON files are generated, as shown in the "block A" and "block B" portions of fig. 1.
3. Deployment of the virtual network: each physical host receives its block JSON file and generates the various virtual network elements with technologies such as Docker and OVS according to the JSON description. This involves network virtualization in two main aspects: node virtualization and link virtualization. Node virtualization: Docker containers simulate devices such as end systems and routers, while the OVS technique simulates layer-2 switching devices. Link virtualization: the veth-pair technique connects the various virtual network elements obtained by node virtualization.
4. Reconnection of the virtual network: after block-cutting and mapping, different blocks are deployed on different hosts, and the original topology of the virtual network is broken on some links. The original topology therefore has to be reconnected across hosts, mainly through OVS + VxLAN tunneling, as shown by the OVS + VxLAN link between "block A" and "block B" in fig. 1. Considering that an OVS + VxLAN tunnel causes a certain network performance loss, to give the virtual network higher fidelity the algorithm design keeps the simultaneously deployed virtual network as compact as possible and reduces the number of links crossing the underlying physical hosts; that is, the original virtual network should not be cut into pieces too readily.
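The link-virtualization and reconnection steps above ultimately reduce to standard `ip` and `ovs-vsctl` invocations. A minimal sketch that only composes the command strings (it executes nothing, and the names `br0`, `vx0`, the remote IP and the VNI are illustrative):

```python
def veth_pair_cmds(a: str, b: str) -> list[str]:
    """Commands to create and bring up a veth pair linking two virtual network elements."""
    return [f"ip link add {a} type veth peer name {b}",
            f"ip link set {a} up",
            f"ip link set {b} up"]

def vxlan_reconnect_cmds(bridge: str, port: str, remote_ip: str, vni: int) -> list[str]:
    """Commands to reconnect two blocks across hosts via an OVS VxLAN tunnel port."""
    return [f"ovs-vsctl --may-exist add-br {bridge}",
            f"ovs-vsctl add-port {bridge} {port} -- set interface {port} "
            f"type=vxlan options:remote_ip={remote_ip} options:key={vni}"]

print(veth_pair_cmds("vethA", "vethB")[0])
print(vxlan_reconnect_cmds("br0", "vx0", "10.0.0.2", 42)[1])
```

In a real deployment agent these strings would be handed to a process runner on each physical host after the block JSON is received.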
In summary, the key to the problem lies in how to optimally cut and deploy the virtual network, which faces the following technical difficulties. (a) If a block is too large, it easily exhausts the resources of the physical host it is deployed on; after long-term operation the remaining resources of some physical hosts become too scarce for any virtual network or block to be deployed effectively. Supply among the physical hosts becomes unbalanced, and when a new virtual network needs to be deployed, balanced deployment is difficult. (b) If the blocks are too small, the virtual network is easily over-fragmented: the number of blocks becomes too large, and when they are distributed over many physical hosts (especially when the hosts must communicate over multiple hops), cross-host communication based on OVS + VxLAN tunnels causes too much performance loss, hurting the simulation effect and fidelity. The block-cutting and deployment method therefore needs to learn dynamically and autonomously, adapt to the resource consumption demand of the virtual network and the resource supply situation of the physical hosts, form virtual network blocks of suitable scale, and perform hierarchical, distributed, optimized deployment based on these blocks. Existing work, however, mostly abstracts distributed virtual network deployment into a mathematical programming problem (generally NP-hard) and uses heuristics to balance solving efficiency against optimization quality; heuristic methods have no advantage in dynamics and timeliness, and their adaptive capability and capacity to learn and evolve in complex network environments are weak.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a reinforcement-learning-based hierarchical distributed deployment method and system for virtual networks. Oriented to network simulation based on Docker containerized virtual networks, and aiming at optimizing the block-cutting and deployment of the virtual network, the method designs, through a reinforcement learning framework, a reward that balances the resource consumption of the physical hosts against the cross-host communication performance loss caused by cutting the virtual network into blocks, in a distributed environment of physical hosts with limited computing, network and storage resources. According to the reward, the long-term benefit Q(s, a) of each action a taken by each physical host under each resource supply state s is continuously calculated, so that the algorithm, guided by Q(s, a), dynamically and autonomously keeps learning the optimized virtual network block size and the appropriate deployment timing, while a randomness strategy introduces a certain dynamic randomness to avoid the over-consumption of resources or over-fragmentation of blocks caused by rigidly selecting the action with the maximum Q(s, a), thereby achieving the hierarchical, distributed, optimized deployment of the virtual network.
The specific technical scheme of the invention is as follows:
a virtual network hierarchical distributed deployment method based on reinforcement learning comprises the following steps:
Step 1: for each physical host h_r^p, establish an action-value function Q(s, a), forming an action-value function table;
where h_r^p denotes a physical host, the superscript p standing for "physical" and the subscript r, with value range 1 ≤ r ≤ R, being the index of the physical host, R being the total number of physical hosts.
Step 2: wait for a new virtual network deployment request; when a new request arrives, jump to step 3;
Step 3: based on the observation s of the resource supply of each physical host, find the physical host with the largest resource supply;
Step 4: judge whether that physical host can accommodate the entire virtual network:
if it can, jump to step 5,
if it cannot, jump to step 6;
Step 5: directly deploy the virtual network, set the current action as the deployment action, and jump to step 8;
Step 6: for block deployment, select an action according to the action-value function table: if the selected action is the deployment action, jump to step 8; if the selected action is the expansion action, jump to step 7;
Step 7: taking the virtual network element n_i^l with the largest out-degree in the undeployed part of the virtual network as the center, expand the block, gradually constructing the set of virtual network elements of the virtual network block b_m; jump to step 8;
where n_i^l denotes a virtual network element, the superscript l standing for "local" and the subscript i, with value range 1 ≤ i ≤ I, being the index of the virtual network element, I being the total number of virtual network elements; b_m denotes a virtual network block, the superscript b standing for "block" and the subscript m being the index of the virtual network block; the total number of blocks is not fixed in advance but is determined dynamically by the algorithm according to the resource supply of the physical hosts and other conditions.
Step 8: calculate the reward r;
where n_t is the number of virtual network blocks deployed at time t on the physical host selected in step 3, c(b_m) is the sum of the multi-dimensional resources consumed by the virtual network block b_m, and s_t is the observation of the resource supply of the physical host with the largest supply;
Step 9: according to the reward r, update the action-value function Q(s, a) in the current action-value function table;
Step 10: judge whether the current action is the deployment action: if yes, jump to step 11; if not, jump to step 3;
Step 11: deploy the virtual network element n_i^l of the current virtual network (or the set of virtual network elements of the virtual network block b_m) onto the currently selected physical host, and update the state s of the physical host according to its attribute values;
Step 12: judge whether the virtual network has been completely deployed: if yes, jump to step 2; if not, jump to step 3.
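The steps above can be sketched as a tabular Q-learning deployment loop. This is a simplified illustration, not the patented algorithm itself: the reward function below is a stand-in (the patent's reward formula balances host resource consumption against block fragmentation but is not reproduced here), states are coarsely discretized, block expansion is reduced to taking two elements at a time, and all numeric values are illustrative:

```python
import random
from collections import defaultdict

ACTIONS = ("deploy", "expand")          # the two actions of step 6
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount rate, exploration rate

Q = defaultdict(float)  # action-value table: (host, state, action) -> long-term benefit

def coarse_state(free: float) -> str:
    """Discretize a host's remaining resource supply into a coarse state s."""
    return "high" if free > 0.6 else "mid" if free > 0.3 else "low"

def choose_action(host: int, s: str) -> str:
    """Randomness strategy: explore with probability EPSILON, else pick argmax Q."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(host, s, a)])

def reward(free_after: float, blocks_on_host: int) -> float:
    """Stand-in reward: favor residual supply, penalize fragmentation (illustrative)."""
    return free_after - 0.1 * blocks_on_host

def deploy(demands: list[float], supply: list[float]) -> list[list[int]]:
    """Hierarchically place per-element resource demands onto physical hosts."""
    placement = [[] for _ in supply]
    pending = list(range(len(demands)))
    while pending:
        h = max(range(len(supply)), key=lambda r_: supply[r_])  # step 3: largest supply
        s = coarse_state(supply[h])
        a = choose_action(h, s)                                 # steps 4-6 (simplified)
        take = 1 if a == "deploy" else min(2, len(pending))     # step 7: grow the block
        block = pending[:take]
        cost = sum(demands[i] for i in block)
        if cost > supply[h]:                                    # shrink if it cannot fit
            block, cost = pending[:1], demands[pending[0]]
        supply[h] -= cost
        placement[h].extend(block)
        pending = pending[len(block):]
        r = reward(supply[h], len(placement[h]))                # step 8: reward
        s2 = coarse_state(supply[h])
        best_next = max(Q[(h, s2, a2)] for a2 in ACTIONS)       # step 9: Q update
        Q[(h, s, a)] += ALPHA * (r + GAMMA * best_next - Q[(h, s, a)])
    return placement
```

Repeated calls with successive deployment requests keep refining the Q table, which is what gives the method its autonomous dynamic learning character.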
Preferably, in step 7, taking the virtual network element n_i^l with the largest out-degree in the undeployed part of the virtual network as the center, the block is expanded using breadth-first search to gradually construct the set of virtual network elements of the virtual network block b_m, and the method jumps to step 8.
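A minimal sketch of this preferred expansion: pick the undeployed element with the largest out-degree as the center and grow the block breadth-first. The capacity-budget stop condition is an assumption for illustration; in the patent the block size emerges from the action-value function rather than a fixed budget:

```python
from collections import deque

def expand_block(adj: dict[str, list[str]], undeployed: set[str],
                 demand: dict[str, float], budget: float) -> list[str]:
    """Grow a virtual network block by BFS around the max-out-degree element."""
    center = max(undeployed, key=lambda v: sum(1 for n in adj[v] if n in undeployed))
    block, used = [], 0.0
    queue, seen = deque([center]), {center}
    while queue:
        v = queue.popleft()
        if used + demand[v] > budget:   # skip elements the block can no longer afford
            continue
        block.append(v)
        used += demand[v]
        for n in adj[v]:
            if n in undeployed and n not in seen:
                seen.add(n)
                queue.append(n)
    return block
```

Because BFS adds neighbors before more distant elements, the resulting block is topologically compact, which is exactly what limits cross-host links after deployment.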
Preferably, in step 9, the action-value function in the current action-value function table is updated according to the following formula:
Q(s, a) ← Q(s, a) + α[r + γ·max_a' Q(s', a') − Q(s, a)]
where the reward r represents the short-term benefit of the action a taken in the current state s; max_a' Q(s', a') represents the maximum long-term benefit obtainable among all optional actions a' in the new state s' reached after action a is selected, max denoting taking the maximum value; r + γ·max_a' Q(s', a') is the sum of the short-term and long-term benefits, i.e. the maximum subsequent benefit obtainable from the current state, where the discount rate γ represents the influence of the long-term benefit on the benefit of the current state: the closer γ is to 1, the more the long-term benefit is emphasized, and conversely the more the short-term benefit is emphasized; r + γ·max_a' Q(s', a') − Q(s, a) is the return gain of this iteration between the newly selected action and the original action, where the learning rate α represents the speed of reinforcement learning: the closer α is to 1, the faster the learning, and vice versa. By iteratively calculating the return gain, the whole formula continuously updates the long-term benefit Q(s, a) obtainable from each action a taken in each state s, so that the system can autonomously select the optimal action through learning.
Preferably, in step 8, s_t denotes the multi-dimensional resources provided by the physical host at time t, mainly including processor (CPU) resources, memory (RAM) resources and disk (DISK) resources.
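The observation s_t can be represented as a simple multi-dimensional vector. The sketch below (field names and the normalization constants are assumptions) also shows the step-3 selection of the host with the largest supply, using a plain normalized sum as the scalar "size" of the supply; the patent does not fix a particular scalarization:

```python
from dataclasses import dataclass

@dataclass
class Supply:
    """Multi-dimensional resource supply of one physical host at time t."""
    cpu: float      # free CPU cores
    ram_mb: float   # free memory, MB
    disk_mb: float  # free disk, MB

    def size(self) -> float:
        # Illustrative scalarization: normalize each dimension, then sum.
        return self.cpu / 8 + self.ram_mb / 16384 + self.disk_mb / 262144

def largest_supply_host(hosts: dict[str, Supply]) -> str:
    """Step 3: pick the physical host with the largest observed resource supply."""
    return max(hosts, key=lambda h: hosts[h].size())
```

Any other norm over the resource vector would slot into `size()` without changing the surrounding algorithm.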
A reinforcement learning based hierarchical distributed deployment system for a virtual network, comprising:
the action-value function table building module: for establishing an action-value function Q(s, a) for each physical host h_r^p, forming an action-value function table;
the virtual network deployment request processing module: for waiting for a new virtual network deployment request and, when a new request arrives, sending a signal to control the maximum-resource-supply physical host search module to work;
the maximum-resource-supply physical host search module, connected with the virtual network deployment request processing module: for finding the physical host with the largest resource supply according to the observation s of the resource supply of each physical host;
the first judgment module, connected with the maximum-resource-supply physical host search module: for judging whether the physical host with the largest resource supply can accommodate the virtual network:
if it can, controlling the direct deployment module to work,
if it cannot, controlling the block deployment module to work;
the direct deployment module, connected with the first judgment module: for directly deploying the virtual network, setting the current action as the deployment action, and sending a signal to control the calculation module to start working;
the block deployment module, connected with the first judgment module: for selecting an action according to the action-value function; if the selected action is the deployment action, sending a signal to control the calculation module to start working; if the selected action is the expansion action, sending a signal to control the virtual network element set construction module to start working;
the virtual network element set construction module, connected with the block deployment module: for expanding the block with the virtual network element n_i^l with the largest out-degree in the undeployed part of the virtual network as the center, gradually constructing the set of virtual network elements of the virtual network block b_m, and sending a signal to control the calculation module to start working;
the calculation module, connected with the direct deployment module and the virtual network element set construction module: for calculating the reward r;
where n_t is the number of virtual network blocks deployed at time t on the physical host with the largest resource supply, c(b_m) is the sum of the multi-dimensional resources consumed by the virtual network block b_m, and s_t is the observation of the resource supply of the physical host with the largest supply;
the updating module, connected with the action-value function table building module: for updating the action-value function Q(s, a) in the current action-value function table according to the reward r;
the second judgment module, connected with the action-value function table building module: for judging whether the current action is the deployment action: if yes, sending a signal to control the deployment processing module to start working; if not, sending a signal to control the maximum-resource-supply physical host search module to start working;
the deployment processing module, connected with the second judgment module: for deploying the virtual network element n_i^l of the current virtual network (or the set of virtual network elements of the virtual network block b_m) onto the currently selected physical host, and updating the state s of the physical host according to its attribute values;
the third judgment module, connected with the deployment processing module, the virtual network deployment request processing module and the maximum-resource-supply physical host search module: for judging whether the virtual network has been completely deployed: if yes, sending a signal to control the virtual network deployment request processing module to work; if not, sending a signal to control the maximum-resource-supply physical host search module to start working.
Preferably, the maximum-resource-supply physical host search module finds the physical host with the largest resource supply according to the observation s of the resource supply of each physical host.
Preferably, the updating module is configured to update the action-value function in the current action-value function table according to the following formula:
Q(s, a) ← Q(s, a) + α[r + γ·max_a' Q(s', a') − Q(s, a)]
where the reward r represents the short-term benefit of the action a taken in the current state s; max_a' Q(s', a') represents the maximum long-term benefit obtainable among all optional actions a' in the new state s' reached after action a is selected, max denoting taking the maximum value; r + γ·max_a' Q(s', a') is the sum of the short-term and long-term benefits, i.e. the maximum subsequent benefit obtainable from the current state, where the discount rate γ represents the influence of the long-term benefit on the benefit of the current state: the closer γ is to 1, the more the long-term benefit is emphasized, and conversely the more the short-term benefit is emphasized; r + γ·max_a' Q(s', a') − Q(s, a) is the return gain of this iteration between the newly selected action and the original action, where the learning rate α represents the speed of reinforcement learning: the closer α is to 1, the faster the learning, and vice versa. By iteratively calculating the return gain, the whole formula continuously updates the long-term benefit Q(s, a) obtainable from each action a taken in each state s, so that the system can autonomously select the optimal action through learning.
Advantageous effects:
The invention is oriented to network simulation based on Docker containerized virtual networks, aims at optimizing the block-cutting and deployment of the virtual network, and achieves hierarchical, distributed, optimized deployment of the virtual network through a reinforcement learning framework in a distributed environment of physical hosts with limited computing, network and storage resources. The beneficial effects mainly include:
Adaptation to lightweight virtualization technology: according to the low-consumption, fine-grained characteristics of Docker and the lightweight characteristics of network virtualization technologies such as OVS and VxLAN, the deployment algorithm effectively adapts the reinforcement learning framework to hierarchical, distributed virtual network deployment scenarios.
Autonomous dynamic learning: the block size of the virtual network is determined mainly by the long-term benefit Q(s, a) that the system learns from states and actions, essentially without a pre-designed algorithm, so there is little subjective interference from manual algorithm design; the system has good dynamic, autonomous learning capability, dynamically matching the resource supply of the physical hosts with the resource consumption demand of the virtual network to achieve optimal block-cutting and deployment.
Resource consumption balancing: the design of the reinforcement learning reward considers both the resource consumption of the physical hosts and the cross-host communication performance loss of the virtual network blocks, controlling on the one hand the number of virtual network blocks and on the other hand the scale of each block. Macroscopically, the scales of the virtual network blocks do not differ too much, so the resource consumption of the physical hosts is balanced.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
Fig. 1 is a schematic view of a hierarchical distributed deployment of a virtual network in the present invention;
FIG. 2 is a framework of a virtual network hierarchical distributed deployment system based on reinforcement learning in the present invention;
FIG. 3 is a flow of hierarchical distributed deployment of a reinforcement learning-based virtual network in the present invention;
fig. 4 shows schematic diagrams of two virtual networks to be deployed in the present invention; in fig. 4, (a) shows a smaller-scale virtual network to be deployed and (b) shows a larger-scale virtual network to be deployed.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that all the directional indicators (such as upper, lower, left, right, front and rear … …) in the embodiment of the present invention are only used to explain the relative position relationship between the components, the movement situation, etc. in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indicator is changed accordingly.
In addition, the descriptions related to "first", "second", etc. in the present invention are only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention will now be further described with reference to the accompanying drawings.
In order to facilitate the subsequent introduction of the specific technical scheme, the concepts used in the invention are first defined, and the hierarchical distributed deployment problem of the virtual network is modeled.
Logical virtual network: an overall logical description of the virtual network to be deployed, input by the user and directly facing the user. It is an undirected graph $G^l=(N^l, E^l)$, where $N^l$ denotes the set of virtual network elements and $E^l$ the set of virtual links; a link in $E^l$ may also be written $e^l_{i,j}$, representing a virtual link between virtual network elements $n^l_i$ and $n^l_j$. As shown at the top of fig. 1.
Virtual network dicing: when the logical virtual network cannot be deployed on one host as a whole, it is cut by a certain algorithm into a number of virtual network blocks; a virtual network block is thus a local description of the logical virtual network after cutting and is transparent to the user. The m-th virtual network block is an undirected graph $G^b_m=(N^b_m, E^b_m)$, where $N^b_m$ denotes its set of virtual network elements and $E^b_m$ its set of virtual links; a link in $E^b_m$ may also be written $e^b_{i,j}$, representing a virtual link between virtual network elements $n^b_i$ and $n^b_j$. As shown in the middle "cut A" and "cut B" of fig. 1.
Physical network: the network formed by the physical hosts on which virtual networks are deployed. It is an undirected graph $G^p=(N^p, E^p)$, where $N^p$ denotes the set of physical hosts and $E^p$ the set of physical links; a link in $E^p$ may also be written $e^p_{r,k}$, representing the physical link between physical hosts $n^p_r$ and $n^p_k$. As shown at the bottom of fig. 1.
Additional auxiliary network: a virtual network block may contain a number of additional virtual network elements and virtual links that are transparent to the user and serve purposes such as cross-host communication between blocks, so the union of all blocks contains more than the logical virtual network, i.e. $\bigcup_m N^b_m = N^l \cup N^a$ and $\bigcup_m E^b_m = E^l \cup E^a$. The additional auxiliary network $G^a=(N^a, E^a)$ is an auxiliary virtual network connecting the virtual network blocks, transparent to the user and generated automatically by the algorithm on demand, where $N^a$ denotes the set of additional virtual network elements and $E^a$ the set of additional virtual links. In fig. 1, OVS1 and OVS2 are additional virtual network elements, and veth-pair2, veth-pair3 and VxLAN are additional virtual links.
Resource supply: at time t, the multidimensional resources provided by a physical host $n^p_r$, mainly processor CPU, memory RAM and DISK, are denoted $C^p_r(t)$, and the multidimensional resources provided by a physical link $e^p_{r,k}$, mainly bandwidth BW, are denoted $B^p_{r,k}(t)$.
Resource consumption: at time t, the sum of the multidimensional resources consumed by the virtual network element set $N^b_m$ of a virtual network block, mainly processor CPU, memory RAM and DISK, is denoted $D^b_m(t)$, and the sum of the multidimensional resources consumed by its virtual link set $E^b_m$, mainly bandwidth BW, is denoted $B^b_m(t)$.
Distributed deployment of virtual networks: this can be modeled as a 0-1 programming problem. The optimization goal is to minimize the number M of virtual network blocks so as to reduce the performance loss caused by cross-host communication; the constraints are that every virtual network block of the logical virtual network is mapped and deployed onto some physical host, and that resource supply and resource consumption are correctly matched. The solved variable $x_{m,r}$ is a 0-1 variable determining whether virtual network block $G^b_m$ is deployed on physical host $n^p_r$.
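The formula referenced here did not survive extraction. Under the assumption that $x_{m,r}$ denotes the 0-1 deployment variable, $D^b_m(t)$ the resource consumption of block m and $C^p_r(t)$ the resource supply of host r (this notation is an assumption, not necessarily the patent's), a standard form of such a 0-1 program might read:

```latex
\begin{aligned}
\min\ & M \\
\text{s.t.}\ & \textstyle\sum_{r=1}^{R} x_{m,r} = 1, && m = 1,\dots,M \\
 & \textstyle\sum_{m=1}^{M} x_{m,r}\, D^b_m(t) \le C^p_r(t), && r = 1,\dots,R \\
 & x_{m,r} \in \{0,1\}
\end{aligned}
```

The first constraint maps every block to exactly one host; the second matches consumption against supply on each host, as described in the text.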
The invention models and designs the reinforcement learning quintuple $(S, A, P, R, \gamma)$ according to the specific requirements of hierarchical and distributed deployment of the virtual network, where:
S: the finite set of states, here observations of the resource supply of the physical hosts. Because each dimension of the supply attributes is a continuous space, one option is to fit it with a deep reinforcement learning framework such as DQN (Deep Q-Network) combined with a convolutional neural network; another is to discretize the continuous multidimensional attribute space and construct finite states from the attribute segments, which can then be solved with a lightweight Q-learning algorithm. The invention adopts the second, more lightweight method; the discretization is detailed in step 1 below.
A: the finite set of actions, consisting essentially of two actions. (1) Deployment action: deploy the entire logical virtual network, or the current virtual network block, onto the physical host with the maximum resource supply, and set the corresponding 0-1 variable to 1. (2) Expansion action: continue to take the virtual network element with the largest out-degree in the undeployed part of the virtual network as the center and expand by breadth-first search to construct the virtual network block.
P: the finite set of transition probabilities between states. The algorithm is model-free reinforcement learning, so P is not involved and can be ignored.
R: the set of rewards corresponding to each action; the concrete modeling and calculation are given in step 8 below.
$\gamma$: the discount factor, indicating the degree to which the rewards of subsequent actions influence the current action.
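The expansion action described above, growing a block by breadth-first search around the undeployed element with the largest out-degree, can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function and variable names, and the size limit, are assumptions.

```python
from collections import deque

def expand_block(adj, deployed, limit):
    """Grow a virtual-network block by breadth-first search around the
    undeployed element with the largest out-degree.
    adj maps element -> list of neighbours; deployed is the set of
    already-deployed elements; limit caps the block size."""
    undeployed = [n for n in adj if n not in deployed]
    if not undeployed:
        return set()
    center = max(undeployed, key=lambda n: len(adj[n]))  # max out-degree
    block, queue = {center}, deque([center])
    while queue and len(block) < limit:
        for nb in adj[queue.popleft()]:
            if nb not in deployed and nb not in block:
                block.add(nb)
                queue.append(nb)
                if len(block) >= limit:
                    break
    return block
```

For a star-shaped topology the hub is chosen as the center, since it has the largest out-degree, and its neighbours are absorbed breadth-first until the limit is reached.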
The method system framework constructed in this way is shown in fig. 2, and comprises 4 core modules and 1 core database: the system comprises a virtual network analysis module, a cutting judgment module, an optimized cutting module, an optimized deployment module and a Q table database.
The specific technical scheme and flow of the virtual network hierarchical distributed deployment based on reinforcement learning are as follows, and refer to fig. 3.
Step 1: each physical host $n^p_r$ establishes an independent action value function table (i.e. a Q table, whose structure is shown in Table 1), with states as rows and actions as columns. Each row state represents the resource supply of the physical host and comprises a number of sub-rows, i.e. a state is a linear combination of attributes. Because each dimension attribute is a continuous state space, the attributes are discretized according to a certain rule (for example, the memory RAM can be segmented in steps of 4 GB), and finite states are constructed from the attribute segments. Each cell holds the corresponding action value function $Q(s,a)$, initialized to 0.
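Step 1 can be sketched as follows: discretize the continuous RAM supply into a small number of states (the 4 GB segmentation is from the text; the number of levels, action names and function names are assumptions) and build one zero-initialized Q table per host.

```python
def ram_state(free_ram_gb, step=4, levels=4):
    """Discretize a continuous RAM supply into one of `levels` states.
    The 4 GB step follows the text's example; `levels` is assumed."""
    return min(int(free_ram_gb // step), levels - 1)

# two actions per the patent's action set A: deploy and expand
ACTIONS = ("deploy", "expand")

def new_q_table(levels=4):
    """One Q table per physical host: state -> {action: value}, all 0."""
    return {s: {a: 0.0 for a in ACTIONS} for s in range(levels)}
```

A host with 16 GB free then starts in the highest state, and its Q values are all zero until learning updates them.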
And 2, entering a main loop of the virtual network deployment algorithm, and waiting for a new virtual network deployment request. And when a new virtual network deployment request arrives, skipping to the step 3.
Step 3: find the physical host with the maximum resource supply and switch to its Q table.
Step 4: judge whether the physical host can accommodate the virtual network. If it can, jump to step 5; if not, jump to step 6.
Step 5: start the direct deployment process, set the current action to the deployment action, deploy, and jump to step 8.
Step 6: start the dicing deployment process, and select the action with the $\varepsilon$-greedy algorithm according to the Q table, where $\varepsilon$ is a small value, which encourages the network block to be expanded as much as possible and reduces the number of blocks. If the selected action is the deployment action, deploy and jump to step 8; if it is the expansion action, expand and jump to step 7. Set the current action according to the selection result.
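The ε-greedy selection of step 6 can be sketched as below; the concrete ε value in the patent was lost in extraction, so the default here is only an assumed example.

```python
import random

def epsilon_greedy(q_row, epsilon=0.1):
    """Pick an action from one row of the Q table: with probability
    epsilon explore uniformly at random, otherwise exploit the action
    with the largest Q value. epsilon=0.1 is an assumed example value."""
    if random.random() < epsilon:
        return random.choice(list(q_row))
    return max(q_row, key=q_row.get)
```

With a small ε the agent mostly follows the learned Q values but occasionally tries the other action, which is what allows the deployment action to be picked even when its Q value is not yet the maximum, as in the worked example later in the text.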
Step 7: continue to take the virtual network element with the largest out-degree in the undeployed part of the virtual network as the center, expand the block by breadth-first search, gradually constructing the virtual network block, and jump to step 8.
Step 8 calculates the reward for the current action by a formula with two parts. a) The left part can be understood as the dicing deployment reward: it encourages each block to be as large as possible, so that the number of blocks is as small as possible and the extra performance loss of cross-host communication is reduced. This part depends on the number of virtual network blocks already deployed on the physical host at time t. It is a positive number, so successful deployment is rewarded, but the more virtual network blocks are deployed, the faster the reward decays, which suppresses an excessive number of blocks.
b) The right part can be understood as the block expansion reward: it encourages each block to be as small as possible and ensures that the physical host consumes as few resources as possible, so as to accommodate the deployment of subsequent virtual networks. The more resources a virtual network block consumes, the faster the reward decays. When a block over-expands, i.e. its resource consumption exceeds the host's resource supply, deployment fails and the reward becomes negative, inhibiting over-expansion.
The two parts of the reward restrict each other; through continuous operation of the reinforcement learning framework, the system learns block sizes and deployment modes matched with the existing system resource supply.
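The patent's exact two-part formula did not survive extraction, so the sketch below only captures the described behavior under assumed functional forms: a deployment reward that decays with the number of blocks already on the host, minus an expansion penalty that grows with the block's share of the host's resources and turns the total negative when demand exceeds supply.

```python
def reward(blocks_on_host, block_demand, host_supply):
    """Illustrative two-part reward in the spirit of the description;
    the concrete decay functions are assumptions, not the patent's formula.
    blocks_on_host: blocks already deployed on the host at time t.
    block_demand / host_supply: resource consumption vs. supply."""
    deploy_part = 1.0 / (1.0 + blocks_on_host)   # fewer blocks -> larger reward
    expand_part = block_demand / host_supply     # larger block -> larger penalty
    return deploy_part - expand_part             # negative when demand > supply
```

This reproduces the stated properties: successful small-count deployments earn a positive reward, the reward shrinks as more blocks accumulate, and over-expansion (demand exceeding supply) yields a negative reward.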
Step 9: according to the Q-learning algorithm, update the current Q table with the following formula:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[\, r + \gamma \max_{a'} Q(s',a') - Q(s,a) \,\right]$$
where $r + \gamma \max_{a'} Q(s',a')$ is the maximum gain obtainable in the current state, often called the target Q, and $Q(s,a)$ is the currently accumulated reward; subtracting the two yields the return gain, i.e. the TD error (temporal difference error). $\alpha$ is the learning rate, indicating the degree to which the return gain influences $Q(s,a)$.
Specifically, the reward $r$ represents the short-term benefit of the action taken in the current state $s$; $\max_{a'} Q(s',a')$ represents the maximum long-term benefit obtainable over all optional actions $a'$, where $s'$ denotes the new state reached after the action is selected and max denotes taking the maximum value; $r + \gamma \max_{a'} Q(s',a')$ sums the short-term and long-term benefits and is the subsequent maximum benefit obtainable in the current state, where the discount rate $\gamma$ represents the influence of the long-term benefit on the benefit in the current state (the closer to 1, the more the long-term benefit is emphasized; conversely, the more the short-term benefit is emphasized); $r + \gamma \max_{a'} Q(s',a') - Q(s,a)$ is the return gain of this iteration between the newly selected action and the original one, where the learning rate $\alpha$ represents the speed of reinforcement learning (the closer to 1, the faster the learning, and vice versa). The formula as a whole iteratively computes the return gain to continuously update the long-term benefit $Q(s,a)$ obtainable by each action $a$ taken in each state $s$, so that the system learns to autonomously select the optimal action.
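The update of step 9 is the standard tabular Q-learning rule described term by term above; a minimal sketch (the α and γ defaults are assumptions for illustration):

```python
def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.8):
    """Tabular Q-learning update:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    q is a nested dict: state -> {action: value}."""
    target = r + gamma * max(q[s_next].values())   # target Q
    td_error = target - q[s][a]                    # TD (temporal difference) error
    q[s][a] += alpha * td_error
    return q[s][a]
```

Repeated calls propagate the long-term benefit of each state-action pair back through the table, which is exactly the mechanism the text relies on for autonomous action selection.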
Step 10: judge whether the current action is the deployment action. If yes, jump to step 11; if not, jump to step 3.
Step 11: deploy the current virtual network, or the current virtual network block, onto the currently selected physical host, and update the state S of the physical host according to the attribute values.
Step 12 determines whether the virtual network has been completely deployed. If yes, skipping to the step 2; if not, skipping to the step 3.
This example deploys the two virtual networks topo1 and topo2 of fig. 4 onto the physical hosts H1 and H2. To simplify the discussion, the multidimensional index is reduced to the one-dimensional RAM index. H1 and H2 are identically configured physical hosts with a RAM supply of 16 GB. The RAM is segmented into 4 gears, constructing four states; H1 and H2 each form a Q table as shown in Table 2, with Q values initialized to 0. Each virtual network element in the virtual networks consumes 0.5 GB of RAM. Under this configuration, topo1, deployed first, may need no cutting, while topo2, deployed next, must be cut. Fixed values of the learning rate, discount rate and $\varepsilon$ are used for the Q-learning calculation.
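The capacity arithmetic behind "topo1 needs no cutting, topo2 must be cut" can be checked directly; the 16 GB supply and 0.5 GB per element are from the text, while the topology sizes used in the test are assumptions, since the element counts of topo1 and topo2 are not given numerically here.

```python
def fits(num_elements, ram_gb=16.0, per_element_gb=0.5):
    """Can a virtual network of num_elements elements fit on one host?
    16 GB supply and 0.5 GB per element follow the example's figures."""
    return num_elements * per_element_gb <= ram_gb
```

One host can therefore hold at most 32 elements; any topology larger than that must be cut into blocks.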
TABLE 2Q-Table Structure of physical hosts H1 or H2
First, topo1 is deployed: the host with the largest resource supply is found to be H1, in its initial state (see Table 2); it can accommodate the virtual network, so the deployment action is selected according to step 5; the algorithm jumps to step 8 to obtain the corresponding reward, computes $Q(s,a)$ according to step 9, fills the Q table, and finally deploys directly onto H1. After deployment, the number of virtual network blocks on the host is updated, as shown in Table 3, line 1.
Second, topo2 is next deployed.
a) Deploying the first virtual network block: the host with the largest resource supply is now H2 (see Table 2), which cannot accommodate the whole virtual network, so the algorithm jumps to step 6. According to the $\varepsilon$-greedy algorithm, the expansion action is selected with high probability, and the existing network block is expanded by breadth-first search around the virtual network element R3, which has the maximum out-degree. The algorithm then jumps to step 8 to obtain the corresponding reward and computes $Q(s,a)$ according to step 9 to fill the Q table. Since nothing is actually deployed yet, the resource consumption is as shown in Table 3, line 2. Subsequent rounds (note that the Q values change gradually as the rewards of each round accumulate) repeatedly select the expansion action and keep enlarging the existing network block, similar to Table 3 line 2, with the specific changes shown in Table 3, lines 3-11. A later round may, due to the randomness of $\varepsilon$-greedy selection, choose the deployment action even though its Q value is not the maximum; the algorithm jumps to step 8 to obtain the corresponding reward, computes $Q(s,a)$ according to step 9, fills the Q table, and finally deploys onto H2. After deployment, the number of virtual network blocks on the host is updated as shown in Table 3, line 10, completing the deployment of the first virtual network block. It should be noted that although a virtual network block of 10 virtual network elements is constructed and deployed here (Table 3, lines 3-12), how many elements a particular block contains is determined jointly by the Q values (reflecting the long-term benefit of a particular action in a particular state) and by $\varepsilon$ (introducing randomness to avoid rigid action selection); in a specific implementation, the number of virtual network elements in the block is not necessarily 10, which serves here only as an example.
b) Deploying the second virtual network block: after the first block is deployed, the whole of topo2 is not yet deployed, so the algorithm continues to run. The host with the largest resource supply is now H1 (see Table 2), which cannot accommodate the remaining virtual network, so the algorithm jumps to step 6. According to the $\varepsilon$-greedy algorithm, the expansion action is selected with high probability, and the existing network block is expanded by breadth-first search around the virtual network element R3 with the maximum out-degree. The algorithm then jumps to step 8 to obtain the corresponding reward and computes $Q(s,a)$ according to step 9 to fill the Q table. Since nothing is actually deployed yet, the resource consumption is as shown in Table 3, line 13. Subsequent rounds (the Q values changing gradually as the rewards of each round accumulate) repeatedly select the expansion action and keep enlarging the existing network block, similar to Table 3 line 13, with the specific changes shown in Table 3, lines 14-18. A later round may, due to the randomness of $\varepsilon$-greedy selection, choose the deployment action even though its Q value is not the maximum; the algorithm jumps to step 8 to obtain the corresponding reward, computes $Q(s,a)$ according to step 9, fills the Q table, and finally deploys onto H1. After deployment, the number of virtual network blocks on the host is updated as shown in Table 3, line 19, completing the deployment of the second virtual network block.
It should be noted that although a virtual network block of 7 virtual network elements is constructed and deployed here (Table 3, lines 13-19), how many elements a particular block contains is determined jointly by the Q values (reflecting the long-term benefit of a particular action in a particular state) and by $\varepsilon$ (introducing randomness to avoid rigid action selection); in a specific implementation, the number of virtual network elements in the block is not necessarily 7, which serves here only as an example.
c) Transition of state: after the second virtual network block is deployed, two blocks (17 virtual network elements) of topo2 have been deployed; since deployment is not yet complete, the algorithm continues to run. The host with the largest resource supply is again found to be H2. Because the resource consumption after actually deploying the virtual network reaches the critical condition of a state transition, the host switches from its original state to a new one (see Table 2), and the Q values of the new state are iteratively computed based on how the first and second virtual network blocks were deployed, providing the numerical basis for optimally selecting the deployment or expansion action in subsequent processing, as shown in Table 3, lines 20-21. Limited by space, not all subsequent iteration steps are listed here.
Table 3 deployment virtual network topo1 and topo2 examples
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.
Claims (8)
1. A virtual network hierarchical distributed deployment method based on reinforcement learning is characterized by comprising the following steps:
step 1: for each physical host $n^p_r$, establish an independent action value function table, in whose cells the corresponding action value function $Q(s,a)$ is stored and initialized to 0, where $n^p_r$ represents a physical host, the superscript p represents physical, the subscript r represents the number of the physical host with value range $1,\dots,R$, and R is the total number of physical hosts; s represents a state in reinforcement learning, a represents an action in reinforcement learning, and the action value function $Q(s,a)$ represents the long-term benefit of taking action a in state s;
step 2: waiting for a new virtual network deployment request, and jumping to the step 3 when the new virtual network deployment request arrives;
and step 3: based on the observations $C^p_r(t)$ of the resource supply of the physical hosts, find the physical host with the largest resource supply, where $C^p_r(t)$ represents the multidimensional resources provided by physical host $n^p_r$ at time t;
and step 4: judge whether the physical host can accommodate the virtual network; if it can, jump to step 5; if it cannot, jump to step 6;
step 6: in the block deployment process, select an action according to the action value function; if the action is the deployment action, deploy and jump to step 8; if the action is the expansion action, expand and jump to step 7;
and step 7: take the virtual network element $n^l_i$ with the maximum out-degree in the undeployed part of the virtual network as the center, perform block expansion, gradually construct the virtual network element set $N^b_m$ in the virtual network block, and jump to step 8; where $n^l_i$ is a virtual network element in the virtual network, the superscript l represents the logical network, the subscript i represents the number of the virtual network element with value range $1,\dots,I$, and I is the total number of virtual network elements; $G^b_m$ represents a virtual network block, the superscript b represents the block, the subscript m represents the number of the virtual network block with value range $1,\dots,M$, the total number of blocks being determined dynamically during execution;
In the formula, the first quantity is the number of virtual network blocks deployed at time t on the physical host with the maximum resource supply, the second is the sum of the multidimensional resources consumed at time t by the virtual network element set $N^b_m$ in the virtual network block, and the third is the observed resource supply of the largest physical host;
and step 9: according to the reward r, update the action value function $Q(s,a)$ in the current action value function table;
Step 10: judge whether the current action is the deployment action; if yes, jump to step 11; if not, jump to step 3;
step 11: deploy the current entire virtual network, or the virtual network element set $N^b_m$ in the virtual network block, onto the currently selected physical host, and update the state of the physical host according to the attribute values;
step 12: judging whether the virtual network is completely deployed or not, and if so, skipping to the step 2; if not, skipping to the step 3.
3. The method according to claim 1, wherein in step 7, the virtual network element $n^l_i$ with the maximum out-degree in the undeployed part of the virtual network is taken as the center, where the subscript i has value range $1,\dots,I$ and I is the total number of virtual network elements; block expansion is performed by breadth-first search, the virtual network element set $N^b_m$ in the virtual network block is constructed gradually, and the method jumps to step 8.
4. The method according to claim 1, wherein in step 9 the action value function in the current action value function table is updated according to the following formula:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[\, r + \gamma \max_{a'} Q(s',a') - Q(s,a) \,\right]$$

wherein the reward r represents the short-term benefit of the action a taken in the current state s; $\max_{a'} Q(s',a')$ represents the maximum long-term benefit obtainable over all optional actions $a'$, where $s'$ denotes the new state reached after action a is selected and max denotes taking the maximum value; $r + \gamma \max_{a'} Q(s',a')$ sums the short-term and long-term benefits and is the subsequent maximum benefit obtainable in the current state, where the discount rate $\gamma$ represents the influence of the long-term benefit on the benefit in the current state (the closer to 1, the more the long-term benefit is emphasized; conversely, the more the short-term benefit is emphasized); $r + \gamma \max_{a'} Q(s',a') - Q(s,a)$ is the return gain of this iteration between the newly selected action and the original one, where the learning rate $\alpha$ represents the speed of reinforcement learning (the closer to 1, the faster the learning, and vice versa); the formula as a whole iteratively computes the return gain to continuously update the long-term benefit $Q(s,a)$ obtainable by each action a taken in each state s, so that the system autonomously selects the optimal action by learning.
6. A virtual network hierarchical distributed deployment system based on reinforcement learning is characterized by comprising:
an action value function table building module: for each physical host $n^p_r$, establishing an independent action value function table in whose cells the corresponding action value function $Q(s,a)$ is stored and initialized to 0, where $n^p_r$ represents a physical host, the superscript p represents physical, the subscript r represents the number of the physical host with value range $1,\dots,R$, and R is the total number of physical hosts; s represents a state in reinforcement learning, a represents an action in reinforcement learning, and the action value function $Q(s,a)$ represents the long-term benefit of taking action a in state s;
a virtual network deployment request processing module: used for sending, when a new virtual network deployment request arrives, a signal controlling the maximum-resource-supply physical host search module to work;
a maximum-resource-supply physical host search module connected with the virtual network deployment request processing module: used for finding, according to the observations $C^p_r(t)$ of the resource supply of the physical hosts, the physical host with the largest resource supply, where $C^p_r(t)$ represents the multidimensional resources provided by physical host $n^p_r$ at time t;
a first judgment module connected with the maximum-resource-supply physical host search module: used for judging whether the physical host with the maximum resource supply can accommodate the virtual network; if it can, the direct deployment module is controlled to work; if it cannot, the block deployment module is controlled to work;
a direct deployment module connected with the first judgment module: used for directly deploying the virtual network, setting the current action to the deployment action, and sending a signal to control the calculation module to start working;
a block deployment module connected with the first judgment module: used for selecting an action according to the action value function; if the action is the deployment action, a signal is sent to control the calculation module to start working; if the action is the expansion action, a signal is sent to control the virtual network element set building module to start working;
a virtual network element set building module connected with the block deployment module: used for taking the virtual network element $n^l_i$ with the maximum out-degree in the undeployed part of the virtual network as the center, performing block expansion, gradually constructing the virtual network element set $N^b_m$ in the virtual network block, and sending a signal to control the calculation module to start working;
a calculation module connected with the direct deployment module and the virtual network element set building module: used for calculating the reward according to the formula, in which the first quantity is the number of virtual network blocks deployed at the current moment on the physical host with the maximum resource supply, the second is the sum of the multidimensional resources consumed by the virtual network block, and the third is the observed resource supply of the largest physical host;
an updating module connected with the action value function table building module: used for updating, according to the reward r, the action value function $Q(s,a)$ in the current action value function table;
a second judgment module connected with the action value function table building module: used for judging whether the current action is the deployment action; if yes, a signal is sent to control the deployment processing module to start working; if not, a signal is sent to control the maximum-resource-supply physical host search module to start working;
a deployment processing module connected with the second judgment module: used for deploying the current entire virtual network, or the virtual network element set $N^b_m$ in the virtual network block, onto the currently selected physical host, and updating the state S of the physical host according to the attribute values;
a third judgment module connected with the deployment processing module, the virtual network deployment request processing module and the maximum-resource-supply physical host search module: used for judging whether the virtual network has been completely deployed; if yes, a signal is sent to control the virtual network deployment request processing module to work; if not, a signal is sent to control the maximum-resource-supply physical host search module to start working.
8. The reinforcement-learning-based virtual network hierarchical distributed deployment system according to claim 6, wherein the updating module is configured to update the action value function in the current action value function table according to the following formula:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[\, r + \gamma \max_{a'} Q(s',a') - Q(s,a) \,\right]$$

wherein the reward r represents the short-term benefit of the action a taken in the current state s; $\max_{a'} Q(s',a')$ represents the maximum long-term benefit obtainable over all optional actions $a'$, where $s'$ denotes the new state reached after action a is selected and max denotes taking the maximum value; $r + \gamma \max_{a'} Q(s',a')$ sums the short-term and long-term benefits and is the subsequent maximum benefit obtainable in the current state, where the discount rate $\gamma$ represents the influence of the long-term benefit on the benefit in the current state (the closer to 1, the more the long-term benefit is emphasized; conversely, the more the short-term benefit is emphasized); $r + \gamma \max_{a'} Q(s',a') - Q(s,a)$ is the return gain of this iteration between the newly selected action and the original one, where the learning rate $\alpha$ represents the speed of reinforcement learning (the closer to 1, the faster the learning, and vice versa); the formula as a whole iteratively computes the return gain to continuously update the long-term benefit $Q(s,a)$ obtainable by each action a taken in each state s, so that the system autonomously selects the optimal action by learning.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111195085.XA CN113641462B (en) | 2021-10-14 | 2021-10-14 | Virtual network hierarchical distributed deployment method and system based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113641462A CN113641462A (en) | 2021-11-12 |
CN113641462B (en) | 2021-12-21
Family
ID=78426774
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111195085.XA Active CN113641462B (en) | 2021-10-14 | 2021-10-14 | Virtual network hierarchical distributed deployment method and system based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113641462B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114827783B (en) * | 2022-07-01 | 2022-10-14 | 西南民族大学 | Aggregation tree-based bandwidth scheduling method for cross-domain distributed machine learning |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106411749A (en) * | 2016-10-12 | 2017-02-15 | 国网江苏省电力公司苏州供电公司 | Path selection method for software defined network based on Q learning |
CN109947567A (en) * | 2019-03-14 | 2019-06-28 | 深圳先进技术研究院 | A kind of multiple agent intensified learning dispatching method, system and electronic equipment |
CN110022230A (en) * | 2019-03-14 | 2019-07-16 | 北京邮电大学 | The parallel dispositions method of service chaining and device based on deeply study |
CN110365514A (en) * | 2019-05-24 | 2019-10-22 | 北京邮电大学 | SDN multistage mapping method of virtual network and device based on intensified learning |
CN110365568A (en) * | 2019-06-18 | 2019-10-22 | 西安交通大学 | A kind of mapping method of virtual network based on deeply study |
CN110995619A (en) * | 2019-10-17 | 2020-04-10 | 北京邮电大学 | Service quality aware virtual network mapping method and device |
CN111147307A (en) * | 2019-12-30 | 2020-05-12 | 重庆邮电大学 | Service function chain reliable deployment method based on deep reinforcement learning |
JP2020127182A (en) * | 2019-02-06 | 2020-08-20 | 日本電信電話株式会社 | Control device, control method, and program |
Non-Patent Citations (6)
Title |
---|
Low-Latency and Resource-Efficient Service Function Chaining Orchestration in Network Function Virtualization;Gang Sun et al.;《IEEE Internet of Things Journal》;20190823;Vol. 7 No. 7;pp. 5760-5772 *
MUVINE: Multi-Stage Virtual Network Embedding in Cloud Data Centers Using Reinforcement Learning-Based Predictions;Hiren Kumar Thakkar et al.;《IEEE Journal on Selected Areas in Communications》;20200408;Vol. 38 No. 6;pp. 1058-1074 *
Optimizing NFV Chain Deployment in Software-Defined Cellular Core;Jiaqi Zheng et al.;《IEEE Journal on Selected Areas in Communications》;20191231;Vol. 38 No. 2;pp. 248-262 *
Design and Implementation of a Virtual Network Mapping Algorithm Based on Reinforcement Learning and QoS Awareness;Li Meng;《China Masters' Theses Full-text Database, Information Science and Technology》;20210515;I139-8 *
Virtual Network Function Deployment Optimization Algorithm Based on Improved Deep Reinforcement Learning;Tang Lun et al.;《Journal of Electronics & Information Technology》;20210615;Vol. 43 No. 6;pp. 1724-1732 *
Containerized Virtual Network Mapping Algorithm Based on Time-Varying Resources;Deng Weijian et al.;《https://kns.cnki.net/kcms/detail/51.1307.TP.20210628.1335.017.html》;20210628;pp. 1-9 *
Also Published As
Publication number | Publication date |
---|---|
CN113641462A (en) | 2021-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111147307B (en) | Service function chain reliable deployment method based on deep reinforcement learning | |
US10025892B2 (en) | Simulation systems and methods | |
US10878146B2 (en) | Handover techniques for simulation systems and methods | |
CN112600717B (en) | Satellite network management and control protocol semi-physical test device based on SDN | |
EP2629490A1 (en) | Optimizing traffic load in a communications network | |
CN114050961B (en) | Large-scale network simulation system and resource dynamic scheduling and distributing method | |
CN113341712B (en) | Intelligent hierarchical control selection method for unmanned aerial vehicle autonomous control system | |
CN105515987A (en) | SDN framework based virtual optical network oriented mapping method | |
CN113641462B (en) | Virtual network hierarchical distributed deployment method and system based on reinforcement learning | |
Lu et al. | A cluster-tree-based energy-efficient routing protocol for wireless sensor networks with a mobile sink | |
Bouzidi et al. | Dynamic clustering of software defined network switches and controller placement using deep reinforcement learning | |
WO2024077881A1 (en) | Scheduling method and system for neural network training, and computer-readable storage medium | |
CN115329985B (en) | Unmanned cluster intelligent model training method and device and electronic equipment | |
WO2023179180A1 (en) | Network virtualization system structure and virtualization method | |
WO2023089350A1 (en) | An architecture for a self-adaptive computation management in edge cloud | |
CN117707795B (en) | Graph-based model partitioning side collaborative reasoning method and system | |
Afrasiabi et al. | Reinforcement learning-based optimization framework for application component migration in NFV cloud-fog environments | |
Tyagi et al. | GM-WOA: a hybrid energy efficient cluster routing technique for SDN-enabled WSNs | |
CN115879543A (en) | Model training method, device, equipment, medium and system | |
CN106709597A (en) | Parallel TSP problem optimizing method and device based on artificial bee colony algorithm | |
Shooshtarian et al. | A maximally robustness embedding algorithm in virtual data centers with multi-attribute node ranking based on TOPSIS | |
Adewale | Adaptive and Scalable Controller Placement in Software-Defined Networking | |
CN112234599A (en) | Advanced dynamic self-adaptive partitioning method and system for multi-element complex urban power grid | |
Sun et al. | Wireless sensor network path optimization based on hybrid algorithm | |
Steffenel et al. | A framework for adaptive collective communications for heterogeneous hierarchical computing systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||