CN114355775A - Multi-controller deployment method and system based on SDN (software defined network) and deep reinforcement learning - Google Patents

Multi-controller deployment method and system based on SDN (software defined network) and deep reinforcement learning

Info

Publication number
CN114355775A
Authority
CN
China
Prior art keywords
controller
controllers
deployment
atomix
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111641069.9A
Other languages
Chinese (zh)
Inventor
尤龙
陈佳
王冲
王夏菁
廖晨茜
刘上
王艳广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Science And Technology Network Information Development Co ltd
Original Assignee
Aerospace Science And Technology Network Information Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Science And Technology Network Information Development Co ltd filed Critical Aerospace Science And Technology Network Information Development Co ltd
Priority to CN202111641069.9A
Publication of CN114355775A
Legal status: Pending


Abstract

The invention provides a multi-controller deployment method based on an SDN network and deep reinforcement learning, which comprises the following steps: acquiring a first performance optimization index of multi-controller deployment according to an SDN network structure; the first performance optimization indicator comprises: average propagation delay between the switch and the controller, average propagation delay between the controllers, synchronization overhead between the controllers, and minimum security of synchronization between the controllers; establishing a first objective function according to the first performance optimization index; acquiring a first constraint condition of multi-controller deployment; the first constraint condition comprises controller load constraint, mapping relation constraint of the switch and the controller and control layer synchronous link bandwidth constraint; constructing a multi-controller deployment model according to the first objective function and the first constraint condition; and solving the multi-controller deployment model by utilizing a deep reinforcement learning algorithm based on a Markov decision model to obtain a multi-controller deployment scheme.

Description

Multi-controller deployment method and system based on SDN (software defined network) and deep reinforcement learning
Technical Field
The invention relates to the technical field of controller deployment, in particular to a multi-controller deployment method and a multi-controller deployment system based on an SDN network and deep reinforcement learning.
Background
With the increase of network traffic and the continuous expansion of network scale, the inherent defects of a single controller, such as single-point failure and limited controller resources, become increasingly prominent and increase the communication consumption of the control link. In addition, when multiple controllers are used to manage the network, unreasonable deployment of those controllers that fails to meet network service requirements may cause network congestion or paralysis, which greatly affects the scalability of the SDN network; the deployment problem of multiple controllers is therefore particularly important. Deployment of the controllers also has a significant impact on the performance, reliability, and network cost of the SDN network. Therefore, the invention provides a multi-controller deployment method and system based on an SDN network and deep reinforcement learning.
Disclosure of Invention
The invention aims to provide a multi-controller deployment method and a multi-controller deployment system based on an SDN (software defined network) network and deep reinforcement learning.
In order to achieve the purpose, the invention provides the following scheme:
a multi-controller deployment method based on an SDN network and deep reinforcement learning comprises the following steps:
acquiring a first performance optimization index of multi-controller deployment according to an SDN network structure; the first performance optimization indicator comprises: average propagation delay between the switch and the controller, average propagation delay between the controllers, synchronization overhead between the controllers, and minimum security of synchronization between the controllers;
establishing a first objective function according to the first performance optimization index;
acquiring a first constraint condition of multi-controller deployment; the first constraint condition comprises controller load constraint, mapping relation constraint of the switch and the controller and control layer synchronous link bandwidth constraint;
constructing a multi-controller deployment model according to the first objective function and the first constraint condition;
and solving the multi-controller deployment model by utilizing a deep reinforcement learning algorithm based on a Markov decision model to obtain a multi-controller deployment scheme.
A multi-controller deployment system based on an SDN network and deep reinforcement learning comprises:
the first performance optimization index acquisition module is used for acquiring a first performance optimization index deployed by the multiple controllers according to the SDN network structure; the first performance optimization indicator comprises: average propagation delay between the switch and the controller, average propagation delay between the controllers, synchronization overhead between the controllers, and minimum security of synchronization between the controllers;
the first objective function establishing module is used for establishing a first objective function according to the first performance optimization index;
the first constraint condition acquisition module is used for acquiring a first constraint condition of multi-controller deployment; the first constraint condition comprises controller load constraint, mapping relation constraint of the switch and the controller and control layer synchronous link bandwidth constraint;
the multi-controller deployment model building module is used for building a multi-controller deployment model according to the first objective function and the first constraint condition;
and the solving module is used for solving the multi-controller deployment model by utilizing a deep reinforcement learning algorithm based on a Markov decision model to obtain a multi-controller deployment scheme.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides a multi-controller deployment method and a multi-controller deployment system based on an SDN network and deep reinforcement learning, wherein the method comprises the following steps: acquiring a first performance optimization index of multi-controller deployment according to an SDN network structure; the first performance optimization indicator comprises: average propagation delay between the switch and the controller, average propagation delay between the controllers, synchronization overhead between the controllers, and minimum security of synchronization between the controllers; establishing a first objective function according to the first performance optimization index; acquiring a first constraint condition of multi-controller deployment; the first constraint condition comprises controller load constraint, mapping relation constraint of the switch and the controller and control layer synchronous link bandwidth constraint; constructing a multi-controller deployment model according to the first objective function and the first constraint condition; and solving the multi-controller deployment model by utilizing a deep reinforcement learning algorithm based on a Markov decision model to obtain a multi-controller deployment scheme. Aiming at special application scenes such as a complex SDN (software defined network) network, a battlefield and the like, a multi-controller deployment mechanism is provided to reduce time delay, improve network performance and avoid the problem of control node breakdown caused by frequent service flow issuing of control nodes. And the field format of the synchronous data packet of the control layer is flexibly designed according to the interactive information of the control layer in the special application scene. 
The invention establishes an optimized deployment model of the cluster, so that when one controller in the network is damaged and stops working, other controllers can take over all nodes under the control of the fault controller, thereby ensuring that the communication of all nodes is not interrupted and enhancing the survivability of the control nodes. Meanwhile, a deep reinforcement learning algorithm is applied to establish a reliable and stable data transmission channel, so that accurate management of the network is realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a flowchart of a multi-controller deployment method based on an SDN network and deep reinforcement learning according to embodiment 1 of the present invention;
fig. 2 is a flowchart of synchronization between controllers according to embodiment 1 of the present invention;
fig. 3 is a diagram of a neural network structure provided in embodiment 1 of the present invention;
fig. 4 is a block diagram of a multi-controller deployment system based on an SDN network and deep reinforcement learning according to embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a multi-controller deployment method and a multi-controller deployment system based on an SDN (software defined network) network and deep reinforcement learning.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Models established in the prior art mostly adopt single-objective optimization, while multi-objective optimization models take the delay between the switch and the controller and the delay between controllers as optimization indexes and the controller load as a constraint condition. Few existing models take the overhead of synchronization information between controllers as an optimization index; although such methods are simple to implement, their optimization effect is poor. Meanwhile, most deployment schemes do not consider the optimized deployment of the cluster: little existing work addresses deployment of ONOS assisted by Atomix, and no optimization targets the deployment mode in which Atomix nodes are physically separated from the ONOS controller. The multi-controller deployment problem is NP-hard and computationally very time-consuming. Existing solving algorithms mainly comprise integer linear programming, heuristic algorithms, and the like, which suffer from high complexity, poor scalability, and a tendency to fall into local optima. Therefore, in the research of this invention the distributed coordination framework Atomix is adopted to assist the ONOS controller in establishing the cluster. The invention proposes establishing a whole-network control layer through reasonable deployment of distributed controllers in complex network environments such as SDN scenes and variable battlefields. The invention designs the control-layer synchronization data packet format and reasonably and efficiently realizes deployment of the distributed controllers under the constraint of control-layer synchronization overhead.
The design scheme mainly comprises three aspects of control layer synchronous message design, establishment of a flat multi-controller deployment model and establishment of a cluster deployment optimization model of a distributed coordination framework Atomix.
Example 1
As shown in fig. 1, the present embodiment provides a multi-controller deployment method based on an SDN network and deep reinforcement learning, including:
s1: acquiring a first performance optimization index of multi-controller deployment according to an SDN network structure; the first performance optimization indicator comprises: average propagation delay between the switch and the controller, average propagation delay between the controllers, synchronization overhead between the controllers, and minimum security of synchronization between the controllers;
Specifically, the expression of the average propagation delay between the switch and the controller is:

T_scavg = (1/N) Σ_{s_i∈S} Σ_{c_j∈C} d_ij · x_ij

where N is the number of switches, d_ij is the shortest link delay between switch i and controller j, and x_ij is a binary variable whose value 1 indicates a successful connection between switch i and controller j; s_i denotes switch i, S is the set of switches, c_j is controller j, and C is the set of controllers.

The expression of the average propagation delay between the controllers is:

T_ccavg = (1/(K(K-1))) Σ_{c_j∈C} Σ_{c_k∈C, c_k≠c_j} d^c_jk

where K is the number of controllers and d^c_jk is the shortest link delay between controller j and controller k.

The expression of the synchronization overhead among the controllers is:

C_cc = Σ_{c_j∈C} Σ_{c_k∈C, c_k≠c_j} l_jk · ps_jk

where l_jk is the length of the data packet synchronized between controller j and controller k, and ps_jk is the frequency of synchronization between controller j and controller k.
S2: establishing a first objective function according to the first performance optimization index;
Specifically, the first objective function is:

minimize (αT_scavg + βT_ccavg + ρC_cc) + μ·K

where T_scavg represents the average propagation delay between the switch and the controller, T_ccavg the average propagation delay between controllers, and C_cc the synchronization overhead between controllers; μ is a security factor that realizes the minimum security of synchronization between controllers; α, β, ρ are the weights of the first performance optimization indexes, with α + β + ρ = 1.
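As an illustrative sketch, not part of the patent's disclosure, the three optimization indexes and the weighted objective above can be computed as follows; all delay values, packet lengths, and weight settings are hypothetical examples.

```python
# Illustrative sketch only: computing the three optimization indexes and the
# weighted objective described above. All numbers below are toy example values.

def avg_switch_controller_delay(d, x, n_switches):
    """T_scavg: mean shortest-link delay over the switch-to-controller mapping."""
    return sum(d[i][j] * x[i][j] for i in d for j in d[i]) / n_switches

def avg_controller_delay(dc, k):
    """T_ccavg: mean shortest-link delay over all ordered controller pairs."""
    return sum(dc[j][m] for j in dc for m in dc[j] if m != j) / (k * (k - 1))

def sync_overhead(l, ps):
    """C_cc: sum of packet length times synchronization frequency per pair."""
    return sum(l[j][m] * ps[j][m] for j in l for m in l[j])

def objective(t_sc, t_cc, c_cc, k, alpha=0.4, beta=0.3, rho=0.3, mu=0.01):
    # alpha + beta + rho must equal 1; mu * k is the minimum-security penalty
    assert abs(alpha + beta + rho - 1.0) < 1e-9
    return alpha * t_sc + beta * t_cc + rho * c_cc + mu * k

# toy instance: 2 switches, 2 controllers
d = {0: {0: 1.0, 1: 3.0}, 1: {0: 2.0, 1: 1.0}}  # switch-to-controller delays d_ij
x = {0: {0: 1, 1: 0}, 1: {0: 0, 1: 1}}          # binary mapping x_ij
dc = {0: {1: 2.0}, 1: {0: 2.0}}                 # controller-pair delays
l = {0: {1: 128}, 1: {0: 128}}                  # sync packet lengths l_jk (bytes)
ps = {0: {1: 0.5}, 1: {0: 0.5}}                 # sync frequencies ps_jk (Hz)

t_sc = avg_switch_controller_delay(d, x, n_switches=2)
t_cc = avg_controller_delay(dc, k=2)
c_cc = sync_overhead(l, ps)
cost = objective(t_sc, t_cc, c_cc, k=2)
```

In this toy instance each switch is assigned its nearest controller, so t_sc stays small while the synchronization-overhead term dominates the weighted cost.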
S3: acquiring a first constraint condition of multi-controller deployment; the first constraint condition comprises controller load constraint, mapping relation constraint of the switch and the controller and control layer synchronous link bandwidth constraint;
Specifically, the controller load constraint requires that the number of switches managed by each controller not exceed a specific threshold:

ΔLB ≤ LB_C

where L_opt is the desired optimal controller load, L_c is the actual controller load, and L_j is the load of controller j; LBI is the load bias index defined to measure the load disparity among all controllers in the network; ΔLB, the average difference in the number of switches managed by the controllers, must not exceed the threshold LB_C.

The mapping relation constraint of the switch and the controller requires that each switch be managed by exactly one controller:

Σ_{c_j∈C} x_ij = 1, for every switch s_i ∈ S

The control-layer synchronous link bandwidth constraint is:

Σ_{c_k∈C, c_k≠c_j} l_jk · ps_jk ≤ B_W, for every controller c_j ∈ C

where B_W is the bandwidth provided by the control-layer physical links.
S4: constructing a multi-controller deployment model according to the first objective function and the first constraint condition;
s5: and solving the multi-controller deployment model by utilizing a deep reinforcement learning algorithm based on a Markov decision model to obtain a multi-controller deployment scheme.
Wherein, step S5 specifically includes:
acquiring a state space of the Markov decision model according to the controller placement condition and the network topology information of each node in the network at the current time t;
acquiring an action space of the Markov decision model according to the number of controllers required to be deployed in a network, the positions of nodes deployed by the controllers and a mapping relation between a switch and the controllers;
acquiring the state transition probability of the Markov decision model according to the probability of transition to the next state after executing a certain action in the current state;
acquiring a reward function of the Markov decision model according to the average communication time delay between the switch and the controller, the average communication time delay between the controllers, the synchronous overhead between the controllers and the minimum safety of the controllers;
obtaining a multi-controller deployment scheme based on the state space, the action space, the state transition probability, and the reward function of the Markov decision model.
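As a hedged illustration of the Markov decision formulation above, the sketch below solves a toy placement instance with tabular Q-learning in place of a deep network; the node set, delay metric, reward encoding, and all hyperparameters are illustrative assumptions, not the patent's implementation. The reward is the negative of a toy delay cost, so maximizing it minimizes delay.

```python
# Toy MDP for controller placement: state = frozenset of nodes hosting a
# controller; action = place a controller at a free node; reward at the
# terminal state = negative mean switch-to-nearest-controller delay.
import random

NODES = [0, 1, 2, 3]
K = 2                                                      # controllers to deploy
DELAY = {(i, j): abs(i - j) for i in NODES for j in NODES}  # toy delay metric

def reward(placement):
    # negative mean delay from each switch to its nearest controller
    return -sum(min(DELAY[(s, c)] for c in placement) for s in NODES) / len(NODES)

random.seed(0)
q = {}
ALPHA, EPS = 0.5, 0.2
for episode in range(500):
    state = frozenset()
    while len(state) < K:
        actions = [n for n in NODES if n not in state]
        if random.random() < EPS:                  # epsilon-greedy exploration
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda n: q.get((state, n), 0.0))
        nxt = state | {a}
        terminal = len(nxt) == K
        r = reward(nxt) if terminal else 0.0
        best_next = 0.0 if terminal else max(
            q.get((nxt, n), 0.0) for n in NODES if n not in nxt)
        old = q.get((state, a), 0.0)
        q[(state, a)] = old + ALPHA * (r + best_next - old)  # Q-learning update
        state = nxt

# greedy rollout of the learned policy yields a placement of K controllers
placement = frozenset()
while len(placement) < K:
    a = max((n for n in NODES if n not in placement),
            key=lambda n: q.get((placement, n), 0.0))
    placement = placement | {a}
```

A deep-reinforcement-learning solver would replace the Q-table with a neural network over a state vector encoding placement and topology, as the method describes, but the state, action, transition, and reward roles are the same.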
As another optional implementation, on the basis of the multiple controller deployment model, an Atomix node is embedded, and optimized deployment is performed on the Atomix node, specifically including:
(1) obtaining a second performance optimization index of Atomix node deployment; the second performance optimization index comprises the average synchronization delay between Atomix nodes and the average synchronization delay between the Atomix nodes and the controller nodes.
Specifically, the average synchronization delay between the Atomix nodes is:

T_aaavg = (1/(A(A-1))) Σ_{a_j} Σ_{a_k≠a_j} d^a_jk

where A is the number of Atomix nodes, d^a_jk is the shortest link delay between Atomix nodes, and a_i, a_j, a_k denote Atomix nodes i, j, k.
The average synchronization delay between the Atomix node and the controller node is:

T_acavg = (Σ_i Σ_j d^ac_ij · z_ij) / (Σ_i Σ_j z_ij)

where d^ac_ij is the shortest link delay between the Atomix node and the controller node, and z_ij is a binary variable whose value 1 indicates a successful connection of Atomix node i with controller j.
(2) Constructing a second objective function according to the second performance optimization index;
Specifically, the second objective function is:

minimize ρ1·T_aaavg + ρ2·T_acavg

where ρ1 and ρ2 are the weights set for the second performance optimization indexes, with ρ1 + ρ2 = 1.
(3) Acquiring a second constraint condition of Atomix node deployment; the second constraint condition is that the number of Atomix nodes deployed in the network is B, and at least one mapping relation exists between the Atomix nodes and the controller nodes.
Specifically, the constraint that at least one mapping relation exists between the Atomix nodes and the controller nodes is:

Σ_{a_i} z_ij ≥ 1, for every controller c_j ∈ C
(4) constructing an Atomix node deployment model according to the second objective function and the second constraint condition;
(5) solving the Atomix node deployment model by using a deep reinforcement learning algorithm.
The solving procedure may follow that of the multi-controller deployment model.
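A minimal sketch of evaluating the Atomix deployment objective and its mapping constraint, under the assumption that the two delay indexes average over Atomix-node pairs and over established Atomix-controller mappings respectively; all names and numbers are illustrative, not from the patent.

```python
# Toy evaluation of the second objective: weighted sum of Atomix-to-Atomix and
# Atomix-to-controller average synchronization delays, plus a check of the
# "at least one mapping per controller" constraint. All values are examples.

def atomix_objective(daa, dac, z, n_atomix, rho1=0.6, rho2=0.4):
    # rho1 + rho2 must equal 1 (weights of the two delay indexes)
    assert abs(rho1 + rho2 - 1.0) < 1e-9
    t_aa = sum(daa[i][j] for i in daa for j in daa[i] if j != i) \
        / (n_atomix * (n_atomix - 1))
    n_links = sum(z[i][j] for i in z for j in z[i])
    t_ac = sum(dac[i][j] * z[i][j] for i in dac for j in dac[i]) / n_links
    return rho1 * t_aa + rho2 * t_ac

def mapping_constraint_ok(z):
    # every controller must be served by at least one Atomix node
    controllers = {j for i in z for j in z[i]}
    return all(any(z[i].get(j, 0) for i in z) for j in controllers)

daa = {0: {1: 2.0}, 1: {0: 2.0}}                   # Atomix-to-Atomix delays
dac = {0: {0: 1.0, 1: 3.0}, 1: {0: 2.0, 1: 1.0}}   # Atomix-to-controller delays
z = {0: {0: 1, 1: 0}, 1: {0: 0, 1: 1}}             # binary mapping z_ij
obj = atomix_objective(daa, dac, z, n_atomix=2)
```

In practice the deep-reinforcement-learning solver would minimize `obj` over candidate Atomix placements, keeping only placements for which `mapping_constraint_ok` holds.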
In this embodiment, based on network environments such as a variable complex SDN scene and an actual battlefield, a multi-controller deployment mechanism based on an SDN network and deep reinforcement learning is provided. Multiple controllers are efficiently and reasonably placed in the network, realizing cooperative management of the whole network, reducing network delay and bandwidth overhead in special application scenes, improving the utilization rate of network resources, balancing the loads of the controllers, and enhancing the robustness of the network, thereby reducing the probability of network failure caused by controller collapse. Comprehensively considering network delay, overhead, load, and other performance factors, a control-layer management method for special application scenes is established, realizing distributed management of each node in the network. In conclusion, the innovation points of the invention are as follows:
(1) Establishing a multi-controller deployment model. A plurality of network performance optimization indexes are comprehensively selected according to the network requirements of the special application scene. To enable the control layer to manage the network cooperatively, so that the failure of one controller does not affect normal network operation, the synchronization overhead between controllers, the minimum security of the controllers, and the bandwidth constraint of the control-layer links are used as network performance optimization indexes in addition to delay and load. Considering these many factors that influence deployment allows the special requirements of special application scenes to be met and the controllers to be deployed reasonably.
Designing the multi-controller deployment model entails selecting and defining the network performance optimization indexes to consider when deploying controllers, obtaining an objective function by analyzing the relations among the optimization indexes and the deployment problem the model must solve, completing the establishment of the model, and solving it with a deep reinforcement learning algorithm.
(2) Aiming at the optimization of the synchronous overhead of the controllers in the multi-controller deployment model, the invention designs the synchronous data packet field format among the controllers according to the synchronous requirements of the controllers in a special application scene, thereby optimizing the network performance index of the controller deployment model in a targeted manner and establishing a more accurate and more flexible deployment model.
The control-layer synchronization packet information is designed so that the information synchronized among controllers matches the requirements of different application scenes, providing a flexible optimization target for the controller deployment scheme.
(3) And aiming at the control layer cooperative management, establishing an optimized deployment model of the controller cluster. The network control layer established based on the ONOS controller needs to utilize a distributed coordination framework Atomix, and the invention provides an Atomix node deployment model to optimize the deployment quantity and the deployment position of Atomix.
Since current versions of the ONOS controller can be physically separated from the Atomix nodes, making cluster deployment more flexible, an Atomix node deployment optimization scheme is designed to improve network synchronization performance.
(4) Aiming at the NP-Hard problem of multi-controller deployment, the method solves the model by using a deep reinforcement learning algorithm. When an actual network application scene is deployed, the distance between each node is long, and an optimal result needs to be calculated within a limited time. The deep reinforcement learning algorithm is low in complexity, and integrates historical network data learning into controller deployment and switch controller mapping decisions so as to adapt to a network environment.
Aiming at the established model, a deep reinforcement learning algorithm is applied to the solution of the deployment model, so that the result of the method is more efficient and reasonable.
In order to make the solution of the present embodiment more clearly understood, the following detailed description is provided:
control layer synchronous message design based on ONOS controller
Due to the complexity and changeability of application scenes, the invention needs to establish a large-scale, complex SDN network platform. Meanwhile, in a complex network environment the controller must frequently issue control information such as flow tables and service flows; a single controller can hardly undertake issuing control information for all nodes in the network, which may cause conflicts in control-information issuing and collapse of the control node. Therefore, to avoid a single point of failure and to increase the reaction speed and overall performance of the network, multiple controllers must be deployed across the network for network management.
In the mechanism of the present invention, the entire network is divided into a plurality of sub-domains, and each controller controls one area in the network. Because each controller needs to undertake the task of calculating the policy for the switches controlled by the controller, the controller in each domain needs to master not only the switch topology relationship in the control range of the controller, but also the switch topology relationship in the control range of other controllers. Therefore, a synchronous communication mechanism needs to be established between the domain controllers to synchronize the topology information of the domains with each other. Meanwhile, in order to avoid the out-of-control situation of the switch caused by the failure of the controller, when the corresponding controller of the switch fails, other non-failure controllers can still quickly and timely send control information to the out-of-control switch after taking over the out-of-control switch, and information such as calculated strategies and the like needs to be synchronized among all domain controllers.
Therefore, two controllers are taken as an example to describe the synchronization process between the controllers. As shown in fig. 2, the process is divided into five steps:
① each domain controller collects the topology information of the data layer;
② the controllers establish a connection through the TCP three-way handshake;
③ controller A and controller B each calculate their policies;
④ the controllers send synchronization data packets to each other for information synchronization;
⑤ after synchronization is completed, controller A and controller B send each other an ACK message to terminate the synchronization.
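The synchronization exchange of Fig. 2 can be sketched in-process as follows, with the TCP handshake and ACK exchange modeled as in-memory messages rather than real sockets; the packet fields below are placeholders and not the patent's synchronization packet format.

```python
# In-process sketch of the inter-controller synchronization flow: each
# controller holds its collected data-layer topology, exchanges SYNC packets
# with a peer, records the peer's topology, and answers with an ACK.

class Controller:
    def __init__(self, name, topology):
        self.name = name
        self.topology = topology   # topology collected from the data layer
        self.peer_topo = {}        # topology learned from peer controllers
        self.inbox = []            # stands in for the established TCP channel

    def send_sync(self, peer):
        # send a synchronization data packet (placeholder fields)
        peer.inbox.append({"type": "SYNC", "src": self.name, "topo": self.topology})

    def process_inbox(self, peer):
        # absorb peer topology and answer each SYNC with an ACK
        for msg in list(self.inbox):
            if msg["type"] == "SYNC":
                self.peer_topo[msg["src"]] = msg["topo"]
                peer.inbox.append({"type": "ACK", "src": self.name})
        self.inbox = [m for m in self.inbox if m["type"] != "SYNC"]

a = Controller("A", {"s1": ["s2"]})
b = Controller("B", {"s3": ["s4"]})
a.send_sync(b)          # mutual packet exchange
b.send_sync(a)
a.process_inbox(b)      # each side records the peer topology and ACKs
b.process_inbox(a)
```

After the exchange each controller holds the other domain's topology, which is what lets a surviving controller take over the switches of a failed peer.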
Aiming at the fourth step, the sending of synchronization data packets, the invention designs the format of the synchronization data packet. In the invention, the synchronization information among the controllers comprises topology information, flow table information, and specific service flow information. The three synchronization packet field formats are shown in Table 1, Table 2, and Table 3.
Table 1: topology information synchronization packet field format (table reproduced as an image in the original publication)
Table 2: flow table information synchronization packet field format (table reproduced as an image in the original publication)
Table 3: specific service flow information synchronization packet field format (table reproduced as an image in the original publication)
(II) Establishment of a flat, distributed multi-controller deployment model
The deployment of multiple controllers needs to address three key issues:
① given a network topology, how many controllers should be deployed;
② at which locations the controllers can most reasonably be deployed;
③ which controller should manage each switch.
Therefore, starting from the above three key problems, the invention establishes a deployment model of the controller according to a large-scale SDN and a complex network application scenario.
Physical network: the network topology is modeled as an undirected graph G(V, E), where V represents the set of switches and E represents the set of physical links. K denotes the number of controllers in the network and C = {c_1, ..., c_K} denotes the set of controllers; in the present invention each controller is placed at the position of a switch in the network, and p_θ indicates the deployment location of a controller. S_{θ_i} denotes the set of switches managed by controller θ_i, so the mapping relationship between the switches and the controllers can be expressed as the set of pairs (S_{θ_i}, c_i).
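A minimal sketch of this physical-network model, assuming BFS hop count as a stand-in for link delay and nearest-controller assignment for the switch-to-controller mapping; the topology, placement, and identifiers are illustrative.

```python
# Undirected graph G(V, E) as an adjacency dict; a controller placement
# (controller -> hosting switch); and the derived switch->controller mapping.
from collections import deque

def hop_delay(adj, src):
    """BFS shortest hop count from src to every node (toy stand-in for d_ij)."""
    dist = {src: 0}
    dq = deque([src])
    while dq:
        u = dq.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                dq.append(v)
    return dist

def nearest_controller_mapping(adj, placement):
    """Assign every switch to its closest controller (the x_ij assignment)."""
    dists = {c: hop_delay(adj, host) for c, host in placement.items()}
    return {s: min(placement, key=lambda c: dists[c].get(s, float("inf")))
            for s in adj}

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # 4-switch line topology
placement = {"c1": 0, "c2": 3}                 # controllers hosted at switches 0 and 3
mapping = nearest_controller_mapping(adj, placement)
```

Each switch lands in the domain of its nearest controller, which is the mapping set the delay indexes below are evaluated over.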
The average propagation delay between the switch and the controller and the average propagation delay between the controllers are related to the deployment positions of the controllers. On this basis, the selected network performance optimization indexes are as follows:

① Average propagation delay between the switch and the controller: the average of the propagation delay between each switch and its controller, as shown in (equation 1), where N = |V| is the number of switches, d_ij is the shortest link delay between switch i and controller j, and x_ij is a binary variable whose value 1 indicates a successful connection of switch i to controller j.

T_scavg = (1/N) Σ_{s_i∈S} Σ_{c_j∈C} d_ij · x_ij    (equation 1)

② Average propagation delay between controllers: the average of the propagation delay between every pair of controllers, as shown in (equation 2), where K is the number of controllers and d^c_jk is the shortest link delay between controllers j and k.

T_ccavg = (1/(K(K-1))) Σ_{c_j∈C} Σ_{c_k∈C, c_k≠c_j} d^c_jk    (equation 2)

③ Synchronization overhead between controllers: the communication overhead incurred when controllers synchronize, as shown in (equation 3); it is related to the data packet format and the synchronization frequency between controllers. Here l_jk is the packet length for synchronization between controller j and controller k, and ps_jk is the frequency of synchronization between controller j and controller k.

C_cc = Σ_{c_j∈C} Σ_{c_k∈C, c_k≠c_j} l_jk · ps_jk    (equation 3)
④ Minimum security of synchronization between controllers:
Because each controller can obtain the topology information of the whole network through communication with the other controllers, the number of deployed controllers must be minimized to reduce the probability of information leakage when the network is attacked and to enhance network security. In the model, a security factor μ is set to constrain the number of deployed controllers.
Aiming at the description of the optimization indexes selected by the model, the multi-controller deployment model based on time delay and synchronous overhead is established in the invention. The deployment of the controller is realized by comprehensively considering the time delay and the synchronization overhead condition and combining the constraint conditions of the load of the controller, the minimum safety of synchronization and the like. The optimization objective is shown in (equation 4).
minimize (αT_scavg + βT_ccavg + ρC_cc) + μ·K    (equation 4)
In the model, the average propagation delay between switches and controllers and the average propagation delay between controllers are in tension: minimizing the switch-controller delay pushes controllers toward the switches, which spreads the controllers apart and increases the inter-controller delay, and vice versa. Under such a relationship no solution optimizes all performance indexes simultaneously; improving one index generally sacrifices the others. Therefore, the invention assigns a weight to each network performance index, with α + β + ρ = 1, so that an efficient and flexible deployment model can be configured according to the emphasis placed on each index in a specific application scenario.
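Equation 4 then reads directly as code; the weight values below are illustrative defaults, not taken from the patent:

```python
def deployment_objective(t_sc, t_cc, c_cc, k,
                         alpha=0.4, beta=0.3, rho=0.3, mu=0.1):
    """Weighted deployment cost of Equation 4: smaller is better.
    alpha, beta and rho must sum to 1."""
    assert abs(alpha + beta + rho - 1.0) < 1e-9
    return (alpha * t_sc + beta * t_cc + rho * c_cc) + mu * k
```

Raising alpha makes the switch-to-controller delay dominate the score, which is how the model is tuned to a scenario's emphasis.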
The model needs to satisfy the following constraint conditions:
Controller load constraint:

During controller deployment the model must respect the load limit of each controller, i.e., the number of switches managed by a controller cannot exceed a specific threshold. The difference in the number of switches managed by the controllers is defined as the average load difference ΔLB, shown in (Equation 7), and this value cannot exceed the specified threshold LB_C. L_opt is the desired optimal controller load. The spread of load among all controllers in the network is defined as the load deviation index LBI, shown in (Equation 6); the smaller this index, the better the network's load-balancing performance after the controllers are reasonably deployed.
(Equations 5–7, given as drawings in the original, define L_opt, LBI, and ΔLB respectively.)
Mapping relation constraint between switches and controllers:

In the model, K controllers are deployed in the network, each switch has exactly one mapping to a controller, and every switch must have a controller to which it belongs; the mapping therefore has to satisfy (Equation 8).
Σ_{c∈C} x_sc = 1, ∀s ∈ S
x_sc ≤ y_c
Σ_{c∈C} y_c = K
x_sc ∈ {0, 1}, y_c ∈ {0, 1}  (8)
Control-layer synchronization link bandwidth constraint:

In application environments such as a large-scale SDN or a complex network, the bandwidth demanded by synchronization information between controllers cannot exceed the bandwidth resource provided by the control-layer physical links. Let the bandwidth provided by the control-layer physical links be B_W; the bandwidth constraint is shown in (Equation 9).

Σ_{cj∈C} Σ_{ck∈C, k≠j} l_jk · p_sjk ≤ B_W  (9)
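The three constraints can be checked together for a candidate deployment. The sketch below uses our own names; x, y, l and p follow the symbols of Equations 8 and 9, and a simple per-controller load cap stands in for the load constraint (whose exact equations are given only as drawings):

```python
def feasible(x, y, k, l, p, bw, max_load):
    """Return True iff the deployment satisfies the mapping constraints
    (Eq. 8), the bandwidth constraint (Eq. 9) and a per-controller
    load cap.  x[s][c] is 1 iff switch s maps to site c; y[c] is 1 iff
    a controller is deployed at site c."""
    sites = len(y)
    for row in x:
        if sum(row) != 1:                 # each switch has exactly one controller
            return False
        if any(row[c] > y[c] for c in range(sites)):
            return False                  # only map to deployed controllers
    if sum(y) != k:                       # exactly K controllers deployed
        return False
    for c in range(sites):                # controller load cap
        if sum(row[c] for row in x) > max_load:
            return False
    sync_bw = sum(l[j][m] * p[j][m]       # Eq. 9: synchronization traffic
                  for j in range(sites) for m in range(sites) if m != j)
    return sync_bw <= bw
```

A solver (or the reinforcement-learning agent below) would call such a check to reject infeasible candidate placements.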
(III) Cluster deployment optimization model based on distributed coordination framework Atomix
Because the invention adopts the ONOS controller, cluster management in the ONOS environment requires the distributed coordination framework Atomix, which physically separates functions such as cluster management, service discovery, and persistent data storage from the ONOS nodes. Before an ONOS controller cluster is deployed, an Atomix cluster must first be formed for data storage and coordination, and the ONOS nodes are then configured with the list of Atomix nodes to connect to. In earlier versions of the ONOS controller, Atomix nodes had to be embedded in the controller itself to form a cluster and synchronize state, whereas in the current version functions such as state synchronization are moved into a separate Atomix cluster. The Atomix framework can be deployed on non-control nodes or embedded in control nodes; however, published references on Atomix node deployment are scarce, so effectively selecting the number and positions of Atomix nodes in a large-scale SDN is also important. Therefore, building on the model of sections (I) and (II), in which the Atomix nodes are embedded in the ONOS controllers, the invention establishes a model for the optimized deployment of Atomix nodes that determines their number and positions and realizes synchronization of network-wide state information.
In the Atomix node deployment optimization model, strong consistency between the Atomix nodes and the ONOS nodes, and among the Atomix nodes themselves, is maintained by the Raft protocol; the performance optimization indexes are selected and defined as follows.
① Average synchronization delay between Atomix nodes: represents the mean propagation delay generated by synchronization information between Atomix nodes, as shown in (Equation 10), where A is the number of Atomix nodes and d^a_jk is the shortest link delay between Atomix nodes j and k.

T_aaavg = (2/(A(A−1))) · Σ_{aj∈A} Σ_{ak∈A, k>j} d^a_jk  (10)
② Average synchronization delay between Atomix nodes and ONOS nodes: represents the mean propagation delay generated by synchronization information between the Atomix nodes and the ONOS nodes, as shown in (Equation 11), where K is the number of ONOS controllers, d^ac_ij is the shortest link delay between Atomix node i and ONOS node j, and z_ij is a binary variable whose value 1 indicates a successful connection of Atomix node i with ONOS controller j.

T_acavg = (1/K) · Σ_i Σ_j d^ac_ij · z_ij  (11)
Based on the optimization indexes above, the invention establishes a delay-based optimized deployment model for the distributed coordination framework Atomix. The optimization objective is shown in (Equation 12).
minimize ρ1·T_aaavg + ρ2·T_acavg  (12)
In the model, the average synchronization delay between Atomix nodes and the average synchronization delay between Atomix nodes and ONOS nodes constrain each other: concentrating the Atomix nodes reduces the delay among them but increases the delay between Atomix nodes and ONOS nodes, and vice versa. To balance the two, the invention sets weights with ρ1 + ρ2 = 1 and establishes an efficient and flexible Atomix node deployment model according to the specific application scenario.
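Equation 12 as code, with illustrative weights of our choosing; shifting weight toward the inter-Atomix term favors a concentrated placement, which is exactly the trade-off described above:

```python
def atomix_objective(t_aa, t_ac, rho1, rho2):
    """Weighted Atomix deployment cost of Equation 12 (rho1 + rho2 = 1)."""
    assert abs(rho1 + rho2 - 1.0) < 1e-9
    return rho1 * t_aa + rho2 * t_ac

# With most weight on the inter-Atomix delay, a concentrated placement
# (low t_aa, high t_ac) scores better than a spread-out one:
concentrated = atomix_objective(1.0, 6.0, 0.8, 0.2)  # 2.0
spread_out = atomix_objective(4.0, 2.0, 0.8, 0.2)    # 3.6
```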
The constraint conditions to be met by the model are as follows:
In the model, B Atomix nodes are deployed in the network, and at least one mapping relationship exists between the Atomix nodes and the ONOS nodes; the mapping therefore has to satisfy the condition shown in (Equation 13).
Σ_{i∈A} z_ij ≥ 1, ∀j ∈ C
Σ_{i∈A} y^a_i = B
z_ij ∈ {0, 1}, y^a_i ∈ {0, 1}  (13)
(IV) Adaptive multi-controller deployment algorithm framework design
For the established multi-controller deployment model, the invention proposes an adaptive multi-controller deployment algorithm based on deep reinforcement learning, aiming to deploy the controllers more efficiently and accurately. The controller deployment problem is converted into an MDP model and solved, with the state space, action space, state transition probability, and reward function forming the quadruple (S, A, P, R), defined as follows:
(1) state space S
In the invention, the state space is expressed as the controller placement of each node in the network and the network topology information at the current time t, represented as:

s_t = (g_t, b_t, f_t, ω_t, δ_t), with g_t = (t_t, c_t)
wherein, the meaning of each element is as follows:
g_t: representing the physical network topology information at time t, including t_t and c_t.
t_t: representing the delay of each link at time t.

c_t: representing the synchronization overhead of each control link at time t.

b_t: representing the load of the deployed control nodes at time t.

f_t: representing the probability that each node fails at time t.
ω_t: representing the controller placement of each node at time t, including K_t and P^c_t.

K_t: representing the number of controllers deployed at time t.

P^c_t: representing the deployment locations of the controllers at time t.
δ_t: representing the Atomix placement of each node at time t, including A_t and P^a_t.

A_t: representing the number of Atomix nodes deployed at time t.

P^a_t: representing the deployment locations of the Atomix nodes at time t.
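One possible container for the state tuple s_t, sketched with our own field names standing for g_t, b_t, f_t, ω_t and δ_t; all values are illustrative:

```python
from collections import namedtuple

# Hypothetical state container; the field names are ours, not the patent's.
State = namedtuple("State",
                   ["topology", "loads", "fail_prob", "ctrl_place", "atomix_place"])

s_t = State(
    topology={"link_delay": [[0, 2], [2, 0]],   # t_t: per-link delay
              "sync_cost": [[0, 1], [1, 0]]},   # c_t: per-control-link overhead
    loads=[5, 7],                               # b_t: deployed-controller loads
    fail_prob=[0.01, 0.02],                     # f_t: per-node failure probability
    ctrl_place=(2, [0, 3]),                     # omega_t: (K_t, controller positions)
    atomix_place=(1, [1]),                      # delta_t: (A_t, Atomix positions)
)
```

Flattening such a tuple into a vector gives the input of the Q-network described below in the text.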
(2) Action space A
In the invention, the action space A expresses the number of controllers to be deployed in the network, the node positions on which the controllers are deployed, and the mapping relationship between switches and controllers, represented as:

a_t = (p_θ, S_C), θ ∈ {1, 2, ..., K}
a′_t = p_c, c ∈ {1, 2, ..., B}
wherein, the meaning of each element is as follows:
p_θ: representing the position at which a controller is deployed.

p_c: representing the position at which an Atomix node is deployed.
S_C: representing the mapping between switches and controllers.
K: representing the number of controllers that need to be deployed in the network.
B: representing the number of Atomix nodes that need to be deployed in the network.
(3) Probability of state transition P
In the invention, the state transition probability represents the probability of transitioning from the current state s_t to the next state s_{t+1} after performing action a_t, expressed as:

P = P(s_{t+1} | s_t, a_t)
wherein, the meaning of each element is as follows:
s_t: representing the current state.

s_{t+1}: representing the next state.

a_t: representing an action taken in the current state.
(4) Reward function R
In the invention, each executed action generates a reward according to the set reward function; the larger the reward, the higher the value of the action and the better the performance the controller deployment can obtain. Therefore, the average communication delay between switches and controllers, the average communication delay between controllers, the synchronization overhead between controllers, and the minimum security of the controllers are set as the reward functions, represented as:

r_1 = −((α·T_1t + β·T_2t + ρ·C_cc) + μ·K)
r_2 = −(ρ1·T_3t + ρ2·T_4t)
wherein, the meaning of each element is as follows:
α: and the proportion of the average communication delay between the switch and the controller in the reward penalty measures in the deployment process is represented.
Beta: representing the proportion of the average communication delay between controllers in the reward penalty measure during deployment.
ρ: representing the proportional amount of synchronization overhead between controllers in a reward penalty measure during deployment.
μ: representing a safety factor for the controller during deployment.
ρ1: and the communication delay among the Atomix accounts for the proportion of the reward penalty measures in the deployment process.
ρ2: and the communication delay between the Atomix and the controller in the deployment process accounts for the proportion of the reward penalty measures.
T1t: representing the average communication delay between the switch and the controller during deployment.
T2t: representing the average communication delay between controllers of the deployment process.
T3t: representing the inter-Atomix communication delay during deployment.
T4t: representing the communication delay between the Atomix and the controller during deployment.
Ccc: representing the synchronization overhead between the controllers of the deployment process.
K: representing the number of controllers to deploy the process.
In the invention, the reward function accounts for the importance of link delay to the deployment, assigning different weights to the switch-controller delay and the inter-controller delay according to the actual application scenario; it also accounts for the synchronization overhead and the minimum security, so the better the deployment result meets these four objectives, the larger the reward value.
To let the model select the optimal action in a given network state while obtaining the maximum cumulative reward, and thus make the deployment result more accurate, the invention builds a neural network structure that allows the agent to better perceive environmental information such as the network topology state, and hence to generate a better policy through interactive learning with the environment.
The state at each time step is the input of the neural network, and the dimension of the input state determines the number of input-layer neurons. The two middle layers of the network are fully connected; the output is the Q value of every possible action in the input state, and the number of output neurons is determined by the size of the action set.
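The described shape — a state vector in, two fully connected hidden layers, one Q value per action out — can be sketched in plain Python; the layer sizes, random initialization, and ReLU activation are our illustrative choices, not details from the patent:

```python
import random

random.seed(0)

def init_layer(n_in, n_out):
    """One fully connected layer: weight matrix plus bias vector."""
    w = [[random.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return w, [0.0] * n_out

def forward(layers, state):
    """Two ReLU hidden layers, linear output: one Q-value per action."""
    h = state
    for idx, (w, b) in enumerate(layers):
        out = [sum(h[i] * w[i][j] for i in range(len(h))) + b[j]
               for j in range(len(b))]
        if idx < len(layers) - 1:       # ReLU on the hidden layers only
            out = [max(v, 0.0) for v in out]
        h = out
    return h

# e.g. a 5-dimensional state, two hidden layers of 16 neurons, 4 actions:
q_net = [init_layer(5, 16), init_layer(16, 16), init_layer(16, 4)]
q = forward(q_net, [0.0] * 5)           # one Q-value per possible action
```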
The DQN algorithm adopted by the invention is an off-policy learning method; the parameters of the neural network are set, and each control node in the network is deployed according to the output of the trained model. The structure of the neural network is shown in fig. 3. The following gives the design of the multi-controller deployment decision algorithm training flow and the Atomix deployment decision algorithm training flow.
(The training flows of the multi-controller deployment decision algorithm and the Atomix deployment decision algorithm are given as tables in the original.)
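A DQN-style training flow of the kind those tables describe can be sketched as below. This is a schematic stand-in, not the patent's algorithm: a tabular Q function replaces the neural network, and the environment and hyper-parameters are toy assumptions — only the experience replay, epsilon-greedy exploration, and periodic target-network sync mirror the described DQN structure.

```python
import random
from collections import defaultdict, deque

class TabularQ:
    """Toy stand-in for the Q-network (tabular instead of neural)."""
    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.q = defaultdict(float)          # (state, action) -> Q value
    def best(self, s):
        return max(range(self.n_actions), key=lambda a: self.q[(s, a)])
    def max_q(self, s):
        return max(self.q[(s, a)] for a in range(self.n_actions))
    def update(self, s, a, target, lr=0.5):
        self.q[(s, a)] += lr * (target - self.q[(s, a)])
    def copy_from(self, other):
        self.q = other.q.copy()

def train(env, q_net, target_net, episodes=50, eps=0.2, gamma=0.9, batch=8):
    buffer = deque(maxlen=1000)              # experience replay memory
    for ep in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = (random.randrange(q_net.n_actions) if random.random() < eps
                 else q_net.best(s))         # epsilon-greedy exploration
            s2, r, done = env.step(a)
            buffer.append((s, a, r, s2, done))
            s = s2
            if len(buffer) >= batch:         # replay a random mini-batch
                for bs, ba, br, bs2, bd in random.sample(list(buffer), batch):
                    target = br if bd else br + gamma * target_net.max_q(bs2)
                    q_net.update(bs, ba, target)
        if ep % 5 == 0:                      # periodic target-network sync
            target_net.copy_from(q_net)
    return q_net
```

In the patent's setting, the state would be the tuple s_t defined above, an action a deployment position (and switch mapping), and the reward r_1 or r_2.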
Example 2
As shown in fig. 4, this embodiment provides a system for the multi-controller deployment method based on an SDN network and deep reinforcement learning, including:
a first performance optimization index obtaining module M1, configured to obtain a first performance optimization index of a multi-controller deployment according to an SDN network structure; the first performance optimization indicator comprises: average propagation delay between the switch and the controller, average propagation delay between the controllers, synchronization overhead between the controllers, and minimum security of synchronization between the controllers;
a first objective function establishing module M2, configured to establish a first objective function according to the first performance optimization index;
a first constraint condition obtaining module M3, configured to obtain a first constraint condition for multi-controller deployment; the first constraint condition comprises controller load constraint, mapping relation constraint of the switch and the controller and control layer synchronous link bandwidth constraint;
a multi-controller deployment model building module M4, configured to build a multi-controller deployment model according to the first objective function and the first constraint condition;
and the solving module M5 is used for solving the multi-controller deployment model by utilizing a deep reinforcement learning algorithm based on a Markov decision model to obtain a multi-controller deployment scheme.
For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and its core concept; meanwhile, a person skilled in the art may, according to the idea of the present invention, vary the specific embodiments and the application range. In view of the above, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. A multi-controller deployment method based on an SDN network and deep reinforcement learning is characterized by comprising the following steps:
acquiring a first performance optimization index of multi-controller deployment according to an SDN network structure; the first performance optimization indicator comprises: average propagation delay between the switch and the controller, average propagation delay between the controllers, synchronization overhead between the controllers, and minimum security of synchronization between the controllers;
establishing a first objective function according to the first performance optimization index;
acquiring a first constraint condition of multi-controller deployment; the first constraint condition comprises controller load constraint, mapping relation constraint of the switch and the controller and control layer synchronous link bandwidth constraint;
constructing a multi-controller deployment model according to the first objective function and the first constraint condition;
and solving the multi-controller deployment model by utilizing a deep reinforcement learning algorithm based on a Markov decision model to obtain a multi-controller deployment scheme.
2. The method of claim 1, wherein the average propagation delay between the switch and the controller is expressed as:

T_scavg = (1/N) · Σ_{si∈S} Σ_{cj∈C} d_ij · x_ij

wherein N is the number of switches, d_ij is the shortest link delay between switch i and controller j, x_ij is a binary variable whose value 1 indicates a successful connection of switch i to controller j, s_i is switch i, S is the set of switches, c_j is controller j, and C is the set of controllers.
The expression of the average propagation delay between the controllers is:

T_ccavg = (2/(K(K−1))) · Σ_{cj∈C} Σ_{ck∈C, k>j} d_jk

wherein c_k is controller k, K is the number of controllers, and d_jk is the shortest link delay between controllers;
The expression of the synchronization overhead among the controllers is:

C_cc = Σ_{cj∈C} Σ_{ck∈C, k≠j} l_jk · p_sjk

wherein l_jk is the packet length for synchronization between controller j and controller k, and p_sjk is the frequency of synchronization between controller j and controller k.
3. The method of claim 2, wherein the first objective function is:
minimize (α·T_scavg + β·T_ccavg + ρ·C_cc) + μ·K

wherein T_scavg represents the average propagation delay between the switch and the controller; T_ccavg is the average propagation delay among the controllers; C_cc is the inter-controller synchronization overhead; μ is a safety factor for realizing the minimum security of synchronization between the controllers; α, β, ρ are the weights of the first performance optimization indexes, and α + β + ρ = 1.
4. The method of claim 1, wherein the first constraint is:
the controller load constraints are:
(The load-constraint equations, given as a drawing in the original, bound ΔLB by LB_C.)

wherein L_opt is the desired optimal controller load number; L_c is the actual controller load number; L_j is the load number of controller j; LBI measures the load difference among all controllers in the network and is defined as the load deviation index; ΔLB is the difference in the number of switches managed by each controller and is defined as the average load difference; and LB_C is the threshold of ΔLB;
the mapping relation constraint of the switch and the controller is as follows:
Σ_{c∈C} x_sc = 1, ∀s ∈ S; x_sc ≤ y_c; Σ_{c∈C} y_c = K; x_sc, y_c ∈ {0, 1};
the control layer synchronous link bandwidth constraint is as follows:
Σ_{cj∈C} Σ_{ck∈C, k≠j} l_jk · p_sjk ≤ B_W

wherein B_W is the bandwidth provided by the control-layer physical links.
5. The method according to claim 3, wherein solving the multi-controller deployment model using a deep reinforcement learning algorithm based on a Markov decision model specifically comprises:
acquiring a state space of the Markov decision model according to the controller placement condition and the network topology information of each node in the network at the current moment;
acquiring an action space of the Markov decision model according to the number of controllers required to be deployed in a network, the positions of nodes deployed by the controllers and a mapping relation between a switch and the controllers;
acquiring the state transition probability of the Markov decision model according to the probability of transition to the next state after executing a certain action in the current state;
acquiring a reward function of the Markov decision model according to the average communication time delay between the switch and the controller, the average communication time delay between the controllers, the synchronous overhead between the controllers and the minimum safety of the controllers;
obtaining a multi-controller deployment scheme based on the state space, the action space, the state transition probability, and the reward function of the Markov decision model.
6. The method of claim 1, further comprising: embedding an Atomix node on the basis of the multi-controller deployment model, and performing optimized deployment on the Atomix node, wherein the method specifically comprises the following steps:
obtaining a second performance optimization index of Atomix node deployment; the second performance optimization index comprises the average synchronization delay between the Atomix nodes and the average synchronization delay between the Atomix nodes and the controller nodes;
Constructing a second objective function according to the second performance optimization index;
acquiring a second constraint condition of Atomix node deployment; the second constraint condition is that the number of Atomix nodes deployed in the network is B, and at least one mapping relation exists between the Atomix nodes and the controller nodes;
constructing an Atomix node deployment model according to the second objective function and the second constraint condition;

and solving the Atomix node deployment model by using a deep reinforcement learning algorithm.
7. The method of claim 6, wherein the average synchronization delay between the Atomix nodes is:

T_aaavg = (2/(A(A−1))) · Σ_{aj∈A} Σ_{ak∈A, k>j} d^a_jk

wherein A is the number of Atomix nodes, d^a_jk is the shortest link delay between Atomix nodes, and a_i, a_j, a_k denote Atomix nodes i, j, k;
the average synchronization delay between the Atomix node and the controller node is:

T_acavg = (1/K) · Σ_i Σ_j d^ac_ij · z_ij

wherein d^ac_ij is the shortest link delay between the Atomix node and the controller node, and z_ij is a binary variable whose value 1 indicates a successful connection of Atomix node i with controller j.
8. The method of claim 7, wherein the second objective function is:

minimize ρ1·T_aaavg + ρ2·T_acavg

wherein ρ1, ρ2 are the weights set for the second performance optimization indexes, and ρ1 + ρ2 = 1.
9. The method of claim 7, wherein the at-least-one mapping relationship between the Atomix node and the controller node satisfies:

Σ_{i∈A} z_ij ≥ 1, for every controller node j.
10. a system based on the SDN network and deep reinforcement learning multi-controller deployment method of any one of claims 1-9, comprising:
the first performance optimization index acquisition module is used for acquiring a first performance optimization index deployed by the multiple controllers according to the SDN network structure; the first performance optimization indicator comprises: average propagation delay between the switch and the controller, average propagation delay between the controllers, synchronization overhead between the controllers, and minimum security of synchronization between the controllers;
the first objective function establishing module is used for establishing a first objective function according to the first performance optimization index;
the first constraint condition acquisition module is used for acquiring a first constraint condition of multi-controller deployment; the first constraint condition comprises controller load constraint, mapping relation constraint of the switch and the controller and control layer synchronous link bandwidth constraint;
the multi-controller deployment model building module is used for building a multi-controller deployment model according to the first objective function and the first constraint condition;
and the solving module is used for solving the multi-controller deployment model by utilizing a deep reinforcement learning algorithm based on a Markov decision model to obtain a multi-controller deployment scheme.
CN202111641069.9A 2021-12-29 2021-12-29 Multi-controller deployment method and system based on SDN (software defined network) and deep reinforcement learning Pending CN114355775A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111641069.9A CN114355775A (en) 2021-12-29 2021-12-29 Multi-controller deployment method and system based on SDN (software defined network) and deep reinforcement learning


Publications (1)

Publication Number Publication Date
CN114355775A true CN114355775A (en) 2022-04-15

Family

ID=81103213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111641069.9A Pending CN114355775A (en) 2021-12-29 2021-12-29 Multi-controller deployment method and system based on SDN (software defined network) and deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114355775A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104980296A (en) * 2014-04-11 2015-10-14 华为技术有限公司 OpenFlow multi-controller system and management method thereof
CN107276662A (en) * 2017-07-27 2017-10-20 大连大学 A kind of software definition Information Network multi-controller dynamic deployment method
CN108650131A (en) * 2018-05-10 2018-10-12 合肥工业大学 The processing system disposed for multi-controller in SDN network
CN108777636A (en) * 2018-05-25 2018-11-09 陕西师范大学 A kind of multi-controller Optimization deployment method of robust in software defined network
CN110120892A (en) * 2019-04-30 2019-08-13 山东工商学院 SDN multi-controller dispositions method and system based on improved glowworm swarm algorithm



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination