CN115834371A

CN115834371A - Space-ground converged network cross-domain SFC deployment method based on hybrid state synchronous DRL

Info

Publication number: CN115834371A
Application number: CN202211457628.5A
Authority: CN
Inventors: 武楠; 李浩阳; 张婷婷; 李彬
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2022-11-21
Filing date: 2022-11-21
Publication date: 2023-03-21
Anticipated expiration: 2042-11-21
Also published as: CN115834371B

Abstract

The invention discloses a cross-domain SFC deployment method for a space-ground converged network based on deep reinforcement learning (DRL) with hybrid state synchronization. On the one hand, synchronizing only key parameters effectively reduces the large overhead of synchronizing the global network state of the space-ground converged network in real time, which makes the method suitable for resource-limited satellite control nodes. On the other hand, by synchronizing the global network state information and applying the idea of the digital twin, the virtual copy Actor-Critic networks of A3C obtain rich and sufficiently realistic sample sets, which further improves the SFC deployment success rate of the global Actor-Critic network and the real copy Actor-Critic networks. Meanwhile, the real copy Actor-Critic networks of A3C make optimal cross-domain SFC deployment decisions for the space-ground converged network based on the key network state parameters synchronized in real time, thereby supporting space-ground converged network services.

Description

Space-ground converged network cross-domain SFC deployment method based on hybrid state synchronous DRL

Technical Field

The invention belongs to the technical field of network control, and in particular relates to a space-ground converged network cross-domain SFC deployment method based on hybrid state synchronous DRL.

Background

The space-ground converged network builds on the terrestrial network and extends it with satellite communication networks, integrating the satellite and terrestrial networks. It offers high carrying capacity, wide coverage, and flexible networking, and is a key technology for next-generation information networks. However, its large spatio-temporal scale, highly dynamic topology, and heterogeneous links pose great challenges to its ability to support network services.

Deep reinforcement learning (DRL) is an effective decision optimization method that is widely applied in network control. By introducing an intelligent plane, DRL iteratively optimizes its own decisions based on the perceived network state and issues them to the control plane, thereby achieving optimal control of the network. In particular, cross-domain service function chain (SFC) deployment in the space-ground converged network can be modeled as a Markov decision process (MDP), whose Bellman equation DRL solves iteratively to support space-ground converged network services. However, conventional DRL methods face problems such as a high-dimensional network state space, high data-set acquisition overhead, and difficult cross-domain collaboration.

To deploy SFCs optimally, the intelligent plane must acquire global network state information in real time, which imposes a large communication overhead on the space-ground converged network and increases the complexity of the DRL algorithm. On the other hand, in a resource-limited space-ground converged network, the acquisition of network state data is restricted, so the sample set is small and can hardly support DRL training.

Disclosure of Invention

In view of this, the invention provides a space-ground converged network cross-domain SFC deployment method based on hybrid state synchronous DRL, built on the asynchronous advantage actor-critic (A3C) algorithm in deep reinforcement learning. A3C has one global Actor-Critic network and several copy Actor-Critic networks, and updates the neural networks asynchronously and in parallel. Specifically, the invention designs a hybrid network state synchronization mechanism. On the one hand, key parameters of the network state of every domain of the space-ground converged network are synchronized in real time and used to train some of the copy Actor-Critic networks and the global Actor-Critic network. On the other hand, when the network is idle, the global network state information of the space-ground converged network is synchronized, key parameters are generated from it through a digital twin, and these are used to train the remaining copy Actor-Critic networks and the global Actor-Critic network. Cross-domain SFC deployment is thereby achieved.

A space-ground converged network cross-domain SFC deployment method based on hybrid state synchronous DRL uses an A3C network for training, wherein: one Actor-Critic network serves as the global Actor-Critic network $\Theta^G$; $P$ Actor-Critic networks form the set of real copy Actor-Critic networks; $Q$ Actor-Critic networks form the set of virtual copy Actor-Critic networks; and $P$ and $Q$ are each not less than 1;

determining the space-ground converged network key parameter vector required when deploying the $i$-th virtual network function (VNF) of the SFC, and the network state vector $s_i$ of the space-ground converged network;

For training optimization of the A3C network, the following two processes are carried out simultaneously:

The first process: for each real copy Actor-Critic network, when the $i$-th virtual network function (VNF) $v_i$ is to be deployed, $s_i$ is obtained from the key parameters of the space-ground converged network synchronized in real time and is input into the real copy Actor-Critic network to obtain the deployment decision $a_i$ for $v_i$, which is issued to the control plane. The control plane deploys $v_i$ to the corresponding domain according to $a_i$. A return is obtained from the result of deploying $v_i$, a policy gradient is calculated based on the return and $a_i$, and the Critic network and the Actor network of the real copy Actor-Critic network are updated in sequence, until all VNFs are deployed. Each time one SFC deployment is completed, the updated gradient of the real copy Actor-Critic network is uploaded to the global Actor-Critic network $\Theta^G$, the Actor network and the Critic network of $\Theta^G$ are updated in sequence, and finally the parameters of $\Theta^G$ are copied to the real copy Actor-Critic network.

The second process: for each virtual copy Actor-Critic network, when the $i$-th virtual network function (VNF) $v_i$ is to be deployed, key parameters are calculated from the global network state synchronized by the intelligent plane, and $s_i$ is obtained from them and input into the virtual copy Actor-Critic network to obtain the deployment decision $a_i$ for $v_i$. In space-ground converged network simulation software, $a_i$ is executed based on the network state of each domain: $v_i$ is deployed to the corresponding domain and its execution within the domain is simulated to obtain a return. A policy gradient is calculated based on the return and $a_i$, and the Critic network and the Actor network of the virtual copy Actor-Critic network are updated in sequence, until all VNFs are deployed. Each time one SFC deployment is completed, the updated gradient of the virtual copy Actor-Critic network is uploaded to the global Actor-Critic network $\Theta^G$, the Actor network and the Critic network of $\Theta^G$ are updated in sequence, and finally the parameters of $\Theta^G$ are copied to the real copy Actor-Critic network.

The deployment of the SFC is completed according to the first process and the second process.
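The data flow of the two processes can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `GlobalNet`, `worker`, `toy_gradient`, and `train` are hypothetical names, the policy gradient is replaced by the gradient of a toy quadratic objective, and, as a simplification, every worker (real or virtual) pulls the updated global parameters after pushing its gradient.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class GlobalNet:
    """Stands in for the global Actor-Critic network (hypothetical, parameters only)."""
    params: list
    lr: float = 0.1
    lock: threading.Lock = field(default_factory=threading.Lock)
    updates: int = 0

    def apply_gradient(self, grad):
        # Apply one worker's gradient, then hand back a copy of the new parameters.
        with self.lock:
            self.params = [p - self.lr * g for p, g in zip(self.params, grad)]
            self.updates += 1
            return list(self.params)


def toy_gradient(params, target):
    # Assumption: placeholder for the policy gradient of one SFC deployment episode.
    # It is the gradient of 0.5 * ||params - target||^2.
    return [p - t for p, t in zip(params, target)]


def worker(global_net, kind, episodes, target):
    # kind is "real" (key parameters synchronized live) or "virtual" (digital-twin
    # simulation). Both use the same toy objective here; only the label differs.
    local = list(global_net.params)
    for _ in range(episodes):
        grad = toy_gradient(local, target)       # one complete SFC deployment
        local = global_net.apply_gradient(grad)  # push gradient, pull parameters


def train(num_real=2, num_virtual=3, episodes=50):
    net = GlobalNet(params=[0.0, 0.0])
    target = [1.0, -2.0]
    kinds = ["real"] * num_real + ["virtual"] * num_virtual
    threads = [threading.Thread(target=worker, args=(net, k, episodes, target))
               for k in kinds]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return net
```

Calling `train()` runs two real and three virtual workers concurrently against one shared parameter set. In a real system the real workers would be driven by live key parameters and the virtual workers by the simulation software.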

Preferably, the space-ground converged network key parameter vector is:

[formula not reproduced]

and the network state vector $s_i$ of the space-ground converged network is:

[formula not reproduced]

where the set of the $N$ network domains of the space-ground converged network is $D$, the set of the $K$ inter-domain links is $L$, the set of the $I$ virtual network functions (VNFs) contained in the SFC is $V$, and the set of the $J$ resource types is $R$; the index $i'$ ranges over $1 \le i' \le I$, which includes $i$; for the $i$-th VNF $v_i \in V$, the set of authorized domains in which it may be deployed is $D_i$, its demand for the $j$-th resource type $r_j \in R$ is $\alpha_{i,j}$, the bandwidth it requires for cross-domain transmission is $w_i$, and the maximum delay of the whole SFC is $\tau^+$; the $n$-th domain $d^n \in D$ has a remaining amount of each resource $r_j$ and a processing delay for VNF $v_i$; the $k$-th inter-domain link $l_k \in L$ has bandwidth $\gamma^k$ and delay $\tau^{l,k}$; when $d^n$ is an unauthorized domain, [formula not reproduced]; when no link exists, $\gamma^k = 0$ and $\tau^{l,k} = \infty$;

$\{\chi_{i'}\}_{1 \le i' \le I}$ indicates the deployment state of the VNFs: if $\chi_{i'} = n \ne 0$, then $v_{i'}$ is deployed on domain $d^n$; if $\chi_{i'} = 0$, then $v_{i'}$ has not yet been deployed; and $\chi_{i'} = 0$ for all $i' \ge i$.
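As an illustration of how such a state vector might be assembled in practice, the sketch below flattens the key parameters and the deployment indicator $\chi$ into one list of numbers. The exact layout of $s_i$ is not reproduced in this text, so the field order, the `KeyParams` container, the `build_state` function, and the clipping of infinite delays to `delay_cap` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

INF = float("inf")


@dataclass
class KeyParams:
    """Hypothetical container for the key parameters seen when deploying v_i."""
    authorized: List[int]                 # [n] 1 if domain d^n is authorized for v_i
    resource_margin: List[List[float]]    # [n][j] remaining amount of r_j in d^n
    proc_delay: List[float]               # [n] processing delay of d^n for v_i
    link_bandwidth: List[float]           # [k] gamma^k, 0 when the link is absent
    link_delay: List[float]               # [k] tau^{l,k}, inf when the link is absent


def build_state(key: KeyParams, chi: List[int], i: int,
                delay_cap: float = 1e6) -> List[float]:
    """Flatten key parameters and chi into one state vector for step i (1-based).

    chi[i' - 1] = n means v_{i'} sits on domain d^n; 0 means not yet deployed.
    """
    if any(c != 0 for c in chi[i - 1:]):
        raise ValueError("chi must be 0 for every i' >= i")

    def clip(x: float) -> float:
        # Assumption: infinite delays are capped so a neural network gets finite input.
        return min(x, delay_cap)

    vec = [float(a) for a in key.authorized]
    for row in key.resource_margin:
        vec.extend(row)
    vec.extend(clip(d) for d in key.proc_delay)
    vec.extend(key.link_bandwidth)
    vec.extend(clip(d) for d in key.link_delay)
    vec.extend(float(c) for c in chi)
    return vec
```

With $N = 2$ domains, $J = 1$ resource type, $K = 1$ link, and $I = 3$ VNFs, the resulting vector has $2 + 2 + 2 + 1 + 1 + 3 = 11$ entries.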

Preferably, when the A3C network is trained, the constraints are enforced in the Actor network by means of an action mask, that is, decisions that do not satisfy the constraints are masked out; the constraints comprise:

1) One VNF can be deployed to only one domain, and a physical link must exist between two consecutively deployed VNFs;

2) The resources occupied by all VNFs deployed in one domain cannot exceed the resource margin of that domain;

3) The bandwidth occupied by all VNFs carried by one link cannot exceed the bandwidth of that link;

4) The sum of the intra-domain processing delays of all VNFs and the cross-domain transmission delays cannot exceed the maximum delay of the SFC.
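An action mask of this kind can be sketched as below. The example is a simplified assumption rather than the patented procedure: `feasible_domains` and `masked_policy` are hypothetical names, the check covers authorization, link existence, resource margin, and remaining delay budget for a single step, and the link bandwidth check of constraint 3) is omitted for brevity.

```python
import math
from typing import List, Optional


def feasible_domains(prev: Optional[int], authorized: List[bool],
                     has_link: List[List[bool]], demand: List[float],
                     margin: List[List[float]], proc_delay: List[float],
                     hop_delay: List[List[float]], delay_left: float) -> List[bool]:
    """Return one flag per domain n: may the next VNF be placed on domain n?

    prev is the domain hosting the previous VNF, or None for the first VNF.
    """
    flags = []
    for n in range(len(authorized)):
        # Constraint 1: staying in the same domain or following an existing link.
        reachable = prev is None or prev == n or has_link[prev][n]
        # Constraint 2: every resource demand fits into the domain's margin.
        enough = all(margin[n][j] >= demand[j] for j in range(len(demand)))
        # Constraint 4 (per step): processing plus transit delay fits the budget.
        transit = 0.0 if prev is None or prev == n else hop_delay[prev][n]
        in_time = proc_delay[n] + transit <= delay_left
        flags.append(bool(authorized[n] and reachable and enough and in_time))
    return flags


def masked_policy(logits: List[float], feasible: List[bool]) -> List[float]:
    """Softmax restricted to feasible actions; masked actions get probability 0."""
    if not any(feasible):
        raise ValueError("no feasible action")
    top = max(l for l, ok in zip(logits, feasible) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, feasible)]
    total = sum(exps)
    return [e / total for e in exps]
```

Because masked actions receive probability exactly 0, sampling from the returned distribution can never select a domain that violates a checked constraint.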

Preferably, the return is calculated by the following return function:

[formula not reproduced]

where $S \in \{0,1\}$ is a Boolean variable indicating whether the service function chain is deployed successfully ($S = 1$ for success, $S = 0$ for failure); $R_S$ and $R_F$ are the reward for a successful deployment and the penalty for a failed deployment, respectively; the intra-domain resource term denotes the intra-domain resources required to deploy all VNFs in $V$ in constraint 2); $c^{l,k}$ denotes the link bandwidth required to deploy all VNFs in $V$ in constraint 3); $\tau^{v,l}$ denotes the processing delay required to deploy all VNFs in $V$ in constraint 4); and the weight factors corresponding to the intra-domain resource term, $c^{l,k}$, and $\tau^{v,l}$ are the resource weight factor, $p^{l,k}$, and $p^{v,l}$, respectively.
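The exact return function is not reproduced in this text, so the sketch below only illustrates one assumed shape that is consistent with the definitions above: a success reward reduced by weighted resource, bandwidth, and delay costs, or a fixed penalty on failure. The function name `sfc_return`, the parameter names, and the sign conventions are all hypothetical and should not be read as the patent's formula.

```python
def sfc_return(success: bool, r_success: float, r_failure: float,
               resource_cost: float, bandwidth_cost: float, delay_cost: float,
               w_resource: float = 1.0, w_bandwidth: float = 1.0,
               w_delay: float = 1.0) -> float:
    """Assumed return: reward minus weighted costs on success, penalty on failure."""
    if not success:
        # Assumption: a failed deployment yields only the failure penalty R_F.
        return -r_failure
    weighted_cost = (w_resource * resource_cost
                     + w_bandwidth * bandwidth_cost
                     + w_delay * delay_cost)
    return r_success - weighted_cost
```

Under this assumed form, raising a weight factor makes the agent favor deployments that consume less of the corresponding resource.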

The invention has the following beneficial effects:

In the space-ground converged network cross-domain SFC deployment method based on hybrid state synchronous DRL, on the one hand, synchronizing only key parameters effectively reduces the large overhead of synchronizing the global network state of the space-ground converged network in real time, which makes the method suitable for resource-limited satellite control nodes. On the other hand, by synchronizing the global network state information and applying the idea of the digital twin, the virtual copy Actor-Critic networks of A3C obtain rich and sufficiently realistic sample sets, which further improves the SFC deployment success rate of the global Actor-Critic network and the real copy Actor-Critic networks. Meanwhile, the real copy Actor-Critic networks of A3C make optimal cross-domain SFC deployment decisions for the space-ground converged network based on the key network state parameters synchronized in real time, thereby supporting space-ground converged network services.

Drawings

Fig. 1 is a schematic block diagram of a space-ground convergence network cross-domain SFC deployment based on hybrid state synchronous DRL.

Detailed Description

The invention is described in detail below by way of example with reference to the accompanying drawings.

The invention aims to overcome the shortcomings of the prior art and to solve the problem of cross-domain SFC deployment at the large spatio-temporal scale of the space-ground converged network, and to this end provides a space-ground converged network cross-domain SFC deployment method based on hybrid state synchronous DRL. Compared with SFC deployment methods based on global state synchronization or on key state synchronization alone, the proposed method offers low synchronization overhead, a high deployment success rate, and sufficient training samples.

The method is realized by the following technical scheme:

For cross-domain service function chain deployment at large spatio-temporal scale, the invention introduces deep reinforcement learning into the space-ground converged network, decouples the network logically and functionally, and constructs a logical network model consisting of a physical plane, a control plane, and an intelligent plane, as shown in Fig. 1.

In the method provided by the invention, the key parameters synchronized in real time are used to update the real copy Actor-Critic networks in A3C, while the synchronized global network state is used to generate virtual key parameters by a digital twin method, and these are used to update the virtual copy Actor-Critic networks. Whenever a real copy or virtual copy Actor-Critic network completes a round of updates, its policy gradient is used to update the global Actor-Critic network. Let the global Actor-Critic network be $\Theta^G$, and let there be $P$ real copy Actor-Critic networks and $Q$ virtual copy Actor-Critic networks.

For the network state, let the set of the $N$ network domains of the space-ground converged network be $D$, the set of the $K$ inter-domain links be $L$, the set of the $I$ virtual network functions (VNFs) contained in the SFC be $V$, and the set of the $J$ resource types be $R$. For the $i$-th VNF $v_i \in V$, the set of authorized domains in which it may be deployed is $D_i$, its demand for the $j$-th resource type $r_j \in R$ is $\alpha_{i,j}$, the bandwidth it requires for cross-domain transmission is $w_i$, and the maximum delay of the whole SFC is $\tau^+$. The $n$-th domain $d^n \in D$ has a remaining amount of each resource $r_j$ (if $d^n$ is an unauthorized domain, then [formula not reproduced]) and a processing delay for VNF $v_i$. The $k$-th inter-domain link $l_k \in L$ has bandwidth $\gamma^k$ and delay $\tau^{l,k}$ (when no link exists, $\gamma^k = 0$ and $\tau^{l,k} = \infty$). Thus, when deploying VNF $v_i$, the key network parameters $\Delta_i$ needed for neural network training are the network topology, the authorized domains, the resource margin of each domain, the processing delays, and the bandwidth and delay of the inter-domain links, namely:

[formula not reproduced]

where one component is a parameter that is obtained only after $v_i$ has been deployed and whose value depends on the VNF deployment algorithm used within the domain, and the remaining components can be obtained directly from the network. Thus, the network state when deploying VNF $v_i$ can be represented as the vector

[formula not reproduced]

where the index $i'$ ranges over $1 \le i' \le I$, which includes $i$; $\{\chi_{i'}\}_{1 \le i' \le I}$ indicates the deployment state of the VNFs: if $\chi_{i'} = n \ne 0$, then $v_{i'}$ is deployed on domain $d^n$; if $\chi_{i'} = 0$, then $v_{i'}$ has not yet been deployed; and $\chi_{i'} = 0$ for all $i' \ge i$.

For the network decision, the deployment decision for VNF $v_i$ is $a_i \in D_i$. The shortest path from each domain to every other domain is assumed to be known (it can be computed with Dijkstra's algorithm). A Boolean variable $b \in \{0,1\}$ takes the value 0 when the event is false (no) and the value 1 when the event is true (yes).
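The shortest inter-domain paths mentioned above can be precomputed with Dijkstra's algorithm. The sketch below is a generic textbook version, assuming the inter-domain topology is given as an adjacency dictionary with link delays as edge weights; the function names and the domain labels `d1`, `d2`, `d3` are illustrative only.

```python
import heapq


def dijkstra(adj, src):
    """Return (dist, prev) for shortest paths from src.

    adj maps each domain to a list of (neighbor, delay) pairs with delay >= 0.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path to u was already found
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def shortest_path(prev, src, dst):
    """Rebuild the domain sequence from src to dst, or None if unreachable."""
    if dst != src and dst not in prev:
        return None
    nodes = [dst]
    while nodes[-1] != src:
        nodes.append(prev[nodes[-1]])
    return nodes[::-1]


adj = {
    "d1": [("d2", 5.0), ("d3", 1.0)],
    "d2": [("d1", 5.0), ("d3", 2.0)],
    "d3": [("d1", 1.0), ("d2", 2.0)],
}
dist, prev = dijkstra(adj, "d1")
route = shortest_path(prev, "d1", "d2")
```

In this toy topology the direct link from `d1` to `d2` has delay 5, so the route through `d3` with total delay 3 is selected.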

1) Domain mapping constraint. A Boolean variable denotes whether $v_i$ is deployed to domain $d^n$. The domain mapping constraint is:

[formula not reproduced]

This constraint states that a VNF can be deployed to only one domain.

In addition, two consecutive VNFs $v_i$ and $v_{i+1}$ must satisfy the following cross-domain mapping constraint:

[formula not reproduced]

where the Boolean variable $\rho^{n,m} \in \{0,1\}$ indicates whether a link exists from domain $d^n$ to domain $d^m$. This constraint states that a physical link must exist between two consecutively deployed VNFs. For example, when no link exists from domain $d^n$ to domain $d^m$, $v_i$ and $v_{i+1}$ cannot be deployed to domain $d^n$ and domain $d^m$, respectively.

2) Intra-domain resource constraint. For a successfully deployed SFC (that is, all VNFs in $V$), the demand for each intra-domain resource $r_j$ satisfies:

[formula not reproduced]

This constraint states that the resources occupied by all VNFs deployed within a domain cannot exceed the resource margin of that domain.

3) Cross-domain link bandwidth constraint. For a successfully deployed SFC (that is, all VNFs in $V$), the bandwidth constraint is:

[formula not reproduced]

where the Boolean variable $\xi^{n,m,k} \in \{0,1\}$ indicates whether the path from domain $d^n$ to domain $d^m$ contains link $l_k$. This constraint states that the bandwidth occupied by all VNFs carried by a link cannot exceed the bandwidth of that link.

4) Cross-domain link delay constraint. For a successfully deployed SFC (that is, all VNFs in $V$), the delay constraint is:

[formula not reproduced]

This constraint states that the total delay of one SFC, that is, the sum of the intra-domain processing delays of all VNFs and the cross-domain transmission delays, cannot exceed the maximum delay $\tau^+$.
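For orientation, the four constraints can be written out as below. This is a hedged reconstruction from the verbal descriptions, not the original formulas: the symbols $x_i^n$ (Boolean, 1 if $v_i$ is deployed on $d^n$), $c_j^n$ (remaining amount of $r_j$ in $d^n$), and $\tau_i^n$ (processing delay of $d^n$ for $v_i$) are assumed names, and $w_i$ is assumed to be the bandwidth needed between $v_i$ and $v_{i+1}$.

```latex
\begin{align}
  % 1) each VNF is placed in exactly one domain
  \sum_{n=1}^{N} x_i^n &= 1, && \forall i \\
  % 1) consecutive VNFs in different domains need a link
  x_i^n \, x_{i+1}^m &\le \rho^{n,m}, && \forall i < I,\ n \ne m \\
  % 2) intra-domain resource margin
  \sum_{i=1}^{I} x_i^n \, \alpha_{i,j} &\le c_j^n, && \forall n,\ \forall j \\
  % 3) inter-domain link bandwidth
  \sum_{i=1}^{I-1} \sum_{n=1}^{N} \sum_{m=1}^{N}
      x_i^n \, x_{i+1}^m \, \xi^{n,m,k} \, w_i &\le \gamma^k, && \forall k \\
  % 4) end-to-end delay
  \sum_{i=1}^{I} \sum_{n=1}^{N} x_i^n \, \tau_i^n
    + \sum_{i=1}^{I-1} \sum_{n=1}^{N} \sum_{m=1}^{N}
      x_i^n \, x_{i+1}^m \sum_{k=1}^{K} \xi^{n,m,k} \, \tau^{l,k} &\le \tau^+
\end{align}
```

Here $\xi^{n,m,k}$ is taken to be 0 for every $k$ when $n = m$, so that two consecutive VNFs in the same domain consume no inter-domain bandwidth and add no transmission delay.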

In the Actor network, the constraints are enforced by means of an action mask, that is, decisions that do not satisfy the constraints are masked out.

For the return function, weight factors, including $p^{l,k}$ and $p^{v,l}$, are defined to represent the cost to the space-ground converged network of a unit of each resource (computation, storage, bandwidth, delay, and so on). The return function is:

[formula not reproduced]

where the Boolean variable $S \in \{0,1\}$ indicates whether the service function chain is deployed successfully, and $R_S$ and $R_F$ are the reward for a successful deployment and the penalty for a failed deployment, respectively.

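For training and optimizing the A3C network, two cases are distinguished.

First, for each real copy Actor-Critic network, when the $i$-th VNF $v_i$ of the SFC is to be deployed, $s_i$ is obtained from the key parameters of the space-ground converged network synchronized in real time and is input into the real copy Actor-Critic network to obtain the deployment decision $a_i$ for $v_i$, which is issued to the control plane. The control plane deploys $v_i$ to the corresponding domain according to $a_i$. The post-deployment parameter is obtained from the result of deploying $v_i$, and the return $R$ is calculated. A policy gradient is calculated based on $R$ and $a_i$, and the Critic network and the Actor network of the real copy Actor-Critic network are updated in sequence, until all VNFs in $V$ of the SFC are deployed. Each time one SFC deployment is completed, the updated gradient of the real copy Actor-Critic network is uploaded to the global Actor-Critic network $\Theta^G$, the Actor network and the Critic network of $\Theta^G$ are updated in sequence, and finally the parameters of $\Theta^G$ are copied to the real copy Actor-Critic network.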

Second, for each virtual copy Actor-Critic network, when the $i$-th VNF $v_i$ of the SFC is to be deployed, key parameters are calculated from the global network state synchronized by the intelligent plane, and $s_i$ is obtained from them and input into the virtual copy Actor-Critic network to obtain the deployment decision $a_i$ for $v_i$. Following the idea of the digital twin, $a_i$ is executed in network simulation software based on the network state of each domain: $v_i$ is deployed to the corresponding domain, and an existing intra-domain VNF embedding algorithm, chosen at random, is run to simulate the execution of $v_i$ within the domain and obtain the post-deployment parameter. The return $R$ is calculated, a policy gradient is calculated based on $R$ and $a_i$, and the Critic network and the Actor network of the virtual copy Actor-Critic network are updated in sequence, until all VNFs in $V$ of the SFC are deployed. Each time one SFC deployment is completed, the updated gradient of the virtual copy Actor-Critic network is uploaded to the global Actor-Critic network $\Theta^G$, the Actor network and the Critic network of $\Theta^G$ are updated in sequence, and finally the parameters of $\Theta^G$ are copied to the real copy Actor-Critic network.

The training of the two networks is carried out in parallel without mutual interference.
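The per-step update order described above (Critic first, then Actor) can be illustrated with a small tabular advantage actor-critic step. This is a simplified sketch under assumptions: it uses lookup tables instead of neural networks, a one-step temporal-difference error as the advantage estimate, and hypothetical names (`actor_critic_step`, `values`, `prefs`); the description does not specify these details.

```python
import math


def softmax(prefs):
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(values, prefs, s, a, reward, s_next, done,
                      gamma=0.99, lr_critic=0.1, lr_actor=0.05):
    """Apply one update for the transition (s, a, reward, s_next).

    values maps a state to V(s); prefs maps a state to a list of action
    preferences whose softmax is the policy. Returns the advantage estimate.
    """
    target = reward + (0.0 if done else gamma * values[s_next])
    advantage = target - values[s]        # one-step TD error
    values[s] += lr_critic * advantage    # Critic update comes first
    probs = softmax(prefs[s])
    for b in range(len(probs)):           # Actor update: gradient of log pi(a|s)
        grad_log = (1.0 if b == a else 0.0) - probs[b]
        prefs[s][b] += lr_actor * advantage * grad_log
    return advantage
```

A positive advantage raises the preference of the chosen action and lowers the others, so deployment decisions that yield a better-than-expected return become more likely.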

In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

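1. A space-ground converged network cross-domain SFC deployment method based on hybrid state synchronous DRL, characterized in that an A3C network is used for training, wherein: one Actor-Critic network serves as the global Actor-Critic network $\Theta^G$; $P$ Actor-Critic networks form the set of real copy Actor-Critic networks; $Q$ Actor-Critic networks form the set of virtual copy Actor-Critic networks; and $P$ and $Q$ are each not less than 1;

determining the space-ground converged network key parameter vector required when deploying the $i$-th virtual network function (VNF) of the SFC, and the network state vector $s_i$ of the space-ground converged network;

for training optimization of the A3C network, the following two processes are carried out simultaneously:

the first process: for each real copy Actor-Critic network, when the $i$-th virtual network function (VNF) $v_i$ is to be deployed, $s_i$ is obtained from the key parameters of the space-ground converged network synchronized in real time and is input into the real copy Actor-Critic network to obtain the deployment decision $a_i$ for $v_i$, which is issued to the control plane; the control plane deploys $v_i$ to the corresponding domain according to $a_i$; a return is obtained from the result of deploying $v_i$, a policy gradient is calculated based on the return and $a_i$, and the Critic network and the Actor network of the real copy Actor-Critic network are updated in sequence, until all VNFs are deployed; each time one SFC deployment is completed, the updated gradient of the real copy Actor-Critic network is uploaded to the global Actor-Critic network $\Theta^G$, the Actor network and the Critic network of $\Theta^G$ are updated in sequence, and finally the parameters of $\Theta^G$ are copied to the real copy Actor-Critic network;

the second process: for each virtual copy Actor-Critic network, when the $i$-th virtual network function (VNF) $v_i$ is to be deployed, key parameters are calculated from the global network state synchronized by the intelligent plane, and $s_i$ is obtained from them and input into the virtual copy Actor-Critic network to obtain the deployment decision $a_i$ for $v_i$; in space-ground converged network simulation software, $a_i$ is executed based on the network state of each domain, $v_i$ is deployed to the corresponding domain, and its execution within the domain is simulated to obtain a return; a policy gradient is calculated based on the return and $a_i$, and the Critic network and the Actor network of the virtual copy Actor-Critic network are updated in sequence, until all VNFs are deployed; each time one SFC deployment is completed, the updated gradient of the virtual copy Actor-Critic network is uploaded to the global Actor-Critic network $\Theta^G$, the Actor network and the Critic network of $\Theta^G$ are updated in sequence, and finally the parameters of $\Theta^G$ are copied to the real copy Actor-Critic network;

the deployment of the SFC (that is, of all VNFs in $V$) is completed according to the first process and the second process.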

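2. The space-ground converged network cross-domain SFC deployment method based on hybrid state synchronous DRL according to claim 1, characterized in that the space-ground converged network key parameter vector is:

[formula not reproduced]

and the network state vector $s_i$ of the space-ground converged network is:

[formula not reproduced]

where the set of the $N$ network domains of the space-ground converged network is $D$, the set of the $K$ inter-domain links is $L$, the set of the $I$ virtual network functions (VNFs) contained in the SFC is $V$, and the set of the $J$ resource types is $R$; the index $i'$ ranges over $1 \le i' \le I$, which includes $i$; for the $i$-th VNF $v_i \in V$, the set of authorized domains in which it may be deployed is $D_i$, its demand for the $j$-th resource type $r_j \in R$ is $\alpha_{i,j}$, the bandwidth it requires for cross-domain transmission is $w_i$, and the maximum delay of the whole SFC is $\tau^+$; the $n$-th domain $d^n \in D$ has a remaining amount of each resource $r_j$ and a processing delay for VNF $v_i$; the $k$-th inter-domain link $l_k \in L$ has bandwidth $\gamma^k$ and delay $\tau^{l,k}$; when $d^n$ is an unauthorized domain, [formula not reproduced]; when no link exists, $\gamma^k = 0$ and $\tau^{l,k} = \infty$;

$\{\chi_{i'}\}_{1 \le i' \le I}$ indicates the deployment state of the VNFs: if $\chi_{i'} = n \ne 0$, then $v_{i'}$ is deployed on domain $d^n$; if $\chi_{i'} = 0$, then $v_{i'}$ has not yet been deployed; and $\chi_{i'} = 0$ for all $i' \ge i$.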

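3. The space-ground converged network cross-domain SFC deployment method based on hybrid state synchronous DRL according to claim 2, characterized in that, when the A3C network is trained, the constraints are enforced in the Actor network by means of an action mask, that is, decisions that do not satisfy the constraints are masked out; the constraints comprise:

1) One VNF can be deployed to only one domain, and a physical link must exist between two consecutively deployed VNFs;

2) The resources occupied by all VNFs deployed in one domain cannot exceed the resource margin of that domain;

3) The bandwidth occupied by all VNFs carried by one link cannot exceed the bandwidth of that link;

4) The sum of the intra-domain processing delays of all VNFs and the cross-domain transmission delays cannot exceed the maximum delay of the SFC.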

4. The space-ground converged network cross-domain SFC deployment method based on hybrid state synchronous DRL according to claim 3, characterized in that the return is calculated by the following return function:

[formula not reproduced]

where $S \in \{0,1\}$ is a Boolean variable indicating whether the service function chain is deployed successfully ($S = 1$ for success, $S = 0$ for failure); $R_S$ and $R_F$ are the reward for a successful deployment and the penalty for a failed deployment, respectively; the intra-domain resource term denotes the intra-domain resources required to deploy all VNFs in $V$ in constraint 2); $c^{l,k}$ denotes the link bandwidth required to deploy all VNFs in $V$ in constraint 3); $\tau^{v,l}$ denotes the processing delay required to deploy all VNFs in $V$ in constraint 4); and the weight factors corresponding to the intra-domain resource term, $c^{l,k}$, and $\tau^{v,l}$ are the resource weight factor, $p^{l,k}$, and $p^{v,l}$, respectively.