CN115767562B - Service function chain deployment method based on reinforcement learning joint coordinated multi-point transmission - Google Patents
- Publication number
- CN115767562B (application CN202211012894.7A)
- Authority
- CN
- China
- Prior art keywords
- user
- network
- deployment
- time slot
- service function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a service function chain (SFC) deployment method based on reinforcement learning combined with coordinated multi-point (CoMP) transmission. The method first describes an edge network model and the channel characteristics between servers and users in the edge network, and eliminates communication interference between multiple servers and users by beamforming; it then establishes a mathematical model under limits on the number of VNF instantiations per server, server processing capacity, physical link bandwidth, VNF routing, and VNF migration budget; models the long-term optimization problem; decouples the long-term optimization problem into a slot-by-slot optimization problem; and finally establishes a sub-optimization problem for solving the reward function, reducing the complexity of searching the action space. The invention eliminates wireless link interference between users with a CoMP-based zero-forcing beamforming technique, and then decouples the long-term dynamic SFC deployment problem into a slot-by-slot optimization problem with an Actor-Critic algorithm based on the natural gradient method, so that it can be solved online.
Description
Technical Field
The invention belongs to the technical field of communications, and particularly relates to a service function chain deployment method based on reinforcement learning combined with coordinated multi-point transmission.
Background
Under the software-defined network (SDN) architecture, 6G communication systems are expected to improve network quality of service through the emerging network function virtualization (NFV) technology, so that service functions can be deployed directly on commodity servers using virtual machine or container technologies instead of in proprietary hardware. This allows a 6G operator to flexibly scale the network on demand simply by adding commodity servers. With NFV, the logical sequence of service functions that a data packet traverses can be cascaded into an SFC, and flexible customization and deployment of various services is realized by orchestrating virtualized network functions (VNFs).
Currently, much work has studied SFC deployment for traditional network architectures, such as content distribution networks and centralized cloud computing networks. Unlike traditional architectures, deploying SFCs in the 6G edge network places users and VNFs close to each other, so that services are obtained nearby, improving the quality of computation-intensive and delay-sensitive services; meanwhile, with NFV, edge servers can provide richer VNF orchestration combinations, and therefore more complex and flexible service types.
At present, research on SFC deployment in edge networks mainly targets single-slot static networks, ignoring the interference characteristics of the wireless channels in the edge network and the dynamic variation of the cache, computing, and communication resources of the edge servers. Therefore, the invention provides an SFC deployment method based on online Actor-Critic learning combined with CoMP beamforming in a 6G wireless edge network, which uses a CoMP-based zero-forcing beamforming technique to eliminate wireless link interference among multiple SFCs.
Disclosure of Invention
The invention aims to provide a service function chain deployment method based on reinforcement learning combined with coordinated multi-point transmission, which uses a CoMP-based zero-forcing beamforming technique to eliminate wireless link interference between users, and then uses an Actor-Critic algorithm based on the natural gradient method to decouple the long-term dynamic SFC deployment problem into a slot-by-slot optimization problem that is solved online.
The technical scheme adopted by the invention is a service function chain deployment method based on reinforcement learning combined with coordinated multi-point transmission, implemented according to the following steps:
Step 1, describing an edge network model, including the characteristics of the edge servers, network virtual functions, users, and service function chains;
Step 2, describing the channel characteristics of servers and users in the edge network, and eliminating communication interference between multiple servers and users by beamforming;
Step 3, establishing a mathematical model under limits on the number of VNF instantiations per server, server processing capacity, physical link bandwidth, VNF routing, and VNF migration budget;
Step 4, modeling the long-term optimization problem according to the resource constraints established in steps 1-3;
Step 5, constructing a Markov decision process (MDP) model, and decoupling the long-term optimization problem into a slot-by-slot optimization problem;
Step 6, using a natural-gradient-based Actor-Critic reinforcement learning algorithm to learn the optimal SFC deployment strategy online, slot by slot;
Step 7, establishing a sub-optimization problem for solving the reward function when searching the action space, reducing the search complexity of the action space, and finally obtaining the optimal solution.
The present invention is also characterized in that,
The step 1 is specifically implemented according to the following steps:
Step 1.1, in the edge network, each edge server is connected with a remote radio head (RRH), and the index $n \in \mathcal{N}$ denotes both the $n$-th edge server and the RRH of that server, where $\mathcal{N}$ is the set of servers in the edge network and $N$ is the total number of servers; the edge servers are interconnected through X2 links, and each edge server can provide a variety of different virtual functions using virtual machine technology;
Step 1.2, let $m \in \mathcal{M}$ denote the $m$-th user in the edge network, where $\mathcal{M}$ is the set of users in the edge network and $M$ is the total number of users; it is assumed that each user can only be served by one service function chain (SFC), defined as the ordered chain

$$f_1^m \rightarrow f_2^m \rightarrow \cdots \rightarrow f_l^m \rightarrow \cdots,$$

where $f_1^m$ denotes the first service function of the SFC of user $m$, $f_l^m$ denotes the $l$-th service function, and the last service function is designated as the baseband processing unit vBBU.
The step 2 is specifically implemented according to the following steps:
2.1, Rayleigh fading and path loss exist between user $m$ and the RRHs. Let $H_{m,n,t} \in \mathbb{C}^{L_n \times L_m}$ denote the channel matrix between user $m$ and the RRH numbered $n$, where $L_n$ is the number of transmit antennas of RRH $n$ and $L_m$ is the number of receive antennas of user $m$; the signal $u_{m,t}$ received by user $m$ in time slot $t$ can then be expressed as

$$u_{m,t} = H_{m,t}^{H} V_{m,t} s_{m,t} + \sum_{j \in \mathcal{M}, j \neq m} H_{m,t}^{H} V_{j,t} s_{j,t} + n_{m,t},$$

where $H_{m,t} \in \mathbb{C}^{L \times L_m}$ is the channel matrix between user $m$ and all RRHs in time slot $t$, obtained by stacking the per-RRH matrices $H_{m,n,t}$; $(\cdot)^{H}$ denotes the conjugate transpose of a matrix; $L = \sum_{n \in \mathcal{N}} L_n$ is the total number of antennas of all RRHs; $V_{m,t} \in \mathbb{C}^{L \times d_m}$ is the beamforming matrix of all RRHs toward user $m$, and $d_m$ is the number of data streams received by user $m$; $I$ denotes the identity matrix; $s_{m,t}$ is drawn from a Gaussian random codebook with zero mean and covariance $I_{d_m}$; and $n_{m,t}$ is white Gaussian noise with covariance $\sigma^{2} I_{L_m}$;
2.2, by successively encoding the received signal $u_{m,t}$ of user $m$ in step 2.1 on the basis of the Gaussian random codebook, the second (interference) term of the above formula can be removed, and the received data rate $R_{m,t}$ of user $m$ in time slot $t$ can be expressed as

$$R_{m,t} = \log_2 \left| I_{L_m} + \frac{1}{\sigma^{2}} H_{m,t}^{H} V_{m,t} V_{m,t}^{H} H_{m,t} \right|;$$

2.3, let $P_{m,n}^{c}$ and $P_{m,n}$ denote, respectively, the service function processing power consumption and the wireless transmission power consumption that edge server $n$ provides to user $m$, and let $a_{m,n,t} \in \{0,1\}$ indicate whether user $m$ uses the vBBU VNF instance of edge server $n$; the beamforming matrix of all RRHs for user $m$ should then satisfy the per-RRH power constraint

$$\mathrm{Tr}\!\left( V_{m,n,t} V_{m,n,t}^{H} \right) \le a_{m,n,t} P_{m,n},$$

where $V_{m,n,t}$ is the block of $V_{m,t}$ corresponding to RRH $n$, and $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix;
2.4, wireless interference between SFCs is eliminated by the zero-forcing beamforming of the RRHs: the channel matrices of all users are stacked and QR decomposition is performed,

$$\bar{H}_t = \left[ H_{1,t}, H_{2,t}, \ldots, H_{M,t} \right] = Q_t R_t,$$

where $Q_t = [Q_{1,t}, \ldots, Q_{M,t}]$ is a set of orthogonal bases and $R_t$ is a block upper triangular matrix whose diagonal blocks $R_{m,m,t}$ have full rank, the remaining upper triangular blocks being arbitrary non-zero matrices; the beamforming matrix of user $m$ can therefore be expressed as $V_{m,t} = Q_{m,t} B_{m,t}$, where $B_{m,t} \in \mathbb{C}^{L_m \times d_m}$ is the matrix to be designed;
2.5, with this structure, the conditions for interference elimination are satisfied: $H_{m,t}^{H} V_{j,t} = 0$ for all $j > m$ by the QR structure, and the residual interference from beams $j < m$ is removed by the successive encoding of step 2.2. Only the first $L_m$ rows of the stacked receive matrix $S_{m,t}$ are related to the received data rate, so $S_{m,t}$ can be simplified to $\bar{S}_{m,t}$; likewise $H_{m,t}$ simplifies to $H_{m,t} = Q_{m,t} R_{m,m,t}$, and the effective beamforming matrix $\Sigma_{m,t}$ can be defined as

$$\Sigma_{m,t} = \frac{1}{\sigma^{2}} R_{m,m,t}^{H} B_{m,t} B_{m,t}^{H} R_{m,m,t}.$$

Letting $W_{m,t} = B_{m,t} B_{m,t}^{H}$, the constraint established in step 2.3 is equivalent to:
Constraint 1: $\mathrm{Tr}\!\left( Q_{m,n,t} W_{m,t} Q_{m,n,t}^{H} \right) \le a_{m,n,t} P_{m,n}$, where $Q_{m,n,t}$ denotes the rows of $Q_{m,t}$ corresponding to RRH $n$;
Constraint 2: $W_{m,t} \succeq 0$;
2.6, for correct data decoding, the received data rate $R_{m,t}$ in step 2.2 must be no smaller than the data rate threshold $R_{m,th}$, namely:
Constraint 3: $R_{m,t} = \log_2 \left| I_{L_m} + \Sigma_{m,t} \right| \ge R_{m,th}$.
The step 3 is specifically implemented according to the following steps:
3.1, let $\mathcal{N}_f$ denote the set of edge servers capable of providing service function $f$, let $x_{l,n,t}^{m} \in \{0,1\}$ indicate whether service function $f_l^m$ is deployed on edge server $n$ in time slot $t$, and let $y_{f,n,t} \in \{0,1\}$ indicate whether edge server $n$ provides service function $f$ in time slot $t$. Each service function can only be deployed on one edge server, namely:
Constraint 4: $\sum_{n \in \mathcal{N}_f} x_{l,n,t}^{m} = 1$;
and $x_{l,n,t}^{m}$ and $y_{f,n,t}$ satisfy:
Constraint 5: $x_{l,n,t}^{m} \le y_{f,n,t}$ for $f = f_l^m$;
3.2, the total data rate of the service flows handled by a VNF instance cannot exceed the processing capacity $C_{f,n,t}$ of that instance, namely:
Constraint 6: $\sum_{m \in \mathcal{M}} \sum_{l: f_l^m = f} x_{l,n,t}^{m} R_{m,t} \le C_{f,n,t}$;
3.3, the total data rate of the service flows traversing a physical link cannot exceed the link bandwidth $B_{n,s,t}$, namely:
Constraint 7: $\sum_{m \in \mathcal{M}} \sum_{l} z_{l,n,s,t}^{m} R_{m,t} \le B_{n,s,t}$, where $z_{l,n,s,t}^{m}$ indicates whether $f_l^m$ and $f_{l+1}^m$ are deployed on edge servers $n$ and $s$, respectively;
3.4, $z_{l,n,s,t}^{m}$ can take the value 1 only when $x_{l,n,t}^{m}$ and $x_{l+1,s,t}^{m}$ are both 1 in time slot $t$; the relationship between them can therefore be described as:
Constraint 8: $z_{l,n,s,t}^{m} = x_{l,n,t}^{m} \, x_{l+1,s,t}^{m}$;
3.5, let $c_{n,s}$ denote the service migration cost between edge servers $n$ and $s$; the total service migration cost of the system cannot exceed the migration threshold $C_{mig,th}$, namely:
Constraint 9: $\sum_{m \in \mathcal{M}} \sum_{l} \sum_{n} \sum_{s} c_{n,s} \, x_{l,n,t-1}^{m} \, x_{l,s,t}^{m} \le C_{mig,th}$.
Step 4 is specifically implemented according to the following steps:
4.1, the total overhead of the system is defined to comprise the data flow overhead and the power consumption overhead;
4.2, first, let $P_{n,t}^{\mathrm{rrh}} = \sum_{m \in \mathcal{M}} \mathrm{Tr}\!\left( V_{m,n,t} V_{m,n,t}^{H} \right)$ denote the wireless transmission power consumption of RRH $n$; further, let $P_{f,n}$ denote the energy consumption of edge server $n$ for turning on service function $f$, and let $P_{l,n}^{m}$ denote the energy consumption of edge server $n$ for maintaining service function $f_l^m$. The total overhead of deploying the SFCs in time slot $t$ is then

$$C_t = C_t^{\mathrm{flow}} + \eta \left( C_t^{\mathrm{on}} + C_t^{\mathrm{user}} + C_t^{\mathrm{rrh}} \right),$$

where $\eta$ is a trade-off coefficient between the data flow overhead and the power consumption overhead; the first term $C_t^{\mathrm{flow}}$ is the data flow overhead between edge servers, the second term $C_t^{\mathrm{on}}$ is the power consumption overhead of starting service functions, the third term $C_t^{\mathrm{user}}$ is the power consumption overhead of providing service functions to the users, and the fourth term $C_t^{\mathrm{rrh}} = \sum_{n} P_{n,t}^{\mathrm{rrh}}$ is the wireless transmission power consumption of the RRHs for beamforming;
4.3, step 4.2 establishes the overhead of deploying the SFCs in a single time slot $t$; on this basis, the long-term dynamic SFC deployment overhead is defined as the average system overhead per time slot over the whole deployment process. Let $T$ denote the total number of time slots in the deployment process, and let $\mathcal{P}_0$ denote the problem of minimizing the long-term dynamic SFC deployment overhead, namely:

$$\mathcal{P}_0: \quad \min \; \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} C_t,$$

where the variables in $C_t$, $R_{m,t}$, and $\Sigma_{m,t}$ are restricted by Constraints 1-9 established in steps 2 and 3; by solving $\mathcal{P}_0$, the specific SFC deployment result of each time slot is obtained.
Step 5 is specifically implemented according to the following steps:
5.1, establish the MDP four-tuple of state space, action space, reward function, and state transition, where a state $s_t$ in the state space has four elements: the wireless channel matrices between the users and the RRHs, the processing capacities of the VNF instances, the link bandwidths between the edge servers, and the SFC deployment result of the previous time slot, namely:

$$s_t = \left( \{ H_{m,t} \}, \{ C_{f,n,t} \}, \{ B_{n,s,t} \}, \{ x_{l,n,t-1}^{m} \} \right);$$

5.2, define the action $a_t$ as a reduced-dimension deployment decision for time slot $t$;
5.3, define $r(s_t, a_t)$ as the reward function corresponding to $(s_t, a_t)$; if the action $a_t$ taken admits no feasible solution, the reward function is set to a small negative number;
5.4, solve the maximum of the reward function $r(s_t, a_t)$ for a given action $a_t$, and denote the maximum-reward problem by $\mathcal{P}_1$, namely:

$$\mathcal{P}_1: \quad \max \; r(s_t, a_t) \quad \text{subject to Constraints 1-9},$$

where $a_t$ appears in $\mathcal{P}_1$ as a given parameter; $\mathcal{P}_0$ is thereby converted into $\mathcal{P}_1$ to solve the specific deployment result of each time slot's SFC.
Step 6 is specifically implemented according to the following steps:
6.1, an Actor neural network outputs the deployment policy, and a Critic neural network evaluates each policy through Q-value approximation; a neural network with parameters $w$ approximates the action-value function, i.e. $Q_w(s_t, a_t) \approx Q_\pi(s_t, a_t)$, where $Q_\pi(s_t, a_t)$, the action-value function, is the expected return obtained over the subsequent states after taking action $a_t$ in state $s_t$;
6.2, experience replay and a target network are employed to improve training stability; the loss function of the Critic network can be defined as

$$\mathrm{Loss}(w) = \mathbb{E}_{\mathcal{D}} \left[ \left( r(s_t, a_t) - \hat{J} + Q_{w'}(s_{t+1}, a_{t+1}) - Q_w(s_t, a_t) \right)^{2} \right],$$

where $\mathbb{E}[\cdot]$ denotes the expectation operator, $\mathcal{D}$ is the experience replay pool, $w'$ is the model of the target network in time slot $t$, and $\hat{J}$ is an estimate of the expected average return;
6.3, taking the gradient of $\mathrm{Loss}(w)$ with respect to $w$, the update of $w$ is

$$w \leftarrow w - \frac{\alpha_c}{I} \sum_{i=1}^{I} \nabla_w \mathrm{Loss}_i(w),$$

where $\alpha_c$ is the learning rate of the Critic network and $I$ is the number of samples drawn from the experience replay pool;
6.4, based on the parameterized policy $\pi_\theta$, the expected average return is defined as

$$J(\pi_\theta) = \sum_{s} d^{\pi_\theta}(s) \sum_{a} \pi_\theta(a \mid s) \, r(s, a),$$

where $d^{\pi_\theta}(s)$ is the steady-state distribution of state $s$;
6.5, the Actor network is trained with the natural gradient method, and the update of the network model $\theta$ becomes

$$\theta \leftarrow \theta + \alpha_a F(\theta)^{-1} \nabla_\theta J(\pi_\theta),$$

where $\alpha_a$ is the learning rate of the Actor network, $F(\theta)$ is the Fisher information matrix, and $\nabla_\theta J(\pi_\theta)$ is the gradient of $J(\pi_\theta)$ with respect to $\theta$;
6.6, the Actor network and the Critic network are integrated so that the training of the neural network proceeds along the natural gradient direction, allowing the neural network model to approach the global optimum.
The step 7 is specifically implemented according to the following steps:
7.1, relax the binary variables $x_{l,n,t}^{m}$ and $z_{l,n,s,t}^{m}$ so that $\mathcal{P}_1$ becomes a convex problem; an $L_p$ ($0 < p < 1$) norm penalty function is therefore introduced to force the relaxed variables to 0-1 integers. Collecting the relaxed variables in $y$ and letting $P_\delta(y)$ denote the penalty term, the asymptotically optimal sub-problem $\mathcal{P}_{1\text{-}S}$ is obtained as follows:

$$\mathcal{P}_{1\text{-}S}: \quad \max \; r(s_t, a_t) - \sigma P_\delta(y),$$

where $\sigma$ is the penalty parameter, $\delta$ is an arbitrarily small positive number, and the variables $x_{l,n,t}^{m}$ and $z_{l,n,s,t}^{m}$ satisfy the relaxed constraints $0 \le x_{l,n,t}^{m} \le 1$ and $0 \le z_{l,n,s,t}^{m} \le 1$;
7.2, the penalty parameter is iterated as $\delta_{v+1} = \eta \delta_v$ ($\eta > 1$), so that the penalty term $P_\delta(y)$ converges to 0 at a linear rate;
7.3, because the penalty term in $\mathcal{P}_{1\text{-}S}$ is non-convex, $\mathcal{P}_{1\text{-}S}$ is difficult to solve directly; the successive convex approximation (SCA) technique is therefore adopted to convert $\mathcal{P}_{1\text{-}S}$ into a convex problem by taking the first-order Taylor expansion of the penalty term, i.e.

$$P_\delta(y) \approx P_\delta(y^{v}) + \nabla_y P_\delta(y^{v})^{T} \left( y - y^{v} \right),$$

where $y^{v}$ is the optimal solution of the previous SCA iteration and $\nabla_y P_\delta(y^{v})$ is the gradient of $P_\delta(y)$ at $y^{v}$;
7.4, in the $(v{+}1)$-th SCA iteration, $\mathcal{P}_{1\text{-}S}$ finally becomes a convex problem, namely the relaxed problem with the linearized penalty term;
7.5, solving $\mathcal{P}_{1\text{-}S}$ according to the above steps yields an asymptotically optimal solution of $\mathcal{P}_1$, from which the maximum of the reward function is obtained, and the deployment result of each time slot's SFC is finally determined according to the maximum reward.
The beneficial effect of the service function chain deployment method based on reinforcement learning combined with coordinated multi-point transmission is that the long-term dynamic deployment of SFCs can be completed without interference while guaranteeing the users' quality of service, further reducing the operating overhead of the edge servers and the wireless transmission power consumption of the RRHs during deployment.
Drawings
Fig. 1 is a schematic diagram of a system model of SFC deployment in combination with CoMP beamforming in a wireless edge network according to the present invention;
fig. 2 is a schematic flow chart of an algorithm for SFC online deployment based on Actor-Critic learning in a wireless edge network according to the present invention.
Detailed Description
The invention will be described in detail below with reference to the drawings and the detailed description.
Referring to figs. 1-2, fig. 1 is a schematic diagram of the system model of SFC deployment with joint CoMP beamforming in a wireless edge network, which includes two SFC instances, an edge network, the mapping relationships between multiple service functions and edge servers, and two CoMP beams for two different users; fig. 2 is a flowchart of the Actor-Critic algorithm used to solve the SFC online deployment problem in a wireless edge network. The following embodiment describes the SFC online deployment method based on the Actor-Critic algorithm in a wireless edge network in detail.
The invention discloses a service function chain deployment method based on reinforcement learning combined with coordinated multi-point transmission, which is implemented according to the following steps:
Step 1, describing an edge network model, including the characteristics of the edge servers, network virtual functions, users, and service function chains;
The step 1 is specifically implemented according to the following steps:
Step 1.1, in the edge network, each edge server is connected with a remote radio head (RRH), and the index $n \in \mathcal{N}$ denotes both the $n$-th edge server and the RRH of that server, where $\mathcal{N}$ is the set of servers in the edge network and $N$ is the total number of servers. The edge servers are interconnected by X2 links, and each edge server may provide a variety of different virtual functions (e.g., caching, computing, and firewall) using virtual machine technology.
Step 1.2, let $m \in \mathcal{M}$ denote the $m$-th user in the edge network, where $\mathcal{M}$ is the set of users in the edge network and $M$ is the total number of users. It is assumed that each user can only be served by one service function chain (SFC), defined as the ordered chain

$$f_1^m \rightarrow f_2^m \rightarrow \cdots \rightarrow f_l^m \rightarrow \cdots,$$

where $f_1^m$ denotes the first service function of the SFC of user $m$, $f_l^m$ denotes the $l$-th service function, and the last service function is designated as the baseband processing unit (virtualized building baseband unit, vBBU).
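For illustration only, the entities of step 1 (edge servers co-located with RRHs, users, and SFCs that end in a vBBU) can be represented as simple data structures; the following Python sketch is not part of the claimed method, and all class and field names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class EdgeServer:
    """Edge server n, co-located with RRH n (hypothetical representation)."""
    n: int                                       # server / RRH index in the set N
    tx_antennas: int                             # L_n, transmit antennas of RRH n
    functions: set = field(default_factory=set)  # virtual functions this server can host

@dataclass
class ServiceFunctionChain:
    """Ordered chain of service functions of one user; the last entry is the vBBU."""
    user: int                                    # user index m in the set M
    functions: list = field(default_factory=list)

    def __post_init__(self):
        assert self.functions and self.functions[-1] == "vBBU", \
            "the last service function must be the baseband processing unit vBBU"

# Each user is served by exactly one SFC, for example:
sfc = ServiceFunctionChain(user=0, functions=["firewall", "cache", "vBBU"])
```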
Step 2, describing channel characteristics of a server and a user in an edge network on the basis of the step 1, and eliminating communication interference between a plurality of servers and the user by using beam forming;
The step 2 is specifically implemented according to the following steps:
2.1, Rayleigh fading and path loss exist between user $m$ and the RRHs. Let $H_{m,n,t} \in \mathbb{C}^{L_n \times L_m}$ denote the channel matrix between user $m$ and the RRH numbered $n$, where $L_n$ is the number of transmit antennas of RRH $n$ and $L_m$ is the number of receive antennas of user $m$. The signal $u_{m,t}$ received by user $m$ in time slot $t$ may be expressed as

$$u_{m,t} = H_{m,t}^{H} V_{m,t} s_{m,t} + \sum_{j \in \mathcal{M}, j \neq m} H_{m,t}^{H} V_{j,t} s_{j,t} + n_{m,t},$$

where $H_{m,t} \in \mathbb{C}^{L \times L_m}$ is the channel matrix between user $m$ and all RRHs in time slot $t$, obtained by stacking the per-RRH matrices $H_{m,n,t}$; $(\cdot)^{H}$ denotes the conjugate transpose of a matrix; $L = \sum_{n \in \mathcal{N}} L_n$ is the total number of antennas of all RRHs; $V_{m,t} \in \mathbb{C}^{L \times d_m}$ is the beamforming matrix of all RRHs toward user $m$, and $d_m$ is the number of data streams received by user $m$; $I$ denotes the identity matrix; $s_{m,t}$ is drawn from a Gaussian random codebook with zero mean and covariance $I_{d_m}$; and $n_{m,t}$ is white Gaussian noise with covariance $\sigma^{2} I_{L_m}$;
2.2, by successively encoding the received signal $u_{m,t}$ of user $m$ in step 2.1 on the basis of the Gaussian random codebook, the second (interference) term of the above formula can be removed, and the received data rate $R_{m,t}$ of user $m$ in time slot $t$ can be expressed as

$$R_{m,t} = \log_2 \left| I_{L_m} + \frac{1}{\sigma^{2}} H_{m,t}^{H} V_{m,t} V_{m,t}^{H} H_{m,t} \right|;$$

2.3, let $P_{m,n}^{c}$ and $P_{m,n}$ denote, respectively, the service function processing power consumption and the wireless transmission power consumption that edge server $n$ provides to user $m$, and let $a_{m,n,t} \in \{0,1\}$ indicate whether user $m$ uses the vBBU VNF instance of edge server $n$; the beamforming matrix of all RRHs for user $m$ should then satisfy the per-RRH power constraint

$$\mathrm{Tr}\!\left( V_{m,n,t} V_{m,n,t}^{H} \right) \le a_{m,n,t} P_{m,n},$$

where $V_{m,n,t}$ is the block of $V_{m,t}$ corresponding to RRH $n$, and $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix.
2.4, wireless interference between SFCs is eliminated by the zero-forcing beamforming of the RRHs: the channel matrices of all users are stacked and QR decomposition is performed,

$$\bar{H}_t = \left[ H_{1,t}, H_{2,t}, \ldots, H_{M,t} \right] = Q_t R_t,$$

where $Q_t = [Q_{1,t}, \ldots, Q_{M,t}]$ is a set of orthogonal bases and $R_t$ is a block upper triangular matrix whose diagonal blocks $R_{m,m,t}$ have full rank, the remaining upper triangular blocks being non-zero matrices. The beamforming matrix of user $m$ may therefore be represented as $V_{m,t} = Q_{m,t} B_{m,t}$, where $B_{m,t} \in \mathbb{C}^{L_m \times d_m}$ is the matrix to be designed;
2.5, with this structure, the conditions for interference elimination are satisfied: $H_{m,t}^{H} V_{j,t} = 0$ for all $j > m$ by the QR structure, and the residual interference from beams $j < m$ is removed by the successive encoding of step 2.2. Only the first $L_m$ rows of the stacked receive matrix $S_{m,t}$ are related to the received data rate, so $S_{m,t}$ can be simplified to $\bar{S}_{m,t}$; likewise $H_{m,t}$ simplifies to $H_{m,t} = Q_{m,t} R_{m,m,t}$, and the effective beamforming matrix $\Sigma_{m,t}$ may be defined as

$$\Sigma_{m,t} = \frac{1}{\sigma^{2}} R_{m,m,t}^{H} B_{m,t} B_{m,t}^{H} R_{m,m,t}.$$

Letting $W_{m,t} = B_{m,t} B_{m,t}^{H}$, the constraint established in step 2.3 is equivalent to:
Constraint 1: $\mathrm{Tr}\!\left( Q_{m,n,t} W_{m,t} Q_{m,n,t}^{H} \right) \le a_{m,n,t} P_{m,n}$, where $Q_{m,n,t}$ denotes the rows of $Q_{m,t}$ corresponding to RRH $n$;
Constraint 2: $W_{m,t} \succeq 0$;
2.6, for correct data decoding, the received data rate $R_{m,t}$ in step 2.2 needs to be equal to or greater than the data rate threshold $R_{m,th}$, namely:
Constraint 3: $R_{m,t} = \log_2 \left| I_{L_m} + \Sigma_{m,t} \right| \ge R_{m,th}$.
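As an illustration of steps 2.4-2.6, the following Python sketch stacks the users' channel matrices, performs the QR decomposition, and forms zero-forcing beamformers of the form $V_m = Q_m B_m$; the antenna counts, noise power, and the placeholder choice $B_m = I$ are assumptions of the example rather than values fixed by the method:

```python
import numpy as np

rng = np.random.default_rng(0)
M, L_m = 3, 2                  # users and receive antennas per user (assumed)
L = 8                          # total RRH transmit antennas, L = sum_n L_n (assumed)
sigma2 = 1.0                   # noise power sigma^2 (assumed)

# Step 2.1: Rayleigh-fading channel H_m in C^{L x L_m} for each user m
H = [(rng.standard_normal((L, L_m)) + 1j * rng.standard_normal((L, L_m))) / np.sqrt(2)
     for _ in range(M)]

# Step 2.4: stack all users' channels and take the QR decomposition H_bar = Q R
H_bar = np.concatenate(H, axis=1)              # L x (M * L_m)
Q, R = np.linalg.qr(H_bar)                     # R is block upper triangular

# Beamformer of user m uses its orthonormal block Q_m: V_m = Q_m B_m
# (B_m = I is a placeholder; in the method B_m is optimized under Constraints 1-3)
V = [Q[:, m * L_m:(m + 1) * L_m] for m in range(M)]

# Step 2.5: beam m causes no interference at users j < m by the QR structure;
# the interference user m still sees from beams j < m is removed by successive encoding
for m in range(M):
    for j in range(m):
        assert np.allclose(H[j].conj().T @ V[m], 0, atol=1e-9)

# Step 2.6: achievable rate R_m = log2 |I + Sigma_m| with Sigma_m built from R_{m,m}
for m in range(M):
    R_mm = R[m * L_m:(m + 1) * L_m, m * L_m:(m + 1) * L_m]
    Sigma_m = R_mm.conj().T @ R_mm / sigma2    # effective matrix for B_m = I
    rate = np.log2(np.linalg.det(np.eye(L_m) + Sigma_m).real)
    print(f"user {m}: rate {rate:.2f} bit/s/Hz")
```

The assertions confirm the property used in step 2.5: with beams drawn from the QR basis, a user's beam leaks nothing toward users with smaller indices, and the remaining cross-terms are exactly the part that successive encoding removes.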
Step 3, establishing a mathematical model under the limitation of the number of server VNF instantiations, the processing capacity of the server, the physical link bandwidth, the VNF routing and the VNF migration budget;
The step 3 is specifically implemented according to the following steps:
3.1, let $\mathcal{N}_f$ denote the set of edge servers capable of providing service function $f$, let $x_{l,n,t}^{m} \in \{0,1\}$ indicate whether service function $f_l^m$ is deployed on edge server $n$ in time slot $t$, and let $y_{f,n,t} \in \{0,1\}$ indicate whether edge server $n$ provides service function $f$ in time slot $t$. It is assumed that each service function can only be deployed on one edge server, namely:
Constraint 4: $\sum_{n \in \mathcal{N}_f} x_{l,n,t}^{m} = 1$;
and $x_{l,n,t}^{m}$ and $y_{f,n,t}$ satisfy:
Constraint 5: $x_{l,n,t}^{m} \le y_{f,n,t}$ for $f = f_l^m$;
3.2, the total data rate of the service flows handled by a VNF instance cannot exceed the processing capacity $C_{f,n,t}$ of that instance, namely:
Constraint 6: $\sum_{m \in \mathcal{M}} \sum_{l: f_l^m = f} x_{l,n,t}^{m} R_{m,t} \le C_{f,n,t}$;
3.3, the total data rate of the service flows flowing over a link must not exceed its link bandwidth $B_{n,s,t}$, namely:
Constraint 7: $\sum_{m \in \mathcal{M}} \sum_{l} z_{l,n,s,t}^{m} R_{m,t} \le B_{n,s,t}$, where $z_{l,n,s,t}^{m}$ indicates whether $f_l^m$ and $f_{l+1}^m$ are deployed on edge servers $n$ and $s$, respectively.
3.4, $z_{l,n,s,t}^{m}$ can take the value 1 only when $x_{l,n,t}^{m}$ and $x_{l+1,s,t}^{m}$ are both 1 in time slot $t$; the relationship between them can therefore be described as:
Constraint 8: $z_{l,n,s,t}^{m} = x_{l,n,t}^{m} \, x_{l+1,s,t}^{m}$;
3.5, let $c_{n,s}$ denote the service migration cost between edge servers $n$ and $s$; the total service migration cost of the system cannot exceed the migration threshold $C_{mig,th}$, namely:
Constraint 9: $\sum_{m \in \mathcal{M}} \sum_{l} \sum_{n} \sum_{s} c_{n,s} \, x_{l,n,t-1}^{m} \, x_{l,s,t}^{m} \le C_{mig,th}$.
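For illustration, a candidate deployment can be screened against the limits of step 3 before it is scored; the sketch below is a simplified feasibility check (per-server rather than per-instance capacity), and every array name is hypothetical:

```python
import numpy as np

def feasible(x, x_prev, rates, cap, bw, mig_cost, mig_budget):
    """Screen one slot's deployment against Constraints 4-9 (illustrative sketch).

    x[m, l, n]    : 1 if service function f_l^m is placed on server n in this slot
    x_prev        : same array for the previous slot (for migration, Constraint 9)
    rates[m]      : received data rate R_{m,t} of user m
    cap[n]        : processing capacity of server n (simplified per server, Constraint 6)
    bw[n, s]      : link bandwidth between servers n and s (Constraint 7)
    mig_cost[n, s]: cost of migrating a function from server n to server s
    """
    M, Lf, N = x.shape
    # Constraint 4: every function of every chain sits on exactly one server
    if not np.all(x.sum(axis=2) == 1):
        return False
    # Constraint 6: rate processed by each server within its capacity
    if np.any(np.einsum("mln,m->n", x, rates) > cap):
        return False
    # Constraints 7 and 8: traffic between consecutive functions respects link bandwidth
    link_load = np.zeros((N, N))
    for m in range(M):
        for l in range(Lf - 1):
            n, s = x[m, l].argmax(), x[m, l + 1].argmax()
            if n != s:
                link_load[n, s] += rates[m]
    if np.any(link_load > bw):
        return False
    # Constraint 9: total migration cost relative to the previous slot within budget
    total_mig = 0.0
    for m, l, n in zip(*np.nonzero(x_prev * (1 - x))):   # functions that left server n
        total_mig += mig_cost[n, x[m, l].argmax()]
    return total_mig <= mig_budget
```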
Step 4, modeling a long-term optimization problem according to the resource constraint established in the step 1-3;
Step 4 is specifically implemented according to the following steps:
4.1, the total overhead of the system is defined to comprise the data flow overhead and the power consumption overhead.
4.2, first, let $P_{n,t}^{\mathrm{rrh}} = \sum_{m \in \mathcal{M}} \mathrm{Tr}\!\left( V_{m,n,t} V_{m,n,t}^{H} \right)$ denote the wireless transmission power consumption of RRH $n$; further, let $P_{f,n}$ denote the energy consumption of edge server $n$ for turning on service function $f$, and let $P_{l,n}^{m}$ denote the energy consumption of edge server $n$ for maintaining service function $f_l^m$. The total overhead of deploying the SFCs in time slot $t$ is then

$$C_t = C_t^{\mathrm{flow}} + \eta \left( C_t^{\mathrm{on}} + C_t^{\mathrm{user}} + C_t^{\mathrm{rrh}} \right),$$

where $\eta$ is a trade-off coefficient between the data flow overhead and the power consumption overhead. The first term $C_t^{\mathrm{flow}}$ represents the data flow overhead between edge servers, the second term $C_t^{\mathrm{on}}$ the power consumption overhead of starting service functions, the third term $C_t^{\mathrm{user}}$ the power consumption overhead of providing service functions to the users, and the fourth term $C_t^{\mathrm{rrh}} = \sum_{n} P_{n,t}^{\mathrm{rrh}}$ the wireless transmission power consumption of the RRHs for beamforming.
4.3, step 4.2 establishes the overhead of deploying the SFCs in a single time slot $t$; on this basis, the long-term dynamic SFC deployment overhead is defined as the average system overhead per time slot over the whole deployment process. Let $T$ denote the total number of time slots in the deployment process, and let $\mathcal{P}_0$ denote the problem of minimizing the long-term dynamic SFC deployment overhead, namely:

$$\mathcal{P}_0: \quad \min \; \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} C_t,$$

where the variables in $C_t$, $R_{m,t}$, and $\Sigma_{m,t}$ are restricted by Constraints 1-9 established in steps 2 and 3; by solving $\mathcal{P}_0$, the specific SFC deployment result of each time slot is obtained.
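The slot overhead of step 4.2 can be tallied from the four verbally defined terms; the decomposition below mirrors the description, and the exact placement of the trade-off coefficient eta is an assumption since the patent states the terms only in words:

```python
def slot_overhead(flow_cost, p_on, p_serve, p_rrh, eta):
    """Single-slot SFC deployment overhead C_t (illustrative form of step 4.2).

    flow_cost : data-flow overhead between edge servers          (term 1)
    p_on      : power overhead of turning service functions on   (term 2)
    p_serve   : power overhead of serving the users' functions   (term 3)
    p_rrh     : RRH wireless transmission power for beamforming  (term 4)
    eta       : trade-off coefficient between flow and power overhead
    """
    return flow_cost + eta * (p_on + p_serve + p_rrh)

def long_term_overhead(per_slot_costs):
    """Long-term objective of step 4.3: average of C_t over the T deployed slots."""
    return sum(per_slot_costs) / len(per_slot_costs)
```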
Step 5, constructing a Markov decision process (MDP) model, and decoupling the long-term optimization problem into a slot-by-slot optimization problem;
Step 5 is specifically implemented according to the following steps:
5.1, establish the MDP four-tuple of state space, action space, reward function, and state transition, where a state $s_t$ in the state space has four elements: the wireless channel matrices between the users and the RRHs, the processing capacities of the VNF instances, the link bandwidths between the edge servers, and the SFC deployment result of the previous time slot, namely:

$$s_t = \left( \{ H_{m,t} \}, \{ C_{f,n,t} \}, \{ B_{n,s,t} \}, \{ x_{l,n,t-1}^{m} \} \right);$$

5.2, define the action $a_t$ as a reduced-dimension deployment decision for time slot $t$; this is because the original action space contains the deployment variables, whose dimension is too high, so the action space is processed to reduce its dimension.
5.3, define $r(s_t, a_t)$ as the reward function corresponding to $(s_t, a_t)$; if the action $a_t$ taken admits no feasible solution, the reward function is set to a small negative number.
5.4, solve the maximum of the reward function $r(s_t, a_t)$ for a given action $a_t$, and denote the maximum-reward problem by $\mathcal{P}_1$, namely:

$$\mathcal{P}_1: \quad \max \; r(s_t, a_t) \quad \text{subject to Constraints 1-9},$$

where $a_t$ appears in $\mathcal{P}_1$ as a given parameter; $\mathcal{P}_0$ is thereby converted into $\mathcal{P}_1$ to solve the specific deployment result of each time slot's SFC.
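As a sketch of step 5, the slot-by-slot interaction can be wrapped as an MDP environment whose reward is the negated slot overhead, with a large negative reward for infeasible actions as stated in step 5.3; the wrapper below is illustrative, the network methods it calls are hypothetical, and the -100 sentinel is an assumed value:

```python
class SfcDeploymentEnv:
    """Slot-by-slot MDP wrapper of step 5 (illustrative; network API is hypothetical)."""

    INFEASIBLE_REWARD = -100.0  # the "small negative number" of step 5.3 (assumed value)

    def __init__(self, network):
        self.network = network  # holds channels, capacities, bandwidths, last deployment

    def state(self):
        # s_t = (channel matrices, VNF capacities, link bandwidths, previous deployment)
        return (self.network.channels, self.network.vnf_capacity,
                self.network.link_bw, self.network.prev_deployment)

    def step(self, action):
        # Given a_t, solve the maximum-reward sub-problem P_1 (steps 5.4 and 7)
        deployment = self.network.solve_reward_subproblem(action)
        if deployment is None:                 # action admits no feasible solution
            return self.state(), self.INFEASIBLE_REWARD
        reward = -self.network.slot_overhead(deployment)  # lower overhead, higher reward
        self.network.apply(deployment)         # becomes the previous deployment in s_{t+1}
        return self.state(), reward
```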
Step 6, using an Actor-Critic reinforcement learning algorithm based on natural gradient to learn the SFC optimal deployment strategy on line time slot by time slot;
Step 6 is specifically implemented according to the following steps:
6.1, an Actor neural network outputs the deployment policy, and a Critic neural network evaluates each policy through Q-value approximation. A neural network with parameters $w$ is used to approximate the action-value function, i.e. $Q_w(s_t, a_t) \approx Q_\pi(s_t, a_t)$, where $Q_\pi(s_t, a_t)$, the action-value function, is the expected return obtained over the subsequent states after taking action $a_t$ in state $s_t$.
6.2, to break the time correlation between samples, experience replay and a target network are employed to improve training stability, where the loss function of the Critic network can be defined as

$$\mathrm{Loss}(w) = \mathbb{E}_{\mathcal{D}} \left[ \left( r(s_t, a_t) - \hat{J} + Q_{w'}(s_{t+1}, a_{t+1}) - Q_w(s_t, a_t) \right)^{2} \right],$$

where $\mathbb{E}[\cdot]$ denotes the expectation operator, $\mathcal{D}$ is the experience replay pool, $w'$ is the model of the target network in time slot $t$, and $\hat{J}$ is an estimate of the expected average return.
6.3, taking the gradient of $\mathrm{Loss}(w)$ with respect to $w$, the update of $w$ is

$$w \leftarrow w - \frac{\alpha_c}{I} \sum_{i=1}^{I} \nabla_w \mathrm{Loss}_i(w),$$

where $\alpha_c$ is the learning rate of the Critic network and $I$ is the number of samples drawn from the experience replay pool.
6.4, based on the parameterized policy $\pi_\theta$, the expected average return is defined as

$$J(\pi_\theta) = \sum_{s} d^{\pi_\theta}(s) \sum_{a} \pi_\theta(a \mid s) \, r(s, a),$$

where $d^{\pi_\theta}(s)$ is the steady-state distribution of state $s$.
6.5, to avoid $J(\pi_\theta)$ falling into a local optimum when training along the standard gradient direction, the Actor network is trained with the natural gradient method, and the update of the network model $\theta$ becomes

$$\theta \leftarrow \theta + \alpha_a F(\theta)^{-1} \nabla_\theta J(\pi_\theta),$$

where $\alpha_a$ is the learning rate of the Actor network, $F(\theta)$ is the Fisher information matrix, and $\nabla_\theta J(\pi_\theta)$ is the gradient of $J(\pi_\theta)$ with respect to $\theta$.
6.6, the algorithm flow is shown in fig. 2: the Actor network and the Critic network are integrated so that the training of the neural network proceeds along the natural gradient direction, allowing the neural network model to approach the global optimum.
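The following PyTorch-style sketch shows the shape of the natural-gradient Actor update of step 6.5: the policy gradient g is preconditioned by an inverse Fisher matrix. Computing the exact Fisher matrix is expensive, so the sketch forms an empirical Fisher from per-sample score vectors, which is a common approximation rather than the patent's prescribed construction; the actor.log_prob helper and all hyperparameter values are assumptions:

```python
import torch

def natural_gradient_step(actor, batch, alpha_a=1e-3, damping=1e-3):
    """One Actor update along F(theta)^{-1} grad J (illustrative sketch of step 6.5)."""
    states, actions, advantages = batch        # advantages come from the Critic's Q values
    logp = actor.log_prob(states, actions)     # log pi_theta(a_t | s_t), shape (I,)

    # Vanilla policy gradient g = grad_theta J(pi_theta)
    J = (logp * advantages).mean()
    grads = torch.autograd.grad(J, list(actor.parameters()), retain_graph=True)
    g = torch.cat([p.reshape(-1) for p in grads])

    # Empirical Fisher F = E[score score^T] from per-sample scores (an approximation)
    scores = []
    for lp in logp:
        s = torch.autograd.grad(lp, list(actor.parameters()), retain_graph=True)
        scores.append(torch.cat([p.reshape(-1) for p in s]))
    S = torch.stack(scores)
    F = S.T @ S / S.shape[0] + damping * torch.eye(S.shape[1])

    # theta <- theta + alpha_a * F^{-1} g  (natural gradient ascent)
    step = alpha_a * torch.linalg.solve(F, g)
    offset = 0
    with torch.no_grad():
        for p in actor.parameters():
            n = p.numel()
            p.add_(step[offset:offset + n].view_as(p))
            offset += n
```

The damping term keeps the empirical Fisher invertible when the batch is small; this is a standard stabilization choice, not something the patent specifies.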
Step 7, establishing the sub-optimization problem of the reward function solution when searching the action space and reducing the search complexity of the action space; solving the reward function set in step 5.3 yields an asymptotically optimal solution of $\mathcal{P}_1$.
The step 7 is specifically implemented according to the following steps:
7.1, relax the binary variables $x_{l,n,t}^{m}$ and $z_{l,n,s,t}^{m}$ so that $\mathcal{P}_1$ becomes a convex problem; however, the convex problem obtained after relaxation cannot guarantee that its optimal solution is a 0-1 integer solution, so the relaxed problem is not equivalent to the original problem. Therefore, an $L_p$ ($0 < p < 1$) norm penalty function is introduced to force the relaxed variables to 0-1 integers. Collecting the relaxed variables in $y$ and letting $P_\delta(y)$ denote the penalty term, the asymptotically optimal sub-problem $\mathcal{P}_{1\text{-}S}$ is obtained as follows:

$$\mathcal{P}_{1\text{-}S}: \quad \max \; r(s_t, a_t) - \sigma P_\delta(y),$$

where $\sigma$ is the penalty parameter, $\delta$ is an arbitrarily small positive number, and the variables $x_{l,n,t}^{m}$ and $z_{l,n,s,t}^{m}$ satisfy the relaxed constraints $0 \le x_{l,n,t}^{m} \le 1$ and $0 \le z_{l,n,s,t}^{m} \le 1$;
7.2, the penalty parameter is iterated as $\delta_{v+1} = \eta \delta_v$ ($\eta > 1$), so that the penalty term $P_\delta(y)$ converges to 0 at a linear rate.
7.3, because the penalty term in $\mathcal{P}_{1\text{-}S}$ is non-convex, $\mathcal{P}_{1\text{-}S}$ is difficult to solve directly; the successive convex approximation (SCA) technique is adopted to convert $\mathcal{P}_{1\text{-}S}$ into a convex problem by taking the first-order Taylor expansion of the penalty term, i.e.

$$P_\delta(y) \approx P_\delta(y^{v}) + \nabla_y P_\delta(y^{v})^{T} \left( y - y^{v} \right),$$

where $y^{v}$ is the optimal solution of the previous SCA iteration and $\nabla_y P_\delta(y^{v})$ is the gradient of $P_\delta(y)$ at $y^{v}$.
7.4, in the $(v{+}1)$-th SCA iteration, $\mathcal{P}_{1\text{-}S}$ finally becomes a convex problem, namely the relaxed problem with the linearized penalty term.
7.5, solving $\mathcal{P}_{1\text{-}S}$ according to the above steps yields an asymptotically optimal solution of $\mathcal{P}_1$, from which the maximum of the reward function is obtained, and the deployment result of each time slot's SFC is finally determined according to the maximum reward.
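To illustrate steps 7.1-7.4, the sketch below iterates the first-order Taylor linearization of an L_p-norm penalty together with the geometric delta update of step 7.2; the inner convex solve is left abstract (any convex solver over the relaxed Constraints 1-9 can stand in), and the particular penalty form, p, eta, and the starting delta are assumed values:

```python
import numpy as np

def lp_penalty(y, delta, p=0.5):
    """An L_p-norm penalty P_delta(y): vanishes (as delta grows) at 0-1 points."""
    return np.sum((y + delta) ** p - delta ** p + (1 - y + delta) ** p - delta ** p)

def lp_penalty_grad(y, delta, p=0.5):
    return p * (y + delta) ** (p - 1) - p * (1 - y + delta) ** (p - 1)

def sca_penalty_loop(solve_linearized, y0, delta0=1e-3, eta=2.0, iters=20, p=0.5):
    """Steps 7.2-7.4: SCA with a linearized penalty and geometrically growing delta.

    solve_linearized(c) must return the maximizer of the convex problem
    max r(y) - sigma * c @ y over the relaxed feasible set (Constraints 1-9);
    it stands in for an off-the-shelf convex solver.
    """
    y, delta = np.asarray(y0, dtype=float), delta0
    for _ in range(iters):
        c = lp_penalty_grad(y, delta, p)   # slope of the Taylor expansion at y^v (step 7.3)
        y = solve_linearized(c)            # convex sub-problem of SCA iteration v+1 (step 7.4)
        delta = eta * delta                # delta_{v+1} = eta * delta_v, eta > 1 (step 7.2)
        print(f"penalty P_delta(y) = {lp_penalty(y, delta, p):.4f}")
    return np.rint(y)                      # near-integer solution rounded to 0-1
```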
Claims (5)
1. A service function chain deployment method based on reinforcement learning combined with coordinated multi-point transmission, characterized by comprising the following steps:
step 1, describing an edge network model, including the characteristics of an edge server, a network virtual function, a user and a service function chain;
Step 2, describing channel characteristics of a server and a user in an edge network, and eliminating communication interference between a plurality of servers and the user by using beam forming; the step 2 is specifically implemented according to the following steps:
2.1, Rayleigh fading and path loss exist between user $m$ and the RRHs. Let $H_{m,n,t} \in \mathbb{C}^{L_n \times L_m}$ denote the channel matrix between user $m$ and the RRH numbered $n$, where $L_n$ is the number of transmit antennas of RRH $n$ and $L_m$ is the number of receive antennas of user $m$; the signal $u_{m,t}$ received by user $m$ in time slot $t$ can then be expressed as

$$u_{m,t} = H_{m,t}^{H} V_{m,t} s_{m,t} + \sum_{j \in \mathcal{M}, j \neq m} H_{m,t}^{H} V_{j,t} s_{j,t} + n_{m,t},$$

where $H_{m,t} \in \mathbb{C}^{L \times L_m}$ is the channel matrix between user $m$ and all RRHs in time slot $t$, obtained by stacking the per-RRH matrices $H_{m,n,t}$; $(\cdot)^{H}$ denotes the conjugate transpose of a matrix; $L = \sum_{n \in \mathcal{N}} L_n$ is the total number of antennas of all RRHs; $V_{m,t} \in \mathbb{C}^{L \times d_m}$ is the beamforming matrix of all RRHs toward user $m$, and $d_m$ is the number of data streams received by user $m$; $I$ denotes the identity matrix; $s_{m,t}$ is drawn from a Gaussian random codebook with zero mean and covariance $I_{d_m}$; and $n_{m,t}$ is white Gaussian noise with covariance $\sigma^{2} I_{L_m}$;
2.2, by successively encoding the received signal $u_{m,t}$ of user $m$ in step 2.1 on the basis of the Gaussian random codebook, the second (interference) term of the above formula can be removed, and the received data rate $R_{m,t}$ of user $m$ in time slot $t$ can be expressed as

$$R_{m,t} = \log_2 \left| I_{L_m} + \frac{1}{\sigma^{2}} H_{m,t}^{H} V_{m,t} V_{m,t}^{H} H_{m,t} \right|;$$

2.3, let $P_{m,n}^{c}$ and $P_{m,n}$ denote, respectively, the service function processing power consumption and the wireless transmission power consumption that edge server $n$ provides to user $m$, and let $a_{m,n,t} \in \{0,1\}$ indicate whether user $m$ uses the vBBU VNF instance of edge server $n$; the beamforming matrix of all RRHs for user $m$ should then satisfy the per-RRH power constraint

$$\mathrm{Tr}\!\left( V_{m,n,t} V_{m,n,t}^{H} \right) \le a_{m,n,t} P_{m,n},$$

where $V_{m,n,t}$ is the block of $V_{m,t}$ corresponding to RRH $n$, and $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix;
2.4, wireless interference between SFCs is eliminated by the zero-forcing beamforming of the RRHs: the channel matrices of all users are stacked and QR decomposition is performed,

$$\bar{H}_t = \left[ H_{1,t}, H_{2,t}, \ldots, H_{M,t} \right] = Q_t R_t,$$

where $Q_t = [Q_{1,t}, \ldots, Q_{M,t}]$ is a set of orthogonal bases and $R_t$ is a block upper triangular matrix whose diagonal blocks $R_{m,m,t}$ have full rank, the remaining upper triangular blocks being arbitrary non-zero matrices; the beamforming matrix of user $m$ can therefore be expressed as $V_{m,t} = Q_{m,t} B_{m,t}$, where $B_{m,t} \in \mathbb{C}^{L_m \times d_m}$ is the matrix to be designed;
2.5, with this structure, the conditions for interference elimination are satisfied: $H_{m,t}^{H} V_{j,t} = 0$ for all $j > m$ by the QR structure, and the residual interference from beams $j < m$ is removed by the successive encoding of step 2.2; only the first $L_m$ rows of the stacked receive matrix $S_{m,t}$ are related to the received data rate, so $S_{m,t}$ can be simplified to $\bar{S}_{m,t}$, $H_{m,t}$ simplifies to $H_{m,t} = Q_{m,t} R_{m,m,t}$, and the effective beamforming matrix $\Sigma_{m,t}$ may be defined as

$$\Sigma_{m,t} = \frac{1}{\sigma^{2}} R_{m,m,t}^{H} B_{m,t} B_{m,t}^{H} R_{m,m,t};$$

letting $W_{m,t} = B_{m,t} B_{m,t}^{H}$, the constraint established in step 2.3 is equivalent to:
Constraint 1: $\mathrm{Tr}\!\left( Q_{m,n,t} W_{m,t} Q_{m,n,t}^{H} \right) \le a_{m,n,t} P_{m,n}$, where $Q_{m,n,t}$ denotes the rows of $Q_{m,t}$ corresponding to RRH $n$;
Constraint 2: $W_{m,t} \succeq 0$;
2.6, for correct data decoding, the received data rate $R_{m,t}$ in step 2.2 needs to be equal to or greater than the data rate threshold $R_{m,th}$, namely:
Constraint 3: $R_{m,t} = \log_2 \left| I_{L_m} + \Sigma_{m,t} \right| \ge R_{m,th}$;
Step 3, establishing a mathematical model under the limitation of the number of server VNF instantiations, the processing capacity of the server, the physical link bandwidth, the VNF routing and the VNF migration budget;
The step 3 is specifically implemented according to the following steps:
3.1, let $\mathcal{N}_f$ denote the set of edge servers capable of providing service function $f$, let $x_{l,n,t}^{m} \in \{0,1\}$ indicate whether service function $f_l^m$ is deployed on edge server $n$ in time slot $t$, and let $y_{f,n,t} \in \{0,1\}$ indicate whether edge server $n$ provides service function $f$ in time slot $t$; each service function can only be deployed on one edge server, namely:
Constraint 4: $\sum_{n \in \mathcal{N}_f} x_{l,n,t}^{m} = 1$;
and $x_{l,n,t}^{m}$ and $y_{f,n,t}$ satisfy:
Constraint 5: $x_{l,n,t}^{m} \le y_{f,n,t}$ for $f = f_l^m$;
3.2, the total data rate of the service flows handled by a VNF instance cannot exceed the processing capacity $C_{f,n,t}$ of that instance, namely:
Constraint 6: $\sum_{m \in \mathcal{M}} \sum_{l: f_l^m = f} x_{l,n,t}^{m} R_{m,t} \le C_{f,n,t}$;
3.3, the total data rate of the service flows traversing a physical link cannot exceed the link bandwidth $B_{n,s,t}$, namely:
Constraint 7: $\sum_{m \in \mathcal{M}} \sum_{l} z_{l,n,s,t}^{m} R_{m,t} \le B_{n,s,t}$, where $z_{l,n,s,t}^{m}$ indicates whether $f_l^m$ and $f_{l+1}^m$ are deployed on edge servers $n$ and $s$, respectively;
3.4, $z_{l,n,s,t}^{m}$ can take the value 1 only when $x_{l,n,t}^{m}$ and $x_{l+1,s,t}^{m}$ are both 1 in time slot $t$; the relationship between them can therefore be described as:
Constraint 8: $z_{l,n,s,t}^{m} = x_{l,n,t}^{m} \, x_{l+1,s,t}^{m}$;
3.5, let $c_{n,s}$ denote the service migration cost between edge servers $n$ and $s$; the total service migration cost of the system cannot exceed the migration threshold $C_{mig,th}$, namely:
Constraint 9: $\sum_{m \in \mathcal{M}} \sum_{l} \sum_{n} \sum_{s} c_{n,s} \, x_{l,n,t-1}^{m} \, x_{l,s,t}^{m} \le C_{mig,th}$;
Step 4, modeling a long-term optimization problem according to the resource constraint established in the step 1-3;
The step 4 is specifically implemented according to the following steps:
4.1, the total overhead of the system is defined to comprise the data flow overhead and the power consumption overhead;
4.2, first, let $P_{n,t}^{\mathrm{rrh}} = \sum_{m \in \mathcal{M}} \mathrm{Tr}\!\left( V_{m,n,t} V_{m,n,t}^{H} \right)$ denote the wireless transmission power consumption of RRH $n$; further, let $P_{f,n}$ denote the energy consumption of edge server $n$ for turning on service function $f$, and let $P_{l,n}^{m}$ denote the energy consumption of edge server $n$ for maintaining service function $f_l^m$; the total overhead of deploying the SFCs in time slot $t$ is then

$$C_t = C_t^{\mathrm{flow}} + \eta \left( C_t^{\mathrm{on}} + C_t^{\mathrm{user}} + C_t^{\mathrm{rrh}} \right),$$

where $\eta$ is a trade-off coefficient between the data flow overhead and the power consumption overhead; the first term $C_t^{\mathrm{flow}}$ is the data flow overhead between edge servers, the second term $C_t^{\mathrm{on}}$ is the power consumption overhead of starting service functions, the third term $C_t^{\mathrm{user}}$ is the power consumption overhead of providing service functions to the users, and the fourth term $C_t^{\mathrm{rrh}} = \sum_{n} P_{n,t}^{\mathrm{rrh}}$ is the wireless transmission power consumption of the RRHs for beamforming;
4.3, step 4.2 establishes the overhead of deploying the SFCs in a single time slot $t$; on this basis, the long-term dynamic SFC deployment overhead is defined as the average system overhead per time slot over the whole deployment process; let $T$ denote the total number of time slots in the deployment process, and let $\mathcal{P}_0$ denote the problem of minimizing the long-term dynamic SFC deployment overhead, namely:

$$\mathcal{P}_0: \quad \min \; \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} C_t,$$

where the variables in $C_t$, $R_{m,t}$, and $\Sigma_{m,t}$ are restricted by Constraints 1-9 established in steps 2 and 3; by solving $\mathcal{P}_0$, the specific SFC deployment result of each time slot is obtained;
Step 5, constructing a Markov decision process (MDP) model, and decoupling the long-term optimization problem into a slot-by-slot optimization problem;
Step 6, using a natural-gradient-based Actor-Critic reinforcement learning algorithm to learn the optimal SFC deployment strategy online, slot by slot;
Step 7, establishing a sub-optimization problem for solving the reward function when searching the action space, reducing the search complexity of the action space, and finally obtaining the optimal solution.
2. The service function chain deployment method based on reinforcement learning combined with coordinated multi-point transmission according to claim 1, wherein said step 1 is specifically implemented according to the following steps:
Step 1.1, in the edge network, each edge server is connected with a remote radio head (RRH), and the index $n \in \mathcal{N}$ denotes both the $n$-th edge server and the RRH of that server, where $\mathcal{N}$ is the set of servers in the edge network and $N$ is the total number of servers; the edge servers are interconnected through X2 links, and each edge server can provide a variety of different virtual functions using virtual machine technology;
Step 1.2, let $m \in \mathcal{M}$ denote the $m$-th user in the edge network, where $\mathcal{M}$ is the set of users in the edge network and $M$ is the total number of users; assuming that each user can only be served by one service function chain (SFC), the SFC is defined as the ordered chain

$$f_1^m \rightarrow f_2^m \rightarrow \cdots \rightarrow f_l^m \rightarrow \cdots,$$

where $f_1^m$ denotes the first service function of the SFC of user $m$, $f_l^m$ denotes the $l$-th service function, and the last service function is designated as the baseband processing unit vBBU.
3. The service function chain deployment method based on reinforcement learning combined with coordinated multi-point transmission according to claim 1, wherein said step 5 is specifically implemented according to the following steps:
5.1, establish the MDP four-tuple of state space, action space, reward function, and state transition, where a state $s_t$ in the state space has four elements: the wireless channel matrices between the users and the RRHs, the processing capacities of the VNF instances, the link bandwidths between the edge servers, and the SFC deployment result of the previous time slot, namely:

$$s_t = \left( \{ H_{m,t} \}, \{ C_{f,n,t} \}, \{ B_{n,s,t} \}, \{ x_{l,n,t-1}^{m} \} \right);$$

5.2, define the action $a_t$ as a reduced-dimension deployment decision for time slot $t$;
5.3, define $r(s_t, a_t)$ as the reward function corresponding to $(s_t, a_t)$; if the action $a_t$ taken admits no feasible solution, the reward function is set to a small negative number;
5.4, solve the maximum of the reward function $r(s_t, a_t)$ for a given action $a_t$, and denote the maximum-reward problem by $\mathcal{P}_1$, namely:

$$\mathcal{P}_1: \quad \max \; r(s_t, a_t) \quad \text{subject to Constraints 1-9},$$

where $a_t$ appears in $\mathcal{P}_1$ as a given parameter; $\mathcal{P}_0$ is thereby converted into $\mathcal{P}_1$ to solve the specific deployment result of each time slot's SFC.
4. The service function chain deployment method based on reinforcement learning combined with coordinated multi-point transmission according to claim 1, wherein said step 6 is specifically implemented according to the following steps:
6.1, an Actor neural network outputs the deployment policy, and a Critic neural network evaluates each policy through Q-value approximation; a neural network with parameters $w$ approximates the action-value function, i.e. $Q_w(s_t, a_t) \approx Q_\pi(s_t, a_t)$, where $Q_\pi(s_t, a_t)$, the action-value function, is the expected return obtained over the subsequent states after taking action $a_t$ in state $s_t$;
6.2, experience replay and a target network are employed to improve training stability; the loss function of the Critic network can be defined as

$$\mathrm{Loss}(w) = \mathbb{E}_{\mathcal{D}} \left[ \left( r(s_t, a_t) - \hat{J} + Q_{w'}(s_{t+1}, a_{t+1}) - Q_w(s_t, a_t) \right)^{2} \right],$$

where $\mathbb{E}[\cdot]$ denotes the expectation operator, $\mathcal{D}$ is the experience replay pool, $w'$ is the model of the target network in time slot $t$, and $\hat{J}$ is an estimate of the expected average return;
6.3, taking the gradient of $\mathrm{Loss}(w)$ with respect to $w$, the update of $w$ is

$$w \leftarrow w - \frac{\alpha_c}{I} \sum_{i=1}^{I} \nabla_w \mathrm{Loss}_i(w),$$

where $\alpha_c$ is the learning rate of the Critic network and $I$ is the number of samples drawn from the experience replay pool;
6.4, based on the parameterized policy $\pi_\theta$, the expected average return is defined as

$$J(\pi_\theta) = \sum_{s} d^{\pi_\theta}(s) \sum_{a} \pi_\theta(a \mid s) \, r(s, a),$$

where $d^{\pi_\theta}(s)$ is the steady-state distribution of state $s$;
6.5, the Actor network is trained with the natural gradient method, and the update of the network model $\theta$ becomes

$$\theta \leftarrow \theta + \alpha_a F(\theta)^{-1} \nabla_\theta J(\pi_\theta),$$

where $\alpha_a$ is the learning rate of the Actor network, $F(\theta)$ is the Fisher information matrix, and $\nabla_\theta J(\pi_\theta)$ is the gradient of $J(\pi_\theta)$ with respect to $\theta$;
6.6, the Actor network and the Critic network are integrated so that the training of the neural network proceeds along the natural gradient direction, allowing the neural network model to approach the global optimum.
5. The service function chain deployment method based on reinforcement learning combined with coordinated multi-point transmission according to claim 1, wherein said step 7 is specifically implemented according to the following steps:
7.1, relax the binary variables $x_{l,n,t}^{m}$ and $z_{l,n,s,t}^{m}$ so that $\mathcal{P}_1$ becomes a convex problem; an $L_p$ ($0 < p < 1$) norm penalty function is therefore introduced to force the relaxed variables to 0-1 integers; collecting the relaxed variables in $y$ and letting $P_\delta(y)$ denote the penalty term, the asymptotically optimal sub-problem $\mathcal{P}_{1\text{-}S}$ is obtained as follows:

$$\mathcal{P}_{1\text{-}S}: \quad \max \; r(s_t, a_t) - \sigma P_\delta(y),$$

where $\sigma$ is the penalty parameter, $\delta$ is an arbitrarily small positive number, and the variables $x_{l,n,t}^{m}$ and $z_{l,n,s,t}^{m}$ satisfy the relaxed constraints $0 \le x_{l,n,t}^{m} \le 1$ and $0 \le z_{l,n,s,t}^{m} \le 1$;
7.2, the penalty parameter is iterated as $\delta_{v+1} = \eta \delta_v$ ($\eta > 1$), so that the penalty term $P_\delta(y)$ converges to 0 at a linear rate;
7.3, because the penalty term in $\mathcal{P}_{1\text{-}S}$ is non-convex, $\mathcal{P}_{1\text{-}S}$ is difficult to solve directly; the successive convex approximation (SCA) technique is therefore adopted to convert $\mathcal{P}_{1\text{-}S}$ into a convex problem by taking the first-order Taylor expansion of the penalty term, i.e.

$$P_\delta(y) \approx P_\delta(y^{v}) + \nabla_y P_\delta(y^{v})^{T} \left( y - y^{v} \right),$$

where $y^{v}$ is the optimal solution of the previous SCA iteration and $\nabla_y P_\delta(y^{v})$ is the gradient of $P_\delta(y)$ at $y^{v}$;
7.4, in the $(v{+}1)$-th SCA iteration, $\mathcal{P}_{1\text{-}S}$ finally becomes a convex problem, namely the relaxed problem with the linearized penalty term;
7.5, solving $\mathcal{P}_{1\text{-}S}$ according to the above steps yields an asymptotically optimal solution of $\mathcal{P}_1$, from which the maximum of the reward function is obtained, and the deployment result of each time slot's SFC is finally determined according to the maximum reward.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211012894.7A CN115767562B (en) | 2022-08-23 | 2022-08-23 | Service function chain deployment method based on reinforcement learning joint coordinated multi-point transmission |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211012894.7A CN115767562B (en) | 2022-08-23 | 2022-08-23 | Service function chain deployment method based on reinforcement learning joint coordinated multi-point transmission |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115767562A CN115767562A (en) | 2023-03-07 |
CN115767562B true CN115767562B (en) | 2024-06-21 |
Family
ID=85349254
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211012894.7A Active CN115767562B (en) | 2022-08-23 | 2022-08-23 | Service function chain deployment method based on reinforcement learning joint coordinated multi-point transmission |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115767562B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116599687B (en) * | 2023-03-15 | 2023-11-24 | 中国人民解放军61660部队 | Low-communication-delay cascade vulnerability scanning probe deployment method and system |
CN117938669B (en) * | 2024-03-25 | 2024-06-18 | 贵州大学 | Network function chain self-adaptive arrangement method for 6G general intelligent service |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022109184A1 (en) * | 2020-11-20 | 2022-05-27 | Intel Corporation | Service function chaining policies for 5g systems |
CN113573320A (en) * | 2021-07-06 | 2021-10-29 | 西安理工大学 | SFC deployment method based on improved actor-critic algorithm in edge network |
Also Published As
Publication number | Publication date |
---|---|
CN115767562A (en) | 2023-03-07 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |