CN117938959A - Multi-target SFC deployment method based on deep reinforcement learning and genetic algorithm


Info

Publication number
CN117938959A
Authority
CN
China
Prior art keywords
request
vnf
sfc
network
delay
Prior art date
Legal status
Pending
Application number
CN202410048027.1A
Other languages
Chinese (zh)
Inventor
王然
赵佳亮
吴强
郝洁
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
2024-01-12
Filing date
2024-01-12
Publication date
2024-04-26
Application filed by Nanjing University of Aeronautics and Astronautics

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a multi-target SFC deployment method based on deep reinforcement learning and a genetic algorithm, belonging to service function chain orchestration technology. The method first constructs a physical network model and an SFC request model and establishes the mapping relation between them, using a binary decision variable $x_{i,v}^{r}$ to indicate whether VNF $f_i$ in request $r$ is deployed on server node $v$. A mathematical model of the SFC deployment problem is then constructed, including the optimization objectives and constraints of the problem; the concept of a time slot $\tau$ is introduced, with $\tau_{r,a}$ and $\tau_{r,l}$ denoting the arrival time and lifetime of a request, respectively. Three objectives are optimized: minimizing delay, minimizing cost, and maximizing the total throughput of requests. The SFC deployment problem is modeled as a Markov decision process to describe the change of the network state; finally, deep reinforcement learning and a multi-objective genetic algorithm are combined to design the deployment scheme.

Description

Multi-target SFC deployment method based on deep reinforcement learning and genetic algorithm
Technical Field
The invention belongs to service function chain orchestration technology, and particularly relates to a multi-target service function chain deployment method based on the combination of deep reinforcement learning and a genetic algorithm.
Background
In NFV systems, a service request is typically represented by a Service Function Chain (SFC), which consists of a set of Virtual Network Functions (VNFs) executed in a strict order. As an important carrier of networks and services, a service function chain arranges virtualized network functions reasonably so as to construct a complete end-to-end network service. In the 4G era, SFC deployment focused mainly on a single optimization objective, such as maximizing resource utilization. In the 5G and 6G eras, however, the emergence of extremely diverse business requirements and application scenarios is driving a profound revolution in the evolution of mobile communications. This transition requires SFC deployment to pursue multiple optimization goals at once, including key metrics such as latency, cost, and throughput. These extreme scenarios and demands include high-precision industry with higher reliability and security, Augmented Reality (AR) and Virtual Reality (VR) with higher communication speeds and lower latency, high-definition video streaming, etc.
Currently, the SFC deployment problem has been widely studied and proved to be NP-hard, as shown in FIG. 1. For multi-objective SFC deployment, multi-objective evolutionary algorithms and deep reinforcement learning are typically used, and each has its own advantages and disadvantages. A multi-objective evolutionary algorithm such as NSGA-II can optimize multiple objective functions simultaneously without converting them into a single objective function, and can effectively generate a Pareto front and a set of Pareto-optimal solutions. However, its performance depends largely on the quality of the initial population: a high-quality initial population generally helps the algorithm converge to a better solution faster, yet a multi-objective evolutionary algorithm alone needs many iterations and much computation time to obtain good initial solutions. Deep reinforcement learning has great potential for multi-objective optimization because it can learn a control strategy directly from high-dimensional raw data without manual intervention; training a deep neural network, however, requires significant computational resources and time. Therefore, a more efficient deployment algorithm is urgently needed to achieve dynamic and efficient deployment of multi-target service function chains.
To this end, the invention provides the MOERL deployment method for the multi-target service function chain problem, which takes minimum delay, maximum request acceptance rate, and minimum cost as optimization targets, establishes a multi-target SFC deployment model, and studies an intelligent deployment strategy for the multi-target service chain based on the combination of deep reinforcement learning and a genetic algorithm.
Disclosure of Invention
The invention aims to: in order to overcome the limitations of existing SFC deployment algorithms and the difficulty of single-objective optimization in satisfying requirements such as low delay, high acceptance rate, and low cost, the invention provides a deployment method for a multi-target service function chain based on the combination of deep reinforcement learning and a genetic algorithm, realizing the three optimization targets of minimum delay, maximum request acceptance rate, and minimum cost.
The technical scheme is as follows: a multi-target SFC deployment method based on deep reinforcement learning and genetic algorithm comprises the following steps:
S1, constructing a system model, comprising a physical network model and an SFC request model, as well as the mapping relation between them;
in the system model, for each SFC request in the network, VNFs must be deployed to meet the specific service requirement of that request; for each VNF, its CPU requirement and memory requirement are considered, and a binary decision variable $x_{i,v}^{r}$ indicates whether VNF $f_i$ in request $r$ is deployed on server node $v$;
S2, constructing a mathematical model of the SFC deployment problem, wherein the mathematical model comprises an optimization target and constraint conditions of the problem, and the optimization target comprises minimizing delay, minimizing cost and maximizing total throughput of the request;
to handle the real-time network dynamics caused by the random arrival and departure of requests, the method introduces the time slot $\tau$, in which whether request $r$ is still being served is represented by a binary variable $x_{r,\tau}$:
$$x_{r,\tau}=\begin{cases}1, & \tau_{r,a}\le \tau < \tau_{r,a}+\tau_{r,l}\\ 0, & \text{otherwise}\end{cases}$$
where $\tau_{r,a}$ and $\tau_{r,l}$ denote the arrival time and the time-to-live of request $r$, respectively;
since a single node may deploy multiple VNF service instances to handle multiple requests, $n_{i,v}$ denotes the number of VNF $f_i$ service instances deployed on node $v$:
$$n_{i,v}=\sum_{r\in R} x_{i,v}^{r}\, x_{r,\tau}$$
where $r(i)$ denotes the $i$-th VNF of request $r$, denoted $f_i$;
the objective of minimizing delay is expressed as:
$$\min \; T_d$$
where $T_d$ denotes the total response delay, including the communication delay on the links and the processing delay on the server nodes;
the objective of minimizing cost is expressed as:
$$\min \; \sum_{\tau} C(\tau), \qquad C(\tau)=\lambda_v \sum_{v\in V}\sum_{r\in R}\sum_{i} c_{f_i}\, x_{i,v}^{r}\, x_{r,\tau} + \lambda_B \sum_{e\in E}\sum_{r\in R}\sum_{i} B_r\, z_{i,e}^{r}\, x_{r,\tau}$$
where $\lambda_v$ and $\lambda_B$ denote the unit costs of server resources and bandwidth, respectively; $C_v$ denotes the CPU and memory capacity available on server node $v$; $x_{i,v}^{r}$ indicates whether VNF $f_i$ in request $r$ is deployed on server node $v$; $B_v$ denotes the total output bandwidth of server $v$; and $z_{i,e}^{r}$ indicates whether virtual link $l_i$ in request $r$ is mapped onto physical link $e$;
the method uses a binary decision variable $y_r$ to indicate whether request $r$ is accepted:
$$y_r=\begin{cases}1, & T_d \le T_r\\ 0, & \text{otherwise}\end{cases}$$
where $T_d$ denotes the total response delay and $T_r$ the maximum tolerated delay;
the objective of maximizing the total throughput of requests is expressed as:
$$\max \; \sum_{r\in R} y_r\, B_r\, \tau_{r,l}$$
where $B_r$ denotes the bandwidth requirement of request $r$ and $\tau_{r,l}$ its lifetime.
S3, modeling the SFC deployment problem as a Markov decision process to describe the change of the network state;
the Markov decision process is expressed as $\langle S, A, P, R, \gamma\rangle$, corresponding to the state space, action space, state-transition probability, reward function, and discount factor, respectively; in each decision event, the agent observes the system state $s(t)$ and selects an action $a(t)$, thereby obtaining the reward $r(t)$ and the next system state $s(t+1)$; the reward $r(t)$ is used to evaluate the effectiveness of action $a(t)$;
S4, combining deep reinforcement learning with a multi-objective genetic algorithm to design the orchestration scheme; specifically, an Actor-Critic neural network is constructed based on deep reinforcement learning: the actor network is responsible for generating the VNF deployment strategy, the critic network evaluates the value of the generated strategy, and training is performed through proximal policy optimization (PPO); the different deployment schemes obtained from training serve as the initial population of the NSGA-II algorithm, which then continues to optimize the deployment schemes.
Further, step S1 comprises:
representing the NFV network as an undirected graph $G=(V,E)$, where $V$ denotes the set of server nodes and $E$ the set of physical links connecting any two nodes; each server can instantiate one or more virtual machines to support different types of VNFs, the set of virtual machines supporting VNFs being denoted $M=\{m_1,m_2,\dots,m_{|M|}\}$; each server has a fixed resource capacity, including computing and storage resources; $C_v$ denotes the CPU and memory capacity available on server node $v$; $B_v$ denotes the total output bandwidth of server $v$, and $T_e$ denotes the communication delay of link $e$;
representing an SFC request sequence in the network as $R=\{r_1,r_2,\dots,r_{|R|}\}$; for different service requests, a series of VNFs must be deployed to meet the specific service requirement of each SFC request; the set of VNFs required for service is denoted $F=\{f_1,f_2,\dots,f_N\}$; each VNF has specific resource requirements, denoted $c_{f}^{\mathrm{cpu}}$ and $c_{f}^{\mathrm{mem}}$ for the CPU requirement and memory requirement of VNF $f$, respectively; an SFC request is denoted $\{f_1,f_2,\dots,f_k\}$, meaning that the request traverses $k$ VNFs in turn;
each SFC request also has specific QoS requirements, including the bandwidth requirement $B_r$ and the maximum delay tolerance $T_r$;
further, the virtual links connecting the VNFs of an SFC request are denoted $L=\{l_1,l_2,\dots,l_{N-1}\}$, where $l_i$ is the $i$-th virtual link connecting VNF $f_i$ and VNF $f_{i+1}$ in the request.
Further, in step S1, the binary decision variable $x_{i,v}^{r}$ records whether a VNF is successfully deployed on a server node:
if VNF $f_i$ in request $r$ is successfully deployed on server node $v$, then $x_{i,v}^{r}=1$; otherwise $x_{i,v}^{r}=0$; similarly, $z_{i,e}^{r}$ indicates whether virtual link $l_i$ in request $r$ is mapped onto physical link $e$:
if virtual link $l_i$ in request $r$ is successfully mapped onto physical link $e$, then $z_{i,e}^{r}=1$; otherwise $z_{i,e}^{r}=0$.
Further, step S2 further specifically comprises:
(21) with sufficient resources, multiple VNFs may be deployed on the same server node, so the resource constraint of a server node is:
$$\sum_{r\in R}\sum_{i} c_{f_i}\, x_{i,v}^{r}\, x_{r,\tau} \le C_v, \quad \forall v\in V$$
(22) since the total output bandwidth of a server node caps the bandwidth demand of all requests passing through that node, the bandwidth constraint is:
$$\sum_{e\in \mathrm{out}(v)}\sum_{r\in R}\sum_{i} B_r\, z_{i,e}^{r}\, x_{r,\tau} \le B_v, \quad \forall v\in V$$
where $\mathrm{out}(v)$ denotes the set of physical links leaving node $v$;
(23) the total response delay, comprising the communication delay on the links and the processing delay on the server nodes, is denoted $T_d$; the communication delay $T_c$ on the links is:
$$T_c=\sum_{e\in E}\sum_{i} z_{i,e}^{r}\, T_e$$
(24) the processing delay of a VNF is affected by the computing power of the virtual machine and by the VNF type, so different virtual machines may exhibit different processing delays; the processing rate of request $r$ on virtual machine $m$ is therefore denoted $v_m$, and the total processing delay $T_p$ is:
$$v_m=\frac{\alpha_m\,\delta_m}{D_m}, \qquad T_p=\sum_{m}\frac{1}{v_m-\lambda}$$
where, assuming request arrivals follow a Poisson process, $\lambda$ denotes the average arrival rate of request $r$, $\alpha_m$ the CPU share allocated by virtual machine $m$ to request $r$, $\delta_m$ the maximum aggregate processing capacity of virtual machine $m$, and $D_m$ the processing density of virtual machine $m$ for request $r$;
thus, the total response delay $T_d$ is:
$$T_d=T_c+T_p$$
and the delay constraint is:
$$T_d \le T_r, \quad \forall r\in R$$
Further, step S3 specifically comprises:
(31) system state representation
the system state at time slot $t$ is denoted $s_t$;
each state includes the characteristics of the physical network and of the request being processed; it is defined as a set of vectors comprising the available resources of each node, the available link bandwidth resources, and the features $M_t$ of the request being processed, including its resource demand, bandwidth demand $B_r$, the number of undeployed VNFs in request $r$, the remaining delay budget, and the survival time $P_r$ of request $r$;
(32) action representation
an action $a$ is drawn from the set of server indices $A=\{a \mid a\in\{0,1,2,\dots,|N|\}\}$; a positive $a$ points to the index of a server node, meaning that the VNF is deployed on the $a$-th server node;
$a=0$ means the VNF cannot be deployed on any node, so for each VNF there are $(|N|+1)$ possible actions;
(33) reward setting
based on the three goals that must be co-optimized (minimizing delay, minimizing cost, and maximizing the total throughput of requests), the reward function is the weighted throughput of accepted requests minus the weighted total deployment cost and the weighted total response delay:
$$r_t=\alpha\, B_r\, \tau_{r,l}-\beta\, C(\tau)-\omega\, T_d$$
where $\alpha$, $\beta$, and $\omega$ are the weighting factors of the respective targets and $C(\tau)$ is the cost in slot $\tau$;
the total return is therefore:
$$R=\sum_{t}\gamma^{t}\, r_t$$
where $\gamma\in[0,1]$ is the discount factor representing the discounting of future rewards.
Further, step S4 specifically comprises:
extracting the features of the physical network and of the request as the state and inputting it into the actor network; a hidden layer transforms the input into a single-column vector, and a softmax layer maps it to a vector in the interval (0,1), each value of which represents the probability of deploying the VNF on the corresponding server node; the node with the highest probability is selected to deploy the VNF, giving the actor network's output $\pi_\theta(s_t,a_t)$; the critic network's output $Q_\pi(s_t,a_t)$ evaluates the value of the policy $\pi_\theta(s_t,a_t)$; the proximal policy optimization (PPO) algorithm is selected to train the neural networks, its aim being to update the parameters of the policy network so that the action sequence generated under the current policy obtains a higher cumulative reward;
the policy $\pi$ is expressed as a continuous function $\pi_\theta(s,a)=P(a\mid s,\theta)$, the probability of taking action $a$ in state $s$, and the networks are updated by constructing loss functions.
Further, in proximal policy optimization the actor network's loss function is typically computed with a KL-divergence penalty, and the critic network's loss function typically uses the TD error;
the loss functions of the actor and critic networks are, respectively:
$$L_{\mathrm{actor}}(\theta)=-\,\mathbb{E}\!\left[\frac{\pi_\theta(s,a)}{\pi_{\theta_{\mathrm{old}}}(s,a)}\,A(s,a)-\beta\,\mathrm{KL}\big(\theta_{\mathrm{old}},\theta\big)\right], \qquad L_{\mathrm{critic}}=\mathbb{E}\!\left[\big(r_t+\gamma\,V(s_{t+1})-V(s_t)\big)^{2}\right]$$
where $\pi_\theta(s,a)$ denotes the action-selection probability of the new policy, $\pi_{\theta_{\mathrm{old}}}(s,a)$ that of the old policy, $A(s,a)$ the advantage function measuring how good an action is, $\mathrm{KL}(\theta_{\mathrm{old}},\theta)$ the KL divergence of the new policy relative to the old one, $\beta$ the hyper-parameter controlling the KL-penalty weight, and $V(\cdot)$ the state value predicted by the critic network.
Further, the method also employs a multi-objective optimization algorithm for solving optimization problems with multiple conflicting objectives; the algorithm is based on genetic-algorithm principles and comprises the following elements:
objective functions: minimizing latency, minimizing deployment cost, and maximizing the total throughput of accepted requests;
population: in each time slot $\tau$, the deployment schemes of all requests are gathered as the initial population of the multi-objective optimization algorithm;
individual encoding: each chromosome is a vector of integers in the interval $[1,N]$, where $N$ is the number of servers and each value is the number of a server that can carry the corresponding VNF;
crossover: single-point crossover is adopted; the genes of two individuals are cut at a randomly selected crossover point and the cut parts are exchanged, producing new individuals;
mutation: single-point mutation is adopted, introducing randomness in order to explore new solutions in the search space.
Beneficial effects: to guarantee the different QoS requirements of different network services, the invention extends the traditional single-objective SFC optimization model into a multi-objective optimization model that jointly addresses the delay, deployment cost, and throughput considered in existing work. Meanwhile, to overcome the limitations of existing SFC deployment algorithms, the invention provides the two-stage MOERL algorithm, improving the efficiency and effectiveness of SFC deployment.
Drawings
FIG. 1 is a schematic diagram of a service function chain deployment;
FIG. 2 is the flow framework of the MOERL algorithm;
FIG. 3 is a comparative schematic of the Pareto fronts obtained by the MOERL, NSGA-II, and DRL methods.
Detailed Description
For a detailed description of the disclosed embodiments, the invention is further described below with reference to the accompanying drawings and examples.
In order to overcome the limitations of existing SFC deployment algorithms and the difficulty of single-objective optimization in satisfying requirements such as low delay, high acceptance rate, and low cost, the invention provides a deployment method for a multi-target service function chain based on the combination of deep reinforcement learning and a genetic algorithm, realizing the three optimization targets of minimum delay, maximum request acceptance rate, and minimum cost. The method specifically comprises the following steps:
(1) Constructing a system model, mainly comprising a physical network model, an SFC request model, and the mapping relation between them.
The invention represents the NFV network as an undirected graph $G=(V,E)$, where $V$ denotes the set of server nodes, $E$ the set of physical links connecting any two nodes, $v_i$ the $i$-th server node, and $e_j$ the $j$-th physical link. Each server can instantiate multiple virtual machines to support different types of VNFs; the set of virtual machines supporting VNFs is denoted $M=\{m_1,m_2,\dots,m_{|M|}\}$. Each server has a fixed resource capacity, including computing and storage resources. $C_v$ denotes the CPU and memory capacity available on server node $v$; $B_v$ denotes the total output bandwidth of server $v$, and $T_e$ denotes the communication delay of link $e$.
The invention represents an SFC request sequence in the network as $R=\{r_1,r_2,\dots,r_{|R|}\}$. For different service requests, a series of VNFs must be deployed to meet the specific service need of each SFC request. The set of VNFs required for service is denoted $F=\{f_1,f_2,\dots,f_N\}$; each VNF has specific resource requirements, denoted $c_{f}^{\mathrm{cpu}}$ and $c_{f}^{\mathrm{mem}}$ for its CPU and memory requirements. An SFC request is denoted $\{f_1,f_2,\dots,f_k\}$, meaning that the request traverses $k$ VNFs in turn. Each SFC request has specific QoS requirements, including the bandwidth requirement $B_r$ and the maximum delay tolerance $T_r$. Further, the virtual links connecting the VNFs of an SFC request are denoted $L=\{l_1,l_2,\dots,l_{N-1}\}$, where $l_i$ is the $i$-th virtual link connecting VNF $f_i$ and VNF $f_{i+1}$.
Whether a VNF is successfully deployed on a server node depends on whether sufficient resources are available. The binary decision variable $x_{i,v}^{r}$ indicates whether VNF $f_i$ in request $r$ is deployed on server node $v$: if VNF $f_i$ in request $r$ is successfully deployed on server node $v$, then $x_{i,v}^{r}=1$; otherwise $x_{i,v}^{r}=0$. Similarly, $z_{i,e}^{r}$ indicates whether virtual link $l_i$ in request $r$ is mapped onto physical link $e$: if the mapping succeeds, $z_{i,e}^{r}=1$; otherwise $z_{i,e}^{r}=0$.
(2) Constructing a mathematical model of the SFC deployment problem, wherein the mathematical model comprises an optimization target and constraint conditions of the problem;
To address the real-time network dynamics caused by the random arrival and departure of requests, the invention introduces the concept of a time slot $\tau$, using $\tau_{r,a}$ and $\tau_{r,l}$ to denote the arrival time and time-to-live of a request, respectively. In slot $\tau$, whether request $r$ is still being served is represented by the binary variable $x_{r,\tau}$:
$$x_{r,\tau}=\begin{cases}1, & \tau_{r,a}\le \tau < \tau_{r,a}+\tau_{r,l}\\ 0, & \text{otherwise}\end{cases}$$
Since a single node may deploy multiple VNF service instances to handle multiple requests, $n_{i,v}$ denotes the number of VNF $f_i$ service instances deployed on node $v$:
$$n_{i,v}=\sum_{r\in R} x_{i,v}^{r}\, x_{r,\tau}$$
where $r(i)$ denotes the $i$-th VNF of request $r$.
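For illustration only, the variables above can be sketched in Python; the structures and names below (Request, in_service, instances_on_node) are hypothetical and not part of the claimed method:

```python
# Hedged sketch of the model variables; all names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Request:
    rid: int                 # request identifier r
    arrival: int             # tau_{r,a}: arrival slot
    lifetime: int            # tau_{r,l}: time-to-live in slots
    bandwidth: float         # B_r
    max_delay: float         # T_r
    vnfs: list = field(default_factory=list)  # ordered chain f_1..f_k

def in_service(req: Request, tau: int) -> int:
    """Binary x_{r,tau}: 1 while request r is still being served in slot tau."""
    return int(req.arrival <= tau < req.arrival + req.lifetime)

# x[(r, i, v)] = 1 iff VNF f_i of request r is deployed on node v.
x: dict = {}

def instances_on_node(requests: list, x: dict, vnf_type: str, v: int, tau: int) -> int:
    """n_{i,v}: number of active instances of a VNF type on node v in slot tau."""
    return sum(
        x.get((r.rid, i, v), 0) * in_service(r, tau)
        for r in requests
        for i, f in enumerate(r.vnfs)
        if f == vnf_type
    )
```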
First, with sufficient resources, multiple VNFs may be deployed on the same server node, so the resource constraint of a server node is:
$$\sum_{r\in R}\sum_{i} c_{f_i}\, x_{i,v}^{r}\, x_{r,\tau} \le C_v, \quad \forall v\in V$$
Second, since the total output bandwidth of a server node caps the bandwidth demand of all requests passing through that node, the bandwidth constraint is:
$$\sum_{e\in \mathrm{out}(v)}\sum_{r\in R}\sum_{i} B_r\, z_{i,e}^{r}\, x_{r,\tau} \le B_v, \quad \forall v\in V$$
where $\mathrm{out}(v)$ denotes the set of physical links leaving node $v$.
Finally, the total response delay is denoted $T_d$, comprising the communication delay on the links and the processing delay on the server nodes. The communication delay $T_c$ on the links is:
$$T_c=\sum_{e\in E}\sum_{i} z_{i,e}^{r}\, T_e$$
The processing delay of a VNF is affected by the computing power of the virtual machine and by the VNF type, so different virtual machines may exhibit different processing delays. The processing rate of request $r$ on virtual machine $m$ is therefore denoted $v_m$, and the total processing delay $T_p$ is:
$$v_m=\frac{\alpha_m\,\delta_m}{D_m}, \qquad T_p=\sum_{m}\frac{1}{v_m-\lambda}$$
where, assuming request arrivals follow a Poisson process, $\lambda$ denotes the average arrival rate of request $r$, $\alpha_m$ the CPU share allocated by virtual machine $m$ to request $r$, $\delta_m$ the maximum aggregate processing capacity of virtual machine $m$, and $D_m$ the processing density of virtual machine $m$ for request $r$.
Thus, the total response delay $T_d$ is:
$$T_d=T_c+T_p$$
and the delay constraint is:
$$T_d \le T_r, \quad \forall r\in R$$
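As a sketch only, the delay model above can be coded as follows, assuming the M/M/1-style queueing form suggested by the Poisson-arrival assumption (the exact closed form used in the invention may differ):

```python
# Hedged sketch of the delay model; the M/M/1-style T_p is an assumption.
def communication_delay(link_delays: dict, mapped_links: list) -> float:
    """T_c: sum of T_e over every physical link a virtual link is mapped to."""
    return sum(link_delays[e] for e in mapped_links)

def processing_rate(alpha_m: float, delta_m: float, d_m: float) -> float:
    """v_m = alpha_m * delta_m / D_m."""
    return alpha_m * delta_m / d_m

def processing_delay(lam: float, rates: list) -> float:
    """T_p: M/M/1-style delay 1/(v_m - lambda), summed over the VMs serving the chain."""
    assert all(v_m > lam for v_m in rates), "each VM must be stable: v_m > lambda"
    return sum(1.0 / (v_m - lam) for v_m in rates)

def total_response_delay(link_delays, mapped_links, lam, rates) -> float:
    """T_d = T_c + T_p; the request is accepted (y_r = 1) iff T_d <= T_r."""
    return communication_delay(link_delays, mapped_links) + processing_delay(lam, rates)
```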
The method of the present invention proposes three optimization objective functions: objective 1 minimizes delay, objective 2 minimizes cost, and objective 3 maximizes the total throughput of requests.
Objective function 1 is expressed as:
$$\min \; T_d$$
Objective function 2 is expressed as:
$$\min \; \sum_{\tau} C(\tau), \qquad C(\tau)=\lambda_v \sum_{v\in V}\sum_{r\in R}\sum_{i} c_{f_i}\, x_{i,v}^{r}\, x_{r,\tau} + \lambda_B \sum_{e\in E}\sum_{r\in R}\sum_{i} B_r\, z_{i,e}^{r}\, x_{r,\tau}$$
where $\lambda_v$ and $\lambda_B$ denote the unit costs of server resources and bandwidth, respectively.
The method of the present invention uses a binary decision variable $y_r$ to indicate whether request $r$ is accepted:
$$y_r=\begin{cases}1, & T_d \le T_r\\ 0, & \text{otherwise}\end{cases}$$
Objective function 3 is expressed as:
$$\max \; \sum_{r\in R} y_r\, B_r\, \tau_{r,l}$$
(3) Modeling the SFC deployment problem as a Markov decision process model to describe the change in network state;
With the above preparation, a Markov decision process model is then formulated, expressed mathematically as $\langle S, A, P, R, \gamma\rangle$, corresponding to the state space, action space, state-transition probability, reward function, and discount factor, respectively. In each decision event, the agent observes the system state $s(t)$ and selects an action $a(t)$; by taking action $a(t)$, it obtains the reward $r(t)$ and the next system state $s(t+1)$. The reward $r(t)$ is used to evaluate the effectiveness of action $a(t)$. Specifically:
1) State
Each state of the system should include the characteristics of the physical network and of the request being processed. A state is defined as a set of vectors comprising the available resources of each node, the available link bandwidth resources, and the features $M_t$ of the request being processed, including its resource demand, bandwidth demand $B_r$, the number of undeployed VNFs in request $r$, the remaining delay budget, and the lifetime $P_r$ of request $r$. The system state at time slot $t$ is denoted $s_t$ accordingly.
2) Action
An action $a$ is drawn from the set of server indices $A=\{a \mid a\in\{0,1,2,\dots,|N|\}\}$; a positive $a$ is the index of a server node and means that the VNF is deployed on the $a$-th server node, while $a=0$ means the VNF cannot be deployed on any node. For each VNF there are therefore $(|N|+1)$ possible actions.
3) Reward
Since the three objectives must be co-optimized, the reward function is the weighted throughput of accepted requests minus the weighted total deployment cost and the weighted total response delay:
$$r_t=\alpha\, B_r\, \tau_{r,l}-\beta\, C(\tau)-\omega\, T_d$$
where $\alpha$, $\beta$, and $\omega$ are the weighting factors of the respective targets and $C(\tau)$ is the cost in slot $\tau$.
The total return is therefore:
$$R=\sum_{t}\gamma^{t}\, r_t$$
where $\gamma\in[0,1]$ is the discount factor representing the discounting of future rewards.
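For illustration, the per-slot reward and the discounted return can be sketched as below; the default weighting values are placeholders, not the ones used by the invention:

```python
# Hedged sketch of the reward; the alpha/beta/omega defaults are placeholders.
def step_reward(b_r: float, lifetime: float, cost_tau: float, t_d: float,
                alpha: float = 1.0, beta: float = 1.0, omega: float = 1.0) -> float:
    """r_t = alpha * B_r * tau_{r,l} - beta * C(tau) - omega * T_d."""
    return alpha * b_r * lifetime - beta * cost_tau - omega * t_d

def discounted_return(rewards: list, gamma: float = 0.99) -> float:
    """Total return: sum_t gamma^t * r_t, with gamma in [0, 1]."""
    total, g = 0.0, 1.0
    for r in rewards:
        total += g * r
        g *= gamma
    return total
```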
(4) The invention combines deep reinforcement learning with a multi-objective genetic algorithm to design a reasonable and efficient orchestration scheme (MOERL). As shown in FIG. 2, an Actor-Critic neural network is constructed using deep reinforcement learning: the actor network is responsible for generating VNF deployment policies, and the critic network evaluates the value of the generated policies. Training is performed through proximal policy optimization (PPO); the different deployment schemes obtained from training serve as the initial population of the NSGA-II algorithm, which then continues to optimize them. Specifically:
The MOERL agent consists of the actor and critic networks. The actor network takes the state as input, outputs an action, and generates the VNF placement policy, approximating the policy model $\pi(s,a)$; the critic network takes the state as input and outputs a value function that evaluates the policy, approximating $Q_\pi(s,a)$. First, the features of the physical network and of the request are extracted as the state and input into the actor network; a hidden layer transforms them into a single-column vector, which a softmax layer maps to a vector in the interval (0,1), each value representing the probability of deploying the VNF on the corresponding server node. The node with the highest probability is selected to deploy the VNF, giving the actor output $\pi_\theta(s_t,a_t)$. The critic output $Q_\pi(s_t,a_t)$ evaluates the value of the policy $\pi_\theta(s_t,a_t)$. The PPO algorithm is selected to train the neural networks; its aim is to update the parameters of the policy network so that the action sequence generated under the current policy obtains a higher cumulative reward. The policy $\pi$ is expressed as a continuous function $\pi_\theta(s,a)=P(a\mid s,\theta)$, the probability of taking action $a$ in state $s$, and the networks are updated by constructing loss functions. In PPO, the actor loss is typically computed with a KL-divergence penalty, and the critic loss typically uses the TD error. The two loss functions are:
$$L_{\mathrm{actor}}(\theta)=-\,\mathbb{E}\!\left[\frac{\pi_\theta(s,a)}{\pi_{\theta_{\mathrm{old}}}(s,a)}\,A(s,a)-\beta\,\mathrm{KL}\big(\theta_{\mathrm{old}},\theta\big)\right], \qquad L_{\mathrm{critic}}=\mathbb{E}\!\left[\big(r_t+\gamma\,V(s_{t+1})-V(s_t)\big)^{2}\right]$$
where $\pi_\theta(s,a)$ denotes the action-selection probability of the new policy, $\pi_{\theta_{\mathrm{old}}}(s,a)$ that of the old policy, $A(s,a)$ the advantage function measuring how good an action is, $\mathrm{KL}(\theta_{\mathrm{old}},\theta)$ the KL divergence of the new policy relative to the old one, $\beta$ the hyper-parameter controlling the KL-penalty weight, and $V(\cdot)$ the state value predicted by the critic network.
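A minimal PyTorch sketch of the actor-critic pair and the KL-penalty PPO losses described above is given below; the layer sizes, the use of a one-step TD advantage, and all function names are assumptions, not the invention's exact implementation:

```python
# Hedged PyTorch sketch of the Actor-Critic pair and KL-penalty PPO losses.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a state vector to a softmax distribution over |N|+1 actions (a = 0 rejects)."""
    def __init__(self, state_dim: int, n_servers: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_servers + 1), nn.Softmax(dim=-1))

    def forward(self, s):
        return self.net(s)  # pi_theta(a|s)

class Critic(nn.Module):
    """Predicts the state value used for the TD error."""
    def __init__(self, state_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, s):
        return self.net(s).squeeze(-1)

def ppo_losses(actor, critic, states, actions, rewards, next_states, old_probs,
               gamma: float = 0.99, beta_kl: float = 0.01):
    """KL-penalty PPO: ratio * advantage - beta * KL for the actor; squared TD error for the critic."""
    new_probs = actor(states)                                    # (batch, |N|+1)
    new_p = new_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    old_p = old_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        advantage = rewards + gamma * critic(next_states) - critic(states)
    ratio = new_p / (old_p + 1e-8)
    kl = (old_probs * (old_probs.add(1e-8).log()
                       - new_probs.add(1e-8).log())).sum(dim=1)  # KL(old || new)
    actor_loss = -(ratio * advantage - beta_kl * kl).mean()
    td_error = rewards + gamma * critic(next_states).detach() - critic(states)
    critic_loss = (td_error ** 2).mean()
    return actor_loss, critic_loss
```

During training the action is typically sampled from $\pi_\theta$, while at deployment time the highest-probability node can be taken greedily, matching the selection rule described above.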
NSGA-II (Non-dominated Sorting Genetic Algorithm II) is a multi-objective optimization algorithm based on genetic-algorithm principles for solving optimization problems with multiple conflicting objectives. Through non-dominated sorting, crowding-distance computation, and related techniques, it generates a set of approximately optimal solutions, effectively balancing the diversity and convergence of the solution set, and is well suited to complex multi-objective problems. Since the deployment schemes generated by the DRL stage already satisfy the constraints, they form a higher-quality initial population than one generated by running NSGA-II from scratch. NSGA-II mainly comprises the following elements:
(a) Objective functions: minimizing latency, minimizing deployment cost, and maximizing the total throughput of accepted requests.
(b) Population: in each time slot $\tau$, the deployment schemes of all requests are gathered as the initial population of NSGA-II.
(c) Individual encoding: each chromosome is a vector of integers in the interval $[1,N]$, where $N$ is the number of servers and each value is the number of a server that can carry the corresponding VNF.
(d) Crossover: single-point crossover, i.e., the genes of two individuals are cut at a randomly selected crossover point and the cut parts are exchanged, producing new individuals.
(e) Mutation: single-point mutation, in which one or more gene values in an individual's encoding are changed at random; it introduces randomness so that new solutions in the search space can be explored.
NSGA-II iteratively performs selection, crossover, and mutation to evolve the population over multiple generations, maintaining a diverse, high-quality solution set on the Pareto front. Fitness values are computed from the three optimization objectives: minimizing delay, minimizing cost, and maximizing the total throughput of accepted requests. The algorithm aims to find a set of solutions that trade off these conflicting goals, yielding a Pareto front that represents different SFC deployment strategies.
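A minimal sketch of the integer encoding and the genetic operators, with the DRL schemes seeding the initial population, is shown below; the non-dominated sorting and crowding-distance machinery of NSGA-II is omitted and all names are hypothetical:

```python
# Hedged sketch of NSGA-II's encoding and operators; the NSGA-II driver itself is omitted.
import random

def single_point_crossover(parent_a: list, parent_b: list) -> tuple:
    """Cut both chromosomes at one random point and swap the tails."""
    point = random.randint(1, len(parent_a) - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def single_point_mutation(chrom: list, n_servers: int, p_mut: float = 0.2) -> list:
    """With probability p_mut, reassign one random gene to a server number in [1, N]."""
    chrom = list(chrom)
    if random.random() < p_mut:
        idx = random.randrange(len(chrom))
        chrom[idx] = random.randint(1, n_servers)
    return chrom

def seed_population(drl_schemes: list, pop_size: int, n_servers: int, chain_len: int) -> list:
    """Seed the population with DRL-generated schemes; pad with random individuals if needed."""
    population = [list(s) for s in drl_schemes][:pop_size]
    while len(population) < pop_size:
        population.append([random.randint(1, n_servers) for _ in range(chain_len)])
    return population
```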
Examples: as shown in FIGS. 1-3, simulation experiments were performed in Python with the PyTorch framework, running on a computer equipped with a 2.10 GHz 12th Gen Intel(R) Core(TM) i7-12700F CPU and 16 GB of memory. The relevant simulation parameters are as follows: the network topology has 12 nodes and 15 bidirectional links. To generate SFC requests, there are 6 types of VNFs in the network, and the number of VNFs per SFC is set in [1,6]. The minimum bandwidth required by an SFC request is randomly distributed in [10,30] Mbps, and the maximum tolerated delay of each SFC request lies in [20,50] ms. Each VNF requires CPU and memory resources on its server node; the capacity required per VNF is in [1,20] cores and [2,4] GB, respectively. NSGA-II parameters: population size 100; mutation probability 20%; crossover probability 90%; maximum number of generations 100.
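The simulation parameters above can be collected into a single configuration object; this dictionary merely restates the listed values, and its key names are hypothetical:

```python
# The example's simulation parameters, gathered into one (hypothetical) config dict.
SIM_CONFIG = {
    "nodes": 12,
    "bidirectional_links": 15,
    "vnf_types": 6,
    "vnfs_per_sfc": (1, 6),          # chain length range
    "bandwidth_mbps": (10, 30),      # minimum bandwidth per request
    "max_delay_ms": (20, 50),        # maximum tolerated delay per request
    "vnf_cpu_cores": (1, 20),
    "vnf_mem_gb": (2, 4),
    "nsga2": {"pop_size": 100, "p_mutation": 0.20, "p_crossover": 0.90,
              "max_generations": 100},
}
```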
To evaluate the performance of MOERL, the experiments compare it with the NSGA-II algorithm and DRL, contrasting the Pareto fronts of the three algorithms. The Pareto fronts obtained by MOERL, DRL, and NSGA-II are shown in FIG. 3. Compared with NSGA-II, the deployment scheme of the MOERL method reduces delay by 8.3%, reduces cost by 4.8%, and improves the throughput of accepted requests by 5.5%; compared with DRL, it reduces delay by 16.1%, reduces cost by 1.3%, and improves the throughput of accepted requests by 10%. From the three-dimensional Pareto front in FIG. 3, it is evident that the proposed MOERL method outperforms the other comparison algorithms. This is because DRL generates a set of initial placement schemes, which can be seen as a preliminary local search; using them as the initial population accelerates the convergence of NSGA-II, since NSGA-II does not have to start searching from a random population, and NSGA-II can then further optimize these schemes to obtain a better Pareto front. The advantage of this hybrid approach is that it fully exploits the global search capability of deep reinforcement learning and the diversity-maintenance capability of the multi-objective genetic algorithm.

Claims (9)

1. The multi-target SFC deployment method based on deep reinforcement learning and a genetic algorithm is characterized by comprising the following steps:
S1, constructing a system model, comprising a physical network model and an SFC request model, as well as the mapping relation between them;
in the system model, for each SFC request in the network, VNFs must be deployed to meet the specific service requirement of that request; for each VNF, its CPU requirement and memory requirement are considered, and a binary decision variable $x_{i,v}^{r}$ indicates whether VNF $f_i$ in request $r$ is deployed on server node $v$;
S2, constructing a mathematical model of the SFC deployment problem, comprising the optimization objectives and constraints of the problem, the optimization objectives being minimizing delay, minimizing cost, and maximizing the total throughput of requests;
to handle the real-time network dynamics caused by the random arrival and departure of requests, the method introduces the time slot $\tau$, in which whether request $r$ is still being served is represented by a binary variable $x_{r,\tau}$:
$$x_{r,\tau}=\begin{cases}1, & \tau_{r,a}\le \tau < \tau_{r,a}+\tau_{r,l}\\ 0, & \text{otherwise}\end{cases}$$
where $\tau_{r,a}$ and $\tau_{r,l}$ denote the arrival time and the time-to-live of request $r$, respectively;
since a single node may deploy multiple VNF service instances to handle multiple requests, $n_{i,v}$ denotes the number of VNF $f_i$ service instances deployed on node $v$:
$$n_{i,v}=\sum_{r\in R} x_{i,v}^{r}\, x_{r,\tau}$$
where $r(i)$ denotes the $i$-th VNF of request $r$, denoted $f_i$;
the objective of minimizing delay is expressed as:
$$\min \; T_d$$
where $T_d$ denotes the total response delay, including the communication delay on the links and the processing delay on the server nodes;
the objective of minimizing cost is expressed as:
$$\min \; \sum_{\tau} C(\tau), \qquad C(\tau)=\lambda_v \sum_{v\in V}\sum_{r\in R}\sum_{i} c_{f_i}\, x_{i,v}^{r}\, x_{r,\tau} + \lambda_B \sum_{e\in E}\sum_{r\in R}\sum_{i} B_r\, z_{i,e}^{r}\, x_{r,\tau}$$
where $\lambda_v$ and $\lambda_B$ denote the unit costs of server resources and bandwidth, respectively; $C_v$ denotes the CPU and memory capacity available on server node $v$; $x_{i,v}^{r}$ indicates whether VNF $f_i$ in request $r$ is deployed on server node $v$; $B_v$ denotes the total output bandwidth of server $v$; and $z_{i,e}^{r}$ indicates whether virtual link $l_i$ in request $r$ is mapped onto physical link $e$;
the method uses a binary decision variable $y_r$ to indicate whether request $r$ is accepted:
$$y_r=\begin{cases}1, & T_d \le T_r\\ 0, & \text{otherwise}\end{cases}$$
where $T_d$ denotes the total response delay and $T_r$ the maximum tolerated delay;
the objective of maximizing the total throughput of requests is expressed as:
$$\max \; \sum_{r\in R} y_r\, B_r\, \tau_{r,l}$$
where $B_r$ denotes the bandwidth requirement of request $r$ and $\tau_{r,l}$ its lifetime;
S3, modeling the SFC deployment problem as a Markov decision process to describe the change of the network state;
the Markov decision process is expressed as $\langle S, A, P, R, \gamma\rangle$, corresponding to the state space, action space, state-transition probability, reward function, and discount factor, respectively; in each decision event, the agent observes the system state $s(t)$ and selects an action $a(t)$, thereby obtaining the reward $r(t)$ and the next system state $s(t+1)$; the reward $r(t)$ is used to evaluate the effectiveness of action $a(t)$;
S4, combining deep reinforcement learning with a multi-objective genetic algorithm to design the orchestration scheme; specifically, an Actor-Critic neural network is constructed based on deep reinforcement learning: the actor network is responsible for generating the VNF deployment strategy, the critic network evaluates the value of the generated strategy, and training is performed through proximal policy optimization (PPO); the different deployment schemes obtained from training serve as the initial population of the NSGA-II algorithm, which then continues to optimize the deployment schemes.
2. The multi-target SFC deployment method based on deep reinforcement learning and a genetic algorithm of claim 1, wherein step S1 comprises:
representing the NFV network as an undirected graph $G=(V,E)$, where $V$ denotes the set of server nodes and $E$ the set of physical links connecting any two nodes; each server can instantiate one or more virtual machines to support different types of VNFs, the set of virtual machines supporting VNFs being denoted $M=\{m_1,m_2,\dots,m_{|M|}\}$; each server has a fixed resource capacity, including computing and storage resources; $C_v$ denotes the CPU and memory capacity available on server node $v$; $B_v$ denotes the total output bandwidth of server $v$, and $T_e$ denotes the communication delay of link $e$.
3. The multi-target SFC deployment method based on deep reinforcement learning and a genetic algorithm of claim 2, wherein the method represents an SFC request sequence in the network as $R=\{r_1,r_2,\dots,r_{|R|}\}$; for different service requests, a series of VNFs must be deployed to meet the specific service requirement of each SFC request; the set of VNFs required for service is denoted $F=\{f_1,f_2,\dots,f_N\}$; each VNF has specific resource requirements, denoted $c_{f}^{\mathrm{cpu}}$ and $c_{f}^{\mathrm{mem}}$ for the CPU requirement and memory requirement of VNF $f$, respectively; an SFC request is denoted $\{f_1,f_2,\dots,f_k\}$, indicating that the request traverses $k$ VNFs in turn;
each SFC request also has specific QoS requirements, including the bandwidth requirement $B_r$ and the maximum delay tolerance $T_r$;
further, the virtual links connecting the VNFs of an SFC request are denoted $L=\{l_1,l_2,\dots,l_{N-1}\}$, where $l_i$ is the $i$-th virtual link connecting VNF $f_i$ and VNF $f_{i+1}$ in the request.
4. The multi-target SFC deployment method based on deep reinforcement learning and a genetic algorithm of claim 1, wherein in step S1 the binary decision variable $x_{i,v}^{r}$ records whether a VNF is successfully deployed on a server node:
if VNF $f_i$ in request $r$ is successfully deployed on server node $v$, then $x_{i,v}^{r}=1$; otherwise $x_{i,v}^{r}=0$;
similarly, $z_{i,e}^{r}$ indicates whether virtual link $l_i$ in request $r$ is mapped onto physical link $e$: if virtual link $l_i$ in request $r$ is successfully mapped onto physical link $e$, then $z_{i,e}^{r}=1$; otherwise $z_{i,e}^{r}=0$.
5. The multi-target SFC deployment method based on deep reinforcement learning and a genetic algorithm of claim 1, wherein step S2 further specifically comprises:
(21) with sufficient resources, multiple VNFs may be deployed on the same server node, so the resource constraint of a server node is:
$$\sum_{r\in R}\sum_{i} c_{f_i}\, x_{i,v}^{r}\, x_{r,\tau} \le C_v, \quad \forall v\in V$$
(22) since the total output bandwidth of a server node caps the bandwidth demand of all requests passing through that node, the bandwidth constraint is:
$$\sum_{e\in \mathrm{out}(v)}\sum_{r\in R}\sum_{i} B_r\, z_{i,e}^{r}\, x_{r,\tau} \le B_v, \quad \forall v\in V$$
where $\mathrm{out}(v)$ denotes the set of physical links leaving node $v$;
(23) the total response delay, comprising the communication delay on the links and the processing delay on the server nodes, is denoted $T_d$; the communication delay $T_c$ on the links is:
$$T_c=\sum_{e\in E}\sum_{i} z_{i,e}^{r}\, T_e$$
(24) the processing delay of a VNF is affected by the computing power of the virtual machine and by the VNF type, so different virtual machines may exhibit different processing delays; the processing rate of request $r$ on virtual machine $m$ is therefore denoted $v_m$, and the total processing delay $T_p$ is:
$$v_m=\frac{\alpha_m\,\delta_m}{D_m}, \qquad T_p=\sum_{m}\frac{1}{v_m-\lambda}$$
where, assuming request arrivals follow a Poisson process, $\lambda$ denotes the average arrival rate of request $r$, $\alpha_m$ the CPU share allocated by virtual machine $m$ to request $r$, $\delta_m$ the maximum aggregate processing capacity of virtual machine $m$, and $D_m$ the processing density of virtual machine $m$ for request $r$;
thus, the total response delay $T_d$ is:
$$T_d=T_c+T_p$$
and the delay constraint is:
$$T_d \le T_r, \quad \forall r\in R$$
6. The multi-target SFC deployment method based on deep reinforcement learning and a genetic algorithm of claim 1, wherein step S3 specifically comprises:
(31) system state representation
the system state at time slot $t$ is denoted $s_t$;
each state includes the characteristics of the physical network and of the request being processed; it is defined as a set of vectors comprising the available resources of each node, the available link bandwidth resources, and the features $M_t$ of the request being processed, including its resource demand, bandwidth demand $B_r$, the number of undeployed VNFs in request $r$, the remaining delay budget, and the survival time $P_r$ of request $r$;
(32) action representation
an action $a$ is drawn from the set of server indices $A=\{a \mid a\in\{0,1,2,\dots,|N|\}\}$; a positive $a$ points to the index of a server node, meaning that the VNF is deployed on the $a$-th server node;
$a=0$ means the VNF cannot be deployed on any node, so for each VNF there are $(|N|+1)$ possible actions;
(33) reward setting
based on the three goals that must be co-optimized (minimizing delay, minimizing cost, and maximizing the total throughput of requests), the reward function is the weighted throughput of accepted requests minus the weighted total deployment cost and the weighted total response delay:
$$r_t=\alpha\, B_r\, \tau_{r,l}-\beta\, C(\tau)-\omega\, T_d$$
where $\alpha$, $\beta$, and $\omega$ are the weighting factors of the respective targets and $C(\tau)$ is the cost in slot $\tau$;
the total return is therefore:
$$R=\sum_{t}\gamma^{t}\, r_t$$
where $\gamma\in[0,1]$ is the discount factor representing the discounting of future rewards.
7. The multi-target SFC deployment method based on deep reinforcement learning and a genetic algorithm of claim 1, wherein step S4 specifically comprises:
extracting the features of the physical network and of the request as the state and inputting it into the actor network; a hidden layer transforms the input into a single-column vector, and a softmax layer maps it to a vector in the interval (0,1), each value of which represents the probability of deploying the VNF on the corresponding server node; the node with the highest probability is selected to deploy the VNF, giving the actor network's output $\pi_\theta(s_t,a_t)$; the critic network's output $Q_\pi(s_t,a_t)$ evaluates the value of the policy $\pi_\theta(s_t,a_t)$; the proximal policy optimization (PPO) algorithm is selected to train the neural networks, its aim being to update the parameters of the policy network so that the action sequence generated under the current policy obtains a higher cumulative reward;
the policy $\pi$ is expressed as a continuous function $\pi_\theta(s,a)=P(a\mid s,\theta)$, the probability of taking action $a$ in state $s$, and the networks are updated by constructing loss functions.
8. The multi-target SFC deployment method based on deep reinforcement learning and a genetic algorithm of claim 7, wherein in proximal policy optimization the actor network's loss function is typically computed with a KL-divergence penalty and the critic network's loss function typically uses the TD error;
the loss functions of the actor and critic networks are, respectively:
$$L_{\mathrm{actor}}(\theta)=-\,\mathbb{E}\!\left[\frac{\pi_\theta(s,a)}{\pi_{\theta_{\mathrm{old}}}(s,a)}\,A(s,a)-\beta\,\mathrm{KL}\big(\theta_{\mathrm{old}},\theta\big)\right], \qquad L_{\mathrm{critic}}=\mathbb{E}\!\left[\big(r_t+\gamma\,V(s_{t+1})-V(s_t)\big)^{2}\right]$$
where $\pi_\theta(s,a)$ denotes the action-selection probability of the new policy, $\pi_{\theta_{\mathrm{old}}}(s,a)$ that of the old policy, $A(s,a)$ the advantage function measuring how good an action is, $\mathrm{KL}(\theta_{\mathrm{old}},\theta)$ the KL divergence of the new policy relative to the old one, $\beta$ the hyper-parameter controlling the KL-penalty weight, and $V(\cdot)$ the state value predicted by the critic network.
9. The multi-target SFC deployment method based on deep reinforcement learning and a genetic algorithm of claim 1, further comprising employing a multi-objective optimization algorithm for solving optimization problems with multiple conflicting objectives; the multi-objective optimization algorithm is based on genetic-algorithm principles and comprises the following elements:
objective functions: minimizing latency, minimizing deployment cost, and maximizing the total throughput of accepted requests;
population: in each time slot $\tau$, the deployment schemes of all requests are gathered as the initial population of the multi-objective optimization algorithm;
individual encoding: each chromosome is a vector of integers in the interval $[1,N]$, where $N$ is the number of servers and each value is the number of a server that can carry the corresponding VNF;
crossover: single-point crossover is adopted; the genes of two individuals are cut at a randomly selected crossover point and the cut parts are exchanged, producing new individuals;
mutation: single-point mutation is adopted, introducing randomness in order to explore new solutions in the search space.
Application CN202410048027.1A (priority date 2024-01-12, filing date 2024-01-12): Multi-target SFC deployment method based on deep reinforcement learning and genetic algorithm; status: Pending; publication: CN117938959A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410048027.1A CN117938959A (en) 2024-01-12 2024-01-12 Multi-target SFC deployment method based on deep reinforcement learning and genetic algorithm


Publications (1)

Publication Number Publication Date
CN117938959A (en) 2024-04-26

Family

ID=90762476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410048027.1A Pending CN117938959A (en) 2024-01-12 2024-01-12 Multi-target SFC deployment method based on deep reinforcement learning and genetic algorithm

Country Status (1)

Country Link
CN (1) CN117938959A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination