CN117938959A

CN117938959A - Multi-target SFC deployment method based on deep reinforcement learning and genetic algorithm

Info

Publication number: CN117938959A
Application number: CN202410048027.1A
Authority: CN
Inventors: 王然; 赵佳亮; 吴强; 郝洁
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2024-01-12
Filing date: 2024-01-12
Publication date: 2024-04-26

Abstract

The invention discloses a multi-target SFC deployment method based on deep reinforcement learning and genetic algorithm, belonging to the service function chain arrangement technology. The method comprises the steps of firstly constructing a physical network model and an SFC request model, establishing a mapping relation between system models, and using binary decision variablesIndicating whether VNF _i in request r can be deployed on server node v. Then a mathematical model of the SFC deployment problem is constructed, including optimization objectives and constraints of the problem, and the concept of time slot τ is introduced, and τ _r,a and τ _r,1 are used to represent the arrival time and the lifetime of the request, respectively, including optimization of three objectives, minimizing delay, minimizing cost, and maximizing the overall throughput of the request. Modeling the SFC deployment problem as a Markov decision process model to describe the change of the network state; finally, combining deep reinforcement learning and multi-objective genetic algorithm to design a layout scheme.

Description

Multi-target SFC deployment method based on deep reinforcement learning and genetic algorithm

Technical Field

The invention belongs to a service function chain arrangement technology, and particularly relates to a multi-target service function chain deployment method based on combination of deep reinforcement learning and genetic algorithm.

Background

In NFV systems, service requests are typically represented by a Service Function Chain (SFC), which consists of a set of Virtual Network Functions (VNFs) that are executed in a strict order. The service function chain is used as an important bearing form of the network and the service, and the virtualized network service function is reasonably arranged, so that the complete end-to-end network service is constructed. In the 4G age, SFC deployment has focused mainly on a single optimization objective, such as maximization of resource utilization. However, in the 5G and 6G ages, the advent of the extremely diverse business requirements and application scenarios will drive a profound revolution in the evolution of mobile communications. This transition will focus on facilitating SFC deployment to achieve a number of optimization goals, including key metrics such as latency, cost, and throughput. These extreme scenarios and demands include high precision industries with higher reliability and security, augmented Reality (AR) with higher communication speeds and lower latency, virtual Reality (VR), high definition video streaming media, etc.

Currently, the deployment problem of SFC has been largely solved and proved to be the NP-hard problem, as shown in FIG. 1. In a multi-objective SFC deployment problem, multi-objective evolutionary algorithms and deep reinforcement learning are typically used. Both algorithms have their own advantages and disadvantages. A multi-objective evolutionary algorithm, such as the NSGA-II algorithm, can simultaneously optimize multiple objective functions without converting the multiple objective functions into a single objective function, and can effectively generate a pareto front and a set of pareto optimal solutions. However, the performance of the algorithm depends largely on the quality of the initial population, and a high quality initial population generally helps the algorithm converge to a better solution faster. The multi-objective evolutionary algorithm requires a large amount of iteration and computation time to obtain the initial solution at the same time. Deep reinforcement learning has great potential in solving the multi-objective optimization problem because it can learn the strategy of control directly from some high-dimensional initial data without manual intervention. Training deep neural networks, however, requires significant computational resources and time. Thus, there is an urgent need for a more efficient deployment algorithm to achieve dynamic and efficient deployment of a multi-target service function chain.

The invention provides a MOERL deployment method for solving a multi-target functional chain, which takes minimum time delay, maximum request acceptance rate and minimum cost as optimization targets, establishes a multi-target SFC deployment model, and researches an intelligent deployment strategy of the multi-target service chain based on the combination of deep reinforcement learning and genetic algorithm.

Disclosure of Invention

The invention aims to: in order to solve the limitations of the existing SFC deployment algorithm and the problems that single-target optimization is difficult to meet the requirements of low time delay, high acceptance rate, low cost and the like, the invention provides a deployment method of a multi-target service function chain based on combination of deep reinforcement learning and genetic algorithm, which realizes three optimization targets of minimum time delay, maximum request acceptance rate and minimum cost.

The technical scheme is as follows: a multi-target SFC deployment method based on deep reinforcement learning and genetic algorithm comprises the following steps:

s1, constructing a system model, wherein the system model comprises a physical network model and an SFC model, and further comprises a mapping relation between the system models;

in the system model, for SFC requests in the network, the deployment of the VNF is required to meet the specific service requirement of each SFC request, and for the VNF, the CPU requirement and the memory requirement are considered, and the VNF is determined by binary decision variables Indicating whether VNF _i in request r can be deployed on server node v;

S2, constructing a mathematical model of the SFC deployment problem, wherein the mathematical model comprises an optimization target and constraint conditions of the problem, and the optimization target comprises minimizing delay, minimizing cost and maximizing total throughput of the request;

The method introduces a time slot τ in which the state of whether the request r is still being serviced is represented by a binary variable x _r,τ to solve the real-time network dynamics problem caused by the random arrival and departure of the request:

Where τ _r,a and τ _r,1 represent the arrival time and the time to live of the request r, respectively;

since a single node may deploy multiple instances of VNF services to handle multiple requests, binary decision variables are used To represent the number of VNF _i service instances deployed on node v:

Where r (i) denotes an ith VNF requesting r, denoted as f _i;

the objective function of minimizing the delay is expressed as follows:

Where T _d represents the total response delay, including the communication delay on the link and the processing delay on the server node;

The objective function of the minimized cost is expressed as follows:

Wherein λ _v and λ _B represent unit costs of server resources and bandwidth, respectively; c _v represents the CPU and memory capacity available on the v-th server node; Indicating whether VNF _i in request r can be deployed on server node v; b _v denotes the total output bandwidth of the server v,/> Indicating whether the virtual link l _i in request r can be mapped onto physical link e;

The method uses a binary decision variable y _r to indicate whether the request r is accepted, as follows:

Where T _d represents the total response delay, and T _r represents the maximum tolerated delay;

The objective function to maximize the total throughput of the request is expressed as follows:

Where B _r represents the bandwidth requirement of request r, τ _r,1 represents the lifetime of request r.

(3) Modeling the SFC deployment problem as a Markov decision process model to describe the change in network state;

The Markov decision process model is expressed as < S, A, P, R, gamma >, corresponding to the state space, the action space, the state transition probability, the reward function and the discount factor, respectively, in each decision event, the agent observes the system state S (t) and selects action a (t), by selecting action a (t), obtaining the reward R (t) and the next system state S (t+1); wherein the reward r (t) is used to evaluate the effectiveness of the action a (t);

(4) Combining deep reinforcement learning with a multi-target genetic algorithm to design a layout scheme, and specifically constructing an Actor-Critic neural network based on a deep reinforcement learning method; the network actor is responsible for generating a VNF deployment strategy, the network critic is used for evaluating the value of the generation strategy, training is carried out through near-end strategy optimization, different deployment schemes obtained through training are used as initial populations of an NSGA-II algorithm, and the deployment schemes are continuously optimized through the NSGA-II algorithm.

Further, step S1 includes:

Representing the NFV network as an undirected graph g= (V, E), where V represents a set of server nodes, E represents a set of physical links connecting any two nodes, and each server is capable of instantiating one or more virtual machines to support different types of VNFs, the set of virtual machines supporting VNFs being denoted m= { M ₁,m₂,...,m_|M| }; each server has a fixed resource capacity, including computing resources and storage resources; Representing available CPU and memory capacity on the v-th server node; b _v denotes the total output bandwidth of server v, T _e denotes the communication delay of link e;

Representing one SFC request sequence in the network as r= { R _i,r₂,...,r_R }, for different service requests, a series of VNFs need to be deployed to meet the specific service requirement of each SFC request, the set of VNFs needed for service is represented as f= { F ₁,f₂,...,f_N }, each VNF has specific resource requirements, and thus is used Respectively representing the CPU requirement and the memory requirement of the VNF; one SFC request is denoted { f ₁,f₂,...,f_k }, which means that the SFC request will pass k VNs in turn;

Considering that each SFC request has special QoS requirements, including bandwidth requirement B _r and maximum delay tolerance T _r;

Further, the virtual link between VNFs connected to the SFC request is denoted as l= { L ₁,l₂,...,l_N-1 }, where L _i is the i-th virtual link connecting VNF _i and VNF _i+1 in the SFC request.

Further, step S1 is based on binary decision variablesThe specific handling of whether or not a VNF is successfully deployed on a serving node is:

if VNFf _i in the request r was successfully deployed on the server node v, Otherwise,/>Similarly,/>Indicating whether the virtual link l _i in request r can be mapped onto physical link e;

If the virtual link l _i in request r successfully maps onto physical link e, Otherwise,/>

Further, step S2 further specifically includes:

(21) In the case of sufficient resources, multiple VNFs may be deployed on the same server node, so the resource constraints of the server node are:

(22) Since the total output bandwidth of a server node sets an upper limit for the bandwidth requirements of all requests passing through that node, the bandwidth constraint is:

(23) The total response delay, including the communication delay on the link and the processing delay on the server node, is denoted by T _d, the communication delay on the link T _c is denoted as:

(24) The processing delay of the VNF is affected by the computing power of the virtual machine and the VNF type, which may cause the processing delay of different virtual machines to be different; the processing rate of request r on virtual machine m is therefore denoted v _m, and the total processing delay T _p is denoted:

Where λ represents the average arrival rate of the request r, α _m represents the CPU sharing rate of the virtual machine m to the request r, δ _m represents the maximum aggregate processing capacity of the virtual machine m, and D _m represents the processing density of the virtual machine m to the request r, assuming that the arrival of the request follows the poisson process;

thus, the total response delay T _d is:

T_d＝T_c+T_p

The time delay constraint is as follows:

further, the step S3 specifically includes:

(31) System status representation

The method represents the system state at time slot t as

For each state in the system, which should include the characteristics of the physical network and the characteristics of the request being processed, the state is defined as a set of vectorsWherein/>Representing the available resources of each node,/>Representing available link bandwidth resources, M _t represents characteristics of the request being processed, including resource demand/>Bandwidth requirement/>Request the number of undeployed VNFs in r/>Residual delay space/>Requesting the survival time Pr of r;

(32) Motion representation

Action a is represented as a set of server indexes a= { a|a e {0,1,2,., |n| }, which points to an index of a server node, which means that VNF has been successfully deployed on the a-th server node;

where a=0 indicates that VNFs cannot be deployed on any node, for each VNF there are (|n|+1) actions;

(33) Reward setting

Based on three goals of minimizing delay, minimizing cost, and maximizing total throughput of requests that require co-optimization, the reward function is a weighted sum of total accepted request throughput minus a weighted sum of total deployment cost and a weighted sum of total response delay, expressed as:

r_t＝αB_rτ_r,1-βC(τ)-ωT_d

where α, β and ω are weighting factors for each target, and C (τ) is the cost of each slot τ;

Thus, the total prize is:

Where gamma E [0,1] is a discount factor representing the discount level of future rewards.

Further, the step S4 specifically includes:

Extracting the state of the physical network and the requested features as states and inputting the states and the features into the actor network, changing the states and the requested features into single-column vectors through a hidden layer, mapping the single-column vectors into vectors in a (0, 1) interval through a softmax layer, wherein each value in the vectors represents the probability of the VNF being deployed on a corresponding server node, and selecting the node with the highest probability to deploy the VNF, namely the output pi _θ(s_t,a_t of the actor network; the output Q _π(s_t,a_t of critic network) is used to evaluate the value of policy pi _θ(s_t,a_t); selecting a near-end strategy optimization algorithm to train the neural network, wherein the near-end strategy optimization aims at updating parameters of the strategy network, so that an action sequence generated under the current strategy can obtain higher accumulated rewards;

The policy pi is expressed as a continuous function pi _θ (s, a) =p (a|s, θ), representing the probability of taking action a in state s, the network is updated by constructing a loss function.

Further, actor network loss functions in near-end policy optimization are typically calculated based on KL divergence; critic network loss functions typically use TD errors;

the loss functions of actor and critic networks are shown below, respectively:

Wherein pi _θ (s, a) represents the action selection probability of the new policy, Representing the action selection probability of the old policy, A (s, a) representing the dominance function for measuring action dominance, KL (θ, θ _old) representing the KL divergence of the new policy relative to the old policy, the parameter β being the hyper-parameter controlling the KL divergence weight,/>Representing critic the predicted state values of the network.

Further, the method also includes employing a multi-objective optimization algorithm for resolving optimization problems having multiple conflicting objectives; the multi-objective optimization algorithm is based on a genetic algorithm principle, and comprises the following calculation steps:

Objective function: minimizing latency, minimizing deployment costs, and maximizing the overall throughput of accepted requests;

Population: in each time slot tau, integrating all the requested deployment schemes as an initial population of the multi-objective optimization algorithm;

Individual coding: representing each selected chromosome with an integer within the interval [1, N ], where N is the number of servers, each value corresponding to a server number that can carry a VNF;

Crossing: the crossover operation adopts single-point crossover, the genes of two individuals are segmented at a randomly selected crossover point, and then the segmented parts are exchanged, so that a new individual is generated;

mutation: mutation operations take single point mutations by introducing randomness in order to explore new solutions in the search space.

The beneficial effects are that: in order to ensure different QoS requirements of different network services, the invention expands the traditional SFC single-target optimization model into a multi-target optimization model, and solves the problems of delay, deployment cost and throughput in the existing work. Meanwhile, in order to solve the limitations of the existing SFC deployment algorithm, the invention provides a two-stage solution MOERL algorithm, and the efficiency and effectiveness of SFC deployment are improved.

Drawings

FIG. 1 is a schematic diagram of a service function chain deployment;

FIG. 2 is a flow framework of MOERL algorithm;

FIG. 3 is a comparative schematic of the Pareto front obtained by MOERL and NSGA-II, DRL methods.

Detailed Description

For a detailed description of the disclosed embodiments, the invention is further described below with reference to the accompanying drawings and examples.

In order to solve the limitations of the existing SFC deployment algorithm and the problems that single-target optimization is difficult to meet the requirements of low time delay, high acceptance rate, low cost and the like, the invention provides a deployment method of a multi-target service function chain based on combination of deep reinforcement learning and genetic algorithm, which realizes three optimization targets of minimum time delay, maximum request acceptance rate and minimum cost. The method specifically comprises the following steps:

(1) Constructing a system model, which mainly comprises a physical network model, an SFC request model and a mapping relation;

The present invention represents the NFV network as an undirected graph g= (V, E), where V represents a set of server nodes, E represents a set of physical links connecting any two nodes, V _i represents an i-th server node, E _i represents a j-th physical link. Each server is able to instantiate multiple virtual machines to support different types of VNFs, the set of virtual machines supporting VNFs being denoted m= { M ₁,m₂,...,m_|M| }. Each server has a fixed resource capacity, including computing resources and storage resources. Representing the available CPU and memory capacity on the v-th server node. B _v denotes the total output bandwidth of the server v, and T _e denotes the communication delay of the link e.

The invention represents an SFC request sequence in the network as r= { R ₁,r₂,...,r_R }. For different service requests, a series of VNFs needs to be deployed to meet the specific service needs of each SFC request. The set of VNFs required for service is denoted as f= { F ₁,f₂,...,f_N }, each VNF has a specific resource requirement and is thus usedRespectively representing the CPU requirements and the memory requirements of the VNF. One SFC request is denoted { f ₁,f₂,...f_k }, which means that the SFC request will pass k VNs in turn. Each SFC request has special QoS requirements including bandwidth requirement B _r and maximum delay tolerance T _r. Further, the virtual link between VNFs connecting SFC requests is denoted as l= { L ₁,l₂,...,l_N-1 }, where L _i is the i-th virtual link connecting VNFf _i and VNFf _i+1 in the SFC requests.

Whether a VNF is successfully deployed on a server node depends on whether there are sufficient resources. Using binary decision variablesIndicating whether VNFf _i in the request r can be deployed on the server node v. In detail, if VNFf _i in the request r is successfully deployed on the server node v,/>Otherwise,/>Similarly,/>Indicating whether the virtual link l _i in request r can be mapped onto physical link e. If the virtual link l _i in request r successfully maps onto physical link e,Otherwise,/>

(2) Constructing a mathematical model of the SFC deployment problem, wherein the mathematical model comprises an optimization target and constraint conditions of the problem;

To address the real-time network dynamics caused by random arrival and departure of requests, the present invention introduces the concept of time slot τ, using τ _r,a and τ _r,1 to represent the arrival time and the time-to-live of the request, respectively. In slot τ, the state of whether the request r is still being serviced is represented by a binary variable x _r,τ:

since a single node may deploy multiple instances of VNF services to handle multiple requests, use To represent the number of VNFf _i service instances deployed on node v:

where r (i) represents the i-th VNF requesting r.

First, in case of sufficient resources, multiple VNFs may be deployed on the same server node. The resource constraints of the server node are therefore:

Second, since the total output bandwidth of a server node sets an upper limit for the bandwidth demand of all requests through that node, the bandwidth constraint is:

Finally, the total response delay is denoted by T _d, including the communication delay on the link and the processing delay on the server node. The communication delay T _c on the link is denoted as:

The processing delay of the VNF is affected by the computing power of the virtual machine and the VNF type, which may result in different processing delays of different virtual machines. The processing rate of request r on virtual machine m is therefore denoted v _m, and the total processing delay T _p is denoted:

Where λ represents the average arrival rate of the request r, α _m represents the CPU sharing rate of the virtual machine m to the request r, δ _m represents the maximum aggregate processing capacity of the virtual machine m, and D _m represents the processing density of the virtual machine m to the request r, assuming that the arrival of the request follows the poisson process.

Thus, the total response delay T _d is:

T_d＝T_c+T_p

The time delay constraint is as follows:

The method of the present invention proposes three optimization objective functions, target 1 is minimizing delay, target 2 is minimizing cost, and target 3 is maximizing the total throughput of the request.

The expression of objective function 1 is as follows:

The expression of objective function 2 is as follows:

Where λ _v and λ _B represent the unit cost of server resources and bandwidth, respectively.

The method of the present invention uses a binary decision variable y _r to indicate whether the request r is accepted, as follows:

The expression of objective function 3 is as follows:

With the above preparation work, a markov decision process model is then proposed, and the mathematical representation of the markov decision process model is < S, a, P, R, γ >, which correspond to the state space, the action space, the state transition probability, the reward function, and the discount factor, respectively. In each decision event, the agent observes the system state s (t) and selects action a (t), by selecting action a (t), the reward r (t) and the next system state s (t+1) are obtained. The prize r (t) may be used to evaluate the validity of the action a (t). The method comprises the following steps:

1)STATE

For each state in the system, the characteristics of the physical network and the characteristics of the request being processed should be included. Defining a state as a set of vectors Wherein/>Representing the available resources of each node,/>Indicating available link bandwidth resources. M _t characterizes the request being processed, including resource requirements/>Bandwidth requirement/>Request the number of undeployed VNFs in r/>Residual delay space/>The lifetime P _r of the request r. Thus, the system state at time slot t is denoted/>

2)ACTION

Action a is represented as a set of server indexes a= { a|a e {0,1,2, |n| }, which points to an index of a server node, which means that VNF has been successfully deployed on the a-th server node. a=0 indicates that VNF cannot be deployed on any node. For each VNF there are (|n|+1) actions.

3)REWARD

In view of co-optimizing these three objectives, the reward function is a weighted sum of the total accepted request throughput minus a weighted sum of the total deployment costs and a weighted sum of the total response delays, which can be expressed as:

r_t＝αB_rτ_r,1-βC(τ)-ωT_d

where α, β and ω are weighting factors for each target and C (τ) is the cost of each slot τ.

Thus, the total prize is:

(4) The invention combines deep reinforcement learning and multi-objective genetic algorithm, and designs a reasonable and efficient arrangement scheme (MOERL). As shown in fig. 2, an Actor-Critic neural network was constructed using a deep reinforcement learning method. The actor network is responsible for generating VNF deployment policies and the critic network is used to evaluate the value of the generation policies. Training is carried out through near-end strategy optimization, different deployment schemes obtained through training are used as initial populations of NSGA-II algorithm, and the deployment schemes are continuously optimized through NSGA-II algorithm, specifically:

MOERL agents are actor networks and critic networks, where the input states, output actions, and generation of VNF placement policies for actor networks are used to approximate the policy model pi (s, a). Whereas the input state of critic network, the output cost function evaluates the value of the policy for approximating the function Q _π (s, a). First, the state of the physical network and the requested features are extracted as states and input into the actor network, then they are changed into single-column vectors by the hidden layer, then mapped into vectors of (0, 1) intervals by the sofimax layer, wherein each value in the vectors represents the probability of the VNF being deployed onto the corresponding server node, and the node with the highest probability is selected to deploy the VNF, i.e., the output pi _θ(s_t,a_t of the actor network. The output Q _π(s_t,a_t of critic network) is used to evaluate the value of policy pi _θ(s_t,a_t). The neural network is trained by selecting a near-end strategy optimization algorithm, wherein the near-end strategy optimization aims at updating parameters of the strategy network, so that an action sequence generated under the current strategy can obtain higher accumulated rewards. We express the policy pi as a continuous function pi _θ (s, a) =p (a|s, θ), representing the probability of taking action a in state s. The network is updated by constructing a loss function. Actor network loss functions in the near-end policy optimization are typically calculated based on KL divergence. critic network loss functions typically use TD errors. The two loss functions are shown below:

NSGA-II (Non-dominated Sorting Genetic Algorithm II) is a multi-objective optimization algorithm based on the principle of genetic algorithm to solve optimization problems with multiple conflicting objectives. The method generates a group of approximate optimal solution sets through non-dominant sorting, crowding degree distance and other technologies, effectively balances the diversity and convergence of knowledge, and is suitable for solving complex multi-objective problems. Since the DRL generated deployment scheme is constraint-satisfying, the initial population efficiency is higher than that generated by directly using NSGA-II algorithm. NSGA-II mainly comprises the following aspects:

(a) Objective function: minimizing latency, minimizing deployment costs, and maximizing the overall throughput of accepted requests.

(B) Population: in each time slot τ, all requested deployment schemes are integrated as the initial population for NSGA-II.

(C) Individual coding: we denote each selected chromosome by an integer within the interval 1, N, where N is the number of servers, each value corresponding to a server number that can carry a VNF.

(D) Crossing: the crossover operation takes the form of a single point crossover, i.e., the segmentation of the genes of two individuals at a randomly selected crossover point, followed by the exchange of the segmented portions, thereby creating new individuals.

(E) Mutation: mutation operation adopts single point mutation, and mutation refers to a process that one or more gene values in individual gene codes are randomly changed. It is used to introduce some randomness in order to explore new solutions in the search space.

NSGA-II iteratively performs selection, crossover and mutation operations to evolve populations in multiple generations, maintaining a diverse and high quality solution set at the pareto front. The fitness value is calculated based on three optimization objectives that minimize delay, minimize cost, and maximize the total throughput of accepted requests. The algorithm aims to find a set of solutions that provide trade-offs between these conflicting goals, resulting in pareto fronts that represent solutions for different deployment strategies of the SFC.

Examples: 1-3, simulation experiments were performed using a PyTorch framework based Python programming language running on a computer equipped with a CPU processor of 2.10GHz, 12th GenIntel (R) Core (TM) i7-12700F, and 16GB of memory. The relevant parameters for the simulation are as follows: the network topology has 12 nodes and 15 bi-directional links. To generate the SFC request, there are 6 types VNFs in the network. The VNFs number per SFC was set to [1,6]. The minimum bandwidth required for SFC requests is randomly distributed at [10,30] Mbps. Furthermore, the maximum tolerated delay per SFC request is between [20,50] ms. Each VNF requires support of CPU and memory resources on the server node, the capacity required for each VNF is between [1,20] cores and [2,4] gb, respectively. NSGA-II parameters: the population size was set to 100; the mutation probability was 20%; the crossover probability is 90%; the maximum algebra is set to 100.

To evaluate MOERL performance, the NSGA-II algorithm and DRL of the experiment were compared, and the Pareto fronts of the three algorithms were compared. The Pareto front obtained by MOERL, DRL, NSGA-II algorithm is shown in fig. 3, compared with the NSGA-II method, the deployment scheme of MOERL method can reduce the delay by 8.3%, reduce the cost by 4.8%, and improve the throughput of accepted requests by 5.5%; the deployment scheme of MOERL method can reduce 16.1% delay compared to DRL method, while reducing 1.3% cost and improving the throughput of accepted requests by 10%. Looking at the three-dimensional Pareto boundary in fig. 3, it is evident that the proposed MOERL method is superior to the rest of the comparison algorithm. This is because the DRL generates a set of initial placement schemes, which can be seen as a preliminary local search. Then, using it as the initial population for NSGA-II can speed up the convergence rate of NSGA-II, since NSGA-II does not need to start searching from a random population. NSGA-II can further optimize these schemes to obtain a better Pareto front. The advantage of this hybrid approach is that it can fully exploit the global search capability of deep reinforcement learning and the diversity maintenance capability of multi-objective genetic algorithms.

Claims

1. The multi-target SFC deployment method based on the deep reinforcement learning and genetic algorithm is characterized by comprising the following steps of:

Where r (i) denotes an ith VNF requesting r, denoted as f _i;

the objective function of minimizing the delay is expressed as follows:

The objective function of the minimized cost is expressed as follows:

Wherein λ _v and λ _B represent unit costs of server resources and bandwidth, respectively; c _v represents the CPU and memory capacity available on the v-th server node; Indicating whether VNF _i in request r can be deployed on server node v; b _v denotes the total output bandwidth of the server v,/> Indicating whether virtual link 1 _i in request r can be mapped onto physical link e;

Where B _r represents the bandwidth requirement of request r, τ _r,1 represents the lifetime of request r;

2. The multi-objective SFC deployment method based on deep reinforcement learning and genetic algorithm of claim 1, wherein step S1 comprises:

Representing the NFV network as an undirected graph g= (V, E), where V represents a set of server nodes, E represents a set of physical links connecting any two nodes, and each server is capable of instantiating one or more virtual machines to support different types of VNFs, the set of virtual machines supporting VNFs being denoted m= { M ₁,m₂,...,m_|M| }; each server has a fixed resource capacity, including computing resources and storage resources; Representing available CPU and memory capacity on the v-th server node; b _v denotes the total output bandwidth of the server v, and T _e denotes the communication delay of the link e.

3. The multi-objective SFC deployment method based on deep reinforcement learning and genetic algorithm according to claim 2, characterized in that the method represents the SFC request sequence in the network as r= { R _i,r₂,...,r_R }, for different service requests, a series of VNFs need to be deployed to meet the specific service requirement of each SFC request, the set of VNFs needed for service is represented as f= { F ₁,f₂,...,f_N }, each VNF has specific resource requirements, and thus is usedRespectively representing the CPU requirement and the memory requirement of the VNF; one SFC request is denoted { f ₁,f₂,...,f_k }, indicating that the SFC request will pass k VNs in turn;

And considering that each SFC request has special QoS requirements, including bandwidth requirement B _r and maximum delay tolerance T _r;

4. The deep reinforcement learning and genetic algorithm-based multi-objective SFC deployment method according to claim 1, wherein step S1 is based on binary decision variablesThe specific handling of whether or not a VNF is successfully deployed on a serving node is:

If VNF _i in request r is successfully deployed on server node v, Otherwise,/>

In a similar manner to that described above,Indicating whether virtual link 1 _i in request r can be mapped onto physical link e, if virtual link 1 _i in request r successfully maps onto physical link e,/>Otherwise,/>

5. The multi-objective SFC deployment method based on deep reinforcement learning and genetic algorithm according to claim 1, wherein step S2 further specifically comprises:

(21) Aiming at the condition that resources meet the requirement of deploying a plurality of VNs on the same server node, the resources of the server node are constrained as follows:

(22) Since the total output bandwidth of a server node sets an upper limit for the bandwidth demand of all requests through that node, the bandwidth constraint is defined as:

thus, the total response delay T _d is:

T_d＝T_c+T_p

The time delay constraint is as follows:

6. The multi-objective SFC deployment method based on deep reinforcement learning and genetic algorithm according to claim 1, wherein step S3 specifically comprises:

(31) System status representation

The method represents the system state at time slot t as

For each state in the system, which should include the characteristics of the physical network and the characteristics of the request being processed, the state is defined as a set of vectorsWherein/>Representing the available resources of each node,/>Representing available link bandwidth resources, M _t represents characteristics of the request being processed, including resource demand/>Bandwidth requirement/>Request the number of undeployed VNFs in r/>Residual delay space/>The survival time P _r of the request r;

(32) Motion representation

(33) Reward setting

r_t＝αB_rτ_r,1-βC(t)-ωT_d

Thus, the total prize is:

7. The multi-objective SFC deployment method based on deep reinforcement learning and genetic algorithm according to claim 1, wherein step S4 specifically comprises:

8. The multi-objective SFC deployment method based on deep reinforcement learning and genetic algorithm according to claim 7, wherein actor network loss functions in the near-end policy optimization are generally calculated based on KL divergence; critic network loss functions typically use TD errors;

the loss functions of actor and critic networks are shown below, respectively:

9. The deep reinforcement learning and genetic algorithm-based multi-objective SFC deployment method of claim 1, further comprising employing a multi-objective optimization algorithm for solving an optimization problem having a plurality of conflicting objectives; the multi-objective optimization algorithm is based on a genetic algorithm principle, and comprises the following calculation steps: