CN110247795B

CN110247795B - Intent-based cloud network resource service chain arranging method and system

Info

Publication number: CN110247795B
Application number: CN201910461367.6A
Authority: CN
Inventors: 郭少勇; 喻鹏; 邱雪松; 贺文晨; 李文萃; 申京; 邵苏杰; 徐思雅; 亓峰; 丰雷
Original assignee: Beijing University of Posts and Telecommunications; Information and Telecommunication Branch of State Grid Henan Electric Power Co Ltd
Current assignee: Beijing University of Posts and Telecommunications; Information and Telecommunication Branch of State Grid Henan Electric Power Co Ltd
Priority date: 2019-05-30
Filing date: 2019-05-30
Publication date: 2020-09-25
Anticipated expiration: 2039-05-30
Also published as: CN110247795A

Abstract

The embodiment of the invention provides an intention-based cloud network resource service chain arranging method and system, wherein the method comprises the following steps: providing an end-to-end service for cloud network resources based on a preset northbound interface reference architecture; and performing online arrangement and dynamic adjustment on the end-to-end service based on a service chain arrangement framework of deep reinforcement learning, wherein in the online arrangement and dynamic adjustment, a preset multi-objective optimization problem model is solved to minimize the arrangement cost and delay of the service chain. According to the method and the system for arranging the cloud network resource service chain based on the intention, provided by the embodiment of the invention, a multi-target optimization problem model is constructed by providing a preset northbound interface reference framework and an SFC arrangement framework based on the DRL, so that the long-term service chain arrangement cost is reduced to the maximum extent.

Description

Intent-based cloud network resource service chain arranging method and system

Technical Field

The invention relates to the technical field of communication, in particular to an intention-based cloud network resource service chain arranging method and system.

Background

The rapid growth of internet of things services with different QoS requirements presents network operators with significant challenges in terms of rapid delivery and QoS guarantees. Network Function Virtualization (NFV) and Software Defined Networking (SDN) have become key technologies for flexible resource allocation and dynamic service provisioning. However, both techniques still require the application of manual operations to define the service model and configure the network details, which in turn requires highly skilled administrators and a significant amount of time. These manual tasks have a detrimental effect on improving reliability and providing service quickly. Therefore, Intent Based Networking (IBN) has been proposed to simplify low level configuration and speed up service delivery.

One key aspect of supporting intent-based Service provisioning is the vendor-independent and technology-independent northbound interface (NBI) for converting customer languages into abstract definitions of Service Functional Chain (SFC). Another key step is the online orchestration based on the abstract definition of the SFC model to achieve a demand-driven, automatically-tuned service delivery style.

However, the above method still needs to obtain complete network details in advance to obtain a global optimal solution, but such accurate information is often difficult to collect. Therefore, there is a need for an intent-based cloud resource service chaining method to solve the above problems.

Disclosure of Invention

In order to solve the above problems, embodiments of the present invention provide an intent-based cloud resource service chaining method and system that overcome the above problems or at least partially solve the above problems.

In a first aspect, an embodiment of the present invention provides an intention-based cloud network resource service chain arrangement method, including:

providing an end-to-end service for cloud network resources based on a preset northbound interface reference architecture;

and performing online arrangement and dynamic adjustment on the end-to-end service based on a service chain arrangement framework of deep reinforcement learning, wherein in the online arrangement and dynamic adjustment, a preset multi-objective optimization problem model is solved to minimize the arrangement cost and delay of the service chain.

Wherein the multi-objective optimization problem model is represented as:

min{cost(server)+cost(link)}

wherein, cost (server) is related cost of server resource, cost (link) traffic forwarding cost, C₁,C₂,C₃,C₄,C₅,C₆,C₇Is a resource constraint.

Wherein solving the predetermined multi-objective optimization problem model to minimize service chain orchestration cost and delay comprises:

and obtaining the optimal solution of the multi-objective optimization problem model based on a preset double-layer depth Q network algorithm.

The method for obtaining the optimal solution of the multi-objective optimization problem model based on the preset double-layer depth Q network algorithm comprises the following steps:

initializing a business process;

and performing service arrangement on the initialized service flow based on a preset double-layer depth Q network.

The initializing the business process includes:

randomly selecting a target scheme meeting requirements from a cloud server set;

determining a target routing scheme between the VNFs based on a shortest path selection algorithm;

the orchestration costs for all service chains are calculated.

The service arrangement is performed on the initialized service process based on the preset double-layer depth Q network, and comprises the following steps:

after the state space is initialized, inputting a state to the double-layer depth Q network;

acquiring an action corresponding to the input state and calculating a target Q value;

and updating the input state based on a gradient descent method until a preset termination condition is reached.

In a second aspect, an embodiment of the present invention further provides an intent-based cloud network resource service chaining system, including:

the service module is used for providing end-to-end service for the cloud network resources based on a preset northbound interface reference framework;

and the arrangement adjusting module is used for carrying out online arrangement and dynamic adjustment on the end-to-end service based on a service chain arrangement framework of deep reinforcement learning, wherein in the online arrangement and dynamic adjustment, a preset multi-objective optimization problem model is solved so as to minimize the arrangement cost and delay of the service chain.

Third aspect an embodiment of the present invention provides an electronic device, including:

a processor, a memory, a communication interface, and a bus; the processor, the memory and the communication interface complete mutual communication through the bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the intent-based cloud network resource service chaining method.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, which stores computer instructions, where the computer instructions cause the computer to execute the above method for arranging an intent-based cloud network resource service chain.

According to the method and the system for arranging the cloud network resource service chain based on the intention, provided by the embodiment of the invention, a multi-target optimization problem model is constructed by providing a preset northbound interface reference framework and an SFC arrangement framework based on the DRL, so that the long-term service chain arrangement cost is reduced to the maximum extent.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

Fig. 1 is a schematic flowchart of an intention-based cloud network resource service chain arrangement method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of training steps at different learning rates according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating training steps under different algorithms provided by an embodiment of the present invention;

FIG. 4 is a graph illustrating average delay for various algorithms provided by embodiments of the present invention;

FIG. 5 is a diagram of the total cost of various algorithms provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an intent-based cloud resource service chaining system according to an embodiment of the present invention;

fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a schematic flowchart of a method for arranging a cloud network resource service chain based on intentions according to an embodiment of the present invention, as shown in fig. 1, including:

101. providing an end-to-end service for cloud network resources based on a preset northbound interface reference architecture;

102. and performing online arrangement and dynamic adjustment on the end-to-end service based on a service chain arrangement framework of deep reinforcement learning, wherein in the online arrangement and dynamic adjustment, a preset multi-objective optimization problem model is solved to minimize the arrangement cost and delay of the service chain.

It should be noted that an application scenario of the embodiment of the present invention is how to implement service chain arrangement on cloud network resources in the internet of things, and in this scenario, the embodiment of the present invention provides an IBN reference architecture to manage infrastructure of the internet of things and provide end-to-end services across multiple domains. Specifically, in step 101, the IBN reference architecture provided in the embodiment of the present invention includes: VNF Manager (VNF Manager, VNFM) and NFV Orchestrator (NFV editor, NFVO), management and control plane, and data plane. The VNF manager and NFV orchestrator are used to allow customers to declare IR using human-readable language and then translate the declarative policies into high-level service abstractions, such as VNF properties, QoS functions, and thresholds, through intent-based NBI. The management and control plane includes a Virtualization Infrastructure Manager (VIM) that maps high-level abstract policies to low-level service chaining orchestration policies, and a controller that coordinates SDN _ C and Cloud controller (Cloud _ C) to automate SFC orchestration. The data plane is configured to receive control messages through a South Bound Interface (SBI), and provide physical resources for VNF placement and traffic routing in a cloud domain, and sensors and actuators in an internet of things domain are responsible for data collection. The IBN reference architecture provided by the above embodiments of the present invention can implement management of the internet of things infrastructure and provide end-to-end services across multiple domains.

Further, in step 102, the embodiment of the present invention provides a service chain orchestration framework based on deep reinforcement learning, i.e. an SFC orchestration framework based on DRL. The SFC orchestration framework of the DRL can obtain SFC abstraction models and environment details through intent-based NBI and web learning models. Then, the DRL-based SFC orchestration framework can coordinate the controller, realize corresponding actions of the current network through the technology-specific SBI, and simultaneously, the network provides feedback related to rewards or punishments to prompt the DRL-based SFC orchestration framework to adjust the behaviors thereof, so that the DRL-based SFC orchestration framework can realize an optimal strategy through effective training, wherein the optimal strategy is a cloud network resource service chain orchestration scheme required by the embodiment of the invention. In the process of obtaining the optimal strategy, the embodiment of the invention establishes a multi-objective optimization problem model to minimize the SFC arrangement cost and delay, and obtains the finally required cloud network resource service chain arrangement scheme through the optimal solution of the multi-objective optimization problem model.

In the multi-objective optimization problem model established in the embodiment of the invention, the SFC is composed of a triplet s { (v)_so,v_de)_s,F_s,r_sDenotes wherein (v)_so,v_de)_sA source node pair and a target node pair representing s. v. of_soGenerating with a data transmission rate r_sThe flow rate of (c). F_sF ∈ F representing specific SFC information including attributes, order and connectivity of VNF, CPU, memory resources required in cloud server and VNF_sIs delayed by the cpu_f，mem_fAnd d_fAnd (4) showing. Virtual link between u and w of VNF

And (4) showing. Definition D_sIs the delay threshold.

The physical network of the cloud domain is then represented by a weighted undirected graph G ═ N, LThe number of servers and links are denoted by M and H.cloud server V has CPU computing and memory resources for placing VNF instances, which are respectively represented by Cap_cpu(v) And Cap_mem(v) And (4) showing. Physical link l between nodes i and j_ijWith maximum data transmission rate b_ijAnd a transmission delay d_ij。

F, which represents the VNF in s, is mapped to the cloud server, and is 0 otherwise;

representing virtual links in s

Maps to a physical link, otherwise is 0.

VNF instances require CPU computing resources and memory resources in the cloud server. Load balancing needs to be considered because maintaining load balancing between servers and links can avoid traffic congestion and further improve network cost efficiency. Therefore, the embodiment of the invention provides two load balancing factors phi_vAnd Θ_ijAnd the load state is used for indicating the load state of the network, and the values of the load state and the load state have positive correlation with the resource utilization rate. Phi_vThe calculation is as follows:

α therein₁,β₁,χ₁Is a positive parameter used to adjust phi in the cost calculation process_vThe value of (c). Phi_vIs U_vWhether the linear or exponential function of (a) depends on (U)_vThe range of (1). U shape_vA weighted sum representing CPU and memory usage, calculated by:

wherein e is_pAnd e_mWeight representing CPU and memory usage, e_p+e _m1. The associated cost of server resources is calculated by:

unit prices of CPU and memory resources are respectively c₁And c₂And (4) showing. Next consider the forwarding cost in traffic routing, the load balancing factor Θ_ijThe calculation is as follows:

α therein₂,β₂,χ₂Is a positive parameter and is used to adjust Θ_ijThe value of (c). U shape_ijRepresents a link l_ijThe utilization rate of the medium transmission rate is calculated as follows:

the cost calculation method of traffic forwarding is as follows:

wherein, c₃Representing the unit price of the link transmission rate. (e)_l·Θ_ij+e_d·d_ij/D_s) Represents theta_ijWeighted sum delay d of_ij，e_l+e _d1. cost (link) is composed of three parts: theta_ijFixed delay d_ijAnd a unit price. As can be seen from the above calculation, nodes or links with larger remaining resources have a relatively lower cost.

The total Cost _ total of the SFC orchestration flow is calculated as follows:

Cost_total＝cost(server)+cost(link)；

the resource constraints are given by:

C₁:

C₂:

C₃:

C₄:

C5:

the delay constraint is given by:

C₆:

therefore, a multi-objective optimization problem model aiming at improving cost efficiency and ensuring QoS is established:

min{cost(server)+cost(link)}

On the basis of the above embodiment, solving the preset multi-objective optimization problem model to minimize service chain arrangement cost and delay includes:

and obtaining the optimal solution of the multi-objective optimization problem model based on a preset double-layer deep Q network (DDQN) algorithm.

Aiming at the multi-objective optimization problem model provided in the embodiment, the embodiment of the invention designs a double-layer depth Q network algorithm to obtain the optimal solution of the multi-objective optimization problem model.

In particular, embodiments of the present invention formulate the optimization problem as a Markov decision process { ST, A, Rd, P }, where ST represents a state space, A represents an action space, Rd is defined as a reward function, and P is a state transition probability. The definition is as follows:

the state space is: each agent has a corresponding orchestration scheme at a certain time. This state is defined as the satisfaction of the QoS requirements of all SFCs and is calculated by:

ST＝{st₁,st_s,...,st_K}；

wherein st_sK is the number of SFCs, {0,1 }. st _s1 indicates that the delay requirement of SFC can be satisfied under the current scheduling scheme. Otherwise, st _s0. The number of all states is 2^K。

The action space is as follows: the transition between the two states of the SFC means that the VNF placement or routing is changed by taking an action. The action set A is defined as follows:

where X is designed as the set of available actions placed by VNFs in the SFC business process. In addition, if X is given, the route between VNFs can be obtained by a shortest path algorithm.

Y is designed as the available set of actions for traffic routing between VNFs. Thus, the operation space for VNF placement and traffic routing is denoted as a ═ X, Y. s has an action number of 2^M+H。

The reward is as follows: state st if the agent takes some action_sWill shift to a new state st'_s. Agents may also obtain an immediate reward Rd_s(a_sSt, st'), defined as st_sTransfer to st'_sThe cost is reduced.

Rd_s(a_s,st,st')＝cost(st_s)-cost(st'_s)；

Where cost (st)_s) And cost (st)_s') represents the state st_sAnd st'_sThe arrangement cost of (a) and (b). By accumulating long-term rewards Rd_s(a_sSt, st') can achieve the highest cost efficiency, and the strategy pi can obtain the corresponding action to be taken by the SFC according to the current state. The best action is

Q_s(st, a) is defined as a state-action function and represents the expected cumulative discount reward for the specified state-action. Q_s(st, a) is expressed as:

where γ is a discount factor, indicating the importance of future rewards in learning. From the Bellman equation, the optimum can be obtained as follows

P_s(a_sSt, st ') represents the probability of a transition from state st to state st'. Therefore, an optimal strategy can be obtained based on the above formula

And is represented as:

in practice, it is often difficult to obtain an accurate transition probability. Therefore, Q-learning is designed to find the optimal solution in an iterative manner based on the available information, and it updates the Q-value function using the following equation:

wherein Q is affected for learning efficiency_s(st, a) update rate.

It can be appreciated that Q learning performs iterations based on a Q-value table, so it is difficult to obtain an optimal solution if the state and motion space is very large. To overcome this weakness, the Deep Q Network (DQN) provided by embodiments of the invention approximates the Q-value function by a Deep Neural Network (DNN) rather than a Q-value table. The DNN may be viewed as a depth map with multiple processing layers. θ represents the weights of these layers and is updated by gradient descent. The approximation of the value function used by DQN is calculated by:

Q_s(st,a,θ)≈Q_s(st,a)。

in addition, DQN utilizes empirical replay and independent target networks to eliminate data dependencies. Defining a target network to be based on a weight θ^-A target Q value is calculated. Theta and theta^-The difference between is that theta is updated in each iteration, but theta is^-Updated in a fixed number of iterations. The target Q function is given by:

the loss function of DQN is defined as the mean square error and is calculated by:

L(θ)＝E[(Target_Q_s-Q_s(st,a,θ))²]；

in each iteration, the weight θ needs to be updated to depend on the gradient

The loss function is minimized. The update function is calculated by:

θ'＝θ+[Target_Q_s-Q_s(st,a,θ)]·▽Q_s(st,a,θ)；

it is worth emphasizing that both DQN and Q learning utilize the maximum function to calculate Target _ Q_sThis leads to an overestimation problem. As an improvement, the DDQN first finds the corresponding action with the largest Q value in the current network, rather than the largest Q value of all actions directly in the target network:

a_max(st',θ)＝argmax_a'∈AQ_s(st',a',θ)；

then using the selected action a_max(st', θ) rewrite Target _ Q_s. New Target _ Q in DDQN_sCalculated from the following formula:

similarly, L (θ) and θ' also need to be updated together in DDQN.

Specifically, the two-layer deep Q network algorithm provided by the embodiment of the present invention may include two parts, where the first part is a service process initialization, and the service process initialization may include the following steps:

1. from σ_f,sRandomly select F ∈ F_sPossible placement solution of

σ_f,sFor F ∈ F_sA set of feasible cloud servers.

2. The routing scheme between VNFs is obtained by a shortest path algorithm.

3. The cost of the layout of all the SFCs is calculated.

The second part is the service arrangement using the DDQN network, and the part can comprise the following steps:

1. the state space is initialized to ST ═ ST₁,st_s,...,st_K}。

2. For SFCs, it will st_sAdding Q-value network Q as input_s(st_s,a,θ)。

3. The action is obtained through a greedy policy. The strategy selects random and best actions with probabilities and 1-respectively.

4. All SFC conversions are stored in the empirical replay memory.

5. Each agent samples (st, a, Rd, st') from the ER and calculates its target Q value depending on whether it is the final state or not.

6. They perform a gradient descent step (Target _ Q) with respect to θ of the Q-value network_s-Q_s(st,a,θ))². At each uf_θAfter step, theta^-Is replaced with theta.

7. If the current state ST is {1, 1., 1}, training will be terminated.

By combining the processes, the DDQN algorithm designed by the embodiment of the invention can obtain the optimal solution of the multi-objective optimization problem model, has better cost efficiency and convergence, and can ensure the QoS requirement and balance the flow.

In order to verify the performance of the method provided by the embodiment of the invention, simulation experiments are carried out in the embodiment of the invention. Specifically, the embodiment of the present invention simulates the proposed algorithm by using a cloud network composed of 30 nodes (10 cloud servers and 20 switches) and 50 links. The maximum data transmission rate of the link is fixed to 1 Gbps. The transmission delay of the link is 1-3 ms. The CPU and memory resources of the server are set to 32 and 100-200GB, respectively. Each SFC requires 2-4 VNFs with a data transfer rate of 20-50 Mbps. Each VNF requires 2-4 CPUs and 5-10GB memory resources with a processing delay of 2-5 ms. The structure of DNN consists of three hidden layers of fully connected neural networks with 64, 32 neurons. The simulation is mainly carried out from two aspects of convergence performance and optimization performance of the algorithm.

The convergence performance was first evaluated at different learning rates: fig. 2 is a schematic diagram of training steps at different learning rates according to an embodiment of the present invention, and as shown in fig. 2, the DDQN algorithm with three learning rates has a huge training step number at the beginning of an epicode. The number of training steps tends to decrease as the epicode increases, reflecting the good convergence performance of the DDQN. On the other hand, the learning rate is a key factor of convergence performance. For example, in EP 90, DDQN of 0.001 requires 92 training steps. By comparison, the 0.01 and 0.1 algorithms require only 40 and 26 steps to obtain the optimal solution.

Then, convergence performance under different reinforcement learning algorithms is compared, and fig. 3 is a schematic diagram of training steps under different algorithms provided by the embodiment of the present invention, as shown in fig. 3. It can be seen that Q learning has lower convergence performance at different epsilodes because it requires less measures to remove data dependencies. In contrast, DQN and DDQN establish empirical replay and independent target networks to resolve data correlations, so their training steps are less than Q learning. Taking the training steps of Q-learning, DQN, DDQN as examples, 51, 32, 26 respectively. Therefore, solving the overestimation problem is also an advantage of DDQN compared to DQN.

Further, embodiments of the present invention evaluate the average delay, total reward, and load balancing status using the following two algorithms as a comparison. QoS-driven placement algorithm (QPA): it first takes an end-to-end path of the SFC and then extends over the path to minimize cost and delay while meeting resource requirements. Random fit placement algorithm (RPA): the placement of the VNFs is performed in the form of a random fit, taking into account all the solutions that satisfy all the constraints and randomly selecting one of them, and then also randomly selecting the path therein. Fig. 4 is a schematic diagram of the average delay of different algorithms provided by the embodiment of the present invention, and as shown in fig. 4, the average delay of the four algorithms is low at the beginning because the number of SFC requests is small. As SFC increases, the delay increases with different magnitudes. When the number of SFCs is 200, the average delays of DDQN, DQN, QPA and RPA are 38ms, 43ms, 48ms, 56ms, respectively. Due to the randomness of the RPA, the delay performance of the RPA is poor. Although DQN and QPA take delay minimization into account, they ignore the impact of load balancing, which can lead to network congestion. In contrast, DDQN has better latency performance at different SFC request numbers.

FIG. 5 is a schematic diagram of the total cost of the different algorithms provided by the embodiment of the present invention, as shown in FIG. 5, the total cost of DDQN, DQN, QPA is always lower than RPA under different number of SFCs, since they all consider cost minimization of the objective function. For example, when the number of SFCs is 300, the cost of RPA is 16%, 20%, 27% higher than QPA, DQN, DDQN. Of these three algorithms, DDQN has the best cost efficiency, as it overcomes the overestimation problem and focuses on load balancing, which QPA and DQN ignore. Therefore, DDQN can achieve optimal delay and cost in SFC orchestration.

For the load balancing state, taking SFCs as an example of 300, the variance of the DDQN link usage rates is 62%, 55%, 41% lower than that of RPA, SPA and DQN. Similarly, the DDQN server usage rate varies less than 81%, 65%, and 48%. In the SFC orchestration optimization model, phi_vAnd Θ_ijThe design is used for maintaining network balance, so that the DDQN can effectively avoid network congestion.

Fig. 6 is a schematic structural diagram of an intent-based cloud resource service chaining system according to an embodiment of the present invention, as shown in fig. 6, including: a service module 601 and an orchestration adjustment module 602, wherein:

the service module 601 is used for providing end-to-end service for cloud network resources based on a preset northbound interface reference architecture;

the orchestration adjustment module 602 is configured to perform online orchestration and dynamic adjustment on the end-to-end service based on a service chain orchestration framework of deep reinforcement learning, where in the online orchestration and dynamic adjustment, a preset multi-objective optimization problem model is solved to minimize a service chain orchestration cost and delay.

Specifically, how to use the service module 601 and the orchestration adjustment module 602 to execute the technical solution of the intent-based cloud network resource service chaining method embodiment shown in fig. 1 is similar, and the implementation principle and the technical effect are similar, and are not described herein again.

The intention-based cloud network resource service chain arrangement system provided by the embodiment of the invention has the advantages that a preset northbound interface reference framework and an SFC arrangement framework based on DRL are provided, and a multi-objective optimization problem model is constructed, so that the long-term service chain arrangement cost is reduced to the maximum extent.

On the basis of the above embodiment, the multi-objective optimization problem model is expressed as:

min{cost(server)+cost(link)}

On the basis of the above embodiment, the arrangement adjustment module includes:

and the DDQN unit is used for obtaining the optimal solution of the multi-objective optimization problem model based on a preset double-layer depth Q network algorithm.

On the basis of the above embodiment, the DDQN unit includes:

the initialization part is used for initializing the business process;

and the service arrangement part is used for carrying out service arrangement on the initialized service flow based on the preset double-layer depth Q network.

On the basis of the above embodiment, the initialization section is specifically configured to:

the orchestration costs for all service chains are calculated.

On the basis of the above embodiment, the service orchestration part is specifically configured to:

An embodiment of the present invention provides an electronic device, including: at least one processor; and at least one memory communicatively coupled to the processor, wherein:

fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention, and referring to fig. 7, the electronic device includes: a processor (processor)701, a communication Interface (Communications Interface)702, a memory (memory)703 and a bus 704, wherein the processor 701, the communication Interface 702 and the memory 703 complete communication with each other through the bus 704. The processor 701 may call logic instructions in the memory 703 to perform the following method: providing an end-to-end service for cloud network resources based on a preset northbound interface reference architecture; and performing online arrangement and dynamic adjustment on the end-to-end service based on a service chain arrangement framework of deep reinforcement learning, wherein in the online arrangement and dynamic adjustment, a preset multi-objective optimization problem model is solved to minimize the arrangement cost and delay of the service chain.

An embodiment of the present invention discloses a computer program product, which includes a computer program stored on a non-transitory computer readable storage medium, the computer program including program instructions, when the program instructions are executed by a computer, the computer can execute the methods provided by the above method embodiments, for example, the method includes: providing an end-to-end service for cloud network resources based on a preset northbound interface reference architecture; and performing online arrangement and dynamic adjustment on the end-to-end service based on a service chain arrangement framework of deep reinforcement learning, wherein in the online arrangement and dynamic adjustment, a preset multi-objective optimization problem model is solved to minimize the arrangement cost and delay of the service chain.

Embodiments of the present invention provide a non-transitory computer-readable storage medium, which stores computer instructions, where the computer instructions cause the computer to perform the methods provided by the above method embodiments, for example, the methods include: providing an end-to-end service for cloud network resources based on a preset northbound interface reference architecture; and performing online arrangement and dynamic adjustment on the end-to-end service based on a service chain arrangement framework of deep reinforcement learning, wherein in the online arrangement and dynamic adjustment, a preset multi-objective optimization problem model is solved to minimize the arrangement cost and delay of the service chain.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to each embodiment or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A cloud network resource service chain arranging method based on intents is characterized by comprising the following steps:

performing online arrangement and dynamic adjustment on the end-to-end service based on a service chain arrangement framework of deep reinforcement learning, wherein in the online arrangement and dynamic adjustment, a preset multi-objective optimization problem model is solved to minimize the arrangement cost and delay of the service chain;

the solving of the preset multi-objective optimization problem model to minimize service chain orchestration cost and delay includes:

acquiring an optimal solution of the multi-objective optimization problem model based on a preset double-layer depth Q network algorithm;

the multi-objective optimization problem model is expressed as:

wherein, cost (server) is related cost of server resource, cost (link) traffic forwarding cost, C₁,C₂,C₃,C₄,C₅,C₆,C₇Is a resource constraint condition;

wherein, C₁：

C₂:

C₃:

C₄:

C5:

C₆:

Wherein s { (v)_so,v_de)_s,F_s,r_sDenotes the SFC orchestration framework,

f representing the VNF in s maps to the cloud server v,

f, representing a VNF in s, is not mapped to cloud server v;

representing virtual links in s

Is mapped to physical link l_ij，

Representing virtual links in s

Not mapped to physical link l_ij，

Representing a virtual link between u and w of the VNF,/_ijRepresenting a physical link between nodes i and j; f represents an instance f of the VNF, u represents an instance u of the VNF, and w represents an instance w of the VNF.

2. The method for arranging the resource service chain of the cloud network based on the intention according to claim 1, wherein the obtaining the optimal solution of the multi-objective optimization problem model based on the preset double-layer depth Q network algorithm comprises:

initializing a business process;

3. The method for orchestrating intent-based cloud resource service chaining according to claim 2, wherein initializing a business process comprises:

the orchestration costs for all service chains are calculated.

4. The method for orchestrating intent-based cloud network resource service chain according to claim 2, wherein the performing business orchestration on the initialized business process based on a preset two-layer deep Q network comprises:

5. An intent-based cloud resource service chaining system, comprising:

the arrangement adjusting module is used for carrying out online arrangement and dynamic adjustment on the end-to-end service based on a service chain arrangement framework of deep reinforcement learning, wherein in the online arrangement and dynamic adjustment, a preset multi-objective optimization problem model is solved to minimize the arrangement cost and delay of the service chain;

wherein solving the predetermined multi-objective optimization problem model to minimize service chain orchestration cost and delay comprises: acquiring an optimal solution of the multi-objective optimization problem model based on a preset double-layer depth Q network algorithm;

the multi-objective optimization problem model is expressed as:

wherein, C₁：

C₂:

C₃:

C₄:

C5:

C₆:

Wherein s { (v)_so,v_de)_s,F_s,r_sDenotes the SFC orchestration framework,

f representing the VNF in s maps to the cloud server v,

f, representing a VNF in s, is not mapped to cloud server v;

representing virtual links in s

Is mapped to physical link l_ij，

Representing virtual links in s

Not mapped to physical link l_ij，

6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of the method for intent based chain of services for cloud resources as claimed in any of claims 1 to 4.

7. A non-transitory computer readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method for intent based service chaining of cloud network resources according to any of claims 1 to 4.