CN116980298A

CN116980298A - Deterministic end-to-end slice traffic orchestration strategy based on edge graph attention

Info

Publication number: CN116980298A
Application number: CN202310580629.7A
Authority: CN
Inventors: 陈宁江; 刘雨晖; 甘树锟; 蒙创颖; 袁雪梅; 涂欢
Original assignee: Guangxi University
Current assignee: Guangxi University
Priority date: 2023-05-22
Filing date: 2023-05-22
Publication date: 2023-10-31

Abstract

The invention belongs to the technical field of network communication and discloses a deterministic end-to-end slice traffic orchestration strategy based on edge graph attention, comprising: step 1, constructing an end-to-end software-defined network architecture oriented to the industrial Internet of Things; step 2, applying a joint orchestration strategy for service function chain deployment and route forwarding within end-to-end slices, based on a deep reinforcement learning algorithm assisted by an edge graph attention network; combining the topological characteristics of the physical base network with the characteristics of the VNF-FG in the slice virtual network, placing the required VNFs on suitable physical nodes, allocating computing resources, and obtaining an optimal path for virtual link mapping; and step 3, evaluating the orchestration strategy with an end-to-end QoS analysis framework based on deterministic network calculus to obtain an optimal end-to-end slice traffic configuration scheme. The invention can effectively reduce the total cost of slice traffic orchestration, increase the total number of successfully transmitted network flows, and effectively guarantee the end-to-end quality of experience of terminal devices.

Description

Deterministic end-to-end slice traffic orchestration strategy based on edge graph attention

Technical Field

The invention belongs to the technical field of network communication, and particularly relates to a deterministic end-to-end slice traffic orchestration strategy based on edge graph attention.

Background

In the industrial Internet of Things, the terminal device layer, formed by infrastructure such as industrial sensing devices and artificial intelligence devices, provides solutions for intelligent manufacturing and intelligent logistics and accesses, through edge gateways, the edge layer built from mobile edge computing nodes; the edge layer, which hosts the algorithm agents, further cleans, models, analyzes and stores the collected user data and provides a lightweight environment configuration for deploying intelligent industrial edge applications; the path from the edge layer to the core cloud layer, which integrates virtualization technology, microservice management, big data storage and network function services, embodies the service collaboration function of cloud-edge collaboration; and the uppermost application layer provides a flexible device management and service operation framework for user device access. The physical devices connected at each layer are therefore numerous and the services provided are diverse; analyzing and processing massive heterogeneous data requires cross-layer, multi-party collaboration, and an efficient solution for collaborative task offloading in the cloud-edge collaborative architecture can improve the reliability of real-time industrial service processing. In addition, the continuous expansion of the industrial Internet of Things brings new challenges to traffic engineering in the existing cloud-edge collaborative architecture, because the aggregate traffic of many devices accessing the network simultaneously requires customized traffic orchestration and management. The expansion of 5G networks makes it difficult to deploy new protocols on traditional closed network control equipment, while users place high demands on the QoS of end-to-end services, which increases the difficulty of managing industrial network resources.

To solve the above problems, the industrial Internet of Things adopts network function virtualization (NFV) technology to build a new-generation software-defined network (SDN) slicing architecture, dividing a traditional physical communication link into mutually isolated end-to-end virtual networks. Each virtual network is a network slice serving a particular service and strictly guarantees the quality-of-service requirements of the application. Although SDN broadens the scope of resource allocation optimization under the cloud-edge collaborative architecture, meeting the end-to-end communication quality of service and quality of experience of different types of industrial applications places higher demands on accurate traffic feature detection, efficient and flexible orchestration strategies, and network resource management measures.

Disclosure of Invention

To solve the above problems, the invention provides a deterministic end-to-end slice traffic orchestration strategy based on edge graph attention, which coordinates the allocation of the resources of each slice flow from an end-to-end perspective; it can effectively reduce the total cost of slice traffic orchestration, increase the total number of successfully transmitted network flows, rapidly complete optimized configuration such as end-to-end slice deployment, improve the utilization of slice resources, and effectively guarantee the end-to-end quality of experience of terminal devices.

In order to achieve the above purpose, the present invention provides the following technical solutions:

a deterministic end-to-end slice traffic orchestration strategy based on edge graph attention, comprising the steps of:

step 1, constructing an end-to-end software-defined network architecture oriented to the industrial Internet of Things;

step 2, applying a joint orchestration strategy for service function chain deployment and route forwarding within end-to-end slices, based on a deep reinforcement learning algorithm assisted by an edge graph attention network, and carrying out end-to-end slice life cycle management; combining the topological characteristics of the physical base network with the characteristics of the VNF-FG in the slice virtual network, placing the required VNFs on suitable physical nodes and allocating sufficient computing resources, and obtaining an optimal path for virtual link mapping;

step 3, evaluating the orchestration strategy with an end-to-end QoS analysis framework based on deterministic network calculus to obtain an optimal end-to-end slice traffic configuration scheme.

Preferably, in step 1, the constructed end-to-end software-defined network architecture oriented to the industrial Internet of Things specifically comprises an infrastructure layer, a virtualization layer, a control layer and a service layer;

Preferably, the life cycle management of the end-to-end slice comprises:

firstly, carrying out feature modeling and classification on the monitored network flow information, constructing a feature extraction model based on an edge graph attention network, and modeling a single user request as a service function chain based on the VNF-FG;

then, finding suitable physical nodes on which to deploy the VNF virtual nodes of the previously generated SFC, with the optimization objective of minimizing the deployment cost of the SFC: each physical server can deploy a plurality of different VNF instances and allocate the required computing and memory resources, and most VNF instances can be shared by all slices; meanwhile, in consideration of isolation among the tenants of different service slices, some VNF instances can only be shared by service function chains within the same slice;

and finally, connecting the required VNF instances according to the SFC routing information, namely mapping the virtual links in the VNF-FG onto the corresponding physical link topology, and evaluating the result with an end-to-end QoS performance analysis framework so as to meet the end-to-end quality-of-service requirements of different services.
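The sharing and isolation rule in the life cycle steps above can be illustrated with a minimal Python sketch. It is not the patented method: the source selects placements with a learned DRL policy, whereas this sketch substitutes a simple greedy rule, and the names `VNFInstance`, `PhysicalNode`, `can_reuse` and `place_vnf`, the single CPU resource, and the zero cost assumed for reusing an instance are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class VNFInstance:
    vnf_type: str
    node: str                      # name of the hosting physical node
    slice_id: str                  # slice that created the instance
    shareable_across_slices: bool  # False: only chains of the same slice may reuse it


@dataclass
class PhysicalNode:
    name: str
    cpu_free: float
    instances: list = field(default_factory=list)


def can_reuse(inst, vnf_type, slice_id):
    """An instance is reusable if the type matches and isolation permits it."""
    if inst.vnf_type != vnf_type:
        return False
    return inst.shareable_across_slices or inst.slice_id == slice_id


def place_vnf(nodes, vnf_type, cpu_demand, slice_id, shareable):
    """Return (instance, deployment_cost); (None, None) if no node has capacity.

    Greedy stand-in for the learned policy: reuse an eligible instance first
    (assumed to cost nothing), otherwise deploy on the node with most free CPU.
    """
    for node in nodes:
        for inst in node.instances:
            if can_reuse(inst, vnf_type, slice_id):
                return inst, 0.0
    candidates = [n for n in nodes if n.cpu_free >= cpu_demand]
    if not candidates:
        return None, None
    best = max(candidates, key=lambda n: n.cpu_free)
    best.cpu_free -= cpu_demand
    inst = VNFInstance(vnf_type, best.name, slice_id, shareable)
    best.instances.append(inst)
    return inst, cpu_demand


# A service function chain as a small VNF forwarding graph: nodes plus virtual links.
sfc = {"slice": "sliceA",
       "vnfs": ["firewall", "dpi"],
       "links": [(0, 1)]}  # virtual link from VNF 0 to VNF 1
```

In this sketch a shareable firewall created for one slice is reused by a request from another slice, while a non-shareable instance forces a second deployment, which mirrors the tenant isolation constraint stated above.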

Preferably, in step (2), the VNF-FG-based joint optimization problem of dynamic service function chain deployment and route forwarding is modeled to obtain a Markov decision process model; in the service scenario state at time slot t, the decision agent of the traffic orchestration algorithm selects and executes an action from the action space and obtains the corresponding reward; the final optimization objective is to maximize the system utility of traffic orchestration, and the state space, action space, reward function and state-action policy are defined as follows:

in traffic orchestration, user traffic can be regarded as a service request for network resources and is defined in the form of a service function chain; the state space is defined as follows:

wherein the flow set records the details of all network flows in the service queue at time slot t, including the graph structure information of the corresponding service function chains and their strict end-to-end slice quality-of-service requirements on heterogeneous computing resources, end-to-end delay, the upper bound of buffer pressure, and the like; the topology term represents the state information of the base-layer physical network topology at time slot t;

the action space of the orchestrating agent is defined as follows:

wherein the placement action indicates that, at time slot t, any VNF virtual node V'_{i,q} in the service function chain of network flow F_q is deployed to physical node V_n and allocated the required computing resources, while the routing action ensures that the traffic can reach the physical node on which the corresponding VNF instance is placed and allocates buffer resources to establish the connection that creates the virtual route;

the reward function is defined as follows:

wherein the reward value is the environment feedback that the algorithm decision agent obtains after executing the corresponding action at time slot t; if the total end-to-end delay of the traffic orchestration scheme exceeds the maximum tolerable delay of the user request, or if orchestration of the network flow fails because, for example, computing resources are insufficient to support normal service operation, a failure penalty ε(t) is applied to the reward of that action, guiding the Agent to learn an optimal policy that improves the traffic orchestration success rate;

the policy π is the mapping between a random environment state and the traffic orchestration action that the Agent may take in that state; its optimization goal is to evolve toward the optimal policy according to the returned reward values through continuous random exploration of the state space and the action space.
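The reward behaviour described above (reward served flows, charge for delay and resource consumption, and apply the failure penalty ε(t) when the delay bound is violated or resources are insufficient) can be sketched in Python as follows. The source's reward equation is not reproduced in this text, so the linear form, the weights `w_delay` and `w_resource`, and the default `penalty` value are assumptions made only for illustration.

```python
def orchestration_reward(served_flows, e2e_delay, resource_cost, max_delay,
                         resources_ok, w_delay=1.0, w_resource=1.0, penalty=10.0):
    """Hypothetical reward: flows served minus weighted orchestration cost.

    penalty plays the role of the failure penalty epsilon(t); it is applied when
    the end-to-end delay exceeds the tolerated maximum or resources fall short.
    """
    cost = w_delay * e2e_delay + w_resource * resource_cost
    reward = served_flows - cost
    failed = e2e_delay > max_delay or not resources_ok
    if failed:
        reward -= penalty
    return reward


# A feasible scheme keeps its reward; a scheme that violates the delay bound is penalised.
feasible = orchestration_reward(3, 0.5, 0.5, max_delay=1.0, resources_ok=True)
too_slow = orchestration_reward(3, 2.0, 0.5, max_delay=1.0, resources_ok=True)
```

Keeping the penalty as a separate additive term makes it easy to tune how strongly the agent is steered away from failed orchestrations relative to merely expensive ones.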

Preferably, the method for constructing the feature extraction model based on the edge graph attention network comprises the following steps:

for the undirected graph topology G of the physical base-layer network, with its node set and edge set, its number of nodes, and the neighbor node set of each node i, the input of the EGAT layer is defined as the node feature matrix H and the edge feature matrix E, where each node feature column vector has ψ_h entries (the number of node features) and each edge feature column vector has ψ_e entries (the number of edge features); the target output of the EGAT layer is a new node feature matrix H', which aggregates the features of the neighbor nodes and the information of the edges connecting the neighborhood, and a new edge feature matrix E' = [..., e'_ab, ...]^T;

first, two learnable weight matrices are used to linearly transform the two input feature matrices, mapping each into a higher-dimensional space of dimension ψ' > ψ; the modified single-head graph attention mechanism p is then applied in this high-dimensional feature space: a feedforward neural network with a LeakyReLU activation function computes the attention coefficients, and a softmax layer normalizes the computed attention coefficients, giving the following formula:

wherein the weight vector of the single-layer feedforward neural network is applied in transposed form; [x || y] denotes the concatenation of vector x and vector y; the attention coefficients are normalized with the softmax function; and the nonlinear activation function LeakyReLU(x) is defined as follows:

LeakyReLU(x) = x, if x > 0; kx, otherwise

wherein the constant k is the slope in the negative region, so that, unlike ReLU, the unit remains active for negative inputs;

finally, a nonlinear activation function σ (such as the ELU function) is applied to the weighted summation of neighborhood node and edge features under each single attention head p, and the node features output by the P attention heads are concatenated to obtain a new node feature vector carrying the information of the edges connecting the neighborhood, specifically expressed as:

the update of the edge features is similar to the information aggregation of the node features: the adjacent edge features are embedded into the corresponding node using a weight matrix, and the attention coefficient of edge (a, b) is computed with the same graph attention mechanism, as follows:

the hidden states are computed with P attention heads to obtain the edge feature vector embedded on node a, defined as follows:

finally, a multilayer perceptron (MLP) is applied to obtain higher-level edge features:

e'_ab = MLP(h'_a, h'_b, e'_a, e'_b, e_ab). (9)
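The node update described above (linear transformation, LeakyReLU scoring of concatenated features, softmax normalization over the neighborhood, then ELU over the weighted sum) can be sketched for a single attention head in plain Python. The exact equations of the source are not reproduced in this text, so the choice to concatenate the transformed features of node i, node j and edge (i, j) for scoring, and to aggregate the sum of neighbor and edge features, is an assumption; `egat_node_update` and its helpers are hypothetical names.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x, k=0.2):
    return x if x > 0 else k * x


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]


def egat_node_update(h, e, neighbors, W_h, W_e, a_vec):
    """One attention head: return new node features keyed by node id.

    h: node id -> feature list; e: frozenset({i, j}) -> edge feature list;
    neighbors: node id -> list of neighbor ids; a_vec: attention weight vector
    whose length is three times the transformed feature dimension.
    """
    Wh = {i: matvec(W_h, x) for i, x in h.items()}
    We = {key: matvec(W_e, x) for key, x in e.items()}
    h_new = {}
    for i, nbrs in neighbors.items():
        if not nbrs:  # isolated node: keep its own transformed features
            h_new[i] = [elu(v) for v in Wh[i]]
            continue
        scores = []
        for j in nbrs:
            concat = Wh[i] + Wh[j] + We[frozenset((i, j))]
            scores.append(leaky_relu(sum(a * c for a, c in zip(a_vec, concat))))
        alpha = softmax(scores)
        agg = [0.0] * len(Wh[i])
        for weight, j in zip(alpha, nbrs):
            edge = We[frozenset((i, j))]
            for d in range(len(agg)):
                agg[d] += weight * (Wh[j][d] + edge[d])
        h_new[i] = [elu(v) for v in agg]
    return h_new


# Path graph 0 - 1 - 2 with identity transforms and a zero attention vector,
# which makes the attention over each neighborhood uniform.
identity = [[1.0, 0.0], [0.0, 1.0]]
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
e = {frozenset((0, 1)): [0.5, 0.5], frozenset((1, 2)): [0.0, 0.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
h_new = egat_node_update(h, e, neighbors, identity, identity, [0.0] * 6)
```

For P heads, the per-head outputs for each node would be concatenated, as the text describes; the edge update and the MLP of equation (9) are omitted here to keep the sketch short.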

Preferably, in step 2, the deep reinforcement learning algorithm comprises a deep reinforcement learning decision based on the twin delayed deep deterministic policy gradient (TD3) algorithm, wherein at time slot t the feature extraction network adopts an edge-aggregating graph attention network and transfer learning; the input data are the spatial features and network traffic information of the current base-layer physical network G and the virtual network topology graph SFC representing a service function chain; neighbor node information and edge features are aggregated to construct an efficient hidden state representation, and a new state matrix S = (S_G, S_SFC) = ([H'_G, E'_G], [H'_SFC, E'_SFC]) is obtained by concatenation and used as the input of the deep reinforcement learning algorithm, which helps simplify the complex state space and improve the computational efficiency of DRL; finally, the DRL algorithm provides a traffic orchestration scheme that deploys end-to-end slices for upcoming user requests, and the policy is improved according to the reward feedback of environment state transitions;

deep reinforcement learning based on the twin delayed deep deterministic policy gradient algorithm likewise adopts an Actor-Critic architecture with an Actor network and Critic networks, wherein θ and θ' denote the parameters of the policy network and the target network in the Actor network, respectively, and φ_1, φ_2 and φ'_1, φ'_2 are the corresponding parameters of the evaluation networks and the target networks of the Critic network; when exploring the action space in each iteration, the Actor network executes its policy with additional clipped Gaussian noise, which mainly serves to smooth the target value and further reduce variance while improving exploration; the loss function of the algorithm agent is therefore defined as follows:

wherein the batch of sample data is drawn from the experience replay buffer, and z is the index within that batch; δ is a weight factor (default value 0.75) that assigns greater weight to the Critic network whose loss value is currently being computed; y_l is the target value; the remaining coefficient is the corresponding discount factor; and the gradient update of the Agent's deterministic policy π_θ is expressed as:

compared with the prior art, the invention has the following beneficial effects:

The invention provides a deterministic end-to-end slice traffic orchestration strategy based on edge graph attention. It models network flows as service function chains built on the VNF-FG; when placing a VNF within a slice, it preferentially reuses shared instances of VNFs that have already been successfully deployed, so as to improve resource utilization; it designs an edge graph attention network to extract heterogeneous node state and spatial structure information on the communication links, further refining the embedding of the SFC; and it finally obtains a reasonable traffic orchestration strategy through training with a DRL algorithm. The strategy can thereby effectively reduce the total cost of slice traffic orchestration, increase the total number of successfully transmitted network flows, complete optimized configuration such as end-to-end slice deployment, improve the utilization of slice resources, and effectively guarantee the end-to-end quality of experience of terminal devices.

The invention models the VNF-FG-based joint optimization problem of dynamic service function chain deployment and route forwarding as a Markov decision process model and solves it with the twin delayed deep deterministic policy gradient algorithm, thereby overcoming the shortcomings of existing deep reinforcement learning and providing an effective solution for end-to-end network slicing.

The end-to-end software-defined network architecture comprises, from bottom to top, an infrastructure layer, a virtualization layer, a control layer and a service layer, and draws on the idea of service-based design to make the interaction between the control layer and the service layer fully service-based, thereby providing a better service experience for users.

Drawings

FIG. 1 is a schematic diagram of an end-to-end software defined network architecture for an industrial Internet of things in an embodiment of the present invention;

FIG. 2 is a schematic diagram of an end-to-end slice traffic orchestration model according to an embodiment of the invention;

FIG. 3 is a schematic diagram of an end-to-end slice traffic orchestration algorithm framework based on edge graph attention deep reinforcement learning in an embodiment of the invention;

FIG. 4 is a flow chart of a deterministic end-to-end slice traffic orchestration strategy based on edge graph attention according to an embodiment of the present invention;

FIG. 5 is a flowchart illustrating a deterministic end-to-end slice traffic orchestration strategy based on edge graph attention, according to an embodiment of the present invention.

Detailed Description

The invention will be further described with reference to the drawings and examples. It should be noted that the specific embodiments of the present invention are only for describing the technical solution more clearly, and should not be taken as limiting the scope of the present invention.

In the industrial Internet of Things, when large numbers of terminal devices offload tasks to the edge network simultaneously, problems such as network congestion and data loss readily occur, which places higher demands on guaranteeing the quality of service (QoS) and quality of experience (QoE) of network communication; an efficient end-to-end traffic orchestration optimization method is therefore urgently needed. Traditional traffic orchestration optimization methods lack intelligent decision-making capability, adapt poorly to the complex dynamic environment of the industrial Internet of Things, and cannot provide the QoS guarantees that the real-time performance and reliability of factory production data require.

To optimally configure task data flows, the invention provides a deterministic end-to-end slice traffic orchestration strategy based on edge graph attention which, as shown in FIG. 4 and FIG. 5, comprises the following steps:

step 1, constructing an end-to-end software-defined network architecture oriented to the industrial Internet of Things; from bottom to top, the architecture is divided into a physical infrastructure layer that connects the radio access network and the edge cloud to the core cloud and provides heterogeneous resources such as computing, an end-to-end slice virtualization layer built on the base network communication equipment, a control layer integrated from elastic, distributed, programmable SDN controllers, and a service layer that provides diversified applications for the industrial Internet of Things; it provides the virtualization technology and the corresponding software and hardware support for orchestrating network data flows;

step 3, evaluating the candidate orchestration strategies with an end-to-end QoS analysis framework based on deterministic network calculus to obtain an optimal end-to-end slice traffic configuration scheme. FIG. 1 is a schematic diagram of the end-to-end software-defined network architecture for the industrial Internet of Things in an embodiment of the invention; the cloud-edge collaborative network architecture designed in step (1), which can rapidly process user data flows and complete end-to-end resource allocation, is specifically as follows:

the end-to-end software-defined network architecture for the industrial Internet of Things is divided into four layers from bottom to top: a physical infrastructure layer (Physical Network Infrastructure Layer) that connects the radio access network and the edge cloud to the core cloud and provides heterogeneous computing, networking and storage resources; an end-to-end slice virtualization layer (End-to-End Slicing Virtualization Layer) built on the base-layer network communication equipment; a control layer (Controller Layer) integrated from elastic, distributed, programmable SDN controllers; and a service layer (Service Layer) that provides diversified applications for the industrial Internet of Things.

The traffic orchestration strategy operates within the end-to-end software-defined network architecture as follows: the dynamic traffic orchestration system at the service layer extracts the traffic characteristics of user data flows through the northbound interface of the SDN controller and, after QoS statistical analysis, issues intelligent management rules through the southbound interface of the SDN controller to establish end-to-end network slices.

In the life cycle management of end-to-end slices, feature modeling and classification are first carried out on the monitored network flow information, a feature extraction model based on an edge graph attention network is constructed, and a single user request is modeled as a service function chain based on the VNF-FG; then, suitable physical nodes are found on which to deploy the VNF virtual nodes of the previously generated SFC, with the optimization objective of minimizing the deployment cost of the SFC: each physical server can deploy a plurality of different VNF instances and allocate the required computing and memory resources, and most VNF instances can be shared by all slices; meanwhile, in consideration of isolation among the tenants of different service slices, some VNF instances can only be shared by service function chains within the same slice; finally, the required VNF instances are connected according to the SFC routing information, namely the virtual links in the VNF-FG are mapped onto the corresponding physical link topology, and the result is evaluated with an end-to-end QoS performance analysis framework so as to meet the end-to-end quality-of-service requirements of different services.

FIG. 2 is a schematic diagram of the end-to-end slice traffic orchestration model in an embodiment of the invention; in step (2), the VNF-FG-based joint optimization problem of dynamic service function chain deployment and route forwarding is modeled to obtain a Markov decision process model, and the service function chain symbols are defined in Table 1 below:

TABLE 1

The deployment and connection order of VNFs in the SFC reflects the logic to handle network flow requests, whereas actions taken by the SFC orchestration at the current moment are affected by VNF instances that have been successfully deployed at a previous moment, i.e. the markov property is met.

The Markov decision process is a basic model for solving sequential decision problems in dynamic, complex environments; it describes a single agent (Agent) randomly exploring behaviors in a given environment and acquiring the corresponding returns and other relevant dynamic information, and is formally described as a tuple whose elements are as follows. The state space is the set of environment states observed by the agent, comprising the configuration of heterogeneous resources such as computing power and cache on each physical service node in the base-layer network and the characteristics of the user data flows; the action space is the set of actions that the algorithm agent (Algorithmic Agent) may take, divided into two parts, VNF placement and route forwarding; the state transition probability function gives the probability that, after the algorithm agent selects and executes an action at time slot t, the state of the industrial Internet of Things traffic orchestration system transfers from the current state to the next state; the reward function measures the rationality and effectiveness of the joint optimization decisions on dynamic service function chain deployment and route forwarding; and η is the discount factor of the reward function.

In the software-defined network service scenario of the industrial Internet of Things based on SDN/NFV network slicing, the decision agent (Agent) of the traffic orchestration algorithm, in the current state at time slot t, selects and executes an action and obtains the corresponding reward. The final optimization objective is to maximize the system utility of traffic orchestration; its state space, action space, reward function and state-action policy are defined as follows:

(1) State space: in traffic orchestration, user traffic can be regarded as a service request for network resources, defined in the form of a service function chain. Therefore, before each time slot t starts, the Agent collects the complete feature information of the SFC corresponding to each network flow and the heterogeneous resource condition of the physical base-layer network; these state data are used for training to output the optimal orchestration strategy for the application scenario of the VNF-FG-based joint optimization problem of dynamic service function chain deployment and route forwarding (Joint Optimization Problem of Dynamic Deployment of VNF-FG-based Service Function Chains and Routing, JOSFET CDR). The environment information observable at time slot t by the agent in the software-defined traffic orchestration controller comprises the service function chains that characterize the user data flows and the availability of the computing and cache resources of the physical nodes, and the state space is defined as follows:

(2) Action space: after observing the current environment state at time slot t, the Agent decides, according to its current policy, the deployment positions and the route forwarding path mapping scheme of the service function chain built on the VNF-FG. When executing traffic orchestration, the Agent first determines the deployment position of each VNF virtual node in the service function chain and completes resource allocation, and then completes the virtual link mapping of the VNF-FG according to the dependency relationships so that the network flow can pass smoothly along the constructed routing path; the action space of the orchestration Agent is therefore defined as follows:

(3) Reward function: to meet the end-to-end QoS requirements of user data flows, the reward function aims to maximize the number of network flows that can be served after orchestration while minimizing the total cost of traffic orchestration, which consists of end-to-end delay and resource consumption; the reward function is defined as follows:

(4) Policy: the policy π is the mapping between a random environment state and the traffic orchestration action that the Agent may take in that state; its optimization goal is to evolve toward the optimal policy according to the returned reward values through continuous random exploration of the state space and the action space.

FIG. 3 is a schematic diagram of the end-to-end slice traffic orchestration algorithm framework based on edge graph attention deep reinforcement learning in an embodiment of the invention; completing the configuration from user request to end-to-end slice in step (2) specifically comprises the following steps:

(1) constructing a feature extraction model based on an edge graph attention network (Feature extraction model based on edge-graph attention network); the specific method comprises the following steps:

for the undirected graph topology G of the physical base-layer network, with its node set and edge set, its number of nodes, and the neighbor node set of each node i, the input of the EGAT layer is defined as the node feature matrix H and the edge feature matrix E, where each node feature column vector has ψ_h entries (the number of node features) and each edge feature column vector has ψ_e entries (the number of edge features); the target output of the EGAT layer is a new node feature matrix H', which aggregates the features of the neighbor nodes and the information of the edges connecting the neighborhood, and a new edge feature matrix E' = [..., e'_ab, ...]^T;

finally, a nonlinear activation function σ (the ELU function is used in this embodiment of the invention) is applied to the weighted summation of neighborhood node and edge features under each single attention head p, and the node features output by the P attention heads are concatenated to obtain a new node feature vector carrying the information of the edges connecting the neighborhood, specifically expressed as:

e'_ab = MLP(h'_a, h'_b, e'_a, e'_b, e_ab). (9)

(2) deep reinforcement learning decision (TD 3-Based DRL for Decision-Making) based on a dual delay depth deterministic strategy gradient algorithm (Twin Delayed Deep Deterministic policy gradient, TD 3)

The deep reinforcement learning algorithm includes a deep reinforcement learning decision based on a dual-delay depth deterministic strategy gradient algorithm. In this embodiment, a service function chain dynamic deployment and route forwarding joint optimization problem based on VNF-FG is modeled as a markov decision process model, and a dual-delay depth deterministic strategy gradient algorithm is adopted to perform optimization solution on the markov decision process model. The time slot t feature extraction network adopts an edge aggregation graph attention network and transfer learning, input data is the spatial feature and network flow information of the current base layer physical network G, a virtual network topological graph SFC representing a service function chain, neighbor node information and edge feature are aggregated to construct an efficient hidden state representation form, and a new state matrix S= (S) is obtained through splicing _G ,S _SFC )＝([H' _G ,E' _G ],[H' _SFC ,E' _SFC ]) As the input of the deep reinforcement learning algorithm, the method is beneficial to simplifying the complex state space and improving the DRL calculation efficiency; finally, a DRL algorithm is used for providing a flow arrangement scheme of deploying end-to-end slices for the upcoming user request, and the strategy is improved according to rewarding feedback of environmental state transition;

Deep reinforcement learning based on the TD3 algorithm also adopts an Actor-Critic architecture consisting of an Actor network and Critic networks, where θ and θ' denote the parameters of the policy network and the target network in the Actor network, respectively, and φ₁, φ₂ and φ'₁, φ'₂ are the corresponding parameters of the evaluation networks and target networks of the Critic network. When the action space is explored iteratively, the Actor network executes its policy output with additional clipped Gaussian noise, which is mainly used to smooth the target value and further reduce variance, thereby improving exploration. The loss function of the algorithm's agent is therefore defined as follows:

where the loss is evaluated over a batch of sample data drawn from the experience replay buffer, and z indexes the samples in this batch; δ is a weight factor (default value 0.75) that assigns greater weight to the Critic network whose loss value is currently being computed; y_l is the target value, computed with the corresponding discount factor. The update of the gradient of the Agent's deterministic policy π_θ is expressed as:
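For orientation only, the clipped-noise smoothing and twin-critic target value that characterize the standard TD3 algorithm can be sketched as follows. This is a minimal sketch of textbook TD3 with a scalar action, not the δ-weighted loss of this embodiment; the function names, the callables passed in, and all default values (discount factor, noise standard deviation, clipping range, action bounds) are assumptions chosen for illustration.

```python
import random

def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))

def td3_target(reward, next_state, done, target_actor,
               target_critic_1, target_critic_2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Standard TD3 target: y = r + gamma * (1 - done) * min(Q1', Q2').

    target_actor(state) -> scalar action (hypothetical callable)
    target_critic_k(state, action) -> scalar value (hypothetical callable)
    """
    # Clipped Gaussian noise smooths the target action.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise,
                       action_low, action_high)
    # Taking the smaller of the two target critics limits overestimation.
    q_min = min(target_critic_1(next_state, next_action),
                target_critic_2(next_state, next_action))
    return reward + gamma * (0.0 if done else 1.0) * q_min
```

In such a scheme each Critic is regressed toward the same target value, and the Actor is updated less frequently than the Critics; how the weight factor δ enters the loss is specific to the embodiment and is not modeled here.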

FIG. 4 is a flow chart of the deterministic end-to-end slice traffic orchestration strategy based on edge graph attention according to an embodiment of the present invention. In step (3), evaluating the orchestration strategy and selecting the optimal deployment scheme to complete the end-to-end slice configuration specifically comprises the following steps:

(1.1) In the end-to-end slice deployment communication scenario of the Industrial Internet of Things, a leaky bucket shaper is used to constrain the arrival process A_q(t) of every flow F_q entering a VNF virtual node queue to the more accurate TSPEC (ρ, σ, γ, δ) model (Traffic Specification); the corresponding arrival curve is expressed as follows:

where ρ_q is the expected value of the long-term stable transmission rate, σ_q is the burst tolerance for data packets, r_q is the maximum transmission rate of the channel, and δ_q is the maximum amount of transmitted data. The service that the i-th VNF virtual node V'_{i,q} in the service function chain SFC_q offers to network flow F_q is modeled as a latency-rate service curve, defined as:

where the latency term is the maximum time an arriving flow waits before processing starts in the VNF serving node. The service rate (user data packet processing rate) of a VNF node is linearly related to the amount of computing resources allocated to the node; δ is a service rate coefficient representing the ratio between the amount of computing resources and the service rate.
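As a non-limiting illustration of how such curves are used, the sketch below evaluates a dual leaky bucket arrival curve and a latency-rate service curve in their standard network calculus forms, together with the resulting delay and backlog bounds. It assumes the arrival curve min(r·t + δ, ρ·t + σ) with r > ρ and σ ≥ δ, and a service curve R·max(0, t − T) with R ≥ ρ; this mapping of the TSPEC parameters, the function names, and the symbols R and T are assumptions and may differ from the exact expressions of the embodiment.

```python
def tspec_arrival(t, rho, sigma, r, delta):
    """Assumed arrival curve: min of peak-rate and sustained-rate lines."""
    if t <= 0:
        return 0.0
    return min(r * t + delta, rho * t + sigma)

def rate_latency(t, rate, latency):
    """Latency-rate service curve: rate * max(0, t - latency)."""
    return rate * max(0.0, t - latency)

def _check(rho, sigma, r, delta, rate):
    if not (r > rho and sigma >= delta and rate >= rho > 0):
        raise ValueError("need r > rho, sigma >= delta, rate >= rho > 0")

def delay_bound(rho, sigma, r, delta, rate, latency):
    """Horizontal deviation between the two curves (worst-case delay)."""
    _check(rho, sigma, r, delta, rate)
    theta = (sigma - delta) / (r - rho)  # where the two lines cross
    return latency + (delta + theta * max(0.0, r - rate)) / rate

def backlog_bound(rho, sigma, r, delta, rate, latency):
    """Vertical deviation between the two curves (worst-case backlog)."""
    _check(rho, sigma, r, delta, rate)
    theta = (sigma - delta) / (r - rho)
    slope = max(0.0, r - rate) - r + rho
    return sigma + rho * latency + max(0.0, theta - latency) * slope
```

For example, with ρ = 2, σ = 5, r = 10, δ = 1, R = 4 and T = 0.25 (arbitrary units), the two lines of the arrival curve cross at t = 0.5, the delay bound evaluates to 1.25 and the backlog bound to 5.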

(1.2) Because a shared-VNF policy is adopted, it is also necessary to derive the service curve obtained by the specific target network flow F_q within the aggregate flow passing through VNF node V'_{i,q}. The target network flow to be analyzed is called the Through Flow (TF), and the remaining flows entering the same queue are called Cross Flows (CF). To simplify the analysis of the target flow, the CF is usually also constrained to a TSPEC (ρ, σ, γ, δ) model, expressed as:

(2.1) Combining the arrival curve of the traffic with the service curves defined at the VNF virtual nodes, deterministic network calculus is used to analyze the end-to-end delay upper bound and the traffic backlog of a network flow traversing the service function chain SFC formed by two modules: the radio access network communication system and the application service communication system.
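Two standard network calculus results underlie this kind of analysis and can be sketched as follows, purely for illustration. The first gives the leftover latency-rate service available to a through flow when a cross flow shares the node, under the assumption of arbitrary (blind) multiplexing and a cross flow bounded by the single line ρ·t + σ, which is a simplification of the TSPEC constraint. The second gives the equivalent service curve of latency-rate nodes in tandem. The function names and the tuple representation (rate, latency) are hypothetical.

```python
def leftover_rate_latency(rate, latency, cross_rho, cross_sigma):
    """Leftover service for the through flow: [beta - alpha_CF]^+.

    For beta(t) = rate * max(0, t - latency) and a cross flow bounded by
    cross_rho * t + cross_sigma, the result is again latency-rate.
    """
    if cross_rho >= rate:
        raise ValueError("cross flow rate must be below the service rate")
    left_rate = rate - cross_rho
    left_latency = (cross_sigma + rate * latency) / left_rate
    return left_rate, left_latency

def concatenate(curves):
    """Tandem of latency-rate nodes: minimum rate, summed latencies."""
    return min(r for r, _ in curves), sum(t for _, t in curves)
```

For instance, a node with rate 10 and latency 0.1 shared with a cross flow of rate 4 and burst 2 leaves the through flow a rate of 6 and a latency of 0.5; placing this in tandem with a node of rate 8 and latency 0.25 yields an end-to-end curve with rate 6 and latency 0.75.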

After network flow F_q, whose arrival curve is a_q(t), has been processed by node VNF₁ with service curve s₁(t), it is split according to two preconfigured weight parameters into two flows sent to VNF₂ and VNF₆. Equivalently, the arriving flow F_q can first be split into two arrival sub-flows, each of which then independently receives the service of the VNF node; the relation between them is as follows:


Assume that all network flows accessing the VNF virtual nodes are shaped to the TSPEC (ρ, σ, γ, δ) model, that the target network flow F_q is equivalently split into two sub-flows at node VNF₁, and that the remaining cross flow is F_CF. The arrival curves of the equivalent split sub-flows are therefore:

where L^packet is the size of a data packet, indicating that neither the burst size nor the maximum transmission amount of a split sub-flow can be smaller than the size of a single data packet. Similarly, a sub-flow that has been processed by node VNF₁ and then enters node VNF₂ must also be split, so this network sub-flow is further split, in the same equivalent manner, into two sub-flows.
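The packet-size floor described above can be illustrated with the following minimal sketch. It assumes that a split weight scales every TSPEC parameter of the parent flow proportionally and that the burst size and maximum transmission amount are then raised to at least one packet; the proportional scaling of the two rates, the function name, and the parameter order are assumptions for illustration and are not taken from the expressions of the embodiment.

```python
def split_tspec(rho, sigma, r, delta, weight, packet_size):
    """Return (rho, sigma, r, delta) of a sub-flow carrying `weight`
    of the parent flow, with 0 < weight <= 1 (assumed scaling rule)."""
    if not 0.0 < weight <= 1.0:
        raise ValueError("weight must lie in (0, 1]")
    return (
        weight * rho,                      # sustained rate
        max(weight * sigma, packet_size),  # burst, at least one packet
        weight * r,                        # maximum rate
        max(weight * delta, packet_size),  # max amount, at least one packet
    )
```

Applying the function again to a returned tuple models the further split of a sub-flow at a downstream node such as VNF₂.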

(2.2) Suppose the target through flow F_q and the cross flow F_CF pass through the shared node VNF_i. Combining this with the service curves, the upper bound on the buffer backlog of network flow F_q at node VNF_i can be derived as:

According to the analysis of tandem (concatenated) service systems, the end-to-end equivalent service curve is expressed as:

(2.3) The inequality for the upper bound on the end-to-end delay of the remote control service is:

The end-to-end delay analysis of the audio-video processing sub-flow is equivalent to taking the maximum of the end-to-end delays of its two sub-flows. The end-to-end service curves that the audio-video processing function chain provides to the two sub-flows are therefore as follows:

Similarly, the formal definition of the upper bound on the end-to-end delay of the remote control service is:

In summary, the upper bound on the end-to-end delay of the service function that fulfills the user request is the maximum of the end-to-end delay upper bounds of the two service parts, and the end-to-end delay finally experienced by the target network flow F_q is:

According to the invention, in the deterministic end-to-end slice traffic orchestration strategy based on edge graph attention, network flows are modeled as service function chains constructed from VNF-FGs, so that successfully deployed shared VNF instances are preferentially reused when VNFs are placed within a slice, which improves resource utilization efficiency. An edge graph attention network is designed to extract heterogeneous node states and spatial structure information on the communication links to further refine SFC embedding, and a reasonable traffic orchestration policy is finally obtained through DRL training. As a result, the total cost of slice traffic orchestration is effectively reduced, the total number of successfully transmitted network flows is increased, optimal configuration such as end-to-end slice deployment is completed quickly, slice resource utilization is improved, and the end-to-end quality of experience of terminal devices is effectively guaranteed.

The foregoing description is directed to the preferred embodiments of the present invention, but the embodiments are not intended to limit the scope of the invention, and all equivalent changes or modifications made under the technical spirit of the present invention should be construed to fall within the scope of the present invention.

Claims

1. A deterministic end-to-end slice traffic orchestration strategy based on edge graph attention, comprising the steps of:

2. The deterministic end-to-end slice traffic orchestration strategy based on edge graph attention according to claim 1, wherein in step 1 an end-to-end software-defined network architecture oriented to the Industrial Internet of Things is constructed, specifically comprising an infrastructure layer, a virtualization layer, a control layer and a service layer.

3. The deterministic end-to-end slice traffic orchestration strategy based on edge graph attention according to claim 1, wherein end-to-end slice lifecycle management comprises:

then, a suitable physical node is found on which to deploy each VNF virtual node of the previously generated SFC, with the optimization objective of minimizing the deployment cost of the SFC: each physical server deploys multiple different VNF instances and allocates the required computing and storage resources,

4. The deterministic end-to-end slice traffic orchestration strategy based on edge graph attention according to claim 1, wherein in step (2), a Markov decision process model is obtained by modeling the joint optimization problem of VNF-FG-based dynamic service function chain deployment and route forwarding; wherein, in a given service scenario state, the decision agent of the traffic orchestration algorithm selects an action from the action space in time slot t, executes it, and obtains the corresponding reward; the final optimization objective is to maximize the system utility of traffic orchestration, and the state space, action space, reward function, and state-action policy are defined as follows:

in traffic orchestration, user traffic is regarded as a service request for network resources and is defined in the form of a service function chain, and the state space is defined as follows:

wherein the set of network flows records the details of all network flows in the service queue at time slot t, including the graph structure information of the corresponding service function chains and their strict end-to-end slice quality-of-service requirements on heterogeneous computing resources, end-to-end delay, upper bound on buffer pressure, and the like; the remaining component represents the state information of the base-layer physical network topology at time slot t;

the action space of the orchestrating agent is defined as follows:

the reward function is defined as follows:

the policy is a mapping between a random environment state and the traffic orchestration action that the Agent may take in that state; its optimization objective is to evolve toward the optimal policy, according to the returned reward values, through continual random exploration of the state space and the action space.

5. The deterministic end-to-end slice traffic orchestration strategy based on edge graph attention according to claim 3, wherein the method of constructing the feature extraction model based on the edge graph attention network is:

for the undirected graph topology of the physical base-layer network, the graph consists of a set of nodes and a set of edges, together with the number of nodes and the neighbor node set of each node i; the input of the EGAT layer is defined as a node feature matrix and an edge feature matrix, wherein each node feature column vector has ψ_h node features and each edge feature column vector has ψ_e edge features; the target output of the EGAT layer is a new node feature matrix that aggregates the features of the neighbor nodes and the information of the edges connecting the neighborhood, and a new edge feature matrix E' = […, e'_{ab}, …]^T;

first, two learnable weight matrices are used to linearly transform the two input feature matrices respectively, mapping each into a high-dimensional space of dimension ψ' > ψ; a modified single-head graph attention mechanism is then applied in this high-dimensional feature space, computed by a feedforward neural network configured with a Leaky-ReLU activation function, and a softmax layer normalizes the computed attention coefficients, giving the following formula:

finally, the nonlinear activation function σ is applied to the weighted sum of neighborhood node and edge features under each single-head attention mechanism, and the node features output by the P attention heads are concatenated to obtain a new node feature vector that carries the information of the edges connecting the neighborhood, specifically expressed as:

e'_{ab} = MLP(h'_a, h'_b, e'_a, e'_b, e_{ab}). (9)

6. The deterministic end-to-end slice traffic orchestration strategy based on edge graph attention according to claim 1, wherein in step 2, the deep reinforcement learning algorithm comprises deep reinforcement learning decision-making based on the twin delayed deep deterministic policy gradient algorithm, wherein at time slot t the feature extraction network uses the edge-aggregated graph attention network together with transfer learning, the input data are the spatial features and network traffic information of the current base-layer physical network G and the virtual network topology graph SFC representing a service function chain, an efficient hidden state representation is constructed by aggregating neighbor node information and edge features, and concatenation yields a new state matrix S = (S_G, S_SFC) = ([H'_G, E'_G], [H'_SFC, E'_SFC]) as the input of the deep reinforcement learning algorithm; finally, the DRL algorithm provides a traffic orchestration scheme for deploying end-to-end slices for upcoming user requests, and the policy is improved according to the reward feedback from environment state transitions;

deep reinforcement learning based on the twin delayed deep deterministic policy gradient algorithm also adopts an Actor-Critic architecture consisting of an Actor network and Critic networks, wherein θ and θ' denote the parameters of the policy network and the target network in the Actor network, respectively, and φ₁, φ₂ and φ'₁, φ'₂ are the corresponding parameters of the evaluation networks and target networks of the Critic network; when the action space is explored iteratively, the Actor network executes its policy output with additional clipped Gaussian noise, which is used to smooth the target value and further reduce variance, thereby improving exploration; the loss function of the algorithm's agent is therefore defined as follows:

wherein the loss is evaluated over a batch of sample data drawn from the experience replay buffer, and z is the index of the samples in this batch; δ is a weight factor that assigns greater weight to the Critic network whose loss value is currently being computed; y_l is the target value, computed with the corresponding discount factor; and the update of the gradient of the Agent's deterministic policy π_θ is expressed as: