CN115361726B - Intelligent access control method and system for network slice


Info

Publication number
CN115361726B
Authority
CN
China
Prior art keywords
access control
network slice
network
resource
representing
Prior art date
Legal status
Active
Application number
CN202210540268.9A
Other languages
Chinese (zh)
Other versions
CN115361726A (en)
Inventor
孙罡
王宇辉
李晴
任婧
虞红芳
孙健
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202210540268.9A
Publication of CN115361726A
Application granted
Publication of CN115361726B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W48/00Access restriction; Network selection; Access point selection
    • H04W48/02Access restriction performed under specific conditions
    • H04W48/06Access restriction performed under specific conditions based on traffic conditions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/16Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/16Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
    • H04W28/24Negotiating SLA [Service Level Agreement]; Negotiating QoS [Quality of Service]


Abstract

The invention discloses an intelligent access control method and system for network slices, belonging to the technical field of communications. In view of the dynamically changing resource requirements of network slices, the invention provides a cross-time-window intelligent access control method and system for network slices.

Description

Intelligent access control method and system for network slice
Technical Field
The invention belongs to the technical field of communication, and particularly relates to a network slice intelligent access control method and system.
Background
The fifth-generation telecommunication network (5G) has driven the development of applications such as autonomous driving, unmanned aerial vehicles, telemedicine, and the massive Internet of Things. Service application scenarios in the 5G era can be divided into three major categories: enhanced Mobile Broadband (eMBB), massive Machine-Type Communication (mMTC), and Ultra-Reliable and Low-Latency Communication (URLLC). Different service types have different network performance requirements: eMBB requires a high communication rate, URLLC requires extreme reliability and low latency, and mMTC is characterized by low rates, tolerable latency, and high dynamics. This means that the deployment of network functions and the demand for network resources are highly heterogeneous across different types of application scenarios.
As a key technology of 5G, network slicing enables the 5G network to flexibly and efficiently meet the heterogeneous demands of different users. With network slicing, a physical network infrastructure is divided into multiple mutually independent virtual network slices, each of which is a logical network customized for a specific network service demand. Network slicing increases resource efficiency, but its heterogeneity leads to high complexity in network resource management. Resource allocation for network slices faces various challenges in terms of inter-slice isolation, customization, elasticity, and end-to-end coordination.
In the resource management of a network slicing system, access control is indispensable: it is essential for avoiding traffic overload and ensuring system stability. Deciding whether and how a new slice request is admitted constitutes the core task of access control, and its principle is that a newly admitted request should not come at the expense of the service quality of existing slices. An effective and well-designed network slice access control technique can, under limited system capacity, still guarantee the service quality of existing slices while maximizing the utilization of system resources. That is, when the system capacity is saturated, no new connection requests are accepted, so as to guarantee the Service Level Agreement (SLA) requirements of existing users; when the system capacity is not saturated, as many new slice requests as possible are accepted to fully utilize network resources.
Infrastructure providers and service providers benefit from the ability to share physical resources and from the elasticity of network slicing. However, when multiple slices burst at the same time, they can overload the shared resource pool, which in turn causes SLA violations; these violations mean that the affected customers must be paid penalties, a risk that must be considered as an opportunity cost of maintaining the slices. In extreme cases, the opportunity cost of accepting a new network slice instance request is higher than the revenue generated by the corresponding slice; therefore, network slice access control cannot use an overly aggressive greedy algorithm. On the other hand, overly conservative admission control also causes losses for infrastructure providers or service providers: first, a conservative access policy naturally means low resource utilization and low revenue; second, its low acceptance rate may lead to serious request congestion and user churn. To strike the best balance between resource elasticity and network slice acceptance rate, service providers must have a deep understanding of customer behavior, including the characteristics of active slices (e.g., dynamically changing resource requirements and duration distributions) and the characteristics of network slice requests (e.g., arrival rate and user patience).
In the prior art, auction models have been used for network slice access control. Auction-based access control avoids the high complexity of the access control problem through carefully designed auction and allocation rules, thereby reducing computational complexity; since the auction mechanism completes the resource allocation decision at the same time as the access control decision, after a successful auction the resources must be allocated to the winning bidder according to a fixed requirement. The drawback is that auction-based access control is usually combined with static resource allocation, so that access control decisions are tightly coupled with resource allocation decisions: a successful auction means that the corresponding request is admitted and will subsequently be allocated a fixed amount of network resources, resulting in low network resource utilization.
The prior art also includes reinforcement learning, an efficient experience-based decision-assistance method that can support access control decisions for network slice requests with low computational complexity. The network slice access decision is modeled as a semi-Markov decision process, and an adaptive network slice admission algorithm based on Q-learning is designed to search for the optimal access control decision, maximizing the infrastructure provider's profit while guaranteeing the service quality of the network slices; the Q-learning algorithm has environmental adaptability and can maintain near-optimal decisions in a continuously changing environment. The drawback is that Q-learning lacks scalability: when the state space becomes too large, its training time increases rapidly. It therefore only addresses whether a network slice request is admitted, without considering long-term optimization in the time dimension or the question of when admission would be optimal, and its exploration and convergence performance is insufficient in complex scenarios.
Disclosure of Invention
Aiming at the above shortcomings of the prior art, the intelligent access control method and system for network slices provided by the invention solve the problem of designing an efficient, intelligent network slice access control system under time-varying and heterogeneous slice resource requirements, thereby avoiding the waste of network resources and, on the premise of guaranteeing SLAs, improving the network slice acceptance rate, the system capacity, and the service provider's revenue.
In order to achieve the above object, the invention adopts the following technical scheme: an intelligent access control method for network slices, comprising the following steps:
S1, preparation stage: initializing parameters, and training a resource demand predictor and an access control agent, respectively;
S2, operation stage: according to the trained resource demand predictor and access control agent, predicting the resource occupancy and making access control decisions for requests, updating the resource occupancy and instantiating and configuring network slices according to the access control decisions, and training the access control agent online, thereby completing the control of intelligent network slice access.
Further, the step S1 includes the steps of:
s101, initializing parameters, wherein the parameters are Comprising the following steps: access control periodPenalty force for service provider to violate SLA>Discount factor->Node resource upper limit allocated to network slice +.>And link resource upper limit assigned to network slice +.>
S102, taking the resource demand data as a training set of a resource demand predictor LSTM-P, and taking the following formula as a loss function of a training process to train the resource demand predictor LSTM-P:
wherein,loss function representing resource demand predictor LSTM-P +.>Representing a prediction window->Representing the respective moments within the prediction window, +.>Representing the influence of prediction errors on the benefit, +.>A true value representing network slice resource requirements, < >>A predicted value representing network slice resource requirements;
s103, according to the access control periodA state space and an action space are set>And determining the neuron numbers of an input layer and an output layer, constructing an access control intelligent agent D3QN, and pre-training the access control intelligent agent D3 QN.
Still further, step S102 includes the following steps:
S1021, using historical network slice node resource demand data as the training set of the node resource demand predictor LSTM-P; or
using historical link resource demand data as the training set of the link resource demand predictor LSTM-P;
S1022, training the resource demand predictor LSTM-P with the training set.
Still further, pre-training the access control agent D3QN in step S103 includes the following steps:
A1, performing prioritized experience replay with the sum-tree method of reinforcement learning, and extracting experience entries from the experience pool;
A2, sampling noise parameters according to the noise network parameter distribution for action selection in the behavior network, in the target network, and when computing the training labels of the access control agent D3QN, respectively;
A3, optimizing the neural network model parameters one by one with the extracted experience entries, and judging, for each experience entry, whether its next state is a termination state: if so, the training label of the access control agent D3QN equals the reward of that entry; otherwise, the training label is computed from the reward of the action, the discount factor, and the Q value that the target network outputs for the next state at the optimal next-state action, i.e., the action with the maximum Q value selected by the behavior network, where each experience entry records the current state, the action taken, the reward of this action, and the next state after taking the action, and the behavior network parameters and the target network parameters of the access control agent D3QN are maintained separately;
A4, computing the loss function of the access control agent D3QN from the training labels, performing gradient descent on the loss function, and copying the behavior network parameters to the target network at a preset period, thereby completing the pre-training of the access control agent D3QN.
Still further, step S2 includes the following steps:
S201, during operation, the service provider sets the access control period and judges at each moment whether the current moment is an access control moment at which an access control decision should be executed: if so, proceed to step S202; otherwise, the current moment is not an access control moment and the flow proceeds to step S206;
S202, at the access control moment, using the resource demand predictor LSTM-P to predict, for each active network slice in the system, the node resource occupancy and the link resource occupancy within the future time window;
S203, calculating, from the node resource occupancy and the link resource occupancy, the total node resources and the total link resources to be occupied by all active network slices in the system at each moment within the future time window, and updating the environment state of the access control agent D3QN accordingly, where the environment state includes the network slice request information, the resource demand upper limits required by the new network slice request, and the amount of resources to be reserved for the existing active network slices; the demand upper limits and reserved amounts are one-dimensional vectors whose length equals the access control time window, and the request information of the i-th network slice request includes the waiting time of that request;
S204, according to the updated environment state of the access control agent D3QN, starting from the head of the network slice request pool, sequentially making the access control decision for each network slice request in the queue: if a request is admitted, determining the instantiation moment of the corresponding network slice, signing the corresponding SLA, and entering step S205; otherwise, continuing to decide the next request until all requests in the network slice request pool have been traversed, and entering step S205;
S205, after the access control decision that admits a network slice, calculating the node resources and the link resources to be occupied by the newly admitted network slice, and updating the environment state of the access control agent D3QN; judging whether all requests in the network slice request pool have received access control decisions: if so, entering step S206, otherwise returning to step S201;
S206, judging whether the system has run to the instantiation moment of an access control decision: if so, instantiating the corresponding network slice according to the corresponding SLA requirements, configuring the corresponding node resources and link resources, and entering step S207; otherwise, entering step S208;
S207, according to the instantiation and resource configuration of the network slices, collecting the information of the network slices whose service has been completed by the system, calculating the reward of each completed network slice, and storing the completed network slice information and reward into the experience pool, where the completed network slice information includes: the states of the access control agent D3QN before and after the network slice was admitted, and the action selected by the access control agent D3QN when the network slice was admitted;
S208, extracting experience entries from the experience pool and training the access control agent D3QN;
S209, judging whether the system is still within the operation period: if so, returning to step S201; otherwise, according to the trained access control agent D3QN, shutting down the network slicing system after all signed SLAs have been fulfilled, thereby completing the control of intelligent network slice access.
Still further, in step S203, the total node resources occupied by the active network slices are obtained by summing, over the active network slices in the system, the node resources that the i-th active network slice will occupy at each moment within the future time window, and the total link resources are obtained by summing, in the same way, the link resources that each active slice will occupy at that moment.
Still further, in step S205, for the i-th network slice admitted at moment t, the occupied node resources are determined from the node resource demand upper limit after admitting the network slice, the node resource demand upper limit before admitting it, and the maximum node resources that the network slice can obtain at moment t; the occupied link resources are determined in the same way from the link resource demand upper limits before and after admission and the maximum link resources obtainable at moment t.
Based on the above method, the invention further provides an intelligent access control system for network slices, comprising:
a first processing module, configured to initialize parameters and train a resource demand predictor and an access control agent, respectively;
and a second processing module, configured to predict the resource occupancy and make access control decisions for requests according to the trained resource demand predictor and access control agent, update the resource occupancy and instantiate and configure network slices according to the access control decisions, and train the access control agent online, thereby completing the control of intelligent network slice access.
The beneficial effects of the invention are as follows:
(1) The invention uses the resource demand predictor LSTM-P to predict the available resources in the next access control time window, providing an information basis for fully utilizing idle resource blocks across time windows; it uses deep reinforcement learning to realize autonomous network slice access control decisions and can effectively schedule the network slice instantiation moment within the time window, avoiding both the low acceptance rate caused by being overly conservative and the high default rate caused by being overly aggressive, thus fully utilizing network resources and improving the capacity of the network slice scheduling system.
(2) The invention has the advantage of a high network slice acceptance rate. The actual resource demand of a network slice changes dynamically, whereas traditional access control treats the slice's resource demand as a large fixed value, so the resulting access policy is overly conservative; by exploiting the predicted, time-varying demand, the invention can admit more slices.
(3) The invention has the advantage of low SLA violation rate. The invention designs the access control agent D3QN, which can adapt to different network environments and load conditions, intelligently schedule network slice access and reduce SLA violation rate.
(4) The invention has the advantage of high resource utilization rate. The network slice access control module constructed by the invention can dynamically schedule network node resources and link resources at each time point by utilizing the resource demand prediction information, avoids resource waste during access, improves the resource utilization rate and reduces the network slice resource cost.
(5) The invention has the advantage of short user waiting time. The access control algorithm provided by the invention uses prediction information to assist access decisions, avoids the waste of system resources caused by overly conservative access policies, and reduces the waiting time of user requests.
(6) The invention has the advantage of large system capacity. The high-efficiency access control module and the resource demand prediction module adopted by the invention work together to improve the network slicing capacity of the network slicing system.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of a system structure according to the present invention.
Fig. 3 is a schematic diagram of a network slice intelligent access control system in this embodiment.
Fig. 4 is a schematic diagram of a neural network structure of an access control agent D3QN in this embodiment.
Fig. 5 is a schematic diagram of access control agent D3QN access decision control in the present embodiment.
Fig. 6 is a flow chart of the network slice intelligent access control system in this embodiment.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding of the invention by those skilled in the art, but it should be understood that the invention is not limited to the scope of these embodiments; for those of ordinary skill in the art, any invention that makes use of the inventive concept falls within the protection scope of the invention as defined by the appended claims.
As shown in fig. 1, the invention provides a network slice intelligent access control method, which comprises the following steps:
S1, preparation stage: initializing parameters and training the resource demand predictor and the access control agent, respectively, implemented as follows:
S101, initializing parameters, where the parameters include: the access control period, the penalty strength for the service provider violating an SLA, the discount factor, the node resource upper limit allocated to network slices, and the link resource upper limit allocated to network slices;
S102, using resource demand data as the training set of the resource demand predictor LSTM-P, and training the resource demand predictor LSTM-P with a loss function that sums, over each moment of the prediction window, the influence of the prediction error on revenue, computed from the true value and the predicted value of the network slice resource demand; this step is implemented as follows:
S1021, using historical network slice node resource demand data as the training set of the node resource demand predictor LSTM-P, or using historical link resource demand data as the training set of the link resource demand predictor LSTM-P;
S1022, training the resource demand predictor LSTM-P with the training set;
S103, setting the state space and the action space according to the access control period, determining the neuron numbers of the input layer and the output layer, constructing the access control agent D3QN, and pre-training the access control agent D3QN, where the pre-training includes the following steps:
the pre-training of the access control agent D3QN comprises the following steps:
a1, performing preferential experience playback by using a reinforcement learning method sum-tree, and obtaining the experience from an experience poolExtracting experience bars;
a2, sampling noise parameters according to noise network parameter distribution requirements for action selection when accessing to a behavior network, a target network and calculating training labels of the control intelligent agent D3QN respectively、/>And->
A3, optimizing the neural network model parameters one by using the extracted experience strips, and judging the firstExperience stripMiddle->Whether the state is a termination state, if so, controlling training label ++Tight of agent D3QN>Otherwise, calculating and controlling the intelligent agent D3QN training label according to the following formula>
Wherein,indicating the current state +.>Representing the action taken,/->Rewards representing this action, ++>Representing the next state after taking action, +. >Representing discount factors->Q value representing neural network output, +.>Representing the optimal action of the next state corresponding to the maximum Q value, < >>Indicating the optimal action of the next state, +.>Representing access control agent D3QN behavioural network parameters, +.>Representing access control agent target network parameters;
a4, calculating a loss function of the intelligent agent D3QN according to the training label, performing gradient descent processing on the loss function, copying parameters of the behavior network to the target network in a preset period, and completing pre-training of the access control intelligent agent D3 QN:
wherein,a loss function representing the access control agent D3 QN;
S2, operation stage: according to the trained resource demand predictor and access control agent, predicting the resource occupancy and making access control decisions for requests, updating the resource occupancy and instantiating and configuring network slices according to the access control decisions, and training the access control agent online, thereby completing the control of intelligent network slice access; this stage is implemented as follows:
S201, during operation, the service provider sets the access control period and judges at each moment whether the current moment is an access control moment at which an access control decision should be executed: if so, proceed to step S202; otherwise, the current moment is not an access control moment and the flow proceeds to step S206;
S202, at the access control moment, using the resource demand predictor LSTM-P to predict, for each active network slice in the system, the node resource occupancy and the link resource occupancy within the future time window;
S203, calculating, from the node resource occupancy and the link resource occupancy, the total node resources and the total link resources to be occupied by all active network slices in the system at each moment within the future time window, and updating the environment state of the access control agent D3QN accordingly, where the environment state includes the network slice request information, the resource demand upper limits required by the new network slice request, and the amount of resources to be reserved for the existing active network slices; the demand upper limits and reserved amounts are one-dimensional vectors whose length equals the access control time window, and the request information of the i-th network slice request includes the waiting time of that request;
the total node resources occupied by the active network slices are obtained by summing, over the active network slices in the system, the node resources that the i-th active slice will occupy at each moment within the future time window, and the total link resources are obtained by summing, in the same way, the link resources that each active slice will occupy at that moment;
S204, according to the updated environment state of the access control agent D3QN, starting from the head of the network slice request pool, sequentially making the access control decision for each network slice request in the queue: if a request is admitted, determining the instantiation moment of the corresponding network slice, signing the corresponding SLA, and entering step S205; otherwise, continuing to decide the next request until all requests in the network slice request pool have been traversed, and entering step S205;
S205, after the access control decision that admits a network slice, calculating the node resources and the link resources to be occupied by the newly admitted network slice, and updating the environment state of the access control agent D3QN; judging whether all requests in the network slice request pool have received access control decisions: if so, entering step S206, otherwise returning to step S201;
for the i-th network slice admitted at moment t, the occupied node resources are determined from the node resource demand upper limit after admitting the network slice, the node resource demand upper limit before admitting it, and the maximum node resources that the network slice can obtain at moment t; the occupied link resources are determined in the same way from the link resource demand upper limits before and after admission and the maximum link resources obtainable at moment t;
S206, judging whether the system has run to the instantiation moment of an access control decision: if so, instantiating the corresponding network slice according to the corresponding SLA requirements, configuring the corresponding node resources and link resources, and entering step S207; otherwise, entering step S208;
S207, according to the instantiation and resource configuration of the network slices, collecting the information of the network slices whose service has been completed by the system, calculating the reward of each completed network slice, and storing the completed network slice information and reward into the experience pool, where the completed network slice information includes: the states of the access control agent D3QN before and after the network slice was admitted, and the action selected by the access control agent D3QN when the network slice was admitted;
S208, extracting experience entries from the experience pool and training the access control agent D3QN;
S209, judging whether the system is still within the operation period: if so, returning to step S201; otherwise, according to the trained access control agent D3QN, shutting down the network slicing system after all signed SLAs have been fulfilled, thereby completing the control of intelligent network slice access.
Aiming at the dynamically changing resource requirements of network slices, the invention provides a cross-time-window intelligent access control method for network slices.
Example 2
As shown in fig. 2, the present invention provides a network slice intelligent access control system, which includes:
a first processing module, configured to initialize parameters and train a resource demand predictor and an access control agent, respectively;
and a second processing module, configured to predict the resource occupancy and make access control decisions for requests according to the trained resource demand predictor and access control agent, update the resource occupancy and instantiate and configure network slices according to the access control decisions, and train the access control agent online, thereby completing the control of intelligent network slice access.
In one embodiment of the invention, a network slice access control system based on deep reinforcement learning is provided, which uses neural networks to analyze the resource usage and the state of the network slice request pool, so that a reinforcement learning agent autonomously decides whether each network slice request is admitted and at what moment it is instantiated. The intelligent network slice access control system mainly consists of access control and network resource demand prediction. The access control decides which network slice requests the service provider (Service Provider, SP) admits and when the corresponding network slices are instantiated in the system to guarantee their SLA requirements. After the SP instantiates a network slice according to its requirements, the resource usage of the active network slices in the system and the global resource usage of the system are monitored. Active network slices are network slices in the system that have been instantiated and not yet destroyed; they occupy network resources during their life cycle to implement the corresponding network functions. Based on the resource usage of the active network slices, the network resource demand prediction forecasts the network resource demand of each active network slice in the next access control period. Using the total resource occupancy of the existing active network slices, the system realizes network slice request access control decisions across access control time windows, fully utilizes idle resource blocks spanning access control time windows, and improves the capacity of the network slicing system.
The advantages of the system are: (1) the autonomous decision-making of network slice access control is realized by deep reinforcement learning, and the network slice instantiation moment can be effectively scheduled within the time window, avoiding both the low acceptance rate caused by being overly conservative and the high default rate caused by being overly aggressive, thereby improving the system's slice acceptance rate and revenue; (2) the resource demand predictor LSTM-P predicts the available resources in the next access control time window, providing an information basis for fully utilizing idle resource blocks that span time windows.
In one embodiment of the invention, as shown in fig. 3, fig. 3 is a schematic diagram of the intelligent network slice access control system. The service provider leases a certain amount of physical or virtual network resources from the infrastructure provider and provides three types of slicing services to users; network slice requests are divided into three queues according to type, with different fill patterns representing different network slice types. The arrows in the access control time window represent the network slice request instantiation moments decided by the access control module. In the demand prediction part, the solid line represents real historical data obtained by performance monitoring, and the dashed line represents the future demand predicted from the historical data.
In one embodiment of the invention, the SP uses network slicing technology to provide various types of network slices to users on top of the physical network. Based on the three major 5G application scenarios eMBB, mMTC and uRLLC proposed by the 3rd Generation Partnership Project (3GPP), the corresponding three slice types form a typical scenario with heterogeneous network slice types. Different types of network slices have different performance requirements, different resource overhead patterns, and different user fees. The network slice request submitted by a user to the SP includes the requested network slice type and the network slice duration, where the first term represents the type of slice requested by the user and the second term represents the duration of the slice. The network slice fee charged by the SP to the user is related to the type of network slice requested by the user and the required duration of the network slice.
In one embodiment of the invention, network slice requests issued by users are temporarily stored in the request pool of the system. At fixed intervals, the SP uniformly executes access control once, making access decisions for all network slice requests in the request pool; the time interval between successive access control rounds is called the access control period. The decision content includes: whether each network slice request is admitted and, if admitted, when the corresponding slice is instantiated to meet its requirements, where the network slice instantiation moment must lie within the access control time window. An access control decision made at a given moment thus determines the instantiation moment of each admitted network slice request. When slice demand is excessive, network slice request congestion is likely to occur, and some users wait so long that they are lost. The longest waiting time for admission that a user can accept is recorded as the user patience; if a user request has still not been admitted by the intelligent network slice access control system after waiting that long, the network slice request is automatically withdrawn and leaves the network slicing system.
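To make the request-pool mechanism concrete, the following minimal Python sketch shows how pending requests could be batched once per access control period and withdrawn once their waiting time exceeds the user patience. All class and field names here are illustrative assumptions, not definitions taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class SliceRequest:
    req_type: str   # e.g. "eMBB", "mMTC", "uRLLC"
    duration: int   # requested slice lifetime, in time slots
    arrival: int    # slot at which the request entered the pool


class RequestPool:
    """Hypothetical request pool with patience-based withdrawal."""

    def __init__(self, patience: int):
        self.patience = patience            # longest waiting time a user accepts
        self.pending: list[SliceRequest] = []

    def add(self, req: SliceRequest) -> None:
        self.pending.append(req)

    def expire(self, now: int) -> list[SliceRequest]:
        """Remove and return requests whose waiting time exceeded the patience."""
        lost = [r for r in self.pending if now - r.arrival > self.patience]
        self.pending = [r for r in self.pending if now - r.arrival <= self.patience]
        return lost

    def batch_for_decision(self, now: int, period: int) -> list[SliceRequest]:
        """At each access control moment (every `period` slots) hand the whole queue to the agent."""
        return list(self.pending) if now % period == 0 else []
```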
In one embodiment of the invention, when making access control decisions, the system admits as many network slice requests as possible on the premise of guaranteeing the SLA requirements of every admitted request, so as to avoid idle system resources and excessive user churn, thereby maximizing the request acceptance rate, the system resource utilization, and the long-term profit.
In one embodiment of the invention, the access control is based on deep reinforcement learning. Facing network slice requests with time-varying, heterogeneous resource demands, the invention designs a network slice access control module based on deep reinforcement learning, in which a reinforcement learning agent makes autonomous decisions. In reinforcement learning, an agent makes decisions about a fixed problem based on the observed state of the environment; after the access control agent D3QN executes a decided action, the state of the environment changes according to the environment model, and the access control agent D3QN is rewarded or penalized. In the process of interacting with the environment, the access control agent D3QN continuously accumulates action experience and learns from it an optimal action policy that maximizes the long-term return; under the guidance of this policy, the access control agent D3QN can make the optimal decision, i.e., select the optimal action, in whatever environment state it currently observes.
In one embodiment of the invention, the access control agent D3QN interacts with the environment at discrete time points. At an interaction moment, the access control agent D3QN obtains the environment state information from the set of all possible environment states, selects an action from the discrete action set, and then receives the reward of this action. The goal of the access control agent D3QN is to maximize the long-term return.
In one embodiment of the invention, at each access control decision the access control agent D3QN determines, for the network slice requests in the request pool, whether each request is admitted and, if admitted, the slice instantiation moment. To learn an effective action policy more quickly, the access control agent D3QN decides the requests in the request pool one by one, each action deciding exactly one network slice request. Specifically, the action space contains one action that rejects the current request, while each of the remaining actions admits the request and instantiates the corresponding network slice in the corresponding time slot.
In one embodiment of the invention, the environment state of the network slice access control problem is modeled as the currently processed network slice request together with the current network resource usage, where the network resources include node resources and link resources, and the current access control decision concerns the i-th network slice request. The details of the currently processed network slice request include the type of the request, the slice duration, and the waiting time of the user who issued the request. According to its occupation property, the current network resource usage is divided into the upper limit of the resources to be reserved for the existing active slices and the resources required by the newly admitted slices: the former is the amount of resources that the active slices in the system will occupy within a future period of time, as predicted by the resource demand predictor, while the latter is the total upper limit of node resources and link resources required by the requests that have already been admitted in the decisions made before the current one.
In one embodiment of the invention, the access control agent D3QN learns, through continual exploration, an action policy that maximizes the long-term return. Since the optimization objective of the invention is to maximize the SP's profit, the reward function is defined so that a corresponding profit is generated when a network slice request is admitted or rejected, as in formula (1): the reward is composed of the fee the user pays the service provider for the network slice, minus the cost of the network resources used by the network slice, minus the penalty to be paid when the slice's SLA is violated.
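A plausible rendering of reward function (1), consistent with the three terms just described (user fee, resource cost, SLA-violation penalty); the notation below is an illustrative reconstruction, not the patent's own symbols:

```latex
r_i =
\begin{cases}
\rho_i - \kappa_i - \phi_i\,\mathbb{1}\{\text{SLA of slice } i \text{ violated}\}, & \text{request } i \text{ admitted},\\[2pt]
0, & \text{request } i \text{ rejected},
\end{cases}
```

where $\rho_i$ stands for the fee paid by the user, $\kappa_i$ for the cost of the network resources used, and $\phi_i$ for the default penalty. Whether a rejected request yields exactly zero or some other baseline value is a design choice not fixed by the text here.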
In one embodiment of the invention, the access control agent D3QN is designed as follows. The access control agent D3QN adopts a NoisyNet-Dueling neural network structure and is updated in the DDQN manner; it is therefore referred to herein as Double Dueling NoisyNet DQN (D3QN), and its neural network model is shown in fig. 4. The D3QN neural network model comprises an input layer, two fully connected (hidden) layers for extracting deep features, a single-neuron fully connected layer for separately estimating the state value function, a fully connected layer for separately estimating the action advantage function, and an output layer that combines the state value function and the action advantage function to compute the Q value.
In one embodiment of the invention, before each training of the access control agent D3QN and before computing the optimal action, a batch of noise parameters is sampled according to the noise network parameter distribution, and the neural network computations then proceed with these noise parameters. The noisy network introduces policy randomness by adding noise to the parameters of the reinforcement learning neural network model, thereby exploring the environment effectively and improving performance; it can be used to optimize the exploration process of the reinforcement learning method and provides an external source of randomness for environment exploration.
In one embodiment of the invention, NoisyNet adds Gaussian noise to the weights and biases of the deep neural network: each noisy weight and bias consists of a set of parameters learnable by training combined with a zero-mean Gaussian white noise vector; to simplify the expression, these are referred to uniformly as the neural network parameters. At each training optimization, a batch of noise parameters is drawn from the noise distribution of the noisy network, and the gradient of the noisy network is then obtained by Monte Carlo approximation, as shown in formula (2). There is no need to explore the environment model with an epsilon-greedy strategy; instead, a greedy strategy based on the noisy network is adopted. In addition, the last fully connected layer, i.e., the value function network, is parameterized as a noisy network, and the parameters of the noisy network take their actual values from the noise distribution before each action and each training optimization.
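The noisy parameterization and the Monte Carlo gradient of formula (2) can be written in the standard NoisyNet form; the symbols $\mu$, $\sigma$, $\varepsilon$ below are the conventional ones and are assumed rather than copied from the patent:

```latex
\theta = \mu + \sigma \odot \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I),
\qquad
\nabla_{\mu,\sigma}\,\mathbb{E}_{\varepsilon}\!\left[L(\mu + \sigma \odot \varepsilon)\right]
\;\approx\; \frac{1}{K}\sum_{k=1}^{K}\nabla_{\mu,\sigma}\,L(\mu + \sigma \odot \varepsilon_k).
```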
In one embodiment of the invention, as shown in fig. 4, the number of neurons of the input layer of the access control agent D3QN equals the size of the state vector; each fully connected layer in the hidden part has 128 neurons; the fully connected layer estimating the state value function has 1 neuron; the fully connected layer estimating the action advantage function and the output layer computing the Q value each have 16 neurons, the same as the size of the action space. D3QN optimizes the neural network model parameters with the Adam optimizer.
In one embodiment of the invention, the state is used as the input data of the access control agent D3QN. After features are extracted through the two fully connected layers, the state value function and the action advantage function of every action in the action space are computed: the fully connected layer responsible for the advantage function outputs a vector whose dimension equals the action space, and the fully connected layer responsible for the state value function outputs a scalar, with the shallow network, the advantage-function layer, and the value-function layer each having their own parameters. The Q value of each state-action pair is then computed and output according to formula (3), and the action with the maximum Q value is the optimal action in the current state.
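A minimal PyTorch sketch of the network just described: an input layer sized to the state vector, two 128-unit hidden layers, a 1-neuron value head, a 16-action advantage head, and the dueling combination of formula (3). The simplified NoisyLinear layer and the choice to make both heads noisy are illustrative assumptions, not the patent's exact implementation.

```python
import torch
import torch.nn as nn


class NoisyLinear(nn.Module):
    """Simplified noisy linear layer: effective weights w = mu + sigma * eps."""

    def __init__(self, in_f: int, out_f: int, sigma0: float = 0.5):
        super().__init__()
        bound = in_f ** -0.5
        self.mu_w = nn.Parameter(torch.empty(out_f, in_f).uniform_(-bound, bound))
        self.sigma_w = nn.Parameter(torch.full((out_f, in_f), sigma0 * bound))
        self.mu_b = nn.Parameter(torch.zeros(out_f))
        self.sigma_b = nn.Parameter(torch.full((out_f,), sigma0 * bound))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        eps_w = torch.randn_like(self.mu_w)   # fresh Gaussian noise per call
        eps_b = torch.randn_like(self.mu_b)
        return x @ (self.mu_w + self.sigma_w * eps_w).t() + self.mu_b + self.sigma_b * eps_b


class D3QNNet(nn.Module):
    """Dueling NoisyNet head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""

    def __init__(self, state_dim: int, n_actions: int = 16, hidden: int = 128):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden), nn.ReLU())
        self.value = NoisyLinear(hidden, 1)              # state value V(s)
        self.advantage = NoisyLinear(hidden, n_actions)  # action advantages A(s,a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.features(state)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)      # dueling combination, formula (3)
```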
In one embodiment of the invention, the access control agent D3QN is trained with a prioritized experience replay mechanism that helps the training converge. During training, the access control agent D3QN draws small batches of experience entries from the experience pool and learns the environment model and an advantageous action policy from them. Learning from experience entries generated by past behavior is called experience replay; experience replay improves data efficiency by reusing experience samples in multiple updates. In the prioritized experience replay mechanism, important experience entries receive higher replay weights and therefore higher replay frequency, which improves the learning efficiency of the reinforcement learning agent.
In one embodiment of the invention, the training process of the access control agent D3QN proceeds as follows. First, prioritized experience replay is performed with the sum-tree method, and a small batch of experience entries is drawn from the experience pool. Then, noise parameters are sampled according to the noise network parameter distribution for action selection in the behavior network, in the target network, and when computing the labels, and the neural network model parameters are optimized one by one with the extracted experience entries. For each experience entry, if its next state is a termination state, the training label equals the reward; otherwise, the training label is computed according to formula (4) from the reward, the discount factor, the D3QN behavior network parameters, and the D3QN target network parameters. Finally, the loss function is computed according to formula (5) and gradient descent is performed. The parameters of the behavior network are periodically copied to the target network.
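A hedged PyTorch-style sketch of one training step in the spirit of formulas (4) and (5): prioritized sampling, a Double-DQN target built from the behavior network's argmax and the target network's evaluation, and a periodic hard copy of the behavior network into the target network. The replay-buffer interface, the default hyperparameter values, and the importance-weighted squared TD loss are assumptions for illustration.

```python
import torch


def d3qn_train_step(behavior_net, target_net, optimizer, replay,
                    batch_size=64, gamma=0.95, step=0, copy_every=200):
    # Prioritized replay (e.g. sum-tree backed) returns experiences and importance weights.
    states, actions, rewards, next_states, dones, weights, idx = replay.sample(batch_size)

    with torch.no_grad():
        # Formula (4): a* chosen by the behavior network, evaluated by the target network.
        best_next = behavior_net(next_states).argmax(dim=1, keepdim=True)
        q_next = target_net(next_states).gather(1, best_next).squeeze(1)
        y = rewards + gamma * q_next * (1.0 - dones)   # y = r when the next state is terminal

    q = behavior_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Formula (5): importance-weighted TD loss, gradient descent on the behavior network only.
    td_error = y - q
    loss = (weights * td_error.pow(2)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    replay.update_priorities(idx, td_error.abs().detach())  # refresh sum-tree priorities
    if step % copy_every == 0:                               # periodic copy to the target network
        target_net.load_state_dict(behavior_net.state_dict())
    return loss.item()
```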
In one embodiment of the invention, the access control decision process of the access control agent D3QN is shown in fig. 5, a schematic diagram of a network slice access control decision made with the access control agent D3QN. The network slice request information of the current decision is obtained from the network slice request pool; the resource demand upper limit required by the new network slice requests is computed from the historical decision results within the time window; and the amount of resources to be reserved for the existing active network slices is obtained from the system resource demand prediction. Together, these three pieces of information form the environment state of the D3QN agent's decision. The demand upper limits and the reserved amounts are all one-dimensional vectors whose length equals the access control time window, and the node resource demand upper limit and the link resource demand upper limit at each moment are computed as shown in formula (6). The maximum node resources and the maximum link resources obtainable by the network slice corresponding to a request differ across network slice types, and the instantiation moment and the slice duration of the corresponding network slice determine over which moments of the time window these maxima apply; the resulting upper-limit vectors are likewise one-dimensional vectors whose length equals the access control time window.
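The three state components described above can be assembled as in the following sketch; the vector names, the attribute names, and the indicator-style accumulation used for the formula-(6) upper limits are assumptions inferred from the symbol descriptions, not the patent's exact formulation.

```python
import numpy as np


def build_state(request, accepted_so_far, reserved_node, reserved_link, window_len):
    """Concatenate request info, new-request demand upper limits and reserved resources.

    reserved_node / reserved_link: length-window_len vectors from the LSTM-P predictor.
    accepted_so_far: requests already admitted earlier in this access control round, each
                     carrying (instantiation slot t_i, duration l_i, per-slot resource caps).
    """
    d_node = np.zeros(window_len)
    d_link = np.zeros(window_len)
    for acc in accepted_so_far:                        # formula (6)-style accumulation
        start, end = acc.t_i, min(acc.t_i + acc.l_i, window_len)
        d_node[start:end] += acc.node_cap              # max node resources the slice may obtain
        d_link[start:end] += acc.link_cap              # max link resources the slice may obtain

    req_info = np.array([request.req_type_id, request.duration, request.waiting_time],
                        dtype=float)
    return np.concatenate([req_info, d_node, d_link, reserved_node, reserved_link])
```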
In one embodiment of the present invention, as shown in fig. 5, the system obtains the network slice request information of the current decision from the request pool, obtains the environment information through the system resource demand prediction module, and inputs the environment information into the access control agent D3QN to obtain the access control action decision, and instantiates the network slice at the corresponding time point according to the action.
In one embodiment of the invention, the state collected from the system environment is used as the input data of the behavior network of the access control agent D3QN, and the neural network computes the optimal action in this state, i.e., completes the access control decision for the current network slice request. A state transition then occurs in the system environment: the resource demand of the newly admitted slices and the network slice request details change. If the request is admitted, the resource demand of the newly admitted slices is computed as shown in formula (7); otherwise the access control decision process simply continues. In the new state, the network slice request details become the information of the next network slice request. A corresponding experience entry is thus obtained and stored in the experience pool, and the experience entries in the experience pool are used to train the D3QN model online according to the training process of the access control agent D3QN, so as to continuously optimize its performance.
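Consistent with the symbol descriptions, formula (7) plausibly updates the aggregate demand of newly admitted slices by adding the admitted slice's per-slot resource cap over its lifetime; the following is a reconstruction under that assumption, not the patent's exact notation:

```latex
d^{\text{new}}_{\text{node}}(t) \;\leftarrow\; d^{\text{new}}_{\text{node}}(t)
  + C^{\text{node}}_{\max,i}\,\mathbb{1}\{t_i \le t < t_i + l_i\},
\qquad
d^{\text{new}}_{\text{link}}(t) \;\leftarrow\; d^{\text{new}}_{\text{link}}(t)
  + C^{\text{link}}_{\max,i}\,\mathbb{1}\{t_i \le t < t_i + l_i\},
\qquad t \in \{1,\dots,T_w\}.
```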
In one embodiment of the invention, the system resource demand is predicted. Network slice requests from users are all stored in the request pool, and access control is based on the total resources that the active slices in the system will occupy over a long future period; this predictive information is used in user request access control decisions, improving the processing speed of user requests.
In one embodiment of the invention, an Encoder-Decoder long short-term memory neural network (Encoder-Decoder LSTM) is built to improve the performance of multi-step time series prediction. For a network slicing service, a resource demand prediction higher than the actual demand causes high slice service cost and low resource utilization, while a prediction that is too low causes more SLA violations, which incur heavy default penalties. In view of these characteristics, the invention designs a targeted loss function for the Encoder-Decoder LSTM network, as shown in formula (8), and builds a network slice resource demand predictor LSTM-P (Encoder-Decoder LSTM with Preference) with a prediction preference, where the loss function is defined over the prediction window and weights the influence of the prediction error on revenue by the penalty strength for the service provider violating an SLA, using the true value and the predicted value of the network slice resource demand at each moment within the prediction window.
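A minimal PyTorch sketch of a preference-weighted loss in the spirit of formula (8): under-prediction, which risks SLA violations, is penalized more heavily than over-prediction through a penalty strength beta. The exact functional form and the default beta value are assumptions; only the asymmetry they encode is taken from the description above.

```python
import torch


def lstm_p_loss(pred: torch.Tensor, truth: torch.Tensor, beta: float = 3.0) -> torch.Tensor:
    """Asymmetric squared error over the prediction window.

    pred, truth: tensors of shape (batch, window) holding predicted / true resource demand.
    beta: penalty strength for under-prediction (SLA-violation risk); beta > 1 assumed.
    """
    err = pred - truth
    weight = torch.where(err < 0, torch.full_like(err, beta), torch.ones_like(err))
    return (weight * err.pow(2)).sum(dim=1).mean()
```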
In one embodiment of the invention, LSTM-P is first used to predict the node resource occupancy and the link resource occupancy of all active network slices in the system within the future time window; then the total node resources and the total link resources occupied by all active slices in the system within that window are calculated; subsequently, it is decided in turn whether each request in the request queue is admitted within the future window, and the instantiation moment of the corresponding network slice is determined; finally, the system node resource occupancy and link resource occupancy are updated each time a request is admitted.
In one embodiment of the invention, at the access control moment t_c and for the system prediction time window T_w, the total node resource amount and the total link resource amount occupied by the active slices in the system are calculated as
N_{t'} = Σ_{i=1}^{K} n_{i,t'},  L_{t'} = Σ_{i=1}^{K} l_{i,t'},  t' ∈ [t_c, t_c + T_w]
where K is the number of active network slices, and n_{i,t'} and l_{i,t'} are the node and link resource amounts that the i-th active slice occupies at moment t'.
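Given per-slice forecasts over the prediction window, these totals reduce to a sum across active slices at each moment. A minimal sketch follows, assuming the forecasts are stacked into arrays of shape (number of active slices, window length); the function name is illustrative.

```python
import numpy as np


def aggregate_occupation(node_pred: np.ndarray, link_pred: np.ndarray):
    """Sum per-slice predictions into system-wide totals over the window.

    node_pred, link_pred: arrays of shape (num_active_slices, window_length)
    holding each active slice's predicted node/link occupation per time slot.
    Returns two length-`window_length` vectors: total node and link occupation.
    """
    total_nodes = node_pred.sum(axis=0)  # N_{t'} = sum_i n_{i,t'}
    total_links = link_pred.sum(axis=0)  # L_{t'} = sum_i l_{i,t'}
    return total_nodes, total_links
```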
In one embodiment of the invention, the service provider uses historical network slice resource usage data as the training set and the loss function in formula (8) as the training objective; the predictor LSTM-P is trained with gradient descent, and the neural network parameters are adjusted iteratively to reduce the loss, ensuring high prediction accuracy and a low SLA violation rate for the predictor.
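A corresponding training loop could look as follows. It reuses the lstm_p_loss function sketched above and assumes that `model` is any encoder-decoder sequence model mapping a history window to a multi-step forecast and that `dataset` yields (history, target) tensor pairs; the hyperparameter values are illustrative.

```python
import torch
from torch import nn, optim


def train_lstm_p(model: nn.Module, dataset, beta: float = 3.0,
                 epochs: int = 50, lr: float = 1e-3) -> nn.Module:
    """Gradient-descent training sketch for the LSTM-P predictor."""
    optimizer = optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for history, target in dataset:
            forecast = model(history)
            loss = lstm_p_loss(target, forecast, beta=beta)  # asymmetric loss sketch above
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```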
In one embodiment of the invention, the future usage of network resources in the system is predicted in real time by the built LSTM-P predictor from the data collected by the performance monitor, and the predicted information provides environment information to the network slice access control module to assist the agent in making reasonable decisions.
In one embodiment of the present invention, the running flow of the network slice intelligent access control system is shown in fig. 6 and mainly comprises two stages: a preparation stage and an operation stage. The preparation stage comprises initializing hyperparameters, training the resource demand predictor LSTM-P, and pre-training the access control agent D3QN; the operation stage comprises system resource demand prediction, network slice access control, and online learning of the agent D3QN. After the service provider completes system initialization in the preparation stage, users send network slice requests to the service provider, and requests from users are cached in the request pool to wait for admission. During operation, the service provider monitors the performance and resource overhead of all active network slices in the system for the agent's access control decisions and online training. The flow is specifically as follows (a condensed code sketch of the operation-stage loop follows the step list below):
Stage one: preparation stage
Step one: initialize the hyperparameters. Calculate or set the relevant parameters for system operation from historical information, including: the access control period, the penalty strength for the service provider violating an SLA, the discount factor, the node resource upper limit allocated to a network slice, and the link resource upper limit allocated to a network slice (a minimal configuration sketch follows step three below).
Step two: train the resource demand predictors. Access control requires a node resource demand predictor and a link resource demand predictor. Historical network slice node resource demand data (or link resource demand data) is used as the training set of the node (or link) resource demand predictor, the loss function shown in formula (8) is used as the loss function of the training process, and each resource demand predictor is trained using a gradient descent method or a variant thereof.
Step three: pre-train the access control agent. According to the access control period, set the state space and the action space, thereby determining the numbers of neurons in the input layer and the output layer, and construct the access control agent D3QN. Extract experience tuples from the historical-data experience pool and pre-train the agent according to the D3QN training procedure.
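The hyperparameters initialized in step one can be gathered into a single configuration object. The following is a minimal sketch; the field names and default values are illustrative assumptions, not values from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class PreparationConfig:
    """Hyperparameters initialized in step one; all values are illustrative."""
    access_control_period: int = 10      # decision interval, in time slots
    sla_penalty_strength: float = 3.0    # penalty weight for violating an SLA
    discount_factor: float = 0.95        # discount factor for the D3QN return
    node_resource_cap: float = 100.0     # per-slice node resource upper limit
    link_resource_cap: float = 100.0     # per-slice link resource upper limit
```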
Stage two: operation stage
Step four: after the service provider starts to operate, an access control decision is made at each access control moment, i.e., once per access control period (comprising steps five to eight); at moments that are not access control moments, the process goes to step nine.
Step five: predict the long-term resource requirements of the system. At the access control moment, use the resource demand predictors to predict, for each active network slice in the system, its node resource requirements and link resource requirements within the system prediction time window.
Step six: predict the resource occupation. According to formula (10), calculate the total node resource amount and the total link resource amount to be occupied by all active slices in the system at each moment within the future time window. Based on this, update the environment state of the D3QN.
Step seven: request admission decisions. Starting from the head of the request pool, judge in turn, for each network slice request in the queue, whether it is admitted according to the D3QN access control decision procedure; if admitted, determine the instantiation moment of the corresponding network slice, sign the corresponding SLA, and then go to step eight; otherwise, continue judging whether the next request is admitted until all requests in the request pool have been traversed.
Step eight: update the resource occupation. After a network slice request is admitted, calculate the node resource amount and the link resource amount occupied by the newly admitted slice according to formula (7) and update the environment state; if all requests in the request pool have been traversed, continue with the following steps; otherwise, return to step four.
Step nine: network slice instantiation and configuration. Judge whether the system has reached an instantiation moment decided by the access control module; if so, instantiate the corresponding network slice according to the corresponding SLA requirements and configure the corresponding node resources and link resources.
Step ten: train the access control agent online. Collect the information of the slices whose service the system has completed, calculate the reward of each completed slice according to formula (1), and store this information in the experience pool; then extract a mini-batch of experience tuples from the experience pool and train the agent online according to the D3QN training procedure.
Step eleven: judge whether the service provider is still within the operation period; if so, return to step four; otherwise, close the network slicing system after all signed SLAs have been fulfilled.
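The operation stage (steps four to eleven) can be summarized as a periodic loop. The sketch below condenses that loop in Python; every name on `env`, `agent`, and `predictor` (forecast_active_slices, apply_new_slice_occupation, and so on) is an assumed interface used only to illustrate the control flow, not the patent's actual API, and formulas (1), (7) and (10) are referenced only through comments.

```python
ADMIT = 1  # assumed action index meaning "admit the request"


def run_stage(env, agent, predictor, access_control_period, horizon):
    """Condensed sketch of steps four to eleven (all helpers are assumed)."""
    for now in range(horizon):
        if now % access_control_period == 0:                       # step four
            # step five: per-slice node/link forecasts over the window,
            # assumed to come back as two arrays (slices x window length)
            node_pred, link_pred = predictor.forecast_active_slices(env)
            # step six: system-wide totals per formula (10)
            env.update_totals(node_pred.sum(axis=0), link_pred.sum(axis=0))
            for request in list(env.request_pool):                  # step seven
                if agent.decide(env.state_for(request)) == ADMIT:
                    env.schedule_instantiation(request)              # sign the SLA
                    env.apply_new_slice_occupation(request)          # step eight, formula (7)
        env.instantiate_due_slices(now)                              # step nine
        finished = env.collect_finished_slices(now)                  # step ten, rewards via formula (1)
        agent.store_and_train_online(finished)
        if not env.in_operation_period(now):                         # step eleven
            env.fulfil_remaining_slas()
            break
```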
In one embodiment of the invention, a deployment scheme of the network slice intelligent access control system is implemented. The technique of the invention may be deployed under a 5G network architecture supported by virtual network function (VNF) and software-defined networking (SDN) technologies, and may be used by network application providers, service agents, and virtual network operators acting as the service provider described above. A service provider using the network slice intelligent access control system should first lease a certain amount of node resources and link resources from an infrastructure provider for instantiating network slices that meet user requirements. The service provider carries out the various logical operations (including VNF orchestration, resource configuration, and stopping the operation of network slices) in combination with an operation support system, an SDN controller, a network service orchestrator, and a resource orchestrator to support the network services required by users; the resource orchestrator can obtain the resource usage information of each network slice and provide each slice with the required resources through its interface with the network service orchestrator of each slice. Based on this information, the service provider predicts future resource demand and thereby realizes efficient access control decisions. The instantiation and operation of a network slice after an access control decision are carried out cooperatively by the operation support system, the SDN controller, and the network service orchestrator. For an admitted network slice request, the service provider and the user who initiated the request sign a service level agreement (SLA) that specifies: the functional type of the network slice, the performance level of the network slice (i.e., the network slice quality), the survival duration of the network slice, the penalty amount due to SLA violations, and so on.
In one embodiment of the invention, the invention helps the service provider make full use of node resources and link resources to provide services to its users in the form of network slices, optimizing the service provider's benefit. The service provider can obtain the available resource information within the access control time window using the resource demand predictor LSTM-P, and then uses deep reinforcement learning to make autonomous network slice access control decisions and to schedule the instantiation moments of admitted network slices. This improves the request acceptance rate, makes full use of the node and link resources in the system, provides network slice services to more users, and increases the service provider's overall benefit.

Claims (2)

1. An intelligent access control method for network slices, characterized by comprising the following steps:
s1, a preparation stage: initializing parameters, and respectively training a resource demand predictor and an access control intelligent agent;
s2, operation stage: according to the trained resource demand predictor and the access control intelligent agent, predicting the resource occupation amount and requesting access control decision, updating the resource occupation amount and carrying out instantiation and configuration on the network slice according to the request access control decision, and carrying out online training on the access control intelligent agent to complete the control of intelligent access of the network slice;
The step S1 includes the steps of:
s101, initializingParameters, wherein the parameters include: access control periodPenalty force for service provider to violate SLA>Discount factor->Node resource upper limit allocated to network slice +.>And link resource upper limit assigned to network slice +.>
S102, taking the resource demand data as a training set of a resource demand predictor LSTM-P, and taking the following formula as a loss function of a training process to train the resource demand predictor LSTM-P:
L_P = Σ_{t ∈ T_p} f(y_t, ŷ_t; β),
wherein L_P represents the loss function of the resource demand predictor LSTM-P, T_p represents the prediction window, t represents the respective moments within the prediction window, f represents the influence of the prediction error on the benefit, y_t represents the true value of the network slice resource demand, and ŷ_t represents the predicted value of the network slice resource demand;
s103, according to the access control periodA state space and an action space are set>Determining the number of neurons of an input layer and an output layer, constructing an access control intelligent agent D3QN, and pre-training the access control intelligent agent D3 QN;
the step S102 includes the steps of:
s1021, taking historical network slice node resource demand data as a training set of a node resource demand predictor LSTM-P; or (b)
Taking the link resource demand data as a training set of a link resource demand predictor LSTM-P;
s1022, training a resource demand predictor LSTM-P by utilizing the training set;
the pre-training of the access control agent D3QN in step S103 includes the following steps:
a1, performing preferential experience playback by using a reinforcement learning method sum-tree, and obtaining the experience from an experience poolExtracting experience bars;
a2, sampling noise parameters according to noise network parameter distribution requirements for action selection when accessing to a behavior network, a target network and calculating training labels of the control intelligent agent D3QN respectively、/>And->
A3, optimizing the neural network model parameters one by one using the extracted experience tuples: for the j-th experience tuple (s_t, a_t, r_t, s_{t+1}), judging whether the next state s_{t+1} is a termination state; if so, the training label of the access control agent D3QN is y_j = r_t; otherwise, calculating the training label y_j of the access control agent D3QN according to the following formula:
y_j = r_t + γ · Q(s_{t+1}, a*; θ′), with a* = argmax_a Q(s_{t+1}, a; θ)
wherein s_t represents the current state, a_t represents the action taken, r_t represents the reward of this action, s_{t+1} represents the next state after taking the action, γ represents the discount factor, Q represents the Q value output by the neural network, a* represents the optimal action of the next state, i.e., the action corresponding to the maximum Q value, θ represents the behavior network parameters of the access control agent D3QN, and θ′ represents the target network parameters of the access control agent;
a4, calculating a loss function of the intelligent agent D3QN according to the training label, performing gradient descent processing on the loss function, copying parameters of the behavior network to the target network in a preset period, and completing pre-training of the access control intelligent agent D3 QN:
L_D(θ) = E_j[(y_j − Q(s_t, a_t; θ))²]
wherein L_D represents the loss function of the access control agent D3QN;
the step S2 includes the steps of:
s201, setting each time when operating for service providerThe access control period is as followsAnd judges whether or not +/at each access control moment>Executing an access control decision, if yes, entering a step S202, otherwise, accessing the control decision for the non-access control moment, and entering a step S206;
s202, at the access control momentPredicting future time window using resource demand predictor LSTM-P for each active network slice in the system>Node resource occupation and link resource occupation in the network;
s203, calculating a future time window according to the node resource occupation amount and the link resource occupation amountThe total node resource quantity and the total link resource quantity to be occupied by all active network slices in the system at each moment in time, and the environment state of the access control intelligent agent D3QN is updated according to the total node resource quantity and the total link resource quantity >Wherein the environmental status->Including network slice request information->Resource requirement upper limit required for new network slice request +.>AndResource amount reserved for the existing active network slice +.>Wherein->、/>、/>And->All represent the length of access control time window +.>One-dimensional vector of size, +.>Represent the firstiNetwork slice request->Representing a waiting time of the network slice request;
s204, according to the updated environment state of the access control agent D3QN, starting from the head of the network slice request pool, sequentially judging whether the request decides an access control decision for the network slice requests in the queue, if so, determining the network slice instantiation time corresponding to the request, signing a corresponding SLA, and entering a step S205, otherwise, continuously judging whether the next request decides the access control decision until all the requests in the network slice request pool are traversed, and entering the step S205;
s205, after the access control decision aiming at the access network slice, calculating the node resource quantity and the link resource quantity occupied by the new access network slice, and updating the environment state of the access control intelligent agent D3QNJudging whether all access control decision requests in the network slice request pool are traversed, if yes, entering step S206, otherwise, returning to step S201;
S206, judging whether the system has reached an instantiation moment decided by the access control; if so, instantiating the corresponding network slice according to the corresponding SLA requirements, configuring the corresponding node resources and link resources, and entering step S207; otherwise, entering step S208;
s207, according to the instantiation and resource allocation of the network slice, counting the network slice information of the service completed by the system, calculating rewards of the network slice of the service completed, and storing the network slice information and rewards of the service completed into an experience poolWherein the network slice information for completing the service includes: the state of the intelligent agent D3QN is accessed before and after the network slice is accessed, and the action of selecting the intelligent agent D3QN is accessed when the network slice is accessed;
s208, from experience poolThe experience bar is extracted, and the access control intelligent agent D3QN is trained;
s209, judging whether the operation period is in the operation period, if so, returning to the step S201, otherwise, closing a network slicing system to complete the control of intelligent network slicing access after all SLAs have been signed according to the trained access control intelligent agent D3 QN;
the expressions of the total node resource amount and the total link resource amount in the step S203 are as follows:
N_{t'} = Σ_{i=1}^{K} n_{i,t'},  L_{t'} = Σ_{i=1}^{K} l_{i,t'}
wherein N_{t'} represents the total node resource amount occupied by the active network slices, K represents the number of active network slices in the system, i indexes the i-th active network slice, L_{t'} represents the total link resource amount occupied by the active network slices, t' represents the respective moments in the future time window, n_{i,t'} represents the node resource amount that the i-th active network slice will occupy at moment t', and l_{i,t'} represents the link resource amount that the i-th active slice will occupy at moment t';
the expressions of the node resource amount and the link resource amount in the step S205 are as follows:
N_t^acc = N_t^pre + n_{i,t}^max,  L_t^acc = L_t^pre + l_{i,t}^max
wherein t ranges over the moments affected by admitting the i-th network slice, N_t^acc represents the node resource demand upper limit at moment t after admitting the i-th network slice, N_t^pre represents the node resource demand upper limit at moment t before admitting the i-th network slice, n_{i,t}^max represents the maximum node resource amount that the i-th network slice can obtain at moment t, L_t^acc represents the link resource demand upper limit at moment t after admitting the i-th network slice, L_t^pre represents the link resource demand upper limit at moment t before admitting the i-th network slice, and l_{i,t}^max represents the maximum link resource amount that the i-th network slice can obtain at moment t.
2. A control system for the network slice intelligent access control method according to claim 1, comprising:
The first processing module is used for initializing parameters and respectively training a resource demand predictor and an access control intelligent agent;
the second processing module predicts the resource occupation amount and requests access control decision according to the trained resource demand predictor and the access control agent, updates the resource occupation amount and instantiates and configures the network slice according to the requests access control decision, trains the access control agent on line and completes the control of the intelligent access of the network slice.
CN202210540268.9A 2022-05-17 2022-05-17 Intelligent access control method and system for network slice Active CN115361726B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210540268.9A CN115361726B (en) 2022-05-17 2022-05-17 Intelligent access control method and system for network slice

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210540268.9A CN115361726B (en) 2022-05-17 2022-05-17 Intelligent access control method and system for network slice

Publications (2)

Publication Number Publication Date
CN115361726A CN115361726A (en) 2022-11-18
CN115361726B (en) 2024-03-01

Family

ID=84030299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210540268.9A Active CN115361726B (en) 2022-05-17 2022-05-17 Intelligent access control method and system for network slice

Country Status (1)

Country Link
CN (1) CN115361726B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7340169B2 (en) * 2003-11-13 2008-03-04 Intel Corporation Dynamic route discovery for optical switched networks using peer routing

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108900358A (en) * 2018-08-01 2018-11-27 重庆邮电大学 Virtual network function dynamic migration method based on deepness belief network resource requirement prediction
CN113726571A (en) * 2021-08-30 2021-11-30 电子科技大学 Network resource dynamic allocation method based on network slice

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of the System Information Observation Subsystem in a CDMA Radio Access System; Sun Jian; CNKI Excellent Master's Theses Full-text Database; 2009-04-15; full text *
Radio interface implications of network slicing;Ericsson;3GPP TSG-RAN WG2 #95bis Tdoc R2-166931;20161001;全文 *
Research on Digital Modulation Recognition Technology Based on Machine Learning; Wang Yuhui; CNKI Excellent Master's Theses Full-text Database; 2020-07-15; full text *
Research on Multilayer Network Survivability: Survivability of IP/MPLS over WDM Optical Networks; Sun Gang; CNKI Excellent Master's Theses Full-text Database; 2010-02-15; full text *

Also Published As

Publication number Publication date
CN115361726A (en) 2022-11-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant