CN116828607A - Radio access network slice resource allocation method adapting to different channel characteristics - Google Patents


Info

Publication number
CN116828607A
Authority
CN
China
Prior art keywords
base station
user
slice
state
radio access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310741049.1A
Other languages
Chinese (zh)
Inventor
孙君
王科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202310741049.1A priority Critical patent/CN116828607A/en
Publication of CN116828607A publication Critical patent/CN116828607A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H04W72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W72/0453 Resources in frequency domain, e.g. a carrier in FDMA
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control
    • H04W28/08 Load balancing or load distribution
    • H04W28/09 Management thereof
    • H04W28/0958 Management thereof based on metrics or performance parameters
    • H04W28/0967 Quality of Service [QoS] parameters
    • H04W28/0975 Quality of Service [QoS] parameters for reducing delays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control
    • H04W28/08 Load balancing or load distribution
    • H04W28/09 Management thereof
    • H04W28/0958 Management thereof based on metrics or performance parameters
    • H04W28/0967 Quality of Service [QoS] parameters
    • H04W28/0983 Quality of Service [QoS] parameters for optimizing bandwidth or throughput
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/50 Allocation or scheduling criteria for wireless resources
    • H04W72/54 Allocation or scheduling criteria for wireless resources based on quality criteria
    • H04W72/543 Allocation or scheduling criteria for wireless resources based on quality criteria based on requested quality, e.g. QoS
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides a radio access network slice resource allocation method adapted to different channel characteristics, comprising the following steps: establishing a multi-base-station cellular network downlink scenario; the base station collects the slice minimum rate requirement, the user minimum rate requirement, the maximum delay threshold tolerated by the user, and incomplete CSI condition information; initializing the weights of the deep Q-learning (DQN) network; initializing the action, i.e. allocating resources and calculating the user rate, the interference from base stations to users, and the total throughput of the base station's slices; the base station calculates a reward according to the current state and makes a decision using a greedy strategy; the state of the environment and the reward are updated; the loss function is calculated from the experience replay pool and the weights are updated until the loss function reaches a set convergence condition or the maximum number of iterations is reached. The invention ensures isolation between slices and users while meeting the QoS requirements of each slice and user.

Description

Radio access network slice resource allocation method adapting to different channel characteristics
Technical Field
The invention relates to a radio access network slice resource allocation method adapting to different channel characteristics, belonging to the technical field of wireless communication.
Background
The essence of network slicing is that mobile virtual network operators (MVNOs) abstract physical resources into virtual resources, which are then assigned to service providers (SPs). Requirements are settled between SPs and tenants in the form of service level agreements (SLAs), which specify key performance metrics such as throughput, latency, and reliability. To achieve these SLAs, network slicing is being extended from the core network to the radio access network (RAN) domain. Network slicing in the RAN domain remains a challenging problem due to resource coupling and the randomness of radio channels. Most existing work focuses on RAN architecture, while research on RAN slice resource allocation and optimization is still ongoing. Network slices must provide isolation between slices so that congestion in one slice does not degrade the performance of other slices.
In most existing research, RAN slice isolation considers only slice-level requirements; although the aggregate performance of each slice is guaranteed, the characteristics of individual users are ignored, so the QoS of users within a slice cannot be guaranteed.
In view of the foregoing, it is desirable to provide a method for allocating radio access network slice resources to accommodate different channel characteristics.
Disclosure of Invention
The invention aims to provide a radio access network slice resource allocation method adapted to different channel characteristics, which ensures isolation among slices while meeting the QoS requirements of all users within each slice.
In order to achieve the above object, the present invention provides a method for allocating radio access network slice resources adapted to different channel characteristics, which mainly includes the following steps:
step 1, establishing a multi-base station cellular network downlink scene;
step 2, the base station collects the slice minimum rate requirement R_s^min, the user minimum rate requirement R_{m_s}^min, the maximum delay threshold D_max tolerated by the user, and incomplete CSI condition information;
step 3, initializing the weights θ and Q(s, a; θ) of the deep Q-learning network (DQN);
step 4, initializing the action a_t, i.e. allocating resources, and calculating the user rate, the interference from base stations to users, and the total throughput of the base station's slices at this time;
step 5, the base station calculates the reward r_t according to the current state s_t and makes a decision using an ε-greedy strategy;
step 6, updating the state of the environment to s_{t+1} and the reward to r_{t+1};
step 7, calculating the loss function L(θ) from the experience replay pool, updating the weights θ, and repeating step 5 until the loss function L(θ) reaches a set convergence condition or the maximum number of iterations T is reached.
As a further improvement of the present invention, in step 1, the multi-base-station cellular network downlink scenario includes a set B = {1, …, b, …, B} of base stations (BSs), where adjacent BSs interfere with each other. The user set is denoted U = {1, …, u, …, U}, with U the total number of users. The total bandwidth W is divided into a set of identical subchannels J = {1, …, j, …, J}, where J is the total number of subchannels and each subchannel has bandwidth W/J; R_j denotes the bandwidth of subchannel j. The total number of slices is S, the slices are denoted S = {1, …, s, …, S}, and each slice s has a user set M_s = {1, 2, …, m_s, …, M_s}, where m_s is the m-th user in slice s and M_s is the total number of users in slice s; then Σ_{s∈S} M_s = U.
As a further improvement of the present invention, step 2 further includes:
defining a binary variable x_{b,m_s}^s that equals 1 if user m_s requests slice s at base station b and 0 otherwise; to ensure that each user can request only one slice, constraint C1 is introduced:
C1: Σ_{s∈S} x_{b,m_s}^s ≤ 1, ∀ b, m_s;
defining a binary variable y_{b,m_s}^j that equals 1 if subchannel j is allocated to user m_s at base station b and 0 otherwise; to ensure that a subchannel can be allocated to only one user in a base station, constraint C2 is introduced:
C2: Σ_{s∈S} Σ_{m_s∈M_s} y_{b,m_s}^j ≤ 1, ∀ b, j;
to ensure that the transmit power of each base station does not exceed its maximum transmit power P_b^max, constraint C3 is introduced:
C3: Σ_{j∈J} P_j^b ≤ P_b^max, ∀ b.
as a further improvement of the invention, in step 2, incomplete CSI in the base station is taken into account and user m is calculated under this condition s The worst rate of incomplete CSI is expressed as follows:
wherein ,representing estimated channel gain,/>Representing the error in estimating the channel gain.
As a further improvement of the invention, in step 3, in deep Q-learning (DQN) the training data is expressed as an action value called the target value, and the loss function to be minimized is:
L(θ) = E[(y_t − Q(s_t, a_t; θ))²],
where y_t is the target value and θ represents the parameters of the neural network; the agent learns the action values by updating θ so that Q(s_t, a_t; θ) approaches y_t.
As a further improvement of the invention, in step 4, the achievable rate of user m_s served by base station b on subchannel j is calculated and expressed as:
R_{b,m_s}^j = (W/J) log2(1 + P_j^b g_{b,m_s}^j / (I_{b,m_s}^j + N_0 W/J)),
where P_j^b represents the transmit power of base station b on subchannel j, g_{b,m_s}^j represents the channel gain between base station b and user m_s on subchannel j, I_{b,m_s}^j is the interference suffered by user m_s on subchannel j, N_0 is the noise power spectral density, and the total bandwidth of the system is W.
As a further improvement of the present invention, step 4 further includes: the subchannel bandwidths are equal, so each subchannel has bandwidth W/J; the interference I_{b,m_s}^j suffered by user m_s on subchannel j from base stations other than its serving base station b is calculated and expressed as:
I_{b,m_s}^j = Σ_{b'∈B, b'≠b} P_j^{b'} g_{b',m_s}^j.
The total throughput of the base station's slices is modeled with the goal of maximizing total throughput subject to the constraints,
where constraint C1 indicates that each user can request only one slice, constraint C2 indicates that each subchannel can be allocated to only one user in a base station, constraint C3 indicates that the transmit power of each base station cannot exceed its maximum transmit power, constraint C4 guarantees the QoS requirement of each slice, constraint C5 guarantees the QoS requirement of each user, and constraint C6 indicates the minimum transmission rate required for a user to meet its delay constraint.
As a further improvement of the present invention, in step 5, deep reinforcement learning is used to find the optimal action; the network input is the action a_t and the state s_t, and the output is the Q value of the action, i.e. Q_k(s_t, a_t); the target neural network is used to calculate the Q value Q_k(s_{t+1}, a) of the next state s_{t+1}, updated by the following expression:
Q_{k+1}(s_t, a_t) = Q_k(s_t, a_t) + α_k [r_{t+1} + γ max_{a∈A} Q_k(s_{t+1}, a) − Q_k(s_t, a_t)],
where α_k and γ are the learning rate and discount factor respectively, s_{t+1} and r_{t+1} represent the next state and the reward obtained after taking the action in state s_t, a represents an executable action in state s_{t+1}, A is the set of executable actions, and max_{a∈A} Q_k(s_{t+1}, a) is the maximum Q value over the action set A in state s_{t+1}.
As a further improvement of the present invention, in step 6, the slice optimization problem can be described as a separate Markov decision process, formalized as a 4-tuple <S, A, π, R>, where S is the state space, A is the action space, π is the policy space, and R is the immediate reward. The state set S of the Markov decision process is defined by taking the association relation between base station b and user u together with the channel gain as the state input of the agent; the state space is defined as:
s_t = {a_{b,u}, g_{b,u} : u ∈ U},
where a_{b,u} equals 1 if base station b and user u are associated, and 0 otherwise.
As a further improvement of the present invention, in step 6, the final reward function is obtained through a trade-off factor δ as the weighted sum of the local utility Utility_b and the average utility of the other base station agents:
R_b = δ · Utility_b + (1 − δ) · (1/(B − 1)) Σ_{b'≠b} Utility_{b'},
where B is the total number of agents.
The beneficial effects of the invention are as follows: the invention ensures isolation between slices and between users while meeting the QoS requirements of each slice and each user.
Drawings
Fig. 1 is a network model diagram of the radio access network slice resource allocation method of the present invention adapted to different channel characteristics.
Fig. 2 is an Ape-X architecture diagram of the radio access network slice resource allocation method of the present invention adapted to different channel characteristics.
Fig. 3 is a system model diagram of the radio access network slice resource allocation method of the present invention adapted to different channel characteristics.
Fig. 4 is a flow chart of the DQN algorithm of the radio access network slice resource allocation method of the present invention adapted to different channel characteristics.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
In the drawings, to avoid obscuring the present invention with unnecessary detail, only the structures and/or processing steps closely related to the invention are shown, and other details of little relevance are omitted.
In addition, it should be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1 to 4, the present invention discloses a method for allocating radio access network slice resources adapted to different channel characteristics, comprising the following steps:
Step 1, a multi-base-station cellular network downlink scenario is established. Specifically, the scenario considers a set B = {1, …, b, …, B} of base stations (BSs), where adjacent BSs interfere with each other. The user set is denoted U = {1, …, u, …, U}, with U the total number of users. The total bandwidth W is divided into a set of identical subchannels J = {1, …, j, …, J}, where J is the total number of subchannels; thus each subchannel has bandwidth W/J, and R_j denotes the bandwidth of subchannel j. The total number of slices is S, and the slices are denoted S = {1, …, s, …, S}. Each slice s has a user set M_s = {1, 2, …, m_s, …, M_s}, where m_s is the m-th user in slice s and M_s is the total number of users in slice s. Then Σ_{s∈S} M_s = U.
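The bookkeeping of the scenario above can be sketched as follows; the numeric values of B, J, W and the per-slice user counts are illustrative assumptions, not taken from the patent.

```python
# Illustrative scenario parameters (values are assumptions for the sketch).
B = 3                               # number of base stations
J = 10                              # number of subchannels
W = 20e6                            # total system bandwidth in Hz
slice_users = {0: 4, 1: 3, 2: 5}    # M_s: number of users in each slice s

subchannel_bw = W / J               # each subchannel has bandwidth W/J
U = sum(slice_users.values())       # total users: sum over s of M_s equals U
```

With these values each subchannel carries W/J = 2 MHz and the user total U is the sum of the slice populations, matching the constraint Σ_{s∈S} M_s = U.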
Step 2, minimum rate requirement for base station collection sliceUser minimum rate requirement->The user can tolerate a maximum delay threshold D max And incomplete CSI condition information. And sets constraints based on the collected information, in particular,
definition of binary variablesIf user m s Requesting slice s in base station b>1, otherwise 0. To ensure that a user can only request one slice, constraint C1 is introduced:
definition of binary variablesIf sub-channel j is allocated to user m at base station b s Then 1 and otherwise 0. To ensure that a subchannel can only be allocated to one user in the base station, constraint C2 is introduced:
to ensure that the transmit power of each base station does not exceed its maximumTransmitting powerExtraction constraint C3:
CSI uncertainty at the base station may be caused by various factors such as user mobility, estimation errors, and feedback channel delays; perfect CSI is almost never available at the base station. We therefore consider incomplete CSI at the base station and calculate the worst-case rate of user m_s under this condition. The incomplete CSI is modeled as:
g_{b,m_s}^j = ĝ_{b,m_s}^j + Δg_{b,m_s}^j,
where ĝ_{b,m_s}^j represents the estimated channel gain and Δg_{b,m_s}^j represents the channel gain estimation error. The estimation error is confined to a bounded region:
|Δg_{b,m_s}^j| ≤ ε,
where ε is a small constant denoting the channel uncertainty bound. Thus, under CSI uncertainty, the worst-case rate of user m_s is obtained by replacing the estimated gain ĝ_{b,m_s}^j with its lower bound ĝ_{b,m_s}^j − ε in the rate expression.
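The worst-case rate under bounded CSI error can be sketched as follows. The exact SINR form, variable names, and numeric values are illustrative assumptions reconstructed from the text, not the patent's literal formula.

```python
import numpy as np

def worst_case_rate(p, g_hat, eps, interference, noise_psd, w_sub):
    """Worst-case achievable rate under bounded CSI error.

    The true gain lies in [g_hat - eps, g_hat + eps]; the worst case
    uses the lower bound g_hat - eps (clipped at zero). The SINR and
    rate expressions below are a sketch, not the patent's exact model.
    """
    g_worst = max(g_hat - eps, 0.0)
    sinr = p * g_worst / (interference + noise_psd * w_sub)
    return w_sub * np.log2(1.0 + sinr)

# The worst-case rate never exceeds the rate computed from the estimate.
r_worst = worst_case_rate(p=1.0, g_hat=1e-6, eps=2e-7,
                          interference=1e-9, noise_psd=4e-21, w_sub=2e6)
r_est = worst_case_rate(p=1.0, g_hat=1e-6, eps=0.0,
                        interference=1e-9, noise_psd=4e-21, w_sub=2e6)
assert 0.0 < r_worst <= r_est
```

Allocating against the worst-case rate is what lets the scheme keep its QoS guarantees despite the CSI uncertainty bound ε.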
To guarantee the QoS requirements at the slice level, the total rate R_s of each slice s should reach its minimum rate R_s^min, i.e.:
Σ_{m_s∈M_s} R_{m_s} ≥ R_s^min.
At the user level, to guarantee its QoS requirement, each user should reach its minimum rate R_{m_s}^min, i.e.:
R_{m_s} ≥ R_{m_s}^min.
The actual delay of user m_s is then calculated. There are two types of delay in the RAN domain: the propagation delay, incurred between base station b and user m_s, and the transmission delay, incurred between base station b and user m_s on channel j. The actual delay of user m_s is expressed as:
d_{m_s} = dist_{b,m_s}/c + L_{m_s}/R_{m_s},
where dist_{b,m_s} denotes the distance between base station b and user m_s in meters, c denotes the speed of light, and L_{m_s} is the packet size in bits.
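The two-term delay model above can be sketched directly; the variable names and numeric values are illustrative.

```python
def actual_delay(distance_m, packet_bits, rate_bps, c=3.0e8):
    """Delay = propagation delay (distance / speed of light) plus
    transmission delay (packet size / transmission rate)."""
    d_prop = distance_m / c
    d_trans = packet_bits / rate_bps
    return d_prop + d_trans

# 300 m at light speed -> 1e-6 s; 8000 bits at 1 Mb/s -> 8e-3 s
d = actual_delay(distance_m=300.0, packet_bits=8000, rate_bps=1e6)
```

As the example suggests, at cellular distances the transmission term dominates, which is why the delay constraint is enforced through a minimum transmission rate.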
The minimum rate required to meet the user delay requirement is then calculated. For a user, we want to maximize the transmission rate while meeting a probabilistic delay requirement:
Pr(d_{m_s} > D_max) < q,
where D_max is the maximum delay threshold that can be tolerated and q is the maximum probability that the delay exceeds the threshold.
Next, an effective bandwidth function F_u is introduced,
where the average packet arrival rate of user m_s is assumed to be λ_u and L_avr is the average packet length. When the user's actual transmission rate R_{m_s} is higher than F_u, the delay-violation probability of user m_s can be kept below q.
Step 3, the weights θ and Q(s, a; θ) of the deep Q-learning network (DQN) are initialized. Specifically, DQN applies a neural network to Q-learning; the action-value function approximated by the neural network is called the Q network. The weights of the model are updated so that the model approaches the training data by computing the error on that data. The error is defined as a loss function and is minimized, i.e. the difference between the target value computed from the next state and Q_k(s_t, a_t) is made as small as possible. In DQN, the training data is represented as an action value called the target value. The loss function to be minimized is therefore:
L(θ) = E[(y_t − Q(s_t, a_t; θ))²],
where y_t is the target value and θ represents the parameters of the neural network. The agent learns the action values by updating θ so that Q(s_t, a_t; θ) approaches y_t.
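The target computation and the squared loss described above can be sketched in numpy; the batch values and the terminal-state handling via a `dones` flag are illustrative assumptions.

```python
import numpy as np

def dqn_targets(rewards, next_q_values, gamma, dones):
    """y_t = r_{t+1} + gamma * max_a Q(s_{t+1}, a); for terminal
    transitions (dones == 1) only the reward remains."""
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)

def dqn_loss(q_taken, targets):
    """Mean squared error between Q(s_t, a_t; theta) and the targets."""
    return float(np.mean((targets - q_taken) ** 2))

rewards = np.array([1.0, 0.0])
next_q = np.array([[0.5, 1.5],    # Q(s', a) rows for two transitions
                   [2.0, 0.0]])
dones = np.array([0.0, 1.0])
y = dqn_targets(rewards, next_q, gamma=0.9, dones=dones)
loss = dqn_loss(q_taken=y.copy(), targets=y)   # zero when Q matches y
```

In training, θ is adjusted by gradient descent on this loss, while the targets y_t are produced by a periodically updated target network.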
Step 4, the action a_t is initialized, i.e. resources are allocated, and the user rate, the interference from base stations to users, and the total throughput of the base station's slices are calculated at this time. Specifically, the achievable rate of user m_s served by base station b on subchannel j is calculated and expressed as:
R_{b,m_s}^j = (W/J) log2(1 + P_j^b g_{b,m_s}^j / (I_{b,m_s}^j + N_0 W/J)),
where P_j^b represents the transmit power of base station b on subchannel j, g_{b,m_s}^j represents the channel gain between base station b and user m_s on subchannel j, N_0 is the noise power spectral density, and the total bandwidth of the system is W. The subchannel bandwidths are equal, so each subchannel has bandwidth W/J.
The interference I_{b,m_s}^j suffered by user m_s on subchannel j from base stations other than its serving base station b is then calculated and expressed as:
I_{b,m_s}^j = Σ_{b'∈B, b'≠b} P_j^{b'} g_{b',m_s}^j.
the base station slice total throughput is modeled and the goal is to maximize the total throughput and consider constraints.
Constraint C1 indicates that each user can only request one slice, constraint C2 indicates that each sub-channel can only be allocated to one user in a base station, constraint C3 indicates that the transmitting power of each base station cannot exceed the maximum transmitting power of each base station, constraint C4 ensures the QoS requirement of the slice, constraint C5 ensures the QoS requirement of the user, and constraint C6 indicates that the user meets the minimum transmission rate required by delay constraint.
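The rate, interference, and total-throughput computation of step 4 can be sketched as follows. The array shapes, the convention that index 0 is the serving base station, and the noise term are illustrative assumptions, not the patent's exact formulation.

```python
import numpy as np

def slice_throughput(power, gain, alloc, noise_psd, w_sub):
    """Total downlink throughput of the serving base station's slices.

    power[b, j]   : transmit power of BS b on subchannel j
    gain[b, u, j] : channel gain from BS b to user u on subchannel j
    alloc[u, j]   : 1 if the serving BS (index 0 here, an assumed
                    convention) assigns subchannel j to user u
    Interference comes from the other BSs on the same subchannel.
    """
    num_bs, num_users, num_sub = gain.shape
    total = 0.0
    for u in range(num_users):
        for j in range(num_sub):
            if alloc[u, j]:
                signal = power[0, j] * gain[0, u, j]
                interf = sum(power[b, j] * gain[b, u, j]
                             for b in range(1, num_bs))
                sinr = signal / (interf + noise_psd * w_sub)
                total += w_sub * np.log2(1.0 + sinr)
    return total

# Two BSs, one user, one subchannel; the second BS is silent, so the
# SINR is 3/(0 + 1) = 3 and the rate is log2(4) = 2 (w_sub = 1).
tot = slice_throughput(power=np.array([[1.0], [0.0]]),
                       gain=np.array([[[3.0]], [[5.0]]]),
                       alloc=np.array([[1]]),
                       noise_psd=1.0, w_sub=1.0)
```

The DRL agent's job is then to pick `alloc` and `power` so that this total is maximized while constraints C1 to C6 hold.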
Step 5, the base station calculates the reward r_t according to the current state s_t and makes a decision using an ε-greedy strategy. Specifically, because the state set and the action set are large, deep reinforcement learning is adopted to find the optimal action; the network input is the action a_t and the state s_t, and the output is the Q value of the action, i.e. Q_k(s_t, a_t). The target neural network is used to calculate the Q value Q_k(s_{t+1}, a) of the next state s_{t+1}, updated by the following expression:
Q_{k+1}(s_t, a_t) = Q_k(s_t, a_t) + α_k [r_{t+1} + γ max_{a∈A} Q_k(s_{t+1}, a) − Q_k(s_t, a_t)],
where α_k and γ are the learning rate and discount factor respectively, s_{t+1} and r_{t+1} represent the next state and the reward obtained after taking the action in state s_t, a represents an executable action in state s_{t+1}, A is the set of executable actions, and max_{a∈A} Q_k(s_{t+1}, a) is the maximum Q value over the action set A in state s_{t+1}. An ε-greedy strategy is employed when selecting actions: the action with the highest action value is executed with probability 1 − ε, and a random action following a uniform distribution is explored with probability ε. RL learns the best action for a state through exploration and exploitation.
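The ε-greedy selection and the temporal-difference update above can be sketched in tabular form (the deep version replaces the table with the Q network); states, actions, and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_row, epsilon):
    """With probability 1 - epsilon take the highest-valued action,
    otherwise explore a uniformly random one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

def q_update(Q, s, a, r, s_next, alpha, gamma):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((2, 2))                    # 2 states x 2 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
best = epsilon_greedy(Q[0], epsilon=0.0)   # greedy: picks action 1
```

Annealing ε from a large value toward a small one shifts the agent from exploration to exploitation as training progresses.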
Step 6, the state of the environment is updated to s_{t+1} and the reward to r_{t+1}. The values s_t, a_t, s_{t+1}, r_{t+1} are stored as experience for replay. The learner randomly draws a number of samples from the accumulated experience for learning and transmits the experience to the remaining base stations. Specifically, the slice optimization problem can be described as a separate Markov decision process (MDP). An MDP can be formalized as a 4-tuple <S, A, π, R>, where S is the state space, A is the action space, π is the policy space, and R is the immediate reward. At each step, the slicing agent takes an action a ∈ A based on the current policy π(a|s) and its observation s ∈ S; the underlying environment generates an immediate reward R, and the state transitions to a new state s' ∈ S. In our scenario we define the three components of the MDP, namely the state set S, the action set A, and the reward set R.
The state set S of the MDP is defined by taking the association relation between base station b and user u together with the channel gain as the state input of the agent. The state space is defined as:
s_t = {a_{b,u}, g_{b,u} : u ∈ U},
where a_{b,u} equals 1 if base station b and user u are associated, and 0 otherwise.
The action set A of the MDP is defined as follows: the agent observes the state information of the environment and selects an action in the action space. For base station b, its actions are defined as the subchannel allocation and the power allocation between the base station and its users. The RB allocation is denoted y_{b,m_s}^j = 1 if user m_s is associated with subchannel j, and y_{b,m_s}^j = 0 otherwise; the power allocated by base station b to subcarrier j is denoted P_j^b, j ∈ J.
The reward set R of the MDP indicates whether a behavior is positive or negative for the agent. The invention maximizes the total throughput; if a user's rate in the base station does not reach the minimum threshold, that user's throughput is included in the total throughput as a negative reward, shaped by a function T(x). The reward of base station b is defined accordingly,
where the parameter c in T(x) is a constant coefficient controlling the steepness of the curve.
The above shows that the goal of the agent of base station b is to maximize the overall slice throughput of base station b. However, due to the interaction between base station agents, a local resource allocation scheme may cause significant interference to other base stations. Thus, each agent must consider both its own utility and its impact on the other agents. We introduce a trade-off factor δ to obtain the final reward function as the weighted sum of the local utility Utility_b and the average utility of the other base station agents:
R_b = δ · Utility_b + (1 − δ) · (1/(B − 1)) Σ_{b'≠b} Utility_{b'},
where B is the total number of agents. This reward is not a global reward but represents only the local base station agent's reward; it is equivalent to replacing part of the local agent's utility with the average utility of the other agents, with the replaced proportion depending on the trade-off factor δ ∈ (0, 1), set according to the practical application scenario.
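The blended reward described above can be sketched as follows; the utility values are illustrative.

```python
def blended_reward(utilities, b, delta):
    """Final reward of agent b: trade-off factor delta in (0, 1) blends
    the local utility with the average utility of the other agents."""
    others = [u for i, u in enumerate(utilities) if i != b]
    avg_others = sum(others) / len(others)
    return delta * utilities[b] + (1.0 - delta) * avg_others

# Three agents; agent 0 weights its own utility at 0.8:
# 0.8 * 10 + 0.2 * ((4 + 6) / 2) = 9
r = blended_reward([10.0, 4.0, 6.0], b=0, delta=0.8)
```

A δ near 1 makes the agent selfish, while a smaller δ pushes it to avoid allocations that hurt neighboring base stations.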
Because multiple base station agents learn simultaneously, Ape-X distributed learning is applied here. Ape-X consists of three parts: a learner, actors, and an experience pool; the model learned by the learner comes from the experiences and policies collected by the actors. Since there are multiple base stations, each base station first collects the requirements and states of all slice users within it and transmits them to its corresponding actor. The actor outputs actions to the base station's slice manager according to the policy learned by the learner. During learning, rewards, states, and actions are transferred as experience to the experience pool.
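The learner / actors / experience-pool split can be sketched with a minimal single-process replay buffer; the capacity, transition format, and class name are illustrative, and the real Ape-X system additionally uses prioritized sampling and parallel processes.

```python
import random
from collections import deque

class ExperiencePool:
    """Shared pool the actors feed and the learner samples from
    (a minimal sketch of the experience-pool role in Ape-X)."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        """Actors push (state, action, reward, next_state) tuples."""
        self.buffer.append(transition)

    def sample(self, k):
        """The learner trains on a random batch of past experience."""
        return random.sample(list(self.buffer), k)

pool = ExperiencePool()
for t in range(100):                      # each base station's actor
    pool.add((t, t % 4, float(t), t + 1)) # pushes its transitions
batch = pool.sample(8)
```

Sharing one pool across base-station actors is what lets each local agent learn from experience gathered under the others' policies.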
Step 7, the loss function L(θ) is calculated from the experience replay pool, the weights θ are updated, and step 5 is repeated until the loss function L(θ) reaches a set convergence condition or the maximum number of iterations T is reached.
In summary, the present invention defines a mathematical formulation of RAN slice isolation in a multi-element scenario and ensures isolation between slices and users while the QoS requirements of each slice and user are satisfied. The invention also considers resource allocation decisions under imperfect CSI conditions, uses DRL to solve the stochastic optimization problem and overcome the randomness of the wireless channel, and aims to maximize the total throughput of the base station's slices while guaranteeing the throughput requirements of individual users. Because multiple base stations train and learn, an Ape-X distributed learning system is introduced to accelerate learning. Finally, the invention considers the interference between base stations, i.e. the resource allocation strategy of each base station depends not only on itself but also on the other base stations, which better matches practical application scenarios.
The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. A radio access network slice resource allocation method adapted to different channel characteristics, comprising the steps of:
step 1, establishing a multi-base station cellular network downlink scene;
step 2, the base station collects the slice minimum rate requirement R_s^min, the user minimum rate requirement R_{m_s}^min, the maximum delay threshold D_max tolerated by the user, and incomplete CSI condition information;
step 3, initializing the weights θ and Q(s, a; θ) of the deep Q-learning network (DQN);
step 4, initializing the action a_t, i.e. allocating resources, and calculating the user rate, the interference from base stations to users, and the total throughput of the base station's slices at this time;
step 5, the base station calculates the reward r_t according to the current state s_t and makes a decision using an ε-greedy strategy;
step 6, updating the state of the environment to s_{t+1} and the reward to r_{t+1};
step 7, calculating the loss function L(θ) from the experience replay pool, updating the weights θ, and repeating step 5 until the loss function L(θ) reaches a set convergence condition or the maximum number of iterations T is reached.
2. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 1, wherein: in step 1, the multi-base-station cellular network downlink scenario includes a set B = {1, …, b, …, B} of base stations (BSs), where adjacent BSs interfere with each other; the user set is denoted U = {1, …, u, …, U}, with U the total number of users; the total bandwidth W is divided into a set of identical subchannels J = {1, …, j, …, J}, where J is the total number of subchannels and each subchannel has bandwidth W/J; R_j denotes the bandwidth of subchannel j; the total number of slices is S, the slices are denoted S = {1, …, s, …, S}, and each slice s has a user set M_s = {1, 2, …, m_s, …, M_s}, where m_s is the m-th user in slice s and M_s is the total number of users in slice s; then Σ_{s∈S} M_s = U.
3. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 1, wherein step 2 further includes:
defining a binary variable x_{b,m_s}^s that equals 1 if user m_s requests slice s at base station b and 0 otherwise; to ensure that each user can request only one slice, constraint C1 is introduced:
C1: Σ_{s∈S} x_{b,m_s}^s ≤ 1, ∀ b, m_s;
defining a binary variable y_{b,m_s}^j that equals 1 if subchannel j is allocated to user m_s at base station b and 0 otherwise; to ensure that a subchannel can be allocated to only one user in a base station, constraint C2 is introduced:
C2: Σ_{s∈S} Σ_{m_s∈M_s} y_{b,m_s}^j ≤ 1, ∀ b, j;
to ensure that the transmit power of each base station does not exceed its maximum transmit power P_b^max, constraint C3 is introduced:
C3: Σ_{j∈J} P_j^b ≤ P_b^max, ∀ b.
4. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 3, wherein: in step 2, incomplete CSI at the base station is considered, and the worst-case rate of user m_s under this condition is calculated. With the channel gain modeled as $g_{m_s,j}^{b}=\hat g_{m_s,j}^{b}+\Delta g_{m_s,j}^{b}$, the worst-case rate under incomplete CSI is expressed as:

$R_{m_s,j}^{b,\mathrm{worst}}=\frac{W}{J}\log_2\!\left(1+\frac{p_{m_s,j}^{b}\left(\hat g_{m_s,j}^{b}-\Delta g_{m_s,j}^{b}\right)}{I_{m_s,j}^{b}+\sigma^{2}}\right)$

where $\hat g_{m_s,j}^{b}$ represents the estimated channel gain and $\Delta g_{m_s,j}^{b}$ represents the error in estimating the channel gain.
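Under the common robust-design convention that the worst case subtracts the estimation error from the estimated gain (an assumption here, since the patent's formula is lost in extraction), the worst-case rate can be sketched as:

```python
import math

def worst_case_rate(bw, p, g_hat, g_err, interference, noise):
    """Worst-case achievable rate under imperfect CSI (sketch).
    bw: subchannel bandwidth; p: transmit power; g_hat: estimated gain;
    g_err: gain estimation error; interference, noise: received power terms."""
    g_worst = max(g_hat - g_err, 0.0)        # pessimistic channel gain
    sinr = p * g_worst / (interference + noise)
    return bw * math.log2(1.0 + sinr)
```

A larger estimation error strictly lowers the guaranteed rate, which is the point of planning against the worst case.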
5. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 1, wherein: in step 3, in deep Q-learning (DQN), the training data is expressed as an action value and is referred to as the target value, and the loss function to be minimized is:

$L(\theta)=\mathbb{E}\left[\left(y_t-Q(s_t,a_t;\theta)\right)^2\right]$

where $y_t$ is the target value and θ represents the parameters of the neural network; the agent learns the action values by updating θ so that $Q(s_t,a_t;\theta)$ approaches $y_t$.
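A minimal sketch of this loss over a mini-batch sampled from the experience replay pool; `q_net` and `target_net` stand in for the online and target networks (here any callables mapping a batch of states to per-action Q values), which is an illustrative simplification:

```python
import numpy as np

def dqn_loss(q_net, target_net, batch, gamma):
    """Mean squared error between the target value y_t and Q(s_t, a_t; theta)."""
    states, actions, rewards, next_states = batch
    q_sa = q_net(states)[np.arange(len(actions)), actions]   # Q(s_t, a_t; theta)
    y_t = rewards + gamma * target_net(next_states).max(axis=1)  # target value
    return float(np.mean((y_t - q_sa) ** 2))
```

Using a separate target network to compute y_t keeps the regression target from shifting with every gradient step.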
6. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 1, wherein: in step 4, the achievable transmission rate from base station b to user m_s on subchannel j is calculated and expressed as:

$r_{m_s,j}^{b}=\frac{W}{J}\log_2\!\left(1+\frac{p_{m_s,j}^{b}\,g_{m_s,j}^{b}}{I_{m_s,j}^{b}+\sigma^{2}}\right)$

where $p_{m_s,j}^{b}$ represents the transmission power between base station b and user m_s on subchannel j, $g_{m_s,j}^{b}$ represents the channel gain between base station b and user m_s on subchannel j, and the total bandwidth of the system is W.
7. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 6, wherein: step 4 further comprises: each sub-channel bandwidth is equal, then each sub-channel bandwidth isCalculating the base station b to the user m on the sub-channel j s Interference of (1)>And is expressed as:
The total throughput of the base station slices is modeled, the goal is to maximize the total throughput, and the following constraints are considered:

$\max\ \sum_{b\in B}\sum_{s\in S}\sum_{m_s\in M_s}\sum_{j\in J}\beta_{m_s,j}^{b}\,r_{m_s,j}^{b}$

C1: $\sum_{b\in B}\sum_{s\in S}\alpha_{m_s,s}^{b}=1,\quad\forall m_s$

C2: $\sum_{s\in S}\sum_{m_s\in M_s}\beta_{m_s,j}^{b}\le 1,\quad\forall b\in B,\ \forall j\in J$

C3: $\sum_{j\in J}\sum_{s\in S}\sum_{m_s\in M_s}\beta_{m_s,j}^{b}\,p_{m_s,j}^{b}\le P_b^{\max},\quad\forall b\in B$

C4: $\sum_{m_s\in M_s}R_{m_s}\ge R_s^{\min},\quad\forall s\in S$

C5: $R_{m_s}\ge R_{m_s}^{\min},\quad\forall m_s$

C6: $R_{m_s}\ge \frac{D_{m_s}}{\tau_{m_s}^{\max}},\quad\forall m_s$

Constraint C1 indicates that each user can only request one slice; constraint C2 indicates that each subchannel can only be allocated to one user within a base station; constraint C3 indicates that the transmit power of each base station cannot exceed its maximum transmit power; constraint C4 ensures the QoS requirement of each slice; constraint C5 ensures the QoS requirement of each user; and constraint C6 gives the minimum transmission rate required for a user to meet its delay constraint (packet size $D_{m_s}$ over maximum tolerable delay $\tau_{m_s}^{\max}$).
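The interference, per-link rate, and total-throughput objective of claims 6-7 can be sketched together; the array shapes and names are illustrative assumptions, with `g[b, u, j]` the gain from base station b to user u on subchannel j:

```python
import numpy as np

def total_throughput(beta, p, g, bw_total, noise):
    """beta[b, u, j]: assignment; p[b, u, j]: power; g[b, u, j]: gain."""
    B, U, J = beta.shape
    bw = bw_total / J                            # equal subchannel bandwidth W/J
    tx = (beta * p).sum(axis=1)                  # tx[b, j]: power radiated by b on j
    # interference at (b, u, j): received power from all other base stations
    interf = np.einsum('bj,buj->uj', tx, g)[None, :, :] - tx[:, None, :] * g
    rate = bw * np.log2(1.0 + p * g / (interf + noise))
    return float((beta * rate).sum())            # objective: sum of served-link rates
```

Subtracting the own-cell term from the total received power gives exactly the sum over b' ≠ b in the interference expression.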
8. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 1, wherein: in step 5, deep reinforcement learning is adopted to find the optimal action; the network input is the action a_t and the state s_t, and the output is the Q value of the action, i.e. $Q_k(s_t,a_t)$; the target neural network is used to calculate the Q value $Q_k(s_{t+1},a)$ of the next state $s_{t+1}$, and the Q value is updated by the following expression:

$Q_{k+1}(s_t,a_t)=Q_k(s_t,a_t)+\alpha_k\left[r_{t+1}+\gamma\max_{a\in A}Q_k(s_{t+1},a)-Q_k(s_t,a_t)\right]$

where $\alpha_k$ and γ are the learning rate and discount factor respectively, $s_{t+1}$ and $r_{t+1}$ represent the next state and the reward obtained after taking an action in state $s_t$, a represents an executable action in state $s_{t+1}$, A is the set of executable actions, and $\max_{a\in A}Q_k(s_{t+1},a)$ represents the maximum Q value over the action set A in state $s_{t+1}$.
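The update rule itself can be illustrated in tabular form (the claim uses neural networks; a dictionary-backed Q table is a simplification used here only to show the arithmetic of the expression above):

```python
def q_update(Q, s, a, r_next, s_next, alpha, gamma):
    """One Q-learning step: Q[s][a] += alpha * (r + gamma * max_a' Q[s'][a'] - Q[s][a])."""
    td_target = r_next + gamma * max(Q[s_next].values())  # r_{t+1} + gamma * max_a Q(s_{t+1}, a)
    Q[s][a] += alpha * (td_target - Q[s][a])              # move Q(s_t, a_t) toward the target
    return Q[s][a]
```

The learning rate alpha controls how far each estimate moves toward the bootstrapped target, and gamma discounts future reward.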
9. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 1, wherein: in step 6, the slice optimization problem can be described as an independent Markov decision process, which can be formalized as a 4-tuple ⟨S, A, π, R⟩, where S is the state space, A is the action space, π is the policy space, and R is the direct reward. The state set S of the Markov decision process is defined as follows: the association relation between base station b and user u, together with the channel gain, is set as the state input of the agent, and the state space is defined as:

$s_t=\left\{c_{b,u},\,g_{b,u}\ :\ b\in B,\ u\in U\right\}$

where $c_{b,u}$ indicates the association between base station b and user u: it is 1 if there is an association, otherwise 0, and $g_{b,u}$ is the channel gain between base station b and user u.
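A small sketch of assembling this state input, assuming the association indicators and channel gains are simply flattened and concatenated (the exact encoding is not specified in the patent):

```python
import numpy as np

def build_state(assoc, gains):
    """assoc[b, u] in {0, 1}: association indicator; gains[b, u]: channel gain.
    Returns the flat state vector fed to the agent."""
    return np.concatenate([assoc.ravel(), gains.ravel()]).astype(np.float32)
```

With B base stations and U users this yields a fixed-length vector of 2·B·U features, which suits a feed-forward Q network.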
10. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 9, wherein: in step 6, the final reward function is obtained through a trade-off factor δ as the weighted sum of the local utility $\mathrm{Utility}_b$ and the average utility of the other base station agents:

$R_b=\delta\,\mathrm{Utility}_b+(1-\delta)\,\frac{1}{B-1}\sum_{b'\in B,\,b'\neq b}\mathrm{Utility}_{b'}$
where B is the total number of agents.
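This weighted sum of the local utility and the average utility of the other agents can be sketched directly (the list-based interface is an illustrative assumption):

```python
def mixed_reward(utilities, b, delta):
    """Reward of agent b: delta * own utility + (1 - delta) * mean of the others."""
    others = [u for i, u in enumerate(utilities) if i != b]
    return delta * utilities[b] + (1 - delta) * sum(others) / len(others)
```

Setting delta near 1 makes each agent purely selfish, while smaller delta pushes agents toward cooperative behavior across the B base stations.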
CN202310741049.1A 2023-06-21 2023-06-21 Radio access network slice resource allocation method adapting to different channel characteristics Pending CN116828607A (en)

Publications (1)

Publication Number Publication Date
CN116828607A true CN116828607A (en) 2023-09-29

Family

ID=88121530

Similar Documents

Publication Publication Date Title
CN110267338B (en) Joint resource allocation and power control method in D2D communication
CN111414252B (en) Task unloading method based on deep reinforcement learning
Wang et al. Carrier load balancing and packet scheduling for multi-carrier systems
CN111314889A (en) Task unloading and resource allocation method based on mobile edge calculation in Internet of vehicles
CN111726811B (en) Slice resource allocation method and system for cognitive wireless network
KR101567368B1 (en) Apparatus and method for managing resource to decrasse inter cell interference in a broadband wireless commmunication system
US9723572B2 (en) Systems and methods for uplink power control and scheduling in a wireless network
CN113038616B (en) Frequency spectrum resource management and allocation method based on federal learning
CN110492955B (en) Spectrum prediction switching method based on transfer learning strategy
CN110121213B (en) Multi-service resource scheduling method and device
CN114340017B (en) Heterogeneous network resource slicing method with eMBB and URLLC mixed service
Elsayed et al. Deep reinforcement learning for reducing latency in mission critical services
CN116582860A (en) Link resource allocation method based on information age constraint
CN113395723A (en) 5G NR downlink scheduling delay optimization system based on reinforcement learning
CN112153744A (en) Physical layer security resource allocation method in ICV network
CN111935825A (en) Depth value network-based cooperative resource allocation method in mobile edge computing system
CN115103326A (en) Internet of vehicles task unloading and resource management method and device based on alliance game
Shekhawat et al. A reinforcement learning framework for qos-driven radio resource scheduler
Ng et al. QoS‐based radio network dimensioning for LTE networks with heavy real‐time traffic
Gao et al. Reinforcement learning based resource allocation in cache-enabled small cell networks with mobile users
US11864158B2 (en) Distributed method for allocating transmission resources to D2D terminals in a cellular access network
US11160035B2 (en) Centralized method for allocating transmission resources to D2D terminals in a cellular access network
CN116567667A (en) Heterogeneous network resource energy efficiency optimization method based on deep reinforcement learning
Bellone et al. Deep reinforcement learning for combined coverage and resource allocation in uav-aided ran-slicing
CN116347635A (en) NB-IoT wireless resource allocation method based on NOMA and multi-agent reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination