CN116828607A - Radio access network slice resource allocation method adapting to different channel characteristics - Google Patents


Info

Publication number
CN116828607A
Authority
CN
China
Prior art keywords
base station
user
slice
state
radio access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310741049.1A
Other languages
Chinese (zh)
Inventor
孙君
王科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202310741049.1A priority Critical patent/CN116828607A/en
Publication of CN116828607A publication Critical patent/CN116828607A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H04W72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W72/0453 Resources in frequency domain, e.g. a carrier in FDMA
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control
    • H04W28/08 Load balancing or load distribution
    • H04W28/09 Management thereof
    • H04W28/0958 Management thereof based on metrics or performance parameters
    • H04W28/0967 Quality of Service [QoS] parameters
    • H04W28/0975 Quality of Service [QoS] parameters for reducing delays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control
    • H04W28/08 Load balancing or load distribution
    • H04W28/09 Management thereof
    • H04W28/0958 Management thereof based on metrics or performance parameters
    • H04W28/0967 Quality of Service [QoS] parameters
    • H04W28/0983 Quality of Service [QoS] parameters for optimizing bandwidth or throughput
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/50 Allocation or scheduling criteria for wireless resources
    • H04W72/54 Allocation or scheduling criteria for wireless resources based on quality criteria
    • H04W72/543 Allocation or scheduling criteria for wireless resources based on quality criteria based on requested quality, e.g. QoS
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides a radio access network slice resource allocation method adapted to different channel characteristics, comprising the following steps: establishing a multi-base-station cellular network downlink scenario; the base station collects the slice minimum rate requirement, the user minimum rate requirement, the maximum delay threshold tolerated by the user, and incomplete CSI condition information; initializing the weights of the deep Q-learning (DQN) network; initializing the action, i.e. allocating resources and calculating the user rate, the interference from base stations to users, and the total throughput of the base station's slices; the base station calculates a reward according to the current state and makes a decision using a greedy strategy; the state of the environment and the reward are updated; the loss function is calculated from the experience replay pool and the weights are updated until the loss function reaches a set convergence condition or the maximum number of iterations is reached. The invention ensures isolation between slices and users while meeting the QoS requirements of each slice and user.

Description

Radio access network slice resource allocation method adapting to different channel characteristics
Technical Field
The invention relates to a radio access network slice resource allocation method adapting to different channel characteristics, belonging to the technical field of wireless communication.
Background
The essence of network slicing is that mobile virtual network operators (MVNOs) abstract physical resources into virtual resources, which are then assigned to service providers (SPs). Requirements are settled between SPs and tenants in the form of service level agreements (SLAs), which specify key performance metrics such as throughput, latency, and reliability. To achieve these SLAs, network slicing is being extended from the core network to the radio access network (RAN) domain. Network slicing in the RAN domain remains a challenging problem due to resource coupling and the randomness of radio channels. Most existing work focuses on RAN architecture, while research on RAN slice resource allocation and optimization is still ongoing. Network slices must provide isolation between slices so that congestion in one slice does not degrade the performance of other slices.
In most existing research, RAN slice isolation considers only slice-level requirements; although the aggregate performance of each slice is guaranteed, the characteristics of individual users are ignored, so the QoS of users within a slice cannot be guaranteed.
In view of the foregoing, it is desirable to provide a method for allocating radio access network slice resources to accommodate different channel characteristics.
Disclosure of Invention
The invention aims to provide a radio access network slice resource allocation method adapted to different channel characteristics, which ensures isolation among slices while meeting the QoS requirements of all users within each slice.
In order to achieve the above object, the present invention provides a method for allocating radio access network slice resources adapted to different channel characteristics, which mainly includes the following steps:
step 1, establishing a multi-base station cellular network downlink scene;
step 2, the base station collects the slice minimum rate requirement R_s^min, the user minimum rate requirement R_{m_s}^min, the maximum delay threshold D_max tolerated by the user, and incomplete CSI condition information;
step 3, initializing the weights θ and Q(s, a; θ) of the deep Q-learning network (DQN);
step 4, initializing the action a_t, i.e. allocating resources, and calculating the user rate, the interference from base stations to users, and the total throughput of the base station's slices at this time;
step 5, the base station calculates the reward r_t according to the current state s_t and makes a decision using an ε-greedy strategy;
step 6, updating the state of the environment to s_{t+1} and the reward to r_{t+1};
step 7, calculating the loss function L(θ) from the experience replay pool, updating the weights θ, and repeating step 5 until the loss function L(θ) reaches a set convergence condition or the maximum number of iterations T is reached.
As a further improvement of the present invention, in step 1, the multi-base-station cellular network downlink scenario includes a set B = {1, …, b, …, B} of base stations (BSs), where adjacent BSs interfere with each other. The user set is denoted U = {1, …, u, …, U}, with U the total number of users. The total bandwidth W is divided into a set of identical subchannels J = {1, …, j, …, J}, where J is the total number of subchannels and each subchannel has bandwidth W/J; R_j denotes the bandwidth of subchannel j. The total number of slices is S, the slices are denoted S = {1, …, s, …, S}, and each slice s has a user set M_s = {1, 2, …, m_s, …, M_s}, where m_s is the m-th user in slice s and M_s is the total number of users in slice s; then Σ_{s∈S} M_s = U.
As a further improvement of the present invention, step 2 further includes:
defining a binary variable x_{b,m_s}^s that equals 1 if user m_s requests slice s at base station b and 0 otherwise; to ensure that each user can request only one slice, constraint C1 is introduced:
C1: Σ_{s∈S} x_{b,m_s}^s ≤ 1, ∀ b, m_s;
defining a binary variable y_{b,m_s}^j that equals 1 if subchannel j is allocated to user m_s at base station b and 0 otherwise; to ensure that a subchannel can be allocated to only one user in a base station, constraint C2 is introduced:
C2: Σ_{s∈S} Σ_{m_s∈M_s} y_{b,m_s}^j ≤ 1, ∀ b, j;
to ensure that the transmit power of each base station does not exceed its maximum transmit power P_b^max, constraint C3 is introduced:
C3: Σ_{j∈J} P_j^b ≤ P_b^max, ∀ b.
as a further improvement of the invention, in step 2, incomplete CSI in the base station is taken into account and user m is calculated under this condition s The worst rate of incomplete CSI is expressed as follows:
wherein ,representing estimated channel gain,/>Representing the error in estimating the channel gain.
As a further improvement of the invention, in step 3, in deep Q-learning (DQN) the training data is expressed as an action value called the target value, and the loss function to be minimized is:
L(θ) = E[(y_t − Q(s_t, a_t; θ))²],
where y_t is the target value and θ represents the parameters of the neural network; the agent learns the action values by updating θ so that Q(s_t, a_t; θ) approaches y_t.
As a further improvement of the invention, in step 4, the achievable rate of user m_s served by base station b on subchannel j is calculated and expressed as:
R_{b,m_s}^j = (W/J) log2(1 + P_j^b g_{b,m_s}^j / (I_{b,m_s}^j + N_0 W/J)),
where P_j^b represents the transmit power of base station b on subchannel j, g_{b,m_s}^j represents the channel gain between base station b and user m_s on subchannel j, I_{b,m_s}^j is the interference suffered by user m_s on subchannel j, N_0 is the noise power spectral density, and the total bandwidth of the system is W.
As a further improvement of the present invention, step 4 further includes: the subchannel bandwidths are equal, so each subchannel has bandwidth W/J; the interference I_{b,m_s}^j suffered by user m_s on subchannel j from base stations other than its serving base station b is calculated and expressed as:
I_{b,m_s}^j = Σ_{b'∈B, b'≠b} P_j^{b'} g_{b',m_s}^j.
The total throughput of the base station's slices is modeled with the goal of maximizing total throughput subject to the constraints,
where constraint C1 indicates that each user can request only one slice, constraint C2 indicates that each subchannel can be allocated to only one user in a base station, constraint C3 indicates that the transmit power of each base station cannot exceed its maximum transmit power, constraint C4 guarantees the QoS requirement of each slice, constraint C5 guarantees the QoS requirement of each user, and constraint C6 indicates the minimum transmission rate required for a user to meet its delay constraint.
As a further improvement of the present invention, in step 5, deep reinforcement learning is used to find the optimal action; the network input is the action a_t and the state s_t, and the output is the Q value of the action, i.e. Q_k(s_t, a_t); the target neural network is used to calculate the Q value Q_k(s_{t+1}, a) of the next state s_{t+1}, updated by the following expression:
Q_{k+1}(s_t, a_t) = Q_k(s_t, a_t) + α_k [r_{t+1} + γ max_{a∈A} Q_k(s_{t+1}, a) − Q_k(s_t, a_t)],
where α_k and γ are the learning rate and discount factor respectively, s_{t+1} and r_{t+1} represent the next state and the reward obtained after taking the action in state s_t, a represents an executable action in state s_{t+1}, A is the set of executable actions, and max_{a∈A} Q_k(s_{t+1}, a) is the maximum Q value over the action set A in state s_{t+1}.
As a further improvement of the present invention, in step 6, the slice optimization problem can be described as a separate Markov decision process, formalized as a 4-tuple <S, A, π, R>, where S is the state space, A is the action space, π is the policy space, and R is the immediate reward. The state set S of the Markov decision process is defined by taking the association relation between base station b and user u together with the channel gain as the state input of the agent; the state space is defined as:
s_t = {a_{b,u}, g_{b,u} : u ∈ U},
where a_{b,u} equals 1 if base station b and user u are associated, and 0 otherwise.
As a further improvement of the present invention, in step 6, the final reward function is obtained through a trade-off factor δ as the weighted sum of the local utility Utility_b and the average utility of the other base station agents:
R_b = δ · Utility_b + (1 − δ) · (1/(B − 1)) Σ_{b'≠b} Utility_{b'},
where B is the total number of agents.
The beneficial effects of the invention are as follows: the invention ensures isolation between slices and between users while meeting the QoS requirements of each slice and each user.
Drawings
Fig. 1 is a network model diagram of the radio access network slice resource allocation method of the present invention adapted to different channel characteristics.
Fig. 2 is an Ape-X architecture diagram of the radio access network slice resource allocation method of the present invention adapted to different channel characteristics.
Fig. 3 is a system model diagram of the radio access network slice resource allocation method of the present invention adapted to different channel characteristics.
Fig. 4 is a flow chart of the DQN algorithm of the radio access network slice resource allocation method of the present invention adapted to different channel characteristics.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
In the drawings, to avoid obscuring the present invention with unnecessary detail, only the structures and/or processing steps closely related to the invention are shown, and other details of little relevance are omitted.
In addition, it should be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1 to 4, the present invention discloses a method for allocating radio access network slice resources adapted to different channel characteristics, comprising the following steps:
Step 1, a multi-base-station cellular network downlink scenario is established. Specifically, the scenario considers a set B = {1, …, b, …, B} of base stations (BSs), where adjacent BSs interfere with each other. The user set is denoted U = {1, …, u, …, U}, with U the total number of users. The total bandwidth W is divided into a set of identical subchannels J = {1, …, j, …, J}, where J is the total number of subchannels; thus each subchannel has bandwidth W/J, and R_j denotes the bandwidth of subchannel j. The total number of slices is S, and the slices are denoted S = {1, …, s, …, S}. Each slice s has a user set M_s = {1, 2, …, m_s, …, M_s}, where m_s is the m-th user in slice s and M_s is the total number of users in slice s. Then Σ_{s∈S} M_s = U.
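The bookkeeping of the scenario above can be sketched as follows; the numeric values of B, J, W and the per-slice user counts are illustrative assumptions, not taken from the patent.

```python
# Illustrative scenario parameters (values are assumptions for the sketch).
B = 3                               # number of base stations
J = 10                              # number of subchannels
W = 20e6                            # total system bandwidth in Hz
slice_users = {0: 4, 1: 3, 2: 5}    # M_s: number of users in each slice s

subchannel_bw = W / J               # each subchannel has bandwidth W/J
U = sum(slice_users.values())       # total users: sum over s of M_s equals U
```

With these values each subchannel carries W/J = 2 MHz and the user total U is the sum of the slice populations, matching the constraint Σ_{s∈S} M_s = U.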
Step 2, minimum rate requirement for base station collection sliceUser minimum rate requirement->The user can tolerate a maximum delay threshold D max And incomplete CSI condition information. And sets constraints based on the collected information, in particular,
definition of binary variablesIf user m s Requesting slice s in base station b>1, otherwise 0. To ensure that a user can only request one slice, constraint C1 is introduced:
definition of binary variablesIf sub-channel j is allocated to user m at base station b s Then 1 and otherwise 0. To ensure that a subchannel can only be allocated to one user in the base station, constraint C2 is introduced:
to ensure that the transmit power of each base station does not exceed its maximumTransmitting powerExtraction constraint C3:
CSI uncertainty at the base station may be caused by various factors such as user mobility, estimation errors, and feedback channel delays; perfect CSI is almost never available at the base station. We therefore consider incomplete CSI at the base station and calculate the worst-case rate of user m_s under this condition. The incomplete CSI is modeled as:
g_{b,m_s}^j = ĝ_{b,m_s}^j + Δg_{b,m_s}^j,
where ĝ_{b,m_s}^j represents the estimated channel gain and Δg_{b,m_s}^j represents the channel gain estimation error. The estimation error is confined to a bounded region:
|Δg_{b,m_s}^j| ≤ ε,
where ε is a small constant denoting the channel uncertainty bound. Thus, under CSI uncertainty, the worst-case rate of user m_s is obtained by replacing the estimated gain ĝ_{b,m_s}^j with its lower bound ĝ_{b,m_s}^j − ε in the rate expression.
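The worst-case rate under bounded CSI error can be sketched as follows. The exact SINR form, variable names, and numeric values are illustrative assumptions reconstructed from the text, not the patent's literal formula.

```python
import numpy as np

def worst_case_rate(p, g_hat, eps, interference, noise_psd, w_sub):
    """Worst-case achievable rate under bounded CSI error.

    The true gain lies in [g_hat - eps, g_hat + eps]; the worst case
    uses the lower bound g_hat - eps (clipped at zero). The SINR and
    rate expressions below are a sketch, not the patent's exact model.
    """
    g_worst = max(g_hat - eps, 0.0)
    sinr = p * g_worst / (interference + noise_psd * w_sub)
    return w_sub * np.log2(1.0 + sinr)

# The worst-case rate never exceeds the rate computed from the estimate.
r_worst = worst_case_rate(p=1.0, g_hat=1e-6, eps=2e-7,
                          interference=1e-9, noise_psd=4e-21, w_sub=2e6)
r_est = worst_case_rate(p=1.0, g_hat=1e-6, eps=0.0,
                        interference=1e-9, noise_psd=4e-21, w_sub=2e6)
assert 0.0 < r_worst <= r_est
```

Allocating against the worst-case rate is what lets the scheme keep its QoS guarantees despite the CSI uncertainty bound ε.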
To guarantee the QoS requirements at the slice level, the total rate R_s of each slice s should reach its minimum rate R_s^min, i.e.:
Σ_{m_s∈M_s} R_{m_s} ≥ R_s^min.
At the user level, to guarantee its QoS requirement, each user should reach its minimum rate R_{m_s}^min, i.e.:
R_{m_s} ≥ R_{m_s}^min.
The actual delay of user m_s is then calculated. There are two types of delay in the RAN domain: the propagation delay, incurred between base station b and user m_s, and the transmission delay, incurred between base station b and user m_s on channel j. The actual delay of user m_s is expressed as:
d_{m_s} = dist_{b,m_s}/c + L_{m_s}/R_{m_s},
where dist_{b,m_s} denotes the distance between base station b and user m_s in meters, c denotes the speed of light, and L_{m_s} is the packet size in bits.
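The two-term delay model above can be sketched directly; the variable names and numeric values are illustrative.

```python
def actual_delay(distance_m, packet_bits, rate_bps, c=3.0e8):
    """Delay = propagation delay (distance / speed of light) plus
    transmission delay (packet size / transmission rate)."""
    d_prop = distance_m / c
    d_trans = packet_bits / rate_bps
    return d_prop + d_trans

# 300 m at light speed -> 1e-6 s; 8000 bits at 1 Mb/s -> 8e-3 s
d = actual_delay(distance_m=300.0, packet_bits=8000, rate_bps=1e6)
```

As the example suggests, at cellular distances the transmission term dominates, which is why the delay constraint is enforced through a minimum transmission rate.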
The minimum rate required to meet the user delay requirement is then calculated. For a user, we want to maximize the transmission rate while meeting a probabilistic delay requirement:
Pr(d_{m_s} > D_max) < q,
where D_max is the maximum delay threshold that can be tolerated and q is the maximum probability that the delay exceeds the threshold.
Next, an effective bandwidth function F_u is introduced,
where the average packet arrival rate of user m_s is assumed to be λ_u and L_avr is the average packet length. When the user's actual transmission rate R_{m_s} is higher than F_u, the delay-violation probability of user m_s can be kept below q.
Step 3, the weights θ and Q(s, a; θ) of the deep Q-learning network (DQN) are initialized. Specifically, DQN applies a neural network to Q-learning; the action-value function approximated by the neural network is called the Q network. The weights of the model are updated so that the model approaches the training data by computing the error on that data. The error is defined as a loss function and is minimized, i.e. the difference between the target value computed from the next state and Q_k(s_t, a_t) is made as small as possible. In DQN, the training data is represented as an action value called the target value. The loss function to be minimized is therefore:
L(θ) = E[(y_t − Q(s_t, a_t; θ))²],
where y_t is the target value and θ represents the parameters of the neural network. The agent learns the action values by updating θ so that Q(s_t, a_t; θ) approaches y_t.
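The target computation and the squared loss described above can be sketched in numpy; the batch values and the terminal-state handling via a `dones` flag are illustrative assumptions.

```python
import numpy as np

def dqn_targets(rewards, next_q_values, gamma, dones):
    """y_t = r_{t+1} + gamma * max_a Q(s_{t+1}, a); for terminal
    transitions (dones == 1) only the reward remains."""
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)

def dqn_loss(q_taken, targets):
    """Mean squared error between Q(s_t, a_t; theta) and the targets."""
    return float(np.mean((targets - q_taken) ** 2))

rewards = np.array([1.0, 0.0])
next_q = np.array([[0.5, 1.5],    # Q(s', a) rows for two transitions
                   [2.0, 0.0]])
dones = np.array([0.0, 1.0])
y = dqn_targets(rewards, next_q, gamma=0.9, dones=dones)
loss = dqn_loss(q_taken=y.copy(), targets=y)   # zero when Q matches y
```

In training, θ is adjusted by gradient descent on this loss, while the targets y_t are produced by a periodically updated target network.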
Step 4, the action a_t is initialized, i.e. resources are allocated, and the user rate, the interference from base stations to users, and the total throughput of the base station's slices are calculated at this time. Specifically, the achievable rate of user m_s served by base station b on subchannel j is calculated and expressed as:
R_{b,m_s}^j = (W/J) log2(1 + P_j^b g_{b,m_s}^j / (I_{b,m_s}^j + N_0 W/J)),
where P_j^b represents the transmit power of base station b on subchannel j, g_{b,m_s}^j represents the channel gain between base station b and user m_s on subchannel j, N_0 is the noise power spectral density, and the total bandwidth of the system is W. The subchannel bandwidths are equal, so each subchannel has bandwidth W/J.
The interference I_{b,m_s}^j suffered by user m_s on subchannel j from base stations other than its serving base station b is then calculated and expressed as:
I_{b,m_s}^j = Σ_{b'∈B, b'≠b} P_j^{b'} g_{b',m_s}^j.
the base station slice total throughput is modeled and the goal is to maximize the total throughput and consider constraints.
Constraint C1 indicates that each user can only request one slice, constraint C2 indicates that each sub-channel can only be allocated to one user in a base station, constraint C3 indicates that the transmitting power of each base station cannot exceed the maximum transmitting power of each base station, constraint C4 ensures the QoS requirement of the slice, constraint C5 ensures the QoS requirement of the user, and constraint C6 indicates that the user meets the minimum transmission rate required by delay constraint.
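The rate, interference, and total-throughput computation of step 4 can be sketched as follows. The array shapes, the convention that index 0 is the serving base station, and the noise term are illustrative assumptions, not the patent's exact formulation.

```python
import numpy as np

def slice_throughput(power, gain, alloc, noise_psd, w_sub):
    """Total downlink throughput of the serving base station's slices.

    power[b, j]   : transmit power of BS b on subchannel j
    gain[b, u, j] : channel gain from BS b to user u on subchannel j
    alloc[u, j]   : 1 if the serving BS (index 0 here, an assumed
                    convention) assigns subchannel j to user u
    Interference comes from the other BSs on the same subchannel.
    """
    num_bs, num_users, num_sub = gain.shape
    total = 0.0
    for u in range(num_users):
        for j in range(num_sub):
            if alloc[u, j]:
                signal = power[0, j] * gain[0, u, j]
                interf = sum(power[b, j] * gain[b, u, j]
                             for b in range(1, num_bs))
                sinr = signal / (interf + noise_psd * w_sub)
                total += w_sub * np.log2(1.0 + sinr)
    return total

# Two BSs, one user, one subchannel; the second BS is silent, so the
# SINR is 3/(0 + 1) = 3 and the rate is log2(4) = 2 (w_sub = 1).
tot = slice_throughput(power=np.array([[1.0], [0.0]]),
                       gain=np.array([[[3.0]], [[5.0]]]),
                       alloc=np.array([[1]]),
                       noise_psd=1.0, w_sub=1.0)
```

The DRL agent's job is then to pick `alloc` and `power` so that this total is maximized while constraints C1 to C6 hold.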
Step 5, the base station calculates the reward r_t according to the current state s_t and makes a decision using an ε-greedy strategy. Specifically, because the state set and the action set are large, deep reinforcement learning is adopted to find the optimal action; the network input is the action a_t and the state s_t, and the output is the Q value of the action, i.e. Q_k(s_t, a_t). The target neural network is used to calculate the Q value Q_k(s_{t+1}, a) of the next state s_{t+1}, updated by the following expression:
Q_{k+1}(s_t, a_t) = Q_k(s_t, a_t) + α_k [r_{t+1} + γ max_{a∈A} Q_k(s_{t+1}, a) − Q_k(s_t, a_t)],
where α_k and γ are the learning rate and discount factor respectively, s_{t+1} and r_{t+1} represent the next state and the reward obtained after taking the action in state s_t, a represents an executable action in state s_{t+1}, A is the set of executable actions, and max_{a∈A} Q_k(s_{t+1}, a) is the maximum Q value over the action set A in state s_{t+1}. An ε-greedy strategy is employed when selecting actions: the action with the highest action value is executed with probability 1 − ε, and a random action following a uniform distribution is explored with probability ε. RL learns the best action for a state through exploration and exploitation.
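The ε-greedy selection and the temporal-difference update above can be sketched in tabular form (the deep version replaces the table with the Q network); states, actions, and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_row, epsilon):
    """With probability 1 - epsilon take the highest-valued action,
    otherwise explore a uniformly random one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

def q_update(Q, s, a, r, s_next, alpha, gamma):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((2, 2))                    # 2 states x 2 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
best = epsilon_greedy(Q[0], epsilon=0.0)   # greedy: picks action 1
```

Annealing ε from a large value toward a small one shifts the agent from exploration to exploitation as training progresses.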
Step 6, the state of the environment is updated to s_{t+1} and the reward to r_{t+1}. The values s_t, a_t, s_{t+1}, r_{t+1} are stored as experience for replay. The learner randomly draws a number of samples from the accumulated experience for learning and transmits the experience to the remaining base stations. Specifically, the slice optimization problem can be described as a separate Markov decision process (MDP). An MDP can be formalized as a 4-tuple <S, A, π, R>, where S is the state space, A is the action space, π is the policy space, and R is the immediate reward. At each step, the slicing agent takes an action a ∈ A based on the current policy π(a|s) and its observation s ∈ S; the underlying environment generates an immediate reward R, and the state transitions to a new state s' ∈ S. In our scenario we define the three components of the MDP, namely the state set S, the action set A, and the reward set R.
The state set S of the MDP is defined by taking the association relation between base station b and user u together with the channel gain as the state input of the agent. The state space is defined as:
s_t = {a_{b,u}, g_{b,u} : u ∈ U},
where a_{b,u} equals 1 if base station b and user u are associated, and 0 otherwise.
The action set A of the MDP is defined as follows: the agent observes the state information of the environment and selects an action in the action space. For base station b, its actions are defined as the subchannel allocation and the power allocation between the base station and its users. The RB allocation is denoted y_{b,m_s}^j = 1 if user m_s is associated with subchannel j, and y_{b,m_s}^j = 0 otherwise; the power allocated by base station b to subcarrier j is denoted P_j^b, j ∈ J.
The reward set R of the MDP indicates whether a behavior is positive or negative for the agent. The invention maximizes the total throughput; if a user's rate in the base station does not reach the minimum threshold, that user's throughput is included in the total throughput as a negative reward, shaped by a function T(x). The reward of base station b is defined accordingly,
where the parameter c in T(x) is a constant coefficient controlling the steepness of the curve.
The above shows that the goal of the agent of base station b is to maximize the overall slice throughput of base station b. However, due to the interaction between base station agents, a local resource allocation scheme may cause significant interference to other base stations. Thus, each agent must consider both its own utility and its impact on the other agents. We introduce a trade-off factor δ to obtain the final reward function as the weighted sum of the local utility Utility_b and the average utility of the other base station agents:
R_b = δ · Utility_b + (1 − δ) · (1/(B − 1)) Σ_{b'≠b} Utility_{b'},
where B is the total number of agents. This reward is not a global reward but represents only the local base station agent's reward; it is equivalent to replacing part of the local agent's utility with the average utility of the other agents, with the replaced proportion depending on the trade-off factor δ ∈ (0, 1), set according to the practical application scenario.
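The blended reward described above can be sketched as follows; the utility values are illustrative.

```python
def blended_reward(utilities, b, delta):
    """Final reward of agent b: trade-off factor delta in (0, 1) blends
    the local utility with the average utility of the other agents."""
    others = [u for i, u in enumerate(utilities) if i != b]
    avg_others = sum(others) / len(others)
    return delta * utilities[b] + (1.0 - delta) * avg_others

# Three agents; agent 0 weights its own utility at 0.8:
# 0.8 * 10 + 0.2 * ((4 + 6) / 2) = 9
r = blended_reward([10.0, 4.0, 6.0], b=0, delta=0.8)
```

A δ near 1 makes the agent selfish, while a smaller δ pushes it to avoid allocations that hurt neighboring base stations.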
Because multiple base station agents learn simultaneously, Ape-X distributed learning is applied here. Ape-X consists of three parts: a learner, actors, and an experience pool; the model learned by the learner comes from the experiences and policies collected by the actors. Since there are multiple base stations, each base station first collects the requirements and states of all slice users within it and transmits them to its corresponding actor. The actor outputs actions to the base station's slice manager according to the policy learned by the learner. During learning, rewards, states, and actions are transferred as experience to the experience pool.
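The learner / actors / experience-pool split can be sketched with a minimal single-process replay buffer; the capacity, transition format, and class name are illustrative, and the real Ape-X system additionally uses prioritized sampling and parallel processes.

```python
import random
from collections import deque

class ExperiencePool:
    """Shared pool the actors feed and the learner samples from
    (a minimal sketch of the experience-pool role in Ape-X)."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        """Actors push (state, action, reward, next_state) tuples."""
        self.buffer.append(transition)

    def sample(self, k):
        """The learner trains on a random batch of past experience."""
        return random.sample(list(self.buffer), k)

pool = ExperiencePool()
for t in range(100):                      # each base station's actor
    pool.add((t, t % 4, float(t), t + 1)) # pushes its transitions
batch = pool.sample(8)
```

Sharing one pool across base-station actors is what lets each local agent learn from experience gathered under the others' policies.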
Step 7, the loss function L(θ) is calculated from the experience replay pool, the weights θ are updated, and step 5 is repeated until the loss function L(θ) reaches a set convergence condition or the maximum number of iterations T is reached.
In summary, the present invention defines a mathematical formulation of RAN slice isolation in a multi-element scenario and ensures isolation between slices and users while the QoS requirements of each slice and user are satisfied. The invention also considers resource allocation decisions under imperfect CSI conditions, uses DRL to solve the stochastic optimization problem and overcome the randomness of the wireless channel, and aims to maximize the total throughput of the base station's slices while guaranteeing the throughput requirements of individual users. Because multiple base stations train and learn, an Ape-X distributed learning system is introduced to accelerate learning. Finally, the invention considers the interference between base stations, i.e. the resource allocation strategy of each base station depends not only on itself but also on the other base stations, which better matches practical application scenarios.
The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. A radio access network slice resource allocation method adapted to different channel characteristics, comprising the steps of:
step 1, establishing a multi-base station cellular network downlink scene;
step 2, the base station collects the slice minimum rate requirement R_s^min, the user minimum rate requirement R_{m_s}^min, the maximum delay threshold D_max tolerated by the user, and incomplete CSI condition information;
step 3, initializing the weights θ and Q(s, a; θ) of the deep Q-learning network (DQN);
step 4, initializing the action a_t, i.e. allocating resources, and calculating the user rate, the interference from base stations to users, and the total throughput of the base station's slices at this time;
step 5, the base station calculates the reward r_t according to the current state s_t and makes a decision using an ε-greedy strategy;
step 6, updating the state of the environment to s_{t+1} and the reward to r_{t+1};
step 7, calculating the loss function L(θ) from the experience replay pool, updating the weights θ, and repeating step 5 until the loss function L(θ) reaches a set convergence condition or the maximum number of iterations T is reached.
2. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 1, wherein: in step 1, the multi-base-station cellular network downlink scenario includes a set B = {1, …, b, …, B} of base stations (BSs), where adjacent BSs interfere with each other; the user set is denoted U = {1, …, u, …, U}, with U the total number of users; the total bandwidth W is divided into a set of identical subchannels J = {1, …, j, …, J}, where J is the total number of subchannels and each subchannel has bandwidth W/J; R_j denotes the bandwidth of subchannel j; the total number of slices is S, the slices are denoted S = {1, …, s, …, S}, and each slice s has a user set M_s = {1, 2, …, m_s, …, M_s}, where m_s is the m-th user in slice s and M_s is the total number of users in slice s; then Σ_{s∈S} M_s = U.
3. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 1, wherein step 2 further includes:
defining a binary variable x_{b,m_s}^s that equals 1 if user m_s requests slice s at base station b and 0 otherwise; to ensure that each user can request only one slice, constraint C1 is introduced:
C1: Σ_{s∈S} x_{b,m_s}^s ≤ 1, ∀ b, m_s;
defining a binary variable y_{b,m_s}^j that equals 1 if subchannel j is allocated to user m_s at base station b and 0 otherwise; to ensure that a subchannel can be allocated to only one user in a base station, constraint C2 is introduced:
C2: Σ_{s∈S} Σ_{m_s∈M_s} y_{b,m_s}^j ≤ 1, ∀ b, j;
to ensure that the transmit power of each base station does not exceed its maximum transmit power P_b^max, constraint C3 is introduced:
C3: Σ_{j∈J} P_j^b ≤ P_b^max, ∀ b.
4. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 3, wherein: in step 2, incomplete CSI at the base station is considered, and the worst-case rate of user m_s under this condition is calculated. With the channel gain modeled as $g_{m_s,j}^{b}=\hat g_{m_s,j}^{b}+\Delta g_{m_s,j}^{b}$, the worst-case rate under incomplete CSI is expressed as:

$R_{m_s,j}^{b,\mathrm{worst}}=\frac{W}{J}\log_2\!\left(1+\frac{p_{m_s,j}^{b}\left(\hat g_{m_s,j}^{b}-\Delta g_{m_s,j}^{b}\right)}{I_{m_s,j}^{b}+\sigma^{2}}\right)$

where $\hat g_{m_s,j}^{b}$ represents the estimated channel gain and $\Delta g_{m_s,j}^{b}$ represents the error in estimating the channel gain.
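Under the common robust-design convention that the worst case subtracts the estimation error from the estimated gain (an assumption here, since the patent's formula is lost in extraction), the worst-case rate can be sketched as:

```python
import math

def worst_case_rate(bw, p, g_hat, g_err, interference, noise):
    """Worst-case achievable rate under imperfect CSI (sketch).
    bw: subchannel bandwidth; p: transmit power; g_hat: estimated gain;
    g_err: gain estimation error; interference, noise: received power terms."""
    g_worst = max(g_hat - g_err, 0.0)        # pessimistic channel gain
    sinr = p * g_worst / (interference + noise)
    return bw * math.log2(1.0 + sinr)
```

A larger estimation error strictly lowers the guaranteed rate, which is the point of planning against the worst case.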
5. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 1, wherein: in step 3, in deep Q-learning (DQN), the training data is expressed as an action value and is referred to as the target value, and the loss function to be minimized is:

$L(\theta)=\mathbb{E}\left[\left(y_t-Q(s_t,a_t;\theta)\right)^2\right]$

where $y_t$ is the target value and θ represents the parameters of the neural network; the agent learns the action values by updating θ so that $Q(s_t,a_t;\theta)$ approaches $y_t$.
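A minimal sketch of this loss over a mini-batch sampled from the experience replay pool; `q_net` and `target_net` stand in for the online and target networks (here any callables mapping a batch of states to per-action Q values), which is an illustrative simplification:

```python
import numpy as np

def dqn_loss(q_net, target_net, batch, gamma):
    """Mean squared error between the target value y_t and Q(s_t, a_t; theta)."""
    states, actions, rewards, next_states = batch
    q_sa = q_net(states)[np.arange(len(actions)), actions]   # Q(s_t, a_t; theta)
    y_t = rewards + gamma * target_net(next_states).max(axis=1)  # target value
    return float(np.mean((y_t - q_sa) ** 2))
```

Using a separate target network to compute y_t keeps the regression target from shifting with every gradient step.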
6. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 1, wherein: in step 4, the achievable transmission rate from base station b to user m_s on subchannel j is calculated and expressed as:

$r_{m_s,j}^{b}=\frac{W}{J}\log_2\!\left(1+\frac{p_{m_s,j}^{b}\,g_{m_s,j}^{b}}{I_{m_s,j}^{b}+\sigma^{2}}\right)$

where $p_{m_s,j}^{b}$ represents the transmission power between base station b and user m_s on subchannel j, $g_{m_s,j}^{b}$ represents the channel gain between base station b and user m_s on subchannel j, and the total bandwidth of the system is W.
7. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 6, wherein: step 4 further comprises: each sub-channel bandwidth is equal, then each sub-channel bandwidth isCalculating the base station b to the user m on the sub-channel j s Interference of (1)>And is expressed as:
The total throughput of the base station slices is modeled, the goal is to maximize the total throughput, and the following constraints are considered:

$\max\ \sum_{b\in B}\sum_{s\in S}\sum_{m_s\in M_s}\sum_{j\in J}\beta_{m_s,j}^{b}\,r_{m_s,j}^{b}$

C1: $\sum_{b\in B}\sum_{s\in S}\alpha_{m_s,s}^{b}=1,\quad\forall m_s$

C2: $\sum_{s\in S}\sum_{m_s\in M_s}\beta_{m_s,j}^{b}\le 1,\quad\forall b\in B,\ \forall j\in J$

C3: $\sum_{j\in J}\sum_{s\in S}\sum_{m_s\in M_s}\beta_{m_s,j}^{b}\,p_{m_s,j}^{b}\le P_b^{\max},\quad\forall b\in B$

C4: $\sum_{m_s\in M_s}R_{m_s}\ge R_s^{\min},\quad\forall s\in S$

C5: $R_{m_s}\ge R_{m_s}^{\min},\quad\forall m_s$

C6: $R_{m_s}\ge \frac{D_{m_s}}{\tau_{m_s}^{\max}},\quad\forall m_s$

Constraint C1 indicates that each user can only request one slice; constraint C2 indicates that each subchannel can only be allocated to one user within a base station; constraint C3 indicates that the transmit power of each base station cannot exceed its maximum transmit power; constraint C4 ensures the QoS requirement of each slice; constraint C5 ensures the QoS requirement of each user; and constraint C6 gives the minimum transmission rate required for a user to meet its delay constraint (packet size $D_{m_s}$ over maximum tolerable delay $\tau_{m_s}^{\max}$).
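The interference, per-link rate, and total-throughput objective of claims 6-7 can be sketched together; the array shapes and names are illustrative assumptions, with `g[b, u, j]` the gain from base station b to user u on subchannel j:

```python
import numpy as np

def total_throughput(beta, p, g, bw_total, noise):
    """beta[b, u, j]: assignment; p[b, u, j]: power; g[b, u, j]: gain."""
    B, U, J = beta.shape
    bw = bw_total / J                            # equal subchannel bandwidth W/J
    tx = (beta * p).sum(axis=1)                  # tx[b, j]: power radiated by b on j
    # interference at (b, u, j): received power from all other base stations
    interf = np.einsum('bj,buj->uj', tx, g)[None, :, :] - tx[:, None, :] * g
    rate = bw * np.log2(1.0 + p * g / (interf + noise))
    return float((beta * rate).sum())            # objective: sum of served-link rates
```

Subtracting the own-cell term from the total received power gives exactly the sum over b' ≠ b in the interference expression.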
8. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 1, wherein: in step 5, deep reinforcement learning is adopted to find the optimal action; the network input is the action a_t and the state s_t, and the output is the Q value of the action, i.e. $Q_k(s_t,a_t)$; the target neural network is used to calculate the Q value $Q_k(s_{t+1},a)$ of the next state $s_{t+1}$, and the Q value is updated by the following expression:

$Q_{k+1}(s_t,a_t)=Q_k(s_t,a_t)+\alpha_k\left[r_{t+1}+\gamma\max_{a\in A}Q_k(s_{t+1},a)-Q_k(s_t,a_t)\right]$

where $\alpha_k$ and γ are the learning rate and discount factor respectively, $s_{t+1}$ and $r_{t+1}$ represent the next state and the reward obtained after taking an action in state $s_t$, a represents an executable action in state $s_{t+1}$, A is the set of executable actions, and $\max_{a\in A}Q_k(s_{t+1},a)$ represents the maximum Q value over the action set A in state $s_{t+1}$.
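The update rule itself can be illustrated in tabular form (the claim uses neural networks; a dictionary-backed Q table is a simplification used here only to show the arithmetic of the expression above):

```python
def q_update(Q, s, a, r_next, s_next, alpha, gamma):
    """One Q-learning step: Q[s][a] += alpha * (r + gamma * max_a' Q[s'][a'] - Q[s][a])."""
    td_target = r_next + gamma * max(Q[s_next].values())  # r_{t+1} + gamma * max_a Q(s_{t+1}, a)
    Q[s][a] += alpha * (td_target - Q[s][a])              # move Q(s_t, a_t) toward the target
    return Q[s][a]
```

The learning rate alpha controls how far each estimate moves toward the bootstrapped target, and gamma discounts future reward.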
9. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 1, wherein: in step 6, the slice optimization problem can be described as an independent Markov decision process, which can be formalized as a 4-tuple ⟨S, A, π, R⟩, where S is the state space, A is the action space, π is the policy space, and R is the direct reward. The state set S of the Markov decision process is defined as follows: the association relation between base station b and user u, together with the channel gain, is set as the state input of the agent, and the state space is defined as:

$s_t=\left\{c_{b,u},\,g_{b,u}\ :\ b\in B,\ u\in U\right\}$

where $c_{b,u}$ indicates the association between base station b and user u: it is 1 if there is an association, otherwise 0, and $g_{b,u}$ is the channel gain between base station b and user u.
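A small sketch of assembling this state input, assuming the association indicators and channel gains are simply flattened and concatenated (the exact encoding is not specified in the patent):

```python
import numpy as np

def build_state(assoc, gains):
    """assoc[b, u] in {0, 1}: association indicator; gains[b, u]: channel gain.
    Returns the flat state vector fed to the agent."""
    return np.concatenate([assoc.ravel(), gains.ravel()]).astype(np.float32)
```

With B base stations and U users this yields a fixed-length vector of 2·B·U features, which suits a feed-forward Q network.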
10. The radio access network slice resource allocation method adapted to different channel characteristics according to claim 9, wherein: in step 6, the final reward function is obtained through a trade-off factor δ as the weighted sum of the local utility $\mathrm{Utility}_b$ and the average utility of the other base station agents:

$R_b=\delta\,\mathrm{Utility}_b+(1-\delta)\,\frac{1}{B-1}\sum_{b'\in B,\,b'\neq b}\mathrm{Utility}_{b'}$
where B is the total number of agents.
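This weighted sum of the local utility and the average utility of the other agents can be sketched directly (the list-based interface is an illustrative assumption):

```python
def mixed_reward(utilities, b, delta):
    """Reward of agent b: delta * own utility + (1 - delta) * mean of the others."""
    others = [u for i, u in enumerate(utilities) if i != b]
    return delta * utilities[b] + (1 - delta) * sum(others) / len(others)
```

Setting delta near 1 makes each agent purely selfish, while smaller delta pushes agents toward cooperative behavior across the B base stations.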
CN202310741049.1A 2023-06-21 2023-06-21 Radio access network slice resource allocation method adapting to different channel characteristics Pending CN116828607A (en)

Publications (1)

Publication Number Publication Date
CN116828607A true CN116828607A (en) 2023-09-29

Family

ID=88121530

Similar Documents

Publication Publication Date Title
CN110267338B (en) Joint resource allocation and power control method in D2D communication
CN111414252B (en) Task unloading method based on deep reinforcement learning
Wang et al. Carrier load balancing and packet scheduling for multi-carrier systems
CN111314889A (en) Task unloading and resource allocation method based on mobile edge calculation in Internet of vehicles
CN111726811B (en) Slice resource allocation method and system for cognitive wireless network
KR101567368B1 (en) Apparatus and method for managing resource to decrasse inter cell interference in a broadband wireless commmunication system
US9723572B2 (en) Systems and methods for uplink power control and scheduling in a wireless network
CN113038616B (en) Frequency spectrum resource management and allocation method based on federal learning
CN110492955B (en) Spectrum prediction switching method based on transfer learning strategy
CN110121213B (en) Multi-service resource scheduling method and device
CN114340017B (en) Heterogeneous network resource slicing method with eMBB and URLLC mixed service
Elsayed et al. Deep reinforcement learning for reducing latency in mission critical services
CN116582860A (en) Link resource allocation method based on information age constraint
CN113395723A (en) 5G NR downlink scheduling delay optimization system based on reinforcement learning
CN112153744A (en) Physical layer security resource allocation method in ICV network
CN111935825A (en) Depth value network-based cooperative resource allocation method in mobile edge computing system
CN115103326A (en) Internet of vehicles task unloading and resource management method and device based on alliance game
Shekhawat et al. A reinforcement learning framework for qos-driven radio resource scheduler
Ng et al. QoS‐based radio network dimensioning for LTE networks with heavy real‐time traffic
Gao et al. Reinforcement learning based resource allocation in cache-enabled small cell networks with mobile users
US11864158B2 (en) Distributed method for allocating transmission resources to D2D terminals in a cellular access network
US11160035B2 (en) Centralized method for allocating transmission resources to D2D terminals in a cellular access network
CN116567667A (en) Heterogeneous network resource energy efficiency optimization method based on deep reinforcement learning
Bellone et al. Deep reinforcement learning for combined coverage and resource allocation in uav-aided ran-slicing
CN116347635A (en) NB-IoT wireless resource allocation method based on NOMA and multi-agent reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination