CN112566261A - Deep reinforcement learning-based uplink NOMA resource allocation method - Google Patents
- Publication number
- CN112566261A (application CN202011445582.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/50—Allocation or scheduling criteria for wireless resources
- H04W72/53—Allocation or scheduling criteria for wireless resources based on regulatory allocation policies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/04—Wireless resource allocation
- H04W72/044—Wireless resource allocation based on the type of the allocated resource
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/04—Wireless resource allocation
- H04W72/044—Wireless resource allocation based on the type of the allocated resource
- H04W72/0473—Wireless resource allocation based on the type of the allocated resource the resource being transmission power
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02E—REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
- Y02E40/00—Technologies for an efficient electrical power generation, transmission or distribution
- Y02E40/70—Smart grids as climate change mitigation technology in the energy generation sector
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses an uplink NOMA resource allocation method based on deep reinforcement learning. By selecting the optimal sub-channel allocation strategy and power allocation strategy under the condition that each user's minimum transmission rate is met, the method improves the energy efficiency of the whole system and effectively reduces the power consumed by transmission. The method is based on the deep Q-network in deep reinforcement learning, and adjusts the network parameters according to the feedback of the NOMA system, thereby realizing optimal sub-channel and power allocation. The method adapts the deep Q-network to the continuous resource allocation task through power discretization, reduces the output dimension of the network by using a distributed network structure, and further improves the performance of the whole resource allocation network. Compared with other methods, this method achieves better average overall energy efficiency and performs well under different transmission power limits.
Description
Technical Field
The invention relates to the fields of mobile communications and reinforcement learning, and in particular to an uplink NOMA wireless resource allocation method based on deep reinforcement learning.
Background
Fifth-generation (5G) communication networks are required to meet the rapidly increasing demand for wireless data traffic, support high-density mobile subscriber communications, and provide various wireless network services. The recently proposed Non-Orthogonal Multiple Access (NOMA) technology is considered an emerging technology that can effectively increase network capacity while meeting the demands of low latency, massive connectivity and high throughput. On one hand, compared with conventional Orthogonal Multiple Access (OMA), NOMA uses the Superposition Coding (SC) technique at the transmitting end to allocate the same sub-channel to multiple users at different power levels for simultaneous transmission, sharing the channel resources, and then uses the Successive Interference Cancellation (SIC) technique at the receiving end to cancel the interference. The spectrum efficiency and the system capacity are thereby greatly improved, making NOMA very suitable for future mobile communications.
On the other hand, since the performance gain of a NOMA system is closely related to how the sub-channels and the transmission power are allocated, the energy efficiency of the whole NOMA system can be maximized by designing a reasonable resource allocation scheme. A higher transmission rate is thus obtained with lower transmit power, reducing unnecessary resource waste while fully exploiting the advantages of the NOMA technology. In existing research, different approaches have been proposed to study the optimal resource allocation scheme of NOMA systems.
A search of the existing literature found the following. Manglayev et al. published a paper entitled "Optimal Power Allocation for Non-Orthogonal Multiple Access (NOMA)" in IEEE International Conference on Application of Information and Communication Technologies, Oct. 2016, pp. 1-4. This paper presents a power allocation algorithm that maximizes capacity in combination with a fairness factor, and simulations demonstrate that higher spectral efficiency can be achieved using NOMA than with the original OMA technology. Zhang et al. published a paper entitled "Energy-efficient transmission design in non-orthogonal multiple access" in IEEE Transactions on Vehicular Technology, Mar. 2017, vol. 66, no. 3, pp. 2852-2857. This paper proposes a power allocation strategy that maximizes energy efficiency while meeting the users' minimum rate requirements. In addition, a paper entitled "Downlink power allocation for CoMP-NOMA in multi-cell networks" published by M. S. Ali et al. in IEEE Transactions on Communications, Sep. 2018, vol. 66, no. 9, pp. 3982-3998, studies a downlink power allocation scheme for multi-cell coordinated multi-point NOMA, proposes a distributed power optimization algorithm to reduce the computational complexity, and analyzes the spectrum efficiency and energy efficiency of the multi-cell NOMA system through simulation.
These three papers focus only on the power allocation scheme in NOMA systems; however, the quality of the sub-channel allocation scheme also has a great influence on the overall system efficiency.
The search also found that C. L. Wang et al. published a paper entitled "Low-complexity Resource Allocation for Downlink Multi-carrier NOMA Systems" in IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, Sep. 2018, pp. 1-6, which, building on general power allocation studies, proposes a low-complexity joint sub-channel and power allocation method for NOMA systems. In this method, the optimal power allocation factor is obtained in closed form, and the optimal subcarrier is obtained from a low-complexity channel-gain criterion, achieving a system capacity better than the conventional orthogonal frequency division multiple access scheme. Although the method has low computational complexity, it cannot guarantee that the optimal resource allocation scheme is found.
A patent search found that Zhu Rong et al. of Nanjing University of Posts and Telecommunications invented "A resource allocation method in a downlink MIMO-NOMA network" (publication No. 109922487A). That invention discloses a resource allocation method for a downlink NOMA system: it clusters users according to their acquired channel state information, and then assigns beam directions to the clustered users using zero-forcing beamforming theory. With power allocation and channel allocation fixed in turn, the optimal channel allocation scheme and power allocation scheme are obtained with the Hungarian algorithm and a sub-gradient algorithm, respectively, iterating alternately until the user capacity converges, thereby obtaining the optimal resource allocation scheme. In addition, the search found that Down Jie et al. of South China University of Technology invented "A deep-learning-based resource allocation method for an energy-carrying NOMA system" (publication No. 108924935A). That invention discloses a joint resource allocation method based on deep learning, which minimizes the transmission power while meeting the users' Quality of Service (QoS). The method first constructs a mathematical optimization problem of joint resource allocation based on transmission power minimization in the energy-carrying NOMA system, including the optimization variables, the optimization objective function and the constraint conditions. Then, a large amount of sample data is obtained with a genetic algorithm, and a deep belief network is trained to capture the latent mapping between the inputs and outputs of the data samples. Finally, in the operation stage, the trained network directly outputs the optimal carrier and power allocation strategy.
Once the network is trained, this method can efficiently obtain the resource allocation scheme, realizing low-power-consumption resource allocation and better meeting low-latency requirements.
Although these existing resource allocation schemes improve the energy efficiency or other metrics of the entire NOMA system to some extent, they have certain limitations. For example, for a conventional model-based resource allocation scheme, the computational complexity of the optimization process is high, and the iterative algorithms take a long time. Although deep-learning-based optimization algorithms reduce the computational complexity, a large amount of time is still needed to construct enough sample data to train the networks to good performance.
Disclosure of Invention
The technical problem to be solved by the present invention is to overcome the defects of the prior art and provide a joint sub-channel and power allocation method for an uplink NOMA multi-user scenario based on Deep Reinforcement Learning (DRL), which maximizes the energy efficiency of the whole system while guaranteeing the users' minimum rate requirements. As a major branch of machine learning, DRL combines traditional reinforcement learning with the neural networks of deep learning: it collects the feedback information of the system through continuous interaction and dynamically adjusts its parameters to make better decisions, thereby maximizing the performance of the system. DRL therefore needs neither a mathematical model nor prior knowledge of the system, making it well suited to the dynamic resource allocation problem of an unknown system. In this method, a Deep Q-Network (DQN) in DRL is used to first select a suitable sub-channel allocation strategy according to the users' channel gain information, then select a suitable power allocation strategy, and finally update the parameters of the allocation strategies according to the feedback of the system, thereby achieving optimal sub-channel and power allocation and improving the energy efficiency of the system.
The invention is realized by the following technical scheme:
The invention relates to a DRL-based sub-channel and power allocation method for an uplink NOMA system, which solves the resource allocation problem of the uplink of a multi-user NOMA wireless communication system and comprises the following steps:
S1, state acquisition: at time t, the base station acquires the channel gain information of all users in the cell on the different sub-channels as the current state s_t.
S2, sub-channel allocation: the sub-channel allocation network at the base station selects the optimal sub-channel allocation scheme b(t) according to the ε-greedy strategy.
S3, power allocation: after the sub-channel allocation scheme b(t) is obtained, the power allocation network at the base station is activated and selects the optimal power allocation scheme p(t) according to the ε-greedy strategy.
S4, feedback acquisition: all users transmit data to the base station on the given sub-channels at the given power according to the resource allocation scheme output by the two networks. The base station returns corresponding feedback to the resource allocation networks.
S5, parameter update: according to the obtained feedback, the neural networks of all DQN units in the sub-channel allocation network and the power allocation network are trained based on the experience replay and fixed target Q-network strategies, and the network parameters are updated so that better resource allocation schemes are selected.
Step S1 comprises the following specific steps:
At time t, the base station acquires the channel gain information of all users, and the state s_t at the current time is expressed as the channel gains of all users on the different sub-channels. Let g_{k,m}(t) denote the channel gain of user m on sub-channel k; then s_t is represented as follows:
s_t = {g_{1,1}(t), g_{2,1}(t), ..., g_{k,m}(t), ..., g_{K,M}(t)}
where K and M denote the numbers of sub-channels and users in the cell, respectively, and g_{k,m}(t) includes both large-scale and small-scale fading effects. Large-scale fading refers to the fading caused by fixed obstacles shadowing the channel path between the user terminal and the base station, and comprises the average path loss and shadow fading; small-scale fading is caused by multipath effects, and its effect at the user terminal is assumed to follow a Rayleigh distribution.
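A minimal sketch of constructing the state s_t described above, assuming the channel gain is composed as g = β·|h|² (large-scale times Rayleigh-faded small-scale power gain); the sizes K, M and the β values are illustrative placeholders, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_gains(K, M, beta):
    # Small-scale fading h_{k,m}: complex Gaussian, so |h| is Rayleigh
    # distributed and |h|^2 is the small-scale power gain, as the patent assumes.
    h = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
    # Composition g = beta * |h|^2 (large-scale times small-scale) is our assumption.
    return beta * np.abs(h) ** 2

K, M = 4, 6                                # illustrative sizes, not from the patent
beta = np.full((K, M), 1e-3)               # placeholder large-scale fading values
s_t = channel_gains(K, M, beta).ravel()    # state vector {g_{1,1}(t), ..., g_{K,M}(t)}
```

The flattened K·M vector matches the form of s_t fed to the DQN units below.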
Step S2 comprises the following specific steps:
After the current state s_t is obtained, s_t is passed to the sub-channel allocation network at the base station. The network consists of one sub-channel allocation DQN unit. The unit comprises two neural networks: a Q network Q(s, a; w) and a target Q network Q(s, a; w^-), where w and w^- denote the parameters of the two neural networks, respectively.
Given the obtained state s_t and the network parameter w, the Q network in the sub-channel allocation DQN unit estimates the Q values of all sub-channel allocation schemes, i.e. Q(s_t, a; w) for every a in A_1, where A_1 denotes the set of all possible sub-channel allocation schemes.
The sub-channel allocation DQN unit then selects one of the sub-channel allocation schemes as the current best scheme following the ε-greedy policy, which here means: with probability 1-ε, randomly select a scheme from A_1 as the optimal sub-channel allocation scheme b(t) at time t; or, with probability ε, select the scheme with the largest Q value, i.e.:
b(t) = argmax_{a in A_1} Q(s_t, a; w)
where 0 < ε < 1. The sub-channel allocation network then outputs the sub-channel allocation scheme b(t) at time t.
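The selection rule above can be sketched as follows. Note that the patent's convention is mirrored from the usual textbook one: here ε is the probability of exploiting (taking the argmax-Q scheme) and 1-ε the probability of exploring:

```python
import random

def epsilon_greedy(q_values, eps, rng):
    """Scheme selection following the patent's convention: with probability
    eps exploit (pick the scheme with the largest Q value), otherwise
    explore (pick a random scheme index from A_1)."""
    if rng.random() < eps:
        return max(range(len(q_values)), key=lambda a: q_values[a])
    return rng.randrange(len(q_values))

# With eps = 1.0 the choice is always greedy, so the argmax index is returned.
action = epsilon_greedy([0.1, 0.9, 0.4], eps=1.0, rng=random.Random(0))
```

The same routine serves both the sub-channel allocation unit (over A_1) and each power allocation unit (over A_2).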
Step S3 comprises the following specific steps:
After the sub-channel allocation scheme b(t) is obtained, the power allocation network at the base station is activated. The network consists of M power allocation DQN units. Each power allocation DQN unit contains the same two neural networks as the sub-channel allocation unit, but with different parameters.
Using the same state s_t as input, the Q network of the m-th power allocation DQN unit follows the ε-greedy strategy, in the same way as in S2, to select one scheme from the set A_2 of all power allocation schemes and output it as the m-th user's transmission power p_m(t).
The power allocation network then combines the outputs of all M power allocation DQN units into the power allocation scheme at time t, namely:
p(t) = {p_1(t), p_2(t), ..., p_M(t)}
the S4) comprises the following specific steps:
resource allocation scheme for all users according to two network outputsData is transmitted to the base station on a given subchannel at a given power. If the transmission rate of each user can meet the minimum rate requirement, the base station calculates the sum of the energy efficiency of all users as the feedback r at the current time ttTo a subchannel distribution network and a power distribution network. If not, the feedback obtained by the two resource allocation networks is 0, i.e. the feedback is not satisfied
Wherein r istIndicating feedback at time t, RminIndicating a minimum rate requirement, Ek,mAnd Rk,mRespectively, energy efficiency and transmission rate of user m on subchannel k. Thereafter, the base station acquires new channel gain information as a new state s due to the movement of the usert+1。
Step S5 comprises the following specific steps:
According to the obtained system feedback r_t, the neural networks of all DQN units in the sub-channel allocation network and the power allocation network are trained based on the experience replay and fixed target Q-network strategies, and the network parameters are updated so that better resource allocation schemes are selected. The specific parameter update steps include:
(1) store the tuple (s_t, a_t, r_t, s_{t+1}) at each time step into a replay memory D as training samples for the neural networks;
(2) randomly sample N tuples (s_i, a_i, r_i, s_{i+1}) from D to train the neural networks;
(3) for the sub-channel allocation network, update the parameter w of the Q network in the sub-channel allocation DQN unit by minimizing the loss function via stochastic gradient descent. The loss function is expressed as:
L(w) = (1/N) Σ_i (y_i - Q(s_i, a_i; w))^2
Using stochastic gradient descent, the update of the parameter w is expressed as:
w ← w + α (y_i - Q(s_i, a_i; w)) ∇_w Q(s_i, a_i; w)
where y_i denotes the target Q value produced by the target Q network Q(s, a; w^-) within the DQN unit, and α denotes the learning rate;
(4) for the power allocation network, minimize the loss functions of the M power allocation DQN units with the same stochastic gradient descent method as in (3), and update the neural network parameters. For the m-th power allocation unit, the loss function is expressed as:
L(w_m) = (1/N) Σ_i (y_i - Q(s_i, a_i; w_m))^2
where m = 1, 2, ..., M. The corresponding network parameters are then updated by stochastic gradient descent;
(5) for the M+1 target Q networks in all resource allocation DQN units, at fixed intervals assign the parameters w of the corresponding Q networks to the target parameters w^-, thereby updating the target Q network parameters.
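The S5 update loop can be sketched for a single DQN unit as follows. A linear Q-function Q(s; w) = W s stands in for the neural network purely to keep the update rule visible; the class name, sizes, and hyperparameter values are our own illustrative choices, and the standard DQN discount factor γ is assumed:

```python
import numpy as np

rng = np.random.default_rng(1)

class TinyDQN:
    """Minimal stand-in for one DQN unit with experience replay and a
    periodically synced fixed target network, mirroring steps (1)-(5)."""
    def __init__(self, n_state, n_action, gamma=0.9, alpha=0.01, sync_every=100):
        self.W = rng.normal(0.0, 0.1, (n_action, n_state))  # Q-network params w
        self.W_tgt = self.W.copy()                          # target params w^-
        self.gamma, self.alpha, self.sync_every = gamma, alpha, sync_every
        self.memory, self.steps = [], 0

    def store(self, s, a, r, s_next):
        self.memory.append((s, a, r, s_next))               # replay memory D

    def train_step(self, batch_size=4):
        idx = rng.choice(len(self.memory), size=batch_size) # random minibatch
        for s, a, r, s_next in (self.memory[i] for i in idx):
            y = r + self.gamma * (self.W_tgt @ s_next).max()  # target Q value y_i
            td = y - self.W[a] @ s                            # TD error
            self.W[a] += self.alpha * td * s                  # SGD on the squared loss
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.W_tgt = self.W.copy()                        # fixed-Q-value sync

agent = TinyDQN(n_state=3, n_action=2)
s, s_next = np.ones(3), np.zeros(3)
for _ in range(8):
    agent.store(s, 0, 1.0, s_next)
agent.train_step()
```

In the patent's architecture this update is applied to the one sub-channel allocation unit and each of the M power allocation units, for M+1 target networks in total.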
Compared with the prior art, the invention has the following beneficial effects: 1) the invention is a DRL-based, model-free joint sub-channel and power allocation method; it has low computational complexity, efficiently obtains an optimal resource allocation scheme, improves the energy efficiency of the uplink NOMA system, and achieves good performance under different transmission power limits. 2) In order to apply the DQN to the power allocation task, the invention improves on the conventional DQN by proposing a discretized, distributed DQN structure, which reduces the output dimension of the network and further improves the performance of the whole power allocation network.
Drawings
FIG. 1 is a schematic diagram of an uplink multi-user NOMA system according to the present invention;
fig. 2 is a frame diagram of a DRL-based joint sub-channel and power allocation method according to the present invention;
FIG. 3 is a graphical illustration of the loss function over time for different learning rates according to the method of the present invention;
fig. 4 is a graph comparing the average total energy efficiency of the DRL-based joint subchannel and power allocation method of the present invention with other methods;
fig. 5 is a diagram illustrating average total energy efficiency of the DRL-based joint subchannel and power allocation method of the present invention and other methods under different transmission power constraints.
Detailed Description
The following is a detailed description of an embodiment of the present invention, implemented on the premise of the technical solution of the invention, with a detailed implementation manner and a specific operation process; however, the protection scope of the present invention is not limited to the following embodiment. All other embodiments that a person skilled in the art can derive from any embodiment of the invention without creative effort fall within the protection scope of the invention.
The invention relates to a DRL-based joint sub-channel and power allocation method for an uplink NOMA system. As shown in fig. 1, the base station in the NOMA wireless communication system is located at the center of a cell, and the sub-channel allocation network and the power allocation network of the present invention are both located in a DRL controller at the base station. The M users are randomly distributed in the cell and move randomly between time slots. The total bandwidth of the base station is divided equally into K mutually orthogonal sub-channels, and each sub-channel may serve multiple users simultaneously. The maximum transmission power of each user terminal is P_max. Let b_{k,m}(t) and p_{k,m}(t) denote the sub-channel allocation flag and the allocated power of user m on sub-channel k at time t, respectively, where b_{k,m}(t) = 1 means that user m is allocated to sub-channel k at time t, and b_{k,m}(t) = 0 otherwise.
The embodiment is realized by the following steps:
S1) State acquisition: at time t, the base station acquires the channel gain information of all users in the cell on the different sub-channels as the current state s_t.
Let g_{k,m}(t) denote the channel gain of user m on sub-channel k at time t. It consists of two parts: the large-scale fading β_{k,m}(t) and the small-scale fading h_{k,m}(t) at time t. Large-scale fading refers to the fading caused by fixed obstacles shadowing the channel path between the user terminal and the base station, and comprises the average path loss and shadow fading; small-scale fading is caused by multipath effects, and its effect at the user terminal is assumed to follow a Rayleigh distribution. Then g_{k,m}(t) can be expressed as:
g_{k,m}(t) = β_{k,m}(t) |h_{k,m}(t)|^2
The state s_t at the current time t is represented as follows:
s_t = {g_{1,1}(t), g_{2,1}(t), ..., g_{k,m}(t), ..., g_{K,M}(t)}
S2) Sub-channel allocation: according to the obtained s_t, the sub-channel allocation network within the DRL controller at the base station follows the ε-greedy policy to select the optimal sub-channel allocation scheme
b(t) = {b_{1,1}(t), b_{2,1}(t), ..., b_{k,m}(t), ..., b_{K,M}(t)}
where b_{k,m}(t) takes the value 0 or 1, and all possible allocation schemes constitute the sub-channel allocation scheme set A_1.
The state s_t obtained by the base station is passed to the sub-channel allocation network within the DRL controller, which consists of one sub-channel allocation DQN unit. The unit comprises two neural networks: a Q network Q(s, a; w) and a target Q network Q(s, a; w^-), where w and w^- denote the parameters of the two networks, respectively. The Q network is used to estimate the Q value of the selected action, and the target Q network is used to generate a target Q value for training the network parameters.
Using the obtained s_t as input, the Q network in the sub-channel allocation DQN unit outputs the estimated Q values Q(s_t, a; w) of all sub-channel allocation schemes a in A_1 using the parameter w. After all estimated Q values are obtained, the sub-channel allocation DQN unit follows the ε-greedy strategy to select one scheme from A_1 as the optimal sub-channel allocation scheme b(t) at the current time t.
The ε-greedy strategy here is: with probability 1-ε, randomly select a scheme from A_1; or, with probability ε, select the scheme with the largest estimated Q value, i.e.
b(t) = argmax_{a in A_1} Q(s_t, a; w)
where 0 < ε < 1. The smaller ε is, the more likely the base station is to try other allocation schemes; the larger ε is, the more likely the base station is to select the allocation scheme with the largest Q value.
The sub-channel allocation network then outputs the optimal sub-channel allocation scheme b(t) at time t.
S3) Power allocation: after the sub-channel allocation scheme b(t) is obtained, the power allocation network in the DRL controller at the base station is activated and selects the optimal power allocation scheme p(t) according to the ε-greedy strategy.
The power allocation scheme p(t) is expressed in terms of the power p_{k,m}(t) allocable to each user on the different sub-channels:
p(t) = {p_{1,1}(t), p_{2,1}(t), ..., p_{k,m}(t), ..., p_{K,M}(t)}
where 0 ≤ p_{k,m}(t) ≤ P_max. Since only the transmission power on the sub-channel actually allocated to user m needs to be decided, the power of user m on the other sub-channels need not be considered; therefore let p_{k,m}(t) = 0 whenever b_{k,m}(t) = 0. This reduces the output dimensionality of the DQN units and improves performance.
Furthermore, since the power interval available for allocation is continuous, the power has to be discretized to fit the input and output of the DQN. However, power discretization causes an exponential increase of the output dimension, so the scheme uses a distributed architecture to solve this problem.
In this scheme, the power allocation network in the DRL controller comprises M power allocation DQN units, each responsible for the power allocation task of one user. The power allocation scheme is then expressed in the form:
p(t) = {p_1(t), p_2(t), ..., p_M(t)}
where p_m(t) denotes the power selected by the m-th power allocation DQN unit at time t. Assuming the power is discretized into L levels, there are L selectable powers, which constitute the power allocation scheme set A_2.
After the sub-channel allocation scheme b(t) is obtained in S2, the M power allocation DQN units within the power allocation network are activated. Each power allocation DQN unit contains the same two neural networks as the sub-channel allocation DQN unit described above, but with different parameters. Using the same state s_t as input, the Q network of the m-th power allocation unit outputs the estimated Q values and, following the ε-greedy policy, selects one power level as the transmission power p_m(t) of the m-th user. The powers output by the M units are combined into the power allocation scheme p(t), which is output together with the optimal sub-channel allocation scheme b(t) at time t.
S4) feedback acquisition: according to the resource allocation scheme $(\mathbf{b}(t), \mathbf{p}(t))$ output by the sub-channel allocation network and the power allocation network, all users transmit data to the base station on the given sub-channels at the given powers. The base station returns the sum of the energy efficiencies of all users as feedback.
Once the sub-channel allocation scheme $\mathbf{b}(t)$ and the power allocation scheme $\mathbf{p}(t)$ are known, all values $b_{k,m}(t)$ and $p_{k,m}(t)$ are determined. According to the uplink NOMA transmission principle, with $g_{k,m}(t)$ denoting the channel gain of user m on sub-channel k and SIC decoding users in descending order of channel gain, the signal to interference plus noise ratio of user m on sub-channel k is expressed as follows:

$$\gamma_{k,m}(t) = \frac{b_{k,m}(t)\, p_{k,m}(t)\, g_{k,m}(t)}{\displaystyle\sum_{m'\,:\, g_{k,m'}(t) < g_{k,m}(t)} b_{k,m'}(t)\, p_{k,m'}(t)\, g_{k,m'}(t) + \sigma^2},$$

where $\sigma^2$ represents the variance of the Gaussian white noise. Using normalized bandwidth, the corresponding transmission rate is then expressed as:
$$R_{k,m}(t) = \log\bigl(1 + \gamma_{k,m}(t)\bigr).$$
The uplink energy efficiency of user m on sub-channel k is:

$$E_{k,m}(t) = \frac{R_{k,m}(t)}{p_{k,m}(t) + P_m},$$

where $P_m$ represents the portion of energy consumed by the operation of the user equipment itself (circuit power).
The feedback at time t is defined as the sum of the energy efficiencies of all users on all sub-channels at the current time. If the transmission rate $R_{k,m}(t)$ of every user satisfies the minimum rate requirement $R_{\min}$, the base station calculates the sum of the energy efficiencies of all users as the feedback $r_t$ at the current time t and returns it to the sub-channel allocation unit and all power allocation units. Otherwise, all resource allocation units receive the feedback $r_t = 0$, i.e.:

$$r_t = \begin{cases} \displaystyle\sum_{k=1}^{K}\sum_{m=1}^{M} E_{k,m}(t), & \text{if } R_{k,m}(t) \ge R_{\min} \text{ for all users}, \\ 0, & \text{otherwise.} \end{cases}$$
Then, as the users move, the channel gains of all users change, and the base station acquires the channel gain information of all users again as the new state $s_{t+1}$.
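The feedback computation of S4 can be sketched as follows (a simplified one-sub-channel illustration; the variable names and the SIC decoding order by channel gain are assumptions consistent with standard uplink NOMA, not code from the patent):

```python
import math

def sinr(power, gain, interferers, noise_var):
    """SINR of one user on a sub-channel. `interferers` lists (power, gain)
    pairs of co-channel users decoded after this user under SIC
    (i.e. users with weaker channel gains)."""
    interference = sum(p * g for p, g in interferers)
    return power * gain / (interference + noise_var)

def reward(users, r_min, noise_var):
    """Sum of user energy efficiencies if every user's rate meets r_min,
    and 0 otherwise, mirroring the definition of r_t above.
    `users` is a list of dicts with keys p, g, interferers, circuit_power
    (a hypothetical structure for illustration)."""
    total = 0.0
    for u in users:
        gamma = sinr(u["p"], u["g"], u["interferers"], noise_var)
        rate = math.log2(1.0 + gamma)                  # normalized bandwidth
        if rate < r_min:
            return 0.0                                 # minimum-rate requirement violated
        total += rate / (u["p"] + u["circuit_power"])  # energy efficiency E_{k,m}
    return total
```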
S5) parameter update: according to the system feedback $r_t$ obtained in S4, the neural networks of all DQN units in the sub-channel allocation network and the power allocation network are trained based on the two strategies of experience replay and fixed target Q network, and the network parameters are updated so as to better select resource allocation schemes. The specific steps of the parameter update are:
(1) store $(s_t, a_t, r_t, s_{t+1})$ at each time step into a replay memory D as training samples for the neural networks;
(2) randomly select N groups of samples $(s_i, a_i, r_i, s_{i+1})$ from D to train the neural networks;
(3) for the sub-channel allocation network, the parameter w of the Q network of the sub-channel allocation DQN unit is updated by minimizing a loss function through stochastic gradient descent. The loss function of the sub-channel allocation DQN unit is expressed as follows:

$$L(w) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - Q(s_i, a_i; w)\bigr)^2.$$

Using the stochastic gradient descent method, the update of the parameter w is represented as:

$$w \leftarrow w + \alpha\,\bigl(y_i - Q(s_i, a_i; w)\bigr)\,\nabla_w Q(s_i, a_i; w),$$

where $y_i = r_i + \lambda \max_{a'} Q(s_{i+1}, a'; w^-)$ represents the target Q value produced by the target Q network $Q(s, a; w^-)$ within the DQN unit (with λ the discount factor), and α represents the learning rate.
(4) For the power allocation network, the same stochastic gradient descent method as in (3) is used to minimize the loss functions of the M power allocation DQN units and update their neural network parameters. For the m-th power allocation unit, the loss function is expressed as follows:

$$L(w_m) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_{i,m} - Q(s_i, a_{i,m}; w_m)\bigr)^2.$$

Using the stochastic gradient descent method, the parameter update of the m-th power allocation unit is represented as:

$$w_m \leftarrow w_m + \alpha\,\bigl(y_{i,m} - Q(s_i, a_{i,m}; w_m)\bigr)\,\nabla_{w_m} Q(s_i, a_{i,m}; w_m),$$

where $m = 1, 2, \ldots, M$.
(5) For the M+1 target Q networks in all resource allocation DQN units, the parameter w of the corresponding Q network is assigned to the target network parameter $w^-$ at a fixed interval of W time steps, i.e. $w^- \leftarrow w$, thereby updating the target Q network parameters.
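Steps (1)-(5) above can be sketched end to end. The fragment below substitutes a linear Q function for the patent's neural networks so that the experience replay, SGD update, and fixed-target synchronization are visible in a few lines (the class and parameter names are illustrative assumptions, not the patent's implementation):

```python
import random
from collections import deque

class LinearDQNUnit:
    """Toy DQN unit: linear Q(s, a; w) with experience replay and a target network."""

    def __init__(self, n_features, n_actions, lr=0.01, discount=0.9):
        self.w = [[0.0] * n_features for _ in range(n_actions)]   # Q network parameters w
        self.w_target = [row[:] for row in self.w]                # target parameters w^-
        self.lr, self.discount = lr, discount
        self.memory = deque(maxlen=10000)                         # replay memory D

    def q(self, w, s):
        """Q values of all actions for state s under parameters w (linear model)."""
        return [sum(wi * si for wi, si in zip(row, s)) for row in w]

    def store(self, s, a, r, s_next):        # step (1): store (s_t, a_t, r_t, s_{t+1})
        self.memory.append((s, a, r, s_next))

    def train(self, batch_size):             # steps (2)-(4): sample and do SGD
        batch = random.sample(self.memory, min(batch_size, len(self.memory)))
        for s, a, r, s_next in batch:
            y = r + self.discount * max(self.q(self.w_target, s_next))  # target Q value y_i
            td_error = y - self.q(self.w, s)[a]
            for j, sj in enumerate(s):       # gradient step on the squared loss
                self.w[a][j] += self.lr * td_error * sj

    def sync_target(self):                   # step (5): w^- <- w every W time steps
        self.w_target = [row[:] for row in self.w]
```

In the patent's architecture, one such unit would handle sub-channel allocation and M further units the per-user power allocation, all sharing the state $s_t$ and the common feedback $r_t$.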
Fig. 2 is a frame diagram of a method for joint sub-channel and power allocation based on DRL according to the present invention.
In this embodiment, a multi-user uplink NOMA scenario is considered, joint sub-channel and power allocation is optimized for all users, and the main parameters of the simulation scenario are shown in Table 1.
TABLE 1 simulation scenario principal parameters
FIG. 3 shows the loss function over time for different learning rates under the method of the present invention. From top to bottom, the curves correspond to the learning rate α being set to 0.001, 0.005 and 0.01, respectively. The simulation results show that the algorithm of the invention has good convergence. As shown in FIG. 3, the loss functions for the three learning rates are initially large, decrease rapidly as the number of time slots increases, and all converge within 20 steps. In particular, when α = 0.01, only a few steps are required for the loss function to reach its minimum and then stabilize. Such a learning rate therefore provides a faster convergence rate in minimizing the loss function, so that the prediction of the Q value becomes more accurate and the performance of the network improves.
Fig. 4 compares the average energy efficiency of the DRL-based joint sub-channel and power allocation method of the present invention with other methods. From top to bottom, the curves correspond to the DRL-based resource allocation method proposed by the present invention (DQN), a method using exhaustive search and random transmission power (OptRP), a method using exhaustive search and maximum transmission power (OptMP), and a method using random sub-channels and maximum transmission power (RCMP). Here, exhaustive search refers to traversing all sub-channel schemes and selecting the sub-channel allocation scheme that yields the highest energy efficiency. It should be noted that, to better show the simulation results, the total energy efficiency is a running average taken every 100 steps. It can be seen from the figure that the energy efficiency of the NOMA system applying the resource allocation scheme of the present invention is much higher than that of the other methods. The method of the invention dynamically selects the transmission power according to the real-time channel information of the users and adaptively adjusts the resource allocation scheme; on the basis of meeting the minimum rate requirement, unnecessary transmission power is reduced, so that higher energy efficiency can be achieved. It can also be seen by comparison that the energy efficiency obtained using exhaustive search far exceeds that obtained using random sub-channels, which illustrates that the sub-channel assignment has a significant impact on the performance gain of the overall NOMA system.
Fig. 5 shows the average energy efficiency of the DRL-based joint sub-channel and power allocation method of the present invention and of the other methods under different transmission power constraints. The figure shows the average energy efficiency of each scheme over all time slots under different maximum transmission power constraints. It can be seen that as the maximum transmission power increases, the average energy efficiency of the proposed method also increases and approaches a maximum value, while the average energy efficiency of the other three methods first increases and then decreases to different degrees. Furthermore, the method of the present invention outperforms the other methods under most maximum transmission power conditions.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions deviate from the technical solutions of the embodiments of the present invention.
Claims (6)
1. An uplink NOMA resource allocation method based on deep reinforcement learning is characterized by comprising the following steps:
S1, state acquisition: at time t, the base station acquires the channel gain information of all users in the cell on different sub-channels as the current state st;
S2, sub-channel allocation: the sub-channel allocation network at the base station selects the optimal sub-channel allocation scheme according to an ε-greedy strategy;
S3, power allocation: after the sub-channel allocation scheme is obtained, the power allocation network at the base station is activated, and the optimal power allocation scheme is selected according to the ε-greedy strategy;
S4, feedback acquisition: all users transmit data to the base station on the given sub-channels at the given powers according to the resource allocation scheme; the base station returns the sum of the energy efficiencies of all users as feedback;
S5, parameter update: according to the system feedback rt obtained in S4, the neural networks within the sub-channel allocation DQN unit and all power allocation DQN units are trained based on the two strategies of experience replay and fixed target Q network, and the network parameters are updated to better select resource allocation schemes.
2. The deep reinforcement learning-based uplink NOMA resource allocation method according to claim 1, wherein the channel gain information in S1 includes large-scale fading and small-scale fading; at time t, the channel gain information of all users on different sub-channels constitutes the state st.
3. The deep reinforcement learning-based uplink NOMA resource allocation method according to claim 1, wherein the sub-channel allocation in S2 comprises the following specific steps:
after the current state st is obtained, st is transmitted to the sub-channel allocation DQN unit of the base station; based on the obtained state st and the network parameters w, the Q network Q(s, a; w) in the unit estimates the Q values Q(st, a; w), a ∈ A1, of all sub-channel allocation schemes, where A1 represents the set of all sub-channel allocation schemes;
the sub-channel allocation DQN unit then selects one of all sub-channel allocation schemes according to the ε-greedy strategy, which is as follows: with probability 1 − ε, a sub-channel allocation scheme is randomly selected from A1; with probability ε, the scheme with the maximum Q value is selected, that is: $\mathbf{b}(t) = \arg\max_{a \in A_1} Q(s_t, a; w)$, where 0 < ε < 1.
4. The deep reinforcement learning-based uplink NOMA resource allocation method according to claim 1, wherein the specific steps of power allocation in S3 are:
after the sub-channel allocation scheme is obtained, the M power allocation DQN units in the power allocation network at the base station are activated; using the same state st as input, the Q network of the m-th power allocation DQN unit estimates the corresponding Q values, and then selects one power from the set of all power allocation schemes according to the ε-greedy policy as the transmission power pm(t) of the m-th user; the M output powers are then combined into the power allocation scheme, namely: $\mathbf{p}(t) = [p_1(t), p_2(t), \ldots, p_M(t)]$.
5. the deep reinforcement learning-based uplink NOMA resource allocation method according to claim 1, wherein the feedback acquisition in S4 comprises the specific steps of:
all users transmit data to the base station on the given sub-channels at the given powers according to the resource allocation scheme selected by the sub-channel allocation network and the power allocation network; if the transmission rate of each user meets the minimum rate requirement, the base station calculates the sum of the energy efficiencies of all users as the feedback rt at the current time t and returns it to the sub-channel allocation unit and all power allocation units; otherwise, all resource allocation units receive a feedback of 0, i.e.:

$$r_t = \begin{cases} \displaystyle\sum_{k}\sum_{m} E_{k,m}(t), & \text{if } R_{k,m}(t) \ge R_{\min} \text{ for all users}, \\ 0, & \text{otherwise}, \end{cases}$$

where rt denotes the feedback at time t, Rmin denotes the minimum rate requirement, and Ek,m and Rk,m respectively denote the energy efficiency and the transmission rate of user m on sub-channel k; as the users move, the base station acquires new channel gain information st+1.
6. The deep reinforcement learning-based uplink NOMA resource allocation method according to claim 1, wherein the specific step of updating S5 parameter includes:
(1) store $(s_t, a_t, r_t, s_{t+1})$ at each time step into a replay memory D as training samples for the neural networks;
(2) randomly select N groups of samples $(s_i, a_i, r_i, s_{i+1})$ from D to train the neural networks;
(3) for the sub-channel allocation network, the parameter w of the Q network of the sub-channel allocation DQN unit is updated by minimizing a loss function through stochastic gradient descent; the loss function of the sub-channel allocation DQN unit is expressed as follows:

$$L(w) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - Q(s_i, a_i; w)\bigr)^2;$$

using the stochastic gradient descent method, the update of the parameter w is represented as:

$$w \leftarrow w + \alpha\,\bigl(y_i - Q(s_i, a_i; w)\bigr)\,\nabla_w Q(s_i, a_i; w),$$

wherein $y_i$ represents the target Q value produced by the target Q network $Q(s, a; w^-)$, and α represents the learning rate;
(4) for the power allocation network, the loss functions of all power allocation DQN units are minimized using the same stochastic gradient descent method as in (3), and the neural network parameters are updated; for the m-th power allocation DQN unit, the loss function is expressed as follows:

$$L(w_m) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_{i,m} - Q(s_i, a_{i,m}; w_m)\bigr)^2,$$

wherein $m \in \{1, 2, \ldots, M\}$;
(5) for the M+1 target Q networks in all resource allocation DQN units, the parameter w of the corresponding Q network is assigned to the target network parameter $w^-$ at a fixed interval of W time steps, i.e. $w^- \leftarrow w$, thereby updating the target Q network parameters.