CN112351433A - Heterogeneous network resource allocation method based on reinforcement learning - Google Patents
- Publication number: CN112351433A (application CN202110006111.3A)
- Authority: CN (China)
- Prior art keywords: base station, user, state, network, resource allocation
- Prior art date: 2021-01-05
- Legal status: Granted
Classifications
- H04W — WIRELESS COMMUNICATION NETWORKS
- H04W16/10 — Network planning: dynamic resource partitioning among network components
- H04W52/244 — TPC according to specific parameters using SIR or other wireless path parameters: interferences in heterogeneous networks, e.g. among macro and femto or pico cells
- H04W52/346 — TPC management, i.e. sharing a limited amount of power among users or channels: distributing total power among users or channels
- H04W72/0453 — Wireless resource allocation based on the type of the allocated resource: resources in frequency domain, e.g. a carrier in FDMA
- H04W72/0473 — Wireless resource allocation based on the type of the allocated resource: the resource being transmission power
Abstract
The invention discloses a heterogeneous network resource allocation method based on reinforcement learning. A DNN framework is first deployed on each base station; the framework is based on the ADMM algorithm and takes the channel state information as the network weights. According to the data obtained by the base station, namely the current user association information and the average interference power, the framework gives an optimal resource allocation strategy in the current state. Each base station is regarded as an independent agent, and the state of the base station is used to model the environment; multiple agents observe the same heterogeneous network environment, take actions, and interact with one another through the rewards of the environment, adjusting their policies according to the reward. Based on a deep learning network, the resource allocation method can provide a resource allocation scheme without requiring all CSI information; it also takes spectrum efficiency into account by setting the spectrum efficiency function as the agent's reward, thereby guaranteeing spectrum efficiency while maintaining system throughput.
Description
Technical Field
The invention relates to the technical field of wireless communication, in particular to a heterogeneous network resource allocation method based on reinforcement learning.
Background
With the rapid growth of mobile devices and the emergence of the Internet of Things, next-generation wireless networks face great challenges from the proliferation of wireless applications. The most promising solution is to augment existing cellular networks with pico and femto cells of various transmission powers and coverage areas. These heterogeneous networks (HetNets) can offload User Equipments (UEs) from Macro Base Stations (MBS) to Pico Base Stations (PBS) with different transmission powers and coverage. In addition, to achieve high spectral efficiency in the heterogeneous network, a PBS may reuse the spectrum of the MBS and share the same channels with it. Therefore, heterogeneous networks are considered a good strategy for increasing the capacity of future wireless communication systems. Such heterogeneous networks pose optimization problems such as spectrum allocation and resource allocation. Recent studies have proposed new methods such as game-theoretic approaches, linear programming, and Markov approximation strategies. However, these methods require almost complete information, which is generally unavailable; it is therefore challenging for the above approaches to reach an optimal solution without such complete information.
Disclosure of Invention
The invention provides a dynamic resource allocation scheme for the downlink resource allocation problem in heterogeneous cellular networks. In particular, dynamic power allocation and channel allocation strategies are provided for the base stations. To improve the spectral efficiency and energy efficiency of heterogeneous cellular networks, an optimization framework based on Deep Neural Networks (DNN) is first constructed from a series of iterations of the Alternating Direction Method of Multipliers (ADMM), with the Channel State Information (CSI) serving as the learned weights. A Deep Reinforcement Learning (DRL) framework is then applied to obtain a resource allocation scheme with high Spectrum Efficiency (SE) and Energy Efficiency (EE).
In the downlink of a heterogeneous network with $M$ base stations and $N$ mobile users, there are $M_1$ macro base stations (MBS) and $M_2$ pico base stations (PBS), satisfying $M_1 + M_2 = M$;
Let $x_{m,n} \in \{0,1\}$ denote the association relationship between base station $m$ and user $n$: $x_{m,n} = 1$ indicates that base station $m$ is associated with user $n$; $x_{m,n} = 0$ indicates that base station $m$ is not associated with user $n$;
let $c_{n,k} \in \{0,1\}$ denote the spectrum state associating user $n$ with subcarrier $k$ of base station $m$; the spectrum state $c_{n,k}$ is determined by the following rule: $c_{n,k} = 1$ indicates that user $n$ uses channel $k$; $c_{n,k} = 0$ indicates that the user does not use channel $k$;
Let $p_{m,n}^{k} \ge 0$ denote the transmission power from base station $m$ to user $n$ on channel $k$, subject to
$$\sum_{n=1}^{N} \sum_{k=1}^{K} c_{n,k}\, p_{m,n}^{k} \le P_{\max}, \quad \forall m,$$
indicating that the total transmit power of each cell base station should remain below a preset power limit $P_{\max}$;
a block fading model is used; the downlink channel gain from base station $m$ to user $n$ in time slot $t$ is
$$g_{m,n}^{k}(t) = \beta_{m,n}(t)\, \big\lvert h_{m,n}^{k}(t) \big\rvert^{2},$$
where $\beta_{m,n}(t)$ represents the large-scale fading component, including path loss and log-normal shadowing; the small-scale Rayleigh fading component $h_{m,n}^{k}(t)$ follows the Jakes fading model and is expressed as a first-order Gauss-Markov process:
$$h_{m,n}^{k}(t) = \rho\, h_{m,n}^{k}(t-1) + \sqrt{1-\rho^{2}}\; e_{m,n}^{k}(t),$$
where the innovations $e_{m,n}^{k}(t)$ are independent, identically distributed circularly symmetric complex Gaussian random variables with unit variance; $\rho = J_{0}(2\pi f_{d} T_{s})$, where $J_{0}(\cdot)$ is the zeroth-order Bessel function of the first kind and $f_{d}$ is the maximum Doppler frequency;
the inter-cell interference (ICI) experienced when users in different cells are allocated the same subcarrier is expressed as
$$I_{n,k}^{m} = \sum_{m' \ne m} p_{m',n'}^{k}\, \big\lvert h_{m',n}^{k} \big\rvert^{2},$$
where $I_{n,k}^{m}$ denotes the inter-cell interference experienced by user $n$ served by base station $m$ on subcarrier $k$; $p_{m',n'}^{k}$ is the transmission power from base station $m'$ to its served user $n'$ on subcarrier $k$, and $\lvert h_{m',n}^{k} \rvert^{2}$ is the squared channel gain from base station $m'$ to user $n$ on subcarrier $k$; when $c_{n,k} = 1$, the signal-to-interference-plus-noise ratio of user $n$ served by base station $m$ on subcarrier $k$ is
$$\gamma_{n,k}^{m} = \frac{p_{m,n}^{k}\, \big\lvert h_{m,n}^{k} \big\rvert^{2}}{I_{n,k}^{m} + \sigma^{2}},$$
where $\sigma^{2}$ is the power of the additive white Gaussian noise on the link from base station $m$ to user $n$; when base station $m$ to user $n$ and base station $m'$ to user $n'$ are allocated subcarrier $k$ simultaneously, base station $m'$ will interfere with the link from base station $m$ to user $n$, and $I_{n,k}^{m} > 0$;
step S1, deploying a DNN framework for each base station, wherein the DNN framework is based on the ADMM algorithm and takes the channel state information (CSI) as the network weights; giving an optimal resource allocation strategy in the current state according to the user association information and the average interference power obtained by the base station; specifically,
the spectral efficiency objective optimization function is as follows:
$$\max_{\{c_{n,k},\, p_{m,n}^{k}\}} \; \mathrm{SE} = \sum_{m=1}^{M} \sum_{n=1}^{N} \sum_{k=1}^{K} x_{m,n}\, c_{n,k}\, \log_{2}\!\big(1 + \gamma_{n,k}^{m}\big);$$
the energy efficiency objective optimization function is as follows:
$$\max_{\{c_{n,k},\, p_{m,n}^{k}\}} \; \mathrm{EE} = \frac{\mathrm{SE}}{\sum_{m=1}^{M} \sum_{n=1}^{N} \sum_{k=1}^{K} p_{m,n}^{k}};$$
the multi-objective optimization function is solved based on the ADMM algorithm; writing the problem in the standard splitting form $\min_{\mathbf{x},\mathbf{z}} f(\mathbf{x}) + g(\mathbf{z})$ subject to $\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z} = \mathbf{c}$, the augmented Lagrangian function is as follows:
$$L_{\rho}(\mathbf{x}, \mathbf{z}, \boldsymbol{\lambda}) = f(\mathbf{x}) + g(\mathbf{z}) + \boldsymbol{\lambda}^{\mathsf{T}}(\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z} - \mathbf{c}) + \frac{\rho}{2}\,\big\lVert \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z} - \mathbf{c} \big\rVert_{2}^{2},$$
wherein $\boldsymbol{\lambda}$ represents the Lagrangian multiplier and $\rho > 0$ is a penalty parameter; at this point, the unconstrained optimization problem can be expressed as:
$$\min_{\mathbf{x}, \mathbf{z}}\; \max_{\boldsymbol{\lambda}}\; L_{\rho}(\mathbf{x}, \mathbf{z}, \boldsymbol{\lambda});$$
the following ADMM iterations can then be obtained:
$$\mathbf{x}^{(k+1)} = \arg\min_{\mathbf{x}} L_{\rho}\big(\mathbf{x}, \mathbf{z}^{(k)}, \boldsymbol{\lambda}^{(k)}\big), \qquad \mathbf{z}^{(k+1)} = \arg\min_{\mathbf{z}} L_{\rho}\big(\mathbf{x}^{(k+1)}, \mathbf{z}, \boldsymbol{\lambda}^{(k)}\big),$$
wherein the multiplier is updated as $\boldsymbol{\lambda}^{(k+1)} = \boldsymbol{\lambda}^{(k)} + \rho\,\big(\mathbf{A}\mathbf{x}^{(k+1)} + \mathbf{B}\mathbf{z}^{(k+1)} - \mathbf{c}\big);$
step S2, regarding each base station as an independent agent, and taking the state of the base station as the modeled environment; multiple agents observe the same heterogeneous network environment and take actions, and meanwhile interact with one another through the rewards of the environment; each agent adjusts its policy according to the reward; specifically:
State set $S$: the state used to characterize the heterogeneous network environment comprises the user association information $x_{m,n}(t)$ and the interference power $I_{n,k}^{m}(t)$; the heterogeneous network state is then represented as $s_{t} = \big\{ x_{m,n}(t),\; I_{n,k}^{m}(t) \big\}$;
Action set $A$: based on the current state, the agent takes an action according to the decision policy $\pi$; the actions include selecting a subcarrier $k$ and the corresponding transmission power $p_{m,n}^{k}$; the action is then represented as $a_{t} = \big\{ c_{n,k}(t),\; p_{m,n}^{k}(t) \big\}$;
Reward: after an action is taken, the agent computes the environment reward $r_{t}$; the energy efficiency function is defined as the reward in the system model: $r_{t} = \mathrm{EE}(t)$;
designing a DNN-based optimization framework and combining it with Q-learning to generate the policy $\pi$; wherein the input to the DNN-based optimization framework is the set of observed states $S$, and its output covers all executable actions in the action set $A$; each state-action pair has a corresponding Q value $Q(s_{t}, a_{t})$; each step selects the action that achieves the maximum Q value in the current state, updated as shown below:
$$Q(s_{t}, a_{t}) \leftarrow Q(s_{t}, a_{t}) + \alpha \Big[ r_{t} + \gamma \max_{a' \in \mathcal{A}} Q(s_{t+1}, a') - Q(s_{t}, a_{t}) \Big],$$
wherein $\alpha$ and $\gamma$ are the learning rate and the discount factor, respectively; $s_{t+1}$ and $r_{t}$ denote the next state and the reward obtained after taking the action in state $s_{t}$; $a'$ denotes an executable action in state $s_{t+1}$, and $\mathcal{A}$ is the set of executable actions; $Q(s_{t+1}, a')$ denotes the Q value in state $s_{t+1}$, and $\max_{a' \in \mathcal{A}} Q(s_{t+1}, a')$ is the maximum Q value over the executable actions in state $s_{t+1}$; the loss function in each agent can be expressed as:
$$L(\theta) = \mathbb{E}\Big[ \big( r_{t} + \gamma \max_{a'} \hat{Q}(s_{t+1}, a'; \theta^{-}) - Q(s_{t}, a_{t}; \theta) \big)^{2} \Big],$$
wherein $\theta^{-}$ represents the weights of the target network; an $\varepsilon$-greedy policy selects the action $a_{t}$ from the online network $Q(s, a; \theta)$; the weights of the target network $\hat{Q}(s, a; \theta^{-})$ remain fixed over multiple iterations while the weights of the online network are updated.
Further, the resource allocation method based on the ADMM algorithm in step S1 specifically includes the following steps:
Step S1.3, setting the convergence threshold $\epsilon_{th}$ and the maximum number of iterations $K_{\max}$, and starting the iteration; computing the iterates $\{\mathbf{x}^{(k)}, \mathbf{z}^{(k)}, \boldsymbol{\lambda}^{(k)}\}$ with the DNN-based network; when the residual falls below $\epsilon_{th}$ or $K_{\max}$ is reached, outputting the corresponding resource allocation $\{c_{n,k},\, p_{m,n}^{k}\}$.
Further, the step S2 obtains an optimal resource allocation scheme by using the ADMM network that uses the channel state information as the network weight, and includes the following specific steps:
Step S2.1, initializing the replay memory $D$, the DQN network parameters, and the target network replacement step size $T$;
Step S2.2, initializing the online network $Q(s, a; \theta)$ and its weights $\theta$, and initializing the target network $\hat{Q}(s, a; \theta^{-})$ with weights $\theta^{-} = \theta$;
Step S2.4, each agent selecting a decision with the $\varepsilon$-greedy policy according to the current state information;
Step S2.7, randomly sampling from $D$, computing the loss function $L(\theta)$ and updating the weights $\theta$; every $T$ steps, updating the target network parameters $\theta^{-} \leftarrow \theta$; repeating until all agents meet the threshold or the maximum iteration step is reached.
Compared with the prior art, the invention has the following technical advantages:
(1) When solving the resource allocation problem in a heterogeneous network, traditional convex optimization methods can hardly provide a resource allocation scheme under incomplete CSI; based on a deep learning network, the present method can provide a resource allocation scheme without requiring all CSI information;
(2) When performing resource allocation, spectrum efficiency is considered at the same time; the method applies model-driven deep reinforcement learning, which has not previously been applied to resource allocation schemes for heterogeneous networks. By setting the spectrum efficiency function as the agent's reward, spectrum efficiency can be guaranteed while system throughput is maintained.
Drawings
FIG. 1 is a schematic diagram of a dual-layer heterogeneous cellular network provided by the present invention;
fig. 2 is a structural diagram of a DNN optimization framework based on an ADMM algorithm provided by the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
The dual-layer heterogeneous cellular network shown in FIG. 1 comprises $M$ base stations and $N$ mobile users, with $M_1$ macro base stations (MBS) and $M_2$ pico base stations (PBS) satisfying $M_1 + M_2 = M$. Each base station is located at the center of its cell, and the authorized mobile users are randomly distributed within the cell. It is assumed that there is an overlapping area between every two adjacent small cells, and that each communication terminal is equipped with a single antenna for signal transmission. To maximize the use of radio resources and avoid trivial cases, the frequency reuse factor is set to 1; to avoid intra-cell interference, each user in a cell is allocated only one subcarrier, so all signals within a cell are orthogonal. The $N$ orthogonal subcarriers used in a cell may be reused in each neighboring cell. However, users in the overlapping area are served by the nearest small-cell BS and may suffer severe inter-cell interference (ICI), since adjacent cells may use the same spectral resources.
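By way of illustration, a minimal Python sketch of such a two-tier layout follows; the concrete counts, cell spacing, and drop area below are assumptions for the example rather than values taken from the embodiment:

```python
import numpy as np

# Illustrative two-tier HetNet layout; all numeric values are assumed.
rng = np.random.default_rng(0)
M1, M2, N = 1, 4, 20                         # MBS count, PBS count, users
M = M1 + M2

# Base stations at cell centres: MBS at the origin, PBSs spread around it.
angles = 2 * np.pi * np.arange(M2) / M2
bs_pos = np.vstack([[0.0, 0.0],
                    500.0 * np.column_stack([np.cos(angles), np.sin(angles)])])

# Authorized users dropped uniformly over the coverage area.
user_pos = rng.uniform(-750.0, 750.0, size=(N, 2))

# Nearest-BS association (x_{m,n}) and one orthogonal subcarrier per user
# within each cell (frequency reuse factor 1 across cells, as in the model).
dist = np.linalg.norm(user_pos[:, None, :] - bs_pos[None, :, :], axis=2)
assoc = dist.argmin(axis=1)                  # serving BS index of each user
subcarrier = np.zeros(N, dtype=int)
for m in range(M):
    idx = np.flatnonzero(assoc == m)
    subcarrier[idx] = np.arange(idx.size)    # orthogonal inside the cell
```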
Let $x_{m,n} \in \{0,1\}$ denote the association relationship between base station $m$ and user $n$: $x_{m,n} = 1$ indicates that base station $m$ is associated with user $n$; $x_{m,n} = 0$ indicates that base station $m$ is not associated with user $n$;
let $c_{n,k} \in \{0,1\}$ denote the spectrum state associating user $n$ with subcarrier $k$ of base station $m$; the spectrum state $c_{n,k}$ is determined by the following rule: $c_{n,k} = 1$ indicates that user $n$ uses channel $k$; $c_{n,k} = 0$ indicates that the user does not use channel $k$;
Let $p_{m,n}^{k} \ge 0$ denote the transmission power from base station $m$ to user $n$ on channel $k$, subject to
$$\sum_{n=1}^{N} \sum_{k=1}^{K} c_{n,k}\, p_{m,n}^{k} \le P_{\max}, \quad \forall m,$$
indicating that the total transmit power of each cell base station should remain below a preset power limit $P_{\max}$;
a block fading model is used; the downlink channel gain from base station $m$ to user $n$ in time slot $t$ is
$$g_{m,n}^{k}(t) = \beta_{m,n}(t)\, \big\lvert h_{m,n}^{k}(t) \big\rvert^{2},$$
where $\beta_{m,n}(t)$ represents the large-scale fading component, including path loss and log-normal shadowing; the small-scale Rayleigh fading component $h_{m,n}^{k}(t)$ follows the Jakes fading model and is expressed as a first-order Gauss-Markov process:
$$h_{m,n}^{k}(t) = \rho\, h_{m,n}^{k}(t-1) + \sqrt{1-\rho^{2}}\; e_{m,n}^{k}(t),$$
where the innovations $e_{m,n}^{k}(t)$ are independent, identically distributed circularly symmetric complex Gaussian random variables with unit variance; $\rho = J_{0}(2\pi f_{d} T_{s})$, where $J_{0}(\cdot)$ is the zeroth-order Bessel function of the first kind and $f_{d}$ is the maximum Doppler frequency;
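For illustration, the fading process above can be simulated with the following Python sketch; the Doppler frequency and slot duration values are assumptions for the example:

```python
import numpy as np
from scipy.special import j0

# Minimal sketch of the first-order Gauss-Markov Rayleigh fading process
# with Jakes autocorrelation rho = J0(2*pi*f_d*T_s).
def simulate_fading(num_slots, f_d=10.0, T_s=1e-3, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    rho = j0(2 * np.pi * f_d * T_s)          # correlation between slots
    # Unit-variance circularly symmetric complex Gaussian innovations e(t).
    e = (rng.standard_normal(num_slots)
         + 1j * rng.standard_normal(num_slots)) / np.sqrt(2)
    h = np.empty(num_slots, dtype=complex)
    h[0] = e[0]
    for t in range(1, num_slots):
        h[t] = rho * h[t - 1] + np.sqrt(1 - rho**2) * e[t]
    return h

h = simulate_fading(1000)
small_scale_gain = np.abs(h)**2              # |h(t)|^2, to be scaled by the
                                             # large-scale component beta(t)
```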
the inter-cell interference (ICI) experienced when users in different cells are allocated the same subcarrier is expressed as
$$I_{n,k}^{m} = \sum_{m' \ne m} p_{m',n'}^{k}\, \big\lvert h_{m',n}^{k} \big\rvert^{2},$$
where $I_{n,k}^{m}$ denotes the inter-cell interference experienced by user $n$ served by base station $m$ on subcarrier $k$; $p_{m',n'}^{k}$ is the transmission power from base station $m'$ to its served user $n'$ on subcarrier $k$, and $\lvert h_{m',n}^{k} \rvert^{2}$ is the squared channel gain from base station $m'$ to user $n$ on subcarrier $k$; when $c_{n,k} = 1$, the signal-to-interference-plus-noise ratio of user $n$ served by base station $m$ on subcarrier $k$ is
$$\gamma_{n,k}^{m} = \frac{p_{m,n}^{k}\, \big\lvert h_{m,n}^{k} \big\rvert^{2}}{I_{n,k}^{m} + \sigma^{2}},$$
where $\sigma^{2}$ is the power of the additive white Gaussian noise on the link from base station $m$ to user $n$; when base station $m$ to user $n$ and base station $m'$ to user $n'$ are allocated subcarrier $k$ simultaneously, base station $m'$ will interfere with the link from base station $m$ to user $n$, and $I_{n,k}^{m} > 0$.
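A minimal Python sketch of the ICI and SINR computation above follows; the (m, n, k) array layout is an assumption made for the example:

```python
def sinr(m, n, k, p, g, c, noise_power):
    """SINR of user n served by BS m on subcarrier k.

    p[m, n, k]: transmit power from BS m to user n on subcarrier k
    g[m, n, k]: squared channel gain |h|^2 from BS m to user n on k
    c[n, k]:    1 if user n occupies subcarrier k, else 0
    """
    M, N, _ = p.shape
    # Inter-cell interference: co-channel transmissions of all other BSs,
    # received over the channel from BS m' to user n.
    ici = sum(p[mp, npr, k] * g[mp, n, k]
              for mp in range(M) if mp != m
              for npr in range(N) if c[npr, k] == 1)
    return p[m, n, k] * g[m, n, k] / (ici + noise_power)
```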
The embodiment of the invention is divided into two parts. First, a DNN framework is deployed for each base station; the DNN framework is based on the ADMM algorithm and takes the channel state information (CSI) as the network weights. It is assumed that the long-term average interference power received by each UE can be estimated and fed back to the serving base station through a feedback channel; compared with acquiring the full CSI, this information exchange requires very limited resources and takes place at a very low frequency. The optimal resource allocation strategy in the current state is then given according to the user association information and the average interference power obtained by the base station; specifically,
the spectral efficiency objective optimization function is as follows:
$$\max_{\{c_{n,k},\, p_{m,n}^{k}\}} \; \mathrm{SE} = \sum_{m=1}^{M} \sum_{n=1}^{N} \sum_{k=1}^{K} x_{m,n}\, c_{n,k}\, \log_{2}\!\big(1 + \gamma_{n,k}^{m}\big);$$
the energy efficiency objective optimization function is as follows:
$$\max_{\{c_{n,k},\, p_{m,n}^{k}\}} \; \mathrm{EE} = \frac{\mathrm{SE}}{\sum_{m=1}^{M} \sum_{n=1}^{N} \sum_{k=1}^{K} p_{m,n}^{k}};$$
the multi-objective optimization function is solved based on the ADMM algorithm; writing the problem in the standard splitting form $\min_{\mathbf{x},\mathbf{z}} f(\mathbf{x}) + g(\mathbf{z})$ subject to $\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z} = \mathbf{c}$, the augmented Lagrangian function is as follows:
$$L_{\rho}(\mathbf{x}, \mathbf{z}, \boldsymbol{\lambda}) = f(\mathbf{x}) + g(\mathbf{z}) + \boldsymbol{\lambda}^{\mathsf{T}}(\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z} - \mathbf{c}) + \frac{\rho}{2}\,\big\lVert \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z} - \mathbf{c} \big\rVert_{2}^{2},$$
wherein $\boldsymbol{\lambda}$ represents the Lagrangian multiplier and $\rho > 0$ is a penalty parameter; at this point, the unconstrained optimization problem can be expressed as:
$$\min_{\mathbf{x}, \mathbf{z}}\; \max_{\boldsymbol{\lambda}}\; L_{\rho}(\mathbf{x}, \mathbf{z}, \boldsymbol{\lambda});$$
the following ADMM iterations can then be obtained:
$$\mathbf{x}^{(k+1)} = \arg\min_{\mathbf{x}} L_{\rho}\big(\mathbf{x}, \mathbf{z}^{(k)}, \boldsymbol{\lambda}^{(k)}\big), \qquad \mathbf{z}^{(k+1)} = \arg\min_{\mathbf{z}} L_{\rho}\big(\mathbf{x}^{(k+1)}, \mathbf{z}, \boldsymbol{\lambda}^{(k)}\big),$$
wherein the multiplier is updated as $\boldsymbol{\lambda}^{(k+1)} = \boldsymbol{\lambda}^{(k)} + \rho\,\big(\mathbf{A}\mathbf{x}^{(k+1)} + \mathbf{B}\mathbf{z}^{(k+1)} - \mathbf{c}\big).$
The DNN-based optimization framework shown in FIG. 2 includes neurons corresponding to the different operations in the ADMM iteration process, and directed edges corresponding to the data flow between those operations. Thus, the $k$-th layer of the DNN-based optimization framework corresponds to the $k$-th iteration of the ADMM procedure. Input data entering the framework flows through multiple repeated layers, which correspond to successive ADMM iterations; when the convergence condition is satisfied, the framework generates the resource allocation result. Specifically, the resource allocation method based on the ADMM algorithm comprises the following steps:
Step S1.3, setting the convergence threshold $\epsilon_{th}$ and the maximum number of iterations $K_{\max}$, and starting the iteration; computing the iterates $\{\mathbf{x}^{(k)}, \mathbf{z}^{(k)}, \boldsymbol{\lambda}^{(k)}\}$ with the DNN-based network; when the residual falls below $\epsilon_{th}$ or $K_{\max}$ is reached, outputting the corresponding resource allocation $\{c_{n,k},\, p_{m,n}^{k}\}$.
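For illustration, the layer-by-layer iteration with threshold-based stopping described by FIG. 2 and step S1.3 can be sketched in Python as follows; the proximal-step callables `prox_f` and `prox_g` are placeholders, since the exact per-layer maps of the embodiment (which depend on the CSI "weights") are not reproduced here:

```python
import numpy as np

# Generic sketch of the unrolled ADMM procedure: each pass of the loop body
# plays the role of one network layer (the k-th layer == the k-th ADMM
# iteration), and the residual test implements the step S1.3 stopping rule.
def unrolled_admm(prox_f, prox_g, A, B, c, rho, eps_th, k_max):
    x = np.zeros(A.shape[1])
    z = np.zeros(B.shape[1])
    lam = np.zeros(A.shape[0])                # Lagrange multiplier
    for _ in range(k_max):                    # at most K_max layers
        x = prox_f(z, lam, rho)               # argmin_x L_rho(x, z, lam)
        z = prox_g(x, lam, rho)               # argmin_z L_rho(x, z, lam)
        residual = A @ x + B @ z - c
        lam = lam + rho * residual            # multiplier (dual) update
        if np.linalg.norm(residual) < eps_th: # convergence: stop unrolling
            break
    return x, z                               # emitted resource allocation
```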
The second part regards each base station as an independent agent, with the state of the base station used to model the environment; multiple agents observe the same heterogeneous network environment and take actions, and meanwhile interact with one another through the rewards of the environment; each agent adjusts its policy according to the reward; specifically:
State set $S$: the state used to characterize the heterogeneous network environment comprises the user association information $x_{m,n}(t)$ and the interference power $I_{n,k}^{m}(t)$; the heterogeneous network state is then represented as $s_{t} = \big\{ x_{m,n}(t),\; I_{n,k}^{m}(t) \big\}$;
Action set $A$: based on the current state, the agent takes an action according to the decision policy $\pi$; the actions include selecting a subcarrier $k$ and the corresponding transmission power $p_{m,n}^{k}$; the action is then represented as $a_{t} = \big\{ c_{n,k}(t),\; p_{m,n}^{k}(t) \big\}$;
Reward: after an action is taken, the agent computes the environment reward $r_{t}$; the energy efficiency function is defined as the reward in the system model, $r_{t} = \mathrm{EE}(t)$, as sketched below;
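As a concrete illustration of this reward, a minimal sketch follows, assuming a precomputed SINR array and reconstructing the EE expression in a standard sum-rate-per-watt form:

```python
import numpy as np

# Reward sketch: energy efficiency as achieved sum rate divided by total
# radiated power. The (M, N, K) array layout and the exact EE expression
# are assumptions for the example.
def energy_efficiency_reward(sinr, p, c):
    """sinr, p: (M, N, K) SINR and power arrays; c: (N, K) occupancy (0/1)."""
    active = c[None, :, :] * (p > 0)                 # served (m, n, k) links
    sum_rate = (active * np.log2(1.0 + sinr)).sum()  # spectral efficiency
    return sum_rate / max(p.sum(), 1e-12)            # guard against p == 0
```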
A DNN-based optimization framework is designed and combined with Q-learning to generate the policy $\pi$; the input to the DNN-based optimization framework is the set of observed states $S$, and its output covers all executable actions in the action set $A$; each state-action pair has a corresponding Q value $Q(s_{t}, a_{t})$; each step selects the action that achieves the maximum Q value in the current state, updated as shown below:
$$Q(s_{t}, a_{t}) \leftarrow Q(s_{t}, a_{t}) + \alpha \Big[ r_{t} + \gamma \max_{a' \in \mathcal{A}} Q(s_{t+1}, a') - Q(s_{t}, a_{t}) \Big],$$
wherein $\alpha$ and $\gamma$ are the learning rate and the discount factor, respectively; $s_{t+1}$ and $r_{t}$ denote the next state and the reward obtained after taking the action in state $s_{t}$; $a'$ denotes an executable action in state $s_{t+1}$, and $\mathcal{A}$ is the set of executable actions; $Q(s_{t+1}, a')$ denotes the Q value in state $s_{t+1}$, and $\max_{a' \in \mathcal{A}} Q(s_{t+1}, a')$ is the maximum Q value over the executable actions in state $s_{t+1}$; the loss function in each agent can be expressed as:
$$L(\theta) = \mathbb{E}\Big[ \big( r_{t} + \gamma \max_{a'} \hat{Q}(s_{t+1}, a'; \theta^{-}) - Q(s_{t}, a_{t}; \theta) \big)^{2} \Big],$$
wherein $\theta^{-}$ represents the weights of the target network; an $\varepsilon$-greedy policy selects the action $a_{t}$ from the online network $Q(s, a; \theta)$; the weights of the target network $\hat{Q}(s, a; \theta^{-})$ remain fixed over multiple iterations while the weights of the online network are updated.
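For illustration, one such loss evaluation can be sketched as follows, assuming generic `q_online`/`q_target` callables that map weights and a state to a vector of Q values (gradient computation, which a real implementation would delegate to an autodiff framework, is omitted):

```python
import numpy as np

# Sketch of the DQN loss above: TD targets come from the fixed target
# network (weights theta_minus); predictions come from the online network.
def dqn_loss(batch, q_online, q_target, theta, theta_minus, gamma=0.9):
    states, actions, rewards, next_states = batch
    # y_j = r_j + gamma * max_a' Q_hat(s'_j, a'; theta^-)
    q_next = np.array([q_target(theta_minus, s) for s in next_states])
    y = np.asarray(rewards) + gamma * q_next.max(axis=1)
    # Q(s_j, a_j; theta) from the online network, for the taken actions only.
    q_sa = np.array([q_online(theta, s)[a] for s, a in zip(states, actions)])
    return np.mean((y - q_sa) ** 2)          # mean squared TD error L(theta)
```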
Specifically, the steps of obtaining the optimal resource allocation scheme with the ADMM network, which takes the channel state information as the network weights, are as follows:
Step S2.1, initializing the replay memory $D$, the DQN network parameters, and the target network replacement step size $T$;
Step S2.2, initializing the online network $Q(s, a; \theta)$ and its weights $\theta$, and initializing the target network $\hat{Q}(s, a; \theta^{-})$ with weights $\theta^{-} = \theta$;
Step S2.4, each agent selecting a decision with the $\varepsilon$-greedy policy according to the current state information.
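Putting steps S2.1 to S2.7 together, the multi-agent training loop can be sketched as follows; the `env` and `agent` interfaces (`reset`, `step`, `greedy_action`, `update`, `sync_target`) are assumed names for the example, not part of the embodiment:

```python
import random
from collections import deque

# End-to-end training skeleton matching steps S2.1-S2.7.
def train(env, agents, episodes=500, T_replace=100, eps=0.1,
          batch_size=32, memory_size=10_000):
    D = deque(maxlen=memory_size)                 # S2.1: replay memory
    for agent in agents:                          # S2.2: theta^- <- theta
        agent.sync_target()
    step = 0
    for _ in range(episodes):
        states = env.reset()
        done = False
        while not done:
            actions = []
            for agent, s in zip(agents, states):  # S2.4: eps-greedy choice
                if random.random() < eps:
                    actions.append(agent.random_action())
                else:
                    actions.append(agent.greedy_action(s))
            next_states, rewards, done = env.step(actions)
            for i in range(len(agents)):          # store transitions
                D.append((states[i], actions[i], rewards[i], next_states[i]))
            if len(D) >= batch_size:              # S2.7: sample and update
                batch = random.sample(D, batch_size)
                for agent in agents:
                    agent.update(batch)           # minimise L(theta)
            step += 1
            if step % T_replace == 0:             # periodic target sync
                for agent in agents:
                    agent.sync_target()
            states = next_states
```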
Claims (3)
1. A heterogeneous network resource allocation method based on reinforcement learning, characterized in that, in the downlink of a heterogeneous network with $M$ base stations and $N$ mobile users, there are $M_1$ macro base stations (MBS) and $M_2$ pico base stations (PBS), satisfying $M_1 + M_2 = M$;
Let $x_{m,n} \in \{0,1\}$ denote the association relationship between base station $m$ and user $n$: $x_{m,n} = 1$ indicates that base station $m$ is associated with user $n$; $x_{m,n} = 0$ indicates that base station $m$ is not associated with user $n$;
let $c_{n,k} \in \{0,1\}$ denote the spectrum state associating user $n$ with subcarrier $k$ of base station $m$; the spectrum state $c_{n,k}$ is determined by the following rule: $c_{n,k} = 1$ indicates that user $n$ uses channel $k$; $c_{n,k} = 0$ indicates that the user does not use channel $k$;
Let $p_{m,n}^{k} \ge 0$ denote the transmission power from base station $m$ to user $n$ on channel $k$, subject to
$$\sum_{n=1}^{N} \sum_{k=1}^{K} c_{n,k}\, p_{m,n}^{k} \le P_{\max}, \quad \forall m,$$
indicating that the total transmit power of each cell base station should remain below a preset power limit $P_{\max}$;
a block fading model is used; the downlink channel gain from base station $m$ to user $n$ in time slot $t$ is
$$g_{m,n}^{k}(t) = \beta_{m,n}(t)\, \big\lvert h_{m,n}^{k}(t) \big\rvert^{2},$$
where $\beta_{m,n}(t)$ represents the large-scale fading component, including path loss and log-normal shadowing; the small-scale Rayleigh fading component $h_{m,n}^{k}(t)$ follows the Jakes fading model and is expressed as a first-order Gauss-Markov process:
$$h_{m,n}^{k}(t) = \rho\, h_{m,n}^{k}(t-1) + \sqrt{1-\rho^{2}}\; e_{m,n}^{k}(t),$$
where the innovations $e_{m,n}^{k}(t)$ are independent, identically distributed circularly symmetric complex Gaussian random variables with unit variance; $\rho = J_{0}(2\pi f_{d} T_{s})$, where $J_{0}(\cdot)$ is the zeroth-order Bessel function of the first kind and $f_{d}$ is the maximum Doppler frequency;
the inter-cell interference (ICI) experienced when users in different cells are allocated the same subcarrier is expressed as
$$I_{n,k}^{m} = \sum_{m' \ne m} p_{m',n'}^{k}\, \big\lvert h_{m',n}^{k} \big\rvert^{2},$$
where $I_{n,k}^{m}$ denotes the inter-cell interference experienced by user $n$ served by base station $m$ on subcarrier $k$; $p_{m',n'}^{k}$ is the transmission power from base station $m'$ to its served user $n'$ on subcarrier $k$, and $\lvert h_{m',n}^{k} \rvert^{2}$ is the squared channel gain from base station $m'$ to user $n$ on subcarrier $k$; when $c_{n,k} = 1$, the signal-to-interference-plus-noise ratio of user $n$ served by base station $m$ on subcarrier $k$ is
$$\gamma_{n,k}^{m} = \frac{p_{m,n}^{k}\, \big\lvert h_{m,n}^{k} \big\rvert^{2}}{I_{n,k}^{m} + \sigma^{2}},$$
where $\sigma^{2}$ is the power of the additive white Gaussian noise on the link from base station $m$ to user $n$; when base station $m$ to user $n$ and base station $m'$ to user $n'$ are allocated subcarrier $k$ simultaneously, base station $m'$ will interfere with the link from base station $m$ to user $n$, and $I_{n,k}^{m} > 0$;
step S1, deploying a DNN framework for each base station, wherein the DNN framework is based on the ADMM algorithm and takes the channel state information (CSI) as the network weights; giving an optimal resource allocation strategy in the current state according to the user association information and the average interference power obtained by the base station; specifically,
the spectral efficiency objective optimization function is as follows:
$$\max_{\{c_{n,k},\, p_{m,n}^{k}\}} \; \mathrm{SE} = \sum_{m=1}^{M} \sum_{n=1}^{N} \sum_{k=1}^{K} x_{m,n}\, c_{n,k}\, \log_{2}\!\big(1 + \gamma_{n,k}^{m}\big);$$
the energy efficiency objective optimization function is as follows:
$$\max_{\{c_{n,k},\, p_{m,n}^{k}\}} \; \mathrm{EE} = \frac{\mathrm{SE}}{\sum_{m=1}^{M} \sum_{n=1}^{N} \sum_{k=1}^{K} p_{m,n}^{k}};$$
the multi-objective optimization function is solved based on the ADMM algorithm; writing the problem in the standard splitting form $\min_{\mathbf{x},\mathbf{z}} f(\mathbf{x}) + g(\mathbf{z})$ subject to $\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z} = \mathbf{c}$, the augmented Lagrangian function is as follows:
$$L_{\rho}(\mathbf{x}, \mathbf{z}, \boldsymbol{\lambda}) = f(\mathbf{x}) + g(\mathbf{z}) + \boldsymbol{\lambda}^{\mathsf{T}}(\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z} - \mathbf{c}) + \frac{\rho}{2}\,\big\lVert \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z} - \mathbf{c} \big\rVert_{2}^{2},$$
wherein $\boldsymbol{\lambda}$ represents the Lagrangian multiplier and $\rho > 0$ is a penalty parameter; at this point, the unconstrained optimization problem can be expressed as:
$$\min_{\mathbf{x}, \mathbf{z}}\; \max_{\boldsymbol{\lambda}}\; L_{\rho}(\mathbf{x}, \mathbf{z}, \boldsymbol{\lambda});$$
the following ADMM iterations can then be obtained:
$$\mathbf{x}^{(k+1)} = \arg\min_{\mathbf{x}} L_{\rho}\big(\mathbf{x}, \mathbf{z}^{(k)}, \boldsymbol{\lambda}^{(k)}\big), \qquad \mathbf{z}^{(k+1)} = \arg\min_{\mathbf{z}} L_{\rho}\big(\mathbf{x}^{(k+1)}, \mathbf{z}, \boldsymbol{\lambda}^{(k)}\big),$$
wherein the multiplier is updated as $\boldsymbol{\lambda}^{(k+1)} = \boldsymbol{\lambda}^{(k)} + \rho\,\big(\mathbf{A}\mathbf{x}^{(k+1)} + \mathbf{B}\mathbf{z}^{(k+1)} - \mathbf{c}\big);$
step S2, regarding each base station as an independent agent, and taking the state of the base station as the modeled environment; multiple agents observe the same heterogeneous network environment and take actions, and meanwhile interact with one another through the rewards of the environment; each agent adjusts its policy according to the reward; specifically:
State set $S$: the state used to characterize the heterogeneous network environment comprises the user association information $x_{m,n}(t)$ and the interference power $I_{n,k}^{m}(t)$; the heterogeneous network state is then represented as $s_{t} = \big\{ x_{m,n}(t),\; I_{n,k}^{m}(t) \big\}$;
Action set $A$: based on the current state, the agent takes an action according to the decision policy $\pi$; the actions include selecting a subcarrier $k$ and the corresponding transmission power $p_{m,n}^{k}$; the action is then represented as $a_{t} = \big\{ c_{n,k}(t),\; p_{m,n}^{k}(t) \big\}$;
Reward: after an action is taken, the agent computes the environment reward $r_{t}$; the energy efficiency function is defined as the reward in the system model: $r_{t} = \mathrm{EE}(t)$;
designing a DNN-based optimization framework and combining it with Q-learning to generate the policy $\pi$; wherein the input to the DNN-based optimization framework is the set of observed states $S$, and its output covers all executable actions in the action set $A$; each state-action pair has a corresponding Q value $Q(s_{t}, a_{t})$; each step selects the action that achieves the maximum Q value in the current state, updated as shown below:
$$Q(s_{t}, a_{t}) \leftarrow Q(s_{t}, a_{t}) + \alpha \Big[ r_{t} + \gamma \max_{a' \in \mathcal{A}} Q(s_{t+1}, a') - Q(s_{t}, a_{t}) \Big],$$
wherein $\alpha$ and $\gamma$ are the learning rate and the discount factor, respectively; $s_{t+1}$ and $r_{t}$ denote the next state and the reward obtained after taking the action in state $s_{t}$; $a'$ denotes an executable action in state $s_{t+1}$, and $\mathcal{A}$ is the set of executable actions; $Q(s_{t+1}, a')$ denotes the Q value in state $s_{t+1}$, and $\max_{a' \in \mathcal{A}} Q(s_{t+1}, a')$ is the maximum Q value over the executable actions in state $s_{t+1}$; the loss function in each agent is expressed as:
$$L(\theta) = \mathbb{E}\Big[ \big( r_{t} + \gamma \max_{a'} \hat{Q}(s_{t+1}, a'; \theta^{-}) - Q(s_{t}, a_{t}; \theta) \big)^{2} \Big].$$
2. The reinforcement learning-based heterogeneous network resource allocation method according to claim 1, wherein the resource allocation method based on the ADMM algorithm in step S1 specifically includes the following steps:
3. The method according to claim 1, wherein the step S2 obtains an optimal resource allocation scheme using an ADMM network that uses channel state information as a network weight, and includes the following steps:
Step S2.1, initializing the replay memory $D$, the DQN network parameters, and the target network replacement step size $T$;
Step S2.2, initializing the online network $Q(s, a; \theta)$ and its weights $\theta$, and initializing the target network $\hat{Q}(s, a; \theta^{-})$ with weights $\theta^{-} = \theta$;
Step S2.4, each agent selecting a decision with the $\varepsilon$-greedy policy according to the current state information.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110006111.3A | 2021-01-05 | 2021-01-05 | Heterogeneous network resource allocation method based on reinforcement learning
Publications (2)

Publication Number | Publication Date
---|---
CN112351433A | 2021-02-09
CN112351433B | 2021-05-25
Legal Events

Date | Code | Title
---|---|---
| PB01 | Publication
| SE01 | Entry into force of request for substantive examination
| GR01 | Patent grant