CN112351433B - Heterogeneous network resource allocation method based on reinforcement learning - Google Patents

Heterogeneous network resource allocation method based on reinforcement learning Download PDF

Info

Publication number
CN112351433B
CN112351433B CN202110006111.3A CN202110006111A CN112351433B CN 112351433 B CN112351433 B CN 112351433B CN 202110006111 A CN202110006111 A CN 202110006111A CN 112351433 B CN112351433 B CN 112351433B
Authority
CN
China
Prior art keywords
base station
network
user
state
agent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110006111.3A
Other languages
Chinese (zh)
Other versions
CN112351433A (en
Inventor
孙君
吴锡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202110006111.3A priority Critical patent/CN112351433B/en
Publication of CN112351433A publication Critical patent/CN112351433A/en
Application granted granted Critical
Publication of CN112351433B publication Critical patent/CN112351433B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/02Resource partitioning among network components, e.g. reuse partitioning
    • H04W16/10Dynamic resource partitioning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04TPC
    • H04W52/18TPC being performed according to specific parameters
    • H04W52/24TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters
    • H04W52/243TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters taking into account interferences
    • H04W52/244Interferences in heterogeneous networks, e.g. among macro and femto or pico cells or other sector / system interference [OSI]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04TPC
    • H04W52/30TPC using constraints in the total amount of available transmission power
    • H04W52/34TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading
    • H04W52/346TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading distributing total power among users or channels
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/04Wireless resource allocation
    • H04W72/044Wireless resource allocation based on the type of the allocated resource
    • H04W72/0453Resources in frequency domain, e.g. a carrier in FDMA
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/04Wireless resource allocation
    • H04W72/044Wireless resource allocation based on the type of the allocated resource
    • H04W72/0473Wireless resource allocation based on the type of the allocated resource the resource being transmission power

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a heterogeneous network resource allocation method based on reinforcement learning, which comprises the steps of firstly deploying a DNN framework on each base station, wherein the DNN framework is based on an ADMM algorithm and takes channel information as the weight of a network; giving an optimal resource allocation strategy in the current state according to data obtained by a base station, namely current user association information and average interference power; regarding each base station as an independent subject, and regarding the state of the base station as a modeling environment; a plurality of agent programs observe the same heterogeneous network environment and take action, and meanwhile, the agent programs are communicated with each other through the awards of the environment; the agent adjusts the policy according to the reward; the resource allocation method provided by the invention is based on the deep learning network, can provide a resource allocation scheme without all CSI information, considers the spectrum efficiency at the same time, sets the spectrum efficiency function as the reward of an agent, and can ensure the spectrum efficiency while ensuring the system throughput.

Description

Heterogeneous network resource allocation method based on reinforcement learning
Technical Field
The invention relates to the technical field of wireless communication, in particular to a heterogeneous network resource allocation method based on reinforcement learning.
Background
With the rapid growth of mobile devices and the emergence of the internet of things, next generation wireless networks face a great challenge to the proliferation of wireless applications. The most promising solution is to augment existing cellular networks with pico and femto cells with various transmission powers and coverage areas. These heterogeneous networks (hetnets) may transfer User Equipments (UEs) from Macro Base Stations (MBS) to Pico Base Stations (PBS), with different transmission powers and coverage. In addition, to achieve high spectral efficiency of the heterogeneous network, the PBS may reuse the MBS and share the same channel with the MBS. Therefore, heterogeneous networks are considered as a good strategy to increase the capacity of future wireless communication systems. There are some optimization problems in such heterogeneous networks, such as spectrum allocation and resource allocation. Recent studies have proposed new methods such as the game theory method, the linear programming method and the markov approximation strategy. However, these methods require almost complete information, which may not be generally available. Thus, it is challenging for the above-described approaches to achieve an optimal solution without such complete information.
Disclosure of Invention
The invention provides a dynamic resource allocation scheme aiming at the problem of downlink resource allocation in a heterogeneous cellular network. In particular, dynamic power allocation and channel allocation strategies are provided for the base station. To improve spectral efficiency, energy efficiency in heterogeneous cellular networks, an optimization framework based on Deep Neural Networks (DNN) is first composed of a series of multiplier alternating direction method (ADMM) iterative processes, making Channel State Information (CSI) the weight of learning. And applying a Deep Reinforcement Learning (DRL) framework to obtain a resource allocation scheme with Spectrum Efficiency (SE) and Energy Efficiency (EE).
A heterogeneous network resource allocation method based on reinforcement learning is characterized in that in a downlink of a heterogeneous network with M base stations and N mobile users, a macro base station MBS has
Figure 551708DEST_PATH_IMAGE001
A micro base station PBS has
Figure 682475DEST_PATH_IMAGE002
And satisfy
Figure 8283DEST_PATH_IMAGE003
Setting up
Figure 794973DEST_PATH_IMAGE004
Indicating a base stationmAnd the usernThe association relationship between the two or more of the three,
Figure 44689DEST_PATH_IMAGE005
indicating a base stationmAnd the usernAssociating;
Figure 29962DEST_PATH_IMAGE006
indicating a base stationmAnd the usernIrrelevant;
setting up
Figure 169082DEST_PATH_IMAGE007
Indicating the state of the spectrum when the user is presentnAnd sub-carrierkBase station ofmAssociated, spectral state
Figure 239806DEST_PATH_IMAGE007
The following rules were used to determine:
Figure 293213DEST_PATH_IMAGE008
representing a usernUsing sub-carriersk
Figure 319943DEST_PATH_IMAGE009
Representing a usernNot using sub-carriersk
Setting up
Figure 738286DEST_PATH_IMAGE010
Representing a usernAnd a base stationmOn the sub-carrierkA transmission power of; the method comprises the following specific steps:
Figure 30727DEST_PATH_IMAGE011
indicating that the total transmit power of each cell base station should be at a preset power limit
Figure 887825DEST_PATH_IMAGE012
Below;
representing time slots using block fading modelstUser's devicenTo the base stationmThe downlink channel gains of (c) are as follows:
Figure 282246DEST_PATH_IMAGE013
wherein
Figure 933807DEST_PATH_IMAGE014
Representation including path loss and lognormal shadingThe large-scale fading component follows a Jakes fading model; small scale Rayleigh fading component
Figure 916807DEST_PATH_IMAGE015
Expressed as a first order gaussian-markov process:
Figure 312016DEST_PATH_IMAGE016
wherein
Figure 47759DEST_PATH_IMAGE017
Are independent and have uniformly distributed circularly symmetric complex gaussian random variables of unit variance;
Figure 135801DEST_PATH_IMAGE018
wherein
Figure 340518DEST_PATH_IMAGE019
Is a first type of zero order bessel function,
Figure 804997DEST_PATH_IMAGE020
is the maximum doppler frequency;
the inter-cell interference ICI experienced when users in different cells are allocated the same sub-carriers is expressed as follows:
Figure 473876DEST_PATH_IMAGE021
wherein
Figure 421234DEST_PATH_IMAGE022
Indicating a base stationmOn the sub-carrierkUser of upper servicen(ii) experienced inter-cell interference;
Figure 909984DEST_PATH_IMAGE023
is represented in sub-carrierskUpper base stationm'To the usern'The transmit power of (a);
Figure 850259DEST_PATH_IMAGE024
is at the sub-carrierkUpper slave base stationm'To the usernThe square of the channel gain of (d); when in use
Figure 373644DEST_PATH_IMAGE025
From the base stationmOn the sub-carrierkUser of upper servicenThe signal to interference plus noise ratio of (c) is as follows:
Figure 724860DEST_PATH_IMAGE026
wherein
Figure 966485DEST_PATH_IMAGE027
Is from the base stationmTo the usernA power of additive white gaussian noise; when the base stationmTo a usernAnd base stationm' To a usern'Are allocated sub-carriers simultaneouslykWhen the temperature of the water is higher than the set temperature,
Figure 444871DEST_PATH_IMAGE028
will interfere with the base stationmTo a usernAnd is and
Figure 822763DEST_PATH_IMAGE029
step S1, deploying a DNN framework for each base station, wherein the DNN framework is based on an ADMM algorithm and takes channel information CSI as heterogeneous network weight; giving an optimal resource allocation strategy in the current state according to the user association information and the average interference power obtained by the base station; in particular, the amount of the solvent to be used,
the spectral efficiency objective optimization function is as follows:
Figure 111924DEST_PATH_IMAGE030
the energy efficiency objective optimization function is as follows:
Figure 575266DEST_PATH_IMAGE031
solving the target optimization function of the frequency spectrum efficiency based on an ADMM algorithm, wherein the augmented Lagrangian function is as follows:
Figure 857343DEST_PATH_IMAGE032
wherein
Figure DEST_PATH_IMAGE033
The values, representing the lagrangian multiplier,
Figure 620900DEST_PATH_IMAGE034
is a penalty parameter; at this time, the spectral efficiency optimization function is expressed as:
Figure 454863DEST_PATH_IMAGE035
by respectively pairing
Figure 671081DEST_PATH_IMAGE036
Finding the deviation
Figure 242002DEST_PATH_IMAGE037
The best solution of (1):
Figure 594486DEST_PATH_IMAGE038
the following can be obtained:
Figure 68193DEST_PATH_IMAGE039
Figure 506127DEST_PATH_IMAGE040
Figure 520220DEST_PATH_IMAGE041
wherein:
Figure 727210DEST_PATH_IMAGE042
Figure 637397DEST_PATH_IMAGE043
Figure 657568DEST_PATH_IMAGE044
Figure 881876DEST_PATH_IMAGE045
stands for ADMM algorithmlThe number of sub-iterations is,
Figure 146635DEST_PATH_IMAGE046
,
Figure 617937DEST_PATH_IMAGE047
,
Figure 764884DEST_PATH_IMAGE048
,
Figure 730566DEST_PATH_IMAGE049
,
Figure 912149DEST_PATH_IMAGE050
,
Figure 55817DEST_PATH_IMAGE052
respectively representlIn the sub-iteration
Figure 955639DEST_PATH_IMAGE053
Figure 459433DEST_PATH_IMAGE054
A value of (d);
step S2, regarding each base station as an independent agent, and taking the state of the base station as a modeling environment; a plurality of agent programs observe the same heterogeneous network environment and take action, and meanwhile, the agent programs are communicated with each other through the awards of the environment; the agent adjusts the policy according to the reward; specifically, the method comprises the following steps:
and (4) state set S: by
Figure 495522DEST_PATH_IMAGE055
A state component comprising
Figure 308626DEST_PATH_IMAGE056
(ii) a States observed by the agent to characterize a heterogeneous network environment
Figure 961325DEST_PATH_IMAGE057
Including user association information
Figure 331126DEST_PATH_IMAGE058
And interference power
Figure 424984DEST_PATH_IMAGE059
Then the heterogeneous network state is represented as:
Figure 222039DEST_PATH_IMAGE060
action set A: based on the current state, the agent is pi-in based on a decision policy
Figure 519290DEST_PATH_IMAGE061
Taking an action; the actions include selecting a subcarrier
Figure 427203DEST_PATH_IMAGE062
And corresponding transmission power
Figure 375568DEST_PATH_IMAGE063
(ii) a Then the action is represented as
Figure 343524DEST_PATH_IMAGE064
Reward: agent computing environment reward after action is taken
Figure 626606DEST_PATH_IMAGE065
(ii) a Defining the energy efficiency function as a reward in the system model:
Figure 338210DEST_PATH_IMAGE066
designing a DNN-based optimization framework, and combining Q learning to generate a strategy pi; wherein the input to the DNN-based optimization framework is the set of observed states S and the output of the DNN-based optimization framework is all executable actions in the set of actions A; each state action pair has a corresponding Q value
Figure 468977DEST_PATH_IMAGE067
(ii) a Each step selects the action that achieves the maximum Q value at each state, as shown below
Figure 545518DEST_PATH_IMAGE068
Updating the Q value according to the Q learning algorithm by the following formula
Figure 863367DEST_PATH_IMAGE069
Figure 67077DEST_PATH_IMAGE070
Wherein
Figure 193296DEST_PATH_IMAGE071
And
Figure DEST_PATH_IMAGE072
learning rate and discount factor, respectively;
Figure 955585DEST_PATH_IMAGE073
represents the next oneThe status of the mobile station is,
Figure 760730DEST_PATH_IMAGE074
is shown in a state
Figure 486240DEST_PATH_IMAGE075
The reward obtained after the action is taken is,
Figure 326020DEST_PATH_IMAGE076
indicating a state
Figure 748956DEST_PATH_IMAGE077
The following executable acts may be performed,
Figure 306976DEST_PATH_IMAGE078
is a set of executable actions;
Figure 836178DEST_PATH_IMAGE079
indicating a state
Figure 796044DEST_PATH_IMAGE080
The value of Q in the following (A),
Figure 368976DEST_PATH_IMAGE081
represents the updated Q value;
Figure 148713DEST_PATH_IMAGE082
indicating a state
Figure 747185DEST_PATH_IMAGE083
Set of executable actions under
Figure 827136DEST_PATH_IMAGE078
Maximum Q value of (1); the loss function in each agent can be expressed as:
Figure 649599DEST_PATH_IMAGE084
wherein
Figure 605048DEST_PATH_IMAGE085
A network parameter indicative of the target network,
Figure DEST_PATH_IMAGE002
a network parameter representing an online network; squaring channel gain
Figure 410510DEST_PATH_IMAGE087
And additive Gaussian noise
Figure 403874DEST_PATH_IMAGE088
As network parameters of the l-th layer;
use of
Figure 345154DEST_PATH_IMAGE089
Greedy policy from online network
Figure 347745DEST_PATH_IMAGE090
In the selection action
Figure 808813DEST_PATH_IMAGE091
Target network
Figure 973078DEST_PATH_IMAGE092
Is an online network
Figure 903119DEST_PATH_IMAGE093
But the network parameters are fixed in the iteration; and replacing the network parameters of the target network with the network parameters in the online network after each iteration.
Further, the resource allocation method based on the ADMM algorithm in step S1 specifically includes the following steps:
step S1.1, updating the currently observed state
Figure 443822DEST_PATH_IMAGE094
Step S1.2, initializing network parameters
Figure 415189DEST_PATH_IMAGE096
Step S1.3, setting threshold value
Figure 15935DEST_PATH_IMAGE097
Maximum number of iterations
Figure 479277DEST_PATH_IMAGE098
Starting iteration; network computing based on DNN
Figure 761354DEST_PATH_IMAGE100
(ii) a When in use
Figure 993752DEST_PATH_IMAGE101
Then output the corresponding
Figure 453814DEST_PATH_IMAGE102
Further, the step S2 obtains an optimal resource allocation scheme by using the ADMM network that uses the channel state information as the network weight, and includes the following specific steps:
step S2.1, initializing the reproduction memory
Figure 404453DEST_PATH_IMAGE103
DQN network parameters
Figure 224641DEST_PATH_IMAGE104
And target network replacement step size
Figure 577125DEST_PATH_IMAGE105
Step S2.2, initializing the on-line network
Figure 706624DEST_PATH_IMAGE106
And
Figure 144559DEST_PATH_IMAGE107
initializing the target network
Figure 768438DEST_PATH_IMAGE108
And make
Figure 975429DEST_PATH_IMAGE109
Step S2.3, setting threshold value
Figure 777294DEST_PATH_IMAGE110
Step S2.4, each agent program is used according to the current state information
Figure 233683DEST_PATH_IMAGE111
Greedy policy selection decision
Figure 457991DEST_PATH_IMAGE112
Step S2.5, updating environment
Figure 457171DEST_PATH_IMAGE113
Receive a reward
Figure 475942DEST_PATH_IMAGE114
Step S2.6, each agent program observes the rewards obtained by all agents
Figure 75420DEST_PATH_IMAGE115
Storing into respective D;
step S2.7, randomly sampling from D, and calculating loss function
Figure 103419DEST_PATH_IMAGE116
And update
Figure 285001DEST_PATH_IMAGE107
Every other, at
Figure 943516DEST_PATH_IMAGE117
Step updating target network parameters
Figure 577759DEST_PATH_IMAGE109
Up to all generationsThe process meets a threshold or reaches a maximum iteration step.
Compared with the prior art, the invention has the following technical advantages:
(1) when the problem of resource allocation in a heterogeneous network is solved, because the traditional convex optimization method is difficult to provide a resource allocation scheme under the condition of incomplete CSI information, the method can provide the resource allocation scheme without all CSI information based on a deep learning network;
(2) when the resource allocation is considered, the spectrum efficiency is considered at the same time, and the method is applied to the deep reinforcement learning based on model driving, and the deep reinforcement learning method driven by the model is not applied to the resource allocation scheme of the heterogeneous network at present. The spectrum efficiency function is set as the reward of the agent, so that the spectrum efficiency can be ensured while the system throughput is ensured.
Drawings
FIG. 1 is a schematic diagram of a dual-layer heterogeneous cellular network provided by the present invention;
fig. 2 is a structural diagram of a DNN optimization framework based on an ADMM algorithm provided by the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
The dual-layer heterogeneous cellular network shown in FIG. 1 comprises M base stations and N mobile users, wherein the macro base station MBS has
Figure 566706DEST_PATH_IMAGE118
A micro base station PBS has
Figure 868375DEST_PATH_IMAGE119
And satisfy
Figure 697790DEST_PATH_IMAGE003
. Each cell site is located at the center of each cell and authorizes mobile users to be randomly distributed in the cell. It is assumed that there is an overlapping area between every two adjacent small cells. It is assumed that each communication terminal is equipped with an antenna for signal transmission. In order to utilize the radio resources to the maximum extent and avoidWithout trivial details, the frequency reuse factor is set to 1, and in order to avoid intra-cell interference, it is assumed that each user in each cell is allocated only one subcarrier, so that the cells are orthogonal in the same subcarrier for all signals. The N orthogonal subcarriers used in a cell may be reused in each neighboring cell. However, users in the overlapping area are served by the nearest small cell BS and may suffer from severe inter-cell interference (ICI) due to the fact that they may use the same spectral resources.
Setting up
Figure 819330DEST_PATH_IMAGE004
Indicating a base stationmAnd the usernThe association relationship between the two or more of the three,
Figure 110503DEST_PATH_IMAGE005
indicating a base stationmAnd the usernAssociating;
Figure 797836DEST_PATH_IMAGE006
indicating a base stationmAnd the usernIrrelevant;
setting up
Figure 594891DEST_PATH_IMAGE007
Indicating the state of the spectrum when the user is presentnAnd sub-carrierkBase station ofmAssociated, spectral state
Figure 141410DEST_PATH_IMAGE007
The following rules were used to determine:
Figure 49323DEST_PATH_IMAGE008
representing a usernUsing sub-carriersk
Figure 748420DEST_PATH_IMAGE009
Representing a usernNot using sub-carriersk
Setting up
Figure 716376DEST_PATH_IMAGE010
For indicatingHouseholdnAnd a base stationmOn the sub-carrierkA transmission power of; the method comprises the following specific steps:
Figure 750191DEST_PATH_IMAGE011
indicating that the total transmit power of each cell base station should be at a preset power limit
Figure 461795DEST_PATH_IMAGE012
Below;
representing time slots using block fading modelstUser's devicenTo the base stationmThe downlink channel gains of (c) are as follows:
Figure 513934DEST_PATH_IMAGE013
wherein
Figure 652791DEST_PATH_IMAGE014
Representing large-scale fading components including path loss and lognormal shading, and following a Jakes fading model; small scale Rayleigh fading component
Figure 173902DEST_PATH_IMAGE015
Expressed as a first order gaussian-markov process:
Figure 689197DEST_PATH_IMAGE016
wherein
Figure 97307DEST_PATH_IMAGE017
Are independent and have uniformly distributed circularly symmetric complex gaussian random variables of unit variance;
Figure 813590DEST_PATH_IMAGE018
wherein
Figure 884314DEST_PATH_IMAGE019
Is a first type of zero order bessel function,
Figure 124672DEST_PATH_IMAGE020
is the maximum doppler frequency;
the inter-cell interference ICI experienced when users in different cells are allocated the same sub-carriers is expressed as follows:
Figure 964452DEST_PATH_IMAGE021
wherein
Figure 382795DEST_PATH_IMAGE022
Indicating a base stationmOn the sub-carrierkUser of upper servicen(ii) experienced inter-cell interference;
Figure 940815DEST_PATH_IMAGE023
is represented in sub-carrierskUpper base stationm'To the usern'The transmit power of (a);
Figure 232468DEST_PATH_IMAGE024
is at the sub-carrierkUpper slave base stationm'To the usernThe square of the channel gain of (d); when in use
Figure 926754DEST_PATH_IMAGE025
From the base stationmOn the sub-carrierkUser of upper servicenThe signal to interference plus noise ratio of (c) is as follows:
Figure 781578DEST_PATH_IMAGE026
wherein
Figure 561315DEST_PATH_IMAGE027
Is from the base stationmTo the usernA power of additive white gaussian noise; when the base stationmTo a usernAnd base stationm' To a usern'Are allocated sub-carriers simultaneouslykWhen the temperature of the water is higher than the set temperature,
Figure 143475DEST_PATH_IMAGE120
will interfere with the base stationmTo a usernAnd is and
Figure 957847DEST_PATH_IMAGE029
the embodiment of the invention is divided into two parts, firstly, a DNN frame is deployed for each base station, the DNN frame is based on an ADMM algorithm, and channel information CSI is used as the weight of a heterogeneous network; it is assumed that the long term average interference power received by each UE can be estimated and fed back to the serving base station through a feedback channel. This information exchange requires very limited resources to be obtained with very low frequency compared to the required signal CSI. Giving an optimal resource allocation strategy in the current state according to the user association information and the average interference power obtained by the base station; in particular, the amount of the solvent to be used,
deploying a DNN framework for each base station, wherein the DNN framework is based on an ADMM algorithm and takes channel information CSI as heterogeneous network weight; giving an optimal resource allocation strategy in the current state according to the user association information and the average interference power obtained by the base station; in particular, the amount of the solvent to be used,
the spectral efficiency objective optimization function is as follows:
Figure 717993DEST_PATH_IMAGE030
the energy efficiency objective optimization function is as follows:
Figure 250605DEST_PATH_IMAGE031
solving the target optimization function of the frequency spectrum efficiency based on an ADMM algorithm, wherein the augmented Lagrangian function is as follows:
Figure 449505DEST_PATH_IMAGE032
wherein
Figure 806800DEST_PATH_IMAGE033
The values, representing the lagrangian multiplier,
Figure 800163DEST_PATH_IMAGE034
is a penalty parameter; at this time, the spectral efficiency optimization function is expressed as:
Figure 757755DEST_PATH_IMAGE035
by respectively pairing
Figure 494767DEST_PATH_IMAGE036
Finding the deviation
Figure 18152DEST_PATH_IMAGE037
The best solution of (1):
Figure 369368DEST_PATH_IMAGE038
the following can be obtained:
Figure 610993DEST_PATH_IMAGE039
Figure 823800DEST_PATH_IMAGE040
Figure 421266DEST_PATH_IMAGE041
wherein:
Figure 756432DEST_PATH_IMAGE042
Figure 157458DEST_PATH_IMAGE043
Figure 892064DEST_PATH_IMAGE044
Figure 390042DEST_PATH_IMAGE045
stands for ADMM algorithmlThe number of sub-iterations is,
Figure 99372DEST_PATH_IMAGE046
,
Figure 50010DEST_PATH_IMAGE047
,
Figure 620931DEST_PATH_IMAGE048
,
Figure 973415DEST_PATH_IMAGE049
,
Figure 853646DEST_PATH_IMAGE050
,
Figure 291581DEST_PATH_IMAGE121
respectively representlIn the sub-iteration
Figure 712198DEST_PATH_IMAGE053
Figure 371718DEST_PATH_IMAGE054
The value of (c).
The DNN-based optimization framework shown in fig. 2 includes neurons corresponding to different operations in the ADMM iteration process, and directed edges corresponding to the data flow between the operations. Thus, the first of the DNN-based optimization frameworkskLayer corresponds to the second of ADMM procedurekAnd (6) iteration. Upon entering the DNN-based optimization framework, the input data flows through multiple layers of repetition, which correspond to successive iterations in the ADMM. DNN-based when convergence conditions are satisfiedThe optimization framework will generate resource allocation results. Specifically, the resource allocation method based on the ADMM algorithm comprises the following specific steps:
step S1.1, updating the currently observed state
Figure 485168DEST_PATH_IMAGE094
Step S1.2, initializing network parameters
Figure 348082DEST_PATH_IMAGE122
Step S1.3, setting threshold value
Figure 572390DEST_PATH_IMAGE097
Maximum number of iterations
Figure 322302DEST_PATH_IMAGE098
Starting iteration; network computing based on DNN
Figure 137811DEST_PATH_IMAGE123
(ii) a When in use
Figure 550338DEST_PATH_IMAGE101
Then output the corresponding
Figure 516020DEST_PATH_IMAGE102
The second part is that each base station is regarded as an independent subject, and the state of the base station is used as a modeling environment; a plurality of agent programs observe the same heterogeneous network environment and take action, and meanwhile, the agent programs are communicated with each other through the awards of the environment; the agent adjusts the policy according to the reward; specifically, the method comprises the following steps:
and (4) state set S: by
Figure 697603DEST_PATH_IMAGE055
A state component comprising
Figure 74226DEST_PATH_IMAGE056
(ii) a Observed by the agent for characterizing heterogeneous networksState of the network environment
Figure 974049DEST_PATH_IMAGE057
Including user association information
Figure 477843DEST_PATH_IMAGE058
And interference power
Figure 779511DEST_PATH_IMAGE059
Then the heterogeneous network state is represented as:
Figure 828501DEST_PATH_IMAGE060
action set A: based on the current state, the agent is pi-in based on a decision policy
Figure 215620DEST_PATH_IMAGE061
Taking an action; the actions include selecting a subcarrier
Figure 523104DEST_PATH_IMAGE062
And corresponding transmission power
Figure 413700DEST_PATH_IMAGE063
(ii) a Then the action is represented as
Figure 397705DEST_PATH_IMAGE064
Rewarding: agent computing environment reward after action is taken
Figure 272121DEST_PATH_IMAGE065
(ii) a Defining the energy efficiency function as a reward in the system model:
Figure 445613DEST_PATH_IMAGE066
designing a DNN-based optimization framework, and combining Q learning to generate a strategy pi; wherein the input to the DNN based optimization framework is the set of observed states S, of the DNN based optimization frameworkThe output is all executable actions in action set A; each state action pair has a corresponding Q value
Figure 393977DEST_PATH_IMAGE067
(ii) a Each step selects the action that achieves the maximum Q value at each state, as shown below
Figure 361933DEST_PATH_IMAGE068
Updating the Q value according to the Q learning algorithm by the following formula
Figure 146481DEST_PATH_IMAGE069
Figure 795768DEST_PATH_IMAGE070
Wherein
Figure 660956DEST_PATH_IMAGE071
And
Figure 127709DEST_PATH_IMAGE072
learning rate and discount factor, respectively;
Figure 711137DEST_PATH_IMAGE073
it is shown that in the next state,
Figure 757591DEST_PATH_IMAGE074
is shown in a state
Figure 742864DEST_PATH_IMAGE075
The reward obtained after the action is taken is,
Figure 741038DEST_PATH_IMAGE076
indicating a state
Figure 218287DEST_PATH_IMAGE077
The following executable acts may be performed,
Figure 271694DEST_PATH_IMAGE078
is a set of executable actions;
Figure 298425DEST_PATH_IMAGE079
indicating a state
Figure 779085DEST_PATH_IMAGE080
The value of Q in the following (A),
Figure 9209DEST_PATH_IMAGE081
represents the updated Q value;
Figure 866306DEST_PATH_IMAGE082
indicating a state
Figure 237290DEST_PATH_IMAGE083
Set of executable actions under
Figure 888851DEST_PATH_IMAGE078
Maximum Q value of (1); the loss function in each agent can be expressed as:
Figure 871850DEST_PATH_IMAGE084
wherein
Figure 267060DEST_PATH_IMAGE085
A network parameter indicative of the target network,
Figure DEST_PATH_IMAGE003
a network parameter representing an online network; squaring channel gain
Figure 90845DEST_PATH_IMAGE087
And additive Gaussian noise
Figure 357878DEST_PATH_IMAGE088
As network parameters of the l-th layer;
use of
Figure 25620DEST_PATH_IMAGE089
Greedy policy from online network
Figure 428919DEST_PATH_IMAGE090
In the selection action
Figure 687863DEST_PATH_IMAGE091
Target network
Figure 130607DEST_PATH_IMAGE092
Is an online network
Figure 867619DEST_PATH_IMAGE093
But the network parameters are fixed in the iteration; and replacing the network parameters of the target network with the network parameters in the online network after each iteration.
Specifically, the steps of obtaining the optimal resource allocation scheme by using the ADMM network using the channel state information as the network weight are as follows:
step S2.1, initializing the reproduction memory
Figure 328688DEST_PATH_IMAGE103
DQN network parameters
Figure 758532DEST_PATH_IMAGE104
And target network replacement step size
Figure 921529DEST_PATH_IMAGE105
Step S2.2, initializing the on-line network
Figure 462232DEST_PATH_IMAGE106
And
Figure 43386DEST_PATH_IMAGE107
initializing the target network
Figure 378552DEST_PATH_IMAGE108
And make
Figure 795889DEST_PATH_IMAGE109
Step S2.3, setting threshold value
Figure 874704DEST_PATH_IMAGE110
Step S2.4, each agent program is used according to the current state information
Figure 310364DEST_PATH_IMAGE111
Greedy policy selection decision
Figure 206645DEST_PATH_IMAGE112
Step S2.5, updating environment
Figure 422863DEST_PATH_IMAGE113
Receive a reward
Figure 243051DEST_PATH_IMAGE114
Step S2.6, each agent program observes the rewards obtained by all agents
Figure 595535DEST_PATH_IMAGE115
Storing into respective D;
step S2.7, randomly sampling from D, and calculating loss function
Figure 226499DEST_PATH_IMAGE116
And update
Figure 664433DEST_PATH_IMAGE107
Every other, at
Figure 22733DEST_PATH_IMAGE117
Step updating target network parameters
Figure 495303DEST_PATH_IMAGE109
Up to thenThere are agents that meet the threshold or reach the maximum iteration step.

Claims (3)

1. A heterogeneous network resource allocation method based on reinforcement learning is characterized in that in a downlink of a heterogeneous network with M base stations and N mobile users, a macro base station MBS has
Figure 633172DEST_PATH_IMAGE001
A micro base station PBS has
Figure 460664DEST_PATH_IMAGE002
And satisfy
Figure 795830DEST_PATH_IMAGE003
Setting up
Figure 462435DEST_PATH_IMAGE004
Indicating a base stationmAnd the usernThe association relationship between the two or more of the three,
Figure 541250DEST_PATH_IMAGE005
indicating a base stationmAnd the usernAssociating;
Figure 226178DEST_PATH_IMAGE006
indicating a base stationmAnd the usernIrrelevant;
setting up
Figure 935508DEST_PATH_IMAGE007
Indicating the state of the spectrum when the user is presentnAnd sub-carrierkBase station ofmAssociated, spectral state
Figure 886146DEST_PATH_IMAGE007
The following rules were used to determine:
Figure 768651DEST_PATH_IMAGE008
to representUser' snUsing sub-carriersk
Figure 776928DEST_PATH_IMAGE009
Representing a usernNot using sub-carriersk
Setting up
Figure 719476DEST_PATH_IMAGE010
Representing a usernAnd a base stationmOn the sub-carrierkA transmission power of; the method comprises the following specific steps:
Figure 157410DEST_PATH_IMAGE011
indicating that the total transmit power of each cell base station should be at a preset power limit
Figure 764978DEST_PATH_IMAGE012
Below;
representing time slots using block fading modelstUser's devicenTo the base stationmThe downlink channel gains of (c) are as follows:
Figure 175231DEST_PATH_IMAGE013
wherein
Figure 288680DEST_PATH_IMAGE014
Representing large-scale fading components including path loss and lognormal shading, and following a Jakes fading model; small scale Rayleigh fading component
Figure 403791DEST_PATH_IMAGE015
Expressed as a first order gaussian-markov process:
Figure 628099DEST_PATH_IMAGE016
wherein
Figure 627279DEST_PATH_IMAGE017
Are independent and have uniformly distributed circularly symmetric complex gaussian random variables of unit variance;
Figure 911630DEST_PATH_IMAGE018
wherein
Figure 511108DEST_PATH_IMAGE019
Is a first type of zero order bessel function,
Figure 539107DEST_PATH_IMAGE020
is the maximum doppler frequency;
the inter-cell interference ICI experienced when users in different cells are allocated the same sub-carriers is expressed as follows:
Figure 658372DEST_PATH_IMAGE021
wherein
Figure 316887DEST_PATH_IMAGE022
Indicating a base stationmOn the sub-carrierkUser of upper servicen(ii) experienced inter-cell interference;
Figure 216710DEST_PATH_IMAGE023
is represented in sub-carrierskUpper base stationm'To the usern'The transmit power of (a);
Figure 969771DEST_PATH_IMAGE024
is at the sub-carrierkUpper slave base stationm'To the usernThe square of the channel gain of (d); when in use
Figure 209122DEST_PATH_IMAGE025
From the base stationmOn the sub-carrierWave (wave)kUser of upper servicenThe signal to interference plus noise ratio of (c) is as follows:
Figure 569697DEST_PATH_IMAGE026
wherein
Figure 143766DEST_PATH_IMAGE027
Is from the base stationmTo the usernA power of additive white gaussian noise; when the base stationmTo a usernAnd base stationm' To a usern'Are allocated sub-carriers simultaneouslykWhen the temperature of the water is higher than the set temperature,
Figure 513568DEST_PATH_IMAGE028
will interfere with the base stationmTo a usernAnd is and
Figure 341846DEST_PATH_IMAGE029
step S1, deploying a DNN framework for each base station, wherein the DNN framework is based on an ADMM algorithm and takes channel information CSI as heterogeneous network weight; giving an optimal resource allocation strategy in the current state according to the user association information and the average interference power obtained by the base station; in particular, the amount of the solvent to be used,
the spectral efficiency objective optimization function is as follows:
Figure 322922DEST_PATH_IMAGE030
the energy efficiency objective optimization function is as follows:
Figure 931758DEST_PATH_IMAGE031
solving the target optimization function of the frequency spectrum efficiency based on an ADMM algorithm, wherein the augmented Lagrangian function is as follows:
Figure 777354DEST_PATH_IMAGE032
wherein
Figure 788036DEST_PATH_IMAGE033
The values, representing the lagrangian multiplier,
Figure 677363DEST_PATH_IMAGE034
is a penalty parameter; at this time, the spectral efficiency objective optimization function is expressed as:
Figure 773495DEST_PATH_IMAGE035
by respectively pairing
Figure 688361DEST_PATH_IMAGE036
Finding the deviation
Figure 553549DEST_PATH_IMAGE037
The best solution of (1);
step S2, regarding each base station as an independent agent, and taking the state of the base station as a modeling environment; a plurality of agent programs observe the same heterogeneous network environment and take action, and meanwhile, the agent programs are communicated with each other through the awards of the environment; the agent adjusts the policy according to the reward; specifically, the method comprises the following steps:
and (4) state set S: by
Figure 879357DEST_PATH_IMAGE038
A state component comprising
Figure 400468DEST_PATH_IMAGE039
(ii) a States observed by the agent to characterize a heterogeneous network environment
Figure 650184DEST_PATH_IMAGE040
Including user association information
Figure 822408DEST_PATH_IMAGE041
And interference power
Figure 132167DEST_PATH_IMAGE042
Then the heterogeneous network state is represented as:
Figure 140574DEST_PATH_IMAGE043
action set A: based on the current state, the agent is pi-in based on a decision policy
Figure 915020DEST_PATH_IMAGE044
Taking an action; the actions include selecting a subcarrier
Figure 754800DEST_PATH_IMAGE045
And corresponding transmission power
Figure 173143DEST_PATH_IMAGE046
(ii) a Then the action is represented as
Figure 465584DEST_PATH_IMAGE047
Rewarding: agent computing environment reward after action is taken
Figure 509632DEST_PATH_IMAGE048
(ii) a Defining the energy efficiency function as a reward in the system model:
Figure 141602DEST_PATH_IMAGE049
designing a DNN-based optimization framework, and combining Q learning to generate a strategy pi; wherein the input to the DNN-based optimization framework is the set of observed states S and the output of the DNN-based optimization framework is all executable actions in the set of actions A; each stateThe action pairs all have corresponding Q values
Figure 793163DEST_PATH_IMAGE050
(ii) a Each step selects the action that achieves the maximum Q value at each state, as shown below
Figure 25430DEST_PATH_IMAGE051
Updating the Q value according to the Q learning algorithm by the following formula
Figure 420639DEST_PATH_IMAGE052
Figure 907115DEST_PATH_IMAGE053
Wherein
Figure 995157DEST_PATH_IMAGE054
And
Figure 449141DEST_PATH_IMAGE055
learning rate and discount factor, respectively;
Figure 116883DEST_PATH_IMAGE056
it is shown that in the next state,
Figure 520182DEST_PATH_IMAGE057
is shown in a state
Figure 716808DEST_PATH_IMAGE058
The reward obtained after the action is taken is,
Figure 471138DEST_PATH_IMAGE059
indicating a state
Figure 208150DEST_PATH_IMAGE060
The following executable acts may be performed,
Figure 938993DEST_PATH_IMAGE061
is a set of executable actions;
Figure 306521DEST_PATH_IMAGE062
indicating a state
Figure 282567DEST_PATH_IMAGE063
The value of Q in the following (A),
Figure 10221DEST_PATH_IMAGE064
represents the updated Q value;
Figure 653691DEST_PATH_IMAGE065
indicating a state
Figure 926541DEST_PATH_IMAGE066
Set of executable actions under
Figure 655463DEST_PATH_IMAGE061
Maximum Q value of (1); the loss function in each agent can be expressed as:
Figure 921228DEST_PATH_IMAGE067
wherein
Figure 153626DEST_PATH_IMAGE068
A network parameter indicative of the target network,
Figure 659694DEST_PATH_IMAGE069
a network parameter representing an online network; squaring channel gain
Figure 62862DEST_PATH_IMAGE070
And additive Gaussian noise
Figure 945367DEST_PATH_IMAGE071
As a network parameter of the l-th layer, wherein
Figure 235534DEST_PATH_IMAGE072
Stands for ADMM algorithmlPerforming secondary iteration;
use of
Figure 178083DEST_PATH_IMAGE073
Greedy policy from online network
Figure 805898DEST_PATH_IMAGE074
In the selection action
Figure 226515DEST_PATH_IMAGE075
Target network
Figure 636767DEST_PATH_IMAGE076
Is an online network
Figure 484638DEST_PATH_IMAGE077
But the network parameters are fixed in the iteration; and replacing the network parameters of the target network with the network parameters in the online network after each iteration.
2. The reinforcement learning-based heterogeneous network resource allocation method according to claim 1, wherein the resource allocation method based on the ADMM algorithm in step S1 specifically includes the following steps:
step S1.1, updating the currently observed state
Figure 596819DEST_PATH_IMAGE078
Step S1.2, initializing network parameters
Figure 821127DEST_PATH_IMAGE079
Step S1.3, setting threshold value
Figure 85886DEST_PATH_IMAGE080
Maximum number of iterations
Figure 839079DEST_PATH_IMAGE081
Starting iteration; network computing based on DNN
Figure 251605DEST_PATH_IMAGE082
(ii) a When in use
Figure 200976DEST_PATH_IMAGE083
Then output the corresponding
Figure 382558DEST_PATH_IMAGE084
3. The method according to claim 1, wherein the step S2 obtains an optimal resource allocation scheme using an ADMM network that uses channel state information as a network weight, and includes the following steps:
step S2.1, initializing the reproduction memory
Figure 775494DEST_PATH_IMAGE085
DQN network parameters
Figure 675316DEST_PATH_IMAGE069
And target network replacement step size
Figure 428378DEST_PATH_IMAGE086
Step S2.2, initializing the on-line network
Figure 464467DEST_PATH_IMAGE087
And
Figure 28303DEST_PATH_IMAGE088
initializing the target network
Figure 415422DEST_PATH_IMAGE089
And make
Figure 703665DEST_PATH_IMAGE090
Step S2.3, setting threshold value
Figure 859840DEST_PATH_IMAGE091
Step S2.4, each agent program is used according to the current state information
Figure 594578DEST_PATH_IMAGE092
Greedy policy selection decision
Figure 921523DEST_PATH_IMAGE093
Step S2.5, updating environment
Figure 95016DEST_PATH_IMAGE094
Receive a reward
Figure 43380DEST_PATH_IMAGE095
Step S2.6, each agent program observes the rewards obtained by all agents
Figure 198287DEST_PATH_IMAGE096
Storing into respective D;
step S2.7, randomly sampling from D, and calculating loss function
Figure 294419DEST_PATH_IMAGE097
And update
Figure 740443DEST_PATH_IMAGE088
Every other, at
Figure 808894DEST_PATH_IMAGE098
Step updating target network parameters
Figure 947751DEST_PATH_IMAGE090
Until all agents meet a threshold or a maximum iteration step is reached.
CN202110006111.3A 2021-01-05 2021-01-05 Heterogeneous network resource allocation method based on reinforcement learning Active CN112351433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110006111.3A CN112351433B (en) 2021-01-05 2021-01-05 Heterogeneous network resource allocation method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110006111.3A CN112351433B (en) 2021-01-05 2021-01-05 Heterogeneous network resource allocation method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN112351433A CN112351433A (en) 2021-02-09
CN112351433B true CN112351433B (en) 2021-05-25

Family

ID=74427832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110006111.3A Active CN112351433B (en) 2021-01-05 2021-01-05 Heterogeneous network resource allocation method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN112351433B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113242602B (en) * 2021-05-10 2022-04-22 内蒙古大学 Millimeter wave large-scale MIMO-NOMA system resource allocation method and system
CN113162682B (en) * 2021-05-13 2022-06-24 重庆邮电大学 PD-NOMA-based multi-beam LEO satellite system resource allocation method
CN113473580B (en) * 2021-05-14 2024-04-26 南京信息工程大学滨江学院 User association joint power distribution method based on deep learning in heterogeneous network
CN113613301B (en) * 2021-08-04 2022-05-13 北京航空航天大学 Air-ground integrated network intelligent switching method based on DQN
CN114116156B (en) * 2021-10-18 2022-09-09 武汉理工大学 Cloud-edge cooperative double-profit equilibrium taboo reinforcement learning resource allocation method
CN114205899B (en) * 2022-01-18 2023-04-07 电子科技大学 Heterogeneous network high-energy-efficiency power control method based on deep reinforcement learning
CN114340017B (en) * 2022-03-17 2022-06-07 山东科技大学 Heterogeneous network resource slicing method with eMBB and URLLC mixed service

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102238631A (en) * 2011-08-17 2011-11-09 南京邮电大学 Method for managing heterogeneous network resources based on reinforcement learning
CN106358308A (en) * 2015-07-14 2017-01-25 北京化工大学 Resource allocation method for reinforcement learning in ultra-dense network
CN108521673A (en) * 2018-04-09 2018-09-11 湖北工业大学 Resource allocation and power control combined optimization method based on intensified learning in a kind of heterogeneous network
CN108848561A (en) * 2018-04-11 2018-11-20 湖北工业大学 A kind of isomery cellular network combined optimization method based on deeply study

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3072851B1 (en) * 2017-10-23 2019-11-15 Commissariat A L'energie Atomique Et Aux Energies Alternatives REALIZING LEARNING TRANSMISSION RESOURCE ALLOCATION METHOD

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102238631A (en) * 2011-08-17 2011-11-09 南京邮电大学 Method for managing heterogeneous network resources based on reinforcement learning
CN106358308A (en) * 2015-07-14 2017-01-25 北京化工大学 Resource allocation method for reinforcement learning in ultra-dense network
CN108521673A (en) * 2018-04-09 2018-09-11 湖北工业大学 Resource allocation and power control combined optimization method based on intensified learning in a kind of heterogeneous network
CN108848561A (en) * 2018-04-11 2018-11-20 湖北工业大学 A kind of isomery cellular network combined optimization method based on deeply study

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
基于强化学习的异构无线网络资源管理算法;冯陈伟 等;《电信科学》;20150831;第1-8页 *
基于深度强化学习的异构云无线接入网自适应无线资源分配算法;陈前斌 等;《电子与信息学报》;20200630;第1468-1476页 *

Also Published As

Publication number Publication date
CN112351433A (en) 2021-02-09

Similar Documents

Publication Publication Date Title
CN112351433B (en) Heterogeneous network resource allocation method based on reinforcement learning
Xie et al. Energy-efficient resource allocation for heterogeneous cognitive radio networks with femtocells
Alqerm et al. Sophisticated online learning scheme for green resource allocation in 5G heterogeneous cloud radio access networks
Zappone et al. User association and load balancing for massive MIMO through deep learning
Samarakoon et al. Backhaul-aware interference management in the uplink of wireless small cell networks
Nasir et al. Joint resource optimization for multicell networks with wireless energy harvesting relays
CN107426773B (en) Energy efficiency-oriented distributed resource allocation method and device in wireless heterogeneous network
Dai et al. Energy-efficient resource allocation for energy harvesting-based device-to-device communication
Tsiropoulou et al. On the problem of optimal cell selection and uplink power control in open access multi-service two-tier femtocell networks
Wu et al. QoE-based distributed multichannel allocation in 5G heterogeneous cellular networks: A matching-coalitional game solution
CN106792451B (en) D2D communication resource optimization method based on multi-population genetic algorithm
CN110191489B (en) Resource allocation method and device based on reinforcement learning in ultra-dense network
Han et al. Power allocation for device-to-device underlay communication with femtocell using stackelberg game
Yu et al. Interference coordination strategy based on Nash bargaining for small‐cell networks
Mach et al. Power allocation, channel reuse, and positioning of flying base stations with realistic backhaul
Baniasadi et al. Power control for D2D underlay cellular communication: Game theory approach
Najeh Joint mode selection and power control for D2D underlaid cellular networks
Su et al. User-centric base station clustering and resource allocation for cell-edge users in 6G ultra-dense networks
CN111343721B (en) D2D distributed resource allocation method for maximizing generalized energy efficiency of system
Venkateswararao et al. Traffic aware sleeping strategies for small-cell base station in the ultra dense 5G small cell networks
Pantisano et al. On the dynamic formation of cooperative multipoint transmissions in small cell networks
Eliodorou et al. User association coalition games with zero-forcing beamforming and NOMA
Dun et al. The distributed resource allocation for D2D communication with game theory
Sandoval et al. Indoor planning and optimization of LTE-U radio access over WiFi
Banitalebi et al. Distributed Learning-Based Resource Allocation for Self-Organizing C-V2X Communication in Cellular Networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant