CN112351433A - Heterogeneous network resource allocation method based on reinforcement learning - Google Patents

Heterogeneous network resource allocation method based on reinforcement learning

Info

Publication number
CN112351433A
Authority
CN
China
Prior art keywords
base station
user
state
network
resource allocation
Prior art date
Legal status
Granted
Application number
CN202110006111.3A
Other languages
Chinese (zh)
Other versions
CN112351433B (en)
Inventor
孙君
吴锡
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202110006111.3A
Publication of CN112351433A
Application granted
Publication of CN112351433B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W 16/02 Resource partitioning among network components, e.g. reuse partitioning
    • H04W 16/10 Dynamic resource partitioning
    • H04W 52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W 52/04 TPC
    • H04W 52/18 TPC being performed according to specific parameters
    • H04W 52/24 TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters
    • H04W 52/243 TPC using SIR or other wireless path parameters taking into account interferences
    • H04W 52/244 Interferences in heterogeneous networks, e.g. among macro and femto or pico cells or other sector/system interference [OSI]
    • H04W 52/30 TPC using constraints in the total amount of available transmission power
    • H04W 52/34 TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading
    • H04W 52/346 TPC management distributing total power among users or channels
    • H04W 72/00 Local resource management
    • H04W 72/04 Wireless resource allocation
    • H04W 72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W 72/0453 Resources in frequency domain, e.g. a carrier in FDMA
    • H04W 72/0473 Wireless resource allocation based on the type of the allocated resource, the resource being transmission power

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a heterogeneous network resource allocation method based on reinforcement learning. First, a DNN framework is deployed on each base station; the framework is based on the ADMM algorithm and uses channel information as the weights of the network. According to the data obtained by the base station, namely the current user association information and the average interference power, it gives the optimal resource allocation strategy in the current state. Each base station is then regarded as an independent agent, and the states of the base stations are taken as the modelled environment; multiple agents observe the same heterogeneous network environment, take actions, and communicate with one another through the rewards returned by the environment, adjusting their policies according to those rewards. Because it is based on a deep learning network, the proposed resource allocation method can provide an allocation scheme without full CSI; it also takes spectrum efficiency into account by setting the spectrum efficiency function as the agent's reward, so spectrum efficiency is guaranteed while the system throughput is maintained.

Description

Heterogeneous network resource allocation method based on reinforcement learning
Technical Field
The invention relates to the technical field of wireless communication, in particular to a heterogeneous network resource allocation method based on reinforcement learning.
Background
With the rapid growth of mobile devices and the emergence of the Internet of Things, next-generation wireless networks face great challenges from the proliferation of wireless applications. The most promising solution is to augment existing cellular networks with pico and femto cells having various transmission powers and coverage areas. These heterogeneous networks (HetNets) can offload user equipments (UEs) from macro base stations (MBS) to pico base stations (PBS). In addition, to achieve high spectral efficiency, a PBS may reuse the spectrum of the MBS and share the same channels with it. Heterogeneous networks are therefore considered a good strategy for increasing the capacity of future wireless communication systems. Such networks pose several optimization problems, such as spectrum allocation and resource allocation. Recent studies have proposed new approaches such as game-theoretic methods, linear programming and Markov approximation strategies. However, these methods require almost complete information, which is generally not available, so it is challenging for them to reach an optimal solution without such complete information.
Disclosure of Invention
The invention provides a dynamic resource allocation scheme for the downlink resource allocation problem in heterogeneous cellular networks. In particular, dynamic power allocation and channel allocation strategies are provided for the base stations. To improve spectral efficiency and energy efficiency in heterogeneous cellular networks, an optimization framework based on deep neural networks (DNN) is first constructed from a series of alternating direction method of multipliers (ADMM) iterations, with the channel state information (CSI) used as the learned weights. A deep reinforcement learning (DRL) framework is then applied to obtain a resource allocation scheme that accounts for both spectrum efficiency (SE) and energy efficiency (EE).
In the downlink of a heterogeneous network with M base stations and N mobile users, the number of macro base stations (MBS) plus the number of micro base stations (PBS) equals M.

Let x(m,n) denote the association relationship between base station m and user n: x(m,n) = 1 indicates that base station m and user n are associated, and x(m,n) = 0 indicates that they are not associated.

Let the spectrum state ρ(n,k) indicate whether user n, associated with base station m, occupies subcarrier k; it is determined by the following rule: ρ(n,k) = 1 indicates that the user uses channel k, and ρ(n,k) = 0 indicates that the user does not use channel k.

Let p(m,n,k) denote the transmission power from base station m to user n on channel k. The total transmit power of each cell's base station must stay below a preset power limit P_max, i.e. the powers p(m,n,k) allocated by base station m sum to no more than P_max.

A block fading model is used, and the downlink channel gain from base station m to user n in time slot t is the product of a large-scale fading component and a small-scale fading component. The large-scale component includes path loss and lognormal shadowing and follows the Jakes fading model. The small-scale Rayleigh fading component h(t) is expressed as a first-order Gauss-Markov process

h(t) = ρ_0 · h(t−1) + sqrt(1 − ρ_0²) · e(t),

where the innovations e(t) are independent, identically distributed circularly symmetric complex Gaussian random variables with unit variance and ρ_0 = J_0(2π f_d T), with J_0 the zero-order Bessel function of the first kind and f_d the maximum Doppler frequency.

Users in different cells that are allocated the same subcarrier experience inter-cell interference (ICI). The ICI experienced by user n served by base station m on subcarrier k is

I(m,n,k) = Σ_{m'≠m} p(m',n',k) · |g(m',n,k)|²,

where p(m',n',k) is the transmission power from base station m' to its user n' on subcarrier k and |g(m',n,k)|² is the squared channel gain from base station m' to user n on subcarrier k.

The signal-to-interference-plus-noise ratio of user n served by base station m on subcarrier k is then

SINR(m,n,k) = p(m,n,k) · |g(m,n,k)|² / (I(m,n,k) + σ²),

where σ² is the power of the additive white Gaussian noise on the link from base station m to user n. When base station m serves user n and base station m' serves user n' on the same subcarrier k, base station m' causes interference to user n of base station m.
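As an illustration of how the interference and SINR quantities above combine, the following minimal NumPy sketch computes per-link ICI and SINR from given powers and squared channel gains; the array layout, the variable names and the assumption that interference sums over the total power of every other base station on the same subcarrier are choices made for this example, not definitions taken from the patent.

    import numpy as np

    def downlink_sinr(p, g2, assoc, noise_power):
        """Per-link SINR under subcarrier reuse across cells.

        p[m, n, k]  : transmit power of base station m to user n on subcarrier k
        g2[m, n, k] : squared channel gain from base station m to user n on subcarrier k
        assoc[m, n] : 1 if user n is served by base station m, else 0
        noise_power : AWGN power sigma^2
        """
        M, N, K = p.shape
        signal = p * g2 * assoc[:, :, None]          # received power on served links only
        tx_per_bs = p.sum(axis=1)                    # [m, k]: total power of BS m on subcarrier k
        ici = np.zeros_like(p)
        for m in range(M):
            for n in range(N):
                # interference from every other base station transmitting on the same subcarrier
                ici[m, n, :] = sum(tx_per_bs[mp, :] * g2[mp, n, :] for mp in range(M) if mp != m)
        return np.where(signal > 0.0, signal / (ici + noise_power), 0.0)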
Step S1: a DNN framework is deployed for each base station; the framework is based on the ADMM algorithm and uses the channel information CSI as the weights of the heterogeneous network. According to the user association information and the average interference power obtained by the base station, the framework gives the optimal resource allocation strategy in the current state. Specifically:

A spectral efficiency objective optimization function and an energy efficiency objective optimization function are defined over the association, subcarrier and power variables. The resulting multi-objective optimization problem is solved with the ADMM algorithm: the constraints are absorbed into an augmented Lagrangian that combines the objectives with Lagrange multipliers λ and a penalty parameter, so the problem can be expressed as an unconstrained optimization problem. Taking partial derivatives of the augmented Lagrangian with respect to each block of variables and setting them to zero yields the optimal value of each variable in closed form; the resulting update expressions, together with the auxiliary quantities they depend on, constitute one ADMM iteration.
Step S2: each base station is regarded as an independent agent, and the states of the base stations are taken as the modelled environment. Multiple agents observe the same heterogeneous network environment and take actions, and they communicate with one another through the rewards returned by the environment; each agent adjusts its policy according to the reward. Specifically:

State set S: the environment is described by the states of the M base stations. The state characterizing the heterogeneous network environment consists of the user association information X and the interference power I, so a heterogeneous network state is represented as s = {X, I}.

Action set A: based on the current state, an agent takes an action according to a decision policy π. An action consists of selecting a subcarrier k and the corresponding transmission power p, so an action is represented as a = {k, p}.

Reward: after an action is taken, the agent computes the reward returned by the environment. The energy efficiency function of the system model is defined as the reward.
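As a hedged illustration of using an efficiency measure as the reward, one common form (Shannon sum rate divided by total consumed power) is sketched below; the patent gives its energy efficiency function only as an image, so this exact expression, the bandwidth value and the fixed circuit-power term are assumptions for the example.

    import numpy as np

    def energy_efficiency_reward(sinr, p, bandwidth_hz=180e3, circuit_power_w=1.0):
        """Energy-efficiency style reward: total throughput divided by total consumed power.

        sinr[m, n, k] : per-link SINR values
        p[m, n, k]    : allocated transmit powers in watts
        """
        throughput = bandwidth_hz * np.log2(1.0 + sinr).sum()   # Shannon sum rate over all links
        total_power = p.sum() + circuit_power_w                 # transmit power plus a fixed circuit term
        return throughput / total_power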
A DNN-based optimization framework is designed and combined with Q-learning to generate the policy π. The input to the DNN-based optimization framework is the set of observed states S, and its output covers all executable actions in the action set A; each state-action pair has a corresponding Q value Q(s, a). At each step the action achieving the maximum Q value in the current state is selected:

a = argmax_{a' ∈ A(s)} Q(s, a').

The Q value is updated according to the Q-learning algorithm:

Q(s, a) ← Q(s, a) + α · [ r + γ · max_{a' ∈ A(s')} Q(s', a') − Q(s, a) ],

where α and γ are the learning rate and the discount factor respectively, s' is the next state, r is the reward obtained after taking the action in state s, A(s') is the set of executable actions in state s', Q(s, a) is the Q value in state s, and max_{a' ∈ A(s')} Q(s', a') is the maximum Q value over the executable actions in state s'.

The loss function in each agent is the squared difference between the target value computed with the target network, whose weights are denoted θ⁻, and the Q value Q(s, a; θ) predicted by the online network. An ε-greedy policy selects the action from the online network Q(s, a; θ); the target network Q(s, a; θ⁻) keeps its weights fixed while multiple iterations are performed updating the weights of the online network.
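The interaction between the online network, the fixed target network and the ε-greedy action selection described above can be sketched with a generic DQN update; the PyTorch code below is an illustrative sketch, not the patent's network, and the batch format and hyper-parameters are assumptions.

    import random
    import torch
    import torch.nn as nn

    def dqn_update(online, target, optimizer, batch, gamma=0.99):
        """One DQN step: TD target from the frozen target network, squared loss on the online network."""
        s, a, r, s_next = batch                                   # states [B,D], actions [B], rewards [B], next states [B,D]
        q_sa = online(s).gather(1, a.unsqueeze(1)).squeeze(1)     # Q(s, a; theta)
        with torch.no_grad():
            td_target = r + gamma * target(s_next).max(dim=1).values   # r + gamma * max_a' Q(s', a'; theta^-)
        loss = nn.functional.mse_loss(q_sa, td_target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    def epsilon_greedy(online, state, n_actions, eps=0.1):
        """Select an action from the online network with probability 1 - eps, otherwise explore."""
        if random.random() < eps:
            return random.randrange(n_actions)
        with torch.no_grad():
            return int(online(state.unsqueeze(0)).argmax(dim=1).item())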
Further, the resource allocation method based on the ADMM algorithm in step S1 specifically includes the following steps:

Step S1.1: update the currently observed state (user association information and average interference power).

Step S1.2: initialize the optimization variables and the Lagrange multipliers.

Step S1.3: set the convergence threshold and the maximum number of iterations, and start iterating; the DNN-based network computes the variable updates of each iteration, and once the convergence threshold is met the corresponding resource allocation result is output.
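Steps S1.1 to S1.3 amount to an iterate-until-threshold loop around the closed-form ADMM updates; the sketch below shows that control flow only, with the update functions passed in as placeholders because the patent presents the closed-form expressions as images.

    import numpy as np

    def admm_allocate(state, init, update_primal, update_aux, update_dual,
                      tol=1e-4, max_iter=200):
        """Generic ADMM-style loop with a threshold / max-iteration stopping rule.

        state    : currently observed (association, interference) information
        init     : (x0, z0, lam0) initial variables and multipliers
        update_* : callables standing in for the closed-form updates (placeholders)
        """
        x, z, lam = init
        for _ in range(max_iter):
            x_new = update_primal(state, z, lam)       # allocation-variable update
            z_new = update_aux(state, x_new, lam)      # auxiliary-variable update
            lam = update_dual(lam, x_new, z_new)       # multiplier update with the penalty parameter
            if np.linalg.norm(x_new - x) < tol and np.linalg.norm(z_new - z) < tol:
                return x_new, z_new                    # threshold met: output the allocation
            x, z = x_new, z_new
        return x, z                                    # maximum number of iterations reached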
Further, in step S2 the optimal resource allocation scheme is obtained by using the ADMM network that takes the channel state information as the network weights; the specific steps are as follows:

Step S2.1: initialize the replay memory D, the DQN network parameters θ, and the target network replacement step size C.

Step S2.2: initialize the online network Q with weights θ, and initialize the target network Q̂ with weights θ⁻ set equal to θ.

Step S2.3: set the convergence threshold.

Step S2.4: according to the current state information, each agent selects a decision with an ε-greedy policy.

Step S2.5: update the environment and receive the reward.

Step S2.6: each agent observes the rewards obtained by all agents and stores the experience into its own replay memory D.

Step S2.7: sample randomly from D, compute the loss function and update the weights θ; every C steps, update the target network parameters θ⁻; repeat until all agents meet the threshold or the maximum number of iteration steps is reached.
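Putting steps S2.1 to S2.7 together, a hedged outline of the per-agent training loop could look like the following; the environment interface, the agent methods and the memory and update-period values are illustrative placeholders rather than details fixed by the patent.

    import random

    def train_agents(env, agents, episodes=500, replace_every=100, batch_size=32):
        """Multi-agent DQN loop: epsilon-greedy acting, shared-reward storage, periodic target sync.

        env    : exposes reset() -> states and step(actions) -> (next_states, rewards, done)
        agents : objects with act(state), a replay memory (list/deque), learn(batch) and sync_target()
        """
        step = 0
        for _ in range(episodes):
            states = env.reset()
            done = False
            while not done:
                actions = [ag.act(s) for ag, s in zip(agents, states)]          # S2.4: epsilon-greedy decisions
                next_states, rewards, done = env.step(actions)                  # S2.5: environment update and rewards
                for ag, s, a, s_next in zip(agents, states, actions, next_states):
                    ag.memory.append((s, a, rewards, s_next))                   # S2.6: each agent stores all agents' rewards
                for ag in agents:
                    if len(ag.memory) >= batch_size:
                        ag.learn(random.sample(ag.memory, batch_size))          # S2.7: sample, compute loss, update weights
                if step % replace_every == 0:
                    for ag in agents:
                        ag.sync_target()                                        # periodic target-network replacement
                states = next_states
                step += 1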
Compared with the prior art, the invention has the following technical advantages:
(1) When solving the resource allocation problem in a heterogeneous network, traditional convex optimization methods struggle to provide an allocation scheme when the CSI is incomplete; because the proposed method is based on a deep learning network, it can provide a resource allocation scheme without requiring all of the CSI.
(2) Spectrum efficiency is considered alongside resource allocation, and the method applies model-driven deep reinforcement learning, which has not previously been applied to heterogeneous network resource allocation schemes. Setting the spectrum efficiency function as the agent's reward ensures spectrum efficiency while maintaining the system throughput.
Drawings
FIG. 1 is a schematic diagram of a dual-layer heterogeneous cellular network provided by the present invention;
fig. 2 is a structural diagram of a DNN optimization framework based on an ADMM algorithm provided by the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
The dual-layer heterogeneous cellular network shown in FIG. 1 comprises M base stations and N mobile users; the number of macro base stations (MBS) plus the number of micro base stations (PBS) equals M. Each base station is located at the center of its cell, and the licensed mobile users are randomly distributed in the cell. It is assumed that every two adjacent small cells have an overlapping area, and that each communication terminal is equipped with an antenna for signal transmission. To maximize the use of radio resources and avoid trivial cases, the frequency reuse factor is set to 1; to avoid intra-cell interference, it is assumed that each user in a cell is allocated only one subcarrier, so all signals within the same cell are orthogonal. The N orthogonal subcarriers used in a cell may be reused in each neighboring cell. Users in the overlapping areas, however, are served by the nearest small-cell BS and may suffer severe inter-cell interference (ICI), because the neighboring cells may use the same spectral resources.
Let x(m,n) denote the association relationship between base station m and user n: x(m,n) = 1 indicates that base station m and user n are associated, and x(m,n) = 0 indicates that they are not associated.

Let the spectrum state ρ(n,k) indicate whether user n, associated with base station m, occupies subcarrier k; it is determined by the following rule: ρ(n,k) = 1 indicates that the user uses channel k, and ρ(n,k) = 0 indicates that the user does not use channel k.

Let p(m,n,k) denote the transmission power from base station m to user n on channel k. The total transmit power of each cell's base station must stay below a preset power limit P_max, i.e. the powers p(m,n,k) allocated by base station m sum to no more than P_max.

A block fading model is used, and the downlink channel gain from base station m to user n in time slot t is the product of a large-scale fading component and a small-scale fading component. The large-scale component includes path loss and lognormal shadowing and follows the Jakes fading model. The small-scale Rayleigh fading component h(t) is expressed as a first-order Gauss-Markov process

h(t) = ρ_0 · h(t−1) + sqrt(1 − ρ_0²) · e(t),

where the innovations e(t) are independent, identically distributed circularly symmetric complex Gaussian random variables with unit variance and ρ_0 = J_0(2π f_d T), with J_0 the zero-order Bessel function of the first kind and f_d the maximum Doppler frequency.

Users in different cells that are allocated the same subcarrier experience inter-cell interference (ICI). The ICI experienced by user n served by base station m on subcarrier k is

I(m,n,k) = Σ_{m'≠m} p(m',n',k) · |g(m',n,k)|²,

where p(m',n',k) is the transmission power from base station m' to its user n' on subcarrier k and |g(m',n,k)|² is the squared channel gain from base station m' to user n on subcarrier k.

The signal-to-interference-plus-noise ratio of user n served by base station m on subcarrier k is then

SINR(m,n,k) = p(m,n,k) · |g(m,n,k)|² / (I(m,n,k) + σ²),

where σ² is the power of the additive white Gaussian noise on the link from base station m to user n. When base station m serves user n and base station m' serves user n' on the same subcarrier k, base station m' causes interference to user n of base station m.
the embodiment of the invention is divided into two parts, firstly, a DNN frame is deployed for each base station, the DNN frame is based on an ADMM algorithm, and channel information CSI is used as the weight of a heterogeneous network; it is assumed that the long term average interference power received by each UE can be estimated and fed back to the serving base station through a feedback channel. This information exchange requires very limited resources to be obtained with very low frequency compared to the required signal CSI. Giving an optimal resource allocation strategy in the current state according to the user association information and the average interference power obtained by the base station; in particular, the amount of the solvent to be used,
deploying a DNN framework for each base station, wherein the DNN framework is based on an ADMM algorithm and takes channel information CSI as heterogeneous network weight; giving an optimal resource allocation strategy in the current state according to the user association information and the average interference power obtained by the base station; in particular, the amount of the solvent to be used,
A spectral efficiency objective optimization function and an energy efficiency objective optimization function are defined over the association, subcarrier and power variables. The resulting multi-objective optimization problem is solved with the ADMM algorithm: the constraints are absorbed into an augmented Lagrangian that combines the objectives with Lagrange multipliers λ and a penalty parameter, so the problem can be expressed as an unconstrained optimization problem. Taking partial derivatives of the augmented Lagrangian with respect to each block of variables and setting them to zero yields the optimal value of each variable in closed form; the resulting update expressions, together with the auxiliary quantities they depend on, constitute one ADMM iteration.
the DNN-based optimization framework shown in fig. 2 includes neurons corresponding to different operations in the ADMM iteration process, and directed edges corresponding to the data flow between the operations. Thus, the first of the DNN-based optimization frameworkskLayer corresponds to the second of ADMM procedurekAnd (6) iteration. Upon entering the DNN-based optimization framework, the input data flows through multiple layers of repetition, which correspond to successive iterations in the ADMM. When the convergence condition is satisfied, the DNN-based optimization framework will generate a resource allocation result. Specifically, the resource allocation method based on the ADMM algorithm comprises the following specific steps:
Step S1.1: update the currently observed state (user association information and average interference power).

Step S1.2: initialize the optimization variables and the Lagrange multipliers.

Step S1.3: set the convergence threshold and the maximum number of iterations, and start iterating; the DNN-based network computes the variable updates of each iteration, and once the convergence threshold is met the corresponding resource allocation result is output.
In the second part, each base station is regarded as an independent agent, and the states of the base stations are taken as the modelled environment. Multiple agents observe the same heterogeneous network environment and take actions, and they communicate with one another through the rewards returned by the environment; each agent adjusts its policy according to the reward. Specifically:

State set S: the environment is described by the states of the M base stations. The state characterizing the heterogeneous network environment consists of the user association information X and the interference power I, so a heterogeneous network state is represented as s = {X, I}.

Action set A: based on the current state, an agent takes an action according to a decision policy π. An action consists of selecting a subcarrier k and the corresponding transmission power p, so an action is represented as a = {k, p}.

Reward: after an action is taken, the agent computes the reward returned by the environment. The energy efficiency function of the system model is defined as the reward.
A DNN-based optimization framework is designed and combined with Q-learning to generate the policy π. The input to the DNN-based optimization framework is the set of observed states S, and its output covers all executable actions in the action set A; each state-action pair has a corresponding Q value Q(s, a). At each step the action achieving the maximum Q value in the current state is selected:

a = argmax_{a' ∈ A(s)} Q(s, a').

The Q value is updated according to the Q-learning algorithm:

Q(s, a) ← Q(s, a) + α · [ r + γ · max_{a' ∈ A(s')} Q(s', a') − Q(s, a) ],

where α and γ are the learning rate and the discount factor respectively, s' is the next state, r is the reward obtained after taking the action in state s, A(s') is the set of executable actions in state s', Q(s, a) is the Q value in state s, and max_{a' ∈ A(s')} Q(s', a') is the maximum Q value over the executable actions in state s'.

The loss function in each agent is the squared difference between the target value computed with the target network, whose weights are denoted θ⁻, and the Q value Q(s, a; θ) predicted by the online network. An ε-greedy policy selects the action from the online network Q(s, a; θ); the target network Q(s, a; θ⁻) keeps its weights fixed while multiple iterations are performed updating the weights of the online network.
Specifically, the steps for obtaining the optimal resource allocation scheme by using the ADMM network that takes the channel state information as the network weights are as follows:

Step S2.1: initialize the replay memory D, the DQN network parameters θ, and the target network replacement step size C.

Step S2.2: initialize the online network Q with weights θ, and initialize the target network Q̂ with weights θ⁻ set equal to θ.

Step S2.3: set the convergence threshold.

Step S2.4: according to the current state information, each agent selects a decision with an ε-greedy policy.

Step S2.5: update the environment and receive the reward.

Step S2.6: each agent observes the rewards obtained by all agents and stores the experience into its own replay memory D.

Step S2.7: sample randomly from D, compute the loss function and update the weights θ; every C steps, update the target network parameters θ⁻; repeat until all agents meet the threshold or the maximum number of iteration steps is reached.

Claims (3)

1. A heterogeneous network resource allocation method based on reinforcement learning, characterized in that, in the downlink of a heterogeneous network with M base stations and N mobile users, the number of macro base stations (MBS) plus the number of micro base stations (PBS) equals M;

let x(m,n) denote the association relationship between base station m and user n: x(m,n) = 1 indicates that base station m and user n are associated, and x(m,n) = 0 indicates that they are not associated;

let the spectrum state ρ(n,k) indicate whether user n, associated with base station m, occupies subcarrier k, determined by the following rule: ρ(n,k) = 1 indicates that the user uses channel k, and ρ(n,k) = 0 indicates that the user does not use channel k;

let p(m,n,k) denote the transmission power from base station m to user n on channel k, where the total transmit power of each cell's base station must stay below a preset power limit P_max;

a block fading model is used, and the downlink channel gain from base station m to user n in time slot t is the product of a large-scale fading component, which includes path loss and lognormal shadowing and follows the Jakes fading model, and a small-scale Rayleigh fading component h(t) expressed as a first-order Gauss-Markov process h(t) = ρ_0 · h(t−1) + sqrt(1 − ρ_0²) · e(t), where the innovations e(t) are independent, identically distributed circularly symmetric complex Gaussian random variables with unit variance and ρ_0 = J_0(2π f_d T), with J_0 the zero-order Bessel function of the first kind and f_d the maximum Doppler frequency;

the inter-cell interference ICI experienced when users in different cells are allocated the same subcarrier is I(m,n,k) = Σ_{m'≠m} p(m',n',k) · |g(m',n,k)|², where p(m',n',k) is the transmission power from base station m' to its user n' on subcarrier k and |g(m',n,k)|² is the squared channel gain from base station m' to user n on subcarrier k;

the signal-to-interference-plus-noise ratio of user n served by base station m on subcarrier k is SINR(m,n,k) = p(m,n,k) · |g(m,n,k)|² / (I(m,n,k) + σ²), where σ² is the power of the additive white Gaussian noise from base station m to user n; when base station m serves user n and base station m' serves user n' on the same subcarrier k, base station m' causes interference to user n of base station m;
step S1, deploying a DNN framework for each base station, the framework being based on the ADMM algorithm and using the channel information CSI as the weights of the heterogeneous network, and giving the optimal resource allocation strategy in the current state according to the user association information and the average interference power obtained by the base station; specifically, a spectral efficiency objective optimization function and an energy efficiency objective optimization function are defined, the resulting multi-objective optimization problem is solved with the ADMM algorithm by forming an augmented Lagrangian with Lagrange multipliers λ and a penalty parameter so that the problem becomes unconstrained, and the optimal value of each variable is obtained in closed form by taking partial derivatives of the augmented Lagrangian and setting them to zero;
step S2, regarding each base station as an independent agent and taking the states of the base stations as the modelled environment; multiple agents observe the same heterogeneous network environment and take actions, communicate with one another through the rewards returned by the environment, and adjust their policies according to the reward; specifically:

state set S: the environment is described by the states of the M base stations; the state characterizing the heterogeneous network environment consists of the user association information X and the interference power I, so a heterogeneous network state is represented as s = {X, I};

action set A: based on the current state, an agent takes an action according to a decision policy π; an action consists of selecting a subcarrier k and the corresponding transmission power p, so an action is represented as a = {k, p};

reward: after an action is taken, the agent computes the reward returned by the environment; the energy efficiency function of the system model is defined as the reward;

a DNN-based optimization framework is designed and combined with Q-learning to generate the policy π; the input to the DNN-based optimization framework is the set of observed states S, and its output covers all executable actions in the action set A; each state-action pair has a corresponding Q value Q(s, a); at each step the action achieving the maximum Q value in the current state is selected, a = argmax_{a' ∈ A(s)} Q(s, a'); the Q value is updated according to the Q-learning algorithm Q(s, a) ← Q(s, a) + α · [ r + γ · max_{a' ∈ A(s')} Q(s', a') − Q(s, a) ], where α and γ are the learning rate and the discount factor respectively, s' is the next state, r is the reward obtained after taking the action in state s, A(s') is the set of executable actions in state s', and max_{a' ∈ A(s')} Q(s', a') is the maximum Q value over the executable actions in state s'; the loss function in each agent is the squared difference between the target value computed with the target network, whose weights are denoted θ⁻, and the Q value Q(s, a; θ) predicted by the online network; an ε-greedy policy selects the action from the online network Q(s, a; θ), and the target network Q(s, a; θ⁻) keeps its weights fixed while multiple iterations are performed updating the weights of the online network.
2. The reinforcement learning-based heterogeneous network resource allocation method according to claim 1, wherein the resource allocation method based on the ADMM algorithm in step S1 specifically includes the following steps:

step S1.1, updating the currently observed state (user association information and average interference power);

step S1.2, initializing the optimization variables and the Lagrange multipliers;

step S1.3, setting the convergence threshold and the maximum number of iterations and starting the iteration; the DNN-based network computes the variable updates of each iteration, and once the convergence threshold is met the corresponding resource allocation result is output.
3. The method according to claim 1, wherein step S2 obtains the optimal resource allocation scheme using the ADMM network that takes the channel state information as the network weights, and includes the following steps:

step S2.1, initializing the replay memory D, the DQN network parameters θ, and the target network replacement step size C;

step S2.2, initializing the online network Q with weights θ, and initializing the target network Q̂ with weights θ⁻ set equal to θ;

step S2.3, setting the convergence threshold;

step S2.4, each agent selecting a decision with an ε-greedy policy according to the current state information;

step S2.5, updating the environment and receiving the reward;

step S2.6, each agent observing the rewards obtained by all agents and storing the experience into its own replay memory D;

step S2.7, sampling randomly from D, computing the loss function and updating the weights θ, and every C steps updating the target network parameters θ⁻, until all agents meet the threshold or the maximum number of iteration steps is reached.
CN202110006111.3A 2021-01-05 2021-01-05 Heterogeneous network resource allocation method based on reinforcement learning Active CN112351433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110006111.3A CN112351433B (en) 2021-01-05 2021-01-05 Heterogeneous network resource allocation method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110006111.3A CN112351433B (en) 2021-01-05 2021-01-05 Heterogeneous network resource allocation method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN112351433A true CN112351433A (en) 2021-02-09
CN112351433B CN112351433B (en) 2021-05-25

Family

ID=74427832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110006111.3A Active CN112351433B (en) 2021-01-05 2021-01-05 Heterogeneous network resource allocation method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN112351433B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113162682A (en) * 2021-05-13 2021-07-23 重庆邮电大学 PD-NOMA-based multi-beam LEO satellite system resource allocation method
CN113242602A (en) * 2021-05-10 2021-08-10 内蒙古大学 Millimeter wave large-scale MIMO-NOMA system resource allocation method and system
CN113473580A (en) * 2021-05-14 2021-10-01 南京信息工程大学滨江学院 Deep learning-based user association joint power distribution strategy in heterogeneous network
CN113613301A (en) * 2021-08-04 2021-11-05 北京航空航天大学 Air-space-ground integrated network intelligent switching method based on DQN
CN114116156A (en) * 2021-10-18 2022-03-01 武汉理工大学 Cloud-edge cooperative double-profit equilibrium taboo reinforcement learning resource allocation method
CN114205899A (en) * 2022-01-18 2022-03-18 电子科技大学 Heterogeneous network high energy efficiency power control method based on deep reinforcement learning
CN114340017A (en) * 2022-03-17 2022-04-12 山东科技大学 Heterogeneous network resource slicing method with eMBB and URLLC mixed service

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102238631A (en) * 2011-08-17 2011-11-09 南京邮电大学 Method for managing heterogeneous network resources based on reinforcement learning
CN106358308A (en) * 2015-07-14 2017-01-25 北京化工大学 Resource allocation method for reinforcement learning in ultra-dense network
CN108521673A (en) * 2018-04-09 2018-09-11 湖北工业大学 Resource allocation and power control combined optimization method based on intensified learning in a kind of heterogeneous network
CN108848561A (en) * 2018-04-11 2018-11-20 湖北工业大学 A kind of isomery cellular network combined optimization method based on deeply study
US20190124667A1 (en) * 2017-10-23 2019-04-25 Commissariat A L'energie Atomique Et Aux Energies Alternatives Method for allocating transmission resources using reinforcement learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102238631A (en) * 2011-08-17 2011-11-09 南京邮电大学 Method for managing heterogeneous network resources based on reinforcement learning
CN106358308A (en) * 2015-07-14 2017-01-25 北京化工大学 Resource allocation method for reinforcement learning in ultra-dense network
US20190124667A1 (en) * 2017-10-23 2019-04-25 Commissariat A L'energie Atomique Et Aux Energies Alternatives Method for allocating transmission resources using reinforcement learning
CN108521673A (en) * 2018-04-09 2018-09-11 湖北工业大学 Resource allocation and power control combined optimization method based on intensified learning in a kind of heterogeneous network
CN108848561A (en) * 2018-04-11 2018-11-20 湖北工业大学 A kind of isomery cellular network combined optimization method based on deeply study

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
冯陈伟 et al.: "Resource management algorithm for heterogeneous wireless networks based on reinforcement learning" (基于强化学习的异构无线网络资源管理算法), 《电信科学》 (Telecommunications Science) *
陈前斌 et al.: "Adaptive radio resource allocation algorithm based on deep reinforcement learning for heterogeneous cloud radio access networks" (基于深度强化学习的异构云无线接入网自适应无线资源分配算法), 《电子与信息学报》 (Journal of Electronics & Information Technology) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113242602A (en) * 2021-05-10 2021-08-10 内蒙古大学 Millimeter wave large-scale MIMO-NOMA system resource allocation method and system
CN113242602B (en) * 2021-05-10 2022-04-22 内蒙古大学 Millimeter wave large-scale MIMO-NOMA system resource allocation method and system
CN113162682A (en) * 2021-05-13 2021-07-23 重庆邮电大学 PD-NOMA-based multi-beam LEO satellite system resource allocation method
CN113162682B (en) * 2021-05-13 2022-06-24 重庆邮电大学 PD-NOMA-based multi-beam LEO satellite system resource allocation method
CN113473580A (en) * 2021-05-14 2021-10-01 南京信息工程大学滨江学院 Deep learning-based user association joint power distribution strategy in heterogeneous network
CN113473580B (en) * 2021-05-14 2024-04-26 南京信息工程大学滨江学院 User association joint power distribution method based on deep learning in heterogeneous network
CN113613301A (en) * 2021-08-04 2021-11-05 北京航空航天大学 Air-space-ground integrated network intelligent switching method based on DQN
CN113613301B (en) * 2021-08-04 2022-05-13 北京航空航天大学 Air-ground integrated network intelligent switching method based on DQN
CN114116156A (en) * 2021-10-18 2022-03-01 武汉理工大学 Cloud-edge cooperative double-profit equilibrium taboo reinforcement learning resource allocation method
CN114205899A (en) * 2022-01-18 2022-03-18 电子科技大学 Heterogeneous network high energy efficiency power control method based on deep reinforcement learning
CN114205899B (en) * 2022-01-18 2023-04-07 电子科技大学 Heterogeneous network high-energy-efficiency power control method based on deep reinforcement learning
CN114340017A (en) * 2022-03-17 2022-04-12 山东科技大学 Heterogeneous network resource slicing method with eMBB and URLLC mixed service

Also Published As

Publication number Publication date
CN112351433B (en) 2021-05-25

Similar Documents

Publication Publication Date Title
CN112351433B (en) Heterogeneous network resource allocation method based on reinforcement learning
Alqerm et al. Sophisticated online learning scheme for green resource allocation in 5G heterogeneous cloud radio access networks
Zappone et al. User association and load balancing for massive MIMO through deep learning
Xie et al. Energy-efficient resource allocation for heterogeneous cognitive radio networks with femtocells
Wang et al. Price-based spectrum management in cognitive radio networks
Samarakoon et al. Backhaul-aware interference management in the uplink of wireless small cell networks
CN107426773B (en) Energy efficiency-oriented distributed resource allocation method and device in wireless heterogeneous network
CN106358308A (en) Resource allocation method for reinforcement learning in ultra-dense network
Dai et al. Energy-efficient resource allocation for energy harvesting-based device-to-device communication
Wu et al. QoE-based distributed multichannel allocation in 5G heterogeneous cellular networks: A matching-coalitional game solution
CN106792451B (en) D2D communication resource optimization method based on multi-population genetic algorithm
CN104717755A (en) Downlink frequency spectrum resource distribution method with D2D technology introduced in cellular network
CN113316154A (en) Authorized and unauthorized D2D communication resource joint intelligent distribution method
CN110191489B (en) Resource allocation method and device based on reinforcement learning in ultra-dense network
Bi et al. Deep reinforcement learning based power allocation for D2D network
Yu et al. Interference coordination strategy based on Nash bargaining for small‐cell networks
Han et al. Power allocation for device-to-device underlay communication with femtocell using stackelberg game
Mach et al. Power allocation, channel reuse, and positioning of flying base stations with realistic backhaul
Aboagye et al. Energy-efficient resource allocation for aggregated RF/VLC systems
Venkateswararao et al. Traffic aware sleeping strategies for small-cell base station in the ultra dense 5G small cell networks
Najla et al. Efficient exploitation of radio frequency and visible light communication bands for D2D in mobile networks
Su et al. User-centric base station clustering and resource allocation for cell-edge users in 6G ultra-dense networks
Pantisano et al. On the dynamic formation of cooperative multipoint transmissions in small cell networks
Marshoud et al. Macrocell–femtocells resource allocation with hybrid access motivational model
Eliodorou et al. User association coalition games with zero-forcing beamforming and NOMA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant