CN113490219B - Dynamic resource allocation method for ultra-dense networking - Google Patents

Dynamic resource allocation method for ultra-dense networking

Info

Publication number
CN113490219B
CN113490219B (application CN202110762110.1A)
Authority
CN
China
Prior art keywords
cluster
base station
downlink
uplink
interference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110762110.1A
Other languages
Chinese (zh)
Other versions
CN113490219A (en)
Inventor
黄川
崔曙光
王丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese University of Hong Kong Shenzhen
Original Assignee
Chinese University of Hong Kong Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese University of Hong Kong Shenzhen filed Critical Chinese University of Hong Kong Shenzhen
Priority to CN202110762110.1A priority Critical patent/CN113490219B/en
Publication of CN113490219A publication Critical patent/CN113490219A/en
Application granted granted Critical
Publication of CN113490219B publication Critical patent/CN113490219B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/02 Resource partitioning among network components, e.g. reuse partitioning
    • H04W16/10 Dynamic resource partitioning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413 MIMO systems
    • H04B7/0426 Power distribution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413 MIMO systems
    • H04B7/0456 Selection of precoding matrices or codebooks, e.g. using matrices antenna weighting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H04W72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W72/0453 Resources in frequency domain, e.g. a carrier in FDMA
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H04W72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W72/0473 Wireless resource allocation based on the type of the allocated resource, the resource being transmission power
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/50 Allocation or scheduling criteria for wireless resources
    • H04W72/54 Allocation or scheduling criteria for wireless resources based on quality criteria
    • H04W72/541 Allocation or scheduling criteria for wireless resources based on quality criteria using the level of interference
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/50 Allocation or scheduling criteria for wireless resources
    • H04W72/54 Allocation or scheduling criteria for wireless resources based on quality criteria
    • H04W72/542 Allocation or scheduling criteria for wireless resources based on quality criteria using measured or perceived quality

Abstract

The invention discloses a dynamic resource allocation method for ultra-dense networking, which comprises the following steps: S1, constructing an ultra-dense networking model comprising N cells, wherein each cell is provided with a base station; S2, clustering the N cells, wherein base stations deployed in the same cluster cooperate with one another and are regarded as a single virtual base station entity with multiple antennas, so that part of the inter-cell interference is converted into intra-cluster interference; S3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function; S4, constructing an optimization problem based on system throughput; S5, determining cluster center nodes based on an affinity propagation algorithm; and S6, performing dynamic network resource allocation based on distributed reinforcement learning. Through the grouping, power allocation and transmission-parameter design of each cell, the invention can effectively coordinate the transmission of multiple cells, improve network performance and maximize network throughput.

Description

Dynamic resource allocation method for ultra-dense networking
Technical Field
The invention relates to the field of wireless communication, in particular to a dynamic resource allocation method for ultra-dense networking.
Background
Ultra-dense networking is one of the key technologies of 5G communication and will see further development in the 5G era. In ultra-dense networking, the physical distance between access points is greatly shortened, the transmit power between access points and mobile users can be significantly reduced, and the dense wireless coverage fully exploits the potential of frequency reuse. Meanwhile, full-duplex technology enables a transceiver to transmit and receive data simultaneously in the same spectrum, maximizing the data transmission density in the time and frequency dimensions and reducing the energy cost of guard intervals.
In recent years, researchers have combined ultra-dense networking with full-duplex technology; by fully utilizing wireless resources in the space, time and frequency dimensions, network throughput is improved and the energy consumption of the system is reduced. In full-duplex ultra-dense networking, each node is equipped with a low-power transmitter, so that the self-interference present in a full-duplex system can easily be cancelled to a sufficiently low level. Furthermore, ultra-dense networking that uses full-duplex technology can obtain the performance gains of both techniques. However, due to the irregular distribution of a large number of cells in ultra-dense networking, interference in the system is particularly severe. In addition, residual self-interference still exists at the full-duplex nodes, which makes the interference environment of a full-duplex ultra-dense networking system even more complicated. Therefore, it is necessary to design a radio resource management method for full-duplex ultra-dense networking that guarantees the quality of service of users. The literature has studied a two-layer ultra-dense network with a macro cell and a plurality of small cells and proposed a joint spectrum and power management scheme that maximizes the total throughput of a full-duplex ultra-dense network under given user quality-of-service and cross-layer interference constraints. Based on the same model, the literature has also considered joint user access, subchannel allocation and power control in full-duplex ultra-dense networking, and further formulated joint capacity maximization and power minimization problems under a user-centric transmission scheme. Such centralized operation requires the state information of all nodes and focuses only on a static wireless environment; in a practical dynamic wireless environment, it is impractical to collect instantaneous information of all nodes in a large network.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a dynamic resource allocation method for ultra-dense networking, which can effectively coordinate the transmission of multiple cells, improve the network performance and maximize the network throughput through the design of grouping, power allocation and sending parameters of each cell.
The purpose of the invention is realized by the following technical scheme: a dynamic resource allocation method for ultra-dense networking comprises the following steps:
S1, constructing an ultra-dense networking model comprising N cells, wherein each cell is provided with a base station;
the constructed ultra-dense networking model comprises the following steps:
considering an ultra-dense network with N cells randomly deployed in a fixed area, wherein each cell is provided with a base station and a pair of uplink and downlink users corresponding to the base station; all transceivers in the system are equipped with an antenna, each base station communicates with the users in full-duplex or half-duplex mode, and all nodes operate on one frequency band.
S2, clustering N cells, mutually cooperating base stations deployed in the same cluster, and regarding the base stations as a virtual base station entity with a plurality of antennas to convert part of inter-cell interference problems into intra-cluster interference;
The clustering of the N cells is performed as follows:
Let the clustering structure
Figure GDA0003468674060000021
denote that the N cells are divided into K clusters, where Ω denotes the set of all feasible clustering structures; each cluster
Figure GDA0003468674060000022
with 1 ≤ K ≤ N, contains one or more cells; the binary variable
Figure GDA0003468674060000023
indicates that the nth cell selects the kth cluster, and otherwise
Figure GDA0003468674060000024
Each base station can join at most one cluster, so that
Figure GDA0003468674060000025
S3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function;
the uplink transmission scheme comprises:
Suppose the nth cell selects cluster
Figure GDA0003468674060000026
the uplink user in this cell transmits, with power
Figure GDA0003468674060000027
the signal
Figure GDA0003468674060000028
to a virtual base station entity having
Figure GDA0003468674060000029
receiving antennas; the uplink transmission in each cluster is modeled as a multi-user single-input multiple-output channel, and the signal received by the virtual base station group in the kth cluster is:
Figure GDA00034686740600000210
wherein
Figure GDA00034686740600000211
denotes the channel parameter from uplink user n to the virtual base station in the cluster,
Figure GDA00034686740600000212
and
Figure GDA00034686740600000213
denote the self-interference channel and the uplink and downlink inter-cluster interference channels from
Figure GDA00034686740600000214
while
Figure GDA00034686740600000215
and
Figure GDA00034686740600000216
denote the intra-cluster downlink interference signal and the uplink and downlink interference signals from
Figure GDA00034686740600000217
respectively, and
Figure GDA00034686740600000218
denotes an additive white Gaussian noise vector whose elements satisfy
Figure GDA00034686740600000219
In the signal received by the base station group, the second term is the self-interference at the physical base stations; after self-interference cancellation, the residual self-interference is modeled as additive white Gaussian noise with mean 0 and variance ζ²;
The received signals are decoded with a minimum mean square error successive interference cancellation decoder, and the uplink achievable rate within
Figure GDA00034686740600000220
is obtained as:
Figure GDA00034686740600000221
wherein
Figure GDA0003468674060000031
denotes the identity matrix of rank Nk, and the inter-cluster interference matrix
Figure GDA0003468674060000032
is expressed as:
Figure GDA0003468674060000033
wherein
Figure GDA0003468674060000034
is the precoding matrix of the nth downlink user within the cluster
Figure GDA0003468674060000035
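The uplink rate expression above is only reproduced as an image. As context, for a multi-user SIMO cluster decoded with MMSE-SIC, the achievable sum rate takes the standard form log2 det(I + Q^(-1) * Σ_n p_n h_n h_n^H), where Q collects inter-cluster interference, residual self-interference and noise. The sketch below evaluates this standard form numerically; the symbol names and the exact composition of Q are assumptions for illustration, not the patented formula itself:

import numpy as np

def uplink_sum_rate(H, p, Q):
    """Uplink achievable sum rate of one cluster under MMSE-SIC decoding.

    H : (Nk, Nk) complex array, column n is the channel from uplink user n
        to the Nk receive antennas of the virtual base station.
    p : (Nk,) uplink transmit powers.
    Q : (Nk, Nk) covariance of inter-cluster interference, residual
        self-interference and noise (assumed composition).
    """
    Nk = H.shape[0]
    S = (H * p) @ H.conj().T                 # sum_n p_n h_n h_n^H
    M = np.eye(Nk) + np.linalg.solve(Q, S)   # I + Q^{-1} S
    return float(np.real(np.log2(np.linalg.det(M))))

# toy usage
rng = np.random.default_rng(0)
Nk = 3
H = (rng.standard_normal((Nk, Nk)) + 1j * rng.standard_normal((Nk, Nk))) / np.sqrt(2)
p = np.array([0.5, 0.8, 1.0])
Q = 0.1 * np.eye(Nk)                          # noise-dominated example
print(uplink_sum_rate(H, p, Q))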
The downlink transmission scheme comprises:
In downlink transmission, the virtual base station uses the precoder
Figure GDA0003468674060000036
to precode each signal
Figure GDA0003468674060000037
sent to the downlink users;
Figure GDA0003468674060000038
The downlink transmission can be modeled as a multiple-input single-output channel, and the received signal of the nth downlink user in the kth cluster is:
Figure GDA0003468674060000039
wherein
Figure GDA00034686740600000310
denotes the channel parameter from the base station to downlink user n,
Figure GDA00034686740600000311
and
Figure GDA00034686740600000312
denote the intra-cluster uplink interference channel and the uplink and downlink inter-cluster interference channels from
Figure GDA00034686740600000313
while
Figure GDA00034686740600000314
and
Figure GDA00034686740600000315
denote the intra-cluster uplink interference signal and the uplink and downlink interference signals from
Figure GDA00034686740600000316
respectively, and
Figure GDA00034686740600000317
denotes additive white Gaussian noise; based on the received signal, the downlink achievable sum rate within the cluster can be expressed as
Figure GDA00034686740600000318
wherein the inter-cluster interference
Figure GDA00034686740600000319
is expressed as
Figure GDA00034686740600000320
The process of determining the revenue function includes:
For a cluster
Figure GDA00034686740600000321
with Nk base stations, the SIC decoding complexity increases exponentially with the number of base stations, i.e.
Figure GDA00034686740600000322
is used to describe the cluster complexity; the instantaneous profit and cluster cost of the cluster group
Figure GDA00034686740600000323
are defined as:
Figure GDA00034686740600000324
wherein qk is a given unit cost; based on the above analysis, the revenue function of cell n joining the kth cluster is defined as:
Figure GDA0003468674060000041
wherein
Figure GDA0003468674060000042
denotes the relative proportion of contribution, and v{n} denotes the benefit of the singleton cluster {n}.
S4, constructing an optimization problem based on system throughput;
the optimization problem described in step S4 includes:
Figure GDA0003468674060000043
Figure GDA0003468674060000044
Figure GDA0003468674060000045
Figure GDA0003468674060000046
wherein
Figure GDA0003468674060000047
denotes all parameters over the entire time scale, T denotes the total time length, γ ∈ [0,1] denotes the discount factor, and PUL and PDL denote the maximum uplink and downlink transmit powers; the purpose of the above optimization problem is to jointly optimize the cluster selection and the uplink and downlink transmission parameters of each cell so as to dynamically maximize the average total throughput of the network over a long-term time scale.
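Since the optimization problem itself is only shown as images, the following sketch merely illustrates the shape of the objective: a per-cluster payoff assumed to be throughput minus the exponential SIC-complexity cost qk·2^Nk, averaged over T slots with a discount factor γ ∈ [0,1]. The function names and the exact payoff form are illustrative assumptions, not the patented expression:

import numpy as np

def cluster_payoff(throughput_k, num_bs_k, q_k):
    """Instantaneous payoff of one cluster: throughput minus the
    exponential SIC-complexity cost q_k * 2**N_k (assumed form)."""
    return throughput_k - q_k * 2 ** num_bs_k

def discounted_objective(per_slot_payoffs, gamma=0.9):
    """gamma-discounted average of the network payoff over T slots."""
    T = len(per_slot_payoffs)
    weights = gamma ** np.arange(T)
    return float(np.sum(weights * np.asarray(per_slot_payoffs)) / T)

# toy usage: three slots, two clusters per slot
slots = [cluster_payoff(5.2, 2, 0.06) + cluster_payoff(3.1, 1, 0.06),
         cluster_payoff(4.8, 3, 0.06) + cluster_payoff(2.9, 1, 0.06),
         cluster_payoff(5.5, 2, 0.06) + cluster_payoff(3.4, 1, 0.06)]
print(discounted_objective(slots, gamma=0.9))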
S5, determining cluster center nodes based on an affinity propagation algorithm;
S501, define the similarity between any two base stations as
Figure GDA0003468674060000048
wherein Ln denotes the geographical location of base station n;
S502, define the responsibility and availability between any two base stations as
Figure GDA0003468674060000049
Figure GDA00034686740600000410
wherein
Figure GDA00034686740600000411
denotes the degree to which base station n is suited to be selected by base station m as the cluster center, and
Figure GDA00034686740600000412
denotes the appropriateness for base station m to select base station n as the cluster center;
S503, compute the cluster center set
Figure GDA00034686740600000413
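The responsibility and availability update rules of steps S501 to S503 are only given as images; they match the standard affinity propagation message-passing scheme. As an illustration, the clustering can be reproduced with scikit-learn on the similarity of step S501, assumed here to be the negative squared distance between base-station locations; the coordinates and damping value below are illustrative:

import numpy as np
from sklearn.cluster import AffinityPropagation

# illustrative base-station coordinates (metres)
L = np.array([[3.0, 4.0], [5.0, 6.0], [20.0, 7.0],
              [22.0, 9.0], [35.0, 40.0], [37.0, 42.0]])

# similarity s(n, m) assumed to be -||L_n - L_m||^2, as in step S501
S = -np.sum((L[:, None, :] - L[None, :, :]) ** 2, axis=-1)

ap = AffinityPropagation(affinity="precomputed", damping=0.7, random_state=0)
labels = ap.fit_predict(S)

print("cluster centers (exemplar indices):", ap.cluster_centers_indices_)
print("cluster label of each base station:", labels)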
S6, dynamic network resource allocation is carried out based on distributed reinforcement learning:
S601, divide each time slot into two stages, define the action and state spaces of each agent, and define the reward function of each stage as follows:
In time slot t, each agent first selects a cluster center in the first stage, i.e.
Figure GDA0003468674060000051
wherein
Figure GDA0003468674060000052
is equivalent to
Figure GDA0003468674060000053
In the second stage, the transmission parameters are selected, and the action space of this stage is defined as
Figure GDA0003468674060000054
wherein
Figure GDA0003468674060000055
and
Figure GDA0003468674060000056
are the uplink and downlink transmission powers of the nth agent, respectively, and
Figure GDA0003468674060000057
denotes the transmission parameter from the downlink transmitting node of agent n to user m; when the first stage ends, the clustering structure is fixed; if the current Nk agents form one cluster, then for agent n, n ∈ {1, …, Nk}, the action space of the second stage is simplified to
Figure GDA0003468674060000058
Likewise, the state of the first stage is defined as
Figure GDA0003468674060000059
wherein hn(t) denotes the channels associated with agent n, including the uplink and downlink channels and the interference channels, and
Figure GDA00034686740600000510
is a vector that depends on the cluster members in the previous time slot; if agents n and m were in the same cluster
Figure GDA00034686740600000511
in the previous time slot, then the nth and mth elements of
Figure GDA00034686740600000512
are 1 and the remaining elements are 0; after the first-stage clustering is completed, each agent can observe the members of its cluster at the current moment; thus, the state of the second stage is updated to
Figure GDA00034686740600000513
Subsequently, the reward functions of the two stages are defined as
Figure GDA00034686740600000514
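To make the second-stage state concrete, the sketch below builds the cluster-membership indicator vector described above (entries set to 1 for agents that shared a cluster with agent n) and concatenates it with a flattened channel observation. The array shapes and helper names are assumptions for illustration, since the exact state layout is only shown as images:

import numpy as np

def membership_vector(agent_n, clusters, num_agents):
    """Indicator vector whose entries are 1 for agents that shared a
    cluster with agent_n in the previous slot, 0 elsewhere (assumed form)."""
    c = np.zeros(num_agents)
    for cluster in clusters:            # clusters: list of sets of agent ids
        if agent_n in cluster:
            for m in cluster:
                c[m] = 1.0
    return c

def second_stage_state(h_n, agent_n, clusters, num_agents):
    """Local state of agent n: its channel observation plus the
    cluster-membership indicator of the current slot."""
    return np.concatenate([np.ravel(h_n),
                           membership_vector(agent_n, clusters, num_agents)])

# toy usage: 4 agents, agents 0 and 2 in one cluster
clusters = [{0, 2}, {1}, {3}]
h0 = np.array([0.8, 0.1, 0.05, 0.3])    # illustrative channel gains seen by agent 0
print(second_stage_state(h0, 0, clusters, num_agents=4))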
S602, constructing a multi-agent deep reinforcement learning framework, and solving the problem of clustering and transmission parameter distributed execution behaviors:
in time slot t, agent n first selects a cluster center with the aid of a DQN network
Figure GDA00034686740600000515
As a state
Figure GDA00034686740600000516
A function of (a); then, since the same cluster of agents can observe each other, the vector
Figure GDA00034686740600000517
Generate, update the state to
Figure GDA00034686740600000518
Each agent selects behavior according to the local state and the operator network in the DDPG structure
Figure GDA00034686740600000519
When the execution of the behavior is finished, the benefits of the two stages are respectively obtained
Figure GDA00034686740600000520
And
Figure GDA00034686740600000521
the environment jumps to the next state
Figure GDA00034686740600000522
And
Figure GDA00034686740600000523
after each intelligent agent obtains the behavior at the current moment, each cell in the networking distributively selects the respective cluster center and executes the uplink and downlink sending parameters, so that the transmission of uplink and downlink signals is realized. At each moment, each cell
Figure GDA0003468674060000061
In a distributed manner, and in a manner that cluster selection is performed
Figure GDA0003468674060000062
Signal transmission is carried out on the transmission parameters, so that the allocation of dynamic resources of the whole ultra-dense networking on a long time scale is realized;
S603, after the behavior execution ends, the experiences of the two stages,
Figure GDA0003468674060000063
and
Figure GDA0003468674060000064
are stored respectively in memory buffers of fixed length M,
Figure GDA0003468674060000065
and
Figure GDA0003468674060000066
If a memory buffer is full, the oldest memories are overwritten by new ones; the trainer randomly samples D memories from the memory buffer to train the networks.
The DQN network is trained mainly by minimizing the loss function, i.e.
Figure GDA0003468674060000067
wherein
Figure GDA0003468674060000068
is the corresponding target Q function, whose parameter θ' is periodically updated according to the value of θ, i.e.
θ′←(1-τ)θ+τθ′,
where τ is a fixed update parameter; each agent is then equipped with a copy of the target Q network and selects its behavior with the ε-greedy method based on the
Figure GDA0003468674060000069
values; in the DQN, each memory in the buffer
Figure GDA00034686740600000610
contains the experience of all agents at a certain moment, i.e.
Figure GDA00034686740600000615
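A minimal sketch of the fixed-length replay memory and the soft target-network update θ′ ← (1-τ)θ + τθ′ described above, with buffer length M and batch size D as in the text; the class and function names are illustrative:

import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Fixed-length memory: once full, the oldest entries are overwritten."""
    def __init__(self, max_len_M):
        self.buf = deque(maxlen=max_len_M)

    def store(self, experience):          # experience = (state, action, reward, next_state)
        self.buf.append(experience)

    def sample(self, batch_size_D):
        return random.sample(self.buf, batch_size_D)

def soft_update(theta, theta_target, tau):
    """theta' <- (1 - tau) * theta + tau * theta', applied element-wise
    to lists of parameter arrays."""
    return [(1.0 - tau) * w + tau * w_t for w, w_t in zip(theta, theta_target)]

# toy usage
buf = ReplayBuffer(max_len_M=1000)
for t in range(50):
    buf.store((np.zeros(4), 1, 0.5, np.zeros(4)))
batch = buf.sample(batch_size_D=8)
theta, theta_t = [np.ones((2, 2))], [np.zeros((2, 2))]
theta_t = soft_update(theta, theta_t, tau=0.01)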
For the training of the second stage, each agent has a DDPG structure, which is composed of an Actor network and a Critic network. The Actor network takes actions in a distributed manner according to the current local observation, while the Critic network evaluates the quality of the action output by the Actor and guides the Actor network to output a more effective policy; therefore, the training of the Critic and Actor is also performed on the centralized controller. The Actor network is trained mainly with the following policy gradient, i.e.
Figure GDA00034686740600000611
wherein
Figure GDA00034686740600000612
is the output of the Critic network, which evaluates the behavior selected by the current Actor and finds a better gradient-descent direction for it, and μn denotes the output policy of agent n in the second stage; the Critic is trained mainly by minimizing the loss function, i.e.
Figure GDA00034686740600000613
wherein
Figure GDA00034686740600000614
is the policy output by all the target networks with parameters θ′n; likewise, the parameter θ′n is periodically updated according to the value of θn, i.e.
θ′n←(1-τ)θn+τθ′n.
Each memory in the buffer
Figure GDA0003468674060000071
also contains the experience of all agents at a certain moment, i.e.
Figure GDA0003468674060000072
m∈{1,…,M}.
The invention has the beneficial effects that: the invention can effectively coordinate the transmission of a plurality of cells, improve the network performance and maximize the network throughput through the grouping, power distribution and sending parameter design of each cell.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a schematic diagram of behavior execution for dynamic resource allocation;
fig. 3 is a schematic diagram of a simulation scenario of 10 cells in the embodiment;
FIG. 4 is a diagram illustrating the average sum revenue under different clustering strategies in the embodiment;
FIG. 5 is a diagram illustrating the average sum revenue under different duplex modes in the embodiment;
fig. 6 is a diagram illustrating the ratio of the full-duplex base station in the embodiment as a function of time.
Detailed Description
The technical solutions of the present invention are further described in detail below with reference to the accompanying drawings, but the scope of the present invention is not limited to the following.
As shown in fig. 1, a dynamic resource allocation method for ultra-dense networking includes the following steps:
S1, constructing an ultra-dense networking model comprising N cells, wherein each cell is provided with a base station;
considering an ultra-dense network with N cells randomly deployed in a fixed area, wherein each cell is provided with a base station and a pair of uplink and downlink users corresponding to the base station; all transceivers in the system are equipped with an antenna, each base station communicates with the users in full-duplex or half-duplex mode, and all nodes operate on one frequency band.
S2, clustering N cells, mutually cooperating base stations deployed in the same cluster, and regarding the base stations as a virtual base station entity with a plurality of antennas to convert part of inter-cell interference problems into intra-cluster interference;
the clustering results for N cells are:
setting clustering structure
Figure GDA0003468674060000073
Representing that N cells are divided into K clusters, and omega represents all feasible clustering structure sets; each cluster
Figure GDA0003468674060000074
K is more than or equal to 1 and less than or equal to N and comprises one or more cells; binary variable
Figure GDA0003468674060000075
Indicating that the nth cell selects the kth cluster, otherwise
Figure GDA0003468674060000076
Each base station can only join one cluster at most, so
Figure GDA0003468674060000077
In order to increase the overall throughput of the network, how to form a clustering structure that serves users more efficiently becomes a critical issue. Next, we present a transmission model that defines the revenue function of each member of any feasible cluster.
S3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function;
the uplink transmission scheme comprises:
setting nth cell selection cluster
Figure GDA0003468674060000081
The transmission power of uplink users in the cell is
Figure GDA0003468674060000082
Of (2) a signal
Figure GDA0003468674060000083
To have
Figure GDA0003468674060000084
A physical virtual base station for each receiving antenna; modeling uplink transmission in each cluster as a multi-user single-transmission multi-reception SIMO channel, and then receiving signals by the virtual base station group in the kth cluster are as follows:
Figure GDA0003468674060000085
wherein the content of the first and second substances,
Figure GDA0003468674060000086
representing the channel parameter from the uplink user n to the virtual base station in the cluster,
Figure GDA0003468674060000087
and
Figure GDA0003468674060000088
representing self-interfering channels and from
Figure GDA0003468674060000089
the uplink and downlink inter-cluster interference channels,
Figure GDA00034686740600000810
and
Figure GDA00034686740600000811
respectively representing co-cluster downlink interference signals and signals from
Figure GDA00034686740600000812
The uplink and downlink interference signals of (2),
Figure GDA00034686740600000813
representing an additive white Gaussian noise vector and satisfying by each member
Figure GDA00034686740600000814
In the base station group receiving signals, the second term is self-interference on the entity base station, and after self-interference elimination, residual self-interference is modeled to mean value of 0 and variance of zeta2Additive white gaussian noise of (1);
decoding the signals by a minimum mean square error serial interference elimination decoder to obtain
Figure GDA00034686740600000815
The inner uplink reachable rate is:
Figure GDA00034686740600000816
wherein the content of the first and second substances,
Figure GDA00034686740600000817
representing rank as NkIdentity matrix, inter-cluster interference matrix of
Figure GDA00034686740600000818
Expressed as:
Figure GDA00034686740600000819
wherein the content of the first and second substances,
Figure GDA00034686740600000820
is composed of
Figure GDA00034686740600000821
And (4) precoding matrixes of the nth user of the inner downlink.
The downlink transmission scheme comprises:
in downlink transmission, the virtual base station passes through the precoder
Figure GDA00034686740600000822
For each signal sent to downlink users
Figure GDA00034686740600000823
Carrying out pre-coding;
Figure GDA00034686740600000824
the downlink transmission can be modeled as a multi-input single-output channel, and the received signal of the nth downlink user in the kth cluster is:
Figure GDA0003468674060000091
wherein the content of the first and second substances,
Figure GDA0003468674060000092
indicating the channel parameters from the base station to the downlink user n,
Figure GDA0003468674060000093
and
Figure GDA0003468674060000094
indicating uplink interference channels within a cluster and from
Figure GDA0003468674060000095
The uplink and downlink inter-cluster interference channel of (a),
Figure GDA0003468674060000096
and
Figure GDA0003468674060000097
respectively representing co-cluster uplink interference signals and signals from
Figure GDA0003468674060000098
The uplink and downlink interference signals of (2),
Figure GDA0003468674060000099
representing additive white gaussian noise; based on the received signal, the reachable sum rate of the downlink in the cluster can be expressed as
Figure GDA00034686740600000910
In which inter-cluster interference
Figure GDA00034686740600000911
Is shown as
Figure GDA00034686740600000912
The process of determining the revenue function includes:
for having NkCluster of base stations
Figure GDA00034686740600000913
SIC decoding complexity increases exponentially with the number of base stations, i.e.
Figure GDA00034686740600000914
For describing cluster complexity, cluster group
Figure GDA00034686740600000915
The instantaneous profit and cluster cost of (c) is defined as:
Figure GDA00034686740600000916
wherein q iskFor a given cost per unit price, based on the above analysis, the revenue function for cell n to join the kth cluster is defined as:
Figure GDA00034686740600000917
wherein
Figure GDA00034686740600000918
denotes the relative proportion of contribution, and v{n} denotes the benefit of the singleton cluster {n}. These cells want to select appropriate clustering and transmission parameters to maximize the long-term sum revenue.
S4, constructing an optimization problem based on system throughput;
the optimization problem described in step S4 includes:
Figure GDA0003468674060000101
Figure GDA0003468674060000102
Figure GDA0003468674060000103
Figure GDA0003468674060000104
wherein
Figure GDA0003468674060000105
denotes all parameters over the entire time scale, T denotes the total time length, γ ∈ [0,1] denotes the discount factor, and PUL and PDL denote the maximum uplink and downlink transmit powers; the purpose of the above optimization problem is to jointly optimize the cluster selection and the uplink and downlink transmission parameters of each cell so as to dynamically maximize the average total throughput of the network over a long-term time scale.
S5, determining cluster center nodes based on an affinity propagation algorithm;
s501, defining the similarity between any two base stations as
Figure GDA0003468674060000106
Wherein L isnRepresents the geographical location of base station n;
s502, defining the responsibility degree and the availability degree between any base stations as
Figure GDA0003468674060000107
Figure GDA0003468674060000108
Wherein
Figure GDA0003468674060000109
Representing the degree of freedom with which base station n is selected by base station m as the cluster center,
Figure GDA00034686740600001010
representing the fitness of base station m to select base station n as the cluster center;
s503, calculating cluster center set
Figure GDA00034686740600001011
S6, dynamic network resource allocation is carried out based on distributed reinforcement learning, each cell is modeled as an intelligent agent object, and resource allocation is used as a behavior set of the intelligent agent:
s601, dividing each time slot into two stages, selecting a cluster center by each intelligent agent respectively to define the action and state space of each intelligent agent, and enabling the gain function of each stage to be as follows:
in time slot t, each agent first selects a cluster center in the first phase, i.e. the cluster center
Figure GDA00034686740600001012
Wherein the content of the first and second substances,
Figure GDA0003468674060000111
is equivalent to
Figure GDA0003468674060000112
In the second phase, the selection of transmission parameters is performed, and the action space of this phase is defined as
Figure GDA0003468674060000113
Wherein
Figure GDA0003468674060000114
And
Figure GDA0003468674060000115
respectively the uplink and downlink transmission power of the nth agent,
Figure GDA0003468674060000116
denotes the transmission parameter from the downlink transmitting node of agent n to user m; when the first stage ends, the clustering structure is fixed; if the current Nk agents form one cluster, then for agent n, n ∈ {1, …, Nk}, the action space of the second stage is simplified to
Figure GDA0003468674060000117
Likewise, the state of the first stage is defined, i.e.
Figure GDA0003468674060000118
Wherein h isn(t) indicates channels associated with agent n, including uplink and downlink channels and interfering channels,
Figure GDA0003468674060000119
is a vector dependent on the members in the last slot cluster; if agents n and m were clustered in the last time slot
Figure GDA00034686740600001110
Then
Figure GDA00034686740600001111
The values of the respective n-th and m-th elements in (a) are 1, and the remaining values are 0; after the clustering of the first stage is completed, each agent can observe the members in the cluster at the current moment; thus, the state of the second stage is updated to
Figure GDA00034686740600001112
Subsequently, a two-stage revenue function is defined as
Figure GDA00034686740600001113
S602, constructing a multi-agent deep reinforcement learning framework, and solving the problem of clustering and transmission parameter distributed execution behaviors, as shown in FIG. 2:
in time slot t, agent n first selects a cluster center with the aid of a DQN network
Figure GDA00034686740600001114
As a state
Figure GDA00034686740600001115
A function of (a); then, since the same cluster of agents can observe each other, the vector
Figure GDA00034686740600001116
Generate, update the state to
Figure GDA00034686740600001117
Each agent selects behavior according to the local state and the operator network in the DDPG structure
Figure GDA00034686740600001118
When the execution of the behavior is finished, the benefits of the two stages are respectively obtained
Figure GDA00034686740600001119
And
Figure GDA00034686740600001120
the environment jumps to the next state
Figure GDA00034686740600001121
And
Figure GDA00034686740600001122
after each intelligent agent obtains the behavior at the current moment, each cell in the networking distributively selects the respective cluster center and executes the uplink and downlink sending parameters, so that the transmission of uplink and downlink signals is realized. At each moment, each cell
Figure GDA00034686740600001123
In a distributed manner, and in a manner that cluster selection is performed
Figure GDA00034686740600001124
Signal transmission is carried out on the transmission parameters, so that the allocation of dynamic resources of the whole ultra-dense networking on a long time scale is realized;
s603, after the action execution is finished, the experience of two stages
Figure GDA0003468674060000121
And
Figure GDA0003468674060000122
are respectively stored in memory registers with a fixed length of M
Figure GDA0003468674060000123
And
Figure GDA0003468674060000124
performing the following steps; if the memory buffer is full, the old memory bar will be covered by the new memory bar; the trainer randomly extracts D memory training networks from the memory buffer;
DQN networks are mainly trained by minimizing the loss function, i.e.
Figure GDA0003468674060000125
Wherein
Figure GDA0003468674060000126
Is a corresponding target Q function, the parameter θ' of which will be periodically updated according to the value of θ, i.e.
θ′←(1-τ)θ+τθ′,
Where τ is a fixed update parameter; each agent is then equipped with a duplicate version of the target Q network, using the epsilon-greedy method and based on
Figure GDA0003468674060000127
A value selection behavior of; in DQN, buffers
Figure GDA0003468674060000128
Each memory in the memory contains the experience of all agents at a certain moment, i.e. the
Figure GDA0003468674060000129
m∈{1,…,M};
For the second stage of training, each agent has a DDPG structure, which is composed of an Actor and a Critic network. The Actor network is used for taking actions in a distributed manner according to the current local observation, and the Critic is used for evaluating the quality of the Actor output action and guiding the Actor network to output a more effective strategy; therefore, the training of Critic and Actor is also performed on the centralized controller; wherein the Actor network is mainly trained by minimizing the following gradient function, i.e.
Figure GDA00034686740600001210
Wherein
Figure GDA00034686740600001211
The behavior evaluation method is used for evaluating the behavior selected by the current Actor for the output of the Critic network and finding a better gradient descending direction for the behavior; mu.snRepresenting the output strategy of the agent n in the second stage; critic is trained primarily by maximizing the loss function, i.e.
Figure GDA00034686740600001212
Wherein the content of the first and second substances,
Figure GDA00034686740600001213
is all target network at parameter θ'nThe output strategy is as follows; likewise, parameter θ'nAccording to thetanIs periodically updated, i.e. the value of
θ′n←(1-τ)θn+τθ′n.
Buffer memory
Figure GDA00034686740600001214
Each memory in the memory also contains the experience of all agents at a certain moment, i.e. the experience of all agents
Figure GDA0003468674060000131
m∈{1,…,M}。
This embodiment gives some simulation results obtained by applying the algorithm. In the simulation scenario, 10 cells generated by a two-dimensional Poisson point process in a fixed area of 40 meters by 50 meters are considered; the cell density is 5000 cells per square kilometer and the radius of each cell is 5 meters, as shown in fig. 3. The maximum uplink transmit power is PUL = 20 dB and the maximum downlink transmit power is PDL = 25 dB. The path loss model is 140.7 + 36.7log10(d), where d is the distance from the transmitter to the receiver. The standard deviation of the shadow fading is set to 8 dB, the white Gaussian noise power is σ² = -30 dB, the duration of each time slot is Td = 100 ms, and the maximum Doppler frequency is fd = 10 Hz.
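A small sketch of the large-scale channel model stated above: path loss 140.7 + 36.7·log10(d) in dB plus log-normal shadowing with an 8 dB standard deviation, converted to a linear power gain. The function names are illustrative, and the units of d are taken as given by the model:

import numpy as np

def large_scale_gain_db(d, shadow_std_db=8.0, rng=None):
    """Path loss 140.7 + 36.7*log10(d) in dB plus log-normal shadowing;
    returns the channel gain in dB (negative of the total loss).
    d is the transmitter-receiver distance in the units assumed by the model."""
    rng = np.random.default_rng() if rng is None else rng
    path_loss_db = 140.7 + 36.7 * np.log10(d)
    shadowing_db = rng.normal(0.0, shadow_std_db)
    return -(path_loss_db + shadowing_db)

def db_to_linear(x_db):
    return 10.0 ** (x_db / 10.0)

rng = np.random.default_rng(0)
for d in (5.0, 20.0, 50.0):
    g_db = large_scale_gain_db(d, rng=rng)
    print(f"d = {d:5.1f}  gain = {g_db:7.1f} dB  linear = {db_to_linear(g_db):.2e}")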
Next, we define some hyper-parameters of the neural networks. In the algorithm, all the neural networks have an input layer, an output layer and three hidden layers, each hidden layer having 256 neurons. For the 10-cell simulation scenario there are 112 state input neurons. The first stage has
Figure GDA0003468674060000132
behavior neurons, wherein the magnitude of
Figure GDA0003468674060000133
is obtained by the affinity propagation algorithm, and the second stage has 12 behavior neurons. The activation functions in the neural networks are set to the ReLU function, and learning is performed with a fixed learning rate: the initial learning rate of the DQN network is set to 10^-3, and the initial learning rates of the Actor and Critic networks are set to 10^-3 and 10^-4, respectively. We implemented the proposed algorithm with TensorFlow. FIG. 4 compares four baseline schemes (no grouping, grouping all cells into one group, K-means grouping and random grouping) with the proposed dynamic grouping algorithm, where the node-group unit overhead is qk = 0.06,
Figure GDA0003468674060000134
and the residual self-interference power is ξ² = -20 dB. It can be seen that the proposed algorithm converges within 6000 slots, and the average system benefit obtained by the dynamic grouping algorithm is higher than that of the other four grouping schemes. In addition, we compare the average system gain under the flexible duplex mode, the full-duplex mode and the half-duplex mode, where each node in the full-duplex mode transmits at full power and the half-duplex mode uses frequency-division duplexing. As can be seen from fig. 5, the proposed algorithm, which supports flexible full-/half-duplex switching, outperforms both full duplex and half duplex. Fig. 6 also shows how the proportion of full-duplex base stations among all base stations varies with the environment under the proposed algorithm. Compared with the full-power full-duplex mode, the algorithm achieves greater system throughput with less power consumption; compared with the half-duplex mode, the system throughput is significantly improved.
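The network shape described in this paragraph (three hidden layers of 256 ReLU units, 112 state inputs for the 10-cell scenario, 12 second-stage behavior neurons, and the stated initial learning rates) can be sketched as below. The first-stage output size is illustrative, since it is determined by the affinity propagation result; this is a reconstruction for illustration, not the authors' TensorFlow code:

import tensorflow as tf

def build_mlp(input_dim, output_dim, output_activation=None):
    """Three hidden layers of 256 ReLU units, as described in the embodiment."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(output_dim, activation=output_activation),
    ])

STATE_DIM = 112          # state input neurons for the 10-cell scenario
NUM_CLUSTER_ACTIONS = 4  # first-stage action count (illustrative; set by affinity propagation)
ACTION_DIM_STAGE2 = 12   # second-stage behavior neurons

dqn    = build_mlp(STATE_DIM, NUM_CLUSTER_ACTIONS)        # first-stage Q network
actor  = build_mlp(STATE_DIM, ACTION_DIM_STAGE2, "tanh")  # second-stage policy (activation assumed)
critic = build_mlp(STATE_DIM + ACTION_DIM_STAGE2, 1)      # action-value estimate

dqn_opt    = tf.keras.optimizers.Adam(learning_rate=1e-3)
actor_opt  = tf.keras.optimizers.Adam(learning_rate=1e-3)
critic_opt = tf.keras.optimizers.Adam(learning_rate=1e-4)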
Finally, we fix the node-group unit overhead at q = 0.06 and study the influence of the self-interference cancellation capability on the proportion of base stations in the system operating in full-duplex mode. We consider 5 levels of residual self-interference power in the range of -20 dB to 0 dB and study the proportion of full-duplex base stations under different clustering strategies. As can be seen from the table, under the same self-interference cancellation capability, the proportion of base stations using full-duplex mode is smaller with the proposed dynamic clustering algorithm. Combining the simulation results shows that, at low cluster cost, the proposed algorithm obtains higher revenue with a smaller proportion of full-duplex base stations, and therefore achieves higher energy efficiency.
Table 1 full duplex base station occupancy ratio
Figure GDA0003468674060000135
Figure GDA0003468674060000141
While the foregoing description shows and describes preferred embodiments of the present invention, it is to be understood that the invention is not limited to the forms disclosed herein; the foregoing is not to be regarded as excluding other embodiments, and the invention may be used in various other combinations, modifications and environments, and is capable of changes within the scope of the inventive concept described herein in accordance with the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art without departing from the spirit and scope of the invention shall fall within the scope of protection of the appended claims.

Claims (5)

1. A dynamic resource allocation method facing ultra-dense networking is characterized in that: the method comprises the following steps:
s1, constructing a super-dense networking model comprising N cells, wherein each cell is provided with a base station;
s2, clustering N cells, mutually cooperating base stations deployed in the same cluster, and regarding the base stations as a virtual base station entity with a plurality of antennas to convert part of inter-cell interference problems into intra-cluster interference;
s3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function;
the uplink transmission scheme described in step S3 includes:
setting nth cell selection cluster
Figure FDA0003468674050000011
The transmission power of uplink users in the cell is
Figure FDA0003468674050000012
Of (2) a signal
Figure FDA0003468674050000013
To have
Figure FDA0003468674050000014
A physical virtual base station for each receiving antenna; modeling uplink transmission in each cluster as a multi-user single-input multi-output channel, and then receiving signals by the virtual base station group in the kth cluster are as follows:
Figure FDA0003468674050000015
wherein the content of the first and second substances,
Figure FDA0003468674050000016
representing the channel parameter from the uplink user n to the virtual base station in the cluster,
Figure FDA0003468674050000017
and
Figure FDA0003468674050000018
representing self-interfering channels and from
Figure FDA0003468674050000019
The uplink and downlink inter-cluster interference channel of (a),
Figure FDA00034686740500000110
and
Figure FDA00034686740500000111
respectively representing co-cluster downlink interference signals and signals from
Figure FDA00034686740500000112
The uplink and downlink interference signals of (2),
Figure FDA00034686740500000113
representing an additive white Gaussian noise vector and satisfying by each member
Figure FDA00034686740500000114
In the base station group receiving signals, the second term is self-interference on the entity base station, and after self-interference elimination, residual self-interference is modeled to mean value of 0 and variance of zeta2Additive white gaussian noise of (1);
decoding the received signal by using a minimum mean square error serial interference elimination decoder to obtain
Figure FDA00034686740500000115
The inner uplink reachable rate is:
Figure FDA00034686740500000116
wherein the content of the first and second substances,
Figure FDA00034686740500000117
representing rank as NkIdentity matrix, inter-cluster interference matrix of
Figure FDA00034686740500000118
Expressed as:
Figure FDA00034686740500000119
wherein the content of the first and second substances,
Figure FDA00034686740500000120
is composed of
Figure FDA00034686740500000121
A precoding matrix of an inner downlink nth user;
the downlink transmission scheme described in step S3 includes:
in downlink transmission, the virtual base station passes through the precoder
Figure FDA00034686740500000122
For each signal sent to downlink users
Figure FDA0003468674050000021
Carrying out pre-coding;
Figure FDA0003468674050000022
the downlink transmission can be modeled as a multi-input single-output channel, and the received signal of the nth downlink user in the kth cluster is:
Figure FDA0003468674050000023
wherein the content of the first and second substances,
Figure FDA0003468674050000024
indicating the channel parameters from the base station to the downlink user n,
Figure FDA0003468674050000025
and
Figure FDA0003468674050000026
indicating uplink interference channels within a cluster and from
Figure FDA0003468674050000027
The uplink and downlink inter-cluster interference channel of (a),
Figure FDA0003468674050000028
and
Figure FDA0003468674050000029
respectively representing co-cluster uplink interference signals and signals from
Figure FDA00034686740500000210
The uplink and downlink interference signals of (2),
Figure FDA00034686740500000211
representing additive white gaussian noise; based on the received signal, the reachable sum rate of the downlink in the cluster can be expressed as
Figure FDA00034686740500000212
Wherein inter-cluster interference
Figure FDA00034686740500000213
Is shown as
Figure FDA00034686740500000214
The process of determining the revenue function in step S3 includes:
for having NkCluster of base stations
Figure FDA00034686740500000215
The successive interference cancellation decoding complexity increases exponentially with the number of base stations, i.e.
Figure FDA00034686740500000216
To describe cluster complexity; cluster group
Figure FDA00034686740500000217
The instantaneous profit and cluster cost of (c) is defined as:
Figure FDA00034686740500000218
wherein q iskFor a given cost per unit price, based on the above analysis, the revenue function for cell n to join the kth cluster is defined as:
Figure FDA00034686740500000219
wherein
Figure FDA00034686740500000220
Representing the relative proportion of contributions, v{n}Representing the benefit of a single cluster { n };
s4, constructing an optimization problem based on system throughput;
s5, determining a cluster central node based on a neighbor propagation algorithm;
s6, dynamic network resource allocation is carried out based on distributed reinforcement learning;
the step S6 includes:
s601, dividing each time slot into two stages, selecting a cluster center by each intelligent agent respectively to define the action and state space of each intelligent agent, and enabling the gain function of each stage to be as follows:
in time slot t, each agent first selects a cluster center in the first phase, i.e. the cluster center
Figure FDA0003468674050000031
Wherein the content of the first and second substances,
Figure FDA0003468674050000032
is equivalent to
Figure FDA0003468674050000033
In the second phase, the selection of transmission parameters is performed, and the action space of this phase is defined as
Figure FDA0003468674050000034
Wherein
Figure FDA0003468674050000035
And
Figure FDA0003468674050000036
respectively the uplink and downlink transmission power of the nth agent,
Figure FDA0003468674050000037
representing the transmission parameters of the intelligent agent n downlink transmission node to the user m; when the first stage is finished, the clustering structure is fixed, and the current N iskWhen the agents form a cluster, the agents N, N are the same as {1, …, N ∈kThe action space in the second stage is simplified to
Figure FDA0003468674050000038
Likewise, the state of the first stage is defined, i.e.
Figure FDA0003468674050000039
Wherein h isn(t) indicates channels associated with agent n, including uplink and downlink channels and interfering channels,
Figure FDA00034686740500000310
is a vector dependent on the members in the last slot cluster; if agents n and m were clustered in the last time slot
Figure FDA00034686740500000311
Then
Figure FDA00034686740500000312
The values of the respective n-th and m-th elements in (a) are 1, and the remaining values are 0; after the clustering of the first stage is completed, each agent can observe the members in the cluster at the current moment; thus, the state of the second stage is updated to
Figure FDA00034686740500000313
Subsequently, a two-stage revenue function is defined as
Figure FDA00034686740500000314
S602, constructing a multi-agent deep reinforcement learning framework, and solving the problem of clustering and transmission parameter distributed execution behaviors:
in time slot t, agent n first selects a cluster center with the aid of a DQN network
Figure FDA00034686740500000315
As a state
Figure FDA00034686740500000316
A function of (a); then, since the same cluster of agents can observe each other, the vector
Figure FDA00034686740500000317
Generate, update the state to
Figure FDA00034686740500000318
Each agent selects behavior according to the local state and the operator network in the DDPG structure
Figure FDA0003468674050000041
When the execution of the behavior is finished, the benefits of the two stages are respectively obtained
Figure FDA0003468674050000042
And
Figure FDA0003468674050000043
the environment jumps to the next state
Figure FDA0003468674050000044
And
Figure FDA0003468674050000045
after each intelligent agent obtains the behavior at the current moment, each cell in the networking distributively selects a respective cluster center and executes uplink and downlink sending parameters to realize the transmission of uplink and downlink signals; at each moment, each cell
Figure FDA0003468674050000046
In a distributed manner, and in a manner that cluster selection is performed
Figure FDA0003468674050000047
Signal transmission is carried out on the transmission parameters, so that the allocation of dynamic resources of the whole ultra-dense networking on a long time scale is realized;
s603, after the action execution is finished, the experience of two stages
Figure FDA0003468674050000048
And
Figure FDA0003468674050000049
are respectively stored in memory registers with a fixed length of M
Figure FDA00034686740500000410
And
Figure FDA00034686740500000411
performing the following steps; if the memory buffer is full, the old memory bar will be covered by the new memory bar; the trainer randomly extracts D memory training networks from the memory buffer;
DQN networks are mainly trained by minimizing the loss function, i.e.
Figure FDA00034686740500000412
Wherein
Figure FDA00034686740500000413
Is a corresponding target Q function, the parameter θ' of which will be periodically updated according to the value of θ, i.e.
θ′←(1-τ)θ+τθ′,
Where τ is a fixed update parameter; each agent is then equipped with a duplicate version of the target Q network, using the epsilon-greedy method and based on
Figure FDA00034686740500000414
A value selection behavior of; in DQN, buffers
Figure FDA00034686740500000415
Each memory in the memory contains the experience of all agents at a certain moment, i.e. the
Figure FDA00034686740500000416
For the training of the second stage, each agent has a DDPG structure which is composed of an Actor and a Critic network; the Actor network is used for taking actions in a distributed manner according to the current local observation, and the Critic is used for evaluating the quality of the Actor output action and guiding the Actor network to output a more effective strategy; therefore, the training of Critic and Actor is also performed on the centralized controller; wherein the Actor network is mainly trained by minimizing the following gradient function, i.e.
Figure FDA00034686740500000417
Wherein
Figure FDA00034686740500000418
Figure FDA00034686740500000419
Is used for evaluating the behavior selected by the current Actor for the output of the Critic network and finding a better gradient descent direction, munRepresenting the output strategy of the agent n in the second stage; critic is trained primarily by maximizing the loss function, i.e.
Figure FDA0003468674050000051
Wherein the content of the first and second substances,
Figure FDA0003468674050000052
is all target network at parameter θ'nThe output strategy is as follows; likewise, parameter θ'nAccording to thetanIs periodically updated, i.e. the value of
θ′n←(1-τ)θn+τθ′n.
Buffer memory
Figure FDA00034686740500000515
Each memory in the memory also contains the experience of all agents at a certain moment, i.e. the experience of all agents
Figure FDA0003468674050000054
2. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the ultra-dense networking model constructed in step S1 includes:
considering an ultra-dense network with N cells randomly deployed in a fixed area, wherein each cell is provided with a base station and a pair of uplink and downlink users corresponding to the base station; all transceivers in the system are equipped with an antenna, each base station communicates with the users in full-duplex or half-duplex mode, and all nodes operate on one frequency band.
3. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the clustering result for N cells in step S2 is:
setting clustering structure
Figure FDA0003468674050000055
Representing that N cells are divided into K clusters, and omega represents all feasible clustering structure sets; each cluster
Figure FDA0003468674050000056
Comprises one or more cells; binary variable
Figure FDA0003468674050000057
Indicating that the nth cell selects the kth cluster, otherwise
Figure FDA0003468674050000058
Each base station can only join one cluster at most, so
Figure FDA0003468674050000059
4. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the optimization problem described in step S4 includes:
Figure FDA00034686740500000510
Figure FDA00034686740500000511
Figure FDA00034686740500000512
Figure FDA00034686740500000513
wherein
Figure FDA00034686740500000514
All parameters are expressed on the whole time scale, T represents all time lengths, and gamma is equal to 0,1]Representing the discount coefficient, PULAnd PDLRepresents the uplink and downlink maximum transmit power; the purpose of the above optimization problem is to jointly optimize the cluster selection and the uplink and downlink transmission parameters of each cell to dynamically maximize the network on the long-term time scaleAverage total throughput of (1).
5. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the step S5 includes:
s501, defining the similarity between any two base stations as
Figure FDA0003468674050000061
Wherein L isnRepresents the geographical location of base station n;
s502, defining the responsibility degree and the availability degree between any base stations as
Figure FDA0003468674050000062
Figure FDA0003468674050000063
Wherein
Figure FDA0003468674050000064
Representing the degree of freedom with which base station n is selected by base station m as the cluster center,
Figure FDA0003468674050000065
representing the fitness of base station m to select base station n as the cluster center;
s503, calculating cluster center set
Figure FDA0003468674050000066
CN202110762110.1A 2021-07-06 2021-07-06 Dynamic resource allocation method for ultra-dense networking Active CN113490219B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110762110.1A CN113490219B (en) 2021-07-06 2021-07-06 Dynamic resource allocation method for ultra-dense networking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110762110.1A CN113490219B (en) 2021-07-06 2021-07-06 Dynamic resource allocation method for ultra-dense networking

Publications (2)

Publication Number Publication Date
CN113490219A CN113490219A (en) 2021-10-08
CN113490219B true CN113490219B (en) 2022-02-25

Family

ID=77941301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110762110.1A Active CN113490219B (en) 2021-07-06 2021-07-06 Dynamic resource allocation method for ultra-dense networking

Country Status (1)

Country Link
CN (1) CN113490219B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023067610A1 (en) * 2021-10-20 2023-04-27 Telefonaktiebolaget Lm Ericsson (Publ) A method for network configuration in dense networks
CN115038155B (en) * 2022-05-23 2023-02-07 香港中文大学(深圳) Ultra-dense multi-access-point dynamic cooperative transmission method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106358308A (en) * 2015-07-14 2017-01-25 北京化工大学 Resource allocation method for reinforcement learning in ultra-dense network
CN110798849A (en) * 2019-10-10 2020-02-14 西北工业大学 Computing resource allocation and task unloading method for ultra-dense network edge computing
CN111565419A (en) * 2020-06-15 2020-08-21 河海大学常州校区 Delay optimization oriented collaborative edge caching algorithm in ultra-dense network
CN112601284A (en) * 2020-12-07 2021-04-02 南京邮电大学 Downlink multi-cell OFDMA resource allocation method based on multi-agent deep reinforcement learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106358308A (en) * 2015-07-14 2017-01-25 北京化工大学 Resource allocation method for reinforcement learning in ultra-dense network
CN110798849A (en) * 2019-10-10 2020-02-14 西北工业大学 Computing resource allocation and task unloading method for ultra-dense network edge computing
CN111565419A (en) * 2020-06-15 2020-08-21 河海大学常州校区 Delay optimization oriented collaborative edge caching algorithm in ultra-dense network
CN112601284A (en) * 2020-12-07 2021-04-02 南京邮电大学 Downlink multi-cell OFDMA resource allocation method based on multi-agent deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Distributed resource management based on deep reinforcement learning; Zheng Bingyuan et al.; Industrial Control Computer (工业控制计算机); 2020-05-25; Vol. 33, No. 5; full text *
Resource allocation in ultra-dense networks based on deep reinforcement learning; Zheng Bingyuan et al.; Electronic Measurement Technology (电子测量技术); 2020-05-08; Vol. 43, No. 9; full text *

Also Published As

Publication number Publication date
CN113490219A (en) 2021-10-08

Similar Documents

Publication Publication Date Title
CN102113395B (en) Method of joint resource allocation and clustering of base stations
Wang et al. Stackelberg game for user clustering and power allocation in millimeter wave-NOMA systems
Zhang et al. Deep reinforcement learning for multi-agent power control in heterogeneous networks
US20160119941A1 (en) Method for managing wireless resource and apparatus therefor
Bhardwaj et al. Enhanced dynamic spectrum access in multiband cognitive radio networks via optimized resource allocation
Wang et al. Resource scheduling based on deep reinforcement learning in UAV assisted emergency communication networks
CN113490219B (en) Dynamic resource allocation method for ultra-dense networking
CN103249157B (en) The resource allocation methods based on cross-layer scheduling mechanism under imperfect CSI condition
CN111526592B (en) Non-cooperative multi-agent power control method used in wireless interference channel
CN111431646A (en) Dynamic resource allocation method in millimeter wave system
Qi et al. Energy-efficient resource allocation for UAV-assisted vehicular networks with spectrum sharing
Kim et al. Online learning-based downlink transmission coordination in ultra-dense millimeter wave heterogeneous networks
Xie et al. Joint power allocation and beamforming with users selection for cognitive radio networks via discrete stochastic optimization
Lima et al. User pairing and power allocation for UAV-NOMA systems based on multi-armed bandit framework
Gao et al. Reinforcement learning based resource allocation in cache-enabled small cell networks with mobile users
Mahmoud et al. Federated learning resource optimization and client selection for total energy minimization under outage, latency, and bandwidth constraints with partial or no CSI
Ghasemi et al. Spectrum allocation based on artificial bee colony in cognitive radio networks
Guo et al. Machine learning for predictive deployment of UAVs with multiple access
Chen et al. iPAS: A deep Monte Carlo Tree Search-based intelligent pilot-power allocation scheme for massive MIMO system
Ge et al. Reinforcement learning-based interference coordination for distributed MU-MIMO
Song et al. Maximizing packets collection in wireless powered IoT networks with charge-or-data time slots
CN115866787A (en) Network resource allocation method integrating terminal direct transmission communication and multi-access edge calculation
Ismath et al. Deep contextual bandits for fast initial access in mmWave based user-centric ultra-dense networks
Wang et al. Dynamic clustering and resource allocation using deep reinforcement learning for smart-duplex networks
Sun et al. Hierarchical Reinforcement Learning for AP Duplex Mode Optimization in Network-Assisted Full-Duplex Cell-Free Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant