CN113490219A - Dynamic resource allocation method for ultra-dense networking - Google Patents

Dynamic resource allocation method for ultra-dense networking

Publication number
CN113490219A
Authority
CN
China
Prior art keywords
cluster
base station
downlink
uplink
interference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110762110.1A
Other languages
Chinese (zh)
Other versions
CN113490219B (en
Inventor
黄川
崔曙光
王丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese University of Hong Kong Shenzhen
Original Assignee
Chinese University of Hong Kong Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese University of Hong Kong Shenzhen
Priority to CN202110762110.1A
Publication of CN113490219A
Application granted
Publication of CN113490219B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/02 Resource partitioning among network components, e.g. reuse partitioning
    • H04W16/10 Dynamic resource partitioning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413 MIMO systems
    • H04B7/0426 Power distribution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413 MIMO systems
    • H04B7/0456 Selection of precoding matrices or codebooks, e.g. using matrices antenna weighting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H04W72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W72/0453 Resources in frequency domain, e.g. a carrier in FDMA
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H04W72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W72/0473 Wireless resource allocation based on the type of the allocated resource the resource being transmission power
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/50 Allocation or scheduling criteria for wireless resources
    • H04W72/54 Allocation or scheduling criteria for wireless resources based on quality criteria
    • H04W72/541 Allocation or scheduling criteria for wireless resources based on quality criteria using the level of interference
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/50 Allocation or scheduling criteria for wireless resources
    • H04W72/54 Allocation or scheduling criteria for wireless resources based on quality criteria
    • H04W72/542 Allocation or scheduling criteria for wireless resources based on quality criteria using measured or perceived quality

Abstract

The invention discloses a dynamic resource allocation method for ultra-dense networking, which comprises the following steps: S1, constructing an ultra-dense networking model comprising N cells, wherein each cell is provided with a base station; S2, clustering the N cells, wherein the base stations deployed in the same cluster cooperate with one another and are regarded as a single virtual base station entity with multiple antennas, so that part of the inter-cell interference is converted into intra-cluster interference; S3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function; S4, constructing an optimization problem based on system throughput; S5, determining the cluster center nodes based on the affinity propagation algorithm; and S6, performing dynamic network resource allocation based on distributed reinforcement learning. Through the design of the grouping, power allocation and transmission parameters of each cell, the invention can effectively coordinate the transmission of multiple cells, improve network performance and maximize network throughput.

Description

Dynamic resource allocation method for ultra-dense networking
Technical Field
The invention relates to the field of wireless communication, in particular to a dynamic resource allocation method for ultra-dense networking.
Background
Ultra-dense networking is one of the key technologies of 5G communication and is expected to develop further in the 5G era. In ultra-dense networking, the physical distance between access points is greatly shortened, the transmission power between access points and mobile users can be significantly reduced, and the wireless coverage fully exploits the potential of frequency reuse. Meanwhile, full-duplex technology enables a transceiver to transmit and receive data simultaneously in the same frequency spectrum, thereby maximizing the data transmission density in the time and frequency dimensions and reducing the energy cost of guard intervals.
In recent years, researchers have combined ultra-dense networking with full-duplex technology; by fully utilizing wireless resources in the space, time and frequency dimensions, network throughput is improved and the energy consumption of the system is reduced. In full-duplex ultra-dense networking, each node is equipped with a low-power transmitter, so the self-interference present in a full-duplex system can easily be cancelled to a sufficiently low level. Furthermore, ultra-dense networking that uses full-duplex technology can obtain the performance gains of both technologies. However, because a large number of cells are irregularly distributed in ultra-dense networking, interference in the system is particularly severe. In addition, residual self-interference still exists at the full-duplex nodes, which makes the interference environment of a full-duplex ultra-dense networking system even more complicated. Therefore, it is necessary to design a radio resource management method for full-duplex ultra-dense networking to guarantee the quality of service of users. Existing work has studied a two-layer ultra-dense network with a macro cell and a plurality of cells, and proposed a joint spectrum and power management scheme that maximizes the total throughput of the full-duplex ultra-dense network under given constraints on user quality of service and cross-layer interference. Based on the same model, other work has considered joint user access, subchannel allocation and power control in full-duplex ultra-dense networking, and has further formulated joint capacity maximization and power minimization problems under a user-centric transmission scheme. However, such centralized control requires the state information of all nodes and addresses only a static wireless environment. In a practical dynamic wireless environment, it is impossible to collect the instantaneous information of all nodes in a large network.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a dynamic resource allocation method for ultra-dense networking, which can effectively coordinate the transmission of multiple cells, improve the network performance and maximize the network throughput through the design of grouping, power allocation and sending parameters of each cell.
The purpose of the invention is realized by the following technical scheme: a dynamic resource allocation method for ultra-dense networking comprises the following steps:
S1, constructing an ultra-dense networking model comprising N cells, wherein each cell is provided with a base station;
The ultra-dense networking model is constructed as follows:
Consider an ultra-dense network with N cells randomly deployed in a fixed area, wherein each cell is provided with a base station and a pair of uplink and downlink users served by that base station; every transceiver in the system is equipped with a single antenna, each base station communicates with its users in full-duplex or half-duplex mode, and all nodes operate on one frequency band.
S2, clustering the N cells, wherein the base stations deployed in the same cluster cooperate with one another and are regarded as a single virtual base station entity with multiple antennas, so that part of the inter-cell interference is converted into intra-cluster interference;
The N cells are clustered as follows:
Let the clustering structure
Figure BDA0003150322480000021
denote a partition of the N cells into K clusters, where Ω denotes the set of all feasible clustering structures; each cluster
Figure BDA0003150322480000022
comprises one or more cells; the binary variable
Figure BDA0003150322480000023
indicates that the nth cell selects the kth cluster, and otherwise
Figure BDA0003150322480000024
Each base station can join at most one cluster, so
Figure BDA0003150322480000025
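The clustering constraint described above can be illustrated with a minimal Python sketch. The names x, is_feasible_assignment and clusters_from_assignment are hypothetical and do not come from the patent; x[n][k] == 1 is assumed to mean that cell n selects cluster k.

```python
from typing import List


def is_feasible_assignment(x: List[List[int]]) -> bool:
    """Check that every entry is binary and each cell joins at most one cluster."""
    for row in x:
        if any(v not in (0, 1) for v in row):
            return False
        if sum(row) > 1:  # a cell may not belong to two clusters
            return False
    return True


def clusters_from_assignment(x: List[List[int]]) -> List[List[int]]:
    """Turn the binary matrix into a list of non-empty clusters of cell indices."""
    num_clusters = len(x[0]) if x else 0
    clusters: List[List[int]] = [[] for _ in range(num_clusters)]
    for n, row in enumerate(x):
        for k, v in enumerate(row):
            if v == 1:
                clusters[k].append(n)
    return [c for c in clusters if c]  # drop clusters that nobody selected


assignment = [[1, 0], [0, 1], [0, 1]]  # cell 0 -> cluster 0, cells 1 and 2 -> cluster 1
print(is_feasible_assignment(assignment), clusters_from_assignment(assignment))
```

A matrix such as [[1, 1]] is rejected because the single cell would belong to two clusters at once.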
S3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function;
the uplink transmission scheme comprises:
Suppose the nth cell selects cluster
Figure BDA0003150322480000026
; the uplink user in this cell uses transmission power
Figure BDA0003150322480000027
to send the signal
Figure BDA0003150322480000028
to the virtual base station entity, which has
Figure BDA0003150322480000029
receiving antennas. The uplink transmission in each cluster is modeled as a multi-user single-input multiple-output (SIMO) channel, so the signal received by the virtual base station group in the kth cluster is:
Figure BDA00031503224800000210
where
Figure BDA00031503224800000211
denotes the channel from uplink user n to the virtual base station of the cluster;
Figure BDA00031503224800000212
and
Figure BDA00031503224800000213
denote the self-interference channel and the uplink and downlink inter-cluster interference channels from
Figure BDA00031503224800000214
, respectively;
Figure BDA00031503224800000215
and
Figure BDA00031503224800000216
denote the co-cluster downlink interference signals and the uplink and downlink interference signals from
Figure BDA00031503224800000217
, respectively; and
Figure BDA00031503224800000218
denotes the additive white Gaussian noise vector, each element of which satisfies
Figure BDA00031503224800000219
In the signal received by the base station group, the second term is the self-interference at the physical base stations; after self-interference cancellation, the residual self-interference is modeled as additive white Gaussian noise with mean 0 and variance ζ².
The signals are decoded by a minimum mean square error successive interference cancellation (MMSE-SIC) decoder, so that the achievable uplink rate within cluster
Figure BDA00031503224800000220
is:
Figure BDA00031503224800000221
where
Figure BDA0003150322480000031
denotes the identity matrix of rank N_k, and the inter-cluster interference matrix
Figure BDA0003150322480000032
is expressed as:
Figure BDA0003150322480000033
where
Figure BDA0003150322480000034
denotes, within cluster
Figure BDA0003150322480000035
, the precoding matrix of the nth downlink user.
The downlink transmission scheme comprises:
In downlink transmission, the virtual base station uses the precoder
Figure BDA0003150322480000036
to precode each signal
Figure BDA0003150322480000037
that is sent to a downlink user;
Figure BDA0003150322480000038
The downlink transmission can be modeled as a multiple-input single-output (MISO) channel, and the signal received by the nth downlink user in the kth cluster is:
Figure BDA0003150322480000039
where
Figure BDA00031503224800000310
denotes the channel from the base station to downlink user n;
Figure BDA00031503224800000311
and
Figure BDA00031503224800000312
denote the intra-cluster uplink interference channel and the uplink and downlink inter-cluster interference channels from
Figure BDA00031503224800000313
, respectively;
Figure BDA00031503224800000314
and
Figure BDA00031503224800000315
denote the co-cluster uplink interference signals and the uplink and downlink interference signals from
Figure BDA00031503224800000316
, respectively; and
Figure BDA00031503224800000317
denotes additive white Gaussian noise. Based on the received signal, the achievable downlink sum rate within the cluster can be expressed as
Figure BDA00031503224800000318
where the inter-cluster interference
Figure BDA00031503224800000319
is expressed as
Figure BDA00031503224800000320
The revenue function is determined as follows:
For a cluster of N_k base stations, denoted
Figure BDA00031503224800000321
, the SIC decoding complexity increases exponentially with the number of base stations, i.e.
Figure BDA00031503224800000322
To account for this cluster complexity, the instantaneous profit and cluster cost of cluster
Figure BDA00031503224800000323
are defined as:
Figure BDA00031503224800000324
where q_k is a given unit cost. Based on the above analysis, the revenue function of cell n when it joins the kth cluster is defined as:
Figure BDA0003150322480000041
where
Figure BDA0003150322480000042
denotes the relative contribution proportion, and v_{n} denotes the revenue of the singleton cluster {n}.
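Because the defining equations are not reproduced in this text, the following Python sketch only illustrates the structure described above under explicit assumptions: the cluster cost is taken to be q_k times base**N_k with an assumed base of 2, the cluster value is the sum rate minus that cost, and each member's share is taken to be proportional to its singleton revenue v_{n}. The function names and these specific forms are hypothetical and are not the patent's formulas.

```python
from typing import List


def cluster_value(sum_rate: float, n_k: int, q_k: float, base: float = 2.0) -> float:
    """Assumed cluster value: sum rate minus a cost that grows exponentially in N_k."""
    return sum_rate - q_k * base ** n_k


def member_revenues(sum_rate: float, singleton_values: List[float],
                    q_k: float, base: float = 2.0) -> List[float]:
    """Split the cluster value in proportion to each member's singleton revenue (assumed rule)."""
    total = sum(singleton_values)
    if total <= 0:
        raise ValueError("singleton revenues must sum to a positive number")
    value = cluster_value(sum_rate, len(singleton_values), q_k, base)
    return [value * v / total for v in singleton_values]


# Two cells with singleton revenues 1 and 3 form a cluster with sum rate 10 and unit cost 0.5.
print(member_revenues(10.0, [1.0, 3.0], q_k=0.5))
```

Under these assumptions the exponential cost term is what discourages very large clusters: adding one more base station doubles the cost while the sum rate typically grows more slowly.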
S4, constructing an optimization problem based on system throughput;
The optimization problem in step S4 is:
Figure BDA0003150322480000043
Figure BDA0003150322480000044
Figure BDA0003150322480000045
Figure BDA0003150322480000046
where
Figure BDA0003150322480000047
denotes all parameters over the whole time horizon, T denotes the total time length, γ ∈ [0,1] denotes the discount coefficient, and P_UL and P_DL denote the maximum uplink and downlink transmission powers. The purpose of the above optimization problem is to jointly optimize the cluster selection and the uplink and downlink transmission parameters of each cell so as to dynamically maximize the average total throughput of the network over a long-term time scale.
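As a rough illustration of how a discount coefficient γ weights throughput over T time slots, and of the power limits P_UL and P_DL, consider the following Python sketch. The exact objective and constraints are in the equations above, which are not reproduced in this text; the indexing from t = 0 and the function names are assumptions.

```python
from typing import Sequence


def discounted_throughput(rates_per_slot: Sequence[float], gamma: float) -> float:
    """Sum of per-slot network throughput weighted by gamma**t (t starts at 0 by assumption)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return sum((gamma ** t) * rate for t, rate in enumerate(rates_per_slot))


def powers_feasible(p_ul: float, p_dl: float, p_ul_max: float, p_dl_max: float) -> bool:
    """Check that the uplink and downlink powers respect their maximum values."""
    return 0.0 <= p_ul <= p_ul_max and 0.0 <= p_dl <= p_dl_max


print(discounted_throughput([1.0, 1.0, 1.0], gamma=0.5))
print(powers_feasible(0.2, 1.5, p_ul_max=0.25, p_dl_max=1.0))
```

A smaller γ makes the agents favor immediate throughput, while γ close to 1 weights all T slots almost equally.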
S5, determining the cluster center nodes based on the affinity propagation algorithm;
S501, defining the similarity between any two base stations as
Figure BDA0003150322480000048
where L_n denotes the geographical location of base station n;
S502, defining the responsibility and the availability between any two base stations as
Figure BDA0003150322480000049
Figure BDA00031503224800000410
where
Figure BDA00031503224800000413
denotes how well suited base station n is to be selected by base station m as its cluster center, and
Figure BDA00031503224800000411
denotes how appropriate it is for base station m to select base station n as its cluster center;
S503, calculating the set of cluster centers
Figure BDA00031503224800000412
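Steps S501 to S503 can be sketched with the textbook form of affinity propagation. The Python code below is a minimal illustration that assumes the similarity is the negative squared Euclidean distance between base station locations and uses the standard damped responsibility and availability updates; the patent's own expressions are in the equations above and may differ. The function name, the damping factor and the median default for the preference are assumptions.

```python
from statistics import median
from typing import List, Optional, Sequence


def affinity_propagation(points: Sequence[Sequence[float]],
                         preference: Optional[float] = None,
                         damping: float = 0.5,
                         iterations: int = 200) -> List[int]:
    """Return the indices of the points chosen as cluster centers."""
    n = len(points)
    if n < 2:
        return list(range(n))
    # Similarity: negative squared distance between locations (assumed form).
    s = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k]))
          for k in range(n)] for i in range(n)]
    if preference is None:
        preference = median(s[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        s[k][k] = preference  # self-similarity controls how many centers emerge
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        for i in range(n):
            scores = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(v for j, v in enumerate(scores) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        for k in range(n):
            positive = [max(0.0, r[i][k]) for i in range(n)]
            others = sum(positive) - positive[k]  # positive support from points other than k
            for i in range(n):
                if i == k:
                    update = others
                else:
                    update = min(0.0, r[k][k] + others - positive[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * update
    # A point is a center when its self-responsibility plus self-availability is positive.
    return [k for k in range(n) if r[k][k] + a[k][k] > 0]


locations = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
print(affinity_propagation(locations, preference=0.0))
```

With a preference of 0, which is larger than every pairwise similarity, each base station prefers to be its own center; lowering the preference yields fewer, larger clusters.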
S6, performing dynamic network resource allocation based on distributed reinforcement learning:
S601, dividing each time slot into two stages, in which each agent selects a cluster center and then its transmission parameters, and defining the action space, state space and revenue function of each agent for each stage as follows:
In time slot t, each agent first selects a cluster center in the first stage, i.e.
Figure BDA0003150322480000051
where
Figure BDA0003150322480000052
is equivalent to
Figure BDA0003150322480000053
In the second stage, the transmission parameters are selected, and the action space of this stage is defined as
Figure BDA0003150322480000054
where
Figure BDA0003150322480000055
and
Figure BDA0003150322480000056
are the uplink and downlink transmission powers of the nth agent, respectively, and
Figure BDA0003150322480000057
denotes the transmission parameter of the downlink transmitting node of agent n toward user m. When the first stage ends, the clustering structure is fixed; when N_k agents form a cluster, the second-stage action space of agent n, n ∈ {1, …, N_k}, simplifies to
Figure BDA0003150322480000058
Likewise, the state of the first stage is defined as
Figure BDA0003150322480000059
where h_n(t) denotes the channels associated with agent n, including the uplink and downlink channels and the interference channels, and
Figure BDA00031503224800000510
is a vector determined by the cluster members in the previous time slot: if agents n and m were in the same cluster
Figure BDA00031503224800000511
in the previous time slot, then the nth and mth elements of
Figure BDA00031503224800000512
are 1 and the remaining elements are 0. After the first-stage clustering is completed, each agent can observe the members of its cluster at the current moment; thus, the state of the second stage is updated to
Figure BDA00031503224800000513
Subsequently, the revenue functions of the two stages are defined as
Figure BDA00031503224800000514
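The cluster-membership vector used in the first-stage state can be sketched in a few lines of Python. The function names and the flat-list representation of the state are hypothetical; only the rule that members of the agent's previous cluster are marked 1 and all other entries 0 comes from the text.

```python
from typing import List, Sequence


def membership_vector(num_agents: int, cluster_members: Sequence[int]) -> List[int]:
    """Mark with 1 every agent that shared the cluster in the previous time slot."""
    vec = [0] * num_agents
    for idx in cluster_members:
        vec[idx] = 1
    return vec


def first_stage_state(channel_obs: Sequence[float], num_agents: int,
                      last_cluster: Sequence[int]) -> List[float]:
    """Concatenate the channel observation h_n(t) with the previous-slot membership vector."""
    return list(channel_obs) + [float(v) for v in membership_vector(num_agents, last_cluster)]


# Agents 1 and 3 (0-based) were in the same cluster during the previous slot.
print(membership_vector(5, [1, 3]))
print(first_stage_state([0.7, 0.2], 5, [1, 3]))
```

The same helper can be reused for the second-stage state by passing the members observed in the current slot instead of the previous one.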
S602, constructing a multi-agent deep reinforcement learning framework to solve the problem of executing the clustering and transmission-parameter actions in a distributed manner:
In time slot t, agent n first uses a DQN network to select a cluster center
Figure BDA00031503224800000515
as a function of the state
Figure BDA00031503224800000516
; then, since agents in the same cluster can observe one another, the vector
Figure BDA00031503224800000517
is generated and the state is updated to
Figure BDA00031503224800000518
Each agent then uses its local state and the Actor network in the DDPG structure to select the action
Figure BDA00031503224800000519
When the actions have been executed, the revenues of the two stages,
Figure BDA00031503224800000520
and
Figure BDA00031503224800000521
, are obtained, and the environment transitions to the next states
Figure BDA00031503224800000522
and
Figure BDA00031503224800000523
After each agent has obtained its action for the current moment, each cell in the network selects its own cluster center in a distributed manner and applies its uplink and downlink transmission parameters, thereby carrying out the transmission of uplink and downlink signals. At each moment, each cell follows
Figure BDA0003150322480000061
to perform cluster selection in a distributed manner and transmits signals with the transmission parameters
Figure BDA0003150322480000062
, thereby achieving dynamic resource allocation for the whole ultra-dense network over a long time scale;
S603, after the actions have been executed, the experiences of the two stages,
Figure BDA0003150322480000063
and
Figure BDA0003150322480000064
, are stored respectively in the memory buffers of fixed length M,
Figure BDA0003150322480000065
and
Figure BDA0003150322480000066
. If a memory buffer is full, the oldest memories are overwritten by new ones; the trainer randomly samples D memories from the memory buffer to train the networks.
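The fixed-length memory buffer described above behaves like a bounded queue with uniform random sampling. The following Python sketch is one possible realization; the class name ReplayBuffer and its methods are hypothetical, and an experience is stored as an opaque Python object.

```python
import random
from collections import deque
from typing import Any, List, Optional


class ReplayBuffer:
    """Fixed-capacity memory: when full, the oldest experience is overwritten."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self._memories = deque(maxlen=capacity)  # deque drops the oldest item automatically
        self._rng = random.Random(seed)

    def store(self, experience: Any) -> None:
        self._memories.append(experience)

    def sample(self, batch_size: int) -> List[Any]:
        """Draw up to batch_size distinct memories uniformly at random."""
        count = min(batch_size, len(self._memories))
        return self._rng.sample(list(self._memories), count)

    def __len__(self) -> int:
        return len(self._memories)


buffer = ReplayBuffer(capacity=3)      # M = 3
for step in range(5):
    buffer.store(step)                 # experiences 0 and 1 are overwritten
print(len(buffer), sorted(buffer.sample(3)))
```

In the two-stage scheme, one such buffer would be kept per stage, each with capacity M, and D would be the batch_size passed to sample.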
The DQN network is trained mainly by minimizing the loss function
Figure BDA0003150322480000067
where
Figure BDA0003150322480000068
is the corresponding target Q function, whose parameter θ′ is periodically updated from the value of θ, i.e.
θ′←(1-τ)θ+τθ′,
where τ is a fixed update parameter. Each agent is then equipped with a copy of the target Q network and selects its action using the ε-greedy method based on the value of
Figure BDA0003150322480000069
. In the DQN, each memory in the buffer
Figure BDA00031503224800000610
contains the experiences of all agents at a given moment, i.e.
Figure BDA00031503224800000611
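The ε-greedy selection mentioned above can be sketched as follows in Python. The function name and the representation of the Q values as a plain list indexed by candidate cluster center are assumptions made for illustration.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random = random.Random()) -> int:
    """With probability epsilon explore a random action, otherwise exploit the best Q value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda action: q_values[action])


# Q values of three candidate cluster centers for one agent.
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))
```

Setting epsilon to 0 makes the choice purely greedy, while larger values trade some immediate revenue for exploration of other cluster centers.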
For the second-stage training, each agent has a DDPG structure consisting of an Actor network and a Critic network. The Actor network takes actions in a distributed manner according to the current local observation, while the Critic network evaluates the quality of the actions output by the Actor and guides the Actor network toward a more effective strategy; therefore, the training of the Critic and the Actor is also performed on the centralized controller. The Actor network is trained mainly using the following gradient, i.e.
Figure BDA00031503224800000612
where
Figure BDA00031503224800000613
is the output of the Critic network, which is used to evaluate the action selected by the current Actor and to find a better gradient direction, and μ_n denotes the strategy output by agent n in the second stage. The Critic network is trained mainly by minimizing the loss function
Figure BDA00031503224800000614
where
Figure BDA00031503224800000615
is the strategy output by all the target networks under the parameters θ′_n; likewise, the parameter θ′_n is periodically updated from the value of θ_n, i.e.
θ′_n←(1-τ)θ_n+τθ′_n.
Each memory in the buffer
Figure BDA0003150322480000071
likewise contains the experiences of all agents at a given moment, i.e.
Figure BDA0003150322480000072
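The periodic target-parameter update θ′←(1-τ)θ+τθ′ used for both the DQN and the DDPG target networks can be written as a short Python function. Parameters are represented here as flat lists of floats, which is a simplification; the function name is hypothetical.

```python
from typing import List, Sequence


def soft_update(target: Sequence[float], online: Sequence[float], tau: float) -> List[float]:
    """Return the new target parameters (1 - tau) * online + tau * target, element by element."""
    if len(target) != len(online):
        raise ValueError("parameter vectors must have the same length")
    return [(1.0 - tau) * w + tau * w_t for w, w_t in zip(online, target)]


theta = [1.0, 2.0]         # online network parameters
theta_target = [0.0, 0.0]  # target network parameters
print(soft_update(theta_target, theta, tau=0.9))
```

With the update written this way, a τ close to 1 keeps the target network close to its previous value, so it tracks the online network slowly. Many DDPG descriptions write the same rule with the roles of τ and 1-τ exchanged, which is equivalent up to relabeling τ.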
The beneficial effects of the invention are as follows: through the design of the grouping, power allocation and transmission parameters of each cell, the invention can effectively coordinate the transmission of multiple cells, improve network performance and maximize network throughput.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a schematic diagram of behavior execution for dynamic resource allocation;
FIG. 3 is a schematic diagram of the simulation scenario with 10 cells in the embodiment;
FIG. 4 is a diagram of the average sum revenue under different clustering strategies in the embodiment;
FIG. 5 is a diagram of the average sum revenue under different duplexing modes in the embodiment;
FIG. 6 is a diagram of the proportion of full-duplex base stations as a function of time in the embodiment.
Detailed Description
The technical solutions of the present invention are further described in detail below with reference to the accompanying drawings, but the scope of the present invention is not limited to the following.
As shown in FIG. 1, a dynamic resource allocation method for ultra-dense networking includes the following steps:
S1, constructing an ultra-dense networking model comprising N cells, wherein each cell is provided with a base station;
Consider an ultra-dense network with N cells randomly deployed in a fixed area, wherein each cell is provided with a base station and a pair of uplink and downlink users served by that base station; every transceiver in the system is equipped with a single antenna, each base station communicates with its users in full-duplex or half-duplex mode, and all nodes operate on one frequency band.
S2, clustering the N cells, wherein the base stations deployed in the same cluster cooperate with one another and are regarded as a single virtual base station entity with multiple antennas, so that part of the inter-cell interference is converted into intra-cluster interference;
The N cells are clustered as follows:
Let the clustering structure
Figure BDA0003150322480000073
denote a partition of the N cells into K clusters, where Ω denotes the set of all feasible clustering structures; each cluster
Figure BDA0003150322480000074
, where 1 ≤ K ≤ N, comprises one or more cells; the binary variable
Figure BDA0003150322480000075
indicates that the nth cell selects the kth cluster, and otherwise
Figure BDA0003150322480000076
Each base station can join at most one cluster, so
Figure BDA0003150322480000077
To increase the overall throughput of the network, the key issue is how to form a clustering structure that serves users more efficiently. A transmission model is presented next, which defines the revenue function for each member of any feasible cluster.
S3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function;
the uplink transmission scheme comprises:
Suppose the nth cell selects cluster
Figure BDA0003150322480000081
; the uplink user in this cell uses transmission power
Figure BDA0003150322480000082
to send the signal
Figure BDA0003150322480000083
to the virtual base station entity, which has
Figure BDA0003150322480000084
receiving antennas. The uplink transmission in each cluster is modeled as a multi-user single-input multiple-output (SIMO) channel, so the signal received by the virtual base station group in the kth cluster is:
Figure BDA0003150322480000085
where
Figure BDA0003150322480000086
denotes the channel from uplink user n to the virtual base station of the cluster;
Figure BDA0003150322480000087
and
Figure BDA00031503224800000824
denote the self-interference channel and the uplink and downlink inter-cluster interference channels from
Figure BDA0003150322480000088
, respectively;
Figure BDA0003150322480000089
and
Figure BDA00031503224800000810
denote the co-cluster downlink interference signals and the uplink and downlink interference signals from
Figure BDA00031503224800000811
, respectively; and
Figure BDA00031503224800000812
denotes the additive white Gaussian noise vector, each element of which satisfies
Figure BDA00031503224800000813
In the signal received by the base station group, the second term is the self-interference at the physical base stations; after self-interference cancellation, the residual self-interference is modeled as additive white Gaussian noise with mean 0 and variance ζ².
The signals are decoded by a minimum mean square error successive interference cancellation (MMSE-SIC) decoder, so that the achievable uplink rate within cluster
Figure BDA00031503224800000814
is:
Figure BDA00031503224800000815
where
Figure BDA00031503224800000816
denotes the identity matrix of rank N_k, and the inter-cluster interference matrix
Figure BDA00031503224800000817
is expressed as:
Figure BDA00031503224800000818
where
Figure BDA00031503224800000819
denotes, within cluster
Figure BDA00031503224800000820
, the precoding matrix of the nth downlink user.
The downlink transmission scheme comprises:
In downlink transmission, the virtual base station uses the precoder
Figure BDA00031503224800000821
to precode each signal
Figure BDA00031503224800000822
that is sent to a downlink user;
Figure BDA00031503224800000823
The downlink transmission can be modeled as a multiple-input single-output (MISO) channel, and the signal received by the nth downlink user in the kth cluster is:
Figure BDA0003150322480000091
where
Figure BDA0003150322480000092
denotes the channel from the base station to downlink user n;
Figure BDA0003150322480000093
and
Figure BDA0003150322480000094
denote the intra-cluster uplink interference channel and the uplink and downlink inter-cluster interference channels from
Figure BDA0003150322480000095
, respectively;
Figure BDA0003150322480000096
and
Figure BDA0003150322480000097
denote the co-cluster uplink interference signals and the uplink and downlink interference signals from
Figure BDA0003150322480000098
, respectively; and
Figure BDA0003150322480000099
denotes additive white Gaussian noise. Based on the received signal, the achievable downlink sum rate within the cluster can be expressed as
Figure BDA00031503224800000910
where the inter-cluster interference
Figure BDA00031503224800000911
is expressed as
Figure BDA00031503224800000912
The revenue function is determined as follows:
For a cluster of N_k base stations, denoted
Figure BDA00031503224800000913
, the SIC decoding complexity increases exponentially with the number of base stations, i.e.
Figure BDA00031503224800000914
To account for this cluster complexity, the instantaneous profit and cluster cost of cluster
Figure BDA00031503224800000915
are defined as:
Figure BDA00031503224800000916
where q_k is a given unit cost. Based on the above analysis, the revenue function of cell n when it joins the kth cluster is defined as:
Figure BDA00031503224800000917
where
Figure BDA00031503224800000918
denotes the relative contribution proportion, and v_{n} denotes the revenue of the singleton cluster {n}. Each cell aims to select suitable clustering and transmission parameters so as to maximize the long-term sum revenue.
S4, constructing an optimization problem based on system throughput;
The optimization problem in step S4 is:
Figure BDA0003150322480000101
Figure BDA0003150322480000102
Figure BDA0003150322480000103
Figure BDA0003150322480000104
where
Figure BDA0003150322480000105
denotes all parameters over the whole time horizon, T denotes the total time length, γ ∈ [0,1] denotes the discount coefficient, and P_UL and P_DL denote the maximum uplink and downlink transmission powers. The purpose of the above optimization problem is to jointly optimize the cluster selection and the uplink and downlink transmission parameters of each cell so as to dynamically maximize the average total throughput of the network over a long-term time scale.
S5, determining the cluster center nodes based on the affinity propagation algorithm;
S501, defining the similarity between any two base stations as
Figure BDA0003150322480000106
where L_n denotes the geographical location of base station n;
S502, defining the responsibility and the availability between any two base stations as
Figure BDA0003150322480000107
Figure BDA0003150322480000108
where
Figure BDA0003150322480000109
denotes how well suited base station n is to be selected by base station m as its cluster center, and
Figure BDA00031503224800001010
denotes how appropriate it is for base station m to select base station n as its cluster center;
S503, calculating the set of cluster centers
Figure BDA00031503224800001011
S6, performing dynamic network resource allocation based on distributed reinforcement learning, where each cell is modeled as an agent and resource allocation forms the action set of the agent:
S601, dividing each time slot into two stages, namely cluster center selection and transmission parameter selection, and defining the action space, the state space and the reward function of each agent in each stage as follows:
In time slot t, each agent first selects a cluster center in the first stage, i.e.
Figure BDA00031503224800001012
where
Figure BDA0003150322480000111
is equivalent to
Figure BDA0003150322480000112
In the second stage, the transmission parameters are selected, and the action space of this stage is defined as
Figure BDA0003150322480000113
where
Figure BDA0003150322480000114
and
Figure BDA0003150322480000115
are the uplink and downlink transmit powers of the n-th agent, respectively, and
Figure BDA0003150322480000116
represents the transmission parameter from the downlink transmitting node of agent n to user m. When the first stage is finished, the clustering structure is fixed; when N_k agents form a cluster, the second-stage action space of agent n, n ∈ {1, …, N_k}, simplifies to
Figure BDA0003150322480000117
Likewise, the state of the first stage is defined as
Figure BDA0003150322480000118
where h_n(t) denotes the channels associated with agent n, including the uplink and downlink channels and the interference channels, and
Figure BDA0003150322480000119
is a vector that depends on the cluster members in the previous time slot: if agents n and m belonged to the same cluster in the previous time slot,
Figure BDA00031503224800001110
then in
Figure BDA00031503224800001111
the n-th and m-th elements are 1 and the remaining elements are 0. After the first-stage clustering is completed, each agent can observe the members of its cluster at the current time; thus, the state of the second stage is updated to
Figure BDA00031503224800001112
The reward functions of the two stages are then defined as
Figure BDA00031503224800001113
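The cluster membership vector described above can be sketched as follows. The function names and the flat-list state layout are assumptions made for illustration; the text only specifies that the elements corresponding to agents in the same cluster are 1 and all others are 0.

```python
def membership_vector(assignments, n):
    """Binary vector for agent n: element m is 1 if agent m shares n's cluster.

    assignments[m] is the cluster center chosen by agent m (hypothetical encoding).
    """
    own_cluster = assignments[n]
    return [1 if center == own_cluster else 0 for center in assignments]


def build_state(channel_features, members):
    """Concatenate channel features h_n(t) with the membership vector (assumed layout)."""
    return list(channel_features) + list(members)


if __name__ == "__main__":
    previous = [0, 0, 3, 3, 0]          # agents 0, 1, 4 chose center 0; agents 2, 3 chose center 3
    c_1 = membership_vector(previous, 1)
    print(c_1)                          # membership seen by agent 1
    print(build_state([0.7, 0.2], c_1))
```

The first-stage state would use the membership from the previous time slot, while the second-stage state would use the membership observed after the current clustering.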
S602, constructing a multi-agent deep reinforcement learning architecture to solve the problem of distributed execution of clustering and transmission parameter selection, as shown in Fig. 2:
In time slot t, agent n first selects, with the aid of a DQN network, a cluster center
Figure BDA00031503224800001114
as a function of the state
Figure BDA00031503224800001115
Then, since agents in the same cluster can observe each other, the vector
Figure BDA00031503224800001116
is generated and the state is updated to
Figure BDA00031503224800001117
Each agent then selects, according to its local state and the Actor network in its DDPG structure, the action
Figure BDA00031503224800001118
When the actions have been executed, the rewards of the two stages
Figure BDA00031503224800001119
and
Figure BDA00031503224800001120
are obtained, and the environment transitions to the next states
Figure BDA00031503224800001121
and
Figure BDA00031503224800001122
Once each agent has obtained its action for the current time, each cell in the network distributively selects its cluster center and applies its uplink and downlink transmission parameters, thereby carrying out the transmission of uplink and downlink signals. At each time, each cell uses
Figure BDA00031503224800001123
to perform cluster selection in a distributed manner, and uses
Figure BDA00031503224800001124
as the transmission parameters for signal transmission, thereby realizing dynamic resource allocation for the whole ultra-dense network over a long time scale;
S603, after the actions have been executed, the experiences of the two stages
Figure BDA0003150322480000121
and
Figure BDA0003150322480000122
are stored, respectively, in the replay buffers of fixed length M
Figure BDA0003150322480000123
and
Figure BDA0003150322480000124
If a replay buffer is full, the oldest experience is overwritten by the new one. The trainer randomly samples D experiences from the replay buffer to train the networks;
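The buffer behavior described in S603 (fixed length M, overwrite of the oldest entry, random sampling of D entries) can be sketched with a bounded deque. The class and method names are hypothetical, and capping the batch at the current buffer size is an added assumption for the case where fewer than D experiences have been stored.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-length experience buffer; the oldest entry is dropped when full."""

    def __init__(self, capacity):
        self._entries = deque(maxlen=capacity)  # capacity plays the role of M

    def store(self, experience):
        self._entries.append(experience)

    def sample(self, batch_size):
        # batch_size plays the role of D; never request more than is stored.
        k = min(batch_size, len(self._entries))
        return random.sample(list(self._entries), k)

    def __len__(self):
        return len(self._entries)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=3)
    for step in range(5):
        buffer.store(("state", "action", step))
    print(len(buffer), buffer.sample(2))
```

One such buffer would be kept per stage, since the two stages store experiences of different shapes.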
The DQN network is mainly trained by minimizing the loss function, i.e.
Figure BDA0003150322480000125
where
Figure BDA0003150322480000126
is the corresponding target Q function, whose parameter θ′ is periodically updated according to the value of θ, i.e.
θ′ ← (1-τ)θ + τθ′,
where τ is a fixed update parameter. Each agent is then equipped with a copy of the target Q network and selects its action using the ε-greedy method, based on the value of
Figure BDA0003150322480000127
In the DQN, each entry in the buffer
Figure BDA0003150322480000128
contains the experiences of all agents at a given time, i.e.
Figure BDA0003150322480000129
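The target update rule θ′ ← (1-τ)θ + τθ′ and ε-greedy action selection can be sketched as below. The function names and the representation of parameters as a list of NumPy arrays are assumptions for illustration; the Q values would in practice come from the DQN network.

```python
import numpy as np


def update_target(theta, theta_target, tau):
    """Apply theta' <- (1 - tau) * theta + tau * theta' to each parameter array."""
    return [(1.0 - tau) * w + tau * w_t for w, w_t in zip(theta, theta_target)]


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the highest-valued one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    online = [np.array([1.0, 1.0])]
    target = [np.array([0.0, 0.0])]
    print(update_target(online, target, tau=0.9))
    print(epsilon_greedy(np.array([0.1, 0.5, 0.2]), epsilon=0.1, rng=rng))
```

In this form of the update, a value of τ close to 1 keeps the target parameters close to their previous values, so the target network changes slowly.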
For the second-stage training, each agent has a DDPG structure consisting of an Actor network and a Critic network. The Actor network takes actions in a distributed manner according to the current local observation, and the Critic evaluates the quality of the actions output by the Actor and guides the Actor network toward a more effective policy; therefore, the training of the Critic and the Actor is also performed at the centralized controller. The Actor network is mainly trained using the following gradient, i.e.
Figure BDA00031503224800001210
where
Figure BDA00031503224800001211
is the output of the Critic network, which evaluates the action currently selected by the Actor and provides a better gradient direction for it; μ_n denotes the output policy of agent n in the second stage. The Critic is mainly trained by minimizing the loss function, i.e.
Figure BDA00031503224800001212
where
Figure BDA00031503224800001213
denotes the output policies of all target networks with parameters θ′_n; likewise, the parameter θ′_n is periodically updated according to the value of θ_n, i.e.
θ′_n ← (1-τ)θ_n + τθ′_n.
Each entry in the buffer
Figure BDA00031503224800001214
also contains the experiences of all agents at a given time, i.e.
Figure BDA0003150322480000131
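For orientation, the sketch below shows a mean-squared temporal-difference error of the kind commonly minimized when training a DQN or a DDPG Critic. The exact loss of the embodiment is given by the formulas above and is not reproduced here, so the target form (reward plus discounted target-network value) and the function names are assumptions.

```python
import numpy as np


def td_targets(rewards, next_q_target, gamma):
    """Assumed target form: y = r + gamma * Q'(s', a') from the target network."""
    return np.asarray(rewards, dtype=float) + gamma * np.asarray(next_q_target, dtype=float)


def critic_loss(q_pred, targets):
    """Mean squared error between the targets y and the Critic's predictions."""
    error = np.asarray(targets, dtype=float) - np.asarray(q_pred, dtype=float)
    return float(np.mean(error ** 2))


if __name__ == "__main__":
    y = td_targets(rewards=[1.0, 2.0], next_q_target=[10.0, 20.0], gamma=0.5)
    print(y, critic_loss(q_pred=[6.0, 10.0], targets=y))
```

The batch passed to these functions would be the D experiences sampled from the second-stage buffer.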
In the embodiments of the present application, some simulation results obtained by applying the above algorithm are given. The simulation scenario considers 10 cells generated by a two-dimensional Poisson point process in a fixed area of 40 m × 50 m; the cell density is 5000 cells per square kilometer and the radius of each cell is 5 m, as shown in Fig. 3. The maximum uplink transmit power is P_UL = 20 dB and the maximum downlink transmit power is P_DL = 25 dB. The path loss model is 140.7 + 36.7 log10(d), where d is the distance from the transmitter to the receiver. The standard deviation of the shadow fading is set to 8 dB, the white Gaussian noise power is σ² = -30 dB, the duration of each time slot is T_d = 100 ms, and the maximum Doppler frequency is f_d = 10 Hz.
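The two numeric relations in the simulation setup can be checked with a short sketch. The function names are hypothetical, and the unit of the distance d is not stated in the text, so the sketch simply evaluates the stated expression for whatever unit the model expects.

```python
import math


def path_loss_db(d):
    """Path loss 140.7 + 36.7 * log10(d); the unit of d is not stated in the text."""
    if d <= 0:
        raise ValueError("distance must be positive")
    return 140.7 + 36.7 * math.log10(d)


def cell_density_per_km2(num_cells, width_m, height_m):
    """Cells per square kilometer for a rectangular area given in meters."""
    area_km2 = (width_m * height_m) / 1e6
    return num_cells / area_km2


if __name__ == "__main__":
    print(cell_density_per_km2(10, 40, 50))  # density of the 10-cell scenario
    print(path_loss_db(1.0))
```

Ten cells in a 40 m × 50 m area (2000 square meters) correspond to the stated density of 5000 cells per square kilometer.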
Next, we define the hyperparameters of the neural networks. In the algorithm, every neural network has an input layer, an output layer and three hidden layers, and each hidden layer has 256 neurons. For the 10-cell simulation scenario, there are 112 state input neurons. The number of first-stage action neurons is
Figure BDA0003150322480000132
where the size of
Figure BDA0003150322480000133
is obtained by the affinity propagation algorithm, and the second stage has 12 action neurons. The activation functions in the neural networks are ReLU functions, and learning uses a fixed learning rate. The initial learning rate of the DQN network is set to 10^-3, and the initial learning rates of the Actor and Critic networks are set to 10^-3 and 10^-4, respectively. We implemented the proposed algorithm with TensorFlow. Fig. 4 compares the proposed dynamic clustering algorithm with four other schemes: no clustering, a single cluster containing all cells, K-means clustering and random clustering. Here the unit clustering cost is q_k = q = 0.06,
Figure BDA0003150322480000134
and the residual self-interference power is ξ² = -20 dB. It can be seen that the proposed algorithm converges at 6000 time slots. Compared with the other four clustering schemes, the dynamic clustering algorithm obtains a higher average system revenue. In addition, we compare the average revenue of the proposed flexible duplex mode with that of systems operating in full-duplex and half-duplex modes, where every node transmits at full power and the half-duplex system uses frequency-division duplex. As can be seen from Fig. 5, the proposed algorithm, which supports flexible switching between full-duplex and half-duplex, performs better than both the full-duplex and the half-duplex systems. Fig. 6 shows how the proportion of base stations operating in full-duplex mode under the proposed algorithm changes with the environment. Compared with the full-power full-duplex mode, the algorithm achieves greater system throughput with less power consumption; compared with the half-duplex mode, the system throughput is significantly improved.
Finally, we fix the unit clustering cost at q = 0.06 and study how the self-interference cancellation capability affects the proportions of base stations in the system that adopt the full-duplex and half-duplex modes. We consider 5 levels of residual self-interference power in the range from -20 dB to 0 dB and study these proportions under different clustering strategies. As can be seen from Table 1, for the same self-interference cancellation capability, the proportion of base stations using the full-duplex mode is smaller under the proposed dynamic clustering algorithm. Taken together, the simulation results show that, when the clustering cost is low, the proposed algorithm obtains a higher revenue with a smaller proportion of full-duplex base stations and therefore has higher energy efficiency.
Table 1: Proportion of full-duplex base stations
Figure BDA0003150322480000135
Figure BDA0003150322480000141
While the foregoing description shows and describes the preferred embodiments of the present invention, it is to be understood, as noted above, that the invention is not limited to the forms disclosed herein. The description is not to be regarded as excluding other embodiments; the invention may be used in various other combinations, modifications and environments, and is capable of changes within the scope of the inventive concept described herein, in accordance with the above teachings or with the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall fall within the protection scope of the appended claims.

Claims (9)

1. A dynamic resource allocation method for ultra-dense networking, characterized in that the method comprises the following steps:
S1, constructing an ultra-dense networking model comprising N cells, wherein each cell is provided with a base station;
S2, clustering the N cells, wherein the base stations deployed in the same cluster cooperate with each other and are regarded as one virtual base station entity with a plurality of antennas, so that part of the inter-cell interference is converted into intra-cluster interference;
S3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function;
S4, constructing an optimization problem based on system throughput;
S5, determining the cluster center nodes based on the affinity propagation algorithm;
S6, performing dynamic network resource allocation based on distributed reinforcement learning.
2. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the ultra-dense networking model constructed in step S1 includes:
considering an ultra-dense network with N cells randomly deployed in a fixed area, wherein each cell is provided with a base station and a pair of uplink and downlink users corresponding to the base station; all transceivers in the system are equipped with a single antenna, each base station communicates with its users in full-duplex or half-duplex mode, and all nodes operate on the same frequency band.
3. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the clustering result for N cells in step S2 is:
letting the clustering structure
Figure FDA0003150322470000011
represent that the N cells are divided into K clusters, wherein Ω denotes the set of all feasible clustering structures; each cluster
Figure FDA0003150322470000012
comprises one or more cells; the binary variable
Figure FDA0003150322470000013
indicates that the n-th cell selects the k-th cluster, and otherwise
Figure FDA0003150322470000014
each base station can join at most one cluster, so
Figure FDA0003150322470000015
4. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the uplink transmission scheme described in step S3 includes:
assuming that the n-th cell selects cluster
Figure FDA0003150322470000016
the uplink user in the cell transmits, with transmit power
Figure FDA0003150322470000017
the signal
Figure FDA0003150322470000018
to the virtual base station entity having
Figure FDA0003150322470000019
receiving antennas; the uplink transmission in each cluster is modeled as a multi-user single-input multiple-output channel, and the signal received by the virtual base station group in the k-th cluster is:
Figure FDA00031503224700000110
wherein
Figure FDA00031503224700000111
represents the channel parameter from uplink user n to the virtual base station in the cluster,
Figure FDA00031503224700000112
and
Figure FDA0003150322470000021
represent the self-interference channel and the uplink and downlink inter-cluster interference channels from
Figure FDA0003150322470000022
respectively,
Figure FDA0003150322470000023
and
Figure FDA0003150322470000024
represent the intra-cluster downlink interference signal and the uplink and downlink interference signals from
Figure FDA0003150322470000025
respectively, and
Figure FDA0003150322470000026
represents the additive white Gaussian noise vector, each element of which satisfies
Figure FDA0003150322470000027
in the signal received by the base station group, the second term is the self-interference at the physical base stations; after self-interference cancellation, the residual self-interference is modeled as additive white Gaussian noise with zero mean and variance ζ²;
the received signal is decoded using a minimum mean square error successive interference cancellation decoder, and the achievable uplink rate within
Figure FDA0003150322470000028
is:
Figure FDA0003150322470000029
wherein
Figure FDA00031503224700000210
represents the identity matrix of order N_k, and the inter-cluster interference matrix
Figure FDA00031503224700000211
is expressed as:
Figure FDA00031503224700000212
wherein
Figure FDA00031503224700000213
denotes, for
Figure FDA00031503224700000214
the precoding matrix of its n-th downlink user.
5. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the downlink transmission scheme described in step S3 includes:
in downlink transmission, the virtual base station uses the precoder
Figure FDA00031503224700000215
to precode each signal sent to a downlink user
Figure FDA00031503224700000216
within
Figure FDA00031503224700000217
the downlink transmission can be modeled as a multiple-input single-output channel, and the received signal of the n-th downlink user in the k-th cluster is:
Figure FDA00031503224700000218
wherein
Figure FDA00031503224700000219
represents the channel parameter from the base station to downlink user n,
Figure FDA00031503224700000220
and
Figure FDA00031503224700000221
represent the intra-cluster uplink interference channel and the uplink and downlink inter-cluster interference channels from
Figure FDA00031503224700000222
respectively,
Figure FDA00031503224700000223
and
Figure FDA00031503224700000224
represent the intra-cluster uplink interference signal and the uplink and downlink interference signals from
Figure FDA00031503224700000225
respectively, and
Figure FDA00031503224700000226
represents additive white Gaussian noise; based on the received signal, the achievable downlink sum rate in the cluster can be expressed as
Figure FDA0003150322470000031
wherein the inter-cluster interference
Figure FDA0003150322470000032
is expressed as
Figure FDA0003150322470000033
6. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the process of determining the revenue function in step S3 includes:
for a cluster with N_k base stations
Figure FDA0003150322470000034
the decoding complexity of successive interference cancellation increases exponentially with the number of base stations, i.e.
Figure FDA0003150322470000035
is used to describe the cluster complexity; for the cluster
Figure FDA0003150322470000036
the instantaneous revenue and clustering cost are defined as:
Figure FDA0003150322470000037
wherein q_k is a given unit cost; based on the above analysis, the revenue function of cell n when it joins the k-th cluster is defined as:
Figure FDA0003150322470000038
wherein
Figure FDA0003150322470000039
represents the relative contribution ratio, and v_{n} denotes the revenue of the singleton cluster {n}.
7. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the optimization problem described in step S4 includes:
Figure FDA00031503224700000310
Figure FDA00031503224700000311
Figure FDA00031503224700000312
Figure FDA00031503224700000313
wherein
Figure FDA00031503224700000314
denotes all parameters over the whole time scale, T denotes the total time length, γ ∈ [0,1] is the discount factor, and P_UL and P_DL are the maximum uplink and downlink transmit powers; the purpose of the above optimization problem is to jointly optimize the cluster selection and the uplink and downlink transmission parameters of each cell so as to dynamically maximize the average total throughput of the network over a long time scale.
8. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the step S5 includes:
S501, defining the similarity between any two base stations as
Figure FDA0003150322470000041
wherein L_n represents the geographical location of base station n;
S502, defining the responsibility and the availability between any two base stations as
Figure FDA0003150322470000042
Figure FDA0003150322470000043
wherein
Figure FDA0003150322470000044
represents the responsibility, i.e. how suitable base station n is to be selected by base station m as its cluster center, and
Figure FDA0003150322470000045
represents the availability, i.e. how appropriate it is for base station m to select base station n as its cluster center;
S503, calculating the cluster center set
Figure FDA0003150322470000046
9. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the step S6 includes:
S601, dividing each time slot into two stages, namely cluster center selection and transmission parameter selection, and defining the action space, the state space and the reward function of each agent in each stage as follows:
in time slot t, each agent first selects a cluster center in the first stage, i.e.
Figure FDA0003150322470000047
wherein
Figure FDA0003150322470000048
is equivalent to
Figure FDA0003150322470000049
in the second stage, the transmission parameters are selected, and the action space of this stage is defined as
Figure FDA00031503224700000410
wherein
Figure FDA00031503224700000411
and
Figure FDA00031503224700000412
are the uplink and downlink transmit powers of the n-th agent, respectively, and
Figure FDA00031503224700000413
represents the transmission parameter from the downlink transmitting node of agent n to user m; when the first stage is finished, the clustering structure is fixed, and when N_k agents form a cluster, the second-stage action space of agent n, n ∈ {1, …, N_k}, simplifies to
Figure FDA00031503224700000414
likewise, the state of the first stage is defined as
Figure FDA0003150322470000051
wherein h_n(t) denotes the channels associated with agent n, including the uplink and downlink channels and the interference channels, and
Figure FDA0003150322470000052
is a vector that depends on the cluster members in the previous time slot; if agents n and m belonged to the same cluster in the previous time slot,
Figure FDA0003150322470000053
then in
Figure FDA0003150322470000054
the n-th and m-th elements are 1 and the remaining elements are 0; after the first-stage clustering is completed, each agent can observe the members of its cluster at the current time; thus, the state of the second stage is updated to
Figure FDA0003150322470000055
the reward functions of the two stages are then defined as
Figure FDA0003150322470000056
S602, constructing a multi-agent deep reinforcement learning architecture to solve the problem of distributed execution of clustering and transmission parameter selection:
in time slot t, agent n first selects, with the aid of a DQN network, a cluster center
Figure FDA0003150322470000057
as a function of the state
Figure FDA0003150322470000058
then, since agents in the same cluster can observe each other, the vector
Figure FDA0003150322470000059
is generated and the state is updated to
Figure FDA00031503224700000510
each agent then selects, according to its local state and the Actor network in its DDPG structure, the action
Figure FDA00031503224700000511
when the actions have been executed, the rewards of the two stages
Figure FDA00031503224700000512
and
Figure FDA00031503224700000513
are obtained, and the environment transitions to the next states
Figure FDA00031503224700000514
and
Figure FDA00031503224700000515
once each agent has obtained its action for the current time, each cell in the network distributively selects its cluster center and applies its uplink and downlink transmission parameters, thereby carrying out the transmission of uplink and downlink signals; at each time, each cell uses
Figure FDA00031503224700000516
to perform cluster selection in a distributed manner, and uses
Figure FDA00031503224700000517
as the transmission parameters for signal transmission, thereby realizing dynamic resource allocation for the whole ultra-dense network over a long time scale;
S603, after the actions have been executed, the experiences of the two stages
Figure FDA00031503224700000518
and
Figure FDA00031503224700000519
are stored, respectively, in the replay buffers of fixed length M
Figure FDA00031503224700000520
and
Figure FDA00031503224700000521
if a replay buffer is full, the oldest experience is overwritten by the new one; the trainer randomly samples D experiences from the replay buffer to train the networks;
the DQN network is mainly trained by minimizing the loss function, i.e.
Figure FDA00031503224700000522
wherein
Figure FDA00031503224700000523
Figure FDA00031503224700000524
is the corresponding target Q function, whose parameter θ′ is periodically updated according to the value of θ, i.e.
θ′ ← (1-τ)θ + τθ′,
wherein τ is a fixed update parameter; each agent is then equipped with a copy of the target Q network and selects its action using the ε-greedy method, based on the value of
Figure FDA0003150322470000061
in the DQN, each entry in the buffer
Figure FDA0003150322470000062
contains the experiences of all agents at a given time, i.e.
Figure FDA0003150322470000063
for the second-stage training, each agent has a DDPG structure consisting of an Actor network and a Critic network; the Actor network takes actions in a distributed manner according to the current local observation, and the Critic evaluates the quality of the actions output by the Actor and guides the Actor network toward a more effective policy; therefore, the training of the Critic and the Actor is also performed at the centralized controller; the Actor network is mainly trained using the following gradient, i.e.
Figure FDA0003150322470000064
wherein
Figure FDA0003150322470000065
Figure FDA0003150322470000066
is the output of the Critic network, which evaluates the action currently selected by the Actor and provides a better gradient direction for it, and μ_n denotes the output policy of agent n in the second stage; the Critic is mainly trained by minimizing the loss function, i.e.
Figure FDA0003150322470000067
wherein
Figure FDA0003150322470000068
denotes the output policies of all target networks with parameters θ′_n; likewise, the parameter θ′_n is periodically updated according to the value of θ_n, i.e.
θ′_n ← (1-τ)θ_n + τθ′_n.
each entry in the buffer
Figure FDA0003150322470000069
also contains the experiences of all agents at a given time, namely:
Figure FDA00031503224700000610
CN202110762110.1A 2021-07-06 2021-07-06 Dynamic resource allocation method for ultra-dense networking Active CN113490219B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110762110.1A CN113490219B (en) 2021-07-06 2021-07-06 Dynamic resource allocation method for ultra-dense networking

Publications (2)

Publication Number Publication Date
CN113490219A true CN113490219A (en) 2021-10-08
CN113490219B CN113490219B (en) 2022-02-25

Family

ID=77941301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110762110.1A Active CN113490219B (en) 2021-07-06 2021-07-06 Dynamic resource allocation method for ultra-dense networking

Country Status (1)

Country Link
CN (1) CN113490219B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106358308A (en) * 2015-07-14 2017-01-25 北京化工大学 Resource allocation method for reinforcement learning in ultra-dense network
CN110798849A (en) * 2019-10-10 2020-02-14 西北工业大学 Computing resource allocation and task unloading method for ultra-dense network edge computing
CN111565419A (en) * 2020-06-15 2020-08-21 河海大学常州校区 Delay optimization oriented collaborative edge caching algorithm in ultra-dense network
CN112601284A (en) * 2020-12-07 2021-04-02 南京邮电大学 Downlink multi-cell OFDMA resource allocation method based on multi-agent deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
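郑冰原 et al.: "Distributed resource management based on deep reinforcement learning", 《工业控制计算机》 (Industrial Control Computer) *
郑冰原 et al.: "Ultra-dense network resource allocation based on deep reinforcement learning", 《电子测量技术》 (Electronic Measurement Technology) *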

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023067610A1 (en) * 2021-10-20 2023-04-27 Telefonaktiebolaget Lm Ericsson (Publ) A method for network configuration in dense networks
CN115038155A (en) * 2022-05-23 2022-09-09 香港中文大学(深圳) Ultra-dense multi-access-point dynamic cooperative transmission method
CN115038155B (en) * 2022-05-23 2023-02-07 香港中文大学(深圳) Ultra-dense multi-access-point dynamic cooperative transmission method

Also Published As

Publication number Publication date
CN113490219B (en) 2022-02-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant