CN113490219A - Dynamic resource allocation method for ultra-dense networking

Info

Publication number: CN113490219A
Application number: CN202110762110.1A
Authority: CN
Inventors: Huang Chuan (黄川); Cui Shuguang (崔曙光); Wang Dan (王丹)
Original assignee: Chinese University of Hong Kong Shenzhen
Current assignee: Chinese University of Hong Kong Shenzhen
Priority date: 2021-07-06
Filing date: 2021-07-06
Publication date: 2021-10-08
Anticipated expiration: 2041-07-06
Also published as: CN113490219B

Abstract

The invention discloses a dynamic resource allocation method for ultra-dense networking, comprising the following steps: S1, constructing an ultra-dense networking model comprising N cells, wherein each cell is provided with a base station; S2, clustering the N cells, wherein the base stations deployed in the same cluster cooperate with one another and are regarded as a single virtual base station entity with multiple antennas, so that part of the inter-cell interference is converted into intra-cluster interference; S3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function; S4, constructing an optimization problem based on system throughput; S5, determining the cluster center nodes based on an affinity propagation algorithm; and S6, performing dynamic network resource allocation based on distributed reinforcement learning. Through the design of the clustering, power allocation and transmission parameters of each cell, the invention can effectively coordinate the transmission of multiple cells, improve network performance and maximize network throughput.

Description

Dynamic resource allocation method for ultra-dense networking

Technical Field

The invention relates to the field of wireless communication, in particular to a dynamic resource allocation method for ultra-dense networking.

Background

Ultra-dense networking is one of the key technologies of 5G communication and is expected to be developed further in the 5G era. In ultra-dense networking, the physical distance between access points is greatly shortened, the transmit power between the access points and the mobile users can be significantly reduced, and the wireless coverage fully exploits the potential of frequency reuse. Meanwhile, full-duplex technology enables a transceiver to transmit and receive data simultaneously in the same spectrum, thereby maximizing the data transmission density in the time and frequency dimensions and reducing the cost of guard intervals.

In recent years, researchers have combined ultra-dense networking with full-duplex technology; by fully utilizing wireless resources in the space, time and frequency dimensions, network throughput is improved and the energy consumption of the system is reduced. In full-duplex ultra-dense networking, each node is equipped with a low-power transmitter, so that the self-interference present in a full-duplex system can easily be cancelled to a sufficiently low level. Ultra-dense networking that uses full-duplex technology can therefore obtain the performance gains of both. However, because a large number of cells are irregularly distributed in ultra-dense networking, interference in the system is particularly severe. In addition, residual self-interference still exists at the full-duplex nodes, which makes the interference environment of a full-duplex ultra-dense networking system more complicated. It is therefore necessary to design a radio resource management method for full-duplex ultra-dense networking that guarantees the quality of service of the users. Existing literature studies a two-layer ultra-dense network with a macro cell and a plurality of cells, and proposes a joint spectrum and power management scheme that maximizes the total throughput of a full-duplex ultra-dense network under given user quality-of-service and cross-layer interference constraints. Based on the same model, existing literature also considers joint user access, subchannel allocation and power control in full-duplex ultra-dense networking, and further addresses joint capacity maximization and power minimization in full-duplex ultra-dense networking under a user-centered transmission scheme. Such centrally controlled operation requires the state information of all nodes and considers only a static wireless environment. In a practical dynamic wireless environment, it is impossible to collect the instantaneous information of all nodes in a large network.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a dynamic resource allocation method for ultra-dense networking, which can effectively coordinate the transmission of multiple cells, improve the network performance and maximize the network throughput through the design of grouping, power allocation and sending parameters of each cell.

The purpose of the invention is realized by the following technical scheme: a dynamic resource allocation method for ultra-dense networking comprises the following steps:

S1, constructing an ultra-dense networking model comprising N cells, wherein each cell is provided with a base station;

the constructed ultra-dense networking model is as follows:

consider an ultra-dense network with N cells randomly deployed in a fixed area, wherein each cell is provided with a base station and a pair of uplink and downlink users served by that base station; every transceiver in the system is equipped with a single antenna, each base station communicates with its users in full-duplex or half-duplex mode, and all nodes operate on the same frequency band.

S2, clustering the N cells, wherein the base stations deployed in the same cluster cooperate with one another and are regarded as a single virtual base station entity with multiple antennas, so that part of the inter-cell interference is converted into intra-cluster interference;

the clustering result for the N cells is as follows:

a clustering structure is defined that divides the N cells into K clusters, and Ω denotes the set of all feasible clustering structures; each cluster comprises one or more cells; a binary variable equals 1 if the n-th cell selects the k-th cluster and equals 0 otherwise; each base station can join at most one cluster, so for every cell the binary variables sum to at most 1 over the K clusters.
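The cluster-selection constraint of step S2 can be illustrated with a minimal Python sketch. The function names and the list-of-lists layout `assign[n][k]` are hypothetical choices made for this illustration and do not come from the patent text.

```python
def is_valid_clustering(assign):
    """assign[n][k] is 1 if cell n selects cluster k and 0 otherwise.

    A clustering is valid when every entry is binary and every cell
    joins at most one cluster (each row sums to at most 1).
    """
    return all(
        all(x in (0, 1) for x in row) and sum(row) <= 1
        for row in assign
    )


def clusters_from_assignment(assign):
    """Group cell indices by the cluster each cell selected."""
    num_clusters = len(assign[0]) if assign else 0
    clusters = [[] for _ in range(num_clusters)]
    for n, row in enumerate(assign):
        for k, x in enumerate(row):
            if x == 1:
                clusters[k].append(n)
    return clusters


# Four cells, two clusters; cell 3 has not joined any cluster.
assign = [[1, 0], [0, 1], [1, 0], [0, 0]]
print(is_valid_clustering(assign), clusters_from_assignment(assign))
```

Representing the binary variables as a matrix keeps the "at most one cluster" condition a simple per-row check.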

S3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function;

the uplink transmission scheme comprises:

suppose the n-th cell selects a cluster; the uplink user in that cell transmits its signal, at its uplink transmission power, to the virtual base station entity of the cluster, which has N_k receiving antennas; the uplink transmission in each cluster is modeled as a multi-user single-input multiple-output (SIMO) channel, and the signal received by the virtual base station group in the k-th cluster is expressed in terms of the following quantities:

the channel from uplink user n to the virtual base station in the cluster; the self-interference channel and the uplink and downlink inter-cluster interference channels from the other clusters; the co-cluster downlink interference signal and the uplink and downlink interference signals from the other clusters; and an additive white Gaussian noise vector, each element of which follows the same noise distribution;

in the signal received by the base station group, the second term is the self-interference at the physical base stations; after self-interference cancellation, the residual self-interference is modeled as additive white Gaussian noise with zero mean and variance ζ²;

the signals are decoded by a minimum mean square error successive interference cancellation (MMSE-SIC) decoder, which gives the achievable uplink rate within the cluster; this rate is expressed in terms of an identity matrix of rank N_k and an inter-cluster interference matrix, and the inter-cluster interference matrix is in turn expressed in terms of the precoding matrix of the n-th downlink user within the corresponding cluster.
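The uplink formulas are not legible in this text. As a hedged sketch only, a textbook form of the MMSE-SIC sum rate for a multi-user SIMO uplink that is consistent with the quantities named above would read as follows. The symbols $\mathcal{C}_k$ (the k-th cluster), $p_n^{\mathrm{UL}}$ (uplink power of user n), $\mathbf{h}_n$ (channel from uplink user n to the virtual base station), $\boldsymbol{\Psi}_k$ (inter-cluster interference matrix) and $\sigma^2$ (noise power) are assumed notation, and treating the residual self-interference of variance ζ² as extra white noise is a simplifying assumption.

```latex
R_k^{\mathrm{UL}} = \log_2 \det\!\Big( \mathbf{I}_{N_k}
  + \big( \boldsymbol{\Psi}_k + (\sigma^2 + \zeta^2)\,\mathbf{I}_{N_k} \big)^{-1}
    \sum_{n \in \mathcal{C}_k} p_n^{\mathrm{UL}}\, \mathbf{h}_n \mathbf{h}_n^{\mathsf{H}} \Big)
```

The identity matrix has dimension N_k because the virtual base station of a cluster with N_k single-antenna base stations has N_k receiving antennas.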

The downlink transmission scheme comprises:

in downlink transmission, the virtual base station uses a precoder to precode each signal sent to the downlink users;

the downlink transmission is modeled as a multiple-input single-output (MISO) channel, and the received signal of the n-th downlink user in the k-th cluster is expressed in terms of the following quantities:

the channel from the base station to downlink user n; the intra-cluster uplink interference channel and the uplink and downlink inter-cluster interference channels from the other clusters; the co-cluster uplink interference signal and the uplink and downlink interference signals from the other clusters; and additive white Gaussian noise;

based on the received signal, the achievable downlink sum rate within the cluster is obtained as a function of the useful signal, the noise and the inter-cluster interference.
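As a rough illustration of the downlink MISO model, the following Python sketch computes a sum rate from per-user SINRs. It is not the patent's formula, which is not legible in this text: the function names, the treatment of all out-of-cluster and uplink interference as one given power per user, and the standard log2(1 + SINR) rate are assumptions.

```python
import math


def inner(h, w):
    """Hermitian inner product h^H w for complex vectors given as lists."""
    return sum(a.conjugate() * b for a, b in zip(h, w))


def downlink_sum_rate(channels, precoders, interference, noise_power):
    """Sum of log2(1 + SINR_n) over the downlink users of one cluster.

    channels[n]     : channel vector from the virtual base station to user n
    precoders[n]    : precoding vector applied to the signal of user n
    interference[n] : aggregate interference power at user n (assumed given)
    """
    rate = 0.0
    for n, h in enumerate(channels):
        signal = abs(inner(h, precoders[n])) ** 2
        # Leakage from the signals precoded for the other users in the cluster.
        leak = sum(abs(inner(h, w)) ** 2
                   for m, w in enumerate(precoders) if m != n)
        sinr = signal / (leak + interference[n] + noise_power)
        rate += math.log2(1.0 + sinr)
    return rate


# Two users with orthogonal channels and matched precoders: each SINR is 1.
channels = [[1 + 0j, 0j], [0j, 1 + 0j]]
precoders = [[1 + 0j, 0j], [0j, 1 + 0j]]
print(downlink_sum_rate(channels, precoders, [0.0, 0.0], noise_power=1.0))  # 2.0
```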

The process of determining the revenue function comprises:

for a cluster with N_k base stations, the SIC decoding complexity increases exponentially with the number of base stations, and this exponential term is used to describe the cluster complexity; the instantaneous profit and the cluster cost of the cluster are defined accordingly, where q_k is a given unit cost; based on the above analysis, the revenue function of cell n joining the k-th cluster is defined through the relative proportion of its contribution, where v_n denotes the revenue of cell n when it forms a cluster on its own.
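The revenue definition can be sketched in Python under stated assumptions. The exact formulas are not legible in this text, so the following choices are hypothetical: the cluster cost is the unit cost times an exponential in the number of base stations (the base 2 is an assumption), the cluster profit is the cluster sum rate minus this cost, and each cell receives a share proportional to its standalone revenue v_n.

```python
def cluster_cost(num_base_stations, unit_cost, base=2.0):
    """Cost growing exponentially with cluster size (the base is an assumption)."""
    return unit_cost * base ** num_base_stations


def cell_revenue(cell, members, cluster_sum_rate, standalone, unit_cost):
    """Share of the cluster profit assigned to `cell`.

    standalone[m] plays the role of v_m, the revenue of cell m on its own;
    the share is the relative proportion v_cell / sum of v_m over the cluster.
    """
    profit = cluster_sum_rate - cluster_cost(len(members), unit_cost)
    total = sum(standalone[m] for m in members)
    return standalone[cell] / total * profit


standalone = {0: 1.0, 1: 3.0}
print(cell_revenue(0, [0, 1], 10.0, standalone, unit_cost=0.5))  # 2.0
```

With two members the cost is 0.5 × 2² = 2, the profit is 8, and cell 0 receives one quarter of it.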

S4, constructing an optimization problem based on system throughput;

the optimization problem described in step S4 is formulated as follows:

all parameters are expressed over the whole time scale, T denotes the total time length, γ ∈ [0,1] denotes the discount coefficient, and P^UL and P^DL denote the maximum uplink and downlink transmission power, respectively; the purpose of the optimization problem is to jointly optimize the cluster selection and the uplink and downlink transmission parameters of each cell, so as to dynamically maximize the average total throughput of the network over a long-term time scale.
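The optimization problem itself is not legible in this text. A hedged reconstruction that is consistent with the surrounding description (discount coefficient γ, horizon T, power limits P^UL and P^DL, at most one cluster per cell) could take the following form. The symbols $c_{n,k}(t)$ (cluster-selection variable), $p_n^{\mathrm{UL}}(t)$ (uplink power), $\mathbf{w}_n(t)$ (downlink precoder) and $u_n(t)$ (revenue of cell n in slot t) are assumed notation, and the placement of the discount factor and the 1/T normalization are assumptions.

```latex
\begin{aligned}
\max_{\{c_{n,k}(t)\},\,\{p_n^{\mathrm{UL}}(t)\},\,\{\mathbf{w}_n(t)\}} \quad
  & \frac{1}{T} \sum_{t=1}^{T} \gamma^{\,t} \sum_{n=1}^{N} u_n(t) \\
\text{s.t.} \quad
  & c_{n,k}(t) \in \{0,1\}, \qquad \sum_{k=1}^{K} c_{n,k}(t) \le 1, \\
  & 0 \le p_n^{\mathrm{UL}}(t) \le P^{\mathrm{UL}}, \qquad
    \|\mathbf{w}_n(t)\|^2 \le P^{\mathrm{DL}} .
\end{aligned}
```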

S5, determining the cluster center nodes based on an affinity propagation algorithm;

S501, defining the similarity between any two base stations as a function of their geographical locations, where L_n denotes the geographical location of base station n;

S502, defining the responsibility and the availability between any two base stations, where the responsibility represents how well suited base station n is to be selected by base station m as the cluster center, and the availability represents how appropriate it is for base station m to select base station n as the cluster center;

S503, calculating the set of cluster centers.
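Steps S501 to S503 can be sketched with the textbook affinity propagation updates. This is a minimal pure-Python illustration, not the patent's exact procedure: the similarity (negative squared Euclidean distance between locations), the median preference, the damping factor and the fixed iteration count are all assumptions.

```python
import statistics


def affinity_propagation(points, damping=0.5, iterations=200):
    """Return indices of the points chosen as cluster centers (needs >= 2 points)."""
    n = len(points)
    # S501: similarity between base stations from their locations.
    s = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k]))
          for k in range(n)] for i in range(n)]
    off_diag = [s[i][k] for i in range(n) for k in range(n) if i != k]
    preference = statistics.median(off_diag)
    for k in range(n):
        s[k][k] = preference
    r = [[0.0] * n for _ in range(n)]  # responsibility r[i][k]
    a = [[0.0] * n for _ in range(n)]  # availability a[i][k]
    for _ in range(iterations):
        # S502: damped responsibility update.
        for i in range(n):
            for k in range(n):
                best = max(a[i][j] + s[i][j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best)
        # S502: damped availability update.
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            for i in range(n):
                if i == k:
                    new = sum(pos) - pos[k]
                else:
                    new = min(0.0, r[k][k] + sum(pos) - pos[i] - pos[k])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    # S503: a point is a cluster center when r[k][k] + a[k][k] is positive.
    return [k for k in range(n) if r[k][k] + a[k][k] > 0]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = affinity_propagation(points)
print(centers)
```

Affinity propagation does not need the number of clusters in advance, which matches a setting in which the size of the cluster center set is an output of this step.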

S6, performing dynamic network resource allocation based on distributed reinforcement learning:

S601, dividing each time slot into two stages, in which each agent first selects a cluster center and then selects its transmission parameters, and defining the action space, the state space and the revenue function of each agent for each stage:

in time slot t, each agent first selects a cluster center in the first stage, so the first-stage action is the choice of one cluster center from the cluster center set, which is equivalent to selecting the corresponding cluster;

in the second stage, the transmission parameters are selected, and the action space of this stage consists of the uplink and downlink transmission power of the n-th agent together with the transmission parameters from the downlink transmission node of agent n to user m; once the first stage is finished the clustering structure is fixed, and when N_k agents currently form a cluster, the second-stage action space of each agent n, n ∈ {1, …, N_k}, is simplified accordingly;

likewise, the state of the first stage is defined to include h_n(t), the channels associated with agent n (uplink and downlink channels and interference channels), and a vector that depends on the members of the agent's cluster in the previous time slot: if agents n and m were in the same cluster in the previous time slot, the n-th and m-th elements of this vector are 1 and the remaining elements are 0; after the first-stage clustering is completed, each agent can observe the members of its cluster at the current time, and the state of the second stage is updated with this information;

subsequently, a revenue function is defined for each of the two stages.
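The state construction of step S601 can be sketched as follows. The function names are hypothetical, and flattening the channel information h_n(t) into a plain list of features is an assumption made for illustration.

```python
def membership_vector(cluster_members, num_agents):
    """1 at the index of every agent in the cluster, 0 elsewhere."""
    members = set(cluster_members)
    return [1 if i in members else 0 for i in range(num_agents)]


def build_state(channel_features, cluster_members, num_agents):
    """Concatenate channel features with a cluster-membership vector.

    With the members of the previous time slot this gives a first-stage
    state; with the members observed after the first-stage clustering of
    the current slot it gives the updated second-stage state.
    """
    return list(channel_features) + membership_vector(cluster_members, num_agents)


# Agents 1 and 3 shared a cluster in the previous slot of a 5-agent network.
print(membership_vector([1, 3], 5))        # [0, 1, 0, 1, 0]
print(build_state([0.5, -0.2], [0, 2], 3))  # [0.5, -0.2, 1, 0, 1]
```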

S602, constructing a multi-agent deep reinforcement learning framework to solve the problem of distributed execution of clustering and transmission parameter selection:

in time slot t, agent n first selects a cluster center with the aid of a DQN network, as a function of its first-stage state; then, since the agents in the same cluster can observe one another, the cluster membership vector is generated and the state is updated to the second-stage state; each agent then selects its second-stage action according to its local state and the Actor network in the DDPG structure; when the actions have been executed, the revenues of the two stages are obtained and the environment moves to the next first-stage and second-stage states;

after each agent obtains its actions for the current time, each cell in the network selects its cluster center and applies its uplink and downlink transmission parameters in a distributed manner, thereby transmitting the uplink and downlink signals. At each time, each cell performs cluster selection and transmits signals with the selected transmission parameters in a distributed manner, so that dynamic resource allocation for the whole ultra-dense network is achieved over a long time scale;

S603, after the actions have been executed, the experiences of the two stages are stored in two memory buffers, each with a fixed length M; if a memory buffer is full, the oldest entries are overwritten by new ones; the trainer randomly draws D entries from the memory buffer to train the networks;
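The fixed-length memory of step S603 behaves like a standard experience replay buffer. The following Python sketch illustrates that behavior; the class name `ReplayMemory` and its methods are hypothetical and not taken from the patent.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-length buffer: when full, the oldest experience is overwritten."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest entry automatically on append.
        self.buffer = deque(maxlen=capacity)

    def store(self, experience):
        self.buffer.append(experience)

    def sample(self, batch_size):
        """Draw up to batch_size distinct experiences uniformly at random."""
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=3)   # plays the role of the length M
for step in range(5):
    memory.store(step)
print(list(memory.buffer))          # [2, 3, 4]
batch = memory.sample(2)            # plays the role of drawing D entries
```

One such buffer would be kept per stage, since the text stores the first-stage and second-stage experiences separately.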

the DQN network is trained mainly by minimizing a loss function defined with respect to a corresponding target Q function, whose parameter θ′ is periodically updated according to the value of θ, i.e.

θ′←(1-τ)θ+τθ′,

where τ is a fixed update parameter; each agent is then equipped with a copy of the target Q network and selects its action from the Q values using the ε-greedy method; in the DQN, each entry in the first-stage memory buffer contains the experience of all agents at a given time.
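The target-parameter update and the ε-greedy selection can be sketched in a few lines of Python. Parameters are represented as flat lists of floats for illustration; the function names are hypothetical. Note that the update rule is implemented exactly as written in the text, so τ weights the old target parameters and a value of τ close to 1 gives a slow update.

```python
import random


def soft_update(target, online, tau):
    """theta' <- (1 - tau) * theta + tau * theta', as written in the text."""
    return [(1.0 - tau) * w + tau * w_t for w, w_t in zip(online, target)]


def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the best Q value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


new_target = soft_update(target=[0.0, 2.0], online=[1.0, 4.0], tau=0.9)
print(new_target)                                  # approximately [0.1, 2.2]
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))  # 1
```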

For the second stage of training, each agent has a DDPG structure consisting of an Actor network and a Critic network. The Actor network takes actions in a distributed manner according to the current local observation, and the Critic evaluates the quality of the action output by the Actor and guides the Actor network towards a more effective policy; the training of the Critic and the Actor is therefore also performed on the centralized controller; the Actor network is trained mainly through a policy gradient in which the output of the Critic network is used to evaluate the action selected by the current Actor and to find a better gradient direction, and μ_n denotes the policy output by agent n in the second stage; the Critic is trained mainly by minimizing a loss function defined with respect to the policies output by all target networks with parameters θ′_n; likewise, the parameter θ′_n is periodically updated according to the value of θ_n, i.e.

θ′_n←(1-τ)θ_n+τθ′_n.

Each entry in the second-stage memory buffer also contains the experience of all agents at a given time.
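The loss and gradient expressions are not legible in this text. For orientation only, the standard textbook forms for a DQN and for a multi-agent DDPG with a centralized critic are given below; they are assumptions rather than the patent's own formulas, and the symbols $s$, $a$, $r$ (state, action, revenue), $Q$, $Q_n$ (Q functions) and $\mu_n$ (policy of agent n) with primed target versions are assumed notation. In these standard forms the loss of the Q network and of the Critic is minimized, and the Actor follows the gradient that increases the Critic's evaluation.

```latex
% First stage (DQN) with target parameters \theta':
L(\theta) = \mathbb{E}\Big[ \big( r + \gamma \max_{a'} Q(s', a'; \theta') - Q(s, a; \theta) \big)^2 \Big]

% Second stage (DDPG), agent n, critic Q_n evaluated on all target policies:
y = r_n + \gamma\, Q_n'\big(s', \mu'_1(s'_1), \dots, \mu'_N(s'_N)\big), \qquad
L_n = \mathbb{E}\big[ \big( y - Q_n(s, a_1, \dots, a_N) \big)^2 \big]

% Actor update direction (deterministic policy gradient):
\nabla_{\theta_n} J = \mathbb{E}\Big[ \nabla_{\theta_n} \mu_n(s_n)\,
  \nabla_{a_n} Q_n(s, a_1, \dots, a_N) \big|_{a_n = \mu_n(s_n)} \Big]
```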

The invention has the beneficial effects that: the invention can effectively coordinate the transmission of a plurality of cells, improve the network performance and maximize the network throughput through the grouping, power distribution and sending parameter design of each cell.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a schematic diagram of action execution for dynamic resource allocation;

FIG. 3 is a schematic diagram of the 10-cell simulation scenario in the embodiment;

FIG. 4 is a diagram of the average sum revenue under different clustering strategies in the embodiment;

FIG. 5 is a diagram of the average sum revenue under different duplex modes in the embodiment;

FIG. 6 is a diagram of the proportion of full-duplex base stations over time in the embodiment.

Detailed Description

The technical solutions of the present invention are further described in detail below with reference to the accompanying drawings, but the scope of the present invention is not limited to the following.

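As shown in fig. 1, a dynamic resource allocation method for ultra-dense networking includes the following steps:

S1, constructing an ultra-dense networking model comprising N cells, wherein each cell is provided with a base station;

S2, clustering the N cells, wherein the base stations deployed in the same cluster cooperate with one another and are regarded as a single virtual base station entity with multiple antennas, so that part of the inter-cell interference is converted into intra-cluster interference;

the clustering result for the N cells is as follows:

a clustering structure is defined that divides the N cells into K clusters, with 1 ≤ K ≤ N, and each cluster comprises one or more cells; a binary variable equals 1 if the n-th cell selects the k-th cluster and equals 0 otherwise; each base station can join at most one cluster, so for every cell the binary variables sum to at most 1 over the K clusters.

In order to increase the overall throughput of the network, the key issue is how to form a cluster structure that serves the users more efficiently. Next, we present a transmission model and define the revenue function of each member of any feasible cluster.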

S3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function;

the uplink transmission scheme comprises:

suppose the n-th cell selects a cluster; the uplink user in that cell transmits its signal, at its uplink transmission power, to the virtual base station entity of the cluster, which has N_k receiving antennas; the uplink transmission in each cluster is modeled as a multi-user single-input multiple-output (SIMO) channel, and the signal received by the virtual base station group in the k-th cluster is expressed in terms of the following quantities:

the channel from uplink user n to the virtual base station in the cluster; the self-interference channel and the uplink and downlink inter-cluster interference channels from the other clusters; the co-cluster downlink interference signal and the uplink and downlink interference signals from the other clusters; and an additive white Gaussian noise vector;

the signals are decoded by a minimum mean square error successive interference cancellation (MMSE-SIC) decoder, which gives the achievable uplink rate within the cluster; this rate is expressed in terms of an identity matrix of rank N_k and an inter-cluster interference matrix, and the inter-cluster interference matrix is in turn expressed in terms of the precoding matrix of the n-th downlink user within the corresponding cluster.

The downlink transmission scheme comprises:

in downlink transmission, the virtual base station uses a precoder to precode each signal sent to the downlink users;

the received signal of the n-th downlink user in the k-th cluster is expressed in terms of the following quantities:

the channel from the base station to downlink user n; the intra-cluster uplink interference channel and the uplink and downlink inter-cluster interference channels from the other clusters; the co-cluster uplink interference signal and the uplink and downlink interference signals from the other clusters; and additive white Gaussian noise;

based on the received signal, the achievable downlink sum rate within the cluster is obtained as a function of the useful signal, the noise and the inter-cluster interference.

The process of determining the revenue function comprises:

for a cluster with N_k base stations, the SIC decoding complexity increases exponentially with the number of base stations, and this exponential term is used to describe the cluster complexity; the instantaneous profit and the cluster cost of the cluster are defined accordingly, and the revenue function of cell n joining the k-th cluster is defined through the relative proportion of its contribution, where v_n denotes the revenue of cell n when it forms a cluster on its own. The cells seek to select appropriate clustering and transmission parameters so as to maximize the long-term sum revenue.

S4, constructing an optimization problem based on system throughput;

the optimization problem described in step S4 is formulated as follows:

all parameters are expressed over the whole time scale, T denotes the total time length, γ ∈ [0,1] denotes the discount coefficient, and P^UL and P^DL denote the maximum uplink and downlink transmission power, respectively; the purpose of the optimization problem is to jointly optimize the cluster selection and the uplink and downlink transmission parameters of each cell, so as to dynamically maximize the average total throughput of the network over a long-term time scale.

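S5, determining the cluster center nodes based on an affinity propagation algorithm;

S501, defining the similarity between any two base stations as a function of their geographical locations, where L_n denotes the geographical location of base station n;

S502, defining the responsibility and the availability between any two base stations;

S503, calculating the set of cluster centers.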

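S6, performing dynamic network resource allocation based on distributed reinforcement learning, wherein each cell is modeled as an agent and resource allocation forms the action set of the agent:

S601, dividing each time slot into two stages, in which each agent first selects a cluster center and then selects its transmission parameters, and defining the action space, the state space and the revenue function of each agent for each stage:

in time slot t, each agent first selects a cluster center in the first stage, which is equivalent to selecting the corresponding cluster;

in the second stage, the transmission parameters are selected, and the action space of this stage consists of the uplink and downlink transmission power of the n-th agent together with its downlink transmission parameters;

likewise, the state of the first stage is defined to include the channels associated with agent n and a vector that depends on the members of the agent's cluster in the previous time slot; after the first-stage clustering is completed, the state of the second stage is updated with the members of the cluster at the current time;

subsequently, a revenue function is defined for each of the two stages.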

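S602, constructing a multi-agent deep reinforcement learning architecture to solve the problem of distributed execution of clustering and transmission parameter selection, as shown in fig. 2:

in time slot t, agent n first selects a cluster center with the aid of a DQN network, as a function of its first-stage state; the cluster membership vector is then generated and the state is updated to the second-stage state; each agent selects its second-stage action according to its local state and the Actor network in the DDPG structure; when the actions have been executed, the revenues of the two stages are obtained and the environment moves to the next first-stage and second-stage states;

at each time, each cell performs cluster selection and transmits signals with the selected transmission parameters in a distributed manner;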

S603, after the actions have been executed, the experiences of the two stages are stored in two memory buffers, each with a fixed length M;

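the DQN network is trained mainly by minimizing a loss function defined with respect to a corresponding target Q function, whose parameter θ′ is periodically updated according to the value of θ, i.e.

θ′←(1-τ)θ+τθ′,

and each agent selects its action from the Q values; in the DQN, each entry in the first-stage memory buffer contains the experience of all agents at a given time.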

For the second stage of training, the Actor network of each agent is trained through a policy gradient in which the output of the Critic network is used to evaluate the action selected by the current Actor and to find a better gradient direction for it; μ_n denotes the policy output by agent n in the second stage; the Critic is trained mainly by minimizing a loss function, and the target parameter θ′_n is periodically updated according to the value of θ_n, i.e.

θ′_n←(1-τ)θ_n+τθ′_n.

Each entry in the second-stage memory buffer also contains the experience of all agents at a given time.

In the embodiments of the present application, some simulation results obtained by applying the above algorithm are given. The simulation scenario considers 10 cells generated by a two-dimensional Poisson point process in a fixed area of 40 meters by 50 meters; the cell density is 5000 cells per square kilometer and the radius of each cell is 5 meters, as shown in fig. 3. The maximum uplink transmit power is P^UL = 20 dB and the maximum downlink transmit power is P^DL = 25 dB. The path loss model is 140.7 + 36.7 log₁₀(d), where d is the distance from the sender to the receiver. The standard deviation of the shadow fading is set to 8 dB, the Gaussian white noise power is σ² = -30 dB, the duration of each time slot is T_d = 100 ms, and the maximum Doppler frequency is f_d = 10 Hz.
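The simulation figures quoted above can be cross-checked with a short Python sketch. The function names are hypothetical, and the unit of the distance d is left open because the text does not state it.

```python
import math


def expected_cells(density_per_km2, width_m, height_m):
    """Expected number of cells for a given density and rectangular area."""
    area_km2 = (width_m / 1000.0) * (height_m / 1000.0)
    return density_per_km2 * area_km2


def path_loss_db(d):
    """Path loss 140.7 + 36.7 * log10(d) in dB; the unit of d is not stated."""
    return 140.7 + 36.7 * math.log10(d)


# 5000 cells per square kilometer over 40 m x 50 m gives about 10 cells,
# which agrees with the 10-cell scenario.
print(expected_cells(5000, 40, 50))
print(path_loss_db(1.0))   # 140.7
```

A tenfold increase in distance adds 36.7 dB of loss under this model.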

Next, we define some hyper-parameters of the neural networks. In the algorithm, every neural network has an input layer, an output layer and three hidden layers, and each hidden layer has 256 neurons. For the 10-cell simulation scenario, there are 112 state input neurons. The number of first-stage action neurons is obtained from the affinity propagation algorithm, and there are 12 action neurons in the second stage. The activation functions in the neural networks are ReLU functions, and a fixed learning rate is used. The learning rate of the DQN network is set to 10^-3, and the learning rates of the Actor and Critic networks are set to 10^-3 and 10^-4, respectively. We have implemented the proposed algorithm with TensorFlow.

FIG. 4 compares four baseline schemes (no clustering, a single cluster, K-means clustering and random clustering) with the proposed dynamic clustering algorithm, where the unit cluster cost is q_k = 0.06 and the residual self-interference power is ζ² = -20 dB. It can be seen that the proposed algorithm converges at 6000 time slots. Compared with the other four clustering schemes, the dynamic clustering algorithm obtains a higher average system revenue. In addition, we compare the average revenue of the proposed flexible duplex mode with that of systems in full-duplex and half-duplex mode, where each node transmits at full power and half-duplex uses frequency-division duplexing. As can be seen from fig. 5, the proposed algorithm, which supports flexible switching between full-duplex and half-duplex operation, performs better than both pure full duplex and pure half duplex. Fig. 6 shows how the proportion of full-duplex base stations among all base stations in the proposed algorithm changes with the environment. Compared with the full-duplex mode with full-power transmission, the algorithm achieves greater system throughput with less power consumption; compared with the half-duplex mode, the system throughput is significantly improved.

Finally, we fix the unit cluster cost at q = 0.06 and study the influence of the self-interference cancellation capability on the proportion of base stations in the system that operate in full-duplex rather than half-duplex mode. We consider 5 levels of residual self-interference power in the range of -20 dB to 0 dB and study the proportion of full-duplex base stations under different clustering strategies. As can be seen from Table 1, for the same self-interference cancellation capability, the proportion of base stations using the full-duplex mode is smaller under the proposed dynamic clustering algorithm. Combined with the simulation results above, this shows that, at low cluster cost, the proposed algorithm obtains higher revenue with a smaller proportion of full-duplex base stations, and therefore has higher energy efficiency.

Table 1: Proportion of full-duplex base stations

The foregoing shows and describes the preferred embodiments of the present invention. As noted above, the invention is not limited to the forms disclosed herein; this description should not be regarded as excluding other embodiments, and the invention may be used in various other combinations, modifications and environments and may be changed within the scope of the inventive concept described herein, in accordance with the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art without departing from the spirit and scope of the invention fall within the protection of the appended claims.

Claims

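1. A dynamic resource allocation method for ultra-dense networking, characterized in that the method comprises the following steps:

S1, constructing an ultra-dense networking model comprising N cells, wherein each cell is provided with a base station;

S2, clustering the N cells, wherein the base stations deployed in the same cluster cooperate with one another and are regarded as a single virtual base station entity with multiple antennas, so that part of the inter-cell interference is converted into intra-cluster interference;

S3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function;

S4, constructing an optimization problem based on system throughput;

S5, determining the cluster center nodes based on an affinity propagation algorithm;

and S6, performing dynamic network resource allocation based on distributed reinforcement learning.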

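2. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein the ultra-dense networking model constructed in step S1 is as follows:

consider an ultra-dense network with N cells randomly deployed in a fixed area, wherein each cell is provided with a base station and a pair of uplink and downlink users served by that base station; every transceiver in the system is equipped with a single antenna, each base station communicates with its users in full-duplex or half-duplex mode, and all nodes operate on the same frequency band.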

3. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein the clustering result for the N cells in step S2 is as follows:

a clustering structure is defined that divides the N cells into K clusters, and each cluster comprises one or more cells; a binary variable equals 1 if the n-th cell selects the k-th cluster and equals 0 otherwise; each base station can join at most one cluster, so for every cell the binary variables sum to at most 1 over the K clusters.

4. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein the uplink transmission scheme described in step S3 comprises:

suppose the n-th cell selects a cluster; the uplink user in that cell transmits its signal, at its uplink transmission power, to the virtual base station entity of the cluster, which has N_k receiving antennas; the signal received by the virtual base station group in the k-th cluster is expressed in terms of the following quantities:

the channel from uplink user n to the virtual base station in the cluster; the self-interference channel and the uplink and downlink inter-cluster interference channels from the other clusters; the co-cluster downlink interference signal and the uplink and downlink interference signals from the other clusters; and an additive white Gaussian noise vector;

the received signal is decoded by a minimum mean square error successive interference cancellation (MMSE-SIC) decoder, which gives the achievable uplink rate within the cluster; this rate is expressed in terms of an identity matrix of rank N_k and an inter-cluster interference matrix, and the inter-cluster interference matrix is in turn expressed in terms of the precoding matrix of the n-th downlink user within the corresponding cluster.

5. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein the downlink transmission scheme described in step S3 comprises:

in downlink transmission, the virtual base station uses a precoder to precode each signal sent to the downlink users;

the received signal of the n-th downlink user in the k-th cluster is expressed in terms of the following quantities:

the channel from the base station to downlink user n; the intra-cluster uplink interference channel and the uplink and downlink inter-cluster interference channels from the other clusters; the co-cluster uplink interference signal and the uplink and downlink interference signals from the other clusters; and additive white Gaussian noise;

based on the received signal, the achievable downlink sum rate within the cluster is obtained as a function of the useful signal, the noise and the inter-cluster interference.

6. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein the process of determining the revenue function in step S3 comprises:

for a cluster with N_k base stations, the successive interference cancellation decoding complexity increases exponentially with the number of base stations, and this exponential term is used to describe the cluster complexity; the instantaneous profit and the cluster cost of the cluster are defined accordingly, and the revenue function of cell n joining the k-th cluster is defined through the relative proportion of its contribution.

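7. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein the optimization problem described in step S4 jointly optimizes the cluster selection and the uplink and downlink transmission parameters of each cell, subject to the maximum uplink and downlink transmission power, so as to dynamically maximize the average total throughput of the network over a long-term time scale.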

8. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein the step S5 comprises:

S501, defining the similarity between any two base stations as a function of their geographical locations, where L_n denotes the geographical location of base station n;

S502, defining the responsibility and the availability between any two base stations;

S503, calculating the set of cluster centers.

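9. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein the step S6 comprises:

S601, dividing each time slot into two stages, in which each agent first selects a cluster center and then selects its transmission parameters, and defining the action space, the state space and the revenue function of each agent for each stage:

in time slot t, each agent first selects a cluster center in the first stage, which is equivalent to selecting the corresponding cluster;

in the second stage, the transmission parameters are selected, and the action space of this stage consists of the uplink and downlink transmission power of the n-th agent together with its downlink transmission parameters;

likewise, the state of the first stage is defined to include the channels associated with agent n and a vector that depends on the members of the agent's cluster in the previous time slot; after the first-stage clustering is completed, the state of the second stage is updated with the members of the cluster at the current time;

subsequently, a revenue function is defined for each of the two stages;

S602, constructing a multi-agent deep reinforcement learning framework to solve the problem of distributed execution of clustering and transmission parameter selection:

in time slot t, agent n first selects a cluster center with the aid of a DQN network, as a function of its first-stage state; the cluster membership vector is then generated and the state is updated to the second-stage state; each agent selects its second-stage action according to its local state and the Actor network in the DDPG structure; when the actions have been executed, the revenues of the two stages are obtained and the environment moves to the next first-stage and second-stage states;

after each agent obtains its actions for the current time, each cell in the network selects its cluster center and applies its uplink and downlink transmission parameters in a distributed manner, thereby transmitting the uplink and downlink signals; at each time, each cell performs cluster selection and transmits signals with the selected transmission parameters in a distributed manner;

S603, after the actions have been executed, the experiences of the two stages are stored in two memory buffers, each with a fixed length M;

the DQN network is trained mainly by minimizing a loss function defined with respect to a corresponding target Q function, whose parameter θ′ is periodically updated according to the value of θ, i.e.

θ′←(1-τ)θ+τθ′,

and each agent selects its action from the Q values; in the DQN, each entry in the first-stage memory buffer contains the experience of all agents at a given time;

for the training of the second stage, each agent has a DDPG structure consisting of an Actor network and a Critic network; the Actor network takes actions in a distributed manner according to the current local observation, and the Critic evaluates the quality of the action output by the Actor and guides the Actor network towards a more effective policy; the training of the Critic and the Actor is therefore also performed on the centralized controller; the Actor network is trained mainly through a policy gradient in which the output of the Critic network is used to evaluate the action selected by the current Actor and to find a better gradient direction for it, and μ_n denotes the policy output by agent n in the second stage; the Critic is trained mainly by minimizing a loss function, and the target parameter θ′_n is periodically updated according to the value of θ_n, i.e.

θ′_n←(1-τ)θ_n+τθ′_n.

Each entry in the second-stage memory buffer also contains the experience of all agents at a given time.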