CN113891481A

CN113891481A - Throughput-oriented cellular network D2D communication dynamic resource allocation method

Info

Publication number: CN113891481A
Application number: CN202111140067.1A
Authority: CN
Inventors: 郑军; 姜书瑞; 张源
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2021-09-28
Filing date: 2021-09-28
Publication date: 2022-01-04

Abstract

The invention discloses a throughput-oriented dynamic resource allocation method for cellular network D2D communication, which comprises the following steps: step 1, determining the type of a newly arrived user; step 2, according to the user type, calculating the link signal-to-interference-and-noise ratio and the obtainable throughput of the user on each spectrum resource block, and allocating to the newly arrived user the spectrum resource block that currently provides the maximum user throughput; and step 3, after the spectrum resource block has been allocated, allocating transmission power to the newly arrived user. The invention can effectively improve the total throughput of the cellular network while guaranteeing the communication quality of cellular users.

Description

Throughput-oriented cellular network D2D communication dynamic resource allocation method

Technical Field

The invention relates to the technical field of wireless communication, in particular to a throughput-oriented dynamic resource allocation method for cellular network D2D communication.

Background

With the rapid spread of mobile devices, a growing number of local applications, such as content sharing and interactive gaming, require data to be transmitted between nearby users. At the same time, the rapid rise of Internet of Things applications means that a large number of mobile terminals need network access. Both trends pose a significant challenge to communication resource management in cellular networks. D2D communication is a communication mode in which users communicate directly without relaying through a base station. Introducing D2D communication into a cellular network can effectively improve the spectrum utilization of the network and relieve the shortage of network resources. Because the transmission distance of D2D communication is usually much shorter than that of cellular communication, it also improves energy efficiency, reduces transmission delay, and lightens network load. To improve the spectrum utilization of the cellular network, D2D user pairs typically operate in a sharing mode, in which a D2D user pair multiplexes the spectrum resource blocks of cellular users; mutual interference between the two is then unavoidable. How to effectively mitigate the interference between D2D links and cellular links while guaranteeing the communication quality of cellular users is therefore an important issue in D2D communication. Current solutions mainly include power control and spectrum allocation. In a real network, cellular users and D2D users arrive at and leave the network dynamically and the interference between links changes continuously, so an interference control algorithm designed for the dynamic scenario is needed.

Disclosure of Invention

The technical problem to be solved by the present invention is to provide a throughput-oriented dynamic resource allocation method for D2D communication in a cellular network, which is used to solve the problem of interference among a base station, a cellular user, and a D2D user caused by introducing D2D communication in a single-cell uplink scenario, and improve the network throughput at the same time.

In order to solve the technical problem, the invention provides a throughput-oriented dynamic resource allocation method for cellular network D2D communication, which comprises the following steps:

Step 1, determining the type of a newly arrived user, wherein the user type is either a cellular user or a D2D user pair.

Step 2, allocating a spectrum resource block to the newly arrived user; this specifically comprises the following steps:

Step 2.1, according to the user type, calculating the link signal-to-interference-and-noise ratio and the obtainable throughput of the user on each spectrum resource block;

Step 2.2, sorting the throughputs obtainable by the user on the spectrum resource blocks in descending order; allocating to the newly arrived cellular user or D2D user pair the spectrum resource block that currently provides the maximum user throughput; for a cellular user, if the obtainable throughput is less than the minimum throughput requirement of the cellular user, denying the cellular user access to the network;

Step 3, allocating transmission power to the newly arrived user;

if the newly arrived user is a cellular user, allocating a fixed transmission power to it;

if the newly arrived user is a D2D user pair, invoking a Q-learning-based power control algorithm to allocate transmission power to the D2D user pair and to dynamically adjust the transmission power of the other D2D user pairs sharing the same spectrum resource block, so as to maximize the total network throughput; and if none of the candidate transmission powers of the D2D user pair can satisfy the minimum throughput requirement of the cellular user occupying the same spectrum resource block, denying the D2D user pair access to the network.
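Step 2.2 can be illustrated with a minimal Python sketch. The function name `select_resource_block` and its parameters are hypothetical and chosen only for illustration; the per-block throughputs are assumed to have been computed already in Step 2.1.

```python
from typing import Optional, Sequence


def select_resource_block(
    throughputs: Sequence[float],
    is_cellular: bool,
    min_throughput: float = 0.0,
) -> Optional[int]:
    """Return the index of the block with the largest throughput.

    Returns None when there is no block, or when a cellular user's best
    obtainable throughput is below its minimum requirement (access denied).
    """
    if not throughputs:
        return None
    # Sort block indices by obtainable throughput, largest first.
    ranked = sorted(range(len(throughputs)),
                    key=lambda r: throughputs[r], reverse=True)
    best = ranked[0]
    # Only cellular users are subject to the admission check in Step 2.2.
    if is_cellular and throughputs[best] < min_throughput:
        return None
    return best
```

A D2D user pair always receives its best block at this stage; in the method described here, its admission is decided later, during power allocation in Step 3.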

Further, in step 2.1, according to the user type, the link signal-to-interference-and-noise ratio of the user on each spectrum resource block and the throughput that can be obtained are respectively calculated; the method specifically comprises the following steps:

1) if the newly arrived user type is a cellular user, the calculation process of the link signal-to-interference-and-noise ratio and the obtained throughput of the cellular user on each frequency spectrum resource block is as follows:

the link signal-to-interference-and-noise ratio of the cellular user on each spectrum resource block is calculated as:

$$\mathrm{SINR}_{C_i}^{r} = \frac{P_{C_i}^{r}\, G_{C_i,B}^{r}}{\sum_{D_j \in \mathcal{D}_r} P_{D_j}^{r}\, G_{D_j,B}^{r} + \sigma^2}$$

wherein $C_i$ denotes the i-th cellular user (i = 1, 2, …), $D_j$ denotes the j-th D2D user pair (j = 1, 2, …), r = 1, 2, …, K, and K denotes the number of spectrum resource blocks in the network;

$\mathcal{D}_r$ denotes the set of all D2D user pairs sharing the r-th spectrum resource block;

$P_{C_i}^{r}$ denotes the transmission power of the cellular user $C_i$ occupying the r-th spectrum resource block,

$P_{D_j}^{r}$ denotes the transmission power of the D2D user pair $D_j$ occupying the r-th spectrum resource block;

$G_{C_i,B}^{r}$ denotes the channel gain between the cellular user $C_i$ occupying the r-th spectrum resource block and the base station,

$G_{D_j,B}^{r}$ denotes the channel gain between the transmitting end of the D2D user pair $D_j$ occupying the r-th spectrum resource block and the base station, and $\sigma^2$ denotes the noise power.

According to Shannon's theorem, the throughput obtained by the cellular user on each spectrum resource block is calculated as:

$$T_{C_i}^{r} = W \log_2\left(1 + \mathrm{SINR}_{C_i}^{r}\right)$$

where W represents the bandwidth of one spectrum resource block.

2) If the newly arrived user type is a D2D user pair, the link signal-to-interference-and-noise ratio and the obtainable throughput of the D2D user pair on each spectrum resource block are calculated as follows:

the signal-to-interference-and-noise ratio of the D2D link is calculated as:

$$\mathrm{SINR}_{D_j}^{r} = \frac{P_{D_j}^{r}\, G_{D_j}^{r}}{P_{C_i}^{r}\, G_{C_i,D_j}^{r} + \sum_{D_{j'} \in \mathcal{D}_r,\, j' \neq j} P_{D_{j'}}^{r}\, G_{D_{j'},D_j}^{r} + \sigma^2}$$

wherein,

$G_{D_j}^{r}$ denotes the channel gain between the transmitting end and the receiving end of the D2D user pair $D_j$ occupying the r-th spectrum resource block,

$G_{C_i,D_j}^{r}$ denotes the channel gain between the cellular user $C_i$ occupying the r-th spectrum resource block and the receiving end of the D2D user pair $D_j$,

$G_{D_{j'},D_j}^{r}$ denotes the channel gain between the transmitting end of a different D2D user pair $D_{j'}$ sharing the r-th spectrum resource block and the receiving end of $D_j$.

According to Shannon's theorem, the throughput of the D2D user pair is calculated as:

$$T_{D_j}^{r} = W \log_2\left(1 + \mathrm{SINR}_{D_j}^{r}\right)$$
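The SINR and throughput calculations of step 2.1 can be sketched in Python as follows. This is an illustration only: it assumes the usual SINR form (received signal power divided by the sum of co-channel interference and noise power) that the symbol definitions above imply, and all function and parameter names are hypothetical.

```python
import math
from typing import Sequence


def cellular_sinr(p_c: float, g_c_bs: float,
                  d2d_powers: Sequence[float],
                  d2d_gains_to_bs: Sequence[float],
                  noise: float) -> float:
    """SINR of a cellular uplink: D2D transmitters on the block interfere at the BS."""
    interference = sum(p * g for p, g in zip(d2d_powers, d2d_gains_to_bs))
    return p_c * g_c_bs / (interference + noise)


def d2d_sinr(p_d: float, g_d: float,
             p_c: float, g_c_to_rx: float,
             other_powers: Sequence[float],
             other_gains_to_rx: Sequence[float],
             noise: float) -> float:
    """SINR of a D2D link: the cellular user and the other D2D pairs interfere."""
    interference = p_c * g_c_to_rx + sum(
        p * g for p, g in zip(other_powers, other_gains_to_rx))
    return p_d * g_d / (interference + noise)


def shannon_throughput(bandwidth_hz: float, sinr: float) -> float:
    """Shannon throughput W * log2(1 + SINR), in bit/s when W is in Hz."""
    return bandwidth_hz * math.log2(1.0 + sinr)
```

Evaluating these functions once per spectrum resource block gives the per-block throughputs that step 2.2 then ranks.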

Further, in step 3, if the newly arrived user is a D2D user pair, the Q-learning-based power control algorithm is invoked to allocate transmission power to the D2D user pair; specifically, the transmission power is allocated to the D2D user pair according to the Q value table output by the Q-learning-based power control algorithm.

Further, in step 3, if the newly arrived user is a D2D user pair, a power control algorithm based on Q learning is invoked to allocate transmission power to the D2D user pair, and the transmission power of other D2D user pairs sharing the same spectrum resource block is dynamically adjusted to maximize the total network throughput; the method comprises the following specific steps:

Step 3.1, for the N_r D2D user pairs D_j, j ∈ {1, 2, …, N_r}, sharing the spectrum resource block allocated to the newly arrived D2D user pair, initializing all values of the Q value tables output by the Q-learning-based power control algorithm to 0;

Step 3.2, selecting the j-th D2D user pair sharing the spectrum resource block;

Step 3.3, based on the current Q value table, selecting an action a according to an ε-greedy strategy, wherein action a is defined as selecting, for each D2D user pair sharing the spectrum resource block, one transmission power p ∈ {p₁, p₂, …, p_L}, where p₁, p₂, …, p_L are the candidate transmission powers. Specifically, a random number between 0 and 1 is generated; if the random number is smaller than ε, an action is selected at random, and if it is larger than ε, the action with the maximum Q value is selected.

Step 3.4, executing action a, and calculating a reward function R;

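the reward function R is defined as follows:

$$R = \begin{cases} T_{C} + \sum_{D_j} T_{D_j}, & T_{C} \ge \tau_0 \\ -1, & \text{otherwise} \end{cases}$$

wherein $T_{C}$ represents the throughput of the cellular user occupying the spectrum resource block, $T_{D_j}$ represents the throughput of the D2D user pair D_j sharing the spectrum resource block, and $\tau_0$ represents the minimum throughput requirement of the cellular user occupying the spectrum resource block;

the above equation shows that when the throughput of the cellular user meets its minimum throughput requirement, the reward function is the total throughput of all users sharing the spectrum resource block, that is, the optimization goal of the algorithm is to maximize the total network throughput; otherwise, the reward function is −1, which represents a penalty value.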

Step 3.5, updating the Q value table according to the following formula:

$$Q'(s,a) = Q(s,a) + \alpha\left[R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]$$

wherein Q'(s, a) represents the updated value of the Q value table, Q(s, a) represents the current value of the Q value table, α represents the learning rate, 0 ≤ α ≤ 1, γ represents the attenuation (discount) factor, 0 ≤ γ ≤ 1, and

$\max_{a'} Q(s',a')$ represents the maximum value in the current Q value table;

Step 3.6, repeating steps 3.3 to 3.5 until the Q value table converges;

Step 3.7, repeating steps 3.2 to 3.6 until all D2D user pairs sharing the spectrum resource block have been traversed;

Step 3.8, resetting j to 1 and repeating steps 3.2 to 3.7 until the Q value tables of all D2D user pairs sharing the spectrum resource block converge to the same optimal solution.
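Steps 3.1 to 3.8 can be sketched in Python as below. This is a sketch under stated assumptions, not the patented implementation: the text does not define the state s, so a single implicit state is used (each pair's Q value table has one entry per candidate power); convergence in steps 3.6 and 3.8 is approximated by a fixed iteration budget and by the greedy powers no longer changing; and `reward_fn`, which should return the reward R for a full list of D2D transmission powers, together with every other name here is hypothetical.

```python
import random
from typing import Callable, List, Sequence


def q_learning_power_control(
    num_pairs: int,
    power_levels: Sequence[float],
    reward_fn: Callable[[List[float]], float],
    epsilon: float = 0.1,
    alpha: float = 0.5,
    gamma: float = 0.9,
    inner_iters: int = 200,
    max_sweeps: int = 20,
    seed: int = 0,
) -> List[float]:
    """Return one transmission power per D2D pair sharing a resource block."""
    rng = random.Random(seed)
    n_actions = len(power_levels)
    # Step 3.1: one Q value table per D2D pair, every entry initialised to 0.
    q = [[0.0] * n_actions for _ in range(num_pairs)]
    # Assumption: every pair starts at the first candidate power.
    choice = [0] * num_pairs

    for _ in range(max_sweeps):  # Step 3.8: start again from the first pair
        previous = list(choice)
        for j in range(num_pairs):  # Steps 3.2 and 3.7: visit each pair in turn
            for _ in range(inner_iters):  # Step 3.6, with a fixed budget
                # Step 3.3: epsilon-greedy action selection.
                if rng.random() < epsilon:
                    a = rng.randrange(n_actions)
                else:
                    a = max(range(n_actions), key=lambda k: q[j][k])
                # Step 3.4: apply the action and observe the reward.
                trial = list(choice)
                trial[j] = a
                reward = reward_fn([power_levels[k] for k in trial])
                # Step 3.5: Q-learning update (single implicit state).
                q[j][a] += alpha * (reward + gamma * max(q[j]) - q[j][a])
            # Keep the greedy power of pair j for the following pairs.
            choice[j] = max(range(n_actions), key=lambda k: q[j][k])
        if choice == previous:  # greedy powers stopped changing
            break
    return [power_levels[k] for k in choice]
```

In practice `reward_fn` would compute the throughputs of the cellular user and of every D2D pair on the block and return their sum, or −1 when the cellular user falls below its minimum throughput requirement.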

The invention has the beneficial effects that: on the premise of ensuring the communication quality of cellular users, the total throughput of the cellular network can be effectively improved.

Drawings

Fig. 1 is a schematic diagram of a cellular network D2D communication uplink system model in the invention.

Fig. 2 is a schematic flow chart of a step of allocating a spectrum resource block to a newly arrived user in the present invention.

Fig. 3 is a flow chart illustrating a procedure of allocating transmission power to a newly arrived user according to the present invention.

Fig. 4 is a schematic diagram of a power control algorithm based on Q learning according to the present invention.

Fig. 5 is a schematic flow chart of the method of the present invention.

Detailed Description

The embodiment of the invention discloses a throughput-oriented dynamic resource allocation method for cellular network D2D communication, which is applied to a single-cell scenario. Within the cell there is one base station BS, and both cellular users and D2D user pairs dynamically arrive at and leave the network. There are K spectrum resource blocks in the system, indexed r = 1, 2, …, K.

The D2D user pairs multiplex the uplink spectrum resource blocks of the cellular users; the cellular users and D2D user pairs are randomly and uniformly distributed in the cell area, and the base station can obtain the channel state information of all links. There are two link modes in the cell: a cellular link mode between the base station and a cellular user, and a direct link mode between the transmitting end and the receiving end of a D2D user pair.

Because the D2D user pairs multiplex the uplink spectrum resources, there are three kinds of interference in the system, as shown in fig. 1: (1) the signal transmitted by a cellular user to the base station is also received by the receiving end of a D2D user pair, interfering with that D2D user pair; (2) the signal transmitted by the transmitting end of a D2D user pair to its receiving end is also received by the base station, interfering with the base station; (3) the signal transmitted by the transmitting end of a D2D user pair to its receiving end is also received by the receiving ends of other D2D user pairs in the same cell, interfering with those D2D user pairs.

The throughput-oriented dynamic resource allocation method for cellular network D2D communication mainly comprises three steps: (1) determining the type of a newly arrived user; (2) allocating a spectrum resource block to the newly arrived user; (3) allocating transmission power to the newly arrived user.

Specifically, as shown in fig. 5, the throughput-oriented dynamic resource allocation method for cellular network D2D communication according to the present invention includes the following steps:

Step 2, distributing frequency spectrum resource blocks for newly arrived users; as shown in fig. 2, the method specifically includes the following steps:

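1) the user type is a cellular user, and the link signal-to-interference-and-noise ratio and the obtainable throughput of the cellular user on each spectrum resource block are calculated as follows:

the channel gains between a cellular user and the base station, between the transmitting end of a D2D user pair and the base station, between a cellular user and the receiving end of a D2D user pair, and between D2D user pairs in the cell are respectively expressed as:

$$G_{C_i,B} = \beta\, d_{C_i,B}^{-\mu},\quad G_{D_j,B} = \beta\, d_{D_j,B}^{-\mu},\quad G_{C_i,D_j} = \beta\, d_{C_i,D_j}^{-\mu},\quad G_{D_j} = \beta\, d_{D_j}^{-\mu},\quad G_{D_{j'},D_j} = \beta\, d_{D_{j'},D_j}^{-\mu}$$

wherein,

$d_{C_i,B}$ and $d_{D_j,B}$ respectively represent the distance between the cellular user $C_i$ and the base station and the distance between the transmitting end of the D2D user pair $D_j$ and the base station, β denotes the gain coefficient, μ denotes the path loss exponent,

$d_{C_i,D_j}$ represents the distance between the cellular user $C_i$ and the receiving end of the D2D user pair $D_j$,

$d_{D_j}$ represents the distance between the transmitting end and the receiving end of the D2D user pair $D_j$,

$d_{D_{j'},D_j}$ represents the distance between the transmitting end of a different D2D user pair $D_{j'}$ and the receiving end of $D_j$.

With a superscript r marking quantities on the r-th spectrum resource block, the link signal-to-interference-and-noise ratio and the throughput of the cellular user are:

$$\mathrm{SINR}_{C_i}^{r} = \frac{P_{C_i}^{r}\, G_{C_i,B}^{r}}{\sum_{D_j \in \mathcal{D}_r} P_{D_j}^{r}\, G_{D_j,B}^{r} + \sigma^2},\qquad T_{C_i}^{r} = W \log_2\left(1 + \mathrm{SINR}_{C_i}^{r}\right)$$

wherein $\mathcal{D}_r$ represents the set of all D2D user pairs sharing the r-th spectrum resource block, $P_{C_i}^{r}$ and $P_{D_j}^{r}$ represent the transmission powers of the cellular user $C_i$ and of the D2D user pair $D_j$ on the r-th spectrum resource block, and $\sigma^2$ represents the noise power;

where W represents the bandwidth of one spectrum resource block.

2) The user type is a D2D user pair, and the link signal-to-interference-and-noise ratio and the obtainable throughput of the D2D user pair on each spectrum resource block are calculated as follows:

$$\mathrm{SINR}_{D_j}^{r} = \frac{P_{D_j}^{r}\, G_{D_j}^{r}}{P_{C_i}^{r}\, G_{C_i,D_j}^{r} + \sum_{D_{j'} \in \mathcal{D}_r,\, j' \neq j} P_{D_{j'}}^{r}\, G_{D_{j'},D_j}^{r} + \sigma^2},\qquad T_{D_j}^{r} = W \log_2\left(1 + \mathrm{SINR}_{D_j}^{r}\right)$$

wherein,

$G_{D_j}^{r}$ represents the channel gain between the transmitting end and the receiving end of the D2D user pair $D_j$ occupying the r-th spectrum resource block,

$G_{C_i,D_j}^{r}$ represents the channel gain between the cellular user $C_i$ occupying the r-th spectrum resource block and the receiving end of the D2D user pair $D_j$,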
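As a small illustration of the channel gain model, the sketch below assumes a simple power-law path-loss form, gain = β · d^(−μ), which the parameters β and μ and the link distances in the text suggest; the exact formula is not legible in this text, so this form, the function name `channel_gain`, and its parameters are assumptions.

```python
import math
from typing import Tuple


def channel_gain(tx: Tuple[float, float], rx: Tuple[float, float],
                 beta: float, mu: float) -> float:
    """Path-loss channel gain beta * d**(-mu) between two 2-D positions.

    Assumed model: d is the Euclidean distance between transmitter and
    receiver, beta a constant gain factor, mu the path loss exponent.
    """
    distance = math.dist(tx, rx)
    if distance <= 0.0:
        raise ValueError("transmitter and receiver must not coincide")
    return beta * distance ** (-mu)
```

The same helper would serve every link in the cell (cellular user to base station, D2D transmitting end to base station, and so on), since only the two end points differ.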

Step 2.2, sorting the throughputs obtainable by the user on the spectrum resource blocks in descending order; allocating to the newly arrived cellular user or D2D user pair the spectrum resource block that currently provides the maximum user throughput;

for a cellular user, if the obtainable throughput is less than the minimum throughput requirement of the cellular user, denying the cellular user access to the network;

Step 3, allocating transmission power to the newly arrived user, as shown in fig. 3;

if the newly arrived user is a D2D user pair, invoking the Q-learning-based power control algorithm, allocating transmission power to the D2D user pair according to the Q value table output by the power control algorithm, and dynamically adjusting the transmission power of the other D2D user pairs sharing the same spectrum resource block so as to maximize the total network throughput; and if, within a certain constraint range, none of the candidate transmission powers of the D2D user pair can satisfy the minimum throughput requirement of the cellular user occupying the same spectrum resource block, denying the D2D user pair access to the network.

Specifically, as shown in fig. 4, if the arriving user is a pair of D2D users, the specific steps of allocating transmission power to the arriving user are as follows:

Step 3.1, for the N_r D2D user pairs D_j, j ∈ {1, 2, …, N_r}, sharing the spectrum resource block allocated to the newly arrived D2D user pair, initializing all values of the Q value tables output by the Q-learning-based power control algorithm to 0, and setting j to 1;

step 3.2, selecting the jth D2D user pair sharing the spectrum resource block;

Step 3.3, based on the current Q value table, selecting an action a according to an ε-greedy strategy, wherein action a is defined as selecting, for each D2D user pair sharing the spectrum resource block, one transmission power p ∈ {p₁, p₂, …, p_L}, where p₁, p₂, …, p_L are the candidate transmission powers. Specifically, a random number between 0 and 1 is generated; if the random number is smaller than ε, an action is selected at random, and if it is larger than ε, the action with the maximum Q value is selected.

Step 3.4, executing action a, and calculating a reward function R;

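the reward function R is defined as follows:

$$R = \begin{cases} T_{C} + \sum_{D_j} T_{D_j}, & T_{C} \ge \tau_0 \\ -1, & \text{otherwise} \end{cases}$$

wherein $T_{C}$ represents the throughput of the cellular user occupying the spectrum resource block, $T_{D_j}$ represents the throughput of the D2D user pair D_j sharing the spectrum resource block, and $\tau_0$ represents the minimum throughput requirement of that cellular user.

Step 3.5, updating the Q value table according to the following formula:

$$Q'(s,a) = Q(s,a) + \alpha\left[R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]$$

wherein Q'(s, a) represents the updated value of the Q value table, Q(s, a) represents the current value of the Q value table, α represents the learning rate, 0 ≤ α ≤ 1, γ represents the attenuation (discount) factor, 0 ≤ γ ≤ 1, and

$\max_{a'} Q(s',a')$ represents the maximum value in the current Q value table;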

Step 3.6, repeating steps 3.3 to 3.5 until the Q value table converges;

Step 3.7, repeating steps 3.2 to 3.6 until all D2D user pairs sharing the spectrum resource block have been traversed;

Step 3.8, resetting j to 1 and repeating steps 3.2 to 3.7 until the Q value tables of all D2D user pairs sharing the spectrum resource block converge to the same optimal solution.
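Once the Q value table has converged, the final power assignment and the D2D admission decision of step 3 could look like the sketch below. The rule of taking the highest-Q power among those that keep the cellular user at or above its minimum throughput is an assumption made for illustration; the helper `cellular_throughput_at`, which returns the cellular user's throughput when the D2D pair transmits at a given power, is hypothetical.

```python
from typing import Callable, Optional, Sequence


def allocate_d2d_power(
    q_row: Sequence[float],
    power_levels: Sequence[float],
    cellular_throughput_at: Callable[[float], float],
    min_cellular_throughput: float,
) -> Optional[float]:
    """Pick a transmission power from a converged Q value table.

    Returns None (D2D pair denied access) when no candidate power keeps the
    co-channel cellular user at or above its minimum throughput.
    """
    # Candidate powers that still protect the cellular user.
    feasible = [
        k for k, p in enumerate(power_levels)
        if cellular_throughput_at(p) >= min_cellular_throughput
    ]
    if not feasible:
        return None
    # Among feasible powers, take the one with the largest Q value.
    best = max(feasible, key=lambda k: q_row[k])
    return power_levels[best]
```

For example, with `q_row = [1.0, 5.0, 3.0]` and candidate powers `[1.0, 2.0, 3.0]`, the function returns 2.0 as long as that power keeps the cellular user above its requirement.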

While the invention has been shown and described with respect to the embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the following claims.

Claims

1. A dynamic resource allocation method for throughput-oriented cellular network D2D communication, comprising the steps of:

step 1, determining the type of a newly arrived user, wherein the user type is either a cellular user or a D2D user pair;

step 3, allocating transmission power to the newly arrived user;

if the newly arrived user is a D2D user pair, invoking a Q-learning-based power control algorithm to allocate transmission power to the D2D user pair and to dynamically adjust the transmission power of the other D2D user pairs sharing the same spectrum resource block, so as to maximize the total network throughput; and when none of the candidate transmission powers of the D2D user pair can satisfy the minimum throughput requirement of the cellular user occupying the same spectrum resource block, denying the D2D user pair access to the network.

2. The throughput-oriented dynamic resource allocation method for cellular network D2D communication according to claim 1, wherein in step 2.1, the link signal-to-interference-and-noise ratio and the obtainable throughput of the user on each spectrum resource block are calculated according to the user type; this specifically comprises the following steps:

represents a set of all D2D user pairs sharing an r-th spectrum resource block;

indicating the channel gain between the transmitting end of the D2D user pair D_j occupying the r-th spectrum resource block and the base station, and σ² represents the noise power;

wherein, W represents the bandwidth of one spectrum resource block;

wherein,

representing the channel gain between the transmitting end of a different D2D user pair D_j' sharing the r-th spectrum resource block and the receiving end of D_j;

3. The method as claimed in claim 1, wherein in step 3, if the newly arrived user is a D2D user pair, the Q-learning-based power control algorithm is invoked to allocate transmission power to the D2D user pair; specifically, the transmission power is allocated to the D2D user pair according to the Q value table output by the Q-learning-based power control algorithm.

4. The dynamic D2D communication resource allocation method for a throughput-oriented cellular network, according to claim 1, wherein in step 3, if the newly arrived user is the D2D user pair, the power control algorithm based on Q learning is invoked to allocate the transmission power for the D2D user pair, and the transmission power of other D2D user pairs sharing the same spectrum resource block is dynamically adjusted to maximize the total network throughput; the method comprises the following specific steps:

step 3.1, for the N_r D2D user pairs D_j, j ∈ {1, 2, …, N_r}, sharing the spectrum resource block allocated to the newly arrived D2D user pair, initializing all values of the Q value tables output by the Q-learning-based power control algorithm to 0; setting j to 1;

step 3.2, selecting the jth D2D user pair sharing the spectrum resource block;

step 3.3, based on the current Q value table, selecting an action a according to an ε-greedy strategy; wherein action a is defined as selecting, for each D2D user pair sharing the spectrum resource block, one transmission power p ∈ {p₁, p₂, …, p_L}, where p₁, p₂, …, p_L are the candidate transmission powers; specifically, a random number between 0 and 1 is generated; if the random number is smaller than ε, an action is selected at random, and if it is larger than ε, the action with the maximum Q value is selected;

step 3.4, executing action a, and calculating a reward function R;

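the reward function R is defined as follows:

$$R = \begin{cases} T_{C} + \sum_{D_j} T_{D_j}, & T_{C} \ge \tau_0 \\ -1, & \text{otherwise} \end{cases}$$

wherein $T_{C}$ represents the throughput of the cellular user occupying the spectrum resource block, $T_{D_j}$ represents the throughput of the D2D user pair D_j sharing the spectrum resource block, and $\tau_0$ represents the minimum throughput requirement of that cellular user;

the above equation represents that when the throughput of the cellular user meets its minimum throughput requirement, the reward function is the total throughput of all users sharing the spectrum resource block, i.e. the optimization goal of the algorithm is to maximize the total network throughput; otherwise, the reward function is −1, which represents a penalty value;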

step 3.5, updating the Q value table according to the following formula:

$$Q'(s,a) = Q(s,a) + \alpha\left[R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]$$

wherein Q'(s, a) represents the updated value of the Q value table, Q(s, a) represents the current value of the Q value table, α represents the learning rate, 0 ≤ α ≤ 1, γ represents the attenuation (discount) factor, 0 ≤ γ ≤ 1, and

$\max_{a'} Q(s',a')$ represents the maximum value in the current Q value table;

step 3.6, repeating the steps 3.3-3.5 until the Q value table is converged;