CN116133081A - Joint relay selection and NOMA channel and power allocation algorithm in single UAV heterogeneous network - Google Patents

Joint relay selection and NOMA channel and power allocation algorithm in single UAV heterogeneous network

Info

Publication number
CN116133081A
CN116133081A
Authority
CN
China
Prior art keywords
user
users
relay
unmanned aerial
aerial vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310164386.9A
Other languages
Chinese (zh)
Inventor
张晋喜
周志雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capital University of Physical Education and Sports
Original Assignee
Capital University of Physical Education and Sports
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Capital University of Physical Education and Sports
Priority to CN202310164386.9A
Publication of CN116133081A
Legal status: Pending

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W40/00 - Communication routing or communication path finding
    • H04W40/02 - Communication route or path selection, e.g. power-based or shortest path routing
    • H04W40/22 - Communication route or path selection, e.g. power-based or shortest path routing using selective relaying for reaching a BTS [Base Transceiver Station] or an access point
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04B - TRANSMISSION
    • H04B7/00 - Radio transmission systems, i.e. using radiation field
    • H04B7/14 - Relay systems
    • H04B7/15 - Active relay systems
    • H04B7/185 - Space-based or airborne stations; Stations for satellite systems
    • H04B7/18502 - Airborne stations
    • H04B7/18504 - Aircraft used as relay or high altitude atmospheric platform
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04B - TRANSMISSION
    • H04B7/00 - Radio transmission systems, i.e. using radiation field
    • H04B7/14 - Relay systems
    • H04B7/15 - Active relay systems
    • H04B7/185 - Space-based or airborne stations; Stations for satellite systems
    • H04B7/18502 - Airborne stations
    • H04B7/18506 - Communications with or from aircraft, i.e. aeronautical mobile service
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 - Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/90 - Services for handling of emergency or hazardous situations, e.g. earthquake and tsunami warning systems [ETWS]
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 - Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04 - TPC
    • H04W52/18 - TPC being performed according to specific parameters
    • H04W52/24 - TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters
    • H04W52/242 - TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters taking into account path loss
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W88/00 - Devices specially adapted for wireless communication networks, e.g. terminals, base stations or access point devices
    • H04W88/02 - Terminal devices
    • H04W88/04 - Terminal devices adapted for relaying to or from another terminal or user
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Emergency Management (AREA)
  • Environmental & Geological Engineering (AREA)
  • Public Health (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention belongs to the technical field of network communication, and in particular relates to a joint relay selection and NOMA channel and power allocation algorithm in a single-UAV heterogeneous network, comprising the following steps. Step 1: establish a system model by constructing a heterogeneous network scenario assisted by a single unmanned aerial vehicle; after ground base station service is interrupted by a disaster or similar event, an unmanned aerial vehicle is deployed in the air as an aerial base station to relay emergency calls and messages of ground cellular users and monitoring data of Internet-of-things devices, serving ground users and supporting the rescue process of the ground network. Step 2: formulate the optimization problem, describing the QoS requirements of cellular users and IoT users as minimum transmission rates. Step 3: to restore communication for external users, run the energy-saving relay selection algorithm, which first associates external IoT users with relays in an energy-efficient manner. Step 4: perform power and subband selection based on deep reinforcement learning. The algorithm is reasonably designed, can effectively expand the service area of the unmanned aerial vehicle, and performs relay selection at minimum energy cost.

Description

Joint relay selection and NOMA channel and power allocation algorithm in single UAV heterogeneous network
Technical Field
The invention relates to the technical field of network communication, and in particular to a joint relay selection and NOMA channel and power allocation algorithm in a single-UAV heterogeneous network.
Background
The Internet of things and the traditional cellular network form a terrestrial heterogeneous network that can provide stable communication services for users. However, when a natural disaster occurs, the ground network breaks down because the communication infrastructure is destroyed, and all Internet-of-things users and cellular users in the area lose connectivity. To solve this problem, UAV communication can quickly establish an aerial base station and build a UAV-assisted air-ground heterogeneous network.
However, air-ground heterogeneous networks still present many challenges. On the one hand, although deploying a UAV can restore communication in the target area, the coverage of the UAV is limited and coverage holes remain. Terrestrial cellular users or IoT users outside the effective coverage area of the UAV cannot establish direct communication with the UAV over the air-to-ground channel because of poor channel conditions. In this case, D2D technology can be applied to establish multi-hop communication for users outside the coverage area. Specifically, a user outside the coverage area first establishes a D2D transmission link with a relay user capable of communicating directly with the UAV and transmits its data to the relay; the data is then forwarded from the relay to the UAV. To save energy, relay selection should be done in an energy-efficient way: taking its own transmission energy cost and rate requirement into account, a user outside the coverage area selects a relay that can provide the required rate while consuming as little energy as possible.
On the other hand, energy efficiency is also an important factor limiting heterogeneous network performance. Because both ground cellular devices and Internet-of-things devices are energy-limited users, power control is a critical issue that needs to be addressed. In addition, the real-time rate requirements of users should also be guaranteed. Thus, a trade-off must be achieved between reducing power consumption and improving user QoS performance in air-ground heterogeneous networks, which presents a significant challenge for conventional orthogonal resource allocation schemes. Accordingly, much research in recent years has focused on combining NOMA technology with UAVs to improve the spectral efficiency of UAV networks. The traditional NOMA resource allocation scheme pairs NOMA users based on channel gain or game-theoretic methods; this approach is simple and direct, but its performance is not optimal, leaving considerable room for optimization.
Based on the above, we propose a joint relay selection and NOMA channel and power allocation algorithm in a single-UAV heterogeneous network.
Disclosure of Invention
This section is intended to outline some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section, in the abstract, and in the title of the application to avoid obscuring their purpose; these simplifications and omissions should not be used to limit the scope of the invention.
The present invention has been made in view of the problems occurring in the prior art.
Therefore, the invention aims to provide a joint relay selection and NOMA channel and power allocation algorithm in a single-UAV heterogeneous network, which can effectively expand the service area of the unmanned aerial vehicle and perform relay selection at minimum energy cost.
In order to solve the technical problems, according to one aspect of the present invention, the following technical solutions are provided:
A joint relay selection and NOMA channel and power allocation algorithm in a single-UAV heterogeneous network comprises the following steps. Step 1: establish a system model by constructing a heterogeneous network scenario assisted by a single unmanned aerial vehicle; after ground base station service is interrupted by a disaster or similar event, an unmanned aerial vehicle is deployed in the air as an aerial base station to relay emergency calls and messages of ground cellular users and monitoring data of Internet-of-things devices, serving ground users and supporting the rescue process of the ground network;
step 2: formulate the optimization problem, describing the QoS requirements of cellular users and IoT users as minimum transmission rates;
step 3: to restore communication for external users, run the energy-saving relay selection algorithm, which first associates external IoT users with relays (i.e., internal IoT users) in an energy-efficient manner; a many-to-one matching game is adopted to design the energy-saving association scheme, whose essence is to determine the association according to the energy consumed, where each external user is an applicant and each relay has the right to decide whether to associate the corresponding applicant with itself;
step 4: perform power and subband selection based on deep reinforcement learning; after the relay selection algorithm of the previous step has been executed, the D2D links between external users and internal relays are already established, and since some internal users play the role of relays, the internal users have different QoS requirements in terms of transmission rate; in this case the resource allocation among internal users has a great influence on system performance, and deep reinforcement learning is adopted to allocate resources dynamically and adaptively, achieving the optimization goal of reducing energy consumption while guaranteeing user quality of service.
As a preferred scheme of the joint relay selection and NOMA channel and power allocation algorithm in the single UAV heterogeneous network of the present invention, the present invention comprises: in the step 1:
because the coverage of a single UAV is limited, only part of ground users are in the coverage of the UAV, other users are outside the coverage area and cannot directly communicate with the UAV, therefore, D2D-based multi-hop transmission is adopted to transmit signaling to external users, the hop count is equal to 2, namely, the users outside the coverage area of the UAV transmit data to the user relay in the coverage area, then the relay transmits the data to the UAV, the ground user set is denoted as U, and the UAV consists of cellular equipment and IoT equipment and is respectively denoted as U C And U I I.e. u=u C ∪U I And the Internet of things equipment set is divided into
Figure BDA0004095407450000031
And->
Figure BDA0004095407450000032
The two parts respectively correspond to the Internet of things equipment inside and outside the coverage area of the unmanned aerial vehicle, the coverage area of the unmanned aerial vehicle is described according to the path loss, when the path loss from a ground user to the unmanned aerial vehicle is smaller than a preset threshold value, the user is considered as an internal user, and the unmanned aerial vehicle canTo communicate directly with the drone, otherwise it is classified as an external user, which can communicate with the drone only by means of relay transmission, assuming that the external user includes only IoT users, which can extend to the scenario where the external user is a cellular user, the external IoT user can only select an internal IoT user as its relay, the cellular and IoT devices have different QoS requirements, denoted +.>
Figure BDA0004095407450000033
And->
Figure BDA0004095407450000034
The positions of the drone and the user k e U are denoted as [ x ], respectively u ,y u ,h]And [ x ] k ,y k ]Wherein [ x ] u ,y u ]Is the horizontal position of unmanned aerial vehicle, and h is unmanned aerial vehicle's fixed height.
As a preferred scheme of the joint relay selection and NOMA channel and power allocation algorithm in the single UAV heterogeneous network of the present invention, the present invention comprises: in the step 1:
the method comprises the steps that a NOMA scheme is adopted for transmitting data for internal users on the ground, the coverage area of the unmanned aerial vehicle is a circle formed by taking the horizontal coordinate of the unmanned aerial vehicle as an origin and the coverage radius of the unmanned aerial vehicle as a radius, clustering is conducted in the circle according to the distance from the user to the origin, for example, when 16 internal users exist in a network and are divided into 4 NOMA clusters, the 16 users are ordered in ascending order from the near to the far according to the distance, the users are distributed to different clusters at equal intervals, users 1, 5, 9 and 13 form a first cluster, users 2, 6, 10 and 14 form a second cluster, and the like, the 16 users are distributed to the 4 clusters, and therefore 4 users exist in each cluster.
As a preferred scheme of the joint relay selection and NOMA channel and power allocation algorithm in the single UAV heterogeneous network of the present invention, the present invention comprises: the objective of step 2 is to reduce energy consumption while guaranteeing user quality of service for
Figure BDA0004095407450000041
In (a) should optimize its transmission at the same timePower and resource allocation to mitigate NOMA intra-cluster interference, increase user rate, for
Figure BDA0004095407450000042
The relay selection should also be performed in an energy-saving manner, so that, in order to achieve the goal of reducing power consumption, the power consumption is characterized by the transmit power of the user, and the QoS compliance indication is to indicate whether the QoS requirement of the user is met by an indication function.
As a preferred scheme of the joint relay selection and NOMA channel and power allocation algorithm in the single UAV heterogeneous network of the present invention, the present invention comprises: in step 3, a utility function is first designed: for each external user, the utility of connecting to every relay is computed, and for each relay, the utility of accepting every applicant is computed; these utilities depend on the energy consumed by the external user. The preference lists of the relays and of the external applicants are obtained by sorting the utility values in descending order, so external users prefer relays that consume as little energy as possible. After an external user sends a connection request to its preferred relay, the relay decides whether to accept the application according to its pre-established preference list and the predefined maximum number of external users each relay can accept.
As a preferred scheme of the joint relay selection and NOMA channel and power allocation algorithm in the single UAV heterogeneous network of the present invention, the present invention comprises: step 4 comprises deep reinforcement learning fundamentals and a deep-Q-learning-based NOMA subband and power selection algorithm.
Compared with the prior art, the invention has the beneficial effects that:
1. Relay selection and resource allocation problems in a UAV-assisted heterogeneous network based on NOMA transmission are modeled. To balance guaranteeing user QoS against reducing energy consumption, an energy-saving relay selection algorithm and a NOMA-based power and resource allocation algorithm are provided.
2. Relay selection for external users is built through a matching game: a low-complexity many-to-one matching algorithm is provided to associate external users with internal users. To reduce energy consumption while guaranteeing user quality of service as far as possible, the objective function is designed with the user transmit power and the user QoS as variables. The external users and the internal relays each construct a preference relation according to the target utility values and select each other.
3. To meet the differentiated QoS requirements of the UAV's internal users and improve their QoS satisfaction rate, a power and subband allocation algorithm based on deep reinforcement learning is provided to dynamically adjust the power and subband allocated to each user. With the NOMA clustering method, the users in each cluster jointly act as one agent and, by interacting with environment states such as channel gain, interference, and QoS satisfaction status, gradually learn the optimal power and subband allocation strategy for the internal users.
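As an illustrative sketch only (not part of the claimed invention), the deep-Q-learning power and subband selection in point 3 can be outlined as follows; a tabular Q-dictionary stands in for the deep Q-network, and the subband count and discrete power levels are assumed example values.

```python
import random

# Illustrative sketch: each NOMA cluster acts as one agent whose action
# jointly picks a subband index and a discrete power level per user.
# A tabular Q-dictionary replaces the deep Q-network for clarity.

N_SUBBANDS = 4
POWER_LEVELS = [0.05, 0.10, 0.15, 0.20]  # assumed discrete Tx powers (W)

def action_space():
    """Enumerate joint (subband, power-level) actions for one user."""
    return [(n, p) for n in range(N_SUBBANDS) for p in POWER_LEVELS]

def epsilon_greedy(q_table, state, epsilon=0.1):
    """Explore with probability epsilon, otherwise act greedily on Q."""
    if random.random() < epsilon:
        return random.choice(action_space())
    return max(action_space(), key=lambda a: q_table.get((state, a), 0.0))

def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Bellman update; the DQN analogue is a gradient step
    on the same temporal-difference error."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in action_space())
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
```

In such a sketch, the reward would combine the negative transmit power with the QoS indicator, mirroring the trade-off described in the text.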
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the embodiments are described below with reference to the accompanying drawings. The drawings described below are merely some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without inventive effort. Wherein:
FIG. 1 is a single UAV assisted heterogeneous network model of the present invention;
FIG. 2 is a flow chart of a relay selection algorithm for energy saving according to the present invention;
FIG. 3 is a diagram of a deep reinforcement learning training process according to the present invention;
FIG. 4 is a flow chart of a power and subband selection algorithm based on deep Q learning in accordance with the present invention;
fig. 5 is a table of parameters used in the present invention.
Detailed Description
In order that the above objects, features and advantages of the invention will be readily understood, a more particular description of the invention will be rendered by reference to the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Next, the present invention will be described in detail with reference to the drawings. The drawings are simplified examples provided for convenience of description and should not limit the scope of the present invention.
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.
The invention provides the following technical scheme: a joint relay selection and NOMA channel and power allocation algorithm in a single-UAV heterogeneous network, which in use can effectively expand the service area of the unmanned aerial vehicle and perform relay selection at minimum energy cost;
example 1
System model
As shown in fig. 1, the present invention considers a single-UAV-assisted heterogeneous network scenario. When ground base station service is interrupted by a disaster or similar event, an unmanned aerial vehicle is deployed as an aerial base station to relay emergency calls and messages of ground cellular users and monitoring data of Internet-of-things devices, serving the ground users and supporting the rescue process of the ground network. Because of the limited coverage of a single UAV, it is assumed that only some ground users are within the UAV's coverage, while the others are outside the coverage area and cannot communicate directly with the UAV. Thus, D2D-based multi-hop transmission is employed to carry traffic for external users. For simplicity, the hop count is assumed equal to 2: a user outside the UAV coverage transmits its data to a user relay inside the coverage, and the relay then forwards the data to the UAV. The ground user set is denoted U; it consists of cellular devices and IoT devices, denoted U_C and U_I respectively, i.e. U = U_C ∪ U_I. The IoT device set is divided into two parts, U_I^in and U_I^out, corresponding to the IoT devices inside and outside the UAV coverage area. The coverage of the UAV is described in terms of path loss. Specifically, when the path loss from a ground user to the UAV is below a preset threshold, the user is considered an internal user and can communicate directly with the UAV; otherwise it is classified as an external user and can communicate with the UAV only via relay transmission. For simplicity, it is assumed that external users include only IoT users, but the model and algorithm proposed by the present invention are generic and can also be extended to scenarios where the external users are cellular users. An external IoT user can only select an internal IoT user as its relay. Cellular and IoT devices have different QoS requirements, denoted R_C^min and R_I^min respectively. The positions of the UAV and of user k ∈ U are denoted [x_u, y_u, h] and [x_k, y_k] respectively, where [x_u, y_u] is the horizontal position of the UAV and h is its fixed altitude. Meanwhile, the invention assumes that ideal channel state information between the UAV and the ground users, and among the ground users themselves, is available.
In the single-UAV-assisted communication network, a NOMA scheme is adopted to transmit data for the internal users on the ground. Specifically, the coverage area of the UAV is a circle centered at the UAV's horizontal coordinates with the UAV coverage radius, and clustering is performed within this circle according to each user's distance to the center. For example, when there are 16 internal users in the network divided into 4 NOMA clusters, the 16 users are sorted in ascending order of distance and assigned to clusters at equal intervals: users 1, 5, 9 and 13 form the first cluster, users 2, 6, 10 and 14 form the second cluster, and so on, so that each of the 4 clusters contains 4 users.
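The equal-interval clustering rule above can be sketched as follows (an illustrative sketch, not the claimed implementation):

```python
def noma_clusters(distances, n_clusters):
    """Sort users by distance to the coverage-circle centre (ascending) and
    deal them into clusters at equal intervals, matching the example above:
    with 16 users and 4 clusters, the users ranked 1st, 5th, 9th and 13th
    by distance form the first cluster."""
    order = sorted(range(len(distances)), key=lambda k: distances[k])
    return [order[i::n_clusters] for i in range(n_clusters)]
```

This interleaving keeps users with dissimilar distances (and hence dissimilar channel gains) in the same cluster, which favours SIC separation within each NOMA cluster.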
The distance between the UAV and user k is calculated as:
d_k = sqrt( (x_u - x_k)² + (y_u - y_k)² + h² )
The probability that the transmission link between the UAV and user k is a LoS link is calculated as:
P_LoS = 1 / ( 1 + a·exp( -b·(θ_k - a) ) )
where a and b are constants determined by the environment, and θ_k is the elevation angle between the UAV and user k, calculated as:
θ_k = (180/π)·arcsin( h / d_k )
The path loss between the UAV and user k is calculated as:
L_k = 20·log10( 4π·f_c·d_k / c ) + P_LoS·ξ_LoS + (1 - P_LoS)·ξ_NLoS
where f_c is the carrier frequency, c is the speed of light, and ξ_LoS and ξ_NLoS represent the attenuation losses of the LoS and NLoS links.
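The air-to-ground channel model above can be computed numerically as in the following sketch; the environment constants a and b, the excess losses, and the carrier frequency are illustrative assumptions, not values from the patent:

```python
import math

# Illustrative sketch of the probabilistic LoS air-to-ground channel model.
# a, b, xi_los, xi_nlos, fc below are assumed example values.

def distance_3d(uav_xy, user_xy, h):
    """Euclidean distance between the UAV at altitude h and a ground user."""
    return math.sqrt((uav_xy[0] - user_xy[0]) ** 2
                     + (uav_xy[1] - user_xy[1]) ** 2 + h ** 2)

def los_probability(h, d, a=9.61, b=0.16):
    """P(LoS) as a logistic function of the elevation angle in degrees."""
    theta = math.degrees(math.asin(h / d))
    return 1.0 / (1.0 + a * math.exp(-b * (theta - a)))

def path_loss_db(d, h, fc=2e9, xi_los=1.0, xi_nlos=20.0):
    """Free-space loss plus LoS/NLoS excess attenuation weighted by P(LoS)."""
    c = 3e8  # speed of light (m/s)
    p_los = los_probability(h, d)
    fspl = 20 * math.log10(4 * math.pi * fc * d / c)
    return fspl + p_los * xi_los + (1 - p_los) * xi_nlos
```

Comparing `path_loss_db` against the preset threshold is what classifies a ground user as internal or external in the model above.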
For air uplink transmission, NOMA is employed to improve spectral efficiency. In NOMA-based systems, users may be interfered with by users occupying the same spectrum resources. In the scenario considered by the present invention, it is assumed that the UAV is equipped with a SIC receiver, with which the target signals can be demodulated, and that the demodulation order runs from the user with the highest received power to the user with the lowest received power. For user k, the interference it receives comes from those users whose received signal power at the UAV is lower than that of user k. Therefore, the uplink SINR of user k on subband n is calculated as:
γ_{k,n} = p_k·g_{k,u,n} / ( Σ_{j ∈ U_{n,k}} p_j·g_{j,u,n} + σ² )
where g_{k,u,n} is the small-scale channel parameter between the UAV and user k on subband n, U_{n,k} is the set of users transmitting on subband n whose received power at the UAV is lower than that of user k, and σ² is the noise power.
The uplink transmission rate of internal user k (a cellular or IoT user) is calculated as:
R_k = Σ_n a_{n,k}·B_a·log₂( 1 + γ_{k,n} )
where a_{n,k} ∈ {0,1} is an indicator variable denoting whether user k transmits on subband n, and B_a is the bandwidth of a single subband in the air-to-ground channel.
For the relay link from a user outside the UAV coverage to an internal user, the transmission rate is calculated as:
R_k^out = B_r·log₂( 1 + p_k·h_{k,M_k} / σ² )
where M_k is the relay associated with external user k, and h_{k,M_k} is the channel gain from external user k to relay M_k, composed of a small-scale fading component g_{k,M_k} obeying an exponential distribution with unit mean and the distance d_{k,M_k} between user k and relay M_k. B_r is the bandwidth of one RB in the relay link.
It is assumed that different relay links transmit on orthogonal frequency bands, so there is no interference between relay links. Further, the relay links are assumed to reuse the same frequency band as the air-to-ground links; however, owing to the low transmit power and the long transmission distance, the interference from the relay links to the UAV is negligible.
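The uplink NOMA rate computation with SIC ordering described above can be sketched as follows; all powers, gains, bandwidth, and noise values are illustrative assumptions:

```python
import math

# Illustrative sketch: within one subband, SIC demodulates users from the
# strongest to the weakest received power, so each user is interfered with
# only by the weaker-received users decoded after it.

def uplink_rates(p, g, bandwidth=1.0, noise=1e-3):
    """p[k], g[k]: transmit power and channel gain of user k on one subband.
    Returns each user's rate under descending-received-power SIC ordering."""
    recv = [pk * gk for pk, gk in zip(p, g)]
    order = sorted(range(len(p)), key=lambda k: recv[k], reverse=True)
    rates = {}
    for i, k in enumerate(order):
        interference = sum(recv[j] for j in order[i + 1:])  # weaker users only
        rates[k] = bandwidth * math.log2(1 + recv[k] / (interference + noise))
    return rates
```

Note that the last-decoded (weakest-received) user sees no intra-cluster interference, which is why its SINR can exceed that of stronger-received users.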
Description of optimization problem
In the present invention, the QoS requirements of cellular users and IoT users are described as minimum transmission rates, denoted R_k^{C,min} for cellular user k and R_k^{I,min} for IoT user k. For each relay node, its total QoS requirement consists of its own transmission-rate requirement plus the transmission-rate requirements of all the external users associated with it, namely:
R_k^{total,min} = R_k^{I,min} + Σ_{i ∈ M_k} R_i^{I,min}
where M_k is the set of external users that select k as their relay node.
The aim of the invention is to reduce energy consumption while guaranteeing user quality of service. For the internal users, transmit power and resource allocation should be optimized jointly to reduce interference within the NOMA clusters and increase user rates; for the external users, relay selection should also be performed in an energy-saving manner. Thus, to capture the goal of reducing power consumption, power consumption is characterized by the users' transmit power, and QoS compliance is captured by an indicator function of whether each user's QoS requirement is met. The optimization problem of the invention is modeled as:
min_{p_k, a_{n,k}}  Σ_{k ∈ U} [ ω·η·p_k − (1 − ω)·I( R_k ≥ R_k^min ) ]
s.t.  C1: p_C^min ≤ p_k ≤ p_C^max, ∀k ∈ U_C
      C2: p_I^min ≤ p_k ≤ p_I^max, ∀k ∈ U_I
      C3: Σ_n a_{n,k} = 1, ∀k
where p_k is the transmit power of user k, and R_k^min is the QoS requirement of user k (R_k^{C,min} for a cellular user and R_k^{I,min} for an IoT user). I is the QoS indicator function, with I(true) = 1 and I(false) = 0. ω ∈ (0,1) is a weight factor that characterizes the relative importance of power consumption and user QoS satisfaction, and η is an adjustment coefficient that brings the transmit-power term and the QoS-indicator term to the same order of magnitude. C1 and C2 represent the power constraints of cellular users and Internet-of-things users respectively, where p_C^min and p_I^min are the minimum, and p_C^max and p_I^max the maximum, transmit powers of cellular and IoT users. C3 states that each internal user can occupy only one subband.
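Assuming a per-user cost that weighs the transmit power (scaled by the adjustment coefficient η) against the 0/1 QoS indicator with weight ω, as described above, the objective can be evaluated as in the following sketch; the exact functional form and all parameter values here are assumptions, not taken verbatim from the patent:

```python
# Illustrative sketch of one plausible per-user cost consistent with the
# description: omega trades the power term against QoS satisfaction.

def user_objective(p_k, rate_k, rate_min, omega=0.5, eta=10.0):
    """Smaller is better: power is penalised, meeting QoS is rewarded."""
    qos_met = 1 if rate_k >= rate_min else 0
    return omega * eta * p_k - (1 - omega) * qos_met

def system_objective(powers, rates, rate_mins, omega=0.5, eta=10.0):
    """Sum of per-user costs over all users."""
    return sum(user_objective(p, r, m, omega, eta)
               for p, r, m in zip(powers, rates, rate_mins))
```

Such a quantity can serve directly as the (negated) reward signal for the reinforcement-learning allocation described in the next section.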
Relay selection and resource allocation algorithm in single UAV assisted heterogeneous networks
It can be seen that directly solving problem (2-8) is very difficult owing to the large number of power and subband combinations. A feasible approach is to decompose the objective problem and optimize the objective function for the internal and external users separately. In this section, an energy-saving relay selection scheme is first designed for the external users, accounting for the energy consumption and QoS requirements in the relay selection process. After the relay of each external user is determined, a joint power and subband selection algorithm is executed for the internal users, with deep reinforcement learning dynamically adjusting the power and subband selection under different environments to balance reducing user energy consumption against guaranteeing user QoS.
Energy-saving relay selection algorithm
To restore communication for the external users, the external IoT users and the relays (i.e., the internal IoT users) are first associated in a power-efficient manner. The energy-saving association scheme is designed as a many-to-one matching game; the essence of the energy saving is that the association is determined according to the energy consumed. Each external user is an applicant, and each relay has the right to decide whether to associate a given applicant with itself. First, a utility function is designed: for each external user, the utility values of connecting to each candidate relay are calculated, and for each relay, the utility values of the applicants it could accept are calculated; these utility values relate to the energy consumed by the external user. The preference lists of the relays and the external applicants are obtained by sorting the utility values in descending order, so an external user tends to select the relay that consumes as little energy as possible. After an external user sends a connection request to its preferred relay, the relay decides whether to accept the application according to its pre-established preference list and the predefined maximum number of external users acceptable to each relay.
When user i ∈ U_I^out connects to candidate relay j ∈ U_I^in, the utility value of relay j accepting applicant user i is calculated by equation (2-9) (rendered as an image in the original), where p_cir is the static circuit power consumption at the external IoT user, and p_{i,j}^min is the minimum transmit power required for user i to meet its QoS requirement when connected to relay j, calculated by equation (2-10) (also rendered as an image in the original).
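Equations (2-9) and (2-10) are images in the original, so the following is an illustrative reconstruction under stated assumptions: the minimum transmit power is obtained by inverting an interference-free Shannon rate R = B·log2(1 + p·g/σ²), and the utility is taken as the negative of total consumed power (circuit plus minimum transmit), so lower energy means higher utility. Both choices are assumptions consistent with the text, not the patent's exact formulas:

```python
def min_transmit_power(rate_min, bandwidth, channel_gain, noise_power):
    """Minimum power for user i to reach its minimum rate over the link
    to relay j, assuming R = B*log2(1 + p*g/sigma^2) (assumed model)."""
    return (2 ** (rate_min / bandwidth) - 1) * noise_power / channel_gain

def utility(p_cir, p_min):
    """Link utility: negative of total consumed power, so that sorting
    utilities in descending order prefers the least-energy relay."""
    return -(p_cir + p_min)
```

A relay with a better channel yields a smaller p_min and hence a higher utility, matching the stated preference for energy-saving relays.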
In the present invention, the association between the external users and the internal users is expressed as a many-to-one matching game. The decision makers are the external users and the internal users that can act as relays. Each decision maker has a preference list, and the final association result is obtained according to the preference order.
For each external user i ∈ U_I^out, its preference relation ≻_i over all potential relays with which a connection may be established is defined by equation (2-11): j ≻_i j' if and only if U_{i,j} > U_{i,j'}, i.e., user i prefers to select j as its relay rather than j'.

For each relay j ∈ U_I^in, its preference relation ≻_j over all external users is defined by equation (2-12): i ≻_j i' if and only if U_{j,i} > U_{j,i'}, i.e., relay j prefers to serve as a relay for user i rather than i'.
The relay selection algorithm comprehensively considers indexes such as the user's channel quality and QoS, so the proposed algorithm is also suitable for dynamic network scenarios. The energy-saving relay selection algorithm box summarizes the detailed procedure proposed by the invention. Initially, the external users and relays exchange channel state information and other related information. Each external user and relay builds a preference list from the utility values calculated by equation (2-9). Each external user then issues a connection request to the preferred relay in its list. After a relay receives requests, it ranks the applicants according to its preference list. The maximum number of external users acceptable to a relay is denoted N_max; in the present invention, N_max is set to 4. Each relay accepts the highest-ranked applicants in its own utility list, and the remaining applicants are rejected. Each rejected applicant then updates its list and makes a connection request to its next preferred relay. The association process ends after all external users are associated with relays.
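The propose-and-reject procedure above can be sketched as a deferred-acceptance matching. As a simplification, one utility matrix serves both sides' preferences here (the patent defines two utility values, one per side); names and the return shape are illustrative:

```python
def match_relays(utilities, n_max=4):
    """Many-to-one matching between external users and relays.
    utilities[i][j]: utility of pairing external user i with relay j.
    Returns a dict mapping each relay to its accepted external users."""
    n_users = len(utilities)
    n_relays = len(utilities[0])
    # Each external user ranks relays by descending utility (preference list).
    prefs = [sorted(range(n_relays), key=lambda j: -utilities[i][j])
             for i in range(n_users)]
    next_choice = [0] * n_users          # next relay index user i applies to
    accepted = {j: [] for j in range(n_relays)}
    unmatched = list(range(n_users))
    while unmatched:
        i = unmatched.pop(0)
        if next_choice[i] >= n_relays:
            continue                     # user i has exhausted all relays
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        accepted[j].append(i)
        if len(accepted[j]) > n_max:
            # Relay keeps its n_max best applicants and rejects the worst.
            accepted[j].sort(key=lambda u: -utilities[u][j])
            unmatched.append(accepted[j].pop())
    return accepted
```

Rejected applicants re-enter the queue and apply to their next preferred relay, matching the iterative procedure described in the text.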
Power and subband selection based on deep reinforcement learning
This section performs power and subband selection for the internal users in the UAV network. After the relay selection algorithm of the previous section is performed, the D2D links between external users and internal relays have been constructed. Since some of the internal users play the role of relays, the internal users have different QoS requirements in terms of transmission rate. In this case, the allocation of resources among internal users has a significant impact on system performance. Deep reinforcement learning is adopted for dynamic and adaptive resource allocation, achieving the optimization target of reducing energy consumption while guaranteeing user quality of service. To facilitate understanding of the proposed algorithm, the invention first briefly introduces background knowledge of deep reinforcement learning, and then sets forth the proposed framework and implementation of the power and subband selection algorithm based on deep reinforcement learning.
Deep reinforcement learning foundation
The objective problem of the invention involves competition between users for spectrum resources and a trade-off between user QoS satisfaction and power consumption; it is non-convex and cannot be solved by conventional optimization techniques. In this case, reinforcement learning (RL) can derive a strategy through interaction with the network environment and continually learn and refine decisions to increase the objective function value. Reinforcement learning is a method or framework for learning, prediction, and decision-making in which an agent automatically obtains an optimal policy through interaction with the environment. The basic elements of reinforcement learning can be represented by a five-tuple (S, A, π, p, R). The agent learns and makes decisions by perceiving the environmental state S. At training time t, the environmental state observed by the agent is s_t, and the agent selects and executes action a_t ∈ A according to the policy π. After the agent performs action a_t, the network state transitions from s_t to s_{t+1}, and the agent obtains a real-time reward r_t ∈ R. The behavior of executing action a_t in state s_t can thus be characterized by the conditional transition probability p(s_{t+1}, r_t | s_t, a_t).
The main goal of reinforcement learning is to find a strategy that maximizes the cumulative discounted reward, which considers not only the instant reward but also future rewards:

G_t = Σ_{k=0}^{∞} γ^k R_{t+k}  (2-13)

where R_t is the real-time reward received by the agent in time slot t. γ is a discount factor: as γ approaches 0, the agent is more concerned about short-term rewards; as γ approaches 1, long-term rewards are considered more important.
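The discounted return above can be computed recursively from the end of a reward sequence; this is a generic illustration, not code from the patent:

```python
def discounted_return(rewards, gamma):
    """Cumulative discounted reward G_t = sum_k gamma^k * R_{t+k}
    for a finite reward sequence, evaluated back-to-front."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # G_t = R_t + gamma * G_{t+1}
    return g
```

With γ near 0 only the first reward matters; with γ near 1 all rewards contribute almost equally, matching the discussion above.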
The invention adopts the most widely applied reinforcement learning method, Q-learning, to solve the target problem. In Q-learning, the policy of the current agent is reflected by building a Q-table in which the Q-values of the different state-action pairs are stored. According to the Bellman equation and temporal-difference learning, after the agent observes state s_t at time t and performs action a_t, the corresponding Q-value is updated as follows:

Q(s_t, a_t) = Q(s_t, a_t) + η(r_{t+1} + γ max_{a∈A} Q(s_{t+1}, a) − Q(s_t, a_t))  (2-14)

where η is the learning rate. As states transition during learning, the Q-table gradually stabilizes, and the optimal policy function of each agent can be obtained.
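Update (2-14) can be sketched as a minimal tabular routine (the dictionary-as-Q-table representation and the names are illustrative):

```python
def q_update(q, s, a, r, s_next, actions, eta, gamma):
    """One tabular Q-learning step, eq. (2-14):
    Q(s,a) += eta * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    q is a dict keyed by (state, action); missing entries default to 0."""
    best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + eta * (r + gamma * best_next - old)
    return q[(s, a)]
```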
However, Q-learning can only effectively solve problems in which both the state and action spaces are discrete and small, since the Q-table can record the Q-values of only a limited number of state-action pairs. In many practical problems or tasks, the number of states traversed and of optional actions is large, which makes Q-learning inefficient and unsuitable for such complex problems. In this case, deep Q-learning (DQL) can learn strategies over high-dimensional and continuous state spaces with the strong learning capability of deep learning (DL), and it has been widely studied and applied in unmanned aerial vehicle networks.

The core idea of DQL is to estimate and approximate the complex nonlinear Q-function Q(s, a) using a deep Q-network (DQN):

Q(s_t, a_t; θ) ≈ Q*(s_t, a_t)  (2-15)

where Q(s_t, a_t; θ) is the Q-value function estimated by the DQN with parameter θ, and Q*(s_t, a_t) is the actual Q-value corresponding to (s_t, a_t).
In the DQN, the input is a state vector and the output is a value-function vector containing the Q-value of each action in that state. At time t, the agent observes state s_t by interacting with the surrounding environment, selects action a_t according to the policy function, and obtains reward r_t while the network environment transitions to state s_{t+1}. The data sample {s_t, a_t, r_t, s_{t+1}} is then stored in a memory pool for training. The optimal DQN parameter is obtained by minimizing the following loss function:

L(θ) = E[(r_t + γ max_{a} Q(s_{t+1}, a; θ) − Q(s_t, a_t; θ))²]  (2-16)
in addition, to overcome the problem of instability in the DQN training process, experience playback and target networks are introduced into the DQN to stabilize the learning process. Experience playback refers to randomly selecting a training sample set from a memory pool for training each time, so as to avoid correlation among samples. Meanwhile, the main network is used for selecting actions, and constructing another target network which is completely consistent with the current main network structure and is used for calculating a target Q value, so that the correlation between the target Q value and the main network Q value is reduced. At this time, the parameter θ of the current network is updated by back propagation using the gradient descent method, and the loss function of the target minimization is defined as:
Figure BDA0004095407450000145
where B is a set of samples randomly selected from the memory pool, and θ' are parameters of the primary and target networks, respectively. The detailed process of the agent interacting with the environment and learning is shown in fig. 3:
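The two stabilizing mechanisms above can be sketched in isolation: uniform minibatch sampling for experience replay, and TD-target computation against a separate target network. Here `q_target` stands in for the target DQN as a callable returning per-action Q-values; this is an assumption of the sketch, not the patent's implementation:

```python
import random

def sample_batch(memory, batch_size):
    """Experience replay: a uniform random minibatch breaks the
    temporal correlation between consecutive samples."""
    return random.sample(memory, min(batch_size, len(memory)))

def dqn_targets(batch, q_target, gamma):
    """TD targets y = r + gamma * max_a' Q_target(s', a') for each
    (s, a, r, s') transition; compared against the main network's
    Q(s, a) in loss (2-17)."""
    return [r + gamma * max(q_target(s_next)) for (_s, _a, r, s_next) in batch]
```

Periodically copying θ into θ' (not shown) keeps the targets slowly moving, which is what reduces the correlation described above.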
NOMA sub-band and power selection algorithm based on deep Q learning
As described above, a NOMA clustering method is adopted to reduce SIC complexity. Thus, in the present invention, the internal users are divided into different clusters according to their distance from the UAV coverage center. For convenience, the number of users in each cluster is set to be the same. The proposed joint power and subband selection based on deep Q-learning is performed separately in each cluster. In NOMA clustering, each cluster occupies one orthogonal frequency band (i.e., subband), and there is no interference between users of different clusters. The DQL model and algorithm implementation of the invention are described in detail below.
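The distance-based clustering above (sort users by distance from the coverage center, then deal them out to clusters at equal intervals, as in the 16-user/4-cluster example of the claims) can be sketched as follows; the function name and 0-based user indices are illustrative:

```python
def noma_clusters(distances, n_clusters):
    """Assign internal users to NOMA clusters: sort by ascending
    distance from the UAV coverage center, then distribute at equal
    intervals so each cluster spans near and far users."""
    order = sorted(range(len(distances)), key=lambda k: distances[k])
    clusters = [[] for _ in range(n_clusters)]
    for rank, user in enumerate(order):
        clusters[rank % n_clusters].append(user)
    return clusters
```

With 16 users and 4 clusters, the users ranked 1st, 5th, 9th, and 13th by distance land in the first cluster, as in the example.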
1) State, action, and rewards definitions in the DQL model:
The environmental state that each agent k can observe at time slot t consists of several parts:

Channel gain information to the drone for all users of the cluster in which user k is located in slot t−1: H_{t−1} = {h_{1,t−1}, …, h_{K_c,t−1}}, where h_{k,t−1} is the channel gain between the UAV and user k in time slot t−1, K_c is the total number of users in the same cluster as user k, and N_s is the number of subbands per cluster.

Interference power level received by user k on each subband during slot t−1: I_{t−1} = {I_{1,t−1}, …, I_{N_s,t−1}}, where I_{n,t−1} represents the interference that agent k receives on subband n during time slot t−1.

QoS compliance indication vector for the users in the same NOMA cluster as agent k within slot t−1: ACK_{t−1} = {ACK_{1,t−1}, …, ACK_{K_c,t−1}}, where ACK_{k,t−1} = 1 indicates that the rate of user k in time slot t−1 meets its QoS requirement, and ACK_{k,t−1} = 0 indicates that it does not.

In summary, the observed state of each agent at time t can be expressed as s_t = {H_{t−1}, I_{t−1}, ACK_{t−1}}.
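Concatenating the three parts into a single DQN input vector can be sketched as follows (the flat-list representation is an assumption; any fixed ordering would do):

```python
def build_state(h_prev, i_prev, ack_prev):
    """Observed state s_t = {H_{t-1}, I_{t-1}, ACK_{t-1}}: last-slot
    channel gains of the cluster's users, per-subband interference
    levels, and per-user QoS indicators, flattened into one vector."""
    return list(h_prev) + list(i_prev) + list(ack_prev)
```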
Optional action set of each agent k at time slot t: in the reinforcement learning framework proposed by the invention, the agents are the internal users, and the actions are the combinations of power level and subband that each agent can select. The uplink transmit power range of an internal user is uniformly discretized into N_p levels, and each cluster is assigned N_s subbands; thus each agent has an action space of size N_p × N_s.
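The flat action space of size N_p × N_s maps to (power level, subband) pairs; a row-major encoding is one natural choice (an assumption, since the patent does not fix the ordering):

```python
def decode_action(index, n_s):
    """Map a flat action index in [0, N_p * N_s) to a
    (power_level, subband) pair, row-major over power levels."""
    return divmod(index, n_s)

def encode_action(power_level, subband, n_s):
    """Inverse mapping back to the flat index."""
    return power_level * n_s + subband
```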
Reward received by each agent k in time slot t: based on the state s_t observed by the agent in time slot t and the selected action a_t, the environment transitions to a new state s_{t+1}, and the agent immediately obtains reward R_t. Since the objective of the invention is to achieve a trade-off between energy consumption and QoS for the internal users, in the proposed reinforcement learning framework the reward function consists of two parts: the power consumption and the sum of the QoS indications of all users in the same NOMA cluster. Specifically, the instantaneous reward function of agent k (a cellular or IoT user) at time slot t is designed as equation (2-18) (rendered as an image in the original), where U_k^C and U_k^I denote the sets of cellular users and IoT users in the same cluster as user k.
2) The execution process of the deep Q learning algorithm comprises the following steps:
there are two phases in deep Q learning: an offline training phase and a testing phase. Unlike deep learning, there is no concept of training data set or test data set in deep Q learning. The intelligent agent learns the optimal strategy by executing state transition after different actions, and tests the performance of the learned strategy in a real environment. In the off-line training stage, the intelligent agent traverses states as much as possible through interaction with the environment, continuously learns and improves the action selection strategy of the intelligent agent, and finally obtains a stable Q value approximate function. When the offline training phase is completed, the agent has obtained an action selection strategy that achieves the best cumulative discount rewards in different environmental conditions. In the test phase, the agent uses the learned strategy to guide the action selection under different environmental states.
Since each user in the network acts as an agent and different agents execute actions independently, when actions are executed synchronously each agent is unaware of the actions selected by the other agents. In this case, I_{t−1} in the input state vector is not the latest value, and the observed state of each agent cannot accurately represent the real environment in real time. For this reason, in the deep Q-learning algorithm designed in the invention, the agents update their action strategies asynchronously. That is, only one agent performs action selection in each time slot, while the other agents do not act. Using the asynchronous strategy, each agent can use the I_{t−1} and ACK_{t−1} components of the input state to observe the environmental changes caused by the behavior of other agents, which reduces erroneous action selections caused by inaccurate observations of the environmental state.
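One way to realize "only one agent acts per slot" is a round-robin schedule over the agents; the patent only states that a single agent acts per slot, so the round-robin ordering here is an assumption for illustration:

```python
def acting_agent(slot, n_agents):
    """Asynchronous updating: exactly one agent selects an action in
    each time slot, cycling through the agents in fixed order, so every
    other agent's next observation reflects a single recent change."""
    return slot % n_agents
```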
The detailed process of implementing dynamic power and subband selection based on deep Q-learning is presented in the corresponding algorithm box. At the beginning of training, the structures and other parameters of the two deep neural networks are initialized. Each agent then interacts with the environment in turn and performs action selection according to an ε-greedy policy, a random policy commonly used in deep reinforcement learning that balances exploration and exploitation. The agent randomly selects an action with probability ε, and executes the maximum-Q-value action selection strategy with probability 1 − ε. Under the maximum-Q-value strategy, the agent selects, in state s_t, the action a_t with the maximum Q-value:

a_t = argmax_{a∈A} Q(s_t, a | θ)  (2-19)

where Q(s_t, a | θ) is the Q-value corresponding to state s_t and the selected action a.
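The ε-greedy selection described above can be sketched directly (a generic illustration; `q_values` is the DQN's output vector for the current state):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon (uniform random action);
    otherwise exploit via a_t = argmax_a Q(s_t, a | theta), eq. (2-19)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Annealing ε from a high value toward a small one over training shifts the agent from exploration to exploitation.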
Although the invention has been described hereinabove with reference to embodiments, various modifications may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In particular, the features of the disclosed embodiments may be combined with each other in any manner as long as there is no structural conflict; an exhaustive description of these combinations is omitted from this specification merely for brevity and to save resources. Therefore, it is intended that the invention not be limited to the particular embodiments disclosed, but that it include all embodiments falling within the scope of the appended claims.

Claims (6)

1. The joint relay selection and NOMA channel and power distribution algorithm in the single UAV heterogeneous network is characterized in that: the method comprises the following steps:
step 1: establishing a system model by constructing a heterogeneous network scenario assisted by a single unmanned aerial vehicle: after ground base station service is interrupted by a disaster or similar event, the unmanned aerial vehicle is deployed in the air as an aerial base station to serve ground users, forwarding the emergency calls and information of ground cellular users and the monitoring data of Internet of things devices, thereby promoting the rescue process of the ground network;
step 2: an optimization problem description describing QoS requirements of cellular users and IoT users as minimum transmission rates;
step 3: in order to restore the communication of external users, the energy-saving relay selection algorithm firstly correlates the external IoT users with relays (namely the internal IoT users) in an energy-saving mode, adopts a many-to-one matching game to design an energy-saving correlation scheme, and the energy-saving essence is to determine the correlation scheme according to the consumed energy, wherein each external user is an applicant, and the relays have the right to decide whether to correlate the corresponding applicant with the relay;
step 4: based on the power and subband selection of the deep reinforcement learning, after the relay selection algorithm of the previous step is executed, the D2D link between the external user and the internal relay is already constructed, and as part of the internal users play the role of the relay, the internal users have different QoS requirements in terms of transmission rate, in this case, the resource allocation among the internal users has great influence on the system performance, and the optimization target of reducing the energy consumption while guaranteeing the service quality of the users is realized by adopting the deep reinforcement learning to dynamically and adaptively allocate the resources.
2. The joint relay selection and NOMA channel, power allocation algorithm in a single UAV heterogeneous network of claim 1, wherein: in the step 1:
because the coverage of a single UAV is limited, only part of the ground users are within UAV coverage; other users are outside the coverage area and cannot communicate directly with the UAV; therefore, D2D-based multi-hop transmission is adopted to serve the external users, with hop count equal to 2, i.e., a user outside UAV coverage transmits data to a user relay within the coverage area, and the relay then forwards the data to the UAV; the set of ground users is denoted U and consists of cellular devices and IoT devices, denoted U_C and U_I respectively, i.e., U = U_C ∪ U_I; the Internet of things device set is divided into two parts, U_I^in and U_I^out, corresponding to the IoT devices inside and outside UAV coverage, respectively; the coverage area of the unmanned aerial vehicle is described by the path loss: when the path loss from a ground user to the unmanned aerial vehicle is smaller than a preset threshold, the user is considered an internal user and can communicate with the unmanned aerial vehicle directly; otherwise, the user is classified as an external user and can only communicate with the unmanned aerial vehicle via relay transmission; the external users include only IoT users, although the scheme can extend to scenarios where an external user is a cellular user, and an external IoT user can only select an internal IoT user as its relay; the cellular and IoT devices have different QoS requirements, expressed as R_k^{C,min} and R_k^{I,min}, respectively; the positions of the drone and user k ∈ U are denoted [x_u, y_u, h] and [x_k, y_k], respectively, where [x_u, y_u] is the horizontal position of the unmanned aerial vehicle and h is its fixed height.
3. The joint relay selection and NOMA channel, power allocation algorithm in a single UAV heterogeneous network of claim 1, wherein: in the step 1:
the method comprises the steps that a NOMA scheme is adopted for transmitting data for the internal users on the ground; the coverage area of the unmanned aerial vehicle is a circle centered at the horizontal coordinate of the unmanned aerial vehicle with the coverage radius as its radius, and clustering is conducted within the circle according to the distance from each user to the center; for example, when 16 internal users exist in the network and are divided into 4 NOMA clusters, the 16 users are ordered in ascending order of distance from near to far and are distributed to the different clusters at equal intervals: users 1, 5, 9 and 13 form the first cluster, users 2, 6, 10 and 14 form the second cluster, and so on, so that the 16 users are distributed over the 4 clusters with 4 users in each cluster.
4. The joint relay selection and NOMA channel, power allocation algorithm in a single UAV heterogeneous network of claim 1, wherein: the objective of step 2 is to reduce energy consumption while guaranteeing user quality of service for
Figure FDA0004095407440000025
The users in (1) should optimize their transmit power and resource allocation at the same time to mitigate the interference in NOMA cluster, increase the user rate, for +.>
Figure FDA0004095407440000026
The relay selection should also be performed in an energy-saving manner, so that, in order to achieve the goal of reducing power consumption, the power consumption is characterized by the transmit power of the user, and the QoS compliance indication is to indicate whether the QoS requirement of the user is met by an indication function.
5. The joint relay selection and NOMA channel, power allocation algorithm in a single UAV heterogeneous network of claim 1, wherein: in step 3, a utility function is designed first, utility values of each external user connected to all relays are calculated, utility values of each applicant are calculated for the relays, the utility values are related to energy consumed by the external users, preference lists of the relays and the external applicant are obtained respectively in a manner of descending order of the utility values, the external users are more prone to selecting relays consuming as little energy as possible, and after the external users put forward connection requests to the own preference relay, the relays determine whether to accept the application according to the pre-established preference list and the pre-defined maximum external user number acceptable for each relay.
6. The joint relay selection and NOMA channel, power allocation algorithm in a single UAV heterogeneous network of claim 1, wherein: the step 4 comprises a deep reinforcement learning base, a NOMA sub-band based on deep Q learning and a power selection algorithm.
CN202310164386.9A 2023-02-25 2023-02-25 Joint relay selection and NOMA channel and power allocation algorithm in single UAV heterogeneous network Pending CN116133081A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310164386.9A CN116133081A (en) 2023-02-25 2023-02-25 Joint relay selection and NOMA channel and power allocation algorithm in single UAV heterogeneous network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310164386.9A CN116133081A (en) 2023-02-25 2023-02-25 Joint relay selection and NOMA channel and power allocation algorithm in single UAV heterogeneous network

Publications (1)

Publication Number Publication Date
CN116133081A true CN116133081A (en) 2023-05-16

Family

ID=86301027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310164386.9A Pending CN116133081A (en) 2023-02-25 2023-02-25 Joint relay selection and NOMA channel and power allocation algorithm in single UAV heterogeneous network

Country Status (1)

Country Link
CN (1) CN116133081A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116709255A (en) * 2023-08-04 2023-09-05 中国人民解放军军事科学院系统工程研究院 Distributed selection method for relay unmanned aerial vehicle under incomplete information condition
CN116709255B (en) * 2023-08-04 2023-10-31 中国人民解放军军事科学院系统工程研究院 Distributed selection method for relay unmanned aerial vehicle under incomplete information condition


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination