CN115567093A - Air network resource allocation method, device, electronic equipment and storage medium

Info

Publication number
CN115567093A
Authority
CN
China
Prior art keywords
network
air network
low-altitude platform
air
Prior art date
Legal status
Pending
Application number
CN202211048190.5A
Other languages
Chinese (zh)
Inventor
尹梦君
林巍
王超
李强
Current Assignee
Inspur Communication Technology Co Ltd
Original Assignee
Inspur Communication Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Communication Technology Co Ltd
Priority to CN202211048190.5A
Publication of CN115567093A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B 7/00 Radio transmission systems, i.e. using radiation field
    • H04B 7/14 Relay systems
    • H04B 7/15 Active relay systems
    • H04B 7/185 Space-based or airborne stations; Stations for satellite systems
    • H04B 7/18502 Airborne stations
    • H04B 7/18504 Aircraft used as relay or high altitude atmospheric platform
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W 16/18 Network planning tools
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/90 Services for handling of emergency or hazardous situations, e.g. earthquake and tsunami warning systems [ETWS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 64/00 Locating users or terminals or network equipment for network management purposes, e.g. mobility management
    • H04W 64/003 Locating users or terminals or network equipment for network management purposes, e.g. mobility management locating network equipment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 84/00 Network topologies
    • H04W 84/02 Hierarchically pre-organised networks, e.g. paging networks, cellular networks, WLAN [Wireless Local Area Network] or WLL [Wireless Local Loop]
    • H04W 84/04 Large scale networks; Deep hierarchical networks
    • H04W 84/06 Airborne or Satellite Networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Environmental & Geological Engineering (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Emergency Management (AREA)
  • Medical Informatics (AREA)
  • Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides an air network resource allocation method and apparatus, an electronic device and a storage medium, which belong to the technical field of communications and comprise the following steps: establishing a resource optimization problem model of the air network with the goal of maximizing air network energy efficiency and in combination with target constraint conditions, the air network being constructed based on multiple types of low-altitude platform devices; and solving the resource optimization problem model with a deep reinforcement learning algorithm to determine the target deployment position and target transmit power of each low-altitude platform device in the air network and the channel allocation information of each terminal. The multiple types of low-altitude platform devices comprise an aerial base station, an aerial radio frequency unit and a millimeter wave enhanced unmanned aerial vehicle; the channel allocation information is the channel information between a terminal and each low-altitude platform device. The invention can effectively optimize and allocate the resources of the air network while improving the communication coverage restored by the air network, realizes energy efficiency optimization of the emergency air network, guarantees air network performance and saves network deployment cost.

Description

Air network resource allocation method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to an air network resource allocation method and apparatus, an electronic device and a storage medium.
Background
With the increasing maturity of fifth-generation communication technology, it has been popularized and applied in more and more fields.
At present, in a post-disaster emergency scenario, unmanned aerial vehicle base stations can serve as aerial base stations to establish an air network and thereby form an aerial communication network, but the communication coverage that the existing air network can restore is limited. Meanwhile, because resources are scarce in an emergency scenario, energy efficiency, resource utilization and similar issues need to be considered once the communication devices of the air network are deployed for coverage restoration, and the network energy efficiency of the existing air network needs further optimization and upgrading.
Therefore, how to allocate resources to the aerial communication network more reasonably and maximize energy efficiency while improving the communication coverage restored by the air network has become an urgent technical problem to be solved in the industry.
Disclosure of Invention
The invention provides an air network resource allocation method and apparatus, an electronic device and a storage medium, which are used to allocate resources to the aerial communication network more reasonably while improving the communication coverage restored by the air network.
The invention provides an air network resource allocation method, which comprises the following steps:
establishing a resource optimization problem model of the air network by taking the maximum air network energy efficiency as a target and combining target constraint conditions; the air network is constructed based on multiple types of low-altitude platform equipment;
solving the resource optimization problem model by adopting a deep reinforcement learning algorithm, and determining the target deployment position and the target transmitting power of each low-altitude platform device in the air network and the channel allocation information of each terminal; the multi-type low-altitude platform equipment comprises an aerial base station, an aerial radio frequency unit and a millimeter wave enhanced unmanned aerial vehicle; the channel allocation information is the channel information between the terminal and each low-altitude platform device.
According to the air network resource allocation method provided by the invention, establishing the resource optimization problem model of the air network with the goal of maximizing the energy efficiency of the air network and in combination with the target constraint conditions comprises the following steps:
determining the network capacity of the air network based on the transmission rate of each terminal and the channel allocation information of each terminal, and determining the network capacity under unit cost based on the network capacity of the air network and the total deployment cost of the air network; a total deployment cost of the over-the-air network is determined based on hardware costs and energy consumption costs of each of the low altitude platform devices;
establishing an objective function according to the maximum network capacity under the unit cost as an optimization objective; the air network energy efficiency comprises the network capacity at the unit cost;
determining the target constraint condition based on the transmitting power of each low-altitude platform device in the air network and the channel allocation information of each terminal;
and establishing a resource optimization problem model of the air network based on the objective function and the objective constraint condition.
According to the method for allocating the air network resources provided by the invention, the resource optimization problem model is solved by adopting a deep reinforcement learning algorithm, and the target deployment position, the target transmitting power and the channel allocation information of each terminal of each low-altitude platform device in the air network are determined, wherein the method comprises the following steps:
performing iterative training on a deep Q network model in the deep reinforcement learning algorithm based on the state information and the reward function of each low-altitude platform device in the air network to obtain an optimal action value profit value of the air network energy efficiency; the reward function is determined based on a network capacity of the over-the-air network and a total deployment cost of the over-the-air network;
and determining the target deployment position, the target transmitting power and the channel allocation information of each terminal of each low-altitude platform device in the air network based on the optimal action value revenue value.
According to the air network resource allocation method provided by the invention, iterative training is carried out on a deep Q network model in the deep reinforcement learning algorithm based on the state information of each low-altitude platform device in the air network and the reward function, so as to obtain the optimal action value profit value of the air network energy efficiency, and the method comprises the following steps:
step 1, establishing a state space and an action space according to state information of each low-altitude platform device in the air network;
step 2, determining the initial state of the air network based on the state space, and dividing the time of each training into a plurality of time intervals; the initial state is the state of the over-the-air network in a first time interval;
step 3, determining a first reward value after the air network executes a first action in a first state in a current time interval and a second state in a next time interval; the first action is determined based on the action space;
step 4, storing the first state, the first action, the first reward value and the second state as a data sample to a memory unit, randomly extracting a data sample from the memory unit, and updating the network parameters of the deep Q network model; the first reward value is determined based on the reward function;
step 5, traversing the plurality of time intervals, executing the step 3 to the step 4, completing one iterative training of the deep Q network model, and obtaining a maximum action value profit value of the trained air network energy efficiency;
step 6, traversing the preset number of iterations, executing steps 3 to 5, and training the deep Q network model for the preset number of iterations to obtain the trained optimal action value income value of the air network energy efficiency; the optimal action value income value is the maximum action value income value of the air network energy efficiency obtained through the last iterative training.
According to the air network resource allocation method provided by the invention, before the iterative training of the deep Q network model in the deep reinforcement learning algorithm is carried out based on the state information and the reward function of each low-altitude platform device in the air network to obtain the optimal action value profit value of the air network energy efficiency, the method further comprises the following steps:
dividing the time of each training into a plurality of time intervals;
determining, based on a network capacity of the over-the-air network and a total deployment cost of the over-the-air network, a first increment of the network capacity at a current time interval relative to a previous time interval and a second increment of the total deployment cost at the current time interval relative to the previous time interval;
determining a reward function in the deep reinforcement learning algorithm based on a ratio of the first increment to the second increment.
According to the method for allocating the air network resources provided by the invention, the establishment of the state space and the action space according to the state information of each low-altitude platform device in the air network comprises the following steps:
establishing a state space according to the deployment position and the transmitting power of each low-altitude platform device and the channel allocation information of each terminal;
and establishing an action space according to the movement type of each low-altitude platform device, the adjustment type of the transmitting power of each low-altitude platform device and the access state information between each low-altitude platform device and each terminal.
According to the air network resource allocation method provided by the invention, the air base station is used for controlling the wireless coverage range of a cell; the aerial radio frequency unit is used for expanding the wireless coverage range under the control of the aerial base station; the millimeter wave enhanced unmanned aerial vehicle is used for expanding the capacity of the aerial base station and the aerial radio frequency unit.
The invention also provides an air network resource allocation device, which comprises:
an establishing module, used for establishing a resource optimization problem model of the air network by taking the maximum air network energy efficiency as a target and combining target constraint conditions; the air network is constructed based on multiple types of low-altitude platform equipment;
the allocation module is used for solving the resource optimization problem model by adopting a deep reinforcement learning algorithm and determining the target deployment position, the target transmitting power and the channel allocation information of each terminal of each low-altitude platform device in the air network; the multi-type low-altitude platform equipment comprises an aerial base station, an aerial radio frequency unit and a millimeter wave enhanced unmanned aerial vehicle; the channel allocation information is the channel information between the terminal and each low-altitude platform device.
The present invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the air network resource allocation method as described in any of the above.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an over-the-air network resource allocation method as described in any one of the above.
The present invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the air network resource allocation method as described in any one of the above.
According to the air network resource allocation method and apparatus, the electronic device and the storage medium, the air network is constructed with a network deployment mechanism in which multiple types of low-altitude platform devices participate jointly, the multiple types of low-altitude platform devices comprising an aerial base station, an aerial radio frequency unit and a millimeter wave enhanced unmanned aerial vehicle. An emergency air network targeting high energy efficiency is formed on the basis of the wide-area coverage capability of the aerial base station, the coverage extension capability of the aerial radio frequency unit and the capacity extension capability of the millimeter wave enhanced unmanned aerial vehicle, so that the communication coverage restored by the emergency air network can be improved and local capacity enhancement and resource reuse can be realized. A resource optimization problem model of the air network is established with the goal of maximizing air network energy efficiency and in combination with target constraint conditions. Since this model is a non-convex optimization problem, and in view of the advantages of deep reinforcement learning in solving non-convex optimization problems, the resource optimization problem model is solved with a deep reinforcement learning algorithm. Resources of the air network can thereby be effectively optimized and allocated, the target deployment position and target transmit power of each low-altitude platform device in the air network and the channel allocation information of each terminal are determined, and the energy efficiency optimization of the emergency air network is realized while air network performance is guaranteed and network deployment cost is saved.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart illustrating an over-the-air network resource allocation method provided by the present invention;
FIG. 2 is a schematic structural diagram of an over-the-air network resource allocation apparatus provided in the present invention;
fig. 3 is a schematic physical structure diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The air network resource allocation method, apparatus, electronic device and storage medium of the present invention are described below with reference to fig. 1-3.
Fig. 1 is a schematic flow chart of an air network resource allocation method provided by the present invention, as shown in fig. 1, including: step 110 and step 120.
Step 110, establishing a resource optimization problem model of the air network by taking the maximum air network energy efficiency as a target and combining target constraint conditions; the air network is constructed based on various types of low-altitude platform equipment;
specifically, the air network described in the embodiment of the present invention may be applicable to a scene of emergency after a disaster, and may also be applicable to other scenes in which an air network needs to be established for emergency network communication.
It should be noted that, in the embodiment of the present invention, an air network may include High Altitude Platforms (HAP) and Low Altitude Platforms (LAP), and the Low Altitude Platform (LAP) is easier to deploy than the High Altitude Platform (HAP) and is consistent with the concept of cellular network application. HAPs have large area coverage and long endurance, but are costly and prone to strong interference. In comparison, the LAP is more flexible and efficient and has a high cost-performance ratio for emergency deployment. LAPs can be divided into rotorcraft, fixed-wing aircraft and airships according to the type of unmanned aerial vehicle.
In the embodiment of the present invention, the air network is constructed based on multiple types of LAP devices, where the multiple types of LAP devices include an aerial base station (AeBS), an aerial radio frequency unit (AeRRH) and a millimeter wave enhanced unmanned aerial vehicle (mmW-UAV); that is, in the embodiment of the present invention, the deployment architecture of the air network may include four kinds of network units, namely the AeBS, the AeRRH, the mmW-UAV and the User Equipment (UE).
Optionally, in an embodiment of the present invention, the AeBS is used to control the radio coverage of the cell; the AeRRH is used for expanding the wireless coverage range under the control of the AeBS; mmW-UAV is used to extend the capacity of airborne base stations and airborne radio frequency units.
Wherein, in the embodiment of the invention, the AeBS can be an independent large-scale low-altitude platform. It uses a powered airship as a carrier and carries a base station module to complete rapid deployment. The radio coverage of the cell is controlled by appropriately setting the height and transmit power of the AeBS. Although the AeBS is inconvenient to move, it has a strong carrying capacity and can provide wide coverage.
In the embodiment of the invention, the AeRRH may be a unit without baseband processing, namely a fixed-wing aircraft carrying a remote radio unit. It cannot work independently and must perform data transmission under the control of the AeBS; it can be used for gap filling at the coverage edge, expanding the wireless coverage range and making up for the limitations of the AeBS coverage.
In the embodiment of the invention, the mmW-UAV may be an independent small aerial platform; a fixed-wing aircraft carrying a millimeter wave module can be selected as the simulation experiment object. It can be rapidly deployed in open crowd-gathering scenarios to expand the capacity of the AeBS and the AeRRH.
The user equipment (UE) comprises ordinary mobile phone terminals and dedicated emergency devices, which may communicate with each other through an AeBS, an AeRRH or an mmW-UAV.
The communication links in the emergency scenario include satellite backhaul links, inter-AeBS links, AeBS-AeRRH links and so on. These links provide reliable and flexible network connectivity for the LAPs and the UEs.
It is assumed that the communication quality of the satellite-to-AeBS backhaul link is guaranteed, so all LAPs have sufficient signal strength and resources. It is also assumed that all UAVs carry two transmission modes, namely conventional-band transmission and millimeter-wave-band transmission, and each UAV may freely switch between them. When a UAV uses the conventional band as an AeRRH to fill a coverage gap, its channel model is the same as that of the AeBS, except that the flight altitude and the transmit power are limited by the carrier. The channel model of the HPN, the channel model of the AeBS and AeRRH, and the channel model of the mmW-UAV are different because of their different transmission modes.
In this embodiment, the AeBS and AeRRH channel models are established. On the one hand, the path loss between the AeBS and the UE includes the free-space path loss (FSPL) and large-scale fading. On the other hand, the coverage area of an AeBS is related to its power and height: at the same height, the higher the AeBS transmit power, the larger the coverage area; at the same transmit power, the higher the AeBS, the larger the coverage area.
The line-of-sight transmission probability and the non-line-of-sight transmission probability are denoted P(LoS) and P(NLoS), respectively. The path loss from AeBS_i to user UE_j is given by formula (1):

PL_ij = FSPL_ij + PL_LoS in the LoS case, and PL_ij = FSPL_ij + PL_NLoS in the NLoS case;  (1)

By combining line of sight (LoS) and non-line of sight (NLoS) in one formula, the path loss expression is:

PL_ij = FSPL_ij + P(LoS)·PL_LoS + P(NLoS)·PL_NLoS;  (2)

where the free-space path loss is FSPL_ij = 20·log10(d_ij) + 20·log10(f) − 27.55, the carrier frequency is f = 2000 MHz, and d_ij is the distance from AeBS_i to user UE_j. θ_ij denotes the elevation angle of the transmission path with respect to the ground. The line-of-sight transmission probability and the non-line-of-sight transmission probability are shown in equation (3) and equation (4):

P(LoS) = a·(θ_ij − θ_o)^b;  (3)

P(NLoS) = 1 − a·(θ_ij − θ_o)^b;  (4)

where θ_o = 15°, θ_ij ∈ [θ_o, 90°], (a, b) = (0.76, 0.06), and (PL_LoS, PL_NLoS) = (0.1, 21). Thus, when AeBS_i transmits over the U2E link, the received power of UE_j is:

P^Rx_ij = P^Tx_i − PL_ij.  (5)
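The air-to-ground channel model above can be summarized in the following minimal Python sketch. It assumes the probability-weighted LoS/NLoS form of formula (2) with the parameter values quoted in the text and simple dB arithmetic for the received power of formula (5); the function names and the example values are illustrative only and are not part of the patent.

```python
import math

def air_to_ground_path_loss(d_m, theta_deg, f_mhz=2000.0,
                            a=0.76, b=0.06, theta_o=15.0,
                            pl_los=0.1, pl_nlos=21.0):
    """Average AeBS/AeRRH air-to-ground path loss in dB, per formulas (2)-(4)."""
    fspl = 20.0 * math.log10(d_m) + 20.0 * math.log10(f_mhz) - 27.55
    theta = max(theta_deg, theta_o)                    # theta_ij is assumed in [theta_o, 90 deg]
    p_los = min(a * (theta - theta_o) ** b, 1.0)       # formula (3), clipped to a probability
    p_nlos = 1.0 - p_los                               # formula (4)
    return fspl + p_los * pl_los + p_nlos * pl_nlos    # formula (2)

def received_power_dbm(tx_power_dbm, d_m, theta_deg):
    """Received power of UE_j over the U2E link: transmit power minus path loss (formula (5))."""
    return tx_power_dbm - air_to_ground_path_loss(d_m, theta_deg)

if __name__ == "__main__":
    # Example: 500 m link distance, 45 degree elevation angle, 30 dBm transmit power.
    print(round(received_power_dbm(30.0, 500.0, 45.0), 2), "dBm")
```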
in this embodiment, a channel model of the mums-UAV is established. The UAV is mainly responsible for hot spot coverage and channel model reference when using the millimeter wave frequency band. The mmW-UAV and the antenna array of the target user form a beam for data transmission, and a sector antenna model is used for carrying out approximate calculation on antenna gain G for convenient calculation A . The main lobe gain is defined as M, and the side lobe gain is defined as M. Gain G of the antenna A Can be expressed as equation (6).
Figure BDA0003823020920000094
When the antenna of the mmW-UAV is perfectly aligned with the antenna of the UE, the antenna gain is M; when the side lobes are aligned, the antenna gain is m; the rest are M. The main lobe width δ of the beam may be 30 °. M =10dB, M = -10dB. Thus, UAV i When transmitting through millimeter wave frequency band, UE j Is shown in equation (7).
Figure BDA0003823020920000101
Wherein the content of the first and second substances,
Figure BDA0003823020920000102
is calculated by the method
Figure BDA0003823020920000103
The same, the parameter values are different.
In embodiments of the invention, the deployment of mmW-UAVs is to solve the dense user problem, so the scenario can be fixed to a dense urban environment. At carrier frequency f =28GHz, (a, b) = (0.33,0.23), (PL) and LoS ,PL NLoS )=(2,2.65)。
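As a companion to the mmW-UAV model, the sketch below captures the two-level sector antenna gain of formula (6) and a transmit-power-plus-gain-minus-path-loss reading of formula (7). Since neither formula is reproduced verbatim above, both the piecewise gain and the dB composition are assumptions made for illustration.

```python
def sector_antenna_gain_db(misalignment_deg, main_lobe_width_deg=30.0,
                           main_gain_db=10.0, side_gain_db=-10.0):
    """Sector antenna model of formula (6): main-lobe gain M inside the
    beamwidth delta, side-lobe gain m outside (M = 10 dB, m = -10 dB)."""
    if abs(misalignment_deg) <= main_lobe_width_deg / 2.0:
        return main_gain_db
    return side_gain_db

def mmw_received_power_dbm(tx_power_dbm, mmw_path_loss_db, misalignment_deg):
    """Received power of UE_j on the millimeter-wave link (assumed reading of
    formula (7)): transmit power plus antenna gain minus the mmW path loss,
    which is computed like formula (2) but with the dense-urban 28 GHz parameters."""
    return tx_power_dbm + sector_antenna_gain_db(misalignment_deg) - mmw_path_loss_db
```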
According to the method provided by the embodiment of the invention, the air network is constructed by deploying multiple types of LAP devices comprising the AeBS, the AeRRH and the mmW-UAV, and an emergency air network targeting high energy efficiency is formed on the basis of the wide-area coverage capability of the AeBS, the coverage extension capability of the AeRRH and the capacity extension capability of the mmW-UAV, so that the communication coverage restored by the emergency air network can be improved, and local capacity enhancement and resource reuse can be realized.
In the post-disaster emergency network, the joint deployment of multiple LAP devices needs to avoid the problems of over-deployment, over-coverage and resource waste besides the need to meet the communication requirements of users in the network.
Further, in the embodiment of the invention, a resource optimization problem model of the air network is established by taking the maximization of the energy efficiency of the air network as a target and combining target constraint conditions.
Based on the content of the foregoing embodiment, as an optional embodiment, in step 110, establishing the resource optimization problem model of the air network with the goal of maximizing the energy efficiency of the air network and in combination with the target constraint conditions includes:
determining the network capacity of the air network based on the transmission rate of each terminal and the channel allocation information of each terminal, and determining the network capacity under unit cost based on the network capacity of the air network and the total deployment cost of the air network; the total deployment cost of the over-the-air network is determined based on the hardware cost and the energy consumption cost of each low-altitude platform device;
establishing an objective function according to the maximum network capacity under unit cost as an optimization target; the air network energy efficiency comprises the network capacity at unit cost;
determining a target constraint condition based on the transmitting power of each low-altitude platform device in the air network and the channel allocation information of each terminal;
and establishing a resource optimization problem model of the air network based on the objective function and the objective constraint condition.
Specifically, the terminal described in the embodiment of the present invention refers to a user equipment UE side in an air network. The method can be specifically used for all user terminal equipment in a disaster area in an emergency scene after a disaster.
The channel allocation information described in the embodiment of the present invention is channel information between a terminal and each low-altitude platform device, that is, allocation information of each LAP device channel to which each terminal is allocated and accessed.
In this embodiment, before determining the problem model, a network deployment cost index is defined, and the deployment is measured and limited by this index. The low-altitude platform devices are divided by type into the AeBS cost C^B_i, the AeRRH cost C^R_i and the mmW-UAV cost C^M_i.

The AeBS cost C^B_i is given in equation (8). The deployment cost of an AeBS is divided into two parts: c_B, the deployment hardware cost of the AeBS, and the energy consumption cost of the AeBS. The energy consumption cost depends on the fixed power Pc_B of AeBS_i, its transmit power gain, the maximum transmit power gain of the AeBS, and θ, the hardware and power adaptation parameter.

The cost calculation formulas of the AeRRH and the mmW-UAV are similar to that of the AeBS, from which the AeRRH cost C^R_i in equation (9) and the mmW-UAV cost C^M_i in equation (10) can be determined.

The deployment condition of the whole network is controlled by adjusting the costs of the three types of LAP devices. The total deployment cost of the entire network, i.e. the sum of the deployment costs of all the LAP devices in the network, is denoted C_Net, as in equation (11):

C_Net = Σ_{i=1..N_B} C^B_i + Σ_{i=1..N_R} C^R_i + Σ_{i=1..N_M} C^M_i;  (11)

where c_B, c_R, c_M and the value of θ can be configured according to the scarcity of hardware and energy in the current emergency environment; N_B denotes the number of AeBSs, N_R the number of AeRRHs, and N_M the number of mmW-UAVs.
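The cost model can be sketched as follows. The exact energy consumption terms of equations (8)-(10) are not reproduced above, so the per-device cost used here (hardware cost plus a θ-scaled energy term built from the fixed power and the normalised transmit power gain) is an assumption; only the summation structure of equation (11) is taken directly from the text.

```python
def device_cost(hardware_cost, fixed_power, tx_gain, max_tx_gain, theta):
    """Deployment cost of one LAP device: hardware cost plus an energy
    consumption term scaled by the hardware/power adaptation parameter theta.
    The normalised energy term is an assumed stand-in for equations (8)-(10)."""
    return hardware_cost + theta * (fixed_power + tx_gain / max_tx_gain)

def total_deployment_cost(devices, theta=1.0):
    """Total network cost C_Net of equation (11): the sum of the deployment
    costs of all AeBS, AeRRH and mmW-UAV devices in the network."""
    return sum(device_cost(d["c"], d["pc"], d["g"], d["g_max"], theta) for d in devices)

if __name__ == "__main__":
    # Illustrative values only: one AeBS, one AeRRH, one mmW-UAV.
    devices = [
        {"c": 100.0, "pc": 50.0, "g": 8.0, "g_max": 10.0},   # AeBS
        {"c": 40.0, "pc": 20.0, "g": 4.0, "g_max": 5.0},     # AeRRH
        {"c": 30.0, "pc": 15.0, "g": 2.0, "g_max": 4.0},     # mmW-UAV
    ]
    print(total_deployment_cost(devices, theta=0.5))
```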
Further, in an embodiment of the present invention, the joint deployment problem model of multiple LAPs in the emergency network may set the objective as follows: after deployment, the highest energy efficiency is achieved through the heights, power allocation and channel allocation of the various types of aerial base stations, i.e. the highest network capacity is obtained while the network cost rate is controlled.

In the embodiment of the invention, the network capacity of the air network is determined based on the transmission rate of each terminal and the channel allocation information of each terminal accessing the low-altitude platform devices, and the network capacity at unit cost is determined based on the network capacity of the air network and the total deployment cost of the air network. An objective function can be established with the maximum network capacity at unit cost as the optimization objective, which can be expressed as formula (12):

max over {[B], [R], [M], [β], [γ], [ω]} of R / C_Net;  (12)

where the matrices [B], [R], [M] respectively represent the positions and corresponding transmit powers of the N_B AeBSs, the N_R AeRRHs and the N_M mmW-UAVs, and the 0-1 matrices [β], [γ], [ω] respectively represent the channel allocation between the N_U UEs and the AeBSs, AeRRHs and mmW-UAVs. r^B_i, r^R_i and r^M_i represent the transmission rates at which the N_U UEs access the AeBS, the AeRRH and the mmW-UAV, respectively.
The transmission rate of terminal i may be represented as formula (13):

r_i = υ·w_ij·log2(1 + SINR_i);  (13)

where the bandwidth is w_ij = 10 MHz and υ = 0.1.

The Signal to Interference plus Noise Ratio (SINR) is the ratio of the strength of the received useful signal to the strength of the received interference signals. The SINR value of terminal i is expressed as formula (14):

SINR_i = S_i / (N_i + σ²);  (14)

where S_i denotes the received power at the location of terminal i; the interference signal N_i is the sum of the interference from other co-channel neighbouring stations, i.e. the sum of all received signals minus the useful signal S_i; and the noise σ² may take the value −118 dBm.
Further, in the embodiment of the present invention, the target constraint conditions are determined based on the transmit power of each low-altitude platform device in the air network and the channel allocation information of each terminal, as shown in formula (15).
In the target constraint conditions of formula (15), constraint (a) requires the transmission rate of each terminal to meet the user terminal rate requirement, i.e. at least one LAP channel satisfies the rate requirement of the user terminal so that all user communication can be restored in the emergency scenario; R_0 denotes the minimum transmission rate threshold. Constraint (b) limits the channels allocated to each terminal: there is one and only one channel serving that terminal. Constraint (c) requires that each channel can be allocated to only one terminal, which avoids waste of resources. The total number of system resources is limited: constraint (d) requires that the total number of channel resources of the AeBSs, AeRRHs and mmW-UAVs does not exceed the system resource upper limit η_max. Constraint (e) limits each terminal to effective coverage by fewer than three AeBSs, i.e. it excludes the over-coverage situation, where effective coverage is defined by the minimum transmission rate of the terminal. Constraint (f) requires that the transmit power of every LAP device cannot exceed its maximum transmit power.
Further, based on the objective function and the target constraint conditions, the resource optimization problem model of the air network can be established, as shown in formula (16).
Optionally, in the embodiment of the present invention, when the deployment of the AeBSs, AeRRHs and mmW-UAVs in the two-dimensional plane and the number of devices are determined, the device hardware cost is fixed and the only remaining cost-related variables are the transmit powers of the AeBSs, AeRRHs and mmW-UAVs, so the resource optimization problem model of the air network can be further simplified, as shown in formula (17).
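Putting the pieces together, the objective of formulas (12), (16) and (17) is the network capacity obtained per unit of deployment cost. The sketch below assumes the capacity is the sum of the rates of the terminals on their allocated channels and reuses the cost sketch above; it is an illustrative reading, not the patent's own formulation of formulas (16) and (17).

```python
def network_capacity(terminal_rates, allocation):
    """Network capacity R: sum of the rates of the terminals that are actually
    served, where allocation[i] is 1 if terminal i has an allocated channel
    (on an AeBS, AeRRH or mmW-UAV) and 0 otherwise."""
    return sum(r * a for r, a in zip(terminal_rates, allocation))

def energy_efficiency(capacity, total_cost):
    """Objective of formula (12): network capacity at unit deployment cost."""
    return capacity / total_cost

if __name__ == "__main__":
    rates = [8.6e6, 7.2e6, 9.1e6]   # bit/s, illustrative values only
    served = [1, 1, 0]              # the third terminal has no channel yet
    print(energy_efficiency(network_capacity(rates, served), total_cost=120.0))
```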
According to the method provided by the embodiment of the invention, the resource optimization problem model of the air network is constructed based on the joint deployment of various LAP devices by considering the condition of limited resources in an emergency scene, so that the communication requirements of users in the air network can be met, and the problems of over-deployment, over-coverage and resource waste in the emergency scene can be avoided.
Step 120, solving the resource optimization problem model by adopting a deep reinforcement learning algorithm, and determining the target deployment position, the target transmitting power and the channel allocation information of each terminal of each low-altitude platform device in the air network; the multi-class low-altitude platform equipment comprises an aerial base station, an aerial radio frequency unit and a millimeter wave enhanced unmanned aerial vehicle.
Specifically, the target deployment location described in the embodiment of the present invention refers to an optimal deployment location for deploying each low-altitude platform device in an air network, where the deployment location may include a two-dimensional location coordinate and an altitude, and the two-dimensional location coordinate represents a deployment location coordinate of each LAP on a two-dimensional plane, and does not have altitude information.
The target transmitting power described in the embodiment of the invention refers to the optimal transmitting power parameter value distributed to each low-altitude platform device in the air network.
The channel allocation information of each terminal described in the embodiment of the present invention refers to optimal allocation information of each low-altitude platform device channel to which each terminal is allocated to access.
In the embodiment of the invention, the communication coverage of the recovery of the emergency air network can be improved and the maximum energy efficiency of the air network can be ensured by jointly deploying the target deployment position of each low-altitude platform device in the air network, allocating the target transmitting power and the channel allocation information of each terminal.
To this end, in order to maximize the energy efficiency of the network, the deployment positions and transmit powers of the AeBSs, AeRRHs and mmW-UAVs and the selection of the user access channels must be determined together with the power and channel allocation. This problem is a non-convex optimization problem, and an analytical solution cannot be computed directly. In this embodiment, it is considered that the terminal positions are continuously updated within a small range, and heuristic learning is required continuously during the deployment process. Therefore, a Deep Reinforcement Learning (DRL) algorithm can be used to solve the resource optimization problem model.
Based on the content of the foregoing embodiment, as an optional embodiment, in step 120, a deep reinforcement learning algorithm is used to solve the resource optimization problem model, and determine the target deployment position, the target transmission power, and the channel allocation information of each terminal of each low-altitude platform device in the air network, where the method includes:
performing iterative training on a deep Q network model in a deep reinforcement learning algorithm based on state information and reward functions of each low-altitude platform device in the air network to obtain an optimal action value profit value of the air network energy efficiency; the reward function is determined based on the network capacity of the over-the-air network and the total deployment cost of the over-the-air network;
and determining the target deployment position, the target transmitting power and the channel allocation information of each terminal of each low-altitude platform device in the air network based on the optimal action value revenue value.
It should be noted that, the DRL algorithm combines the perception capability of deep learning and the decision-making capability of reinforcement learning, which is different from supervised learning and unsupervised learning, and has strong adaptive capability and real-time learning capability. The goal of reinforcement learning is to learn behavior strategies in interaction with the environment to obtain the maximum long-term reward and to obtain the optimal solution by maximizing the cumulative reward.
In an embodiment of the invention, the reward function is determined based on the network capacity of the over-the-air network and the total deployment cost of the over-the-air network.
Based on the content of the foregoing embodiment, as an optional embodiment, before performing iterative training on a deep Q network model in a deep reinforcement learning algorithm based on state information and a reward function of each low-altitude platform device in an air network to obtain an optimal action value revenue value of air network energy efficiency, the method further includes:
dividing the time of each training into a plurality of time intervals;
determining a first increment of the network capacity in a current time interval relative to a previous time interval and a second increment of the total deployment cost in the current time interval relative to the previous time interval based on the network capacity of the over-the-air network and the total deployment cost of the over-the-air network;
a reward function in the DRL algorithm is determined based on a ratio of the first increment to the second increment.
Specifically, the first increment described in the embodiment of the present invention refers to an increment of the network capacity of the current time interval relative to the network capacity of the previous time interval, which may be specifically obtained by subtracting the network capacity of the previous time interval from the network capacity of the current time interval.
The second increment described in the embodiment of the present invention refers to an increment of the total network deployment cost of the current time interval relative to the total network deployment cost of the previous time interval, which can be specifically obtained by subtracting the total network deployment cost of the previous time interval from the total network deployment cost of the current time interval.
In the embodiment of the present invention, according to the resource optimization problem model of formula (12), the network capacity R can be expressed as:

R = Σ_{i=1..N_U} r_i, i.e. the sum of the transmission rates achieved by the N_U terminals on their allocated channels;  (18)

Furthermore, the total deployment cost C_Net of the air network can be obtained from formula (11) above.
In an embodiment of the present invention, the reward function may be defined as a ratio between a capacity increment and a network cost increment, a numerator of the reward function is equivalent to a benefit of each low-altitude platform device after performing an action, and a denominator is equivalent to a state cost in terms of energy consumption, so that a result of maximizing the final accumulated return is that the average energy efficiency is maximized.
Further, in the embodiment of the present invention, by dividing the time of each training of the DRL algorithm into a plurality of time intervals, the reward function in the DRL algorithm can be expressed as:
Figure BDA0003823020920000172
in the formula, R t Representing the network capacity of the current time interval; r is t-1 Indicating the network capacity of the last time interval; c t Representing a total deployment cost of the network for the current time interval; c t-1 Representing the total deployment cost of the network for the last time interval.
In addition, if the capacity or coverage required by the target constraint conditions cannot be achieved after a low-altitude platform device executes its action, then r_t = r_t − r⁻, i.e. a penalty rather than a reward is applied, where r⁻ denotes the magnitude of the penalty value.
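A minimal sketch of the reward of formula (19), including the penalty branch described above. It assumes the caller supplies the capacities and total deployment costs of two consecutive time intervals and guarantees a non-zero cost increment; the parameter names are illustrative.

```python
def reward(capacity_t, capacity_prev, cost_t, cost_prev,
           constraints_met=True, penalty=1.0):
    """Reward of formula (19): capacity increment divided by the deployment-cost
    increment between consecutive time intervals; if the capacity or coverage
    constraints are violated after the action, the penalty r- is subtracted."""
    r_t = (capacity_t - capacity_prev) / (cost_t - cost_prev)
    if not constraints_met:
        r_t -= penalty
    return r_t
```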
According to the method provided by the embodiment of the invention, the reward function in the DRL algorithm is defined by considering the maximization of the energy efficiency of the air network, so that the behavior strategy which can maximize the energy efficiency of the air network is ensured to be learned in the deep reinforcement learning process, the maximum long-term reward is obtained, and the optimal solution of a resource optimization problem model is favorably obtained.
Further, a state space and an action space of the air Network are constructed based on state information and reward functions of each low-altitude platform device in the air Network, the state, the action and the reward values of each low-altitude platform device are stored in a memory unit, repeated iterative training is carried out on a Deep Q-Network (DQN) model in a Deep reinforcement learning algorithm, an optimal action value profit value of the energy efficiency of the air Network is obtained, and optimal actions of each low-altitude platform device are obtained. According to the optimal action of each low-altitude platform device, the target deployment position, the target transmitting power and the channel allocation information of each terminal of each low-altitude platform device in the air network can be determined.
The air network resource allocation method of the embodiment of the invention constructs the air network with a network deployment mechanism in which multiple types of low-altitude platform devices participate jointly, the multiple types of low-altitude platform devices comprising an aerial base station, an aerial radio frequency unit and a millimeter wave enhanced unmanned aerial vehicle. An emergency air network targeting high energy efficiency is formed on the basis of the wide-area coverage capability of the aerial base station, the coverage extension capability of the aerial radio frequency unit and the capacity extension capability of the millimeter wave enhanced unmanned aerial vehicle, so that the communication coverage restored by the emergency air network can be improved and local capacity enhancement and resource reuse can be realized. A resource optimization problem model of the air network is established with the goal of maximizing air network energy efficiency and in combination with target constraint conditions. Since this model is a non-convex optimization problem, and in view of the advantages of deep reinforcement learning in solving non-convex optimization problems, the resource optimization problem model is solved with a deep reinforcement learning algorithm. Resources of the air network can thereby be effectively optimized and allocated, the target deployment position and target transmit power of each low-altitude platform device in the air network and the channel allocation information of each terminal are determined, and the energy efficiency optimization of the emergency air network is realized while air network performance is guaranteed and network deployment cost is saved.
Based on the content of the above embodiment, as an optional embodiment, the iteratively training a deep Q network model in a deep reinforcement learning algorithm based on state information and a reward function of each low-altitude platform device in an air network to obtain an optimal action value revenue value of air network energy efficiency includes:
step 1, establishing a state space and an action space according to state information of each low-altitude platform device in an air network;
step 2, determining the initial state of the air network based on the state space, and dividing the time of each training into a plurality of time intervals; the initial state is the state of the air network in a first time interval;
step 3, determining a first reward value after the air network executes a first action in a first state in the current time interval and a second state in the next time interval; the first action is determined based on the action space;
step 4, storing the first state, the first action, the first reward value and the second state as a data sample in a memory unit, randomly extracting a data sample from the memory unit, and updating the network parameters of the depth Q network model; the first reward value is determined based on a reward function;
step 5, traversing a plurality of time intervals, executing the step 3 to the step 4, completing one-time iterative training of the deep Q network model, and obtaining the maximum action value income value of the trained air network energy efficiency;
and step 6, traversing the preset number of iterations, executing steps 3 to 5, and training the deep Q network model for the preset number of iterations to obtain the optimal action value income value of the trained air network energy efficiency.
Specifically, the state information of each low-altitude platform device in the air network described in the present invention refers to the aerial altitude information and transmit power information of each low-altitude platform device node in the air network, such as an AeBS node, an AeRRH node or an mmW-UAV node, together with the channel allocation information of each terminal; it may further include information such as the position of each network node and the real-time position, real-time direction and real-time speed of each network node, which may be obtained using a GPS device.
In an embodiment of the invention, the DQN model may be represented as {S, A, r}, where S represents the state space, A represents the action space, and r represents the reward determined by the reward function. At time t, these are expressed as the state S_t, the action A_t and the reward r_t.
Based on the content of the foregoing embodiment, as an optional embodiment, in step 1, establishing a state space and an action space according to state information of each low altitude platform device in an air network includes:
establishing a state space according to the deployment position and the transmitting power of each low-altitude platform device and the channel allocation information of each terminal;
and establishing an action space according to the movement type of each low-altitude platform device, the adjustment type of the transmitting power of each low-altitude platform device and the access state information between each low-altitude platform device and each terminal.
Specifically, in the embodiment of the present invention, according to the state information of each low-altitude platform device in the air network, the deployment position, the transmission power, and the allocation information of each terminal access channel of each low-altitude platform device are obtained, so as to establish a state space.
S_t may include the following parts: the deployment heights of the N_B AeBSs, the N_R AeRRHs and the N_M mmW-UAVs, denoted h_t; the corresponding transmit powers of these devices; and the channel allocation information of the terminals, which may be expressed as {β, γ, ω}.
It should be noted that, in the embodiment of the present invention, when the LAP devices in the air network are not yet deployed in two dimensions, h_t represents both the position and the deployment height of the low-altitude platform devices in the air network; when the two-dimensional deployment of the LAP devices is given, h_t represents only the deployment height of the low-altitude platform devices in the air network.
Further, an action space is established according to the movement type of each low-altitude platform device, the adjustment type of the transmitting power of each low-altitude platform device and the access state information between each low-altitude platform device and each terminal.
In the embodiment of the present invention, the movement type of the low altitude platform device can be determined by two situations, one is that the movement type of the LAP device without two-dimensional deployment can include forward, backward, leftward, rightward, upward, downward and hovering. The second is the case where the LAP device is deployed in two dimensions, and its movement types include only up, down, and hover. The adjustment type of the transmission power of each low-altitude platform device may include transmission power up-regulation, down-regulation and maintenance. Wherein the hovering and maintaining are for the airborne LAP device to maintain a current altitude and transmit power, respectively. The access state information between each low-altitude platform device and each terminal can be represented by using a 0-1 matrix, and the access state information represents the state of the LAP device correspondingly accessed by the terminal, and comprises two actions of accessing and disconnecting.
For simplicity of description, and specifically for the case where the LAP devices are deployed in two dimensions, the action A_t is the action performed in state S_t. The execution operation of each LAP device is discretized and defined as 6 actions: altitude up, altitude down, hover, transmit power up, transmit power down, and power maintain. The action space A_L of each LAP device may be represented as {1, 2, 3, 4, 5, 6}, where the action value is set to 1 when the corresponding action is selected and to 0 otherwise. With the total number of airborne base stations N_L = N_B + N_R + N_M, the action space of the multi-LAP deployment can be represented as the combination of the action spaces A_1, A_2, …, A_{N_L} of the individual devices.
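The state and action spaces described above can be sketched as follows, assuming the two-dimensional deployment case with six discrete actions per device; the container types, action labels and helper names are illustrative, not part of the patent.

```python
import itertools
import random

# Six discrete per-device actions for the two-dimensional-deployment case.
DEVICE_ACTIONS = ("alt_up", "alt_down", "hover", "pwr_up", "pwr_down", "pwr_keep")

def build_state(heights, tx_powers, beta, gamma, omega):
    """State S_t: deployment heights h_t and transmit powers of every LAP device
    plus the 0-1 channel allocation matrices {beta, gamma, omega} of the terminals."""
    return {"h": tuple(heights), "p": tuple(tx_powers), "alloc": (beta, gamma, omega)}

def joint_action_space(n_devices):
    """Joint action space for N_L = N_B + N_R + N_M devices: one of the six
    actions per device (full enumeration is only practical for a few devices)."""
    return itertools.product(range(1, len(DEVICE_ACTIONS) + 1), repeat=n_devices)

def random_joint_action(n_devices):
    """Sample one joint action, e.g. for the exploration branch of epsilon-greedy."""
    return tuple(random.randint(1, len(DEVICE_ACTIONS)) for _ in range(n_devices))
```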
According to the method provided by the embodiment of the invention, the state and the action of each low-altitude platform device are determined by acquiring the state information of each low-altitude platform device in the air network, and the state space and the action space are established, so that accurate input parameters are provided for the subsequent deep reinforcement learning.
Further, in the embodiment of the present invention, the Q function and the error function of the value network of the DQN model are defined as follows:
the value Network, which is described by Deep Neural Networks (DNN), updates the Q function for the state-action pairs. The Q function is a long-term expectation of states and actions over k time periods. And (3) adopting a Q-leanling learning mechanism, and iteratively realizing the optimized learning of the Q function according to a formula (20).
Figure BDA0003823020920000212
Wherein upsilon is k Represents the learning rate, the larger the learning rate, the current Q k (s t ,a t ) The greater the influence, the slower the convergence speed, which can be set to1.τ represents a discount factor, which represents a preference for the reward, and the larger the discount factor is, the larger the influence on the future result is, and the value range may be between 0 and 1. s' indicates that action a is being performed t A later observation value; a 'represents the actions in the action set that make the Q function at the kth iteration executable in the s' state.
Error function: to realize the approximation Q_{k+1}(s_t, a_t) ≈ Q_k(s_t, a_t), an error function may be defined as in equation (21):

Loss = E[ ( r_t + τ·max_{a'} Q_k(s', a') − Q_k(s_t, a_t) )² ].  (21)
The parameters are updated in each iteration through this objective function to obtain the optimal solution of the Q function; the update function of the Q network is as given in equation (22).
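The Q-function update of formula (20) and the squared error of formula (21) can be illustrated with the tabular sketch below; the DQN of the embodiment minimises the same error over the parameters of a deep neural network rather than over a table, and the exact parameter update of equation (22) is not reproduced here.

```python
def q_update(q_table, state, action, reward, next_state, actions,
             learning_rate=1.0, discount=0.9):
    """Q-learning update of formula (20): move Q_k(s_t, a_t) towards the target
    r_t + tau * max_a' Q_k(s', a'), and return the squared error of formula (21)."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    target = reward + discount * best_next
    current = q_table.get((state, action), 0.0)
    q_table[(state, action)] = current + learning_rate * (target - current)
    return (target - current) ** 2
```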
Further, in step 2, the memory unit (M-Unit) and the Q network are initialized, and the initial state of each LAP device in the air network is determined by randomly initializing the state, that is, by random selection from the state space. The time of each training run is divided into a plurality of time intervals. It can be understood that the initial state is the state of each LAP device of the air network during the first time interval.
In the embodiment of the present invention, the time of each training round can be divided into a plurality of time intervals of equal length, and the different time slots can be indexed by t ∈ {1, 2, …, T}. Starting from the initial state, the air network explores different states T times, which constitutes one learning process.
In step 3, a first reward value obtained after the air network performs a first action in a first state in the current time interval is determined, together with a second state in the next time interval; the first action is determined based on the action space, and it may be an action randomly selected from the action space or the optimal action in the action space learned by the deep Q network.
The first state described in the embodiment of the present invention refers to the state of each LAP device in the air network in the current time interval, which may be a state selected from the state space or a state updated based on the learning process in the previous time interval. The second state refers to the state of each LAP device in the air network in the next time interval obtained after the state in the current time interval is updated.
The first reward value described in the embodiment of the present invention is determined based on a reward function: specifically, the air network selects and executes the first action in the first state, and the reward value is then calculated through the reward function.
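A minimal sketch of such a reward computation, assuming the reward is the ratio of the network-capacity increment to the deployment-cost increment, as described for the reward function later in this document; all names and the small eps term are illustrative assumptions:

def reward(cap_now, cap_prev, cost_now, cost_prev, eps=1e-9):
    # Reward: increase in network capacity per unit increase in total deployment cost
    delta_capacity = cap_now - cap_prev
    delta_cost = cost_now - cost_prev
    return delta_capacity / (delta_cost + eps)  # eps avoids division by zero (added assumption)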
In the embodiment of the invention, in step 3, each LAP device enters an exploration process. In the ε-greedy exploration mode, a first action is selected randomly with probability ε, and the first action given by the Q network is selected with probability 1-ε. The smaller ε is, the more the existing learning results are exploited when selecting actions; the larger ε is, the more actions are selected randomly, independently of the existing learning results. The ε-greedy mechanism prevents the algorithm from falling into a locally optimal solution. ε is not a fixed value but varies with the number of cycles k; for example, ε = 10/k may be set.
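A short sketch of the ε-greedy selection described above, with ε decayed as 10/k; clipping ε to 1 is an added assumption, since a probability cannot exceed 1, and the function name is illustrative:

import random

def select_action(q_values, k):
    # epsilon-greedy: explore with probability eps, otherwise exploit the Q network
    eps = min(1.0, 10.0 / k)                                      # eps = 10/k, clipped to 1 (assumption)
    if random.random() < eps:
        return random.randrange(len(q_values))                    # random exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])   # greedy action from the Q values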
Further, in step 4, the first state, the first action, the first reward value and the second state are stored in the memory unit as a data sample, a data sample is randomly extracted from the memory unit, and the network parameters of the deep Q network model are updated using the Q network update formulas (20), (21) and (22).
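A sketch of the memory unit (M-Unit) behaviour described in step 4, assuming a bounded buffer with uniform random sampling; the class name, capacity and batch size are illustrative assumptions:

import random
from collections import deque

class MemoryUnit:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)           # oldest samples are discarded first

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=32):
        # uniform random extraction of stored data samples
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))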
In step 5, steps 3 to 4 above are executed iteratively by traversing the plurality of time intervals, so that one iterative training round of the deep Q network model is completed and the maximum Q value of the air network energy efficiency after this training round, namely the maximum action-value revenue value, is obtained.
In step 6, steps 3 to 5 are executed cyclically by traversing the preset number of iterations, and the deep Q network model is trained continuously until the number of training rounds reaches the preset number of iterations, at which point learning stops. After training, the maximum action-value revenue value of the air network energy efficiency obtained in the last iterative training round is taken as the optimal action-value revenue value. From this optimal action-value revenue value, the optimal state and optimal action of each LAP device in the air network can be determined, and thus the target deployment position, the target transmit power and the channel allocation information of each terminal for each low-altitude platform device in the air network can also be determined.
Optionally, in the embodiment of the present invention, an algorithm flow for solving the resource optimization problem model by using a deep reinforcement learning algorithm and determining the target deployment position, the target transmission power, and the channel allocation information of each terminal of each low-altitude platform device in the air network is as follows.
[Algorithm table omitted: DQN-based procedure corresponding to steps 1 to 6 above for solving the resource optimization problem model.]
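In place of the omitted algorithm table, the following hedged Python outline summarizes steps 1 to 6 using the helper sketches above (select_action, MemoryUnit); the env interface, its methods and all hyperparameters are assumptions for illustration, not the literal algorithm of the embodiment:

def train(env, num_iterations=200, T=50):
    memory = MemoryUnit()
    best_return = float("-inf")
    for k in range(1, num_iterations + 1):                      # step 6: preset number of iterations
        state = env.reset()                                     # step 2: random initial state
        episode_return = 0.0
        for t in range(T):                                      # step 5: traverse the time intervals
            action = select_action(env.q_values(state), k)      # step 3: epsilon-greedy action
            next_state, r = env.step(action)                    # reward from the reward function
            memory.store(state, action, r, next_state)          # step 4: store the data sample
            batch = memory.sample()
            env.update_q_network(batch)                         # update per formulas (20)-(22)
            episode_return += r
            state = next_state
        best_return = max(best_return, episode_return)          # track the maximum action-value revenue
    return best_return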
In the method provided by the embodiment of the invention, since the constructed resource optimization problem model of the air network is a non-convex optimization problem, the advantage of deep reinforcement learning in solving non-convex optimization problems is exploited: the resource optimization problem model is solved with a deep reinforcement learning algorithm, so that resource optimization and allocation can be carried out effectively on the air network according to the deployed network information of the air network, realizing the energy efficiency optimization of the emergency air network.
The air network resource allocation method provided by the embodiment of the invention can also be adapted to the networking of a 6G heterogeneous air emergency network, which is favorable for realizing the autonomous, energy-efficiency-optimized deployment of a 6G emergency air network.
The following describes the air network resource allocation apparatus provided in the present invention, and the air network resource allocation apparatus described below and the air network resource allocation method described above may be referred to correspondingly.
Fig. 2 is a schematic structural diagram of an air network resource allocation apparatus provided in the present invention, as shown in fig. 2, including:
the establishing module 210 is configured to establish a resource optimization problem model of the air network by taking the maximum air network energy efficiency as a target and combining target constraint conditions; the air network is constructed based on various types of low-altitude platform equipment;
the allocation module 220 is configured to solve the resource optimization problem model by using a deep reinforcement learning algorithm, and determine a target deployment position, a target transmission power, and channel allocation information of each terminal of each low-altitude platform device in the air network; the multi-type low-altitude platform equipment comprises an aerial base station, an aerial radio frequency unit and a millimeter wave enhanced unmanned aerial vehicle; the channel allocation information is the channel information between the terminal and each low-altitude platform device.
The apparatus for allocating air network resources according to this embodiment may be configured to execute the above-mentioned embodiments of the method for allocating air network resources, and the principle and technical effect are similar, which are not described herein again.
The air network resource allocation device provided by the embodiment of the invention constructs an air network by adopting a network deployment mechanism in which multiple types of low-altitude platform equipment participate jointly, the multiple types of low-altitude platform equipment comprising an aerial base station, an aerial radio frequency unit and a millimeter wave enhanced unmanned aerial vehicle. On the basis of the wide-area coverage capability of the aerial base station, the network coverage expansion capability of the aerial radio frequency unit and the capacity expansion capability of the millimeter wave enhanced unmanned aerial vehicle, an emergency air network targeting high energy efficiency is formed, so that the communication coverage area recovered by the emergency air network can be enlarged and local capacity enhancement and resource reuse can be realized. By taking the maximum air network energy efficiency as the target and combining target constraint conditions, a resource optimization problem model of the air network is established; since this model is a non-convex optimization problem, and given the advantage of deep reinforcement learning in solving non-convex optimization problems, the resource optimization problem model is solved with a deep reinforcement learning algorithm, so that resource optimization and allocation can be carried out effectively on the air network, the target deployment position, the target transmit power and the channel allocation information of each terminal for each low-altitude platform device in the air network are determined, the energy efficiency optimization of the emergency air network is realized, the air network performance can be ensured and the network deployment cost can be saved.
Fig. 3 is a schematic physical structure diagram of an electronic device provided in the present invention, and as shown in fig. 3, the electronic device may include: a processor (processor) 310, a communication Interface (Communications Interface) 320, a memory (memory) 330 and a communication bus 340, wherein the processor 310, the communication Interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform the over-the-air network resource allocation method provided by the above methods, the method comprising: establishing a resource optimization problem model of the air network by taking the maximum air network energy efficiency as a target and combining target constraint conditions; the air network is constructed based on multi-class low-altitude platform equipment; solving the resource optimization problem model by adopting a deep reinforcement learning algorithm, and determining the target deployment position, the target transmitting power and the channel allocation information of each terminal of each low-altitude platform device in the air network; the multi-type low-altitude platform equipment comprises an aerial base station, an aerial radio frequency unit and a millimeter wave enhanced unmanned aerial vehicle; the channel allocation information is the channel information between the terminal and each low-altitude platform device.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer-readable storage medium, the computer program, when executed by a processor, being capable of executing the over-the-air network resource allocation method provided by the above methods, the method comprising: establishing a resource optimization problem model of the air network by taking the maximum air network energy efficiency as a target and combining target constraint conditions; the air network is constructed based on multiple types of low-altitude platform equipment; solving the resource optimization problem model by adopting a deep reinforcement learning algorithm, and determining the target deployment position, the target transmitting power and the channel allocation information of each terminal of each low-altitude platform device in the air network; the multi-type low-altitude platform equipment comprises an aerial base station, an aerial radio frequency unit and a millimeter wave enhanced unmanned aerial vehicle; the channel allocation information is the channel information between the terminal and each low-altitude platform device.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the over-the-air network resource allocation method provided by the above methods, the method comprising: establishing a resource optimization problem model of the air network by taking the maximum air network energy efficiency as a target and combining target constraint conditions; the air network is constructed based on multi-class low-altitude platform equipment; solving the resource optimization problem model by adopting a deep reinforcement learning algorithm, and determining the target deployment position, the target transmitting power and the channel allocation information of each terminal of each low-altitude platform device in the air network; the multi-type low-altitude platform equipment comprises an aerial base station, an aerial radio frequency unit and a millimeter wave enhanced unmanned aerial vehicle; the channel allocation information is the channel information between the terminal and each low-altitude platform device.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An over-the-air network resource allocation method, comprising:
establishing a resource optimization problem model of the air network by taking the maximum air network energy efficiency as a target and combining target constraint conditions; the air network is constructed based on multiple types of low-altitude platform equipment;
solving the resource optimization problem model by adopting a deep reinforcement learning algorithm, and determining the target deployment position and the target transmitting power of each low-altitude platform device in the air network and the channel allocation information of each terminal; the multi-type low-altitude platform equipment comprises an aerial base station, an aerial radio frequency unit and a millimeter wave enhanced unmanned aerial vehicle; the channel allocation information is the channel information between the terminal and each low-altitude platform device.
2. The method for allocating air network resources according to claim 1, wherein the establishing a resource optimization problem model of an air network with the goal of maximizing air network energy efficiency and in combination with a goal constraint condition comprises:
determining the network capacity of the air network based on the transmission rate of each terminal and the channel allocation information of each terminal, and determining the network capacity under unit cost based on the network capacity of the air network and the total deployment cost of the air network; a total deployment cost of the over-the-air network is determined based on hardware costs and energy consumption costs of each of the low altitude platform devices;
establishing an objective function according to the maximum network capacity under the unit cost as an optimization objective; the air network energy efficiency comprises the network capacity at the unit cost;
determining the target constraint condition based on the transmitting power of each low-altitude platform device in the air network and the channel allocation information of each terminal;
and establishing a resource optimization problem model of the air network based on the objective function and the objective constraint condition.
3. The method according to claim 2, wherein the determining the target deployment location, the target transmission power, and the channel allocation information of each terminal of each low-altitude platform device in the air network by solving the resource optimization problem model using a deep reinforcement learning algorithm comprises:
performing iterative training on a deep Q network model in the deep reinforcement learning algorithm based on the state information and the reward function of each low-altitude platform device in the air network to obtain an optimal action value profit value of the air network energy efficiency; the reward function is determined based on a network capacity of the over-the-air network and a total deployment cost of the over-the-air network;
and determining the target deployment position, the target transmitting power and the channel allocation information of each terminal of each low-altitude platform device in the air network based on the optimal action value revenue value.
4. The air network resource allocation method according to claim 3, wherein the iterative training of the deep Q network model in the deep reinforcement learning algorithm is performed based on the state information of each low-altitude platform device in the air network and the reward function, so as to obtain the optimal action value profit value of the air network energy efficiency, and the method comprises the following steps:
step 1, establishing a state space and an action space according to state information of each low-altitude platform device in the air network;
step 2, determining the initial state of the air network based on the state space, and dividing the time of each training into a plurality of time intervals; the initial state is the state of the over-the-air network in a first time interval;
step 3, determining a first reward value after the air network executes a first action in a first state in a current time interval and a second state in a next time interval; the first action is determined based on the action space;
step 4, storing the first state, the first action, the first reward value and the second state as a data sample to a memory unit, randomly extracting a data sample from the memory unit, and updating the network parameters of the deep Q network model; the first reward value is determined based on the reward function;
step 5, traversing the plurality of time intervals, executing the step 3 to the step 4, completing one iterative training of the deep Q network model, and obtaining a maximum action value profit value of the trained air network energy efficiency;
step 6, traversing the preset number of iterations, executing the steps 3 to 5, and training the deep Q network model for the preset number of iterations to obtain the trained optimal action value profit value of the air network energy efficiency; wherein the optimal action value profit value is the maximum action value profit value of the air network energy efficiency obtained through the last iterative training.
5. The air network resource allocation method according to claim 3, wherein before iteratively training a deep Q network model in the deep reinforcement learning algorithm based on the state information and the reward function of each low-altitude platform device in the air network to obtain the optimal action value revenue value of the air network energy efficiency, the method further comprises:
dividing the time of each training into a plurality of time intervals;
determining, based on a network capacity of the over-the-air network and a total deployment cost of the over-the-air network, a first increment of the network capacity at a current time interval relative to a previous time interval and a second increment of the total deployment cost at the current time interval relative to the previous time interval;
determining a reward function in the deep reinforcement learning algorithm based on a ratio of the first increment to the second increment.
6. The method of claim 4, wherein establishing a state space and an action space according to the state information of each low-altitude platform device in the air network comprises:
establishing a state space according to the deployment position and the transmitting power of each low-altitude platform device and the channel allocation information of each terminal;
and establishing an action space according to the movement type of each low-altitude platform device, the adjustment type of the transmitting power of each low-altitude platform device and the access state information between each low-altitude platform device and each terminal.
7. The air network resource allocation method according to any of claims 1-6, wherein said air base station is adapted to control the radio coverage of a cell; the aerial radio frequency unit is used for expanding the wireless coverage range under the control of the aerial base station; the millimeter wave enhanced unmanned aerial vehicle is used for expanding the capacity of the aerial base station and the aerial radio frequency unit.
8. An over-the-air network resource allocation apparatus, comprising:
the system comprises an establishing module, a resource optimization problem model and a resource optimization problem model, wherein the establishing module is used for establishing a resource optimization problem model of an air network by taking the maximum air network energy efficiency as a target and combining target constraint conditions; the air network is constructed based on multiple types of low-altitude platform equipment;
the allocation module is used for solving the resource optimization problem model by adopting a deep reinforcement learning algorithm and determining the target deployment position, the target transmitting power and the channel allocation information of each terminal of each low-altitude platform device in the air network; the multi-type low-altitude platform equipment comprises an aerial base station, an aerial radio frequency unit and a millimeter wave enhanced unmanned aerial vehicle; the channel allocation information is the channel information between the terminal and each low-altitude platform device.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the over the air network resource allocation method of any of claims 1 to 7.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the over the air network resource allocation method of any of claims 1 to 7.
CN202211048190.5A 2022-08-30 2022-08-30 Air network resource allocation method, device, electronic equipment and storage medium Pending CN115567093A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211048190.5A CN115567093A (en) 2022-08-30 2022-08-30 Air network resource allocation method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115567093A 2023-01-03


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180084530A1 (en) * 2007-04-30 2018-03-22 Lg Electronics Inc. Method for controlling radio resource allocation in mobile communication system
CN111988762A (en) * 2020-09-01 2020-11-24 重庆邮电大学 Energy efficiency maximum resource allocation method based on unmanned aerial vehicle D2D communication network
CN113055075A (en) * 2021-03-02 2021-06-29 中国电子科技集团公司第三十八研究所 HAP deployment and resource allocation method in air-space-ground integrated network system
CN114189891A (en) * 2021-12-14 2022-03-15 沈阳航空航天大学 Unmanned aerial vehicle heterogeneous network energy efficiency optimization method based on deep reinforcement learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination