CN110809274B - Unmanned aerial vehicle base station enhanced network optimization method for narrowband Internet of things


Info

Publication number: CN110809274B
Authority: CN (China)
Prior art keywords: unmanned aerial vehicle, base station, network, flight
Legal status: Active (granted)
Application number: CN201911030397.8A
Other languages: Chinese (zh)
Other versions: CN110809274A
Inventors: 李凡, 徐友云, 威力
Current assignee: Nanjing University of Posts and Telecommunications
Original assignee: Nanjing University of Posts and Telecommunications
Priority and filing date: 2019-10-28
Publication dates: 2020-02-18 (CN110809274A), 2023-04-21 (CN110809274B, grant)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/18 Network planning tools
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/14 Relay systems
    • H04B7/15 Active relay systems
    • H04B7/185 Space-based or airborne stations; Stations for satellite systems
    • H04B7/18502 Airborne stations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses an unmanned aerial vehicle base station enhanced network optimization method oriented to the narrowband Internet of things. The method optimizes the height of the deployed unmanned aerial vehicle base station according to the coverage radius of a loaded ground base station, so that the unmanned aerial vehicle base station covers, to the maximum extent, all congested and paralyzed networks within that base station's cell. It then optimizes the flight path of the deployed unmanned aerial vehicle base station with a deep Q-learning network, ensuring that the designated deployment position is reached in the shortest time, so that a traffic-offloading service can be provided to the network or an airspace communication network can be established, improving communication service quality. The invention improves the intelligence of the unmanned aerial vehicle through deep reinforcement learning, reduces the human resources required, optimizes congested networks, resolves network paralysis, improves the communication service quality of the network, and achieves the aim of network optimization.

Description

Unmanned aerial vehicle base station enhanced network optimization method for narrowband Internet of things
Technical Field
The invention belongs to the technical field of the Internet of things, and particularly relates to an unmanned aerial vehicle base station enhanced network optimization method for a narrowband Internet of things.
Background
Unmanned aerial vehicles have mainly been applied to street-view photography, monitoring and inspection, electric power line inspection, traffic monitoring, post-disaster rescue, military uses and the like. With the spread of fifth-generation mobile communication technology, unmanned aerial vehicle technology is increasingly applied in the communication field: inspection and maintenance of base stations and communication lines are now handled by mature unmanned aerial vehicle technology, which greatly reduces cost and improves the efficiency of maintenance and optimization work. Fifth-generation mobile communication also brings new challenges, however. The number of terminal devices will keep growing, and breakthroughs in narrowband Internet of things technology further accelerate that growth, so in sudden emergency and disaster-relief scenarios the ground communication infrastructure very easily becomes congested, or even suffers communication paralysis, which in turn hinders the deployment of rescue operations. Therefore, in local network congestion and network paralysis scenarios, using an unmanned aerial vehicle aerial base station to assist or replace a ground base station has become an efficient solution. In a network congestion scenario, the unmanned aerial vehicle aerial base station and the ground base station jointly establish an air-ground integrated network that provides a traffic-offloading service for the cell base station and effectively improves communication service quality; in a network paralysis scenario, the unmanned aerial vehicle aerial base station replaces the ground base station and establishes an airspace communication network, quickly restoring communication and improving communication service quality.
Unmanned aerial vehicles are developing rapidly in communications thanks to their small size, high flexibility and low price, but they also face great challenges. Although some progress has been made in research on unmanned aerial vehicles assisting ground base stations, their control has not been studied in depth. The power of an unmanned aerial vehicle is limited, and how to deploy it so as to maximize user coverage also remains a great challenge. Artificial intelligence has evolved over several decades, with significant breakthroughs in recent years, and has entered a new-generation artificial intelligence era. Driven by the Internet, big data, neural networks, deep learning and related technologies, its level of intelligence is approaching that of human beings, and in certain fields it even surpasses them. Deep learning and reinforcement learning are two major fields of artificial intelligence: deep learning has strong perception capability but lacks decision-making capability, while reinforcement learning has decision-making capability but does not address perception. Deep reinforcement learning therefore combines the perception capability of deep learning with the decision-making capability of reinforcement learning; the two are complementary, control strategies can be learned directly from high-dimensional raw data, and the result is an artificial intelligence method closer to the human way of thinking. Deep reinforcement learning has achieved breakthroughs in games, robotics, recommendation systems and other fields. Most current unmanned aerial vehicle technology requires precise manual control; combining unmanned aerial vehicle technology with deep reinforcement learning can make the unmanned aerial vehicle intelligent, even letting it complete complex work in place of humans and reducing the human resources required. When the network is congested or even paralyzed, a deep reinforcement learning network is first applied to simulate the unmanned aerial vehicle and train it to reach the target location quickly and effectively; the unmanned aerial vehicle is then deployed. For a congested network it provides a traffic-offloading service and reduces the load on the ground network; for a paralyzed network, the unmanned aerial vehicle serves as an aerial base station in place of the loaded ground base station, establishing an airspace communication network and quickly restoring ground communication.
Current research on unmanned aerial vehicle base stations covers the coverage capability of unmanned aerial vehicle base stations for emergency communication, resource allocation methods for unmanned aerial vehicle base stations in emergency scenarios, site selection and path optimization methods for unmanned aerial vehicle base stations, and the like, but there are few methods for optimizing the network served by the unmanned aerial vehicle base station itself.
Disclosure of Invention
The invention aims to: provide an unmanned aerial vehicle base station enhanced network optimization method oriented to the narrowband Internet of things, which improves the intelligence of the unmanned aerial vehicle through deep reinforcement learning, optimizes congested networks, resolves network paralysis, improves the communication service quality of the network, and achieves the aim of network optimization.
Technical scheme: the unmanned aerial vehicle base station enhanced network optimization method oriented to the narrowband Internet of things disclosed by the invention comprises the following steps:
(1) Optimize the height of the unmanned aerial vehicle base station from the radius of the loaded base station, deploy the unmanned aerial vehicle directly above the base station, and take the current real-time position coordinates of the unmanned aerial vehicle, given by the simulation environment, as the current state s_t of the unmanned aerial vehicle;
(2) According to the state s_t of the unmanned aerial vehicle, obtain the different flight schemes of the unmanned aerial vehicle under its different flight modes through the current value neural network, so that the simulation environment, following the ε-greedy strategy, selects one flight scheme a_t from them;
(3) According to the state s_t of the unmanned aerial vehicle and the flight scheme a_t selected by the simulation environment, the simulation environment gives the reward r_t under flight scheme a_t and the new state s_{t+1} that the unmanned aerial vehicle reaches after adopting scheme a_t;
(4) Train the neural network based on the two strategies of experience replay and a fixed target Q value, according to the deep Q-learning network of deep reinforcement learning; update the network parameters, and update the Q value under the ε-greedy strategy.
Further, the height optimization of the unmanned aerial vehicle base station in step (1) can be expressed by the following formula:

    PL_max = 20·log10( (4πf/c)·√(h² + R²) ) + η_LoS·P(LoS, θ) + η_NLoS·P(NLoS, θ)

where h is the optimal height of the unmanned aerial vehicle, R is the radius of the cell covered by the base station, c is the speed of light, PL_max is the maximum path loss that can be supported, and f is the carrier frequency. Letting

    θ = arctan(h/R),

the optimal height of the unmanned aerial vehicle base station can be calculated.
Further, the different flight modes in step (2) comprise six flight modes of the unmanned aerial vehicle: upward, downward, leftward, rightward, forward and backward.
Further, the ε-greedy strategy followed by the simulation environment in step (2) is:

    a_t = a* = argmax_{a∈A} Q(s_t, a; θ), with probability ε;
    a_t = a_random, with probability 1 − ε;

where a* = argmax_{a∈A} Q(s_t, a; θ) denotes the flight scheme whose Q value estimated by the current value neural network is largest, and a_random means that one flight scheme is selected at random from all possible flight schemes.
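As an illustration only (not part of the claimed method), this selection rule can be sketched in a few lines of Python; the function name epsilon_greedy and the use of NumPy are assumptions for the example. Note the convention used in this document: ε is the probability of the greedy choice, and 1 − ε the probability of a random one.

    import numpy as np

    rng = np.random.default_rng(0)

    def epsilon_greedy(q_values, epsilon):
        # Greedy flight scheme a* = argmax_a Q(s_t, a; θ) with probability ε ...
        if rng.random() < epsilon:
            return int(np.argmax(q_values))
        # ... otherwise a_random, drawn uniformly from all flight schemes.
        return int(rng.integers(len(q_values)))

    # Example with six flight schemes (up, down, left, right, forward, backward):
    q = np.array([0.1, -0.3, 0.5, 0.0, 0.2, -0.1])
    a_t = epsilon_greedy(q, epsilon=0.9)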
Further, step (4) comprises the following steps:
(41) Form the unmanned aerial vehicle state s_t obtained in step (1), the flight scheme a_t obtained in step (2), and the reward r_t and next state s_{t+1} obtained in step (3) into a data group (s_t, a_t, r_t, s_{t+1}); store the data groups in a memory bank, and once the number of data groups reaches the capacity of the memory bank, delete the earliest-generated data group;
(42) After the data in the memory bank reach its capacity, randomly select a batch of data groups to train the neural network. In each training step, calculate the actual Q value:

    y_i = r_i + γ·max_{a'} Q̂(s_{i+1}, a'; θ⁻)

and at the same time calculate the error function of each training step: L(θ) = (y_i − Q(s_i, a_i; θ))². The loss function is then back-propagated using gradient descent,

    θ ← θ − α·∂L(θ)/∂θ,

to update the network parameter θ, where ∂L(θ)/∂θ denotes the partial derivative of L(θ) with respect to θ and α denotes the learning rate;
(43) Set a fixed number of steps C; every C steps the current value neural network Q(s, a; θ) transmits its network parameters to the target value neural network Q̂(s, a; θ⁻), i.e. θ⁻ = θ;
(44) As the neural network parameter θ changes, repeat operations (41) and (42) until the error function converges, i.e. until the Q value predicted by the current value neural network Q(s, a; θ) is close to the actual Q value.
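For concreteness, sub-steps (41)-(44) can be sketched as a single PyTorch training step. This is a minimal illustration, not the patent's reference implementation: the capacity D = 10000, γ = 0.9, N = 32, C = 100 and the added terminal flag are assumed values for the example, while the two hidden layers of 64 neurons and the RMSProp optimizer follow the detailed description below.

    import random
    from collections import deque

    import torch
    import torch.nn as nn

    def make_q_net(state_dim=3, n_actions=6):
        # Two hidden layers of 64 neurons, as in the detailed description.
        return nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    q_net = make_q_net()                            # current value network Q(s, a; θ)
    target_net = make_q_net()                       # target value network Q̂(s, a; θ⁻)
    target_net.load_state_dict(q_net.state_dict())  # initial synchronization θ⁻ = θ
    optimizer = torch.optim.RMSprop(q_net.parameters(), lr=1e-3)  # α = 0.001

    # Memory bank of capacity D; entries are (s, a, r, s', done), the
    # data group of sub-step (41) plus an assumed terminal flag.
    memory = deque(maxlen=10_000)
    gamma, batch_size, C = 0.9, 32, 100             # γ, N, C (assumed values)

    def train_step(step):
        if len(memory) < batch_size:
            return
        batch = random.sample(memory, batch_size)   # (41)/(42) experience replay
        s, a, r, s1, done = zip(*batch)
        s = torch.as_tensor(s, dtype=torch.float32)
        a = torch.as_tensor(a, dtype=torch.int64)
        r = torch.as_tensor(r, dtype=torch.float32)
        s1 = torch.as_tensor(s1, dtype=torch.float32)
        done = torch.as_tensor(done, dtype=torch.float32)
        with torch.no_grad():
            # (42) actual Q value y_i = r_i + γ max_a' Q̂(s_{i+1}, a'; θ⁻);
            # for terminal transitions only the reward remains.
            y = r + gamma * target_net(s1).max(dim=1).values * (1.0 - done)
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s_i, a_i; θ)
        loss = ((y - q) ** 2).mean()                # L(θ), averaged over the batch
        optimizer.zero_grad()
        loss.backward()                             # ∂L(θ)/∂θ by back-propagation
        optimizer.step()                            # θ ← θ − α ∂L(θ)/∂θ
        if step % C == 0:                           # (43) θ⁻ = θ every C steps
            target_net.load_state_dict(q_net.state_dict())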
Beneficial effects: compared with the prior art, the invention has the following advantages. 1. The method calculates the hovering position of the unmanned aerial vehicle base station from the coverage of different base stations; only the simulation environment changes with the actual environment between the unmanned aerial vehicle and the base station, while the other parts of the algorithm remain completely unchanged, so the applicability is strong. 2. Under the network congestion and network paralysis scenarios of the narrowband Internet of things, the unmanned aerial vehicle base station can quickly reach the target location, provide a traffic-offloading service for the network or establish an airspace communication network, and improve communication service quality.
Drawings
FIG. 1 is a diagram of an enhanced network scenario for a base station of a drone;
FIG. 2 is a block diagram of a deep Q learning network;
FIG. 3 is a flow chart of a deep Q learning algorithm execution;
FIG. 4 is a graph of the loss function versus the number of training steps at different learning rates α.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings. The unmanned aerial vehicle base station enhanced network optimization method for the narrowband Internet of things comprises two parts: the first determines the optimal height at which the unmanned aerial vehicle base station should be deployed according to the coverage of the base station; the second simulates the path planning of the unmanned aerial vehicle to the target location through deep reinforcement learning and selects the optimal path.
(I) As shown in FIG. 1, network congestion and network paralysis exist within the coverage of the base station, and the network within that coverage is optimized by the unmanned aerial vehicle base station. The optimal height of the unmanned aerial vehicle base station is determined from the relationship between its height and the radius of the cell covered by the base station. In an urban environment, the radio signals emitted by the unmanned aerial vehicle base station first propagate in free space and then enter the urban environment, so free-space path loss exists; within the urban environment there are two propagation conditions, line of sight (LoS) and non-line of sight (NLoS), and in the invention the path losses under the two conditions are respectively:

    PL_LoS = 20·log10( (4πf/c)·√(h² + R²) ) + η_LoS
    PL_NLoS = 20·log10( (4πf/c)·√(h² + R²) ) + η_NLoS      (1)

where f is the carrier frequency, c is the speed of light, h is the height of the unmanned aerial vehicle, R is the radius of the cell covered by the base station, and η_LoS and η_NLoS are the additional losses of the LoS and NLoS environments.
For urban environments, the probability of line of sight occurring can be expressed by a simple modified sigmoid (S-curve) function:

    P(LoS, θ) = 1 / (1 + a·exp(−b(θ − a)))

where a and b are called the S-curve parameters, taking the values 0.3 and 500 respectively in the urban environment, and θ is the angle between the line from the unmanned aerial vehicle to the edge of the base station's coverage and the horizontal plane; the position of θ is marked in FIG. 1.
The probability of non-line of sight occurring is P(NLoS, θ) = 1 − P(LoS, θ). Because the unmanned aerial vehicle base station operates at the height of a low-altitude platform, the radius of the cell covered by the base station can be written as

    R = √( ( (c/(4πf))·10^{(PL − η_LoS·P(LoS,θ) − η_NLoS·P(NLoS,θ))/20} )² − h² )      (2)
where PL is the path loss, and the maximum path loss is

    PL_max = P(LoS, θ)·PL_LoS + P(NLoS, θ)·PL_NLoS      (3)
For a specific base station, the coverage range is known and the network within it is congested, so the optimal height of the unmanned aerial vehicle base station can be calculated from the coverage radius of the base station, and the unmanned aerial vehicle is deployed directly above the base station. A three-dimensional unmanned aerial vehicle simulation environment is built according to the actual environment between the unmanned aerial vehicle and the base station, and a suitable coordinate system is selected. The environment provides the real-time position of the unmanned aerial vehicle as the current state s_t. The initial position coordinates of the unmanned aerial vehicle are (a_0, b_0, c_0). Knowing that the radius of the cell covered by the base station is R, the relationship between R and the height of the unmanned aerial vehicle base station follows from equations (1)-(3):

    PL_max = 20·log10( (4πf/c)·√(h² + R²) ) + η_LoS·P(LoS, θ) + η_NLoS·P(NLoS, θ)      (4)

For a given maximum path loss value PL_max, with θ = arctan(h/R), solving equation (4) in h yields the optimal unmanned aerial vehicle height.
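A numerical sketch of this altitude calculation is given below, under the free-space-plus-excess-loss model of equations (1)-(4). The excess-loss values ETA_LOS and ETA_NLOS and the grid-search routine are illustrative assumptions (the patent only states that the equation is solved); the S-curve parameters 0.3 and 500 are taken from the text.

    import math

    A, B = 0.3, 500.0              # urban S-curve parameters from the text
    ETA_LOS, ETA_NLOS = 1.0, 20.0  # excess losses in dB (assumed values)
    C_LIGHT = 3e8                  # speed of light, m/s

    def mean_path_loss(h, R, f):
        # Equation (4): free-space loss to the cell edge plus the LoS/NLoS
        # excess losses weighted by P(LoS, θ) and P(NLoS, θ).
        d = math.hypot(h, R)                    # slant distance to the cell edge
        fspl = 20 * math.log10(4 * math.pi * f * d / C_LIGHT)
        theta = math.degrees(math.atan2(h, R))  # elevation angle θ = arctan(h/R)
        p_los = 1.0 / (1.0 + A * math.exp(-B * (theta - A)))
        return fspl + p_los * ETA_LOS + (1.0 - p_los) * ETA_NLOS

    def optimal_height(R, f, pl_max, h_lo=1.0, h_hi=2000.0, steps=2000):
        # Grid search: the largest height whose mean path loss at the
        # cell edge still stays within the supportable budget PL_max.
        best = None
        for i in range(steps + 1):
            h = h_lo + (h_hi - h_lo) * i / steps
            if mean_path_loss(h, R, f) <= pl_max:
                best = h
        return best

    # Example: 500 m cell radius, 2 GHz carrier, 110 dB path-loss budget.
    print(optimal_height(R=500.0, f=2e9, pl_max=110.0))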
The unmanned aerial vehicle is deployed directly above the base station, i.e. (a_1, b_1, c_1) is taken as the destination coordinates, where c_1 is the optimal height of the unmanned aerial vehicle base station obtained above. The obstacles in the actual unmanned aerial vehicle environment are expressed in coordinate form in the simulation environment, and the current real-time position of the unmanned aerial vehicle is taken as the current state s_t, i.e. s(x_t, y_t, z_t) denotes the coordinates of the unmanned aerial vehicle in the three-dimensional coordinate system of the simulation environment at the current moment.
The unmanned aerial vehicle is deployed directly above the base station because, when network congestion and network paralysis occur within the coverage of the base station, an aerial emergency base station deployed directly above it can optimize the network congestion and network paralysis of any area covered by that base station, improving the communication quality of the network over the base station's entire coverage.
(II) The optimal path of the unmanned aerial vehicle to the target location is determined by the Deep Q-learning Network method. The deep Q-learning network algorithm uses a memory bank of limited size to record the action taken in each state, the reward, and the resulting next state. Following the deep Q-learning network algorithm of deep reinforcement learning, the neural network is trained on the basis of two strategies, experience replay and a fixed target Q value; the network parameters are updated, and the Q value is updated under the ε-greedy strategy. The experience replay strategy means that the deep Q-learning network uses a memory bank of a certain capacity to store the data group (s_t, a_t, r_t, s_{t+1}) generated by the simulation environment at each moment, and the training data of the current value neural network and the target value neural network are selected at random from this memory bank; historical data can thus be used effectively, and temporal correlation between samples is avoided. The fixed target Q-value strategy means that the deep Q-learning network algorithm uses two neural networks: the current value neural network Q(s, a; θ) and the target value neural network Q̂(s, a; θ⁻). The two networks have the same structure; the current value neural network Q(s, a; θ) is used to predict the Q value, while the target value neural network Q̂(s, a; θ⁻) provides the actual Q value and its parameters θ⁻ are held fixed between updates. At fixed intervals, the current value neural network Q(s, a; θ) transmits its network parameters to the target value neural network Q̂(s, a; θ⁻); that is, because the two structures are consistent, the parameters of the current value network are copied over periodically, so the weights of the target value neural network change slowly. Each time the parameters are updated, a portion of the data is drawn from the memory bank for the update, so as to break the correlation between samples. On the basis of the deep Q-learning network structure of FIG. 2 and the algorithm execution flow of FIG. 3, the specific operation steps are as follows:
step 1: according to the environment from a specific unmanned aerial vehicle to a base station, constructing a virtual simulation environment, and establishing a three-dimensional coordinate system in the simulation environment, wherein the starting position of the unmanned aerial vehicle is (a) 0 ,b 0 ,c 0 ) The target position is (a 1 ,b 1 ,c 1 ),c 1 The optimal height of the unmanned aerial vehicle is obtained. The simulation environment can provide the state of the unmanned aerial vehicle at each moment, and the current state is represented by the coordinates of the position of the unmanned aerial vehicle at the current moment, namely s t =s(x t ,y t ,z t ). The starting position of the unmanned aerial vehicle and the target position of the unmanned aerial vehicle are known in the simulation environment; can also provide a simulated environment to act a according to epsilon-greedy strategy t Prize r obtained later t And the next time unmanned state s t+1
Step 2: initialize the algorithm parameters. Initialize the memory bank with capacity D, used to store the data groups of the training process; initialize the current value neural network Q(s, a; θ) and the target value neural network Q̂(s, a; θ⁻), and synchronize the parameters of the two networks, θ⁻ = θ; set the discount coefficient γ, the learning rate α, the greedy strategy probability value ε, the batch size N, and the update interval C of the target value neural network parameters.
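As an illustrative sketch (names are assumptions, not from the patent), the memory bank initialized here can be written as a small Python class; the deque discards the earliest-generated data group automatically once capacity D is reached, and uniform random sampling breaks the time-sequence correlation between samples:

    import random
    from collections import deque

    class ReplayMemory:
        # Memory bank of capacity D storing data groups (s_t, a_t, r_t, s_{t+1}),
        # plus a terminal flag added here for convenience (an assumption).

        def __init__(self, capacity):
            self.buffer = deque(maxlen=capacity)  # oldest entry dropped when full

        def push(self, s, a, r, s_next, done):
            self.buffer.append((s, a, r, s_next, done))

        def sample(self, batch_size):
            # Uniform random selection avoids temporal correlation.
            return random.sample(self.buffer, batch_size)

        def __len__(self):
            return len(self.buffer)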
Step 3: according to the state s_t of the unmanned aerial vehicle, obtain a set of estimated flight schemes through the current value neural network, and let the simulation environment, following the ε-greedy strategy, select one flight scheme a_t from them. In the invention the flight modes of the unmanned aerial vehicle are six flight schemes: upward, downward, leftward, rightward, forward and backward. The ε-greedy strategy followed by the simulation environment means that the simulation environment randomly selects a flight scheme a_random with probability 1 − ε, or with probability ε selects the flight scheme

    a* = argmax_{a∈A} Q(s_t, a; θ)

where A denotes all possible flight schemes, a ∈ A denotes any one flight scheme, Q(s_t, a; θ) denotes the Q value of flight scheme a in unmanned aerial vehicle state s_t given by the current value neural network, and θ denotes the parameters of the current value neural network. In the current state s_t the unmanned aerial vehicle selects the flight scheme a_t following the ε-greedy strategy:

    a_t = a*, with probability ε;
    a_t = a_random, with probability 1 − ε;

where a* = argmax_{a∈A} Q(s_t, a; θ) denotes the flight scheme whose Q value estimated by the current value neural network is largest, and a_random means that one flight scheme is selected at random from all possible flight schemes.
Step 4: the unmanned aerial vehicle executes flight scheme a_t; the simulation environment then gives the reward r_t for this flight scheme and the state s_{t+1} that the unmanned aerial vehicle reaches at the next moment after adopting scheme a_t, and (s_t, a_t, r_t, s_{t+1}) is stored in the memory bank as one data group. The simulation environment contains obstacles to the unmanned aerial vehicle's flight as well as its flight destination. If the unmanned aerial vehicle collides with an obstacle after adopting a flight scheme, the simulation environment gives it a large negative reward, representing flight failure; if it reaches the destination after adopting a flight scheme, the simulation environment gives it a large positive reward, representing flight success; if it neither strikes an obstacle nor reaches the destination after adopting a flight scheme, the simulation environment gives it a small negative reward, representing the power consumption of the unmanned aerial vehicle.
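A minimal sketch of such a simulation environment is shown below; the grid size, reward magnitudes (+100, −100, −1) and class name are illustrative assumptions consistent with the reward scheme just described:

    import numpy as np

    # Six flight schemes as coordinate offsets in the 3-D grid.
    ACTIONS = {0: (0, 0, 1), 1: (0, 0, -1),   # up, down
               2: (-1, 0, 0), 3: (1, 0, 0),   # left, right
               4: (0, 1, 0), 5: (0, -1, 0)}   # forward, backward

    class UavGridEnv:
        def __init__(self, start, goal, obstacles, size=(20, 20, 20)):
            self.start, self.goal = np.array(start), np.array(goal)
            self.obstacles = {tuple(o) for o in obstacles}
            self.size = np.array(size)
            self.state = self.start.copy()

        def reset(self):
            self.state = self.start.copy()
            return self.state.copy()

        def step(self, action):
            # Move, staying inside the simulation volume.
            self.state = np.clip(self.state + ACTIONS[action], 0, self.size - 1)
            if tuple(self.state) in self.obstacles:
                return self.state.copy(), -100.0, True   # flight failure
            if np.array_equal(self.state, self.goal):
                return self.state.copy(), +100.0, True   # flight success
            return self.state.copy(), -1.0, False        # power consumption

    env = UavGridEnv(start=(0, 0, 0), goal=(10, 10, 8), obstacles=[(5, 5, 4)])
    s = env.reset()
    s, r, done = env.step(0)   # try flying upward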
Step 5: if the memory bank has not reached its capacity D, let s_t = s_{t+1} and return to step 3; if it has, go to step 6.
Step 6: randomly draw N data groups (s_i, a_i, r_i, s_{i+1}) from the memory bank to train the two neural networks. For each data group, the current value neural network produces the estimated Q value Q(s_i, a_i; θ), while the target value neural network provides the actual Q value, calculated as

    y_i = r_i + γ·max_{a'} Q̂(s_{i+1}, a'; θ⁻)

Note that the target value neural network Q̂, rather than the current value neural network Q, is used when calculating y_i.
Step 7: calculate the error function L(θ) = (y_i − Q(s_i, a_i; θ))² and update the parameters θ of the current value neural network by gradient descent. The loss function is back-propagated using gradient descent,

    θ ← θ − α·∂L(θ)/∂θ,

where ∂L(θ)/∂θ denotes the partial derivative of L(θ) with respect to θ and α denotes the learning rate. If α is set too small, the number of iterations required for the network to converge becomes very large; if α is set too large, each iteration may fail to reduce the error function and may even overshoot the local minimum so that convergence becomes impossible. The setting of the learning rate α is therefore important.
Step 8: every C steps, synchronize the network parameters of the target value neural network Q̂ with those of the current value neural network Q, i.e. θ⁻ = θ.
Step 9: return to step 3 until the current value neural network Q converges, then end.
Step 10: obtain the optimal flight strategy according to the value function Q(s, a), and then apply it in the simulation environment of the unmanned aerial vehicle to obtain the optimal path of the unmanned aerial vehicle.
The current value neural network Q(s, a; θ) and the target value neural network Q̂(s, a; θ⁻) are both fully connected neural networks with two hidden layers of 64 neurons each; the optimization method uses the RMSProp optimizer. The main parameter settings of the network are shown in Table 1.
TABLE 1 Main parameter settings
FIG. 4 plots the loss function obtained by training under these network parameters against the number of training steps for different learning rates α. The figure shows that convergence is fastest, with the best effect, when α is 0.001; when α is 0.01 the loss function clearly oscillates as training proceeds, with the worst effect; and although the loss function still decreases as training proceeds at the remaining learning rate, far more training steps are needed than at α = 0.001. In summary, the invention takes the learning rate α as 0.001; the network converges after only about 350 training steps and yields an optimal path. This achieves short-time efficiency in coping with network congestion and network paralysis: the network is optimized quickly without affecting communication quality, the explosive growth in the number of devices in the narrowband Internet of things is effectively handled, the communication service quality of the congested network is improved, and communication interruption caused by network paralysis is restored.

Claims (3)

1. The unmanned aerial vehicle base station enhanced network optimization method for the narrowband Internet of things is characterized by comprising the following steps of:
(1) Optimize the height of the unmanned aerial vehicle base station from the radius of the loaded base station, deploy the unmanned aerial vehicle directly above the base station, and take the current real-time position coordinates of the unmanned aerial vehicle, given by the simulation environment, as the current state s_t of the unmanned aerial vehicle;
(2) According to the state s_t of the unmanned aerial vehicle, obtain the different flight schemes of the unmanned aerial vehicle under its different flight modes through the current value neural network, so that the simulation environment, following the ε-greedy strategy, selects one flight scheme a_t from them;
(3) According to the state s_t of the unmanned aerial vehicle and the flight scheme a_t selected by the simulation environment, the simulation environment gives the reward r_t under flight scheme a_t and the new state s_{t+1} that the unmanned aerial vehicle reaches after adopting flight scheme a_t;
(4) Train the neural network based on the two strategies of experience replay and a fixed target Q value, according to the deep Q-learning network of deep reinforcement learning; update the network parameters, and update the Q value under the ε-greedy strategy;
the altitude optimization of the unmanned aerial vehicle base station in the step (1) can be achieved by the following formula:
Figure FDA0003936233620000011
where h is the optimal altitude of the unmanned aerial vehicle, R is the radius of the coverage cell of the base station, c is the speed of light, PL max Is the maximum path loss that can be supported, f is the carrier frequency; order the
Figure FDA0003936233620000012
The optimal height of the unmanned aerial vehicle base station can be calculated;
and step (4) comprises the following steps:
(41) Form the unmanned aerial vehicle state s_t obtained in step (1), the flight scheme a_t obtained in step (2), and the reward r_t and next state s_{t+1} obtained in step (3) into a data group (s_t, a_t, r_t, s_{t+1}); store the data groups in a memory bank, and once the number of data groups reaches the capacity of the memory bank, delete the earliest-generated data group;
(42) After the data in the memory bank reach its capacity, randomly select a batch of data groups to train the neural network; in each training step, calculate the actual Q value

    y_i = r_i + γ·max_{a'} Q̂(s_{i+1}, a'; θ⁻)

and at the same time calculate the error function of each training step, L(θ) = (y_i − Q(s_i, a_i; θ))²; the loss function is then back-propagated using gradient descent, θ ← θ − α·∂L(θ)/∂θ, to update the network parameter θ, where ∂L(θ)/∂θ denotes the partial derivative of L(θ) with respect to θ and α denotes the learning rate;
(43) Set a fixed number of steps C; every C steps the current value neural network Q(s, a; θ) transmits its network parameters to the target value neural network Q̂(s, a; θ⁻), i.e. θ⁻ = θ;
(44) As the neural network parameter θ changes, repeat operations (41) and (42) until the error function converges, i.e. until the Q value predicted by the current value neural network Q(s, a; θ) is close to the actual Q value.
2. The unmanned aerial vehicle base station enhanced network optimization method for the narrowband Internet of things according to claim 1, wherein the different flight modes in step (2) comprise six flight modes of the unmanned aerial vehicle, namely upward, downward, leftward, rightward, forward and backward.
3. The unmanned aerial vehicle base station enhanced network optimization method for the narrowband Internet of things according to claim 1, wherein the ε-greedy strategy followed by the simulation environment in step (2) is:

    a_t = a* = argmax_{a∈A} Q(s_t, a; θ), with probability ε;
    a_t = a_random, with probability 1 − ε;

where a* = argmax_{a∈A} Q(s_t, a; θ) denotes the flight scheme whose Q value estimated by the current value neural network is largest, and a_random means that one flight scheme is selected at random from all possible flight schemes.
CN201911030397.8A 2019-10-28 2019-10-28 Unmanned aerial vehicle base station enhanced network optimization method for narrowband Internet of things Active CN110809274B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911030397.8A CN110809274B (en) 2019-10-28 2019-10-28 Unmanned aerial vehicle base station enhanced network optimization method for narrowband Internet of things


Publications (2)

Publication Number Publication Date
CN110809274A CN110809274A (en) 2020-02-18
CN110809274B true CN110809274B (en) 2023-04-21

Family

ID=69489294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911030397.8A Active CN110809274B (en) 2019-10-28 2019-10-28 Unmanned aerial vehicle base station enhanced network optimization method for narrowband Internet of things

Country Status (1)

Country Link
CN (1) CN110809274B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111263332A (en) * 2020-03-02 2020-06-09 湖北工业大学 Unmanned aerial vehicle track and power joint optimization method based on deep reinforcement learning
CN111565065B (en) * 2020-03-24 2021-06-04 北京邮电大学 Unmanned aerial vehicle base station deployment method and device and electronic equipment
CN111506104B (en) * 2020-04-03 2021-10-01 北京邮电大学 Method and device for planning position of unmanned aerial vehicle
CN111786713B (en) * 2020-06-04 2021-06-08 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN112512115B (en) * 2020-11-20 2022-02-11 北京邮电大学 Method and device for determining position of air base station and electronic equipment
CN112511250B (en) * 2020-12-03 2022-06-03 中国人民解放军火箭军工程大学 DRL-based multi-unmanned aerial vehicle air base station dynamic deployment method and system
CN112636811B (en) * 2020-12-08 2021-11-30 北京邮电大学 Relay unmanned aerial vehicle deployment method and device
CN112867023B (en) * 2020-12-30 2021-11-19 北京理工大学 Method for minimizing perception data acquisition delay through dynamic scheduling of unmanned terminal
CN112867117B (en) * 2021-01-20 2022-04-12 重庆邮电大学 Energy-saving method based on Q learning in NB-IoT
CN113286314B (en) * 2021-05-25 2022-03-08 重庆邮电大学 Unmanned aerial vehicle base station deployment and user association method based on Q learning algorithm
CN114142912B (en) * 2021-11-26 2023-01-06 西安电子科技大学 Resource control method for guaranteeing time coverage continuity of high-dynamic air network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108848561A (en) * 2018-04-11 2018-11-20 湖北工业大学 A kind of isomery cellular network combined optimization method based on deeply study
CN109194583B (en) * 2018-08-07 2021-05-14 中国地质大学(武汉) Network congestion link diagnosis method and system based on deep reinforcement learning
CN109916372B (en) * 2019-01-18 2021-02-12 南京邮电大学 Method for calculating optimal height of unmanned aerial vehicle base station under condition of inaccurate channel state information
CN109819453B (en) * 2019-03-05 2021-07-06 西安电子科技大学 Cost optimization unmanned aerial vehicle base station deployment method based on improved genetic algorithm

Also Published As

Publication number Publication date
CN110809274A (en) 2020-02-18

Similar Documents

Publication Publication Date Title
CN110809274B (en) Unmanned aerial vehicle base station enhanced network optimization method for narrowband Internet of things
Bayerlein et al. Trajectory optimization for autonomous flying base station via reinforcement learning
CN112902969B (en) Path planning method of unmanned aerial vehicle in data collection process
CN111158401B (en) Distributed unmanned aerial vehicle path planning system and method for encouraging space-time data exploration
CN113162679A DDPG algorithm-based IRS (intelligent reflecting surface) assisted unmanned aerial vehicle communication joint optimization method
CN110650039B (en) Multimodal optimization-based network cooperative communication model for unmanned aerial vehicle cluster auxiliary vehicle
CN112929866B (en) Unmanned aerial vehicle deployment method for adaptively optimizing network coverage of urban disaster area
CN116185079B (en) Unmanned aerial vehicle construction inspection route planning method based on self-adaptive cruising
CN111381499B (en) Internet-connected aircraft self-adaptive control method based on three-dimensional space radio frequency map learning
US20230239037A1 Space-air-ground integrated UAV-assisted IoT data collection method based on AoI
CN112367111A (en) Unmanned aerial vehicle relay deployment method and system, computer equipment and application
CN113283169B (en) Three-dimensional group exploration method based on multi-head attention asynchronous reinforcement learning
CN113188547A (en) Unmanned aerial vehicle path planning method and device, controller and storage medium
CN111818535B (en) Wireless local area network three-dimensional optimization deployment method fusing multi-population optimization algorithm
CN112462805A (en) 5G networked unmanned aerial vehicle flight path planning method based on improved ant colony algorithm
CN114826380B (en) Unmanned aerial vehicle auxiliary air-ground communication optimization algorithm based on deep reinforcement learning algorithm
Hu et al. Multi-UAV coverage path planning: a distributed online cooperation method
CN113382060B (en) Unmanned aerial vehicle track optimization method and system in Internet of things data collection
Parvaresh et al. A continuous actor–critic deep Q-learning-enabled deployment of UAV base stations: Toward 6G small cells in the skies of smart cities
Sobouti et al. Managing sets of flying base stations using energy efficient 3D trajectory planning in cellular networks
CN117053790A (en) Single-antenna unmanned aerial vehicle auxiliary communication flight route-oriented planning method
CN110021168B (en) Grading decision method for realizing real-time intelligent traffic management under Internet of vehicles
CN116723470A (en) Determination method, device and equipment of movement track prediction model of air base station
CN116321237A (en) Unmanned aerial vehicle auxiliary internet of vehicles data collection method based on deep reinforcement learning
CN112783192A (en) Unmanned aerial vehicle path planning method, device, equipment and storage medium

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: Room 201, Building 2, Phase II, No. 1 Kechuang Road, Yaohua Street, Qixia District, Nanjing City, Jiangsu Province
Applicant after: Nanjing University of Posts and Telecommunications
Address before: No. 9 Yuen Road, Qixia District, Nanjing City, Jiangsu Province, 210046
Applicant before: Nanjing University of Posts and Telecommunications
GR01 Patent grant