CN110809274B - Unmanned aerial vehicle base station enhanced network optimization method for narrowband Internet of things


Info

Publication number: CN110809274B
Authority: CN (China)
Prior art keywords: unmanned aerial vehicle, base station, network, flight
Legal status: Active (granted)
Application number: CN201911030397.8A
Other languages: Chinese (zh)
Other versions: CN110809274A
Inventors: 李凡, 徐友云, 威力
Current assignee: Nanjing University of Posts and Telecommunications
Original assignee: Nanjing University of Posts and Telecommunications
Priority and filing date: 2019-10-28
Publication dates: 2020-02-18 (CN110809274A), 2023-04-21 (CN110809274B, grant)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/18 Network planning tools
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/14 Relay systems
    • H04B7/15 Active relay systems
    • H04B7/185 Space-based or airborne stations; Stations for satellite systems
    • H04B7/18502 Airborne stations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses an unmanned aerial vehicle base station enhanced network optimization method oriented to the narrowband Internet of things. The method optimizes the height of the deployed unmanned aerial vehicle base station according to the coverage radius of a loaded ground base station, so that the unmanned aerial vehicle base station covers, to the maximum extent, all congested and paralyzed networks within that base station's cell. It then optimizes the flight path of the deployed unmanned aerial vehicle base station with a deep Q-learning network, ensuring that the designated deployment position is reached in the shortest time, so that a traffic-offloading service can be provided to the network or an airspace communication network can be established, improving communication service quality. The invention improves the intelligence of the unmanned aerial vehicle through deep reinforcement learning, reduces the human resources required, optimizes congested networks, resolves network paralysis, improves the communication service quality of the network, and achieves the aim of network optimization.

Description

Unmanned aerial vehicle base station enhanced network optimization method for narrowband Internet of things
Technical Field
The invention belongs to the technical field of the Internet of things, and particularly relates to an unmanned aerial vehicle base station enhanced network optimization method for a narrowband Internet of things.
Background
Unmanned aerial vehicles have mainly been applied to street-view photography, monitoring and inspection, electric power line inspection, traffic monitoring, post-disaster rescue, military uses and the like. With the spread of fifth-generation mobile communication technology, unmanned aerial vehicle technology is increasingly applied in the communication field: inspection and maintenance of base stations and communication lines are now handled by mature unmanned aerial vehicle technology, which greatly reduces cost and improves the efficiency of maintenance and optimization work. Fifth-generation mobile communication also brings new challenges, however. The number of terminal devices will keep growing, and breakthroughs in narrowband Internet of things technology further accelerate that growth, so in sudden emergency and disaster-relief scenarios the ground communication infrastructure very easily becomes congested, or even suffers communication paralysis, which in turn hinders the deployment of rescue operations. Therefore, in local network congestion and network paralysis scenarios, using an unmanned aerial vehicle aerial base station to assist or replace a ground base station has become an efficient solution. In a network congestion scenario, the unmanned aerial vehicle aerial base station and the ground base station jointly establish an air-ground integrated network that provides a traffic-offloading service for the cell base station and effectively improves communication service quality; in a network paralysis scenario, the unmanned aerial vehicle aerial base station replaces the ground base station and establishes an airspace communication network, quickly restoring communication and improving communication service quality.
Unmanned aerial vehicles are developing rapidly in communications thanks to their small size, high flexibility and low price, but they also face great challenges. Although some progress has been made in research on unmanned aerial vehicles assisting ground base stations, their control has not been studied in depth. The power of an unmanned aerial vehicle is limited, and how to deploy it so as to maximize user coverage also remains a great challenge. Artificial intelligence has evolved over several decades, with significant breakthroughs in recent years, and has entered a new-generation artificial intelligence era. Driven by the Internet, big data, neural networks, deep learning and related technologies, its level of intelligence is approaching that of human beings, and in certain fields it even surpasses them. Deep learning and reinforcement learning are two major fields of artificial intelligence: deep learning has strong perception capability but lacks decision-making capability, while reinforcement learning has decision-making capability but does not address perception. Deep reinforcement learning therefore combines the perception capability of deep learning with the decision-making capability of reinforcement learning; the two are complementary, control strategies can be learned directly from high-dimensional raw data, and the result is an artificial intelligence method closer to the human way of thinking. Deep reinforcement learning has achieved breakthroughs in games, robotics, recommendation systems and other fields. Most current unmanned aerial vehicle technology requires precise manual control; combining unmanned aerial vehicle technology with deep reinforcement learning can make the unmanned aerial vehicle intelligent, even letting it complete complex work in place of humans and reducing the human resources required. When the network is congested or even paralyzed, a deep reinforcement learning network is first applied to simulate the unmanned aerial vehicle and train it to reach the target location quickly and effectively; the unmanned aerial vehicle is then deployed. For a congested network it provides a traffic-offloading service and reduces the load on the ground network; for a paralyzed network, the unmanned aerial vehicle serves as an aerial base station in place of the loaded ground base station, establishing an airspace communication network and quickly restoring ground communication.
Current research on unmanned aerial vehicle base stations covers the coverage capability of unmanned aerial vehicle base stations for emergency communication, resource allocation methods for unmanned aerial vehicle base stations in emergency scenarios, site selection and path optimization methods for unmanned aerial vehicle base stations, and the like, but there are few methods for optimizing the network served by the unmanned aerial vehicle base station itself.
Disclosure of Invention
The invention aims to: provide an unmanned aerial vehicle base station enhanced network optimization method oriented to the narrowband Internet of things, which improves the intelligence of the unmanned aerial vehicle through deep reinforcement learning, optimizes congested networks, resolves network paralysis, improves the communication service quality of the network, and achieves the aim of network optimization.
Technical scheme: the unmanned aerial vehicle base station enhanced network optimization method oriented to the narrowband Internet of things disclosed by the invention comprises the following steps:
(1) Optimize the height of the unmanned aerial vehicle base station from the radius of the loaded base station, deploy the unmanned aerial vehicle directly above the base station, and take the current real-time position coordinates of the unmanned aerial vehicle, given by the simulation environment, as the current state s_t of the unmanned aerial vehicle;
(2) According to the state s_t of the unmanned aerial vehicle, obtain the different flight schemes of the unmanned aerial vehicle under its different flight modes through the current value neural network, so that the simulation environment, following the ε-greedy strategy, selects one flight scheme a_t from them;
(3) According to the state s_t of the unmanned aerial vehicle and the flight scheme a_t selected by the simulation environment, the simulation environment gives the reward r_t under flight scheme a_t and the new state s_{t+1} that the unmanned aerial vehicle reaches after adopting scheme a_t;
(4) Train the neural network based on the two strategies of experience replay and a fixed target Q value, according to the deep Q-learning network of deep reinforcement learning; update the network parameters, and update the Q value under the ε-greedy strategy.
Further, the height optimization of the unmanned aerial vehicle base station in step (1) can be expressed by the following formula:

    PL_max = 20·log10( (4πf/c)·√(h² + R²) ) + η_LoS·P(LoS, θ) + η_NLoS·P(NLoS, θ)

where h is the optimal height of the unmanned aerial vehicle, R is the radius of the cell covered by the base station, c is the speed of light, PL_max is the maximum path loss that can be supported, and f is the carrier frequency. Letting

    θ = arctan(h/R),

the optimal height of the unmanned aerial vehicle base station can be calculated.
Further, the different flight modes in step (2) comprise six flight modes of the unmanned aerial vehicle: upward, downward, leftward, rightward, forward and backward.
Further, the ε-greedy strategy followed by the simulation environment in step (2) is:

    a_t = a* = argmax_{a∈A} Q(s_t, a; θ), with probability ε;
    a_t = a_random, with probability 1 − ε;

where a* = argmax_{a∈A} Q(s_t, a; θ) denotes the flight scheme whose Q value estimated by the current value neural network is largest, and a_random means that one flight scheme is selected at random from all possible flight schemes.
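As an illustration only (not part of the claimed method), this selection rule can be sketched in a few lines of Python; the function name epsilon_greedy and the use of NumPy are assumptions for the example. Note the convention used in this document: ε is the probability of the greedy choice, and 1 − ε the probability of a random one.

    import numpy as np

    rng = np.random.default_rng(0)

    def epsilon_greedy(q_values, epsilon):
        # Greedy flight scheme a* = argmax_a Q(s_t, a; θ) with probability ε ...
        if rng.random() < epsilon:
            return int(np.argmax(q_values))
        # ... otherwise a_random, drawn uniformly from all flight schemes.
        return int(rng.integers(len(q_values)))

    # Example with six flight schemes (up, down, left, right, forward, backward):
    q = np.array([0.1, -0.3, 0.5, 0.0, 0.2, -0.1])
    a_t = epsilon_greedy(q, epsilon=0.9)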
Further, step (4) comprises the following steps:
(41) Form the unmanned aerial vehicle state s_t obtained in step (1), the flight scheme a_t obtained in step (2), and the reward r_t and next state s_{t+1} obtained in step (3) into a data group (s_t, a_t, r_t, s_{t+1}); store the data groups in a memory bank, and once the number of data groups reaches the capacity of the memory bank, delete the earliest-generated data group;
(42) After the data in the memory bank reach its capacity, randomly select a batch of data groups to train the neural network. In each training step, calculate the actual Q value:

    y_i = r_i + γ·max_{a'} Q̂(s_{i+1}, a'; θ⁻)

and at the same time calculate the error function of each training step: L(θ) = (y_i − Q(s_i, a_i; θ))². The loss function is then back-propagated using gradient descent,

    θ ← θ − α·∂L(θ)/∂θ,

to update the network parameter θ, where ∂L(θ)/∂θ denotes the partial derivative of L(θ) with respect to θ and α denotes the learning rate;
(43) Set a fixed number of steps C; every C steps the current value neural network Q(s, a; θ) transmits its network parameters to the target value neural network Q̂(s, a; θ⁻), i.e. θ⁻ = θ;
(44) As the neural network parameter θ changes, repeat operations (41) and (42) until the error function converges, i.e. until the Q value predicted by the current value neural network Q(s, a; θ) is close to the actual Q value.
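For concreteness, sub-steps (41)-(44) can be sketched as a single PyTorch training step. This is a minimal illustration, not the patent's reference implementation: the capacity D = 10000, γ = 0.9, N = 32, C = 100 and the added terminal flag are assumed values for the example, while the two hidden layers of 64 neurons and the RMSProp optimizer follow the detailed description below.

    import random
    from collections import deque

    import torch
    import torch.nn as nn

    def make_q_net(state_dim=3, n_actions=6):
        # Two hidden layers of 64 neurons, as in the detailed description.
        return nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    q_net = make_q_net()                            # current value network Q(s, a; θ)
    target_net = make_q_net()                       # target value network Q̂(s, a; θ⁻)
    target_net.load_state_dict(q_net.state_dict())  # initial synchronization θ⁻ = θ
    optimizer = torch.optim.RMSprop(q_net.parameters(), lr=1e-3)  # α = 0.001

    # Memory bank of capacity D; entries are (s, a, r, s', done), the
    # data group of sub-step (41) plus an assumed terminal flag.
    memory = deque(maxlen=10_000)
    gamma, batch_size, C = 0.9, 32, 100             # γ, N, C (assumed values)

    def train_step(step):
        if len(memory) < batch_size:
            return
        batch = random.sample(memory, batch_size)   # (41)/(42) experience replay
        s, a, r, s1, done = zip(*batch)
        s = torch.as_tensor(s, dtype=torch.float32)
        a = torch.as_tensor(a, dtype=torch.int64)
        r = torch.as_tensor(r, dtype=torch.float32)
        s1 = torch.as_tensor(s1, dtype=torch.float32)
        done = torch.as_tensor(done, dtype=torch.float32)
        with torch.no_grad():
            # (42) actual Q value y_i = r_i + γ max_a' Q̂(s_{i+1}, a'; θ⁻);
            # for terminal transitions only the reward remains.
            y = r + gamma * target_net(s1).max(dim=1).values * (1.0 - done)
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s_i, a_i; θ)
        loss = ((y - q) ** 2).mean()                # L(θ), averaged over the batch
        optimizer.zero_grad()
        loss.backward()                             # ∂L(θ)/∂θ by back-propagation
        optimizer.step()                            # θ ← θ − α ∂L(θ)/∂θ
        if step % C == 0:                           # (43) θ⁻ = θ every C steps
            target_net.load_state_dict(q_net.state_dict())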
Beneficial effects: compared with the prior art, the invention has the following advantages. 1. The method calculates the hovering position of the unmanned aerial vehicle base station from the coverage of different base stations; only the simulation environment changes with the actual environment between the unmanned aerial vehicle and the base station, while the other parts of the algorithm remain completely unchanged, so the applicability is strong. 2. Under the network congestion and network paralysis scenarios of the narrowband Internet of things, the unmanned aerial vehicle base station can quickly reach the target location, provide a traffic-offloading service for the network or establish an airspace communication network, and improve communication service quality.
Drawings
FIG. 1 is a diagram of an enhanced network scenario for a base station of a drone;
FIG. 2 is a block diagram of a deep Q learning network;
FIG. 3 is a flow chart of a deep Q learning algorithm execution;
FIG. 4 is a graph of the loss function versus the number of training steps at different learning rates α.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings. The unmanned aerial vehicle base station enhanced network optimization method for the narrowband Internet of things comprises two parts: the first determines the optimal height at which the unmanned aerial vehicle base station should be deployed according to the coverage of the base station; the second simulates the path planning of the unmanned aerial vehicle to the target location through deep reinforcement learning and selects the optimal path.
(I) As shown in FIG. 1, network congestion and network paralysis exist within the coverage of the base station, and the network within that coverage is optimized by the unmanned aerial vehicle base station. The optimal height of the unmanned aerial vehicle base station is determined from the relationship between its height and the radius of the cell covered by the base station. In an urban environment, the radio signals emitted by the unmanned aerial vehicle base station first propagate in free space and then enter the urban environment, so free-space path loss exists; within the urban environment there are two propagation conditions, line of sight (LoS) and non-line of sight (NLoS), and in the invention the path losses under the two conditions are respectively:

    PL_LoS = 20·log10( (4πf/c)·√(h² + R²) ) + η_LoS
    PL_NLoS = 20·log10( (4πf/c)·√(h² + R²) ) + η_NLoS      (1)

where f is the carrier frequency, c is the speed of light, h is the height of the unmanned aerial vehicle, R is the radius of the cell covered by the base station, and η_LoS and η_NLoS are the additional losses of the LoS and NLoS environments.
For urban environments, the probability of line of sight occurring can be expressed by a simple modified sigmoid (S-curve) function:

    P(LoS, θ) = 1 / (1 + a·exp(−b(θ − a)))

where a and b are called the S-curve parameters, taking the values 0.3 and 500 respectively in the urban environment, and θ is the angle between the line from the unmanned aerial vehicle to the edge of the base station's coverage and the horizontal plane; the position of θ is marked in FIG. 1.
The probability of non-line of sight occurring is P(NLoS, θ) = 1 − P(LoS, θ). Because the unmanned aerial vehicle base station operates at the height of a low-altitude platform, the radius of the cell covered by the base station can be written as

    R = √( ( (c/(4πf))·10^{(PL − η_LoS·P(LoS,θ) − η_NLoS·P(NLoS,θ))/20} )² − h² )      (2)
where PL is the path loss, and the maximum path loss is

    PL_max = P(LoS, θ)·PL_LoS + P(NLoS, θ)·PL_NLoS      (3)
For a specific base station, the coverage range is known and the network within it is congested, so the optimal height of the unmanned aerial vehicle base station can be calculated from the coverage radius of the base station, and the unmanned aerial vehicle is deployed directly above the base station. A three-dimensional unmanned aerial vehicle simulation environment is built according to the actual environment between the unmanned aerial vehicle and the base station, and a suitable coordinate system is selected. The environment provides the real-time position of the unmanned aerial vehicle as the current state s_t. The initial position coordinates of the unmanned aerial vehicle are (a_0, b_0, c_0). Knowing that the radius of the cell covered by the base station is R, the relationship between R and the height of the unmanned aerial vehicle base station follows from equations (1)-(3):

    PL_max = 20·log10( (4πf/c)·√(h² + R²) ) + η_LoS·P(LoS, θ) + η_NLoS·P(NLoS, θ)      (4)

For a given maximum path loss value PL_max, with θ = arctan(h/R), solving equation (4) in h yields the optimal unmanned aerial vehicle height.
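A numerical sketch of this altitude calculation is given below, under the free-space-plus-excess-loss model of equations (1)-(4). The excess-loss values ETA_LOS and ETA_NLOS and the grid-search routine are illustrative assumptions (the patent only states that the equation is solved); the S-curve parameters 0.3 and 500 are taken from the text.

    import math

    A, B = 0.3, 500.0              # urban S-curve parameters from the text
    ETA_LOS, ETA_NLOS = 1.0, 20.0  # excess losses in dB (assumed values)
    C_LIGHT = 3e8                  # speed of light, m/s

    def mean_path_loss(h, R, f):
        # Equation (4): free-space loss to the cell edge plus the LoS/NLoS
        # excess losses weighted by P(LoS, θ) and P(NLoS, θ).
        d = math.hypot(h, R)                    # slant distance to the cell edge
        fspl = 20 * math.log10(4 * math.pi * f * d / C_LIGHT)
        theta = math.degrees(math.atan2(h, R))  # elevation angle θ = arctan(h/R)
        p_los = 1.0 / (1.0 + A * math.exp(-B * (theta - A)))
        return fspl + p_los * ETA_LOS + (1.0 - p_los) * ETA_NLOS

    def optimal_height(R, f, pl_max, h_lo=1.0, h_hi=2000.0, steps=2000):
        # Grid search: the largest height whose mean path loss at the
        # cell edge still stays within the supportable budget PL_max.
        best = None
        for i in range(steps + 1):
            h = h_lo + (h_hi - h_lo) * i / steps
            if mean_path_loss(h, R, f) <= pl_max:
                best = h
        return best

    # Example: 500 m cell radius, 2 GHz carrier, 110 dB path-loss budget.
    print(optimal_height(R=500.0, f=2e9, pl_max=110.0))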
The unmanned aerial vehicle is deployed directly above the base station, i.e. (a_1, b_1, c_1) is taken as the destination coordinates, where c_1 is the optimal height of the unmanned aerial vehicle base station obtained above. The obstacles in the actual unmanned aerial vehicle environment are expressed in coordinate form in the simulation environment, and the current real-time position of the unmanned aerial vehicle is taken as the current state s_t, i.e. s(x_t, y_t, z_t) denotes the coordinates of the unmanned aerial vehicle in the three-dimensional coordinate system of the simulation environment at the current moment.
The unmanned aerial vehicle is deployed directly above the base station because, when network congestion and network paralysis occur within the coverage of the base station, an aerial emergency base station deployed directly above it can optimize the network congestion and network paralysis of any area covered by that base station, improving the communication quality of the network over the base station's entire coverage.
(II) The optimal path of the unmanned aerial vehicle to the target location is determined by the Deep Q-learning Network method. The deep Q-learning network algorithm uses a memory bank of limited size to record the action taken in each state, the reward, and the resulting next state. Following the deep Q-learning network algorithm of deep reinforcement learning, the neural network is trained on the basis of two strategies, experience replay and a fixed target Q value; the network parameters are updated, and the Q value is updated under the ε-greedy strategy. The experience replay strategy means that the deep Q-learning network uses a memory bank of a certain capacity to store the data group (s_t, a_t, r_t, s_{t+1}) generated by the simulation environment at each moment, and the training data of the current value neural network and the target value neural network are selected at random from this memory bank; historical data can thus be used effectively, and temporal correlation between samples is avoided. The fixed target Q-value strategy means that the deep Q-learning network algorithm uses two neural networks: the current value neural network Q(s, a; θ) and the target value neural network Q̂(s, a; θ⁻). The two networks have the same structure; the current value neural network Q(s, a; θ) is used to predict the Q value, while the target value neural network Q̂(s, a; θ⁻) provides the actual Q value and its parameters θ⁻ are held fixed between updates. At fixed intervals, the current value neural network Q(s, a; θ) transmits its network parameters to the target value neural network Q̂(s, a; θ⁻); that is, because the two structures are consistent, the parameters of the current value network are copied over periodically, so the weights of the target value neural network change slowly. Each time the parameters are updated, a portion of the data is drawn from the memory bank for the update, so as to break the correlation between samples. On the basis of the deep Q-learning network structure of FIG. 2 and the algorithm execution flow of FIG. 3, the specific operation steps are as follows:
step 1: according to the environment from a specific unmanned aerial vehicle to a base station, constructing a virtual simulation environment, and establishing a three-dimensional coordinate system in the simulation environment, wherein the starting position of the unmanned aerial vehicle is (a) 0 ,b 0 ,c 0 ) The target position is (a 1 ,b 1 ,c 1 ),c 1 The optimal height of the unmanned aerial vehicle is obtained. The simulation environment can provide the state of the unmanned aerial vehicle at each moment, and the current state is represented by the coordinates of the position of the unmanned aerial vehicle at the current moment, namely s t =s(x t ,y t ,z t ). The starting position of the unmanned aerial vehicle and the target position of the unmanned aerial vehicle are known in the simulation environment; can also provide a simulated environment to act a according to epsilon-greedy strategy t Prize r obtained later t And the next time unmanned state s t+1
Step 2: initialize the algorithm parameters. Initialize the memory bank with capacity D, used to store the data groups of the training process; initialize the current value neural network Q(s, a; θ) and the target value neural network Q̂(s, a; θ⁻), and synchronize the parameters of the two networks, θ⁻ = θ; set the discount coefficient γ, the learning rate α, the greedy strategy probability value ε, the batch size N, and the update interval C of the target value neural network parameters.
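As an illustrative sketch (names are assumptions, not from the patent), the memory bank initialized here can be written as a small Python class; the deque discards the earliest-generated data group automatically once capacity D is reached, and uniform random sampling breaks the time-sequence correlation between samples:

    import random
    from collections import deque

    class ReplayMemory:
        # Memory bank of capacity D storing data groups (s_t, a_t, r_t, s_{t+1}),
        # plus a terminal flag added here for convenience (an assumption).

        def __init__(self, capacity):
            self.buffer = deque(maxlen=capacity)  # oldest entry dropped when full

        def push(self, s, a, r, s_next, done):
            self.buffer.append((s, a, r, s_next, done))

        def sample(self, batch_size):
            # Uniform random selection avoids temporal correlation.
            return random.sample(self.buffer, batch_size)

        def __len__(self):
            return len(self.buffer)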
Step 3: according to the state s_t of the unmanned aerial vehicle, obtain a set of estimated flight schemes through the current value neural network, and let the simulation environment, following the ε-greedy strategy, select one flight scheme a_t from them. In the invention the flight modes of the unmanned aerial vehicle are six flight schemes: upward, downward, leftward, rightward, forward and backward. The ε-greedy strategy followed by the simulation environment means that the simulation environment randomly selects a flight scheme a_random with probability 1 − ε, or with probability ε selects the flight scheme

    a* = argmax_{a∈A} Q(s_t, a; θ)

where A denotes all possible flight schemes, a ∈ A denotes any one flight scheme, Q(s_t, a; θ) denotes the Q value of flight scheme a in unmanned aerial vehicle state s_t given by the current value neural network, and θ denotes the parameters of the current value neural network. In the current state s_t the unmanned aerial vehicle selects the flight scheme a_t following the ε-greedy strategy:

    a_t = a*, with probability ε;
    a_t = a_random, with probability 1 − ε;

where a* = argmax_{a∈A} Q(s_t, a; θ) denotes the flight scheme whose Q value estimated by the current value neural network is largest, and a_random means that one flight scheme is selected at random from all possible flight schemes.
Step 4: the unmanned aerial vehicle executes flight scheme a_t; the simulation environment then gives the reward r_t for this flight scheme and the state s_{t+1} that the unmanned aerial vehicle reaches at the next moment after adopting scheme a_t, and (s_t, a_t, r_t, s_{t+1}) is stored in the memory bank as one data group. The simulation environment contains obstacles to the unmanned aerial vehicle's flight as well as its flight destination. If the unmanned aerial vehicle collides with an obstacle after adopting a flight scheme, the simulation environment gives it a large negative reward, representing flight failure; if it reaches the destination after adopting a flight scheme, the simulation environment gives it a large positive reward, representing flight success; if it neither strikes an obstacle nor reaches the destination after adopting a flight scheme, the simulation environment gives it a small negative reward, representing the power consumption of the unmanned aerial vehicle.
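A minimal sketch of such a simulation environment is shown below; the grid size, reward magnitudes (+100, −100, −1) and class name are illustrative assumptions consistent with the reward scheme just described:

    import numpy as np

    # Six flight schemes as coordinate offsets in the 3-D grid.
    ACTIONS = {0: (0, 0, 1), 1: (0, 0, -1),   # up, down
               2: (-1, 0, 0), 3: (1, 0, 0),   # left, right
               4: (0, 1, 0), 5: (0, -1, 0)}   # forward, backward

    class UavGridEnv:
        def __init__(self, start, goal, obstacles, size=(20, 20, 20)):
            self.start, self.goal = np.array(start), np.array(goal)
            self.obstacles = {tuple(o) for o in obstacles}
            self.size = np.array(size)
            self.state = self.start.copy()

        def reset(self):
            self.state = self.start.copy()
            return self.state.copy()

        def step(self, action):
            # Move, staying inside the simulation volume.
            self.state = np.clip(self.state + ACTIONS[action], 0, self.size - 1)
            if tuple(self.state) in self.obstacles:
                return self.state.copy(), -100.0, True   # flight failure
            if np.array_equal(self.state, self.goal):
                return self.state.copy(), +100.0, True   # flight success
            return self.state.copy(), -1.0, False        # power consumption

    env = UavGridEnv(start=(0, 0, 0), goal=(10, 10, 8), obstacles=[(5, 5, 4)])
    s = env.reset()
    s, r, done = env.step(0)   # try flying upward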
Step 5: if the memory bank has not reached its capacity D, let s_t = s_{t+1} and return to step 3; if it has, go to step 6.
Step 6: randomly draw N data groups (s_i, a_i, r_i, s_{i+1}) from the memory bank to train the two neural networks. For each data group, the current value neural network produces the estimated Q value Q(s_i, a_i; θ), while the target value neural network provides the actual Q value, calculated as

    y_i = r_i + γ·max_{a'} Q̂(s_{i+1}, a'; θ⁻)

Note that the target value neural network Q̂, rather than the current value neural network Q, is used when calculating y_i.
Step 7: calculate the error function L(θ) = (y_i − Q(s_i, a_i; θ))² and update the parameters θ of the current value neural network by gradient descent. The loss function is back-propagated using gradient descent,

    θ ← θ − α·∂L(θ)/∂θ,

where ∂L(θ)/∂θ denotes the partial derivative of L(θ) with respect to θ and α denotes the learning rate. If α is set too small, the number of iterations required for the network to converge becomes very large; if α is set too large, each iteration may fail to reduce the error function and may even overshoot the local minimum so that convergence becomes impossible. The setting of the learning rate α is therefore important.
Step 8: every C steps, synchronize the network parameters of the target value neural network Q̂ with those of the current value neural network Q, i.e. θ⁻ = θ.
Step 9: return to step 3 until the current value neural network Q converges, then end.
Step 10: obtain the optimal flight strategy according to the value function Q(s, a), and then apply it in the simulation environment of the unmanned aerial vehicle to obtain the optimal path of the unmanned aerial vehicle.
The current value neural network Q(s, a; θ) and the target value neural network Q̂(s, a; θ⁻) are both fully connected neural networks with two hidden layers of 64 neurons each; the optimization method uses the RMSProp optimizer. The main parameter settings of the network are shown in Table 1.
TABLE 1 Main parameter settings
FIG. 4 plots the loss function obtained by training under these network parameters against the number of training steps for different learning rates α. The figure shows that convergence is fastest, with the best effect, when α is 0.001; when α is 0.01 the loss function clearly oscillates as training proceeds, with the worst effect; and although the loss function still decreases as training proceeds at the remaining learning rate, far more training steps are needed than at α = 0.001. In summary, the invention takes the learning rate α as 0.001; the network converges after only about 350 training steps and yields an optimal path. This achieves short-time efficiency in coping with network congestion and network paralysis: the network is optimized quickly without affecting communication quality, the explosive growth in the number of devices in the narrowband Internet of things is effectively handled, the communication service quality of the congested network is improved, and communication interruption caused by network paralysis is restored.

Claims (3)

1. The unmanned aerial vehicle base station enhanced network optimization method for the narrowband Internet of things is characterized by comprising the following steps of:
(1) Optimize the height of the unmanned aerial vehicle base station from the radius of the loaded base station, deploy the unmanned aerial vehicle directly above the base station, and take the current real-time position coordinates of the unmanned aerial vehicle, given by the simulation environment, as the current state s_t of the unmanned aerial vehicle;
(2) According to the state s_t of the unmanned aerial vehicle, obtain the different flight schemes of the unmanned aerial vehicle under its different flight modes through the current value neural network, so that the simulation environment, following the ε-greedy strategy, selects one flight scheme a_t from them;
(3) According to the state s_t of the unmanned aerial vehicle and the flight scheme a_t selected by the simulation environment, the simulation environment gives the reward r_t under flight scheme a_t and the new state s_{t+1} that the unmanned aerial vehicle reaches after adopting flight scheme a_t;
(4) Train the neural network based on the two strategies of experience replay and a fixed target Q value, according to the deep Q-learning network of deep reinforcement learning; update the network parameters, and update the Q value under the ε-greedy strategy;
the altitude optimization of the unmanned aerial vehicle base station in the step (1) can be achieved by the following formula:
Figure FDA0003936233620000011
where h is the optimal altitude of the unmanned aerial vehicle, R is the radius of the coverage cell of the base station, c is the speed of light, PL max Is the maximum path loss that can be supported, f is the carrier frequency; order the
Figure FDA0003936233620000012
The optimal height of the unmanned aerial vehicle base station can be calculated;
and step (4) comprises the following steps:
(41) Form the unmanned aerial vehicle state s_t obtained in step (1), the flight scheme a_t obtained in step (2), and the reward r_t and next state s_{t+1} obtained in step (3) into a data group (s_t, a_t, r_t, s_{t+1}); store the data groups in a memory bank, and once the number of data groups reaches the capacity of the memory bank, delete the earliest-generated data group;
(42) After the data in the memory bank reach its capacity, randomly select a batch of data groups to train the neural network; in each training step, calculate the actual Q value

    y_i = r_i + γ·max_{a'} Q̂(s_{i+1}, a'; θ⁻)

and at the same time calculate the error function of each training step, L(θ) = (y_i − Q(s_i, a_i; θ))²; the loss function is then back-propagated using gradient descent, θ ← θ − α·∂L(θ)/∂θ, to update the network parameter θ, where ∂L(θ)/∂θ denotes the partial derivative of L(θ) with respect to θ and α denotes the learning rate;
(43) Set a fixed number of steps C; every C steps the current value neural network Q(s, a; θ) transmits its network parameters to the target value neural network Q̂(s, a; θ⁻), i.e. θ⁻ = θ;
(44) As the neural network parameter θ changes, repeat operations (41) and (42) until the error function converges, i.e. until the Q value predicted by the current value neural network Q(s, a; θ) is close to the actual Q value.
2. The unmanned aerial vehicle base station enhanced network optimization method for the narrowband Internet of things according to claim 1, wherein the different flight modes in step (2) comprise six flight modes of the unmanned aerial vehicle, namely upward, downward, leftward, rightward, forward and backward.
3. The unmanned aerial vehicle base station enhanced network optimization method for the narrowband Internet of things according to claim 1, wherein the ε-greedy strategy followed by the simulation environment in step (2) is:

    a_t = a* = argmax_{a∈A} Q(s_t, a; θ), with probability ε;
    a_t = a_random, with probability 1 − ε;

where a* = argmax_{a∈A} Q(s_t, a; θ) denotes the flight scheme whose Q value estimated by the current value neural network is largest, and a_random means that one flight scheme is selected at random from all possible flight schemes.
CN201911030397.8A 2019-10-28 2019-10-28 Unmanned aerial vehicle base station enhanced network optimization method for narrowband Internet of things Active CN110809274B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911030397.8A CN110809274B (en) 2019-10-28 2019-10-28 Unmanned aerial vehicle base station enhanced network optimization method for narrowband Internet of things


Publications (2)

Publication Number Publication Date
CN110809274A CN110809274A (en) 2020-02-18
CN110809274B true CN110809274B (en) 2023-04-21

Family

ID=69489294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911030397.8A Active CN110809274B (en) 2019-10-28 2019-10-28 Unmanned aerial vehicle base station enhanced network optimization method for narrowband Internet of things

Country Status (1)

Country Link
CN (1) CN110809274B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111263332A (en) * 2020-03-02 2020-06-09 湖北工业大学 Unmanned aerial vehicle track and power joint optimization method based on deep reinforcement learning
CN111565065B (en) * 2020-03-24 2021-06-04 北京邮电大学 Unmanned aerial vehicle base station deployment method and device and electronic equipment
CN111506104B (en) * 2020-04-03 2021-10-01 北京邮电大学 Method and device for planning position of unmanned aerial vehicle
CN111786713B (en) * 2020-06-04 2021-06-08 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN112512115B (en) * 2020-11-20 2022-02-11 北京邮电大学 Method and device for determining position of air base station and electronic equipment
CN112511250B (en) * 2020-12-03 2022-06-03 中国人民解放军火箭军工程大学 DRL-based multi-unmanned aerial vehicle air base station dynamic deployment method and system
CN112636811B (en) * 2020-12-08 2021-11-30 北京邮电大学 Relay unmanned aerial vehicle deployment method and device
CN112867023B (en) * 2020-12-30 2021-11-19 北京理工大学 Method for minimizing perception data acquisition delay through dynamic scheduling of unmanned terminal
CN112867117B (en) * 2021-01-20 2022-04-12 重庆邮电大学 Energy-saving method based on Q learning in NB-IoT
CN113286314B (en) * 2021-05-25 2022-03-08 重庆邮电大学 Unmanned aerial vehicle base station deployment and user association method based on Q learning algorithm
CN114142912B (en) * 2021-11-26 2023-01-06 西安电子科技大学 Resource control method for guaranteeing time coverage continuity of high-dynamic air network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108848561A (en) * 2018-04-11 2018-11-20 湖北工业大学 A kind of isomery cellular network combined optimization method based on deeply study
CN109194583B (en) * 2018-08-07 2021-05-14 中国地质大学(武汉) Network congestion link diagnosis method and system based on deep reinforcement learning
CN109916372B (en) * 2019-01-18 2021-02-12 南京邮电大学 Method for calculating optimal height of unmanned aerial vehicle base station under condition of inaccurate channel state information
CN109819453B (en) * 2019-03-05 2021-07-06 西安电子科技大学 Cost optimization unmanned aerial vehicle base station deployment method based on improved genetic algorithm

Also Published As

Publication number Publication date
CN110809274A (en) 2020-02-18

Similar Documents

Publication Publication Date Title
CN110809274B (en) Unmanned aerial vehicle base station enhanced network optimization method for narrowband Internet of things
Bayerlein et al. Trajectory optimization for autonomous flying base station via reinforcement learning
CN112902969B (en) Path planning method of unmanned aerial vehicle in data collection process
CN111158401B (en) Distributed unmanned aerial vehicle path planning system and method for encouraging space-time data exploration
CN113162679A DDPG algorithm-based IRS (intelligent reflecting surface) assisted unmanned aerial vehicle communication joint optimization method
CN110650039B (en) Multimodal optimization-based network cooperative communication model for unmanned aerial vehicle cluster auxiliary vehicle
CN112929866B (en) Unmanned aerial vehicle deployment method for adaptively optimizing network coverage of urban disaster area
CN116185079B (en) Unmanned aerial vehicle construction inspection route planning method based on self-adaptive cruising
CN111381499B (en) Internet-connected aircraft self-adaptive control method based on three-dimensional space radio frequency map learning
US20230239037A1 Space-air-ground integrated UAV-assisted IoT data collection method based on AoI
CN112367111A (en) Unmanned aerial vehicle relay deployment method and system, computer equipment and application
CN113283169B (en) Three-dimensional group exploration method based on multi-head attention asynchronous reinforcement learning
CN113188547A (en) Unmanned aerial vehicle path planning method and device, controller and storage medium
CN111818535B (en) Wireless local area network three-dimensional optimization deployment method fusing multi-population optimization algorithm
CN112462805A (en) 5G networked unmanned aerial vehicle flight path planning method based on improved ant colony algorithm
CN114826380B (en) Unmanned aerial vehicle auxiliary air-ground communication optimization algorithm based on deep reinforcement learning algorithm
Hu et al. Multi-UAV coverage path planning: a distributed online cooperation method
CN113382060B (en) Unmanned aerial vehicle track optimization method and system in Internet of things data collection
Parvaresh et al. A continuous actor–critic deep Q-learning-enabled deployment of UAV base stations: Toward 6G small cells in the skies of smart cities
Sobouti et al. Managing sets of flying base stations using energy efficient 3D trajectory planning in cellular networks
CN117053790A (en) Single-antenna unmanned aerial vehicle auxiliary communication flight route-oriented planning method
CN110021168B (en) Grading decision method for realizing real-time intelligent traffic management under Internet of vehicles
CN116723470A (en) Determination method, device and equipment of movement track prediction model of air base station
CN116321237A (en) Unmanned aerial vehicle auxiliary internet of vehicles data collection method based on deep reinforcement learning
CN112783192A (en) Unmanned aerial vehicle path planning method, device, equipment and storage medium

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: Room 201, Building 2, Phase II, No. 1 Kechuang Road, Yaohua Street, Qixia District, Nanjing City, Jiangsu Province
Applicant after: Nanjing University of Posts and Telecommunications
Address before: No. 9 Yuen Road, Qixia District, Nanjing City, Jiangsu Province, 210046
Applicant before: Nanjing University of Posts and Telecommunications
GR01 Patent grant