CN109743210B - Unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning - Google Patents

Unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning

Info

Publication number
CN109743210B
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
access
base station
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910074944.6A
Other languages
Chinese (zh)
Other versions
CN109743210A (en)
Inventor
梁应敞 (Ying-Chang Liang)
曹阳 (Yang Cao)
张蔺 (Lin Zhang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201910074944.6A priority Critical patent/CN109743210B/en
Publication of CN109743210A publication Critical patent/CN109743210A/en
Application granted granted Critical
Publication of CN109743210B publication Critical patent/CN109743210B/en



Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The invention belongs to the technical field of wireless communication and relates to an unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning. The invention exploits the inherent variation patterns of the environment to provide a deep reinforcement learning framework adapted to multi-user access in an unmanned aerial vehicle network, and realizes a deep-reinforcement-learning-based multi-user access control scheme for the unmanned aerial vehicle network without knowledge of the global network information. Compared with conventional access control, the proposed scheme achieves higher system throughput and fewer handovers. Moreover, different trade-offs between throughput and the number of handovers can be realized by adjusting the handover penalty term, and performance is guaranteed under different handover penalty settings.

Description

Unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of wireless communication, and relates to an unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning.
Background
Conventional access control techniques use threshold comparison: a metric is chosen (e.g., received signal strength) together with an appropriate threshold. When the received signal strength that the user equipment (UE) receives from its source base station falls below the set threshold, the UE selects for access a base station that can provide a received signal strength above the threshold. However, in a drone network that uses drones as base stations, the mobility of the base stations makes the relative distance between base station and user change frequently, so the received signal strength at the user fluctuates drastically; the conventional access control technique then causes frequent handovers, which incur a large amount of extra signaling overhead. In addition, when several UEs hand over simultaneously, conventional access control can only guarantee the throughput of a single user, not the throughput of the entire system.
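For concreteness, the following is a minimal sketch of the threshold-comparison rule described above; the threshold value and the strongest-qualifying-station tie-break are illustrative assumptions, not values taken from the patent:

```python
def threshold_access(current_bs, rss_dbm, threshold_dbm=-90.0):
    """Conventional threshold-based access control (hedged sketch).

    current_bs: id of the currently serving base station.
    rss_dbm:    dict mapping base-station id -> received signal strength (dBm).
    Returns the base station to use in the next slot.
    """
    if rss_dbm[current_bs] >= threshold_dbm:
        return current_bs                 # source signal still above threshold: stay
    # source dropped below the threshold: look for a station above it
    candidates = {j: p for j, p in rss_dbm.items() if p >= threshold_dbm}
    if candidates:
        return max(candidates, key=candidates.get)  # strongest qualifying station
    return current_bs                     # nothing clears the threshold: stay put
```

Because a moving drone base station drives the measured signal strength back and forth across the threshold, this rule hands over again and again, which is exactly the frequent-handover problem described above.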
Disclosure of Invention
In order to solve the frequent-handover problem of conventional access control in a drone network and to guarantee the overall network throughput in the multi-user access case, the invention mainly focuses on the long-term throughput of the overall system and on the number of handovers. Deep reinforcement learning performs excellently in decision problems in complex dynamic environments; therefore, to overcome the difficulty of collecting global network information in the drone network environment, the invention exploits the inherent variation patterns of the environment to provide a deep reinforcement learning framework suited to multi-user access in a drone network, and realizes a deep-reinforcement-learning-based multi-user access control scheme for the drone network without knowledge of the global network information.
In the invention, a system model is established from the perspective of using drones as mobile base stations to serve ground users: the drones move along preset trajectories and provide downlink transmission service to the ground UEs. Each UE is regarded as an independent decision maker and selects a suitable drone base station to access in each time slot. The decision process is handed over entirely to the UEs, while the drone base stations are only responsible for receiving access requests and providing transmission service. There is no information exchange among the UEs during the decision process, i.e., each UE's decision depends only on the network information it obtains itself, which reduces the overall signaling overhead.
In order to solve the multi-user access decision problem, the invention provides a deep reinforcement learning framework with distributed decision-making and centralized training, i.e., a central node is responsible for training the neural network parameters of all UEs. In this framework, each UE is equipped with a neural network of identical structure and obtains its access strategy by feeding its local network information into the neural network; the central node collects experience information from every UE, trains the neural network parameters, and transmits the trained parameters to the users after each training stage is completed. After obtaining the trained neural network parameters from the central node, a UE updates its local neural network parameters. By separating the decision process from the training process, a UE only needs to use the already-trained neural network, which reduces the computational complexity at the UE.
In order to solve the problem that base-station position information is difficult to collect in a drone network, the invention avoids position information in the design of the user state and mainly uses quantities such as the user's received signal strength, which can be measured directly and locally. To avoid frequent handovers and to guarantee the throughput of the whole network in the multi-user case, the design of the deep reinforcement learning reward function considers not only the user's throughput but also the suppression of UE handovers and the influence of a single UE's access action on the other relevant UEs.
In order to better capture and learn the variation pattern of the received signal strength at the UE, the invention introduces a long short-term memory (LSTM) network into the neural network design. The neural network design is simple: the LSTM extracts features, which are then processed by a three-layer fully connected network to produce the corresponding access decision output.
Compared with conventional access control, the proposed scheme achieves higher system throughput and fewer handovers. Moreover, different trade-offs between throughput and the number of handovers can be realized by adjusting the handover penalty term, and performance is guaranteed under different handover penalty settings.
Drawings
FIG. 1 shows the system model of the drone network according to the present invention;
FIG. 2 illustrates the deep reinforcement learning framework model according to the present invention;
FIG. 3 shows the structural model of the neural network of the present invention;
FIG. 4 compares the throughput and the number of handovers of the access control scheme proposed by the present invention with those of a conventional access control scheme.
Detailed Description
The invention is described in detail below with reference to the drawings and simulation examples so that those skilled in the art can better understand the invention.
FIG. 1 shows the system model of the present invention. The wireless communication system consists of two parts: drone base stations and ground UEs. The drone base stations fly along fixed trajectories in the air, while the UEs are located on the ground. Since the drone base station flies in the air, the channel contains two components, line-of-sight (LOS) and non-line-of-sight (NLOS), and the proportion with which each component occurs is mainly determined by the elevation angle between the drone and the ground user. Both the LOS and the NLOS component include large-scale fading and small-scale fading; the large-scale fading is mainly determined by the distance between the UE and the base station, while the small-scale fading follows a Rician and a Rayleigh distribution, respectively. Specifically, the channel gain model between the jth drone base station and the ith ground UE may be expressed as:
$$g_{i,j}(t) = P_{i,j}^{\mathrm{LOS}}(t)\,g_{i,j}^{\mathrm{LOS}}(t) + P_{i,j}^{\mathrm{NLOS}}(t)\,g_{i,j}^{\mathrm{NLOS}}(t)$$

where $P_{i,j}^{\mathrm{LOS}}$ and $P_{i,j}^{\mathrm{NLOS}}$ respectively denote the proportions with which the LOS and NLOS components occur, and $g_{i,j}^{\mathrm{LOS}}$ and $g_{i,j}^{\mathrm{NLOS}}$ denote the corresponding channel gains,

$$g_{i,j}^{k}(t) = \mu_{k}\left(\frac{4\pi f\,l_{i,j}(t)}{v}\right)^{-\alpha_{k}}\bigl|h_{i,j}^{k}(t)\bigr|^{2},\qquad k\in\{\mathrm{LOS},\mathrm{NLOS}\},$$

where $f$ denotes the carrier frequency and $v$ the speed of light, $\mu_{\mathrm{LOS}}$ and $\mu_{\mathrm{NLOS}}$ are the attenuation factors of the LOS and NLOS components, $l_{i,j}$ is the distance between the drone base station and the UE, and $\alpha_{\mathrm{LOS}}$ and $\alpha_{\mathrm{NLOS}}$ are the path-loss exponents of LOS and NLOS; the small-scale fading factors $h_{i,j}^{\mathrm{LOS}}$ and $h_{i,j}^{\mathrm{NLOS}}$ follow the Rician and Rayleigh distributions, respectively.
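As a numerical illustration of the channel model above, the following Python sketch draws one sample of $g_{i,j}$. The logistic LOS-probability model, the Rician K-factor and all parameter values are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def channel_gain(l, elev_deg, f=2e9, v=3e8,
                 mu=(1.0, 0.2), alpha=(2.0, 3.0), k_factor=10.0, rng=None):
    """One draw of g = P_LoS * g_LoS + P_NLoS * g_NLoS for distance l (m)
    and elevation angle elev_deg (degrees)."""
    rng = rng or np.random.default_rng()
    # elevation-dependent LOS proportion (a common logistic model; assumed here)
    a, b = 9.61, 0.16
    p_los = 1.0 / (1.0 + a * np.exp(-b * (elev_deg - a)))

    def small_scale(rician):
        scatter = np.sqrt(0.5) * (rng.standard_normal() + 1j * rng.standard_normal())
        if rician:   # Rician fading for the LOS component
            return abs(np.sqrt(k_factor / (k_factor + 1)) +
                       np.sqrt(1 / (k_factor + 1)) * scatter) ** 2
        return abs(scatter) ** 2          # Rayleigh fading for the NLOS component

    g_los = mu[0] * (4 * np.pi * f * l / v) ** (-alpha[0]) * small_scale(True)
    g_nlos = mu[1] * (4 * np.pi * f * l / v) ** (-alpha[1]) * small_scale(False)
    return p_los * g_los + (1 - p_los) * g_nlos
```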
In the established system model every drone base station uses the same transmission power. Since the channel gain contains small-scale fading, the UE averages the received signal over multiple samples during access selection to remove it; the resulting average received signal strength can be expressed as:

$$\rho_{i,j}(t) = \frac{P_t}{N}\sum_{n=1}^{N} g_{i,j}^{(n)}(t),$$

where $P_t$ is the transmission power of the drone base station and $N$ is the number of signal samples being averaged.
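A short sketch of the sampling average; the sample count of 100 in the commented usage line is an arbitrary illustrative choice:

```python
import numpy as np

def average_rss(p_t, gain_samples):
    """rho_{i,j}(t) = (P_t / N) * sum_n g^{(n)}_{i,j}(t): averaging the N
    instantaneous channel-gain samples washes out the small-scale fading."""
    return p_t * np.mean(gain_samples)

# e.g., reusing the channel-gain sketch above:
# rho = average_rss(p_t=1.0, gain_samples=[channel_gain(300, 45) for _ in range(100)])
```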
Because all drone base stations transmit on the same spectrum resource, a ground UE accessing one drone for transmission receives interference from all the other drones. The SINR at the user can be expressed as:

$$\mathrm{SINR}_{i,j}(t) = \frac{P_t\,g_{i,j}(t)}{\sum_{k\in\mathcal{J},\,k\neq j} P_t\,g_{i,k}(t) + \sigma^{2}},$$

where $\mathcal{J}$ denotes the set of drone base stations in the network and $\sigma^{2}$ denotes the noise power.
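In code the SINR computation is a few lines over the per-drone received powers; `rx_powers[k]` below stands for $P_t\,g_{i,k}(t)$ (an assumed data layout):

```python
import numpy as np

def sinr(rx_powers, serving_j, noise_power):
    """SINR of a UE attached to drone serving_j when every other drone,
    transmitting on the same spectrum, is received as interference."""
    rx_powers = np.asarray(rx_powers, dtype=float)
    interference = rx_powers.sum() - rx_powers[serving_j]
    return rx_powers[serving_j] / (interference + noise_power)
```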
The user selects a suitable drone base station to access in each time slot. A base station accessed by several users within a single slot serves them in time-division multiple access (TDMA) form, i.e., the slot is divided evenly into as many equal sub-slots as there are accessed users. The reception rate of the UE can be expressed as:

$$\omega_{i}(t) = \frac{B}{N_{j}(t)}\log_{2}\bigl(1 + \mathrm{SINR}_{i,j}(t)\bigr),$$

where $B$ denotes the frequency bandwidth used for base-station transmission and $N_{j}(t)$ denotes the number of users accessing base station $j$ at that time.
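A direct transcription of the rate formula, as a sketch:

```python
import numpy as np

def ue_rate(bandwidth_hz, n_users_on_bs, sinr_value):
    """omega_i(t) = B / N_j(t) * log2(1 + SINR): the slot is split evenly
    among the N_j(t) UEs attached to base station j."""
    return bandwidth_hz / n_users_on_bs * np.log2(1.0 + sinr_value)
```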
FIG. 2 shows the proposed deep reinforcement learning framework. The framework consists of three parts: the drone base stations, the central node and the UEs. The drone base stations are responsible for the transmission service, the central node is responsible for training the neural network parameters of the UEs, and each UE makes a suitable base-station access selection in each decision stage. Every UE is equipped with the same neural network as the central node; the neural network parameters at the UE are obtained from the central node and can be regarded as a replica of those at the central node. Each UE is treated as an independent individual in the framework: no information exchange takes place between UEs, and every UE independently selects a drone base station to access and is responsible for transmitting its own network information to the central node.
For a single UE, the other users and the drone base stations can be regarded as its environment. The overall information interaction therefore consists of two parts: the interaction between the UE and the environment, and the transfer of experience information and network parameters between the UE and the central node. In each access selection stage, every UE selects a suitable drone base station to access according to its own state. Since we mainly focus on maximizing user throughput, and a user's reception rate is mainly determined by the received signal strength and the number of users accessing the base station, the number of user connections and the received signal strength serve as the main state elements; the specific state can be expressed as:
$$s_{i}(t) = \Bigl\{\{u_{i,j}(t-1)\}_{j},\ \{\rho_{i,j}(t-1)\}_{j},\ \{\rho_{i,j}(t)\}_{j},\ \{N_{j}(t-1)\}_{j},\ \omega_{i}(t-1)\Bigr\}$$

where $u_{i,j}$, a binary indicator that may also be called the access indicator variable, is "1" if base station $j$ is accessed and "0" if it is not selected for access. The state design thus includes the user's access indicator variables $u_{i,j}(t-1)$ at the previous time instant, the received signal strengths $\rho_{i,j}(t-1)$ and $\rho_{i,j}(t)$ at the previous and the current time instant, the number of access users $N_{j}(t-1)$ of each base station at the previous time instant, and the throughput $\omega_{i}(t-1)$ of the UE at the previous time instant.
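As an illustration, the listed state elements can be flattened into the input vector of the neural network; the concatenation order below is an assumption, since the text only enumerates the elements:

```python
import numpy as np

def build_state(u_prev, rss_prev, rss_now, n_users_prev, throughput_prev):
    """Flatten the state elements of UE i into one vector: access indicators
    u_{i,j}(t-1), averaged RSS at t-1 and t, per-base-station user counts
    N_j(t-1), and the UE's previous throughput w_i(t-1)."""
    return np.concatenate([np.ravel(u_prev), np.ravel(rss_prev),
                           np.ravel(rss_now), np.ravel(n_users_prev),
                           [throughput_prev]]).astype(np.float32)
```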
After making its access selection, the UE sends an access request to the selected drone base station, and after receiving the request the drone provides transmission service for the UE. Once all UE access decisions have been made, the environment information is updated: each drone base station counts its own number of access users and sends the new network information to every UE, which forms the UE's new state. All UEs then transmit their original state, the access selection made, the resulting throughput and the new state to the central node. The central node computes the reward function of each UE and thereby completes the experience information. The final reward function may be expressed as:
$$r_{i}(t) = \omega_{i}(t) + \eta\,\phi_{i}(t) - C\cdot\mathbb{1}\bigl[a_{i}(t)\neq a_{i}(t-1)\bigr],$$

where $\phi_{i}(t)$ indicates the impact of the UE's access selection on the performance of the other relevant users, $a_{i}(t)$ and $a_{i}(t-1)$ denote the access actions taken by the user at times $t$ and $t-1$ respectively, $C$ denotes the penalty for generating a handover, and $\eta$ is a control factor.
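A direct transcription of this reward into code; the values of η and C are placeholders, and φ_i(t) (`impact_on_others`) is supplied by the central node once it has collected every UE's throughput:

```python
def reward(throughput_i, impact_on_others, action_t, action_prev, eta=0.1, C=1.0):
    """r_i(t) = w_i(t) + eta * phi_i(t) - C * 1[a_i(t) != a_i(t-1)]."""
    handover_penalty = C if action_t != action_prev else 0.0
    return throughput_i + eta * impact_on_others - handover_penalty
```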
After collecting the experience information of all UEs, the central node stores all of it in a local memory in queue form, aggregating the experience information of all users. The central node then draws random samples from this memory as the training samples of the current round and trains the neural network parameters by stochastic gradient descent. After each round of training finishes, the central node sends the trained neural network parameters to every UE. After obtaining the new neural network parameters, a UE updates its local parameters and uses the updated neural network to make a handover decision according to its new state.
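A minimal sketch of the central node just described: a queue-shaped replay memory, random mini-batch sampling, one stochastic-gradient-descent step, and a parameter broadcast. The DQN-style temporal-difference target and all hyperparameter values are assumptions; the patent only specifies queued experience, random sampling and stochastic gradient descent. `net` is the shared Q-network (e.g., the LSTM network of FIG. 3):

```python
import random
from collections import deque
import torch

class CentralNode:
    def __init__(self, net, capacity=50_000, batch_size=64, gamma=0.95, lr=1e-3):
        self.net = net                           # same structure as every UE's network
        self.memory = deque(maxlen=capacity)     # experience stored in queue form
        self.batch_size, self.gamma = batch_size, gamma
        self.opt = torch.optim.SGD(net.parameters(), lr=lr)

    def store(self, state_seq, action, r, next_state_seq):
        """Experience reported by a UE: (s, a, r, s'), states as tensors."""
        self.memory.append((state_seq, action, r, next_state_seq))

    def train_step(self):
        if len(self.memory) < self.batch_size:
            return
        batch = random.sample(self.memory, self.batch_size)   # random sampling
        s  = torch.stack([b[0] for b in batch])
        a  = torch.tensor([b[1] for b in batch])
        r  = torch.tensor([b[2] for b in batch], dtype=torch.float32)
        s2 = torch.stack([b[3] for b in batch])
        q = self.net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q of taken actions
        with torch.no_grad():                                 # bootstrapped TD target
            target = r + self.gamma * self.net(s2).max(dim=1).values
        loss = torch.nn.functional.mse_loss(q, target)
        self.opt.zero_grad(); loss.backward(); self.opt.step()

    def broadcast(self):
        """Parameters each UE copies into its local replica after training."""
        return self.net.state_dict()
```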
FIG. 3 shows the neural network structure employed in the present invention. The structure consists of two parts: an LSTM network and a fully connected network. The LSTM network is responsible for extracting temporal-continuity features from the input parameters, and the data of M time instants are input to the LSTM network together; the fully connected network processes the features extracted by the LSTM network to obtain the corresponding access strategy.
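A sketch of this structure in PyTorch. The hidden width of 64 and the use of the last LSTM time step are illustrative assumptions; the LSTM followed by a three-layer fully connected head follows the description of FIG. 3:

```python
import torch
from torch import nn

class AccessNet(nn.Module):
    """LSTM feature extractor followed by a three-layer fully connected head.
    Input: the UE's states over the last M time instants, shape (batch, M, state_dim).
    Output: one score per candidate drone base station (the access strategy)."""
    def __init__(self, state_dim, n_drones, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_drones))

    def forward(self, x):
        feats, _ = self.lstm(x)          # temporal-continuity features over M steps
        return self.head(feats[:, -1])   # decide from the final time step's features

# a UE picks its access action greedily from its local replica, e.g.:
# action = AccessNet(state_dim=10, n_drones=3)(state_seq).argmax(dim=1)
```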
FIG. 4 shows the system throughput and the number of handovers of the proposed access control technique under different handover penalty coefficients; the results are measured over a test period of 1000 time slots. It can be seen that, compared with conventional access control methods (a received-signal-strength-based method and a learning-algorithm-based method), the proposed access control method achieves higher system throughput with a smaller number of handovers. Under the different handover penalty settings the proposed technique achieves the best performance, and different trade-offs between the number of handovers and the system throughput can be realized by adjusting the handover penalty term.

Claims (3)

1. An unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning, for a system in which unmanned aerial vehicles serve as mobile base stations providing service to ground user equipment (UE), characterized in that the control method comprises the following steps:
constructing a deep reinforcement learning framework with distributed decision-making and centralized training, namely configuring a neural network of identical structure for each UE, each UE independently obtaining its strategy for accessing an unmanned aerial vehicle base station according to its own neural network; meanwhile, providing a central node with the same neural network for collecting experience information from each UE and training the neural network parameters, the central node transmitting the trained parameters to each UE after each training stage is completed;
the specific method for the central node to collect experience information from each UE is as follows:
the UE needs to select a proper action according to its own state, and obtains a corresponding reward after execution, and the throughput of the UE is mainly related to the number of access users of the base station and the strength of the received signal, so the states of i UEs are expressed as:
$$s_{i}(t) = \Bigl\{\{u_{i,j}(t-1)\}_{j},\ \{\rho_{i,j}(t-1)\}_{j},\ \{\rho_{i,j}(t)\}_{j},\ \{N_{j}(t-1)\}_{j},\ \omega_{i}(t-1)\Bigr\}$$

wherein $u_{i,j}$ is the defined access indication variable, a binary indicator, namely "1" indicates that the jth unmanned aerial vehicle base station is accessed and "0" indicates that it is not selected for access; the state comprises the user's access indicator variables $u_{i,j}(t-1)$ at the previous time instant, the received signal strengths $\rho_{i,j}(t-1)$ and $\rho_{i,j}(t)$ at the previous and the current time instant, the number of access users $N_{j}(t-1)$ of each base station at the previous time instant, and the throughput $\omega_{i}(t-1)$ of the UE at the previous time instant;
after making its access selection, the UE sends an access request to the selected unmanned aerial vehicle base station, and after receiving the request the unmanned aerial vehicle provides transmission service for the UE;
after all UE access decisions have been made, the environment information is updated: each unmanned aerial vehicle base station counts its own number of access users and sends the new network information to every UE, forming the UE's new state; all UEs transmit their original state, the access selection made, the resulting throughput and the new state to the central node, and the central node calculates the reward function of each UE and thereby completes the experience information:
$$r_{i}(t) = \omega_{i}(t) + \eta\,\phi_{i}(t) - C\cdot\mathbb{1}\bigl[a_{i}(t)\neq a_{i}(t-1)\bigr],$$

wherein $\omega_{i}(t)$ represents the throughput of the UE at the current time instant, $\phi_{i}(t)$ represents the throughput change the UE's access selection causes for the other relevant users, defined as the impact on the other users' performance, $a_{i}(t)$ and $a_{i}(t-1)$ represent the access actions taken by the user at time $t$ and time $t-1$ respectively, $C$ represents the penalty for generating a handover, and $\eta$ is a control coefficient.
2. The unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning of claim 1, wherein the specific method for the central node to train neural network parameters is as follows:
after the central node collects experience information of all the UEs, all the information is stored in a local memory in a queue form, the experience information of all the UEs is collected, random sampling is carried out by using a random gradient descent method, and an obtained sample is used as a training sample of the training to train the neural network parameters.
3. The unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning according to claim 2, wherein the neural network consists of a long short-term memory network and a fully connected network: the long short-term memory network is responsible for extracting temporal-continuity features from the input parameters, and the data of M time instants are input to the long short-term memory network together; the fully connected network processes the features extracted by the long short-term memory network to obtain the corresponding access strategy.
CN201910074944.6A 2019-01-25 2019-01-25 Unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning Active CN109743210B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910074944.6A CN109743210B (en) 2019-01-25 2019-01-25 Unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910074944.6A CN109743210B (en) 2019-01-25 2019-01-25 Unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN109743210A (en) 2019-05-10
CN109743210B (en) 2020-04-17

Family

Family ID: 66366151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910074944.6A Active CN109743210B (en) 2019-01-25 2019-01-25 Unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN109743210B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4181559A4 (en) * 2020-07-13 2024-01-10 Huawei Technologies Co., Ltd. Communication method and communication device

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110351252A (en) * 2019-06-19 2019-10-18 南京航空航天大学 It is a kind of can synchronism switching unmanned plane ad hoc network adaptive media connection control method
CN110458283A (en) * 2019-08-13 2019-11-15 南京理工大学 Maximization overall situation handling capacity method under static environment based on deeply study
CN110661566B (en) * 2019-09-29 2021-11-19 南昌航空大学 Unmanned aerial vehicle cluster networking method and system adopting depth map embedding
CN111083767B (en) * 2019-12-23 2021-07-27 哈尔滨工业大学 Heterogeneous network selection method based on deep reinforcement learning
CN111884740B (en) * 2020-06-08 2022-04-29 江苏方天电力技术有限公司 Unmanned aerial vehicle channel optimal allocation method and system based on frequency spectrum cognition
CN112947541B (en) * 2021-01-15 2022-07-26 南京航空航天大学 Unmanned aerial vehicle intention track prediction method based on deep reinforcement learning
CN113342030B (en) * 2021-04-27 2022-07-08 湖南科技大学 Multi-unmanned aerial vehicle cooperative self-organizing control method and system based on reinforcement learning
CN115454646B (en) * 2022-09-29 2023-08-25 电子科技大学 Multi-agent reinforcement learning acceleration method for clustered unmanned plane decision

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107680151A (en) * 2017-09-27 2018-02-09 千寻位置网络有限公司 Strengthen the method and its application of the indicative animation fulfillment capability in Web3D

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104950906A (en) * 2015-06-15 2015-09-30 中国人民解放军国防科学技术大学 Unmanned aerial vehicle remote measuring and control system and method based on mobile communication network
CN106094606A (en) * 2016-05-19 2016-11-09 南通航运职业技术学院 A kind of unmanned surface vehicle navigation and control remote-controlled operation platform
US9826415B1 (en) * 2016-12-01 2017-11-21 T-Mobile Usa, Inc. Tactical rescue wireless base station
US10332320B2 (en) * 2017-04-17 2019-06-25 Intel Corporation Autonomous vehicle advanced sensing and response
CN107205225B (en) * 2017-08-03 2019-10-11 北京邮电大学 The switching method and apparatus of unmanned aerial vehicle onboard base station based on user trajectory prediction
CN108684047B (en) * 2018-07-11 2020-09-01 北京邮电大学 Unmanned aerial vehicle bearing small base station communication system and method
CN109195135B (en) * 2018-08-06 2021-03-26 同济大学 Base station selection method based on deep reinforcement learning in LTE-V



Also Published As

Publication number Publication date
CN109743210A (en) 2019-05-10

Similar Documents

Publication Publication Date Title
CN109743210B (en) Unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning
CN110809306B (en) Terminal access selection method based on deep reinforcement learning
Cao et al. Deep reinforcement learning for multi-user access control in non-terrestrial networks
CN113873434B (en) Communication network hotspot area capacity enhancement oriented multi-aerial base station deployment method
CN111526592B (en) Non-cooperative multi-agent power control method used in wireless interference channel
CN110380776B (en) Internet of things system data collection method based on unmanned aerial vehicle
CN114900225B (en) Civil aviation Internet service management and access resource allocation method based on low-orbit giant star base
CN113115344B (en) Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization
CN113055078B (en) Effective information age determination method and unmanned aerial vehicle flight trajectory optimization method
CN115499921A (en) Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network
Cao et al. Deep reinforcement learning for multi-user access control in UAV networks
CN114980169A (en) Unmanned aerial vehicle auxiliary ground communication method based on combined optimization of track and phase
Chen et al. An actor-critic-based UAV-BSs deployment method for dynamic environments
Najla et al. Machine learning for power control in D2D communication based on cellular channel gains
CN106060917A (en) Antenna and power joint allocation algorithm based on group match
CN114268348A (en) Honeycomb-free large-scale MIMO power distribution method based on deep reinforcement learning
CN101217345A (en) A detecting method on vertical layered time space code message system
Cao et al. Multi-tier collaborative deep reinforcement learning for non-terrestrial network empowered vehicular connections
He et al. Guest editorial 5G wireless communications with high mobility
CN116866974A (en) Federal learning client selection method based on deep reinforcement learning
CN116634450A (en) Dynamic air-ground heterogeneous network user association enhancement method based on reinforcement learning
Si et al. UAV-assisted Semantic Communication with Hybrid Action Reinforcement Learning
Zheng et al. NSATC: An interference aware framework for multi-cell NOMA TUAV airborne provisioning
CN114980205A (en) QoE (quality of experience) maximization method and device for multi-antenna unmanned aerial vehicle video transmission system
Tarekegn et al. Channel Quality Estimation in 3D Drone Base Station for Future Wireless Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant