CN117880858B - Multi-unmanned aerial vehicle track optimization and power control method based on communication learning - Google Patents

Multi-unmanned aerial vehicle track optimization and power control method based on communication learning Download PDF

Info

Publication number
CN117880858B
CN117880858B CN202410275005.9A CN202410275005A CN117880858B CN 117880858 B CN117880858 B CN 117880858B CN 202410275005 A CN202410275005 A CN 202410275005A CN 117880858 B CN117880858 B CN 117880858B
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
neural network
information
communication
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410275005.9A
Other languages
Chinese (zh)
Other versions
CN117880858A (en
Inventor
毕远国
袁梓梦
刘羽霏
刘雨衡
郑彤
樊彦伯
Original Assignee
东北大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东北大学 filed Critical 东北大学
Priority to CN202410275005.9A priority Critical patent/CN117880858B/en
Publication of CN117880858A publication Critical patent/CN117880858A/en
Application granted granted Critical
Publication of CN117880858B publication Critical patent/CN117880858B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The invention belongs to the technical field of intelligent decision and control of robots, and discloses a multi-unmanned aerial vehicle track optimization and power control method based on communication learning, which can maximize the number of users meeting service quality under the condition of lower energy consumption. The invention designs a communication mechanism, and adds memory storage equipment to help UAV-ABS acquire and utilize experiences of other unmanned aerial vehicles and store self-learned experiences. In order to construct a more effective unmanned aerial vehicle cooperation strategy and reduce the cost of network model deployment, the invention designs a concentrated attention commentator neural network, which can reduce redundant information and solve the problem of dimension disasters caused by the increase of the number of UAV-ABS and ground users.

Description

Multi-unmanned aerial vehicle track optimization and power control method based on communication learning
Technical Field
The invention relates to the technical field of intelligent decision and control of robots, in particular to a multi-unmanned aerial vehicle track optimization and power control method based on communication learning.
Background
The use of Unmanned aerial vehicles as air base stations (UAV-ABS) has attracted considerable academic and industrial attention. Compared with a ground fixed wireless base station, the UAV-ABS has the following advantages: firstly, the three-dimensional movement of the wireless communication device can bring a higher visual angle, so that a sight distance wireless transmission link with higher possibility is provided for ground users, and the communication quality is improved. Second, UAV-ABS is an ideal solution for post-disaster areas such as floods, earthquakes, or hot spots of short-lived communication needs such as concert, performance sites. Finally, unmanned networking is more flexible and the cost of the required links is lower. In addition, a network using a drone as an air base station is becoming an indispensable component in the next generation mobile communication network system.
The deployment of UAV-ABS still faces many challenges, including limitations in communication range, bandwidth, and energy consumption. UAV-ABS needs to move in almost every slot to provide high quality wireless service near ground users. However, excessive unnecessary movement may cause an increase in energy consumption, thereby degrading communication quality. Furthermore, we need to control the power allocation of the drone to achieve a tradeoff between communication quality and interference management. Thus, we have urgent need for a carefully designed strategy to help multiple UAV-ABS adaptively power distribution and optimize flight trajectories.
Traditional mathematical methods convert the non-convex problem into a convex problem for solution, however, this method sacrifices accuracy and cannot handle the mobility of the ground user. The existing method mainly utilizes a multi-agent reinforcement learning algorithm, such as a multi-agent depth deterministic strategy gradient algorithm, to solve, but cannot realize direct communication and information sharing between UAV-ABS. Such limitations may lead to information asymmetry and hinder cooperation between the UAV-ABS. In addition, as the number of UAV-ABS increases, the likelihood that the critics network will be affected by irrelevant information also increases, creating a problem of dimension disasters.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a multi-unmanned aerial vehicle track optimization and power control method based on communication learning.
The technical scheme of the invention is as follows: a multi-unmanned aerial vehicle track optimization and power control method based on communication learning comprises the following steps:
1) Building a multi-unmanned aerial vehicle auxiliary wireless communication system, wherein the multi-unmanned aerial vehicle auxiliary wireless communication system comprises an unmanned aerial vehicle, a ground user and a training center; defining a motion model, a communication model and an energy consumption model, and constructing a joint optimization objective function;
2) Converting the joint optimization objective function in the step 1) into Markov games, determining observation, actions and rewards, and designing a multi-agent reinforcement learning algorithm for solving the problem of the joint optimization objective function; the multi-agent reinforcement learning algorithm comprises a communication actor neural network NN1, a target actor neural network NN2, a concentrated attention commentary neural network NN3 and a target commentary family neural network NN4;
3) Initializing the position of the unmanned aerial vehicle and the position of a ground user; initializing neural network parameters in an empirical buffer and multi-agent reinforcement learning algorithm, including communication actor neural network parameters Target actor neural network parameters/>Focusing on commentator neural network parameters/>And target comment home neural network parameters/>
4) Training begins and memory storage device is initialized; For each drone, its communication actor neural network NN1 is derived from memory storage device/>Reading communication information/>And combine local observation information/>Pair/>Coding, learning and updating; update information/>Is stored to a memory storage device/>In (a) and (b);
The update information The acquisition steps are as follows, firstly, the local observation information is encodedWherein/>Representing a fully connected neural network;
After encoding the local observation information, the communication actor neural network NN1 in the unmanned aerial vehicle reads experience from the memory storage device and learns the experience by combining the experience with the previous encoded information, and the formula is expressed as ; Wherein the method comprises the steps ofGating cell/>By concatenating information vectors/>Linear mapping acquisition, wherein/>Representing an activation function,/>Is extracted by gating unit only/>Context vector of spatiotemporal information, formulated as/>,/>Is a linear mapped learnable vector; /(I)Learning information of the unmanned aerial vehicle;
the unmanned aerial vehicle selectively updates and stores the learned information in the memory storage device In (a) and (b); two gating units/>And/>Wherein/>And/>Is a parameter that can be learned; candidate update information is denoted/>Wherein/>As a learnable parameter, the last updated information is denoted/>
5) Communication actor neural network based on local observation informationCommunication information/>And memory storage device/>Update information stored therein/>Making an action, the action comprising unmanned aerial vehicle/>Shaft speed/>Y-axis speed/>And transmit power/>
6) When all unmanned aerial vehicles giveShaft speed/>Y-axis speed/>And transmit power/>After that, the training center receives environmental feedback including rewards and the status of the next time slot; the training center packages all unmanned plane states, actions, communication information and rewards of the next time slot into experiences, and stores the experiences into an experience buffer area of the training center;
7) When the experience stored in the experience buffer exceeds a certain amount, the training center extracts a batch of experience to update the neural network parameters; the attentive critic network NN3 calculates the current action value using a batch of experience And updating the parameter/>, by minimizing the joint regression function; Updating parameters/>, by means of strategy gradient descent, of communication actor neural network NN1; Soft updating the target actor neural network NN2 and the target critic network NN4; soft update target comment home neural network NN4 parameter,/>For super parameters, soft updating the target actor neural network NN2 parameters,
8) The unmanned aerial vehicle outputs a track decision result and a power control decision result by utilizing a self-deployed trained communication actor neural network NN 1.
The multi-drone assisted wireless communication system in a multi-drone wireless communication scenario,The individual unmanned aerial vehicle is deployed as an air base station, as/>Providing communication service by individual ground users; /(I)Defined as a collection of unmanned aerial vehicles,Defined as a ground user set; multi-unmanned aerial vehicle auxiliary wireless communication system with equal length/>Run on consecutive time slots, where/>Defined as a set of time slots;
defining a motion model of the unmanned aerial vehicle; the three-dimensional coordinates of the ground user are defined as The three-dimensional coordinates of the unmanned aerial vehicle are/>Wherein/>The flying height of the unmanned aerial vehicle; unmanned aerial vehicle is atThe coordinates of the time slots are expressed as,
(1)
Wherein the method comprises the steps ofIndicating that unmanned aerial vehicle is in time slot/>Running speed of/>Is an environmental impact factor,/>Representing the duration of a time slot,/>Expressed as maximum speed that the unmanned aerial vehicle can reach;
Defining an unmanned aerial vehicle communication model; the unmanned aerial vehicle communication model simultaneously considers the sight distance LoS and the non-sight distance NLoS; the probability of the line of sight LoS between the drone n and the ground user m is expressed as:
(2)
The probability of NLoS is that ; Where a, b are constants that depend on the environment,Is the elevation angle between the drone n and the ground user m,
(3)
Wherein,Representing the horizontal distance between the unmanned plane and the user; the channel gains of the drone and the ground user are expressed as:
(4)
Wherein the method comprises the steps of Is the path loss index,/>Is the path loss of LoS,/>Is the path loss of NLoS; definition of the definitionFor white gaussian noise, the signal to interference plus noise ratio received by ground users and unmanned aerial vehicles is defined as:
(5)
Wherein the method comprises the steps of Representing the transmitting power of the current unmanned aerial vehicle; /(I)The transmitting power is expressed as the transmitting power of other unmanned aerial vehicles; in the case of applying the frequency division multiple access wireless communication technology, the data transmission rate of the terrestrial users is defined as follows using shannon's capacity theorem:
(6)
Wherein the method comprises the steps of Representing the number of ground users within the range of the current unmanned plane;
Defining an energy consumption model; the energy consumption of the unmanned aerial vehicle comprises communication energy consumption of data transmission and flight energy consumption of the unmanned aerial vehicle in the moving process; to simplify the analysis, the energy consumption of the unmanned aerial vehicle during take-off, landing and hover is eliminated; definition of the definition For blade power,/>For blade speed, unmanned aerial vehicle's flight energy consumption represents:
(7)
the communication energy consumption of data transmission is expressed as ,/>Representing rated emission power of the unmanned aerial vehicle; the energy consumption model is expressed as:
(8)
Defining a joint optimization objective function; the goal of the multi-unmanned aerial vehicle auxiliary wireless communication system is to control the operation speed and the emission power of unmanned aerial vehicles in T time slots, and simultaneously maximize the number of ground users meeting the service quality requirement and minimize the energy consumption, and the joint optimization objective function is expressed as:
(9)
Is limited by
Is limited by
Is limited by
Wherein the method comprises the steps ofThe maximum energy consumption of the unmanned aerial vehicle is achieved; /(I)Is an index function for measuring the service quality of the ground user ifThe formula value is 1, otherwise is/>Is the minimum transmission rate/>, which the surface user needs to meetA weight factor representing the relative importance of adjusting energy consumption and ground user quality of service in a joint optimization objective function.
The markov game is defined as that the unmanned aerial vehicle obtains real-time rewards by selecting actions, observing the current state and obtaining real-time rewards at each time slot, the rewards being used for interacting with the environment; a common goal of all drones is to maximize long-term accumulated rewards by selecting the best sequence of actions;
Defining observation information, actions, and rewards; the observation information of the unmanned aerial vehicle comprises the position information of all unmanned aerial vehicles and the position information of users in the communication range of the unmanned aerial vehicle;
(10)
The unmanned aerial vehicle comprises unmanned aerial vehicle Shaft speed/>Y-axis speed/>And transmit power/>It is expressed as:
(11)
all unmanned aerial vehicles aim at maximizing the number of ground users meeting the quality of service while minimizing the energy consumption of the multi-unmanned aerial vehicle auxiliary wireless communication system; meeting various constraints while executing tasks, when not adhering to the constraints, being penalized ; The prize is therefore formulated as:
(12)。
the communication actor neural network in the step 5) combines the self-observation information Learning information/>Updating informationGive action, its formula is expressed as/>,/>Neural network NN1 for communicating actors; the communication actor neural network NN1 comprises a characteristic network, an action network and a series of linear transformations; the characteristic network is a three-layer full-connection neural network, and the action network is a two-layer full-connection neural network; in the forward propagation process, self-observed informationGenerating update information/>, through feature network, linear transformation, sigmoid function and write operation; The action network combines the unmanned aerial vehicle with the self-observation information/>Learning information/>And update information/>And splicing, and calculating the final action output through the two full-connection layers and the activation function.
The training center in step 6) obtains rewardsNext state/>; Defining a set of statesAnd communication experience set/>; Experience/>Stored in an experience buffer, and the action value/> is calculated by using the concentrated attention commentator neural network NN3A value; the focused critic neural network NN3 architecture is as follows: the input information of the concentrated attention commentator neural network NN3 is the states and actions of all unmanned aerial vehicles; encoding input information by using a linear layer formed by a plurality of fully connected networks to obtain encoded information/>; Secondly, the multi-head attention layer is utilized to screen information, and the information is encoded/>Mapped to three trainable and shared weight matrices: query/>Bond/>Sum/>The formula is:
(13)
(14)
(15)
Wherein the method comprises the steps of 、/>And/>Is the mapped result; the attention weight of a single attention head is then calculated by three weight matrices, where/>Representing a key vector, the formula is:
(16)
the final multi-headed attention layer output is a weighted sum of all attention weights ; Next/>And/>Inputting the information into an addition and normalization layer, and performing addition and normalization operation on the information to ensure the validity and consistency of the information; normalized information/>Linear and nonlinear transformation by feed-forward layer to yield output/>The activation function of the feed-forward layer is Relu functions; finally will/>And/>The predicted/>, is finally obtained as input through addition and normalization processingValues.
Updating parameters by means of strategy gradient descent in the step 7)
(17)
Wherein the method comprises the steps ofCommunication actor neural network NN1 parameters representing nth agent,/>Operation for describing gradients in three-dimensional space,/>Representing the predicted Q value of the concentrated attention commentator neural network NN3,/>Rewards representing cumulative policy,/>Representing a desire to fix a value function of an action of the other agent; /(I)To balance maximum entropy and rewards;
Updating the concentrated attention commentator neural network NN3 parameters, and updating the formula to be expressed as:
(18)
Wherein the method comprises the steps of Representing target/>Value, final goal is to minimize the difference between predicted value and target value/>Representing the current prize value,/>Representing the predicted Q value of the target critic neural network NN 4; /(I)Representing target actor neural network NN2 parameters; /(I)To balance maximum entropy and rewards.
The invention has the beneficial effects that: aiming at the defects of the prior art, the invention provides a multi-unmanned aerial vehicle track optimization and power control method based on communication learning. To help the actor's network obtain more a priori information before performing the action, we devised a communication mechanism that adds a memory storage device to help the UAV-ABS obtain and utilize the experience of other drones and store the experience learned by itself. In order to build more efficient unmanned aerial vehicle cooperation strategies and reduce the cost of network model deployment, we designed a focused review home network that can reduce redundant information and solve the dimension disaster problem that occurs with increasing numbers of UAV-ABS and ground users.
Drawings
FIG. 1 is a flow chart of a method for multi-unmanned aerial vehicle trajectory optimization and power control based on communication learning;
FIG. 2 is a multi-unmanned auxiliary wireless communication system model;
FIG. 3 is a framework of a communication actor's concentrated attention critics algorithm;
FIG. 4 is a block diagram of a focused commentator neural network;
FIG. 5 is a training prize diagram for different algorithms;
FIG. 6 is a graph of energy consumption versus various algorithms.
Fig. 7 is a graph of the number of users meeting the quality of service for different algorithms.
Detailed Description
The present invention will be described in further detail with reference to the drawings and embodiments, in order to make the objects, technical solutions and advantages of the present invention more apparent.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows: a multi-unmanned aerial vehicle track optimization and power control method based on communication learning comprises the following steps:
Step1: and constructing a multi-unmanned aerial vehicle auxiliary wireless communication system, defining a motion model, a communication model and an energy consumption model, and constructing a joint optimization objective function.
Step 2: converting the joint optimization objective function in the step 1 into Markov games, determining observation, actions and rewards, and designing a multi-agent reinforcement learning algorithm to solve the problem of the joint optimization objective function. The multi-agent reinforcement learning algorithm comprises a plurality of types of neural networks: communication actor neural network NN1, target actor neural network NN2, concentration reviewer neural network NN3, and target review home neural network NN4.
Step 3: initializing the position of the unmanned aerial vehicle and the position of a ground user; initializing neural network parameters in an empirical buffer and multi-agent reinforcement learning algorithm, including communication actor neural network parametersTarget actor neural network parameters/>Focusing on commentator neural network parameters/>And target comment home neural network parameters/>
Step 4: the invention designs a communication mechanism to help the unmanned aerial vehicle acquire, learn and update the communication experience of other unmanned aerial vehicles, and then stores the updated communication experience. Training begins and memory storage device is initialized; For each drone, its communication actor neural network NN1 is derived from memory storage device/>Reading communication information/>And incorporate local observation informationPair/>Coding, learning and updating; update information/>Is stored to a memory storage device/>Is a kind of medium.
Step 5: communication actor neural network based on local observation informationCommunication information/>And memory storage device/>Update information stored therein/>Making an action, the action comprising unmanned aerial vehicle/>Shaft speed/>Y-axis speed/>And transmit power/>
Step 6: when all unmanned aerial vehicles giveShaft speed/>Y-axis speed/>And transmit power/>After that, the training center receives environmental feedback including rewards and the status of the next time slot; the training center packages all unmanned plane states, actions, communication information and rewards of the next time slot into experiences, and stores the experiences into an experience buffer area of the training center;
Step 7: when the experience stored in the experience buffer exceeds a certain amount, the training center extracts a batch of experience to update the neural network parameters; the attentive critic network NN3 calculates the current action value using a batch of experience And updating the parameter/>, by minimizing the joint regression function; Updating parameters of communication actor neural network NN1 in a strategy gradient descending manner; Soft updating the target actor neural network NN2 and the target critic network NN4;
step 8: the unmanned aerial vehicle outputs track decision and power control decision results by using a trained communication actor neural network NN1 deployed by the unmanned aerial vehicle.
The flow chart of the technical scheme is shown in figure 1.
The specific steps of the step 1 include:
step 1.1: in a multi-drone wireless communication scenario, The individual unmanned aerial vehicle is deployed as an air base station feed/>Individual ground subscribers provide communication services. /(I)Defined as unmanned aerial vehicle set,/>Defining a ground user set; multi-unmanned aerial vehicle auxiliary wireless communication system with equal length/>Run on consecutive time slots, where/>Defined as a set of time slots;
a multi-drone assisted wireless communication system model is shown in fig. 2.
Step 1.2: a motion model of the drone is defined. The three-dimensional coordinates of the user are defined asThe three-dimensional coordinates of the unmanned aerial vehicle are/>Wherein/>Is the flying height of the unmanned aerial vehicle. Unmanned aerial vehicle is/>The coordinates of the time slots are expressed as,
Wherein the method comprises the steps ofIndicating that unmanned aerial vehicle is in time slot/>Running speed of/>Is an environmental impact factor,/>Representing the duration of a time slot,/>Expressed as the maximum speed that the unmanned aerial vehicle can reach.
Step 1.3: a drone communication model is defined. The unmanned aerial vehicle communication model considers both line of sight LoS and non-line of sight NLoS. The probability of the line of sight LoS between the drone n and the ground user m is expressed as:
The probability of NLoS is that . Where a, b are constants dependent on the environment,/>Is the elevation angle between the two,
Wherein,Representing the horizontal distance of the drone from the user. The channel gains of the drone and the ground user are expressed as:
Wherein the method comprises the steps of Is the path loss index,/>Is the path loss of LoS,/>Is the path loss of NLoS. Definition of the definitionFor white gaussian noise, the signal to interference plus noise ratio received by ground users and unmanned aerial vehicles is defined as:
Wherein the method comprises the steps of Representing the current unmanned aerial vehicle transmitting power,/>Represented as the transmit power of the other drones. In the case of applying the frequency division multiple access wireless communication technology, the data transmission rate of the terrestrial users is defined as follows using shannon's capacity theorem:
Wherein the method comprises the steps of Representing the number of ground users within range of the current drone.
Step 1.4: an energy consumption model is defined. The energy consumption of the unmanned aerial vehicle comprises communication energy consumption of data transmission and flight energy consumption of the unmanned aerial vehicle in the moving process. To simplify the analysis, the energy consumption of the unmanned aerial vehicle during take-off, landing and hover is excluded. Definition of the definitionFor blade power,/>For blade speed, unmanned aerial vehicle's flight energy consumption represents:
the communication energy consumption of data transmission is expressed as ,/>Indicating the rated transmit power of the drone. The energy consumption model is expressed as:
Step 1.5: a joint optimization objective function is defined. Multi-unmanned aerial vehicle assisted wireless communication system targeting through control The operation speed and the emission power of the unmanned aerial vehicle in each time slot are maximized, the number of ground users meeting the service quality requirement is maximized, and the energy consumption is minimized, and the joint optimization objective function is expressed as:
Is limited by
Is limited by
Is limited by
Wherein the method comprises the steps ofIs the maximum energy consumption of the unmanned aerial vehicle. /(I)Is an index function for measuring the service quality of the ground user ifThe formula value is 1, otherwise 0. /(I)Is the minimum transmission rate that the surface user needs to meet. /(I)A weight factor representing the relative importance of adjusting energy consumption and ground user quality of service in a joint optimization objective function.
The specific steps of the step 2 include:
Step 2.1: to establish a clear problem setting for the algorithm, markov games are defined before the deep reinforcement learning approach is used. Markov games are defined as unmanned aerial vehicles that interact with the environment by selecting actions, observing the current state, and obtaining real-time rewards at each time slot. A common goal of all drones is to maximize long-term accumulated rewards by selecting the best sequence of actions.
Step 2.2: observation information, actions, and rewards are defined. The observation information of the unmanned aerial vehicle comprises the position information of all unmanned aerial vehicles and the position information of users in the communication range, and the observation information is expressed as:
The unmanned aerial vehicle comprises unmanned aerial vehicle Shaft speed/>、/>Shaft speed/>And transmit power/>It is expressed as:
all unmanned aerial vehicles aim to maximize the number of ground users meeting quality of service while minimizing the energy consumption of the multi-unmanned aerial vehicle assisted wireless communication system. Meeting various constraints while executing tasks, when not adhering to the constraints, being penalized . The prize is therefore formulated as:
(12)
Step 2.3: the communication actor concentrating on the reviewer algorithm CACAC is designed to solve the joint optimization problem, and the algorithm framework diagram is shown in fig. 3. Algorithms include a variety of neural networks: communication actor neural network NN1, target actor neural network NN2, concentration reviewer neural network NN3, and target review home neural network NN4.
Initializing a user, a unmanned aerial vehicle position, a neural network parameter and an experience buffer. The neural network parameters include communication actor neural network parametersTarget actor neural network parameters/>Focusing on commentator neural network parameters/>And target comment home neural network parameters/>
Training begins and memory storage device is initialized
For each unmanned aerial vehicle, its communication actor neural network NN1 is from memory storage deviceReading communication messagesAnd combine local observation information/>Pair/>Coding, learning and updating are carried out. Then update information/>Is stored to a memory storage device/>Is a kind of medium. Communication actor neural network NN1 based on local observations/>Communication message/>And update information/>An action is made. After all unmanned aerial vehicles give actions, the training center will receive environmental feedback including rewards and status of the next time slot. The training center packages all the unmanned plane states, the states of the next time slot, actions, communication information and rewards into experiences and stores the experiences into an experience buffer. When the experience stored in the experience buffer exceeds a certain amount, the training center may extract a batch of experience to update the neural network parameters. The concentrated commentator network NN3 calculates the current action value/>, using a batch of experienceAnd updating the parameter/>, by minimizing the joint regression function. Updating parameters/>, by means of strategy gradient descent, of communication actor neural network NN1
The communication mechanism comprises three steps of acquisition, learning and updating. For each drone, they acquire local observationsCommunication message/>. The communication message is encoded, read and updated. First encoding local observations/>Wherein/>Representing a fully connected neural network.
After encoding the local observations, the communication actor neural network NN1 in the unmanned aerial vehicle reads experience from the communication storage device and learns by combining the previous encoded information, expressed as a formula; Wherein the method comprises the steps ofGating cell/>By concatenating information vectors/>Linear mapping acquisition, wherein/>Representing an activation function,/>Is used by the gating unit only to extract/>Context vector of spatio-temporal information, formulated as,/>Is a linear mapped leachable vector,/>Is learning information of the unmanned aerial vehicle.
Finally, the unmanned aerial vehicle selectively updates the learned information and stores the information in the memory storage deviceIs a kind of medium. Two gating units/>And/>Wherein/>And/>Is a parameter that can be learned. Candidate update information is denoted/>The last updated information is represented as
The step 5 comprises the following steps:
Step 5.1: the unmanned plane gives actions by combining self observation, learning information and updating information, and the formula is expressed as Wherein/>For communication of actor neural networks NN1.
Step 5.2: the communication actor neural network NN1 is composed of a feature network, an action network, and a series of linear transformations. The characteristic network is a three-layer fully connected neural network, and the action network is a two-layer fully connected neural network. In the forward propagation process, self-observed informationObtaining feature vector/>, through feature network coding. Then by a read operation, calculate/>, using a linear transformation and a sigmoid functionFor controlling the reading of the message. Then by a write operation, a linear transformation and activation function is used to calculate/>、/>And/>For generating updated messages/>. Finally, the action network combines the unmanned aerial vehicle with the self-observation information/>Learning information/>And update information/>And splicing, and calculating the final action output through the two full-connection layers and the activation function. /(I)
The specific steps of the step 6 include:
Step 6.1: obtaining rewards Next state/>
Step 6.2: defining a set of statesAnd communication experience set/>And will experienceAnd storing the action values into an experience buffer area, and calculating the action values by using the concentrated attention commentator neural network NN 3.
Step 6.3: as shown in fig. 4, the concentrated commentator neural network NN3 has the following architecture: the input information of the model is the states and actions of all unmanned aerial vehicles; firstly, we encode input information by using a linear layer formed by a plurality of fully connected networks to obtain encoded information; Secondly, the multi-head attention layer is utilized to screen information, and the information is encoded/>Mapped to three trainable and shared weight matrices: query/>Bond/>Sum/>The formula is:
Wherein the method comprises the steps of 、/>And/>Is the result after mapping. The attention weight of a single attention head is then calculated by three weight matrices, formulated as: /(I)
The final multi-headed attention layer output is a weighted sum of all attention weights. Next/>And/>Input into the addition and normalization layer, and perform addition and normalization operation on the information to ensure the validity and consistency of the information. Normalized information/>Linear and nonlinear transformation by feed-forward layer to yield output/>The activation function of the feed-forward layer is Relu functions. Finally will/>And/>The predicted/>, is finally obtained as input through addition and normalization processingValues.
The specific steps of the step 7 include:
step 7.1: updating actor neural network NN1 parameters using strategy gradient descent:
Wherein the method comprises the steps of Represents the/>Communication actor network parameters of individual agents,/>Rewards representing cumulative policy,/>Is a desire to fix the value function of an action under other agent actions.
Step 7.2: updating the concentrated attention commentator neural network NN3 parameters, and updating the formula to be expressed as:
Wherein the method comprises the steps of Representing/>, of concentrated-attention commentator neural network NN3 predictionsValue/>Representing target/>Value, final goal is to minimize the difference between predicted value and target value/>。/>Representing the current prize value,/>To balance maximum entropy and rewards.
Step 7.3: soft update target comment home neural network NN4 parameterTarget actor neural network NN2 parameters/>
The present invention implements the proposed method using python3.7 as a programming language and performs subsequent training and verification. The training times of the algorithm are 30000 times, the training rounds are 25 steps each time, the training set size extracted by the experience buffer area is 256, and the learning rate of the communication actor neural network NN1 and the concentrated attention critter neural network is 0.001. In order to evaluate the effectiveness of the method, the invention is compared with the current advanced MADDPG, commnet reinforcement learning algorithm. The CACAC algorithm provided by the invention performs comparison analysis from three indexes of rewarding, the number of users meeting the service quality and energy consumption. Fig. 5 shows the change of rewards during training of three algorithms, wherein the solid line is the method proposed by the present invention. Fig. 6 and 7 show the number of users and the energy consumption of 3 algorithms, respectively, that meet the quality of service during the execution phase, wherein the diamonds are the method proposed by the present invention. It can be found from the figure that the method provided by the invention can converge faster and reach higher rewarding value in the training process, and the algorithm can enable more ground users to meet the service quality under lower energy consumption in the execution stage, i.e. the method provided by the invention has better performance.
The present invention is not limited to the above-described embodiments, and any modifications, variations, substitutions, etc. which are within the spirit and principles of the present invention should be included in the scope of the present invention. What is not described in detail in the present specification belongs to the prior art known to those skilled in the art.

Claims (6)

1. The multi-unmanned aerial vehicle track optimization and power control method based on communication learning is characterized by comprising the following steps of:
1) Building a multi-unmanned aerial vehicle auxiliary wireless communication system, wherein the multi-unmanned aerial vehicle auxiliary wireless communication system comprises an unmanned aerial vehicle, a ground user and a training center; defining a motion model, a communication model and an energy consumption model, and constructing a joint optimization objective function;
2) Converting the joint optimization objective function in the step 1) into Markov games, determining observation, actions and rewards, and designing a multi-agent reinforcement learning algorithm for solving the problem of the joint optimization objective function; the multi-agent reinforcement learning algorithm comprises a communication actor neural network NN1, a target actor neural network NN2, a concentrated attention commentary neural network NN3 and a target commentary family neural network NN4;
3) Initializing the position of the unmanned aerial vehicle and the position of a ground user; initializing neural network parameters in an empirical buffer and multi-agent reinforcement learning algorithm, including communication actor neural network parameters Target actor neural network parameters/>Focusing on commentator neural network parameters/>And target comment home neural network parameters/>
4) Training begins and memory storage device is initialized; For each drone, its communication actor neural network NN1 is derived from memory storage device/>Reading communication information/>And combine local observation information/>Pair/>Coding, learning and updating; update information/>Is stored to a memory storage device/>In (a) and (b);
The update information The acquisition steps are as follows, firstly encoding/>, local observation informationWherein/>Representing a fully connected neural network;
After encoding the local observation information, the communication actor neural network NN1 in the unmanned aerial vehicle reads experience from the memory storage device and learns the experience by combining the experience with the previous encoded information, and the formula is expressed as ; Wherein the method comprises the steps ofGating cell/>By concatenating information vectors/>Linear mapping acquisition, wherein/>Representing an activation function,/>Is extracted by gating unit only/>Context vector of spatiotemporal information, formulated as/>,/>Is a linear mapped learnable vector; /(I)Learning information of the unmanned aerial vehicle;
the unmanned aerial vehicle selectively updates and stores the learned information in the memory storage device In (a) and (b); two gating units/>And/>Wherein/>And/>Is a parameter that can be learned; candidate update information is denoted/>Wherein/>Is a parameter that can be learned; the last updated information is denoted/>
5) Communication actor neural network based on local observation informationCommunication information/>And memory storage device/>Update information stored therein/>Making an action, the action comprising unmanned aerial vehicle/>Shaft speed/>Y-axis speed/>And transmit power
6) When all unmanned aerial vehicles giveShaft speed/>Y-axis speed/>And transmit power/>After that, the training center receives environmental feedback including rewards and the status of the next time slot; the training center packages all unmanned plane states, actions, communication information and rewards of the next time slot into experiences, and stores the experiences into an experience buffer area of the training center;
7) When the experience stored in the experience buffer exceeds a certain amount, the training center extracts a batch of experience to update the neural network parameters; the attentive critic network NN3 calculates the current action value using a batch of experience And updating the parameter/>, by minimizing the joint regression function; Updating parameters/>, by means of strategy gradient descent, of communication actor neural network NN1; Soft updating the target actor neural network NN2 and the target critic network NN4; soft update target comment home neural network NN4 parameter,/>Soft updating target actor neural network NN2 parameters for super parameters,/>
8) The unmanned aerial vehicle outputs a track decision result and a power control decision result by utilizing a self-deployed trained communication actor neural network NN 1.
2. The method for optimizing and controlling power of multiple unmanned aerial vehicle trajectories based on communication learning according to claim 1, wherein the multiple unmanned aerial vehicle auxiliary wireless communication system is used for controlling power of multiple unmanned aerial vehicles in a multiple unmanned aerial vehicle wireless communication scene,The individual unmanned aerial vehicle is deployed as an air base station, as/>Providing communication service by individual ground users; /(I)Defined as a collection of unmanned aerial vehicles,Defined as a ground user set; multi-unmanned aerial vehicle auxiliary wireless communication system with equal length/>Run on consecutive time slots, where/>Defined as a set of time slots;
defining a motion model of the unmanned aerial vehicle; the three-dimensional coordinates of the ground user are defined as The three-dimensional coordinates of the unmanned aerial vehicle are/>Wherein/>The flying height of the unmanned aerial vehicle; unmanned aerial vehicle is/>The coordinates of the time slots are expressed as,
(1)
Wherein the method comprises the steps ofIndicating that unmanned aerial vehicle is in time slot/>Running speed of/>Is an environmental impact factor,/>Representing the duration of a time slot,/>Expressed as maximum speed that the unmanned aerial vehicle can reach;
Defining an unmanned aerial vehicle communication model; the unmanned aerial vehicle communication model simultaneously considers the sight distance LoS and the non-sight distance NLoS; the probability of the line of sight LoS between the drone n and the ground user m is expressed as:
(2)
The probability of NLoS is that ; Where a, b are constants dependent on the environment,/>Is the elevation angle between the drone n and the ground user m,
(3)
Wherein,Representing the horizontal distance between the unmanned plane and the user; the channel gains of the drone and the ground user are expressed as:
(4)
Wherein the method comprises the steps of Is the path loss index,/>Is the path loss of LoS,/>Is the path loss of NLoS; definition/>For white gaussian noise, the signal to interference plus noise ratio received by ground users and unmanned aerial vehicles is defined as:
(5)
Wherein the method comprises the steps of Representing the transmitting power of the current unmanned aerial vehicle; /(I)The transmitting power is expressed as the transmitting power of other unmanned aerial vehicles; in the case of applying the frequency division multiple access wireless communication technology, the data transmission rate of the terrestrial users is defined as follows using shannon's capacity theorem:
(6)
Wherein the method comprises the steps of Representing the number of ground users within the range of the current unmanned plane;
Defining an energy consumption model; the energy consumption of the unmanned aerial vehicle comprises communication energy consumption of data transmission and flight energy consumption of the unmanned aerial vehicle in the moving process; to simplify the analysis, the energy consumption of the unmanned aerial vehicle during take-off, landing and hover is eliminated; definition of the definition For blade power,/>For blade speed, unmanned aerial vehicle's flight energy consumption represents:
(7)
the communication energy consumption of data transmission is expressed as ,/>Representing rated emission power of the unmanned aerial vehicle; the energy consumption model is expressed as:
(8)
Defining a joint optimization objective function; the goal of the multi-unmanned aerial vehicle auxiliary wireless communication system is to control the operation speed and the emission power of unmanned aerial vehicles in T time slots, and simultaneously maximize the number of ground users meeting the service quality requirement and minimize the energy consumption, and the joint optimization objective function is expressed as:
(9)
Is limited by
Is limited by
Is limited by
Wherein the method comprises the steps ofThe maximum energy consumption of the unmanned aerial vehicle is achieved; /(I)Is an index function for measuring the service quality of the ground user ifThe formula value is 1, otherwise is/>Is the minimum transmission rate that the surface user needs to meetA weight factor representing the relative importance of adjusting energy consumption and ground user quality of service in a joint optimization objective function.
3. A multi-drone trajectory optimization and power control method based on communication learning according to claim 2, characterized in that the markov game is defined as that the drone obtains real-time rewards for interacting with the environment by selecting actions, observing the current state and at each time slot; a common goal of all drones is to maximize long-term accumulated rewards by selecting the best sequence of actions;
Defining observation information, actions, and rewards; the observation information of the unmanned aerial vehicle comprises the position information of all unmanned aerial vehicles and the position information of users in the communication range of the unmanned aerial vehicle;
(10)
The unmanned aerial vehicle comprises unmanned aerial vehicle Shaft speed/>Y-axis speed/>And transmit power/>It is expressed as:
(11)
all unmanned aerial vehicles aim at maximizing the number of ground users meeting the quality of service while minimizing the energy consumption of the multi-unmanned aerial vehicle auxiliary wireless communication system; meeting various constraints while executing tasks, when not adhering to the constraints, being penalized ; The prize is therefore formulated as:
(12)。
4. The method for optimizing and controlling power of multiple unmanned aerial vehicle trajectories based on communication learning of claim 3, wherein said step 5) of communicating actor neural networks incorporates self-observed information Learning information/>Updating informationGive action, its formula is expressed as/>,/>Neural network NN1 for communicating actors; the communication actor neural network NN1 comprises a characteristic network, an action network and a series of linear transformations; the characteristic network is a three-layer full-connection neural network, and the action network is a two-layer full-connection neural network; in the forward propagation process, self-observed information/>Generating update information/>, through feature network, linear transformation, sigmoid function and write operation; The action network combines the unmanned aerial vehicle with the self-observation information/>Learning information/>And update information/>And splicing, and calculating the final action output through the two full-connection layers and the activation function.
5. The method for optimizing and controlling power of multiple unmanned aerial vehicle trajectories based on communication learning of claim 4, wherein the training center in step 6) obtains rewardsNext state/>; Define state set/>And communication experience set/>; Experience/>Stored in an experience buffer, and the action value/> is calculated by using the concentrated attention commentator neural network NN3A value; the focused critic neural network NN3 architecture is as follows: the input information of the concentrated attention commentator neural network NN3 is the states and actions of all unmanned aerial vehicles; encoding input information by using a linear layer formed by a plurality of fully connected networks to obtain encoded information/>; Secondly, the multi-head attention layer is utilized to screen information, and the information is encoded/>Mapped to three trainable and shared weight matrices: query/>Bond/>Sum/>The formula is:
(13)
(14)
(15)
Wherein the method comprises the steps of 、/>And/>Is the mapped result; the attention weight of a single attention head is then calculated by three weight matrices, where/>Representing a key vector, the formula is:
(16)
the final multi-headed attention layer output is a weighted sum of all attention weights ; Next/>And/>Inputting the information into an addition and normalization layer, and performing addition and normalization operation on the information to ensure the validity and consistency of the information; normalized informationLinear and nonlinear transformation by feed-forward layer to yield output/>The activation function of the feed-forward layer is Relu functions; finally will/>And/>The predicted/>, is finally obtained as input through addition and normalization processingValues.
6. The method for optimizing and controlling power of multiple unmanned aerial vehicles based on communication learning according to claim 5, wherein the parameters are updated in step 7) by means of strategy gradient descent
(17)
Wherein the method comprises the steps ofCommunication actor neural network NN1 parameters representing nth agent,/>Operation for describing gradients in three-dimensional space,/>Representing the predicted Q value of the concentrated attention commentator neural network NN3,/>Rewards representing cumulative policy,/>Representing a desire to fix a value function of an action of the other agent; /(I)To balance maximum entropy and rewards;
Updating the concentrated attention commentator neural network NN3 parameters, and updating the formula to be expressed as:
(18)
Wherein the method comprises the steps of Representing target/>Value, final goal is to minimize the difference between predicted value and target value/>Representing the current prize value,/>Representing the predicted Q value of the target critic neural network NN 4; /(I)Representing target actor neural network NN2 parameters; /(I)To balance maximum entropy and rewards.
CN202410275005.9A 2024-03-12 2024-03-12 Multi-unmanned aerial vehicle track optimization and power control method based on communication learning Active CN117880858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410275005.9A CN117880858B (en) 2024-03-12 2024-03-12 Multi-unmanned aerial vehicle track optimization and power control method based on communication learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410275005.9A CN117880858B (en) 2024-03-12 2024-03-12 Multi-unmanned aerial vehicle track optimization and power control method based on communication learning

Publications (2)

Publication Number Publication Date
CN117880858A CN117880858A (en) 2024-04-12
CN117880858B true CN117880858B (en) 2024-05-10

Family

ID=90579489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410275005.9A Active CN117880858B (en) 2024-03-12 2024-03-12 Multi-unmanned aerial vehicle track optimization and power control method based on communication learning

Country Status (1)

Country Link
CN (1) CN117880858B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113467508A (en) * 2021-06-30 2021-10-01 天津大学 Multi-unmanned aerial vehicle intelligent cooperative decision-making method for trapping task
CN116774584A (en) * 2023-06-25 2023-09-19 重庆邮电大学 Unmanned aerial vehicle differentiated service track optimization method based on multi-agent deep reinforcement learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113467508A (en) * 2021-06-30 2021-10-01 天津大学 Multi-unmanned aerial vehicle intelligent cooperative decision-making method for trapping task
CN116774584A (en) * 2023-06-25 2023-09-19 重庆邮电大学 Unmanned aerial vehicle differentiated service track optimization method based on multi-agent deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于延迟策略的最大熵优势演员评论家算法;祁文凯;桑国明;;小型微型计算机系统;20200815(08);全文 *

Also Published As

Publication number Publication date
CN117880858A (en) 2024-04-12

Similar Documents

Publication Publication Date Title
CN112351503B (en) Task prediction-based multi-unmanned aerial vehicle auxiliary edge computing resource allocation method
CN113162679B (en) DDPG algorithm-based IRS (intelligent resilient software) assisted unmanned aerial vehicle communication joint optimization method
CN110531617B (en) Multi-unmanned aerial vehicle 3D hovering position joint optimization method and device and unmanned aerial vehicle base station
CN109617584B (en) MIMO system beam forming matrix design method based on deep learning
WO2021017227A1 (en) Path optimization method and device for unmanned aerial vehicle, and storage medium
CN112902969B (en) Path planning method of unmanned aerial vehicle in data collection process
CN109936865B (en) Mobile sink path planning method based on deep reinforcement learning algorithm
CN113395654A (en) Method for task unloading and resource allocation of multiple unmanned aerial vehicles of edge computing system
CN115499921A (en) Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network
CN115065678A (en) Multi-intelligent-device task unloading decision method based on deep reinforcement learning
CN116436512A (en) Multi-objective optimization method, system and equipment for RIS auxiliary communication
Nasr-Azadani et al. Single-and multiagent actor–critic for initial UAV’s deployment and 3-D trajectory design
CN113382060B (en) Unmanned aerial vehicle track optimization method and system in Internet of things data collection
Zhang et al. Multi-objective optimization for UAV-enabled wireless powered IoT networks: an LSTM-based deep reinforcement learning approach
Shi et al. Age of information optimization with heterogeneous uavs based on deep reinforcement learning
CN117880858B (en) Multi-unmanned aerial vehicle track optimization and power control method based on communication learning
Wang et al. Trajectory planning of UAV-enabled data uploading for large-scale dynamic networks: A trend prediction based learning approach
CN114879726A (en) Path planning method based on multi-unmanned-aerial-vehicle auxiliary data collection
CN115119174A (en) Unmanned aerial vehicle autonomous deployment method based on energy consumption optimization in irrigation area scene
Si et al. UAV-assisted Semantic Communication with Hybrid Action Reinforcement Learning
CN112765892A (en) Intelligent switching judgment method in heterogeneous Internet of vehicles
Gu et al. UAV-enabled mobile radiation source tracking with deep reinforcement learning
Gao et al. Multi-agent reinforcement learning for UAVs 3D trajectory designing and mobile ground users scheduling with no-fly zones
Yang et al. Deep reinforcement learning in NOMA-assisted UAV networks for path selection and resource offloading
Wu et al. UAV-Assisted Data Synchronization for Digital-Twin-Enabled Vehicular Networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant