CN116896777A - Reinforcement learning-based energy optimization method for an unmanned aerial vehicle swarm with integrated sensing and communication - Google Patents

Reinforcement learning-based energy optimization method for an unmanned aerial vehicle swarm with integrated sensing and communication

Info

Publication number
CN116896777A
CN116896777A (application CN202310843486.4A)
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
value
energy consumption
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310843486.4A
Other languages
Chinese (zh)
Inventor
刘荣科
祝倩
刘启瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202310843486.4A priority Critical patent/CN116896777A/en
Publication of CN116896777A publication Critical patent/CN116896777A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/02Power saving arrangements
    • H04W52/0203Power saving arrangements in the radio access network or backbone network of wireless communication networks
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G5/00Traffic control systems for aircraft, e.g. air-traffic control [ATC]
    • G08G5/0047Navigation or guidance aids for a single aircraft
    • G08G5/0069Navigation or guidance aids for a single aircraft specially adapted for an unmanned aircraft
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/02Power saving arrangements
    • H04W52/0209Power saving arrangements in terminal devices
    • H04W52/0212Power saving arrangements in terminal devices managed by the network, e.g. network or access point is master and terminal is slave

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides a reinforcement learning-based energy consumption optimization method for an unmanned aerial vehicle (UAV) swarm with integrated sensing and communication (ISAC), comprising the following steps: first, initialize the Q-value function network, which reflects the relation between states and actions, using the positioning-perception performance of the UAV cluster; second, judge the current state and select a working point as the initial state; third, select the current action with an epsilon-greedy strategy according to the current state; fourth, introduce the UAV's perception performance and communication-task energy consumption into the design of the reward function, and obtain the actual environment reward of the previous action together with the next state; fifth, update the Q-value function network with that reward; sixth, set the new state as the current state and repeat steps three to six until the values in the Q-value function network converge. The invention addresses the high energy consumption of prior UAV-swarm-based ISAC networks while guaranteeing real-time communication and positioning service for ground users, effectively extending the lifetime of the network.

Description

Reinforcement learning-based energy optimization method for an unmanned aerial vehicle swarm with integrated sensing and communication
Technical Field
The invention belongs to the technical field of UAV integrated sensing and communication, and in particular relates to a reinforcement learning-based energy optimization method for a UAV-swarm integrated sensing and communication network.
Background
Next-generation wireless networks (B5G/6G) are driving continual innovation in wireless technology while enabling many emerging applications, such as connected intelligence, connected vehicles, and smart cities, which require both high-quality wireless communication links and high-precision sensing. B5G/6G networks are therefore expected to provide communication and sensing capabilities simultaneously, improving the utilization of spectrum resources. Integrated sensing and communication (ISAC) is widely recognized as one of the effective approaches to this goal. To meet users' real demand for joint sensing-and-communication coverage, future wireless networks must be deployed in complex terrain and electromagnetic environments such as dense cities and mountainous areas. In such environments, however, existing network technologies, represented by cellular networks and global navigation satellite systems (GNSS), have inherent drawbacks and cannot meet users' requirements for high-quality communication and sensing.
In particular, in harsh environments such as dense high-rise districts or remote mountainous areas, conventional satellite positioning services suffer from network marginalization and related effects, leading to poor communication and sensing quality of service for ground users. In such cases, UAV swarms, with their high mobility and flexible deployment, are expected to make up for this deficiency when combined with appropriate communication and sensing techniques; UAV-cluster-assisted technology has therefore received extensive attention from academia and industry in recent years. However, UAVs are usually powered by rechargeable batteries, and even for the few types that can replenish energy via solar power or other means, the energy supply remains limited, so UAVs are resource constrained. It is therefore especially important to allocate resources such as transmit power and served users reasonably in order to reduce the energy consumption of the aerial platform. How to minimize UAV energy consumption by optimizing power allocation and service strategies, while still completing the communication and sensing tasks, is the core research problem of the invention. Solving it is expected to reduce the energy consumption of the aerial cluster system effectively while preserving its communication and sensing performance, thereby prolonging the lifetime of the UAV-swarm ISAC network.
Disclosure of Invention
The invention aims to provide a reinforcement learning-based energy consumption optimization method for a UAV-swarm ISAC system, which solves the problem of high energy consumption in prior UAV-swarm-based ISAC networks, guarantees real-time communication and positioning service for ground users, and effectively extends the lifetime of the network.
The invention provides a reinforcement learning-based energy consumption optimization method for a UAV-swarm ISAC system under the following system model. In a cellular network covered by a single base station, the base station coordinates are (x_0, y_0, z_0); the set of unmanned aerial vehicles is denoted M and the set of ground users is denoted L. One complete round in which a UAV finishes positioning sensing and communication-task offloading is a decision period, and each time instant t is assumed to be one decision period. The set of states over all decision periods is called the state space S = {s_1, s_2, ..., s_t, ...}, and the set of actions over all decision periods is called the action space A = {a_1, a_2, ..., a_t, ...}. At time t, the position coordinates of UAV m and of the user l to be positioned are u_m(t) = (x_m(t), y_m(t))^T and v_l = (x_l, y_l)^T respectively, and the altitude of UAV m is assumed to be a fixed value H_m. All UAVs in the cluster fly on preset trajectories satisfying a given threshold range, provide communication and position-sensing services to the ground users, and can locate and identify the user terminals to be served.
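For illustration, the system model above can be set up as the following minimal Python sketch; the number of UAVs and users, the coordinates, and the flight height used here are hypothetical placeholders, not values taken from the invention.

import numpy as np

rng = np.random.default_rng(0)

M, L = 8, 10                                   # number of UAVs and ground users (assumed)
base_station = np.array([0.0, 0.0, 30.0])      # base station coordinates (x0, y0, z0)
H = np.full(M, 100.0)                          # fixed UAV flight heights H_m (assumed)
uav_xy = rng.uniform(-500, 500, size=(M, 2))   # u_m(t) = (x_m(t), y_m(t))^T
user_xy = rng.uniform(-500, 500, size=(L, 2))  # v_l = (x_l, y_l)^T

def uav_user_distance(m: int, l: int) -> float:
    """3-D distance d_{m,l}(t) between UAV m and ground user l."""
    horiz = np.linalg.norm(uav_xy[m] - user_xy[l])
    return float(np.hypot(horiz, H[m]))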
First, use the positioning-perception performance of the UAV cluster to assign initial values to the Q-value function network that reflects the relation between UAV states and actions.
Second, determine the UAV's current state from the environment; a working point may be selected as the UAV's initial state.
Third, select the current action with an epsilon-greedy strategy according to the UAV's current state.
Fourth, introduce the UAV's perception performance and communication-task energy consumption into the design of the reward function, and obtain the actual environment reward of the UAV's previous action together with the next state.
Fifth, update the Q-value function network with the reward obtained in the previous step.
Sixth, set the new state as the current state, and repeat the third through sixth steps until the values in the Q-value function network converge.
The above steps mainly involve the following key technical points:
(1) Initializing the Q-value function network with the UAV-cluster positioning-perception performance
The geometry between the UAVs and the ground users to be served is the precondition that determines their positioning-perception performance, which can be characterized by the position (three-dimensional) dilution of precision, PDOP. Because a UAV selects different actions in different states, the geometric configuration between the UAVs and a user differs, and so does the ability to provide sensing service to that user. Suppose that at time t the subset of UAV base stations providing position-sensing service for user l is S_k(t) and that this subset contains M_0 UAVs. The position (three-dimensional) dilution-of-precision value PDOP_{S_k(t)} of this UAV subset is then computed by formula (1) from J_{S_k(t)}, the Jacobian matrix of the positioning-perception observation equations of the subset S_k(t). This Jacobian is given by formula (2) in terms of the coordinates u_1(t) = (x_1(t), y_1(t))^T, ..., u_{M_0}(t) = (x_{M_0}(t), y_{M_0}(t))^T of UAV 1 through UAV M_0 in S_k(t), their fixed altitudes H_1, ..., H_{M_0}, and the position coordinates v_l = (x_l, y_l)^T of the user l to be positioned.
The Q-value function network is initialized with the UAV-cluster positioning-perception performance as follows: when the UAVs are in the normal working state and the UAV base-station subset S_k(t) is selected at time t to provide the sensing-and-communication service to a user, the corresponding cell of the Q-table is set to -PDOP_{S_k(t)}; all remaining cells are assigned zero. Assigning initial values to the Q-value function network from the positioning-perception performance of the UAV cluster provides prior information to the reinforcement learning network of the UAV agent and thus facilitates the agent's learning.
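As an illustration of key technique (1), the following Python sketch computes a PDOP value for the UAV subset serving one user and uses it to pre-fill a Q-table. It assumes a range (TOA) observation model in which the rows of the Jacobian are unit line-of-sight vectors, so PDOP = sqrt(trace((J^T J)^{-1})); the patent's formulas (1) and (2) are not reproduced verbatim here, and the indexing helper pdop_of_action is a hypothetical placeholder.

import numpy as np

def pdop(uav_xyz: np.ndarray, user_xyz: np.ndarray) -> float:
    """Position dilution of precision for the UAV subset serving one user.

    Assumes a range (TOA) observation model: each row of the Jacobian J is the
    unit line-of-sight vector from the user to one UAV (at least 3 UAVs with
    non-degenerate geometry are needed), so PDOP = sqrt(trace((J^T J)^{-1})).
    """
    diff = uav_xyz - user_xyz                  # shape (M0, 3)
    ranges = np.linalg.norm(diff, axis=1, keepdims=True)
    J = diff / ranges                          # unit line-of-sight vectors
    return float(np.sqrt(np.trace(np.linalg.inv(J.T @ J))))

def init_q_table(n_states: int, n_actions: int, pdop_of_action) -> np.ndarray:
    """Prior for the Q-table: -PDOP for UAV subsets that would serve the user
    (smaller PDOP = better geometry = larger initial Q), zero elsewhere."""
    Q = np.zeros((n_states, n_actions))
    for s in range(n_states):
        for a in range(n_actions):
            p = pdop_of_action(s, a)           # returns None if the action serves nobody
            if p is not None:
                Q[s, a] = -p
    return Q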
(2) Energy consumption of communication-task uploading
At time t, the LoS channel power gain from the m-th UAV to the l-th ground user can be expressed as g_{m,l}(t) = β_0 / d_{m,l}^α(t) (formula (3)), where α is the path-loss exponent of the channel, β_0 is the channel gain at the reference distance of one meter, and d_{m,l}(t) is the distance from the m-th UAV to the l-th ground user, with d_{m,l}(t) = sqrt(||u_m(t) - v_l||^2 + H_m^2). The signal-to-noise ratio of the link at that time can then be expressed as
SNR_{m,l}(t) = a_{m,l}(t) · P · g_{m,l}(t) / (σ^2 + I_l(t))    (4)
where P is the constant transmit power of the aerial platform, σ^2 is the noise power, and a_{m,l}(t) indicates whether the aerial platform provides the sensing-and-communication service to the ground user: when a_{m,l}(t) = 0 the service is not provided, and when a_{m,l}(t) = 1 it is provided. I_l(t) = Σ_u P_u(t)·g_{u,l}(t) is the interference from the other UAV platforms, i.e. co-channel interference, where P_u(t) is the transmit power of another UAV platform at time t, g_{u,l}(t) is its LoS channel power gain at time t, and the sum runs over the set of UAVs. The data transmission rate of the link from the m-th UAV to the l-th ground user can be expressed as
R_{m,l}(t) = B · log_2(1 + SNR_{m,l}(t))    (5)
where B is the signal bandwidth. The energy consumed by the link, E_{m,l}(t), is then given by formula (6) in terms of the constant transmit power P of the aerial platform and the service indicator a_{m,l}(t) defined above. P_m(t) ∈ {0,1,...,5} denotes the number of users served by one UAV, and it is assumed that the data-uploading task of each ground user can be sent to at most one UAV.
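A minimal Python sketch of the link quantities of key technique (2) follows. The channel-gain and SNR expressions are reconstructed from the symbol definitions above (formulas (3) and (4)); the energy expression assumes transmit power multiplied by upload time, with the user's task size D_l introduced here as a hypothetical parameter, since formula (6) is not reproduced in this text.

import numpy as np

def los_gain(d: float, beta0: float, alpha: float) -> float:
    """LoS channel power gain g_{m,l}(t) = beta0 / d^alpha (beta0 = gain at 1 m)."""
    return beta0 / d ** alpha

def link_snr(P: float, g: float, serve: int, sigma2: float, interference: float) -> float:
    """SNR_{m,l}(t): received power over noise plus co-channel interference.
    `serve` is the 0/1 indicator a_{m,l}(t); `interference` = sum_u P_u(t) * g_{u,l}(t)."""
    return serve * P * g / (sigma2 + interference)

def link_rate(B: float, snr: float) -> float:
    """R_{m,l}(t) = B * log2(1 + SNR_{m,l}(t))  -- formula (5)."""
    return B * np.log2(1.0 + snr)

def link_energy(P: float, serve: int, data_bits: float, rate: float) -> float:
    """Upload energy of the link. Formula (6) is not reproduced in the text; here
    we assume E = a_{m,l} * P * (D_l / R_{m,l}), i.e. transmit power times the
    time needed to upload the user's D_l bits (D_l is a hypothetical input)."""
    if serve == 0 or rate <= 0.0:
        return 0.0
    return P * data_bits / rate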
(3) Action selection policy
Q-learning is a value-based reinforcement learning algorithm that finds the optimal policy π* of the UAV (the agent) through iterative updates of the Q-value function. While the algorithm runs, the agent selects actions according to an epsilon-greedy policy: with probability ε it selects an action at random, and with probability 1 − ε it selects the action corresponding to the maximum value of the Q-value function network.
When the algorithm starts, the Q-table is first initialized by the method described in (1), and the current state s_t is selected. For every action a available in this state there is a corresponding state-action value, denoted Q(s_t, a). The action in this state is then chosen according to the epsilon-greedy policy, i.e. the action corresponding to the maximum value in the Q-value function network is selected as
a_t = argmax_a Q(s_t, a)    (7)
After the action is selected, the agent executes it, enters the next state s_{t+1}, obtains the reward r(t) for the current action-selection decision period, and at the same time updates the value at the corresponding position of the Q network according to formula (8), where γ is the discount factor and γ ∈ [0,1]. The reward function r(t) is described in detail in (4).
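The following Python sketch illustrates key technique (3): epsilon-greedy selection per formula (7) and a standard tabular Q-learning update standing in for formula (8). The learning rate lr is an assumption of this sketch; the text itself only specifies the discount factor γ.

import numpy as np

def epsilon_greedy(Q: np.ndarray, s: int, eps: float, rng: np.random.Generator) -> int:
    """Pick a random action with probability eps, otherwise the arg-max action (7)."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def q_update(Q: np.ndarray, s: int, a: int, r: float, s_next: int,
             gamma: float, lr: float = 0.1) -> None:
    """Standard tabular Q-learning update used here in place of formula (8);
    the learning rate lr is an assumption not stated in the text."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += lr * (td_target - Q[s, a])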
(4) Design of the reward function r(t)
To jointly guarantee the communication and sensing performance of the UAV-swarm network, the reward function r(t) at time t is designed as formula (9), where SNR_thr is a preset signal-to-noise-ratio threshold (a known system parameter) whose purpose is to guarantee the agent's communication performance, and PDOP_thr is a three-dimensional dilution-of-precision threshold (a known parameter) whose purpose is to guarantee the agent's sensing performance. Based on this function, the integrated sensing-and-communication character of the UAV-swarm network is preserved during energy-optimizing action selection.
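One plausible reading of the reward design (9) is sketched below: the reward equals the negative upload energy when both the SNR and PDOP thresholds are satisfied, and a fixed penalty otherwise. The penalty magnitude is a hypothetical choice, not a value given in the text.

def reward(energy: float, snr: float, pdop_val: float,
           snr_thr: float = 2.0, pdop_thr: float = 1.5,
           penalty: float = -100.0) -> float:
    """Reward r(t): minimise energy subject to the communication (SNR) and
    sensing (PDOP) constraints. One plausible form of formula (9); the
    penalty magnitude is an assumption of this sketch."""
    if snr >= snr_thr and pdop_val <= pdop_thr:
        return -energy
    return penalty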
Compared with existing methods, the reinforcement learning-based UAV-swarm ISAC energy optimization method of the invention has the following advantages:
(a) The invention jointly considers the communication-sensing performance and the network energy consumption of the UAV cluster and provides an energy-efficiency-optimal strategy for the UAV-swarm ISAC network. It can effectively reduce the energy consumption of the UAV network while preserving the communication and sensing performance of the UAV system, thereby prolonging the network lifetime of a resource-limited UAV cluster.
(b) The invention designs an intelligent decision algorithm based on reinforcement learning, so that a UAV can adaptively select its number of served users and its transmit power according to the dynamically changing environment and offload the largest task-uploading traffic with the least network energy consumption while guaranteeing the system's communication performance. This avoids the rigidity of traditional centralized network control and overcomes the difficulty that a dynamic environment poses to strategy formulation.
Drawings
To make the technical principle and the specific workflow of the invention clearer, the drawings referred to by the embodiments are briefly described below. Figures 1 to 3 described below serve only to describe and illustrate the embodiments; a person of ordinary skill in the art could derive other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of the reinforcement learning-based UAV-swarm integrated sensing and communication energy consumption optimization method provided by the invention.
Fig. 2 is a schematic diagram of the learning flow of the Q-value function network according to the invention.
Fig. 3 is a schematic view of an integrated sensing and communication network scenario based on a UAV swarm.
Detailed Description
The features and principles of the present invention are further described below with reference to the drawings and examples, the examples being given for illustrative purposes only and not for limiting the scope of the invention.
Referring to Fig. 1, consider a cellular network covered by a single base station with a radius of 500 m; in this scenario the invention provides the reinforcement learning-based UAV-swarm ISAC energy consumption optimization method. In the following, the parameter a_{m,l}(t) indicates whether the corresponding selected user is served; P_m(t) ∈ {0,1,...,5} denotes the number of users served by one UAV; the exploration rate of the reinforcement learning network is set to ε = 0.8; the discount factor is γ = 0.9; L_max = 1000; the communication threshold is SNR_thr = 2 dB; the positioning-perception threshold is PDOP_thr = 1.5; the number of UAVs providing positioning-perception service for each ground user is 4; and the data-uploading task of each ground user is assumed to be sent to at most one UAV. The specific embodiment of the overall method of the invention is further explained and described in detail below.
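For reference, the embodiment's parameters listed above can be collected into a single configuration, as in the following sketch (the dictionary keys are hypothetical names):

CONFIG = {
    "cell_radius_m": 500,              # single-base-station cell radius
    "epsilon": 0.8,                    # exploration rate of the epsilon-greedy policy
    "gamma": 0.9,                      # discount factor
    "l_max": 1000,                     # maximum number of iterations
    "snr_thr_db": 2.0,                 # communication threshold SNR_thr
    "pdop_thr": 1.5,                   # positioning-perception threshold PDOP_thr
    "uavs_per_user_positioning": 4,    # UAVs providing positioning service per user
    "max_served_users_per_uav": 5,     # P_m(t) in {0,1,...,5}
    "max_uavs_per_user_upload": 1,     # each user uploads to at most one UAV
}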
Step one: in the running process of the system, firstly, a Q-table grid is established, then, according to the proposal, the Q value function network is initialized by utilizing the positioning perception performance of the unmanned aerial vehicle cluster, namely, the three-dimensional precision factor, specifically, when the unmanned aerial vehicle is in a normal working state, the unmanned aerial vehicle base station subset S is selected at the time t k (t) when providing a sense-of-general service to a user, the value in the corresponding Q-table grid is-PDOP sk(t) The remaining grid positions are assigned zero.
Step two: and selecting a certain state as an initial state of the unmanned aerial vehicle intelligent body according to the current network environment.
Step three: selecting the second selected current state s based on epsilon-greedy strategy t Actions selected by the service status and number of service users of the next drone, i.e. determined according to formulas (7) and epsilon=0.8And P m The value of (t). Specifically, an action corresponding to the maximum value of the values in the Q-value function network, i.e., ++0.2, is selected with probability 1 ε=0.2>Randomly selecting an action with probability epsilon=0.8;
step four: after the action decision is completed, the unmanned aerial vehicle obtains the energy consumption of the communication uploading task in the action decision period, namely E is obtained through a formula (6) m,l (t), and willAnd P m The value of (t) and the communication threshold SNR thr =2db and positioning perception threshold PDOP thr =1.5 together with the formula (9), and the calculated prize value r (t) is transferred to the next state s at the same time t+1
Step five: substituting the search rate epsilon=0.8 and the discount factor gamma=0.9 of the reinforcement learning network into the formula (8) to obtain Q(s) under the action t ,a t ) Updating the value of the Q value function network;
step six: will be new state s t+1 Setting the current state, and repeating the steps three to six until the value in the Q value function network reaches convergence.
Once the values in the Q-value function network converge after continual updating, the Q-table can guide the UAV to make the optimal decision in each state, i.e. to select the optimal number of served users and the optimal transmit power for that state, yielding the optimal user communication-task traffic offloading strategy and hence the optimal energy efficiency of the UAV. The overall algorithm flow is given below:
Q-Learning algorithm: energy efficiency optimal strategy for unmanned aerial vehicle cluster general sense integrated network
Initialization is performed for any S e S, a e a (S),
utilizing key technology (1) to assign initial value to Q-table
Initializing t=1, epsilon=0.8, gamma=0.9,
repeating:
initializing state s according to current environmental information
Repeated execution in each action decision period t:
selecting action a in state s according to epsilon-greedy policy
Executing action a, obtaining the bonus function value and entering the next state s 'by using the formula (9)'
Updating the value corresponding to the Q-table by using the formula (8)
Let t=t+1, s' =s
Repeating the above steps until the maximum iteration number l is reached max =1000。
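Tying the pieces together, the listing above corresponds to a training loop like the following Python sketch. The environment object env, its reset/step interface, the episode length, and the learning rate lr are all assumptions of this sketch, since the text does not spell out the environment dynamics; the helpers from the earlier sketches can be plugged in for the reward and the PDOP prior.

import numpy as np

def train(env, n_states: int, n_actions: int, eps: float = 0.8, gamma: float = 0.9,
          l_max: int = 1000, episode_len: int = 50, lr: float = 0.1, seed: int = 0) -> np.ndarray:
    """Q-learning loop following the listing above.

    env is a placeholder with reset() -> state and step(state, action) ->
    (reward, next_state); these dynamics, the episode length and the learning
    rate lr are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))    # or init_q_table(...) to use the PDOP prior
    t = 0
    while t < l_max:                       # stop at the maximum iteration count
        s = env.reset()                    # initial state from current environment info
        for _ in range(episode_len):
            if t >= l_max:
                break
            # epsilon-greedy selection (key technique (3))
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
            r, s_next = env.step(s, a)     # reward designed as in key technique (4)
            Q[s, a] += lr * (r + gamma * np.max(Q[s_next]) - Q[s, a])   # update, cf. formula (8)
            s = s_next
            t += 1
    return Q

# Once the values converge, the optimal decision in each state is simply
# int(np.argmax(Q[state])).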
The detailed overall flow chart of the features and principles of the invention referred to in the embodiment above is shown in Fig. 2, and the corresponding scenario addressed by the invention is shown in Fig. 3.
In summary, the reinforcement learning-based UAV-swarm ISAC energy consumption optimization method uses the sensing performance of the UAVs to provide prior information for the reinforcement learning network, so the UAVs reach the optimal target state more quickly and efficiently. At the same time, it reduces the energy consumed by system tasks while preserving the cluster's communication and sensing performance, thereby effectively prolonging the lifetime of the UAV network, which is of significant value for resource-limited UAV systems.

Claims (10)

1. A reinforcement learning-based energy consumption optimization method for an unmanned aerial vehicle swarm with integrated sensing and communication, characterized in that the method comprises the following steps:
first, using the positioning-perception performance of the UAV cluster to assign initial values to the Q-value function network that reflects the relation between UAV states and actions;
second, determining the UAV's current state from the environment, and selecting a working point as the UAV's initial state;
third, selecting the current action based on an epsilon-greedy strategy according to the UAV's current state;
fourth, introducing the UAV's perception performance and communication-task energy consumption into the design of the reward function, and acquiring the actual environment reward of the UAV's previous action together with the next state;
fifth, updating the Q-value function network with the actual environment reward of the previous step;
sixth, setting the new state as the current state, and repeating the third through sixth steps until the values in the Q-value function network converge.
2. The reinforcement learning-based UAV-swarm integrated sensing and communication energy consumption optimization method, characterized in that the Q-value function network is initialized as follows: suppose that at time t the subset of UAV base stations providing position-sensing service for user l is S_k(t) and that this subset contains M_0 UAVs; the position-dilution-of-precision value PDOP_{S_k(t)} of this UAV subset is then computed by formula (1), in which J_{S_k(t)} is the Jacobian matrix of the positioning-perception observation equations of the UAV base-station subset S_k(t).
3. The reinforcement learning-based UAV-swarm integrated sensing and communication energy consumption optimization method, characterized in that J_{S_k(t)} is further expressed by formula (2), in which u_1(t) = (x_1(t), y_1(t))^T, ..., u_{M_0}(t) = (x_{M_0}(t), y_{M_0}(t))^T, with 1, ..., M_0 ∈ S_k(t), respectively denote the coordinates of UAV 1 through UAV M_0 in the UAV base-station subset S_k(t); similarly, H_1, ..., H_{M_0} are the fixed altitudes of UAV 1 through UAV M_0 in the subset S_k(t); and v_l = (x_l, y_l)^T are the position coordinates of the user l to be positioned.
4. The reinforcement learning-based UAV-swarm integrated sensing and communication energy consumption optimization method according to claim 1 or 2, characterized in that: when the UAVs are in the normal working state and the UAV base-station subset S_k(t) is selected at time t to provide the sensing-and-communication service to a user, the corresponding cell of the Q-table is set to -PDOP_{S_k(t)} and the remaining cells are assigned zero; the Q-value function network is thus given its initial values by the positioning-perception performance of the UAV cluster.
5. The reinforcement learning-based UAV-swarm integrated sensing and communication energy consumption optimization method, characterized in that the communication-task energy consumption is defined as follows: at time t, the LoS channel power gain from the m-th UAV to the l-th ground user is expressed as g_{m,l}(t) = β_0 / d_{m,l}^α(t) (formula (3)), where α is the path-loss exponent of the channel, β_0 is the channel gain at the reference distance of one meter, and d_{m,l}(t) is the distance from the m-th UAV to the l-th ground user, with d_{m,l}(t) = sqrt(||u_m(t) - v_l||^2 + H_m^2).
6. The reinforcement learning-based UAV-swarm integrated sensing and communication energy consumption optimization method, characterized in that the signal-to-noise ratio of the link at that time is expressed as
SNR_{m,l}(t) = a_{m,l}(t) · P · g_{m,l}(t) / (σ^2 + I_l(t))    (4)
where P is the constant transmit power of the aerial platform, σ^2 is the noise power, and a_{m,l}(t) indicates whether the aerial platform provides the sensing-and-communication service to the ground user: when a_{m,l}(t) = 0 the service is not provided, and when a_{m,l}(t) = 1 it is provided; I_l(t) = Σ_u P_u(t)·g_{u,l}(t) is the interference from the other UAV platforms, i.e. co-channel interference, where P_u(t) is the transmit power of another UAV platform at time t, g_{u,l}(t) is the LoS channel power gain of that platform at time t, and the sum runs over the set of UAVs; the data transmission rate of the link from the m-th UAV to the l-th ground user is expressed as
R_{m,l}(t) = B · log_2(1 + SNR_{m,l}(t))    (5)
where B is the signal bandwidth.
7. The reinforcement learning-based UAV-swarm integrated sensing and communication energy consumption optimization method, characterized in that the energy consumed by the link, E_{m,l}(t), is expressed by formula (6) in terms of the constant transmit power P of the aerial platform and the indicator a_{m,l}(t) of whether the aerial platform provides the sensing-and-communication service to the ground user (a_{m,l}(t) = 0: not provided; a_{m,l}(t) = 1: provided); P_m(t) ∈ {0,1,...,5} denotes the number of users served by one UAV; and the data-uploading task of each ground user can be sent to at most one UAV.
8. The reinforcement learning-based UAV-swarm integrated sensing and communication energy consumption optimization method, characterized in that the current action is selected based on the epsilon-greedy policy as follows: the agent selects a random action with probability ε, and the action corresponding to the maximum value of the Q-value function network with probability 1 − ε;
when execution starts, the method based on formula (1) is first used to initialize the Q-table, and the current state s_t is selected; for each action a in this state there is a corresponding state-action value, denoted Q(s_t, a); the action in this state is then selected according to the epsilon-greedy policy, i.e. the action corresponding to the maximum value in the Q-value function network is selected as a_t = argmax_a Q(s_t, a) (formula (7)).
9. The reinforcement learning-based UAV-swarm integrated sensing and communication energy consumption optimization method, characterized in that: after the action is selected, the agent executes it and then enters the next state s_{t+1}, obtains the reward value r(t) for the current action-selection decision period, and at the same time updates the value of the corresponding position in the Q network according to formula (8), where γ is the discount factor and γ ∈ [0,1].
10. The reinforcement learning-based UAV-swarm integrated sensing and communication energy consumption optimization method, characterized in that the reward value r(t) is designed as follows: at time t the reward function r(t) is expressed by formula (9), where SNR_thr is a preset signal-to-noise-ratio threshold (a known system parameter) whose purpose is to guarantee the agent's communication performance, and PDOP_thr is a three-dimensional dilution-of-precision threshold (a known parameter) whose purpose is to guarantee the agent's sensing performance; based on this function, the integrated sensing-and-communication character of the UAV-swarm network is preserved during energy-optimizing action selection.
CN202310843486.4A 2023-07-11 2023-07-11 Unmanned aerial vehicle group general sense one-body energy optimization method based on reinforcement learning Pending CN116896777A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310843486.4A CN116896777A (en) 2023-07-11 2023-07-11 Unmanned aerial vehicle group general sense one-body energy optimization method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310843486.4A CN116896777A (en) 2023-07-11 2023-07-11 Unmanned aerial vehicle group general sense one-body energy optimization method based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN116896777A true CN116896777A (en) 2023-10-17

Family

ID=88314419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310843486.4A Pending CN116896777A (en) 2023-07-11 2023-07-11 Unmanned aerial vehicle group general sense one-body energy optimization method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN116896777A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117241300A (en) * 2023-11-16 2023-12-15 南京信息工程大学 Unmanned aerial vehicle-assisted general sense calculation network fusion method
CN117241300B (en) * 2023-11-16 2024-03-08 南京信息工程大学 Unmanned aerial vehicle-assisted general sense calculation network fusion method

Similar Documents

Publication Publication Date Title
CN113162679B (en) DDPG algorithm-based IRS (intelligent resilient software) assisted unmanned aerial vehicle communication joint optimization method
CN113162682B (en) PD-NOMA-based multi-beam LEO satellite system resource allocation method
CN112672361B (en) Large-scale MIMO capacity increasing method based on unmanned aerial vehicle cluster deployment
CN114980169B (en) Unmanned aerial vehicle auxiliary ground communication method based on track and phase joint optimization
CN115278729B (en) Unmanned plane cooperation data collection and data unloading method in ocean Internet of things
CN114142908B (en) Multi-unmanned aerial vehicle communication resource allocation method for coverage reconnaissance task
CN116896777A (en) Unmanned aerial vehicle group general sense one-body energy optimization method based on reinforcement learning
CN114422363A (en) Unmanned aerial vehicle loaded RIS auxiliary communication system capacity optimization method and device
CN115499921A (en) Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network
Parvaresh et al. A continuous actor–critic deep Q-learning-enabled deployment of UAV base stations: Toward 6G small cells in the skies of smart cities
CN113919483A (en) Method and system for constructing and positioning radio map in wireless communication network
CN114142912A (en) Resource control method for guaranteeing time coverage continuity of high-dynamic air network
CN114885340B (en) Ultra-dense wireless network power distribution method based on deep migration learning
Ma et al. Machine learning based joint offloading and trajectory design in UAV based MEC system for IoT devices
CN115407794A (en) Sea area safety communication unmanned aerial vehicle track real-time planning method based on reinforcement learning
CN114051252B (en) Multi-user intelligent transmitting power control method in radio access network
Huang et al. A method for deploying the minimal number of UAV base stations in cellular networks
CN117412267B (en) Communication method of unmanned aerial vehicle cluster network
CN114158010A (en) Unmanned aerial vehicle communication system and resource allocation strategy prediction method based on neural network
CN116704823B (en) Unmanned aerial vehicle intelligent track planning and general sense resource allocation method based on reinforcement learning
CN116009590B (en) Unmanned aerial vehicle network distributed track planning method, system, equipment and medium
CN115696352B (en) 6G unmanned aerial vehicle base station site planning method and system based on circle coverage power optimization
CN117053790A (en) Single-antenna unmanned aerial vehicle auxiliary communication flight route-oriented planning method
CN115765826A (en) Unmanned aerial vehicle network topology reconstruction method for on-demand service
CN113993099B (en) Three-dimensional space-oriented mobile unmanned aerial vehicle user switching parameter configuration method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination