CN114679729A - Radar communication integrated unmanned aerial vehicle cooperative multi-target detection method - Google Patents
- Publication number: CN114679729A (application CN202210336444.7A)
- Authority: CN (China)
- Prior art keywords: unmanned aerial vehicle, detection, radar
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/30—TPC using constraints in the total amount of available transmission power
- H04W52/34—TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading
- H04W52/346—TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading distributing total power among users or channels
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S13/00—Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
- G01S13/02—Systems using reflection of radio waves, e.g. primary radar systems; Analogous systems
- G01S13/50—Systems of measurement based on relative movement of target
- G01S13/52—Discriminating between fixed and moving objects or between objects moving at different speeds
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/02—Resource partitioning among network components, e.g. reuse partitioning
- H04W16/10—Dynamic resource partitioning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/22—Traffic simulation tools or models
Abstract
The invention provides a radar-communication integrated unmanned aerial vehicle (UAV) cooperative multi-target detection method, in which a plurality of UAVs carry radar-communication integrated equipment for cooperative detection. Each UAV is set as an agent, a stable detection policy is trained, and the trained policy is used to control the flight trajectories of the UAVs and the resource allocation between radar and communication, so that a given detection task is completed quickly. The method takes the radar, communication, and flight states observed by each agent as the input of a policy generation module, uses a deep neural network to map each agent's observed states and actions into a stochastic policy, uses a policy evaluation module to evaluate each agent's policy, and obtains a better cooperative policy through module training. By efficiently planning resources such as radar and communication across multiple UAVs, the invention realizes the search of multiple targets in a designated area and greatly improves the efficiency of searching for and discovering multiple targets.
Description
Technical Field
The invention belongs to the field of radar communication integration and cluster cooperative detection, and particularly relates to a radar communication integration unmanned aerial vehicle cooperative multi-target detection method.
Background
Existing work on simultaneous detection considers resource allocation only in a static environment and does not address the trajectory design of the unmanned aerial vehicle, yet trajectory design is essential for exploiting UAV maneuverability and flexibility. For example, prior work has designed a static radar-communication integrated UAV network utility optimization method based on power control, and a static radar-communication integrated resource allocation method for UAV clusters under reinforcement learning. Moreover, when allocating radar-communication resources in a dynamic environment, a UAV often faces time-varying channels and limited observation information, problems that traditional optimization methods struggle to solve; for instance, prior work has used game theory to distribute the power of radar-communication integrated UAVs.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to solve the technical problems of the prior art, and provides a radar-communication integrated unmanned aerial vehicle cooperative multi-target detection method, which comprises the following steps:
step 1, modeling an unmanned aerial vehicle cooperative multi-target detection problem;
and 2, designing a multi-agent cooperative detection scheme.
The step 1 comprises the following steps:
step 1-1, defining a problem;
step 1-2, designing flight path constraints of the unmanned aerial vehicle;
step 1-3, designing resource allocation under the integration of radar communication of the unmanned aerial vehicle;
step 1-4, measuring the performance of radar and communication of the unmanned aerial vehicle;
step 1-5, carrying out multi-unmanned aerial vehicle cooperative detection reinforcement learning modeling;
and 1-6, designing a strategy learning module and a strategy evaluation module.
The step 1-1 comprises the following steps: each unmanned aerial vehicle is set as an agent, and all agents cooperate to complete the detection task of the area. Each unmanned aerial vehicle sends the information obtained by detection to the control center in real time through a communication link, and the total detection time is T. Within the detection time, by allocating radar and communication resources and the trajectories of the unmanned aerial vehicles in the given area, the goal is to maximize the data rate between the unmanned aerial vehicles and the control center as well as the detection performance, where the detection performance is expressed by the detection fairness over all targets.
The step 1-2 comprises the following steps: the whole detection time is divided into S time slots, each of duration tau. Each agent finishes its detection and communication tasks in a short period at the beginning of each time slot, and the remaining time is used for flying. The time for communication and detection is determined by the channel bandwidth allocated to them: assuming the allocated channel bandwidth is x Hz, the execution time is 1/x, which is typically much less than tau.
In each flight interval, each drone can fly in a direction theta_m(t) ∈ [0, 2π) for a distance l_m(t) ∈ [0, l_Max], where l_Max represents the maximum distance a drone can fly during time tau; this distance is determined by the model of the drone. For an agent departing from coordinate [x_m(0), y_m(0)], the movement within time t is represented as:

x_m(t) = x_m(0) + Σ_{t'=1}^{t} l_m(t') cos(theta_m(t'))
y_m(t) = y_m(0) + Σ_{t'=1}^{t} l_m(t') sin(theta_m(t'))

where l_m(t') represents the actual moving distance of the mth drone in the t'-th time slot, and theta_m(t') represents the flight direction of the mth drone in the t'-th time slot;
set that the unmanned aerial vehicle can only be in [ X ]Min,XMax]×[YMin,YMax]Thus, there are:
XMin≤xm(t)≤XMax
YMin≤ym(t)≤YMax
wherein, XMin,XMax,YMin,YMaxRespectively representing the movement minimum value of the unmanned aerial vehicle movement coordinate on an x axis, the movement maximum value on the x axis, the movement minimum value on a y axis and the movement maximum value on the y axis; the three-dimensional rectangular coordinate system with the origin of 0 is used, the X-y axis represents the ground, and the minimum value and the maximum value of the unmanned aerial vehicle capable of flying in the X-axis direction are XMin,XMaxIn the Y-axis direction, the minimum value and the maximum value that each unmanned aerial vehicle can fly are YMin,YMax. The positive half axis of the z-axis represents the flight height of the drone.
A safe distance between the drones is imposed, expressed as:

d_mm'(t) ≥ D_S

where d_mm'(t) represents the distance from the mth drone to the m'-th drone in the tth time slot, and D_S represents the safe distance between any two drones.
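The per-slot motion model and the two constraints above can be sketched as follows; the area bounds, safe distance, and coordinates are illustrative placeholders, not values from the patent.

```python
import math

# Illustrative bounds and safe distance (placeholders, not patent values).
X_MIN, X_MAX, Y_MIN, Y_MAX = 0.0, 100.0, 0.0, 100.0
D_S = 5.0  # safe distance between any two drones

def move(x, y, l, theta):
    """One slot of the motion model: fly distance l in direction theta,
    then clamp the position into [X_MIN, X_MAX] x [Y_MIN, Y_MAX]."""
    nx = min(max(x + l * math.cos(theta), X_MIN), X_MAX)
    ny = min(max(y + l * math.sin(theta), Y_MIN), Y_MAX)
    return nx, ny

def safe(positions):
    """True iff every pair of drones respects the safe distance D_S."""
    return all(
        math.dist(p, q) >= D_S
        for i, p in enumerate(positions)
        for q in positions[i + 1:]
    )
```

For example, `move(50.0, 50.0, 10.0, 0.0)` moves a drone 10 units along the x axis, and `safe` flags any pair closer than `D_S`.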
The steps 1-3 comprise: the resources allocated for each drone's radar and communication processes are transmit power and channel:

For a given total transmit power P, a power-division factor beta_m(t) allocates the respective power to the detection and communication functions: beta_m(t)P represents the communication power allocated to the mth drone at time t, and (1 − beta_m(t))P represents the radar transmit power allocated to the mth drone at time t;

For a total of K channels, rho_mk(t) denotes the selection of the kth channel at time t: rho_mk(t) = 1 means the mth agent selects the kth channel, and rho_mk(t) = 0 means the mth agent does not select the kth channel.
The steps 1 to 4 comprise:

According to the radar power (1 − beta_m(t))P allocated to the mth drone at time t, the detection range of each agent is estimated using the following radar equation:

phi_m(t) = [ (1 − beta_m(t))P · G_Tx · G_Rx · lambda² · sigma / ((4π)³ · Γ · T_0 · B · F · gamma · Phi_Min) ]^(1/4)

where B represents the drone communication channel bandwidth; phi_m(t) represents the farthest distance the mth drone can probe in the tth time slot; G_Tx and G_Rx respectively represent the transmit and receive antenna gains; lambda represents the wavelength of the transmitted signal; sigma represents the effective detection area; Γ represents the Boltzmann constant; T_0 represents the thermodynamic temperature; F and gamma represent the radar noise and detection loss, respectively; and Phi_Min represents the minimum signal-to-noise ratio for drone detection;

The condition for the mth agent to detect the nth target is defined as: phi_m(t) ≥ d_mn(t), where d_mn(t) represents the distance between the mth agent and the nth target at time t;
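As a rough illustration, the range estimate of steps 1-4 follows the classical fourth-root radar range equation; the gains, wavelength, bandwidth, and noise figures below are assumed placeholder values, not the patent's.

```python
import math

def detection_range(p_radar, g_tx=1e3, g_rx=1e3, lam=0.03, sigma=1.0,
                    boltzmann=1.38e-23, t0=290.0, bandwidth=1e6,
                    noise_f=2.0, loss=1.5, snr_min=10.0):
    """phi = (P*Gtx*Grx*lambda^2*sigma /
              ((4*pi)^3 * Gamma * T0 * B * F * gamma * Phi_min))^(1/4).
    All defaults are illustrative placeholders."""
    num = p_radar * g_tx * g_rx * lam ** 2 * sigma
    den = ((4 * math.pi) ** 3 * boltzmann * t0 * bandwidth
           * noise_f * loss * snr_min)
    return (num / den) ** 0.25

def detects(phi_m, d_mn):
    """The mth drone detects target n iff phi_m(t) >= d_mn(t)."""
    return phi_m >= d_mn
```

Note the fourth-root behavior: multiplying the radar power by 16 only doubles the detection range, which is why the power-division factor matters.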
A detection score epsilon_n(t) is defined for each target, where c_n(t) represents the number of times the nth target has been detected by time t;
The fairness g(t) with which the targets are detected is defined over the detection scores, where N represents the total number of detected targets.
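The exact expressions for epsilon_n(t) and g(t) are not reproduced in the text above; a common choice for such a per-target fairness metric is Jain's index over the detection counts, sketched here under that assumption.

```python
def detect_score(counts, n):
    """Assumed score for target n: its share of all detections, c_n / sum(c)."""
    total = sum(counts)
    return counts[n] / total if total else 0.0

def fairness(counts):
    """Jain's fairness index over per-target detection counts:
    g = (sum c_n)^2 / (N * sum c_n^2); 1.0 means perfectly even coverage,
    and it approaches 1/N when a single target absorbs all detections."""
    if not any(counts):
        return 0.0
    n = len(counts)
    return sum(counts) ** 2 / (n * sum(c * c for c in counts))
```

Maximizing such an index pushes the swarm to spread detections over all N targets rather than repeatedly re-detecting nearby ones.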
The steps 1 to 5 comprise: a 5-tuple (O, S, A, R, P) is used to describe the decision process, where O refers to the observation space of each agent, S refers to the joint state space of all agents, A refers to the action space of the agents, R refers to the reward function of the agents, and P refers to the transition probability of each agent;
Observation space O: the observation of the mth agent consists of its current coordinate (x_m(t), y_m(t)), the distance l_m(t−1) moved at the previous time, the direction theta_m(t−1) at the previous time, the channel rho_m(t−1) allocated to the drone's communication function at the previous time, the communication/radar power-division factor beta_m(t−1) at the previous time, and the communication data rate R_m(t−1) obtained at the previous time, written as a whole as o_m(t) = {x_m(t), y_m(t), l_m(t−1), theta_m(t−1), rho_m(t−1), beta_m(t−1), R_m(t−1)};
Action space A: the action space is defined as the mth agent's moving direction theta_m(t) at the current time, the distance l_m(t) it can move in that direction, the communication-channel allocation factor rho_m(t), and the power-division factor beta_m(t), generally expressed as a_m(t) = {theta_m(t), l_m(t), rho_m(t), beta_m(t)};
Reward function R: it defines the detection reward and the penalties for erroneous behavior of all agents. The reward of the mth agent combines the communication data rate R_m(t) measured at time t and the detection fairness, minus three penalty terms: the penalty incurred when the mth drone crosses the boundary, the penalty incurred when drones collide with each other, and the penalty incurred when the radar cannot cover the ground;
Steps 1-6 include: configuring a strategy learning module and a strategy evaluation module for each unmanned aerial vehicle, wherein the strategy learning module is used for generating strategies, and the strategy evaluation module is used for evaluating the generated strategies;
The policy learning module comprises an online policy network pi_theta_m(o, a) of the mth drone, a historical policy network pi_theta_m^old(o, a), an optimizer, and a loss function; o and a represent the set of states and actions of the drone, respectively;
the online strategy network is used for generating a random strategy, mapping the collected state and corresponding action of each agent into strategy distribution through a neural network, and adopting a Gaussian model as the strategy distribution;
The historical policy network is used to reuse the historical experience collected by each agent so as to improve its sampling efficiency. The loss function of each agent is set to its expected return J(theta_m), expressed as

J(theta_m) = E[ min( x(theta_m) · A_m(t), f_CL(x(theta_m)) · A_m(t) ) ]

where theta_m represents the parameters of the policy network in the mth agent, E[·] represents the expectation, x(theta_m) = pi_theta_m(a|o) / pi_theta_m^old(a|o) represents the probability ratio between the current policy and the historical policy, and A_m(t) is the advantage produced by the policy evaluation module. The function f_CL restricts x(theta_m) to [1 − epsilon, 1 + epsilon], expressed as f_CL(x) = clip(x, 1 − epsilon, 1 + epsilon), where epsilon represents a clipping parameter;
The policy evaluation module evaluates the policy obtained by each agent by generating an advantage function, expressed as

A_m(t) = r_m(t) + gamma · V_omega_m(s(t+1)) − V_omega_m(s(t))

where V_omega_m represents the value function of the evaluation network in the mth agent, omega_m represents the parameters of the corresponding evaluation network, gamma represents the discount factor, and r_m(t) represents the reward obtained by the mth drone at time t;
The exploratory behavior of an agent in the environment is enhanced by introducing a policy entropy term, represented as f_E(theta_m) = E[ H(pi_theta_m) ], where H(·) represents the entropy function of the online policy pi.
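The clipped surrogate objective and entropy bonus described for the policy-learning module mirror the standard PPO form; the sketch below assumes that form, with an illustrative clipping parameter, entropy coefficient, and a one-dimensional Gaussian entropy.

```python
import math

def clipped_objective(ratio, advantage, eps=0.2):
    """Per-sample J(theta) = min(x*A, clip(x, 1-eps, 1+eps)*A),
    where x is the probability ratio pi/pi_old."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)

def gaussian_entropy(sigma):
    """Entropy of a 1-D Gaussian policy: 0.5 * log(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

def actor_loss(ratio, advantage, sigma, ent_coef=0.01):
    """L_A(theta) = J(theta) + f_E(theta): surrogate plus entropy bonus
    (a quantity to be maximized)."""
    return clipped_objective(ratio, advantage) + ent_coef * gaussian_entropy(sigma)
```

The clip keeps a single large probability ratio from dominating an update, while the entropy term keeps sigma from collapsing too early and ending exploration.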
The step 2 comprises the following steps:
Step 2-1, initializing model parameters: the parameters of the different modules are initialized, including the parameters theta_m of the online policy network, the parameters of the historical policy network, the parameters omega_m of the evaluation network, the learning rate beta_A of the policy network, the learning rate beta_I of the evaluation network, and the discount factor gamma;
step 2-2, collecting samples:
After observing the environment, each drone obtains an observation vector, including its coordinate at the current time and its movement information at the previous time, expressed as o_m(t) = {x_m(t), y_m(t), l_m(t−1), theta_m(t−1), rho_m(t−1), beta_m(t−1), R_m(t−1)}.
Step 2-3, inputting the observation vector into a deep neural network to obtain online strategy distribution, and then sampling from the online strategy distribution to obtain a corresponding action vector:
A Gaussian model is adopted as the policy distribution; for the mth drone, its online policy distribution pi_theta_m(o, a) is represented by:

pi_theta_m(o_m, a_m) = (1 / (sqrt(2π) · sigma(o_m))) · exp( −(a_m − mu(o_m))² / (2 · sigma(o_m)²) )

where o_m and a_m respectively represent the state observed and the action performed by the mth agent, and mu and sigma represent the mean and standard-deviation functions, respectively;
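A minimal sketch of the Gaussian policy head of steps 2-3, with a toy linear mapping standing in for the deep neural network; the weights and the log-probability helper are illustrative assumptions.

```python
import math
import random

def policy_head(obs, w_mu=0.1, w_sigma=0.05):
    """Toy stand-in for the policy network: maps an observation vector to
    the mean and standard deviation of the Gaussian action distribution."""
    mu = w_mu * sum(obs)
    sigma = math.exp(w_sigma * sum(obs)) * 0.1 + 1e-6  # keep sigma > 0
    return mu, sigma

def sample_action(obs, rng=random):
    """Draw an action a ~ N(mu(o), sigma(o)^2)."""
    mu, sigma = policy_head(obs)
    return rng.gauss(mu, sigma)

def log_prob(a, mu, sigma):
    """log N(a; mu, sigma^2) - useful later for the ratio pi/pi_old."""
    return (-0.5 * ((a - mu) / sigma) ** 2
            - math.log(sigma * math.sqrt(2.0 * math.pi)))
```

In practice mu and sigma come from the deep network of the policy-learning module; only the sampling and log-density structure is shown here.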
Step 2-4, sampling and executing actions:

Power beta_m(t)P is allocated to the communication process of each drone, (1 − beta_m(t))P is allocated as radar transmit power to the radar process, and the ⌈rho_m(t)K⌉-th channel is selected, where ⌈·⌉ represents the ceiling function;

Each drone is controlled to fly a distance l_m(t) in direction theta_m(t);
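Decoding a sampled action into resources as in step 2-4 — the power split beta_m(t)P / (1 − beta_m(t))P and the channel index ⌈rho_m(t)K⌉ — can be sketched as follows; the total power and channel count are placeholders.

```python
import math

def decode_action(beta, rho, p_total=10.0, k_channels=8):
    """Turn the continuous action components (beta, rho) into resources:
    communication power P*beta, radar power P*(1-beta), and a 1-indexed
    channel ceil(rho*K).  p_total and k_channels are illustrative."""
    p_comm = beta * p_total
    p_radar = (1.0 - beta) * p_total
    channel = max(1, math.ceil(rho * k_channels))  # guard rho = 0
    return p_comm, p_radar, channel
```

The `max(1, ...)` guard keeps a sampled rho of exactly 0 from producing channel index 0, which the 1-indexed scheme in the text does not define.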
Step 2-5, detecting penalized actions:

Three penalized behaviors are defined for each drone: crossing the boundary, colliding with another drone, and failing to cover the ground;

The penalty obtained when the mth drone crosses the boundary equals xi_1 if x_m(t) ∉ [X_Min, X_Max] or y_m(t) ∉ [Y_Min, Y_Max], and 0 otherwise, where xi_1 represents a penalty value;

The penalty obtained when the mth drone and the m'-th drone collide equals xi_2 if d_mm'(t) < D_S, and 0 otherwise, where xi_2 represents a penalty value, d_mm'(t) represents the distance between the mth drone and the m'-th drone, and D_S is the safe distance defined between any two drones;

The penalty obtained when the radar cannot cover the ground equals xi_3 if the detection range phi_m(t) is less than H, and 0 otherwise, where xi_3 represents a penalty value and H represents the farthest distance that must be detectable;
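The three penalty checks and the reward of steps 2-5 can be sketched as follows; all thresholds and xi values are illustrative, and the additive reward form is an assumption consistent with the description above.

```python
# Illustrative penalty magnitudes (placeholders, not patent values).
XI1, XI2, XI3 = 1.0, 1.0, 1.0

def boundary_penalty(x, y, x_min=0.0, x_max=100.0, y_min=0.0, y_max=100.0):
    """xi1 when the drone leaves [x_min, x_max] x [y_min, y_max], else 0."""
    inside = x_min <= x <= x_max and y_min <= y <= y_max
    return 0.0 if inside else XI1

def collision_penalty(d_mm, d_safe=5.0):
    """xi2 when two drones get closer than the safe distance D_S."""
    return XI2 if d_mm < d_safe else 0.0

def coverage_penalty(phi, h=50.0):
    """xi3 when the radar range phi cannot reach the coverage distance H."""
    return XI3 if phi < h else 0.0

def total_reward(data_rate, fairness_g, penalties):
    """Assumed additive reward: R_m(t) + g(t) minus incurred penalties."""
    return data_rate + fairness_g - sum(penalties)
```

Each check mirrors one of the three penalized behaviors, so the state rollback in step 2-5 can be triggered whenever any of them returns a nonzero value.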
The final reward obtained by each drone is calculated by accounting for the penalties it has incurred;

After the action of the current time slot is finished, each drone observes the state at the start of the next time slot;

Whether the mth drone has exhibited any of the three penalized behaviors is checked; if so, it rolls back to the current state at the next time;
Step 2-6, generating the joint state information:
Each drone sends its state information to the information fusion center; the information fusion center integrates all the observation information into the joint state and sends the state information of the current time back to each drone, where M represents the set of drones;

Each drone repeats steps 2-2 to 2-6 continuously until the jth batch is obtained, comprising in total N_B pieces of observation information B_o,j, state information B_s,j, and action information B_a,j; the rewards of the jth batch are expressed as B_r,j.
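The per-batch sample collection of steps 2-2 to 2-6 can be sketched as a simple buffer that accumulates N_B (observation, state, action, reward) tuples before an update; the field names are illustrative.

```python
class Batch:
    """Minimal rollout buffer for one drone: collects N_B samples of
    (observation, joint state, action, reward) before a network update."""

    def __init__(self, n_b):
        self.n_b = n_b
        self.obs, self.states, self.acts, self.rews = [], [], [], []

    def add(self, o, s, a, r):
        self.obs.append(o)
        self.states.append(s)
        self.acts.append(a)
        self.rews.append(r)

    def full(self):
        """True once the batch holds N_B samples and is ready for step 2-7."""
        return len(self.rews) >= self.n_b
```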
And 2-7, updating the network parameters.
The policy network parameters are updated as theta_m ← theta_m + beta_A · ∇L_A(theta_m), where L_A(theta_m) = J(theta_m) + f_E(theta_m) represents the loss function of the policy network and ∇ represents the gradient;

The parameters in the online policy network are copied directly to the historical policy network, where pi_theta represents the policy obtained from the online network and pi_theta^old represents the historical policy of the agent;

Using B_s,j and B_r,j, the parameters omega_m of the evaluation network are updated, where beta_I represents the learning rate of the evaluation network, A_I(omega_m) represents the loss function of the evaluation network, and ∇_omega_m represents the gradient with respect to omega_m;
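The evaluation-network side of step 2-7 can be sketched under the assumption of a one-step TD advantage and a squared-error value update, with a single scalar standing in for the network.

```python
def td_advantage(r, v_s, v_next, gamma=0.99):
    """One-step TD advantage: A_m(t) = r + gamma * V(s') - V(s)."""
    return r + gamma * v_next - v_s

def update_value(v, target, lr=0.1):
    """One gradient step of the evaluation network on (V - target)^2,
    for a scalar stand-in V: V <- V - lr * 2 * (V - target)."""
    return v - lr * 2.0 * (v - target)
```

In the full method V is a parameterized network updated with learning rate beta_I over the whole batch; the scalar form only exposes the update direction.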
Steps 2-1 to 2-7 are repeated; if all targets are detected or a training round is finished, a new round of training is performed, until all drones finish all rounds of training.
Aiming at the problems of existing UAV-cluster cooperative target detection methods, the method provided by the invention has the following advantages. First, radar and communication are integrated: the communication and detection functions share the radar spectrum, which alleviates the shortage of communication spectrum resources while reducing the load of the drone, saving hardware cost, and reducing the drone's weight. Second, to address radar-communication resource interference and resource planning, the same detection signal waveform is designed to complete both the communication and radar functions, and unified planning of radar-communication resources is carried out based on reinforcement learning, improving adaptability to dynamic, complex scenes. Third, when planning radar-communication resources, the speed and direction of each drone in the cluster are controlled in real time; by designing a multi-agent policy oriented to search with incomplete information, the flight trajectory of each drone is controlled, collisions between drones and flight out of the detection area are avoided, and adaptability when searching unknown environments is ensured. Fourth, to address the problem that, with multiple targets awaiting detection in a given environment, only some targets are detected while unknown targets at distant edges are hard to detect, a geographic fairness index is proposed to measure the fairness of target detection, and maximizing this index ensures that all targets can be detected.
Unlike existing vision-based detection methods, the invention uses radar to detect targets, which avoids the sensitivity of ordinary visual detection to environmental conditions. Meanwhile, the integrated radar-communication technology assists the detection process, so a drone can complete both radar detection and communication functions while carrying only one device; multi-agent deep reinforcement learning adjusts the flight parameters of each drone and allocates different resources to the radar and communication functions for efficient target detection.
Compared with the prior art, the invention has the remarkable advantages that: (1) dynamic environment detection under the integrated assistance of radar communication is considered, and the maneuverability and flexibility of the unmanned aerial vehicle are fully exerted; (2) the detection strategy is learned by using a deep learning technology, so that the method can be applied to large-scale complex detection tasks; (3) and multi-agent reinforcement learning is designed to drive cooperative detection among the unmanned aerial vehicles, so that a plurality of unmanned aerial vehicles can efficiently complete detection tasks.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
Fig. 1 is a radar communication integrated auxiliary unmanned aerial vehicle cooperative target detection flow chart.
Fig. 2 is a schematic diagram of a multi-unmanned-aerial-vehicle cooperative detection model with radar communication integrated assistance.
FIG. 3 is a conceptual diagram of the method of the present invention.
Detailed Description
As shown in fig. 1, 2 and 3, the invention provides a radar-communication integrated unmanned aerial vehicle cooperative multi-target detection method. The scheme is based on drone trajectory control and resource control, assisted by reinforcement learning. The multi-drone cooperative detection scene is shown in fig. 3: each drone is equipped with dual-function radar-communication equipment to detect targets in a given area while maintaining communication with an information fusion center. A multi-agent deep reinforcement learning algorithm is configured in the controller of each drone; it learns from the information each agent observes in the environment and simultaneously outputs corresponding actions, and the method structure is shown in fig. 2. The whole control system, shown in fig. 1, comprises:
step 1: multi-agent collaborative process definition
The method first defines the multi-drone cooperative detection process as a Markov decision process. The process is described by a 5-tuple (O, S, A, R, P), where O refers to the observation space of each agent, S refers to the joint state space of all agents, A refers to the action space of the agents, R refers to the reward function of the agents, and P refers to the transition probability of each agent.
The observation space O contains 7 elements: the current coordinate (x_m(t), y_m(t)) of the mth agent, the distance l_m(t−1) moved at the previous time, the direction theta_m(t−1), the channel rho_m(t−1) allocated to the drone's communication function at the previous time, the communication/radar power-division factor beta_m(t−1) at the previous time, and the communication data rate R_m(t) obtained at the current time.

That is, the observation of the mth agent at time t may be represented as o_m(t) = {x_m(t), y_m(t), l_m(t−1), theta_m(t−1), rho_m(t−1), beta_m(t−1), R_m(t)}.
The action space A is defined as the mth agent's moving direction theta_m(t) at the current time, the distance l_m(t) it can move in that direction, the communication-channel allocation factor rho_m(t), and the power-division factor beta_m(t). That is, the action of the mth agent at time t is represented as a_m(t) = {theta_m(t), l_m(t), rho_m(t), beta_m(t)}.
The reward function R defines the detection reward and the penalties for erroneous behavior of all agents; the reward of the mth agent at time t is expressed in terms of: R_m(t), the communication data rate measured by the mth agent at time t; the penalties obtained when the mth drone crosses the boundary, when drones collide with each other, and when the radar cannot cover the ground; and g(t), the geographic fairness obtained at the current time, which is calculated from the detection counts, where N represents the total number of detected targets and c_n(t) represents the number of times the nth target has been detected by time t.

Here M represents the set of drones.
Step 2: initializing model parameters
The parameters of the different modules are initialized, including the parameters theta_m of the online policy network, the parameters of the historical policy network, the parameters omega_m of the evaluation network, the learning rate beta_A of the policy network, the learning rate beta_I of the evaluation network, and the discount factor gamma. Here, the parameters used by both the policy network and the evaluation network are randomly initialized. The learning rates of the policy network and the evaluation network are important parameters affecting the learning effect: an excessively small learning rate easily makes convergence very slow, while an excessively large learning rate easily makes the algorithm converge to a local optimum, so these two parameters are tuned through multiple experiments. The discount factor can be tuned in a similar way while tuning the learning rates: set a high value, such as 0.99, and decrease it by 0.01 or 0.02 each time until the algorithm converges to a large total average reward.
After all the parameters are debugged, the online learning stage can be entered.
And 3, step 3: sample collection
First, each drone needs to collect sufficient samples for training of the policy network and the evaluation network.
Each drone m first needs to determine its current position coordinates, i.e., x_m(t), y_m(t); this position can be obtained by a GPS positioning device carried on the drone.

In addition, each drone m needs to retrieve from memory the distance l_m(t−1) moved at the previous time, the movement direction theta_m(t−1) at the previous time, the communication channel rho_m(t−1) allocated at the previous time, the power-division factor beta_m(t−1) at the previous time, and the data rate R_m(t−1) at the previous time. Note that when a drone collects a sample at time 0, the previous-time sample is random; a value is typically drawn from a random-number generator over 0 to 1.
Therefore, in the sampling step, the observation information output by the mth drone is represented as:
and 4, step 4: an online policy distribution is generated. And inputting the observation vector into a deep neural network to obtain online strategy distribution, and then sampling from the strategy distribution to obtain a corresponding action vector.
The input to this step is the observation information collected in the previous step. Thus for the mth drone, the sequence of observations entered is
The observation sequence is then input into the decision neural network, which outputs the corresponding policy distribution; a Gaussian distribution is adopted to fit the policy distribution, expressed as:

pi_theta_m(o_m, a_m) = (1 / (sqrt(2π) · sigma(o_m))) · exp( −(a_m − mu(o_m))² / (2 · sigma(o_m)²) )

where mu and sigma represent the mean and standard-deviation functions.
And 5: motion sampling and execution
First from the obtained strategic distribution πθm (o, a), namely the distance l that the mth unmanned aerial vehicle needs to move at the current momentm(t) direction of required deflection θm(t) channel rho distributed for communication of mth unmanned aerial vehicle and information fusion center at current momentm(t) and power allocation factor are collectively expressed as:
the mth drone then performs the work obtained.
First of all for its communication process allocationPower of, allocated to radar processes The radar transmit power.
Select the firstA channel in whichRepresenting an upper rounding function. K denotes the total number of optional channels.
The mth drone uses the allocated channel and power resources to perform the radar detection and communication procedures.
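The power split and channel selection described above can be sketched as follows, assuming (as reconstructed from the surrounding text) that β_m(t)P goes to communication, (1−β_m(t))P to radar, and the channel index is the ceiling of ρ_m(t)K; the helper name is illustrative:

```python
import math

def allocate_resources(beta, rho, total_power, num_channels):
    """Split the total transmit power P between the communication and radar
    processes and map the continuous channel action to a channel index."""
    p_comm = beta * total_power            # communication power beta_m(t) * P
    p_radar = (1.0 - beta) * total_power   # radar power (1 - beta_m(t)) * P
    channel = max(1, math.ceil(rho * num_channels))  # ceil(rho_m(t) * K), 1-based
    return p_comm, p_radar, channel
```

For example, with β = 0.3, ρ = 0.5, P = 100 W and K = 8 channels, the drone gets 30 W for communication, 70 W for radar, and channel 4.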
For the radar detection process, the input information is the radar power allocated at the current moment, and the output is the detection fairness g(t) of the N targets; the specific process is as follows:
first, the detection range of the mth drone is estimated, expressed as:
where Φ_m(t) represents the maximum detection range of the mth drone in the tth time slot; B denotes the communication channel bandwidth of the drone; G_Tx and G_Rx respectively represent the transmit and receive antenna gains; λ represents the wavelength of the transmitted signal; σ represents the effective detection area; Γ represents the Boltzmann constant; T_0 represents the thermodynamic temperature; F and γ represent the radar noise and detection loss, respectively; Φ_Min represents the minimum signal-to-noise ratio for drone detection. Among these parameters, G_Tx, G_Rx, Γ and T_0 are fixed values; the other parameters can be measured by the radar signal processing equipment.
Only targets within the radar detection range can be detected by the drone, so the condition for the mth agent to detect the nth target is Φ_m(t) ≥ d_mn(t), where d_mn(t) represents the distance between the mth agent and the nth target at time t;
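Since the patent's exact range formula was not preserved in extraction, the estimate can be sketched with the conventional fourth-root monostatic radar range equation built from the parameters listed above (this form is an assumption):

```python
import math

def detection_range(p_radar, g_tx, g_rx, wavelength, sigma_area, bandwidth,
                    noise_f, loss, snr_min, boltzmann=1.38e-23, t0=290.0):
    """Fourth-root radar range equation assembled from the named parameters
    (G_Tx, G_Rx, lambda, sigma, Gamma, T_0, B, F, gamma, Phi_Min)."""
    numerator = p_radar * g_tx * g_rx * wavelength**2 * sigma_area
    denominator = ((4 * math.pi)**3 * boltzmann * t0 * bandwidth
                   * noise_f * loss * snr_min)
    return (numerator / denominator) ** 0.25

def can_detect(phi_m, d_mn):
    """Detection condition Phi_m(t) >= d_mn(t)."""
    return phi_m >= d_mn
```

A consequence of the fourth-root form is that range grows slowly with radar power: multiplying the power by 16 only doubles the detection range.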
then, the mth drone uses the allocated communication powerAnd channelPerforming a communication with the information fusion center, sending the radar probe channel to the information fusion center, and measuring the data rate R during the communicationm(t)。
The information fusion center counts the number of times each target has been detected according to the detection information collected from all drones, and then calculates the detection score ε_n(t) of each target at the current moment:
where c_n(t) represents the number of times the nth target has been detected by time t.
Then, the detection fairness g(t) is calculated:
where N represents the total number of targets to be detected.
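One common choice for such a fairness measure is Jain's fairness index over the per-target detection counts; the sketch below assumes that form (the patent's exact formulas for ε_n(t) and g(t) were not preserved in extraction):

```python
def detection_scores(counts):
    """Per-target share of detections, one plausible form of epsilon_n(t)."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

def detection_fairness(counts):
    """Jain's fairness index over detection counts c_n(t): equals 1 when every
    target is detected equally often, approaching 1/N as one target dominates."""
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        return 0.0
    return total ** 2 / (n * sum(c * c for c in counts))
```

For instance, four targets detected once each give fairness 1.0, while four detections concentrated on one target give 0.25, pushing the drones to spread their detection effort.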
Then, the calculated detection fairness value is sent to each drone.
Finally, each drone flies the distance l_m(t) in the assigned direction θ_m(t).
Step 6: penalty behavior detection
According to the actions obtained in step 5, penalty values are set for violating strategies, including crossing the boundary, colliding with each other, and losing radar coverage. The significance of this step is that a negative reward is set for each non-compliant policy generated by a drone; therefore, to maximize its own reward, the drone must gradually learn compliant policies until the optimal policy is found.
First, if the mth drone crosses a given boundary, a boundary crossing penalty is set, denoted as:
where Ξ_1 represents the penalty value, and X_Min, X_Max, Y_Min, Y_Max limit the range of motion of the drone.
Then, if the mth drone and the m′th drone collide with each other, a collision penalty is set, denoted as:
where Ξ_2 represents the penalty value; d_mm′(t) represents the distance between the mth drone and the m′th drone; D_S defines the safe distance between any two drones.
Then, if the radar of the mth drone cannot cover the ground, the penalty obtained is:
where Ξ_3 represents the penalty value and H represents the flying height of the drone.
The values Ξ_1, Ξ_2 and Ξ_3 are set relative to the drone's reward and should not be too small; they may, for example, be set to 0.1 times the total reward (e.g., if the total reward is 100, the penalty value may be set to 10).
The final reward obtained by each drone is then calculated by accounting for the penalties it has incurred.
After the action at the current moment is completed, each drone observes the state at the start of the next time slot.
It is then checked whether the mth drone has exhibited any of the three penalized behaviors of crossing the boundary, colliding, or losing radar coverage; if any of them has occurred, the state at the next moment is rolled back to the current state.
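The three penalty checks of step 6 and the resulting per-drone reward can be sketched as follows; the additive combination and the penalty magnitudes are illustrative assumptions:

```python
def step_penalties(x, y, other_positions, radar_range, height,
                   x_min, x_max, y_min, y_max, d_safe,
                   xi1=10.0, xi2=10.0, xi3=10.0):
    """Sum the three penalty terms of step 6: boundary crossing (Xi_1),
    collision (Xi_2), and radar coverage loss (Xi_3)."""
    penalty = 0.0
    if not (x_min <= x <= x_max and y_min <= y <= y_max):
        penalty += xi1                              # out-of-bounds penalty
    for xo, yo in other_positions:
        if ((x - xo) ** 2 + (y - yo) ** 2) ** 0.5 < d_safe:
            penalty += xi2                          # collision penalty
            break
    if radar_range < height:                        # radar can't reach the ground
        penalty += xi3
    return penalty

def final_reward(data_rate, fairness, penalty):
    """Combine data rate, detection fairness, and penalties into one reward."""
    return data_rate + fairness - penalty
```

With penalties of this magnitude, a single violation outweighs typical per-step gains, which is what steers the policy away from non-compliant actions.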
Step 7: generate joint state information
The input to this step is the observation information, action information, and obtained reward of each drone; the output is one batch of data.
Each drone sends its respective state information to the information fusion center; the information fusion center integrates all the observation information and sends the state information of the current moment back to each drone.
Steps 2 to 7 are continuously repeated for each drone until the jth batch, containing N_B samples in total, is obtained; the observation information, state information, and action information are denoted B_{s,j}, and the jth batch of rewards is denoted correspondingly. The larger N_B is, the better the convergence effect, because a larger batch size means that more data is used for training; however, it should not exceed the number of steps in one training round (episode), and it can be adjusted by starting from a smaller value and gradually increasing it.
Step 8: network parameter update
This step is used to update the parameters of the policy network and the evaluation network, i.e. θ_m and ω_m. The input is the batch data obtained in step 7, and the output is the trained network parameters.
The parameter updating of the policy network is divided into updating of an online policy network and updating of a historical policy network.
The parameters of the historical policy network are updated first. This network is mainly used to store the parameters of the existing online network and does not participate in the training process; therefore, the parameters of the existing online network are directly copied to the historical policy network, represented as:
Here, the copied parameters represent the historical policy, which is mainly used to reuse the historical experience collected by each agent so as to improve the sampling efficiency of each agent.
where L_A(θ_m) = J(θ_m) + f_E(θ_m) represents the loss function of the policy network, and ∇ represents the gradient.
J(θ_m) represents the objective function of the mth agent, set as the expected reward of each agent, where θ_m represents the parameters of the online policy network in the mth agent and x(θ_m) represents the probability ratio between the current policy and the historical policy; the function f_CL is used to restrict x(θ_m) to [1−ε, 1+ε], expressed as
where ε represents the clipping parameter, generally taken as 0.2;
The advantage function is used to evaluate the policy obtained by each agent and is expressed in terms of the value function of the evaluation network in the mth agent.
f_E(θ_m) represents a state entropy function used to enhance the agent's exploratory behavior in the environment, defined via the entropy function of the online policy π.
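The policy loss L_A(θ_m) = J(θ_m) + f_E(θ_m) with the clipping function f_CL can be sketched in NumPy as a PPO-style clipped surrogate; the entropy coefficient below is an illustrative assumption:

```python
import numpy as np

def ppo_actor_loss(logp_new, logp_old, advantages, entropy,
                   eps=0.2, ent_coef=0.01):
    """Clipped surrogate loss -(J(theta) + f_E(theta)), to be minimised.

    ratio   -- x(theta), probability ratio of current to historical policy
    np.clip -- realises f_CL, restricting the ratio to [1 - eps, 1 + eps]
    entropy -- per-sample policy entropy, weighted by an assumed coefficient
    """
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    surrogate = np.minimum(ratio * advantages, clipped * advantages)
    return -(surrogate.mean() + ent_coef * entropy.mean())
```

Taking the element-wise minimum of the unclipped and clipped terms is what prevents the online policy from moving too far from the stored historical policy in a single update.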
Steps 1 to 8 are repeated; when all targets have been detected or one training round has finished, a new round of training is started, until all drones have completed all rounds of training.
Examples
In the method of the invention, a detection range for drone detection is first defined. Each drone obtains its current coordinates in real time through the GPS positioning device fitted to it; when the coordinates exceed the detection range at a certain moment, the algorithm adjusts the drone's learned behavior to prevent it from crossing the boundary.
A collaborative process between the multiple drones is then defined using a Markov model. The detectable area is set to 2000 m × 2000 m, the number of drones M is set to 10, the number of targets to be detected is 100, the maximum number of time steps T from the start of detection to its end is set to 200, and the duration of each step is 5 minutes. In addition, the farthest distance and the maximum angle of flight within one time step are set for each drone, where the farthest distance l is set to 20 m and the maximum angle θ is set to 360 degrees. Then, each drone first obtains environment information, including the coordinate information of the current time and the moving distance, moving direction, power allocation factor, and data rate of the last time step. Note that in the 1st time step, values must be taken randomly within the approximate range of each quantity; for example, since the maximum flight distance is 20 m, the first flight distance may be 5 m. This information is then input into the multi-agent reinforcement learning algorithm to learn each drone's action in the current time step, including the distance the drone needs to fly, the angle at which it needs to fly, the allocated channel, and the power allocation factor for the current time step.
Each drone then executes the learned action and updates its learning network. Each drone obtains the flight distance l, the flight angle θ, the channel allocation, and the power allocation factor for the current time step through the learning algorithm. First, each drone detects whether targets exist around it through its radar communication integrated equipment, with the detection range determined by the power allocated to the radar function; then each drone sends the obtained radar detection information to the control center through its allocated channel, and the control center, after summarizing the information of all drones, sends all the information back to each drone. Each drone then uses this information to calculate the return obtained for this learned action; this return includes the measured communication data rate, the fairness of all target detections, whether the drone has collided or crossed the boundary, and whether the radar fails to cover the ground (note that this failure is caused by allocating too little power to the radar). Each drone then updates its own learning network according to the calculated return information, and finally each drone flies l meters at flight angle θ. Through this process, each drone learns continuously in the environment and can finally learn a stable policy, which is the learned drone cooperative multi-target detection method.
The invention provides a radar communication integrated unmanned aerial vehicle cooperative multi-target detection method, and there are many methods and ways to specifically implement the technical scheme. The above description is only a preferred embodiment of the invention; it should be noted that, for a person skilled in the art, several improvements and embellishments can be made without departing from the principle of the invention, and these improvements and embellishments should also be regarded as falling within the protection scope of the invention. All components not specified in the present embodiment can be realized by the prior art.
Claims (7)
1. A radar communication integrated unmanned aerial vehicle cooperative multi-target detection method, characterized by comprising the following steps:
step 1, modeling an unmanned aerial vehicle cooperative multi-target detection problem;
and 2, designing a multi-agent cooperative detection scheme.
2. The method of claim 1, wherein step 1 comprises:
step 1-1, defining a problem;
step 1-2, designing flight path constraints of the unmanned aerial vehicle;
step 1-3, designing resource allocation under the integration of radar communication of the unmanned aerial vehicle;
step 1-4, measuring the performance of radar and communication of the unmanned aerial vehicle;
step 1-5, carrying out multi-unmanned aerial vehicle cooperative detection reinforcement learning modeling;
and 1-6, designing a strategy learning module and a strategy evaluation module.
3. The method of claim 2, wherein step 1-1 comprises: each drone is set as an agent, and all agents cooperate to complete the detection task of the area; each drone sends the information obtained by detection to the control center in real time through a communication link; the total detection time is T; and within the detection time, the data rate between the drones and the control center and the detection performance are to be maximized by allocating radar and communication resources and the trajectories of the drones within the given area, wherein the detection performance is expressed by the detection fairness of all targets.
4. The method of claim 3, wherein steps 1-2 comprise: dividing the whole detection time into S time slots, wherein the duration of each time slot is tau;
in each flight interval, each drone can fly in a direction θ_m(t) ∈ [0, 2π) for a distance l_m(t) ∈ [0, l_Max], where l_Max represents the maximum distance a drone can fly during the time τ, this distance being determined by the model of the drone; for an agent departing from coordinate [x_m(0), y_m(0)], the movement within time t is represented as:
where l_m(t) represents the actual moving distance of the mth drone in the tth time slot, and θ_m(t′) represents the flight direction of the mth drone during the t′th time slot;
set that the unmanned aerial vehicle can only be in [ X ]Min,XMax]×[YMin,YMax]Thus, there are:
XMin≤xm(t)≤XMax
YMin≤ym(t)≤YMax
where X_Min, X_Max, Y_Min, Y_Max respectively represent the minimum and maximum values of the drone's movement coordinate on the x axis and the minimum and maximum values on the y axis;
set for safe distance between the unmanned aerial vehicle, show as:
dmm′(t)≥DS
wherein d ismm′(t) denotes the mth drone to mth' drone in the tth slotDistance of the drone; dSRepresenting a safe distance between any two drones.
5. The method of claim 4, wherein steps 1-3 comprise: the resources allocated to each drone's radar and communication processes are transmit power and channel:
for a given total transmit power P, a power allocation factor is used to allocate corresponding power to the radar detection and communication functions; β_m(t)P represents the communication power allocated to the mth drone at time t, (1−β_m(t))P represents the radar transmit power allocated to the mth drone at time t, and β_m(t) represents the power allocation factor of the mth agent at time t;
for a total of K channels, ρ_mk(t) denotes whether the kth channel is selected at time t: ρ_mk(t) = 1 indicates that the mth agent selects the kth channel, and ρ_mk(t) = 0 indicates that the mth agent does not select the kth channel.
6. The method of claim 5, wherein steps 1-4 comprise:
according to the power allocated to the mth drone at time t, the detection range of each agent is estimated using the following radar equation:
wherein B represents the drone communication channel bandwidth; Φ_m(t) represents the farthest distance the mth drone can probe in the tth time slot; G_Tx and G_Rx respectively represent the transmit antenna gain and the receive antenna gain; λ represents the wavelength of the transmitted signal; σ represents the effective detection area; Γ represents the Boltzmann constant; T_0 represents the thermodynamic temperature; F and γ represent the radar noise and detection loss, respectively; Φ_Min represents the minimum signal-to-noise ratio for drone detection;
the condition for the mth agent to detect the nth target is defined as: Φ_m(t) ≥ d_mn(t), where d_mn(t) represents the distance between the mth agent and the nth target at time t;
the detection score ε_n(t) is defined as:
where c_n(t) represents the number of times the nth target has been detected by time t;
the fairness g(t) of target detection is defined as:
where N represents the total number of targets to be detected.
7. The method of claim 6, wherein steps 1-5 comprise: a 5-tuple ⟨O, S, A, R, P⟩ is used to describe the decision process, wherein O refers to the observation space of each agent, S refers to the joint state space of all agents, A refers to the action space of the agents, R refers to the reward function of the agents, and P refers to the state transition probability of each agent;
the observation space is defined as the current coordinates (x_m(t), y_m(t)) of the mth agent, the distance l_m(t−1) moved at the previous moment, the direction θ_m(t−1) at the previous moment, the channel ρ_m(t−1) allocated to the drone's communication function at the previous moment, the communication and radar power allocation factor β_m(t−1) at the previous moment, and the communication data rate R_m(t−1) obtained at the previous moment;
the action space is defined as the moving direction θ_m(t) of the mth agent at the current moment, the distance l_m(t) movable in this direction, the communication channel allocation factor ρ_m(t), and the power allocation factor β_m(t);
the reward function defines the detection reward of all agents and the punishment of erroneous behaviors, wherein R_m(t) represents the communication data rate measured by the mth agent at time t, and the three penalty terms respectively represent the penalty obtained when the mth drone crosses the boundary, the penalty obtained when drones collide with each other, and the penalty obtained when the radar cannot cover the ground.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210336444.7A CN114679729B (en) | 2022-03-31 | Unmanned aerial vehicle cooperative multi-target detection method integrating radar communication |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114679729A true CN114679729A (en) | 2022-06-28 |
CN114679729B CN114679729B (en) | 2024-04-30 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115877868A (en) * | 2022-12-01 | 2023-03-31 | 南京航空航天大学 | Path planning method for unmanned aerial vehicle to resist malicious interference in data collection of Internet of things |
CN116482673A (en) * | 2023-04-27 | 2023-07-25 | 电子科技大学 | Distributed radar detection tracking integrated waveform implementation method based on reinforcement learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020230137A1 (en) * | 2019-05-16 | 2020-11-19 | B.G. Negev Technologies And Applications Ltd., At Ben-Gurion University | System and method for automated multi-objective policy implementation, using reinforcement learning |
CN113207128A (en) * | 2021-05-07 | 2021-08-03 | 东南大学 | Unmanned aerial vehicle cluster radar communication integrated resource allocation method under reinforcement learning |
CN114142908A (en) * | 2021-09-17 | 2022-03-04 | 北京航空航天大学 | Multi-unmanned aerial vehicle communication resource allocation method for coverage reconnaissance task |
Non-Patent Citations (3)
Title |
---|
M. SCHERHAUF等: "Radar distance measurement with Viterbi algorithm to resolve phase ambiguity", 《IEEE TRANS. MICROW. THEORY TECHN》, vol. 68, no. 9, 31 December 2020 (2020-12-31), pages 3784 - 3793, XP011807061, DOI: 10.1109/TMTT.2020.2985357 * |
揭东;汤新民;李博;顾俊伟;戴峥;张阳;刘岩;: "无人机冲突探测及解脱策略关键技术研究", 武汉理工大学学报(交通科学与工程版), no. 05, 15 October 2018 (2018-10-15) * |
王超;马驰;常俊杰;: "基于改进小波神经网络的协同作战能力评估", 指挥信息系统与技术, no. 01, 28 February 2020 (2020-02-28) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |