CN114679729A - Radar communication integrated unmanned aerial vehicle cooperative multi-target detection method - Google Patents


Info

Publication number
CN114679729A
CN114679729A (application number CN202210336444.7A)
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
detection
mth
radar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210336444.7A
Other languages
Chinese (zh)
Other versions
CN114679729B (en
Inventor
郑少秋
张涛
赵朔
冯建航
孔俊俊
张政伟
施生生
蒋飞
朱琨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute filed Critical CETC 28 Research Institute
Priority to CN202210336444.7A priority Critical patent/CN114679729B/en
Priority claimed from CN202210336444.7A external-priority patent/CN114679729B/en
Publication of CN114679729A publication Critical patent/CN114679729A/en
Application granted granted Critical
Publication of CN114679729B publication Critical patent/CN114679729B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04TPC
    • H04W52/30TPC using constraints in the total amount of available transmission power
    • H04W52/34TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading
    • H04W52/346TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading distributing total power among users or channels
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/02Systems using reflection of radio waves, e.g. primary radar systems; Analogous systems
    • G01S13/50Systems of measurement based on relative movement of target
    • G01S13/52Discriminating between fixed and moving objects or between objects moving at different speeds
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/02Resource partitioning among network components, e.g. reuse partitioning
    • H04W16/10Dynamic resource partitioning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/22Traffic simulation tools or models

Abstract

The invention provides a radar communication integrated unmanned aerial vehicle cooperative multi-target detection method, which is characterized in that a plurality of unmanned aerial vehicles carry radar communication integrated equipment for cooperative detection, each unmanned aerial vehicle is set as an intelligent body, a stable detection strategy is trained, the trained strategy is used for controlling flight tracks of the unmanned aerial vehicles and resource allocation between radar and communication, and finally a given detection task is quickly completed. The method takes the radar, communication and unmanned aerial vehicle flight states observed by each intelligent agent as the input of a strategy generation module, uses a deep neural network to map the states and actions observed by each intelligent agent into a random strategy, uses a strategy evaluation module to evaluate the strategy of each intelligent agent, and obtains a better cooperative strategy through module training. According to the invention, the search of multiple targets in the designated area is realized by efficiently planning resources such as radar, communication and the like on multiple unmanned aerial vehicles, and the search and discovery efficiency of multiple targets is greatly improved.

Description

Radar communication integrated unmanned aerial vehicle cooperative multi-target detection method
Technical Field
The invention belongs to the field of radar communication integration and cluster cooperative detection, and particularly relates to a radar communication integration unmanned aerial vehicle cooperative multi-target detection method.
Background
Existing work on simultaneous detection and communication considers resource allocation only in a static environment and does not address trajectory design for the unmanned aerial vehicles, even though trajectory design is essential for exploiting their maneuverability and flexibility. For example, one prior work designs a static radar-communication integrated unmanned aerial vehicle network utility optimization method based on power control, and another proposes a resource allocation method for static radar-communication integration in an unmanned aerial vehicle cluster under reinforcement learning. Moreover, when allocating radar-communication resources in a dynamic environment, an unmanned aerial vehicle must often cope with time-varying channels and limited observation information, which traditional optimization methods find difficult to handle; for instance, game theory has been used to allocate the power of radar-communication integrated unmanned aerial vehicles.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to solve the technical problem of the prior art, and provides a radar communication integrated unmanned aerial vehicle cooperative multi-target detection method, which comprises the following steps:
step 1, modeling an unmanned aerial vehicle cooperative multi-target detection problem;
and 2, designing a multi-agent cooperative detection scheme.
The step 1 comprises the following steps:
step 1-1, defining a problem;
step 1-2, designing flight path constraints of the unmanned aerial vehicle;
step 1-3, designing resource allocation under the integration of radar communication of the unmanned aerial vehicle;
step 1-4, measuring the performance of radar and communication of the unmanned aerial vehicle;
step 1-5, carrying out multi-unmanned aerial vehicle cooperative detection reinforcement learning modeling;
and 1-6, designing a strategy learning module and a strategy evaluation module.
The step 1-1 comprises the following steps: setting each unmanned aerial vehicle as an intelligent body, wherein all the intelligent bodies cooperate to complete the detection tasks of the areas, each unmanned aerial vehicle sends the information obtained by detection to the control center in real time through a communication link, the total detection time is T, and the data rate and the detection performance of the unmanned aerial vehicles and the control center are expected to be maximized by allocating radar and communication resources and the tracks of the unmanned aerial vehicles in the given areas within the detection time, wherein the detection performance is expressed by the detection fairness of all targets.
The step 1-2 comprises the following steps: dividing the whole detection time into S time slots, wherein the duration of each time slot is τ; each agent completes its detection and communication tasks in a short period at the beginning of each time slot, and the remaining time is used for flying. The time needed for communication and detection is determined by the channel bandwidth allocated to them: if the allocated bandwidth is x Hz, the execution time is on the order of 1/x, which is typically much less than τ.
In each flight interval, each drone can fly in a direction θ_m(t) ∈ [0, 2π] for a distance l_m(t) ∈ [0, l_Max], wherein l_Max represents the maximum distance a drone can fly during time τ and is determined by the model of the drone. For an agent departing from the coordinate [x_m(0), y_m(0)], the movement within time t is represented as:

x_m(t) = x_m(0) + Σ_{t′=1}^{t} l_m(t′)·cos θ_m(t′),  y_m(t) = y_m(0) + Σ_{t′=1}^{t} l_m(t′)·sin θ_m(t′)

wherein l_m(t) represents the actual moving distance of the mth drone in the tth time slot, and θ_m(t′) represents the flight direction of the mth drone during the t′th time slot.

The drones are restricted to the region [X_Min, X_Max] × [Y_Min, Y_Max], so that:

X_Min ≤ x_m(t) ≤ X_Max
Y_Min ≤ y_m(t) ≤ Y_Max

wherein X_Min, X_Max, Y_Min, Y_Max respectively represent the minimum and maximum values of the drone's movement coordinate on the x axis and the minimum and maximum values on the y axis. A three-dimensional rectangular coordinate system with origin O is used: the x–y plane represents the ground, the range of flight in the x-axis direction is [X_Min, X_Max], the range in the y-axis direction is [Y_Min, Y_Max], and the positive half of the z axis represents the flight height of the drone.

A safe distance is set between the drones, expressed as:

d_mm′(t) ≥ D_S

wherein d_mm′(t) represents the distance between the mth drone and the m′th drone in the tth time slot, and D_S represents the safe distance between any two drones.
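For illustration, the following Python sketch updates a drone's position from its action and checks the boundary and safe-distance constraints above; the function names and numeric values are assumptions, not part of the patent.

```python
import math

L_MAX, D_S = 20.0, 5.0                                   # assumed max step length (m) and safe distance (m)
X_MIN, X_MAX, Y_MIN, Y_MAX = 0.0, 2000.0, 0.0, 2000.0    # assumed detection region

def move(x, y, direction, distance):
    """Position update x_m(t), y_m(t) after flying `distance` along `direction`."""
    distance = max(0.0, min(distance, L_MAX))
    return x + distance * math.cos(direction), y + distance * math.sin(direction)

def inside_region(x, y):
    """Boundary constraint X_MIN <= x <= X_MAX and Y_MIN <= y <= Y_MAX."""
    return X_MIN <= x <= X_MAX and Y_MIN <= y <= Y_MAX

def safe(positions):
    """Pairwise safe-distance constraint d_mm'(t) >= D_S."""
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < D_S:
                return False
    return True

new_pos = move(100.0, 100.0, math.pi / 4, 15.0)
print(new_pos, inside_region(*new_pos), safe([new_pos, (500.0, 500.0)]))
```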
The steps 1-3 comprise: the resources allocated for each drone's radar and communication processes are the transmit power and the channel:

for a given total transmit power P, a power allocation factor is used to split the power between the detection and communication functions: P_m^c(t) = β_m(t)·P represents the communication power allocated to the mth drone at time t, P_m^r(t) = (1 − β_m(t))·P represents the radar transmit power allocated to the mth drone at time t, and β_m(t) represents the power allocation factor of the mth agent at time t;

for a total of K channels, ρ_mk(t) denotes the selection of the kth channel at time t: ρ_mk(t) = 1 means the mth agent selects the kth channel, and ρ_mk(t) = 0 means the mth agent does not select the kth channel.
The steps 1 to 4 comprise:

according to the radar power P_m^r(t) allocated to the mth drone at time t, the detection range of each agent is estimated using the following radar equation:

φ_m(t) = [ P_m^r(t) · G_Tx · G_Rx · λ² · σ / ( (4π)³ · Γ · T_0 · B · F · γ · Φ_Min ) ]^(1/4)

wherein B represents the drone communication channel bandwidth; φ_m(t) represents the farthest distance that the mth drone can probe in the tth time slot; G_Tx and G_Rx respectively represent the transmit and receive antenna gains; λ represents the wavelength of the transmitted signal; σ represents the effective detection area; Γ represents the Boltzmann constant; T_0 represents the thermodynamic temperature; F and γ respectively represent the radar noise figure and the detection loss; and Φ_Min represents the minimum signal-to-noise ratio for drone detection;

the condition for the mth agent to detect the nth target is defined as φ_m(t) ≥ d_mn(t), wherein d_mn(t) represents the distance between the mth agent and the nth target at time t;

the detection score ε_n(t) is defined in terms of c_n(t), the number of times the nth target has been detected by time t;

the fairness g(t) of the targets being detected is defined as

g(t) = ( Σ_{n=1}^{N} ε_n(t) )² / ( N · Σ_{n=1}^{N} ε_n(t)² )

wherein N represents the total number of targets to be detected.
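The sketch below estimates the detection range from a standard form of the radar range equation and computes a Jain-style fairness index over per-target detection scores; the constants, default parameters and the use of raw counts as scores are assumptions for illustration only.

```python
import math

def detection_range(p_radar, g_tx=30.0, g_rx=30.0, wavelength=0.03, sigma=1.0,
                    bandwidth=1e6, noise_fig=5.0, loss=3.0, snr_min=10.0, t0=290.0):
    """Maximum detectable range phi_m(t) from the radar range equation (assumed standard form)."""
    k_boltzmann = 1.380649e-23
    num = p_radar * g_tx * g_rx * wavelength ** 2 * sigma
    den = (4 * math.pi) ** 3 * k_boltzmann * t0 * bandwidth * noise_fig * loss * snr_min
    return (num / den) ** 0.25

def fairness(detect_counts):
    """Jain-style fairness index over detection scores (here simply the counts c_n(t))."""
    scores = [float(c) for c in detect_counts]
    if not any(scores):
        return 0.0
    return sum(scores) ** 2 / (len(scores) * sum(s * s for s in scores))

print(detection_range(7.0))      # range in metres for 7 W of radar transmit power
print(fairness([3, 1, 0, 2]))    # fairness over 4 targets -> 36 / (4 * 14)
```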
The steps 1 to 5 comprise: a 5-tuple ⟨O, S, A, R, P⟩ is used to describe the decision process, wherein O refers to the observation space of each agent, S refers to the joint state space of all agents, A refers to the action space of the agents, R refers to the reward function of the agents, and P refers to the transition probability of each agent;

observation space O: the observation of the mth agent is defined as its coordinate (x_m(t), y_m(t)) at the current time, the distance l_m(t−1) moved at the previous time, the direction θ_m(t−1) at the previous time, the channel ρ_m(t−1) allocated to the drone communication function at the previous time, the communication and radar power allocation factor β_m(t−1) at the previous time, and the communication data rate R_m(t−1) obtained at the previous time, expressed as a whole as o_m(t) = [x_m(t), y_m(t), l_m(t−1), θ_m(t−1), ρ_m(t−1), β_m(t−1), R_m(t−1)];

action space A: the action space is defined as the moving direction θ_m(t) of the mth agent at the current time, the distance l_m(t) movable in this direction, the communication channel allocation factor ρ_m(t) and the power allocation factor β_m(t), expressed as a_m(t) = [θ_m(t), l_m(t), ρ_m(t), β_m(t)];

reward function R: the reward defines the detection reward of all agents and the penalties for erroneous behaviors; the reward r_m(t) of the mth agent combines the communication data rate R_m(t) measured at time t, the detection fairness g(t), and the penalties obtained respectively when the mth drone crosses the boundary, when drones collide with each other, and when the radar cannot cover the ground;

state space S: the state contains the observation information of all agents, denoted as s(t) = [o_1(t), …, o_M(t)];

transition probability P: expressed as P(s(t+1) | s(t), a(t)), wherein a(t) represents the joint action of all agents.
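A small sketch of the observation and action containers described above, as plain Python dataclasses; the field names are illustrative, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """o_m(t): current position plus the previous slot's action and data rate."""
    x: float
    y: float
    prev_distance: float       # l_m(t-1)
    prev_direction: float      # theta_m(t-1)
    prev_channel: float        # rho_m(t-1)
    prev_power_split: float    # beta_m(t-1)
    prev_data_rate: float      # R_m(t-1)

@dataclass
class Action:
    """a_m(t): flight direction/distance and the resource allocation factors."""
    direction: float           # theta_m(t)
    distance: float            # l_m(t)
    channel_factor: float      # rho_m(t)
    power_split: float         # beta_m(t)

# the joint state s(t) is then just the list of all drones' observations
state = [Observation(100.0, 200.0, 15.0, 0.5, 0.3, 0.5, 2.1)]
print(state[0].prev_data_rate)
```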
Steps 1-6 include: configuring a strategy learning module and a strategy evaluation module for each unmanned aerial vehicle, wherein the strategy learning module is used for generating strategies, and the strategy evaluation module is used for evaluating the generated strategies;
the strategy learning module comprises an online strategy network pi of the mth unmanned aerial vehicleθm (o, a), historical policy network
π_θm^old(o, a), an optimizer and a loss function; o and a respectively represent the set of states and actions of the drone;

the online policy network is used for generating a stochastic policy: it maps the collected state of each agent and the corresponding action into a policy distribution through a neural network, and a Gaussian model is adopted as the policy distribution;

the historical policy network is used for reusing the historical experience collected by each agent so as to improve the sampling efficiency of each agent; the loss function of each agent is set to the expected return J(θ_m) of that agent, expressed as

J(θ_m) = E[ f_CL( x(θ_m) ) · A_m(t) ]

wherein θ_m represents the parameters of the policy network of the mth agent, E[·] represents the expectation, x(θ_m) = π_θm(o, a) / π_θm^old(o, a) represents the probability ratio between the current policy and the historical policy, the function f_CL restricts x(θ_m) to [1 − ϵ, 1 + ϵ] and is expressed as f_CL(x) = clip(x, 1 − ϵ, 1 + ϵ), ϵ represents the clipping parameter, and A_m(t) represents the advantage function;

the policy evaluation module evaluates the policy obtained by each agent by generating the advantage function, expressed as

A_m(t) = r_m(t) + γ·V_ωm(s(t+1)) − V_ωm(s(t))

wherein V_ωm represents the value function of the evaluation network of the mth agent, ω_m represents the parameters of the corresponding evaluation network, γ represents the discount factor, and r_m(t) represents the reward obtained by the mth drone at time t;

the exploratory behavior of each agent in the environment is enhanced by introducing a state entropy function, expressed as f_E(θ_m) = E[ H(π_θ) ], wherein H(π_θ) represents the entropy function of the online policy π.
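The clipped probability ratio, one-step advantage and entropy bonus described above resemble a PPO-style objective; the following sketch is an assumed rendering of that objective in plain Python, not the patent's exact implementation.

```python
def clipped_surrogate(ratios, advantages, entropies, clip_eps=0.2, ent_coef=0.01):
    """Clipped objective J(theta) = E[ clip(x, 1-eps, 1+eps) * A ] plus an entropy bonus.

    ratios:     x(theta) = pi_theta(a|o) / pi_theta_old(a|o) per sample
    advantages: A_m(t) per sample
    entropies:  H(pi_theta(.|o)) per sample
    """
    terms = []
    for x, adv, ent in zip(ratios, advantages, entropies):
        clipped = min(max(x, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(clipped * adv + ent_coef * ent)
    return sum(terms) / len(terms)

def one_step_advantage(reward, v_next, v_now, gamma=0.99):
    """A_m(t) = r_m(t) + gamma * V(s(t+1)) - V(s(t))."""
    return reward + gamma * v_next - v_now

print(clipped_surrogate([1.3, 0.7], [2.0, -1.0], [0.5, 0.6]))
print(one_step_advantage(1.0, 0.8, 0.5))
```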
The step 2 comprises the following steps:

Step 2-1, initializing model parameters: initializing the parameters of the different modules, including the parameter θ_m of the online policy network, the parameter of the historical policy network, the parameter ω_m of the evaluation network, the learning rate β_A of the policy network, the learning rate β_I of the evaluation network, and the discount factor γ;

Step 2-2, collecting samples: after observing the environment, each drone obtains an observation vector o_m(t), including the coordinate of the drone at the current time and the movement information of the drone at the previous time, expressed as o_m(t) = [x_m(t), y_m(t), l_m(t−1), θ_m(t−1), ρ_m(t−1), β_m(t−1), R_m(t−1)];

Step 2-3, inputting the observation vector into the deep neural network to obtain the online policy distribution, and then sampling from the online policy distribution to obtain the corresponding action vector:

the sampled action vector is expressed as a_m(t) = [θ_m(t), l_m(t), ρ_m(t), β_m(t)]; a Gaussian model is adopted as the policy distribution, and for the mth drone the online policy distribution π_θm(o, a) is expressed as:

π_θm(o_m, a_m) = (1 / (√(2π)·σ(o_m))) · exp( −(a_m − μ(o_m))² / (2σ(o_m)²) )

wherein o_m and a_m respectively represent the state observed and the action performed by the mth agent, and μ and σ respectively represent the mean and standard deviation functions;
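As an illustration of the Gaussian policy described above, the sketch below maps an observation to a mean and standard deviation and samples an action; the toy linear "network", its sizes, and the softplus transform are assumptions.

```python
import math, random

def policy_forward(obs, weights):
    """Toy linear 'network': returns per-dimension mean and std of the Gaussian policy."""
    mean = [sum(w * o for w, o in zip(row, obs)) for row in weights["mean"]]
    # softplus keeps the standard deviation strictly positive
    std = [math.log1p(math.exp(sum(w * o for w, o in zip(row, obs)))) for row in weights["std"]]
    return mean, std

def sample_action(obs, weights):
    """Draw a_m(t) ~ N(mu(o_m), sigma(o_m)) independently per action dimension."""
    mean, std = policy_forward(obs, weights)
    return [random.gauss(m, s) for m, s in zip(mean, std)]

obs = [0.1, 0.2, 0.0, 0.5, 0.3, 0.5, 0.4]                     # 7-element observation
weights = {"mean": [[0.1] * 7] * 4, "std": [[0.05] * 7] * 4}  # 4 action dimensions
print(sample_action(obs, weights))
```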
step 2-4, sampling and executing actions:
allocating power of P beta (t) for communication process of each unmanned aerial vehicle, allocating (1-beta (t)) P radar transmission power for radar process, and selecting the second
⌈ρ_m(t)·K⌉ channel (that is, the channel whose index is the ceiling of ρ_m(t)·K), wherein ⌈·⌉ represents the round-up (ceiling) function;

controlling each drone to fly a distance l_m(t) in the direction θ_m(t);
step 2-5, detecting punishment action:
defining three punishment behaviors for each unmanned aerial vehicle, wherein the punishment behaviors comprise boundary crossing, mutual collision and incapability of covering the ground;
P_m^b(t) represents the penalty obtained when the mth drone crosses the boundary, expressed as:

P_m^b(t) = Ξ_1 if x_m(t) ∉ [X_Min, X_Max] or y_m(t) ∉ [Y_Min, Y_Max], and P_m^b(t) = 0 otherwise,

wherein Ξ_1 represents a penalty value;

P_mm′^c(t) represents the penalty obtained when the mth drone and the m′th drone collide with each other, expressed as:

P_mm′^c(t) = Ξ_2 if d_mm′(t) < D_S, and P_mm′^c(t) = 0 otherwise,

wherein Ξ_2 represents a penalty value, d_mm′(t) represents the distance between the mth drone and the m′th drone, and D_S defines the safe distance between any two drones;

P_m^g(t) represents the penalty obtained when the mth drone cannot cover the ground, expressed as:

P_m^g(t) = Ξ_3 if φ_m(t) < H, and P_m^g(t) = 0 otherwise,

wherein Ξ_3 represents a penalty value and H represents the flight height of the drone, which the radar detection range must cover;

the final reward r_m(t) obtained by each drone is calculated by accumulating the penalties obtained by that drone; after the action of the current time slot is finished, each drone observes the state o_m(t+1) at the start of the next time slot; whether the mth drone exhibited any of the three penalty behaviors is checked, and if so, the state at the next time is rolled back to the current state o_m(t);
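A sketch of the penalty checks and reward roll-up under the constraints above; the penalty magnitudes and the way the terms are combined are assumptions labelled in the comments.

```python
def step_penalties(pos, others, detect_range, height,
                   bounds=(0.0, 2000.0, 0.0, 2000.0), d_safe=5.0,
                   xi1=10.0, xi2=10.0, xi3=10.0):
    """Return the out-of-bounds, collision and coverage penalties for one drone."""
    x, y = pos
    x_min, x_max, y_min, y_max = bounds
    p_bound = xi1 if not (x_min <= x <= x_max and y_min <= y <= y_max) else 0.0
    p_coll = sum(xi2 for ox, oy in others
                 if ((x - ox) ** 2 + (y - oy) ** 2) ** 0.5 < d_safe)
    p_cover = xi3 if detect_range < height else 0.0   # radar must reach the ground
    return p_bound, p_coll, p_cover

def reward(data_rate, fairness, penalties):
    """Assumed reward composition: data rate plus fairness minus all penalties."""
    return data_rate + fairness - sum(penalties)

pens = step_penalties((2100.0, 50.0), [(2098.0, 49.0)], detect_range=80.0, height=100.0)
print(pens, reward(data_rate=3.2, fairness=0.6, penalties=pens))
```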
Step 2-6, generating the joint state information:
each unmanned aerial vehicle sends respective state information to the information fusion center, and the information fusion center integrates all observation information
into the joint state s(t) = [o_1(t), …, o_M(t)] and sends the state information of the current time back to each drone, wherein ℳ denotes the set of drones;

each drone keeps repeating step 2-2 to step 2-6 until the jth batch is obtained, containing in total N_B pieces of observation information B_o,j, state information B_s,j and action information B_a,j; the jth batch of rewards is expressed as B_r,j.
And 2-7, updating the network parameters.
The steps 2-7 comprise: using the collected batch data, the parameter θ_m of the policy generation network is updated by gradient ascent, expressed as:

θ_m ← θ_m + β_A · ∇_θm L_A(θ_m)

wherein L_A(θ_m) = J(θ_m) + f_E(θ_m) represents the loss function of the policy network and ∇_θm represents the gradient with respect to θ_m;

the parameters of the online policy network are copied directly to the historical policy network, θ_m^old ← θ_m, wherein π_θ represents the policy obtained from the online network and π_θ^old represents the historical policy of the agent;

using B_s,j and B_r,j, the parameter ω_m of the evaluation network is updated with learning rate β_I along the gradient of the advantage-based loss, ω_m ← ω_m − β_I · ∇_ωm ( A_I(ω_m) )², wherein β_I indicates the learning rate of the evaluation network, A_I(ω_m) represents the advantage function computed with the evaluation network, and ∇_ωm represents the gradient with respect to ω_m.
and (3) repeating the steps 2-1 to 2-7, and if all the targets are detected or one training round is finished, performing a new round of training until all the unmanned aerial vehicles finish all rounds of training.
Aiming at the problems of existing unmanned aerial vehicle cluster cooperative target detection methods, the method provided by the invention offers the following advantages. First, radar and communication are integrated: the communication function and the detection function share the radar spectrum, which alleviates the shortage of communication spectrum resources while reducing the payload of the drone, saving hardware cost and lowering its weight. Second, to address radar-communication resource interference and resource planning, the same detection signal waveform is designed to perform both the communication and radar functions, and the radar-communication resources are planned in a unified way through reinforcement learning, which improves adaptability to dynamic, complex scenarios. Third, while planning radar-communication resources, the speed and direction of each drone in the cluster are controlled in real time; by designing a multi-agent strategy oriented toward search under incomplete information, the flight trajectories are controlled so that collisions between drones and flight out of the detection area are avoided, ensuring adaptability when searching unknown environments. Fourth, to address the problem that, when many targets await detection in a given environment, only some targets are detected and unknown targets at distant edges are hard to detect, a geographic fairness index is proposed to measure the fairness of target detection, and maximizing this index ensures that all targets can be detected.
Unlike existing vision-based detection methods, the invention uses radar to detect targets, which avoids the sensitivity of ordinary visual detection to environmental conditions. Meanwhile, the radar-communication integration technology assists the detection process, so each drone needs to carry only one device to perform both radar detection and communication; multi-agent deep reinforcement learning adjusts the flight parameters of the drones and allocates different resources to the radar and communication functions for efficient target detection.
Compared with the prior art, the invention has the remarkable advantages that: (1) dynamic environment detection under the integrated assistance of radar communication is considered, and the maneuverability and flexibility of the unmanned aerial vehicle are fully exerted; (2) the detection strategy is learned by using a deep learning technology, so that the method can be applied to large-scale complex detection tasks; (3) and multi-agent reinforcement learning is designed to drive cooperative detection among the unmanned aerial vehicles, so that a plurality of unmanned aerial vehicles can efficiently complete detection tasks.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
Fig. 1 is a radar communication integrated auxiliary unmanned aerial vehicle cooperative target detection flow chart.
Fig. 2 is a schematic diagram of a multi-unmanned-aerial-vehicle cooperative detection model with radar communication integrated assistance.
FIG. 3 is a conceptual diagram of the method of the present invention.
Detailed Description
As shown in fig. 1, 2 and 3, the invention provides a radar-communication integrated unmanned aerial vehicle cooperative multi-target detection method. In this scheme, reinforcement learning assists drone trajectory control and resource control. The multi-drone cooperative detection scenario is shown in fig. 3: each drone carries radar-communication dual-function equipment to detect targets in a given area while maintaining communication with an information fusion center. A multi-agent deep reinforcement learning algorithm is configured in the controller of each drone; it learns from the information each agent observes in the environment and outputs the corresponding actions, with the method structure shown in fig. 2. The whole control procedure, shown in fig. 1, comprises:
step 1: multi-agent collaborative process definition
The method first defines the multi-drone cooperative detection process as a Markov decision process. The process is described by a 5-tuple ⟨O, S, A, R, P⟩, wherein O refers to the observation space of each agent, S refers to the joint state space of all agents, A refers to the action space of the agents, R refers to the reward function of the agents, and P refers to the transition probability of each agent.
(1) Observation space O

The observation space contains 7 elements: the coordinate (x_m(t), y_m(t)) of the mth agent at the current time, the distance l_m(t−1) moved at the previous time, the direction θ_m(t−1) at the previous time, the channel ρ_m(t−1) allocated to the drone communication function at the previous time, the communication and radar power allocation factor β_m(t−1) at the previous time, and the communication data rate R_m(t) obtained at the current time.

That is, the observation of the mth agent at time t can be represented as o_m(t) = [x_m(t), y_m(t), l_m(t−1), θ_m(t−1), ρ_m(t−1), β_m(t−1), R_m(t)].
(2) Action space A

The action space is defined as the moving direction θ_m(t) of the mth agent at the current time, the distance l_m(t) movable in this direction, the communication channel allocation factor ρ_m(t) and the power allocation factor β_m(t). That is, the action of the mth agent at time t is represented as a_m(t) = [θ_m(t), l_m(t), ρ_m(t), β_m(t)].
(3) Reward function R

The reward function defines the detection reward of all agents and the penalties for erroneous behaviors. The reward of the mth agent at time t combines the communication data rate R_m(t) measured at time t, the geographic fairness g(t) obtained at the current time, and the penalties obtained respectively when the mth drone crosses the boundary, when drones collide with each other, and when the radar cannot cover the ground. The geographic fairness is calculated as

g(t) = ( Σ_{n=1}^{N} ε_n(t) )² / ( N · Σ_{n=1}^{N} ε_n(t)² )

wherein N represents the total number of targets to be detected, and c_n(t), from which the detection score ε_n(t) is obtained, represents the number of times the nth target has been detected by time t.
(4) State space S

The state space contains the observation information of all agents, expressed as s(t) = [o_1(t), …, o_M(t)], wherein ℳ represents the set of drones.

(5) Transition probability P

The transition probability is expressed as P(s(t+1) | s(t), a(t)), wherein a(t) represents the joint action of all agents.
Step 2: initializing model parameters
Initializing parameters of different modules, including a parameter θ of an online policy networkmParameters of a history policy network
, the parameter ω_m of the evaluation network, the learning rate β_A of the policy network, the learning rate β_I of the evaluation network, and the discount factor γ. The parameters used by both the policy network and the evaluation network are initialized randomly. The learning rates of the policy network and the evaluation network are important parameters affecting the learning effect: an excessively small learning rate easily makes the algorithm converge very slowly, while an excessively large learning rate easily makes it converge to a local optimum, so these two parameters are tuned through repeated experiments. The discount factor can be tuned in a similar way while the learning rates are adjusted: a relatively high value such as 0.99 is set first and decreased by 0.01 or 0.02 each time until the algorithm converges to a larger total average reward.

After all the parameters have been tuned, the online learning stage can begin.
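A sketch of a hyperparameter container matching the tuning procedure above; all numeric values are assumptions used for illustration.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    lr_actor: float = 3e-4       # beta_A, policy-network learning rate
    lr_critic: float = 1e-3      # beta_I, evaluation-network learning rate
    gamma: float = 0.99          # discount factor, decreased in small steps if needed
    clip_eps: float = 0.2        # clipping parameter of f_CL
    batch_size: int = 256        # N_B samples per batch

def anneal_gamma(cfg: TrainConfig, step: float = 0.01, floor: float = 0.90) -> TrainConfig:
    """Lower the discount factor slightly, as suggested, until training stabilizes."""
    cfg.gamma = max(floor, cfg.gamma - step)
    return cfg

print(anneal_gamma(TrainConfig()).gamma)   # 0.98
```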
And 3, step 3: sample collection
First, each drone needs to collect sufficient samples for training the policy network and the evaluation network.

Each drone m first needs to determine its current position coordinates x_m(t), y_m(t); this position can be obtained by a GPS positioning device carried on the drone.

In addition, each drone m needs to read from memory the distance l_m(t−1) moved at the previous time, the movement direction θ_m(t−1) at the previous time, the communication channel ρ_m(t−1) allocated at the previous time, the power allocation factor β_m(t−1) at the previous time, and the data rate R_m(t−1) at the previous time. Note that when a drone collects a sample at time 0, the previous-time values are random samples, typically drawn from a random number generator over 0–1.

Therefore, in the sampling step, the observation information output by the mth drone is represented as o_m(t) = [x_m(t), y_m(t), l_m(t−1), θ_m(t−1), ρ_m(t−1), β_m(t−1), R_m(t−1)].
and 4, step 4: an online policy distribution is generated. And inputting the observation vector into a deep neural network to obtain online strategy distribution, and then sampling from the strategy distribution to obtain a corresponding action vector.
The input of this step is the observation information collected in the previous step; thus, for the mth drone, the input observation sequence is o_m(t). The observation sequence is then input into the decision neural network, which outputs the corresponding policy distribution. A Gaussian distribution is adopted to fit the policy distribution, expressed as:

π_θm(o_m, a_m) = (1 / (√(2π)·σ(o_m))) · exp( −(a_m − μ(o_m))² / (2σ(o_m)²) )

wherein μ and σ represent the mean and standard deviation functions.
And 5: motion sampling and execution
An action is first sampled from the obtained policy distribution π_θm(o, a): the distance l_m(t) the mth drone needs to move at the current time, the direction θ_m(t) it needs to turn to, the channel ρ_m(t) allocated at the current time for communication between the mth drone and the information fusion center, and the power allocation factor β_m(t), collectively expressed as a_m(t) = [θ_m(t), l_m(t), ρ_m(t), β_m(t)].

The mth drone then executes the obtained action. It first allocates the power P_m^c(t) = β_m(t)·P to its communication process and the radar transmit power P_m^r(t) = (1 − β_m(t))·P to the radar process.
Select the first
⌈ρ_m(t)·K⌉ channel (that is, the channel whose index is the ceiling of ρ_m(t)·K), wherein ⌈·⌉ represents the round-up (ceiling) function and K denotes the total number of selectable channels.
The mth drone uses the allocated channel and power resources to perform the radar detection and communication procedures.

For the radar detection process, the input information is the radar power P_m^r(t) at the current time and the output is the detection fairness g(t) over the N targets; the specific process is as follows.

First, the detection range of the mth drone is estimated, expressed as:

φ_m(t) = [ P_m^r(t) · G_Tx · G_Rx · λ² · σ / ( (4π)³ · Γ · T_0 · B · F · γ · Φ_Min ) ]^(1/4)

wherein φ_m(t) represents the maximum detection range of the mth drone in the tth time slot; B denotes the communication channel bandwidth of the drone; G_Tx and G_Rx respectively represent the transmit and receive antenna gains; λ represents the wavelength of the transmitted signal; σ represents the effective detection area; Γ represents the Boltzmann constant; T_0 represents the thermodynamic temperature; F and γ respectively represent the radar noise figure and the detection loss; and Φ_Min represents the minimum signal-to-noise ratio for drone detection. Among these parameters, G_Tx, G_Rx, Γ and T_0 are fixed values; the other parameters can be measured by the radar signal processing equipment.

Only targets within the radar detection range can be detected by the drone, so the condition for the mth agent to detect the nth target is φ_m(t) ≥ d_mn(t), wherein d_mn(t) represents the distance between the mth agent and the nth target at time t.

Then, the mth drone uses the allocated communication power P_m^c(t) and channel to communicate with the information fusion center, sends the radar detection results to the information fusion center, and measures the data rate R_m(t) during the communication.
The information fusion center counts the number of times each target has been detected according to the detection information collected from all drones, and then calculates the detection score ε_n(t) of each target at the current time, wherein c_n(t) represents the number of times the nth target has been detected by time t. The detection fairness is then calculated as:

g(t) = ( Σ_{n=1}^{N} ε_n(t) )² / ( N · Σ_{n=1}^{N} ε_n(t)² )

wherein N represents the total number of targets to be detected. The calculated detection fairness value is then sent to each drone.

Finally, each drone flies a distance l_m(t) in the assigned direction θ_m(t).
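A sketch of the fusion-center bookkeeping described here: it accumulates per-drone detection reports into per-target counts and broadcasts a Jain-style fairness value; the class and method names are illustrative.

```python
from collections import Counter

class FusionCenter:
    """Aggregates detection reports and broadcasts a Jain-style fairness value."""
    def __init__(self, num_targets):
        self.counts = Counter({n: 0 for n in range(num_targets)})

    def report(self, detected_targets):
        """A drone reports the indices of the targets it detected in this slot."""
        self.counts.update(detected_targets)

    def broadcast_fairness(self):
        scores = [float(self.counts[n]) for n in sorted(self.counts)]
        if not any(scores):
            return 0.0
        return sum(scores) ** 2 / (len(scores) * sum(s * s for s in scores))

fc = FusionCenter(num_targets=4)
fc.report([0, 2]); fc.report([2])
print(fc.broadcast_fairness())   # counts [1, 0, 2, 0] -> 9 / (4 * 5) = 0.45
```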
Step 6: penalty behavior detection
And setting penalty values for violation strategies according to the action obtained in the step 5, wherein the penalty values comprise boundary crossing, mutual collision and radar coverage loss. The significance of this step is that a negative reward is set for the non-compliance policy generated by each drone, so in order to maximize its own reward, the drone must learn the compliance policy step by step until the optimal policy is found.
First, if the mth drone crosses the given boundary, a boundary-crossing penalty is set, denoted as:

P_m^b(t) = Ξ_1 if x_m(t) ∉ [X_Min, X_Max] or y_m(t) ∉ [Y_Min, Y_Max], and P_m^b(t) = 0 otherwise,

wherein Ξ_1 represents a penalty value and X_Min, X_Max, Y_Min, Y_Max limit the range of motion of the drone.

Then, if the mth drone and the m′th drone collide with each other, a collision penalty is set, denoted as:

P_mm′^c(t) = Ξ_2 if d_mm′(t) < D_S, and P_mm′^c(t) = 0 otherwise,

wherein Ξ_2 represents a penalty value, d_mm′(t) represents the distance between the mth drone and the m′th drone, and D_S defines the safe distance between any two drones.

Then, if the mth drone cannot cover the ground, the obtained penalty is:

P_m^g(t) = Ξ_3 if φ_m(t) < H, and P_m^g(t) = 0 otherwise,

wherein Ξ_3 represents a penalty value and H represents the flight height of the drone.

The penalty values Ξ_1, Ξ_2 and Ξ_3 are set relative to the reward of the drone; values that are too small have little effect, so they may be set to about 0.1 times the total reward (for example, if the total reward is 100, a penalty value may be set to 10).

The final reward r_m(t) obtained by each drone is then calculated by accumulating the penalties obtained by that drone. After the action at the current time is finished, each drone observes the state o_m(t+1) at the start of the next time slot; it is checked whether the mth drone exhibited any of the three penalty behaviors of crossing the boundary, colliding, or losing radar coverage, and if any of them occurred, the state at the next time is rolled back to the current state o_m(t).
And 7: generating federated state information
The input of this step is the observation information o_m(t), the action information a_m(t) and the obtained reward r_m(t) of each drone, and the output is the data of one batch.

Each drone sends its own state information to the information fusion center; the information fusion center integrates all observation information into the joint state s(t) and sends the state information of the current time back to each drone.

Each drone keeps repeating steps 2 to 7 until the jth batch is obtained, containing N_B pieces of observation, state and action information denoted B_o,j, B_s,j and B_a,j; the jth batch of rewards is denoted B_r,j. A larger N_B generally gives a better convergence effect, because a larger batch size means that more data is used for training, but it must not exceed the total number of training steps in one episode; it can be tuned by initially setting a relatively large value and adjusting it step by step.
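A sketch of a per-batch rollout buffer matching this step; the field names and flat-list layout are assumptions.

```python
class RolloutBatch:
    """Stores N_B transitions (observation, joint state, action, reward) for one batch."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.obs, self.states, self.actions, self.rewards = [], [], [], []

    def add(self, obs, state, action, reward):
        self.obs.append(obs)
        self.states.append(state)
        self.actions.append(action)
        self.rewards.append(reward)

    def full(self):
        return len(self.rewards) >= self.capacity

batch = RolloutBatch(capacity=3)
while not batch.full():                       # stand-in for repeating steps 2 to 7
    batch.add(obs=[0.0] * 7, state=[[0.0] * 7], action=[0.0] * 4, reward=1.0)
print(len(batch.rewards))                     # 3
```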
And 8: network parameter update
This step is used to update the parameters of the policy network and the evaluation network, namely θ_m and ω_m. The input is the batch data obtained in step 7, and the output is the trained network parameters.

The parameter update of the policy network is divided into the update of the online policy network and the update of the historical policy network.

The parameters of the historical policy network are updated first. This network is mainly used for storing the parameters of the existing online network and does not participate in the training process, so the parameters of the existing online network are copied directly to the historical policy network, θ_m^old ← θ_m. The historical policy π_θ^old is mainly used for reusing the historical experience collected by each agent so as to improve the sampling efficiency of each agent.

Then the parameter θ_m of the policy generation network is updated, expressed as:

θ_m ← θ_m + β_A · ∇_θm L_A(θ_m)

wherein L_A(θ_m) = J(θ_m) + f_E(θ_m) represents the loss function of the policy network and ∇_θm represents the gradient.

J(θ_m), the loss of the mth agent, is set to the expected return of that agent, expressed as

J(θ_m) = E[ f_CL( x(θ_m) ) · A_m(t) ]

wherein θ_m represents the parameters of the online policy network of the mth agent, x(θ_m) = π_θm(o, a) / π_θm^old(o, a) represents the probability ratio between the current policy and the historical policy, the function f_CL restricts x(θ_m) to [1 − ϵ, 1 + ϵ], and ϵ represents the clipping parameter, generally taken as 0.2.

A_m(t) represents the advantage function used for evaluating the policy obtained by each agent, expressed as

A_m(t) = r_m(t) + γ·V_ωm(s(t+1)) − V_ωm(s(t))

wherein V_ωm represents the value function of the evaluation network of the mth agent.

f_E(θ_m) represents the state entropy function used for enhancing the exploratory behavior of the agent in the environment, expressed as f_E(θ_m) = E[ H(π_θ) ], wherein H(π_θ) represents the entropy function of the online policy π.

Finally, B_s,j and B_r,j are used to update the parameter ω_m of the policy evaluation network, ω_m ← ω_m − β_I · ∇_ωm ( A_I(ω_m) )², with β_I the learning rate of the evaluation network.
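Because the policy is Gaussian, the entropy term f_E(θ_m) has a closed form per action dimension, 0.5·ln(2πe·σ²); the sketch below computes it, with the example standard deviations being assumptions.

```python
import math

def gaussian_policy_entropy(stds):
    """Entropy H(pi_theta) of a diagonal Gaussian policy: sum of 0.5*ln(2*pi*e*sigma^2)."""
    return sum(0.5 * math.log(2 * math.pi * math.e * s ** 2) for s in stds)

# entropy of a 4-dimensional action policy (direction, distance, channel factor, power split)
print(gaussian_policy_entropy([0.3, 0.2, 0.1, 0.1]))
```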
And (5) repeating the steps 1 to 8, and if all the targets are detected or one training round is finished, performing a new round of training until all the unmanned aerial vehicles finish all rounds of training.
Examples
The unmanned aerial vehicle detection method comprises the steps of firstly defining a detection range for unmanned aerial vehicle detection, enabling each unmanned aerial vehicle to obtain a current coordinate in real time through a GPS positioning device assembled for each unmanned aerial vehicle, and adjusting the learning behavior of the unmanned aerial vehicle through an algorithm when the coordinate exceeds the detection range at a certain moment, so that the unmanned aerial vehicle is prevented from crossing the boundary.
A cooperative process among the multiple drones is then defined using the Markov model. The detectable range of the drones is set to 2000 m × 2000 m, the number of drones M is set to 10, the number of targets to be detected is 100, the maximum number of time steps T from the start of detection to the end of detection is set to 200, and the duration of each step is 5 minutes. In addition, the farthest distance and the maximum angle of flight within one time step are set for each drone: the farthest distance l is set to 20 m and the maximum angle θ to 360 degrees. Each drone then first obtains environment information, including the coordinate at the current time, the moving distance in the last time step, the moving direction in the last time step, the power allocation factor in the last time step, and the data rate in the last time step. Note that in the first time step, values must be taken randomly within the approximate range of each quantity; for example, with a maximum flight distance of 20 m, the first flight distance may be 5 m. This information is then fed into the multi-agent reinforcement learning to learn the action of each drone in the current time step, including the distance the drone needs to fly, the angle at which it needs to fly, the channel allocated in the current time step, and the power allocation factor.
Each drone then executes the learned action and updates its learning network. Once a drone has obtained, through the learning algorithm, the flight distance l, the flight angle θ, the channel allocation and the power allocation factor for the current time step, it first detects whether targets exist around it using the radar-communication integrated equipment, where the detection range is determined by the power allocated to the radar function. Each drone then sends the obtained radar detection information to the control center through the allocated channel, and after summarizing the information of all drones, the control center sends all the information back to each drone. Each drone then uses this information to calculate the return obtained for this learning action; this return includes the measured communication data rate, the fairness of detection over all targets, whether the drone collided or crossed the boundary, and whether the radar failed to cover the ground (note that this failure is caused by allocating too little power to the radar). Each drone then updates its own learning network according to the calculated return information, and finally flies l m at the flight angle θ. Through this process each drone learns continuously in the environment and can finally learn a stable policy, which is the learned cooperative multi-target detection method for the drones.
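A sketch of an environment configuration matching the example values above; only the numbers stated in the example are taken from the text, everything else is illustrative.

```python
from dataclasses import dataclass

@dataclass
class ScenarioConfig:
    area_m: float = 2000.0            # 2000 m x 2000 m detection region
    num_drones: int = 10              # M
    num_targets: int = 100            # number of targets to be detected
    max_steps: int = 200              # T time steps per episode
    slot_minutes: float = 5.0         # duration of each step
    max_step_distance: float = 20.0   # farthest flight distance per step (m)
    max_angle_deg: float = 360.0      # maximum flight angle per step

cfg = ScenarioConfig()
print(cfg.num_drones, cfg.max_steps)
```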
The invention provides a radar communication integrated unmanned aerial vehicle cooperative multi-target detection method, and a plurality of methods and ways for specifically implementing the technical scheme are provided, the above description is only a preferred embodiment of the invention, it should be noted that, for a person skilled in the art, on the premise of not departing from the principle of the invention, a plurality of improvements and embellishments can be made, and these improvements and embellishments should also be regarded as the protection scope of the invention. All the components not specified in the present embodiment can be realized by the prior art.

Claims (7)

1. The radar communication integrated unmanned aerial vehicle cooperative multi-target detection method is characterized by comprising the following steps of:
step 1, modeling an unmanned aerial vehicle cooperative multi-target detection problem;
and 2, designing a multi-agent cooperative detection scheme.
2. The method of claim 1, wherein step 1 comprises:
step 1-1, defining a problem;
step 1-2, designing flight path constraints of the unmanned aerial vehicle;
step 1-3, designing resource allocation under the integration of radar communication of the unmanned aerial vehicle;
step 1-4, measuring the performance of radar and communication of the unmanned aerial vehicle;
step 1-5, carrying out multi-unmanned aerial vehicle cooperative detection reinforcement learning modeling;
and 1-6, designing a strategy learning module and a strategy evaluation module.
3. The method of claim 2, wherein step 1-1 comprises: setting each unmanned aerial vehicle as an intelligent body, wherein all the intelligent bodies cooperate to complete the detection tasks of the areas, each unmanned aerial vehicle sends the information obtained by detection to the control center in real time through a communication link, the total detection time is T, and the data rate and the detection performance of the unmanned aerial vehicles and the control center are expected to be maximized by allocating radar and communication resources and the tracks of the unmanned aerial vehicles in the given areas within the detection time, wherein the detection performance is expressed by the detection fairness of all targets.
4. The method of claim 3, wherein steps 1-2 comprise: dividing the whole detection time into S time slots, wherein the duration of each time slot is τ;

in each flight interval, each drone can fly in a direction θ_m(t) ∈ [0, 2π] for a distance l_m(t) ∈ [0, l_Max], wherein l_Max represents the maximum distance a drone can fly during time τ and is determined by the model of the drone; for an agent departing from the coordinate [x_m(0), y_m(0)], the movement within time t is represented as:

x_m(t) = x_m(0) + Σ_{t′=1}^{t} l_m(t′)·cos θ_m(t′),  y_m(t) = y_m(0) + Σ_{t′=1}^{t} l_m(t′)·sin θ_m(t′)

wherein l_m(t) represents the actual moving distance of the mth drone in the tth time slot, and θ_m(t′) represents the flight direction of the mth drone during the t′th time slot;

the drones are restricted to the region [X_Min, X_Max] × [Y_Min, Y_Max], so that:

X_Min ≤ x_m(t) ≤ X_Max
Y_Min ≤ y_m(t) ≤ Y_Max

wherein X_Min, X_Max, Y_Min, Y_Max respectively represent the minimum and maximum values of the drone's movement coordinate on the x axis and the minimum and maximum values on the y axis;

a safe distance is set between the drones, expressed as:

d_mm′(t) ≥ D_S

wherein d_mm′(t) represents the distance between the mth drone and the m′th drone in the tth time slot, and D_S represents the safe distance between any two drones.
5. The method of claim 4, wherein steps 1-3 comprise: the resources allocated for each drone's radar and communication processes are the transmit power and the channel:

for a given total transmit power P, a power allocation factor is used to allocate the corresponding power to the radar detection and communication functions: P_m^c(t) = β_m(t)·P represents the communication power allocated to the mth drone at time t, P_m^r(t) = (1 − β_m(t))·P represents the radar transmit power allocated to the mth drone at time t, and β_m(t) represents the power allocation factor of the mth agent at time t;

for a total of K channels, ρ_mk(t) denotes the selection of the kth channel at time t: ρ_mk(t) = 1 means the mth agent selects the kth channel, and ρ_mk(t) = 0 means the mth agent does not select the kth channel.
6. The method of claim 5, wherein steps 1-4 comprise:

according to the radar power P_m^r(t) allocated to the mth drone at time t, the detection range of each agent is estimated using the following radar equation:

φ_m(t) = [ P_m^r(t) · G_Tx · G_Rx · λ² · σ / ( (4π)³ · Γ · T_0 · B · F · γ · Φ_Min ) ]^(1/4)

wherein B represents the drone communication channel bandwidth; φ_m(t) represents the farthest distance that the mth drone can probe in the tth time slot; G_Tx and G_Rx respectively represent the transmit and receive antenna gains; λ represents the wavelength of the transmitted signal; σ represents the effective detection area; Γ represents the Boltzmann constant; T_0 represents the thermodynamic temperature; F and γ respectively represent the radar noise figure and the detection loss; and Φ_Min represents the minimum signal-to-noise ratio for drone detection;

the condition for the mth agent to detect the nth target is defined as φ_m(t) ≥ d_mn(t), wherein d_mn(t) represents the distance between the mth agent and the nth target at time t;

the detection score ε_n(t) is defined in terms of c_n(t), the number of times the nth target has been detected by time t;

the fairness g(t) of the targets being detected is defined as:

g(t) = ( Σ_{n=1}^{N} ε_n(t) )² / ( N · Σ_{n=1}^{N} ε_n(t)² )

wherein N represents the total number of targets to be detected.
7. The method of claim 6, wherein steps 1-5 comprise: a 5-tuple ⟨O, S, A, R, P⟩ is used to describe the decision process, wherein O refers to the observation space of each agent, S refers to the joint state space of all agents, A refers to the action space of the agents, R refers to the reward function of the agents, and P refers to the transition probability of each agent;

observation space O: the observation is defined as the coordinate (x_m(t), y_m(t)) of the mth agent at the current time, the distance l_m(t−1) moved at the previous time, the direction θ_m(t−1) at the previous time, the channel ρ_m(t−1) allocated to the drone communication function at the previous time, the communication and radar power allocation factor β_m(t−1) at the previous time, and the communication data rate R_m(t−1) obtained at the previous time, expressed as o_m(t) = [x_m(t), y_m(t), l_m(t−1), θ_m(t−1), ρ_m(t−1), β_m(t−1), R_m(t−1)];

action space A: the action space is defined as the moving direction θ_m(t) of the mth agent at the current time, the distance l_m(t) movable in this direction, the communication channel allocation factor ρ_m(t) and the power allocation factor β_m(t), expressed as a_m(t) = [θ_m(t), l_m(t), ρ_m(t), β_m(t)];

reward function R: the reward defines the detection reward of all agents and the penalties for erroneous behaviors; the reward of the mth agent combines the communication data rate R_m(t) measured at time t, the detection fairness, and the penalties obtained respectively when the mth drone crosses the boundary, when drones collide with each other, and when the radar cannot cover the ground;

state space S: the state contains the observation information of all agents, denoted as s(t) = [o_1(t), …, o_M(t)];

transition probability P: expressed as P(s(t+1) | s(t), a(t)), wherein a(t) represents the joint action of all agents.
CN202210336444.7A 2022-03-31 Unmanned aerial vehicle cooperative multi-target detection method integrating radar communication Active CN114679729B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210336444.7A CN114679729B (en) 2022-03-31 Unmanned aerial vehicle cooperative multi-target detection method integrating radar communication

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210336444.7A CN114679729B (en) 2022-03-31 Unmanned aerial vehicle cooperative multi-target detection method integrating radar communication

Publications (2)

Publication Number Publication Date
CN114679729A true CN114679729A (en) 2022-06-28
CN114679729B CN114679729B (en) 2024-04-30


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115877868A (en) * 2022-12-01 2023-03-31 南京航空航天大学 Path planning method for unmanned aerial vehicle to resist malicious interference in data collection of Internet of things
CN116482673A (en) * 2023-04-27 2023-07-25 电子科技大学 Distributed radar detection tracking integrated waveform implementation method based on reinforcement learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020230137A1 (en) * 2019-05-16 2020-11-19 B.G. Negev Technologies And Applications Ltd., At Ben-Gurion University System and method for automated multi-objective policy implementation, using reinforcement learning
CN113207128A (en) * 2021-05-07 2021-08-03 东南大学 Unmanned aerial vehicle cluster radar communication integrated resource allocation method under reinforcement learning
CN114142908A (en) * 2021-09-17 2022-03-04 北京航空航天大学 Multi-unmanned aerial vehicle communication resource allocation method for coverage reconnaissance task

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020230137A1 (en) * 2019-05-16 2020-11-19 B.G. Negev Technologies And Applications Ltd., At Ben-Gurion University System and method for automated multi-objective policy implementation, using reinforcement learning
CN113207128A (en) * 2021-05-07 2021-08-03 东南大学 Unmanned aerial vehicle cluster radar communication integrated resource allocation method under reinforcement learning
CN114142908A (en) * 2021-09-17 2022-03-04 北京航空航天大学 Multi-unmanned aerial vehicle communication resource allocation method for coverage reconnaissance task

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
M. SCHERHAUF et al.: "Radar distance measurement with Viterbi algorithm to resolve phase ambiguity", IEEE Trans. Microw. Theory Techn., vol. 68, no. 9, 31 December 2020 (2020-12-31), pages 3784-3793, XP011807061, DOI: 10.1109/TMTT.2020.2985357 *
JIE Dong; TANG Xinmin; LI Bo; GU Junwei; DAI Zheng; ZHANG Yang; LIU Yan: "Research on key technologies of UAV conflict detection and resolution strategies", Journal of Wuhan University of Technology (Transportation Science & Engineering), no. 05, 15 October 2018 (2018-10-15)
WANG Chao; MA Chi; CHANG Junjie: "Cooperative combat capability evaluation based on an improved wavelet neural network", Command Information System and Technology, no. 01, 28 February 2020 (2020-02-28)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115877868A (en) * 2022-12-01 2023-03-31 南京航空航天大学 Path planning method for unmanned aerial vehicle to resist malicious interference in data collection of Internet of things
CN115877868B (en) * 2022-12-01 2024-01-26 南京航空航天大学 Path planning method for resisting malicious interference of unmanned aerial vehicle in data collection of Internet of things
CN116482673A (en) * 2023-04-27 2023-07-25 电子科技大学 Distributed radar detection tracking integrated waveform implementation method based on reinforcement learning
CN116482673B (en) * 2023-04-27 2024-01-05 电子科技大学 Distributed radar detection tracking integrated waveform implementation method based on reinforcement learning

Similar Documents

Publication Publication Date Title
CN108731684B (en) Multi-unmanned aerial vehicle cooperative area monitoring airway planning method
Wu et al. Distributed trajectory optimization for multiple solar-powered UAVs target tracking in urban environment by Adaptive Grasshopper Optimization Algorithm
Chen et al. Coordination between unmanned aerial and ground vehicles: A taxonomy and optimization perspective
CN112180967B (en) Multi-unmanned aerial vehicle cooperative countermeasure decision-making method based on evaluation-execution architecture
CN105892480A (en) Self-organizing method for cooperative scouting and hitting task of heterogeneous multi-unmanned-aerial-vehicle system
CN113848974B (en) Aircraft trajectory planning method and system based on deep reinforcement learning
Cao et al. Hunting algorithm for multi-auv based on dynamic prediction of target trajectory in 3d underwater environment
Wei et al. Recurrent MADDPG for object detection and assignment in combat tasks
Li et al. Autonomous maneuver decision-making for a UCAV in short-range aerial combat based on an MS-DDQN algorithm
Yan et al. Flocking control of uav swarms with deep reinforcement leaming approach
CN115826601A (en) Unmanned aerial vehicle path planning method based on reverse reinforcement learning
Sadhu et al. Aerial-DeepSearch: Distributed multi-agent deep reinforcement learning for search missions
Salisbury et al. Real-time opinion aggregation methods for crowd robotics
Cao et al. Autonomous maneuver decision of UCAV air combat based on double deep Q network algorithm and stochastic game theory
Zijian et al. Imaginary filtered hindsight experience replay for UAV tracking dynamic targets in large-scale unknown environments
Liu et al. Rapid location technology of odor sources by multi‐UAV
Liang et al. Multi-UAV autonomous collision avoidance based on PPO-GIC algorithm with CNN–LSTM fusion network
Liu A novel path planning method for aerial UAV based on improved genetic algorithm
CN114679729B (en) Unmanned aerial vehicle cooperative multi-target detection method integrating radar communication
CN114679729A (en) Radar communication integrated unmanned aerial vehicle cooperative multi-target detection method
Yang et al. Learning graph-enhanced commander-executor for multi-agent navigation
CN116227622A (en) Multi-agent landmark coverage method and system based on deep reinforcement learning
Zhang et al. Situational continuity-based air combat autonomous maneuvering decision-making
CN114142908B (en) Multi-unmanned aerial vehicle communication resource allocation method for coverage reconnaissance task
Shen et al. Pigeon-inspired optimisation algorithm with hierarchical topology and receding horizon control for multi-UAV formation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant