CN115951711A - Unmanned cluster multi-target searching and catching method in high sea condition environment - Google Patents


Info

Publication number
CN115951711A
CN115951711A (application CN202310080412.XA)
Authority
CN
China
Prior art keywords: unmanned, target, pursuit, formula, aerial vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310080412.XA
Other languages
Chinese (zh)
Inventor
李斌 (Li Bin)
彭思聪 (Peng Sicong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN202310080412.XA
Publication of CN115951711A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a multi-target search and pursuit method for an unmanned cluster in a high sea state environment. Unmanned aerial vehicles serve as the cluster's eyes and carry out the target search task; communication unmanned boats serve as the cluster's brain and handle cluster control, data processing and target allocation; pursuit unmanned boats execute the target pursuit task. The search and pursuit tasks of the cluster are completed through the cooperation of these different unmanned devices. Building on this cooperative execution, the invention also addresses the maneuver decision problem of unmanned clusters in high sea states. On the one hand, the unmanned devices can accurately localize targets even when conventional communication is limited and positioning is interfered with in a high sea state environment; on the other hand, information interaction among the devices in the cluster is strengthened so that each unmanned device completes its pursuit task while the global benefit is maximized.

Description

Unmanned cluster multi-target searching and catching method in high sea condition environment
Technical Field
The invention belongs to the technical field of unmanned cluster collaborative searching and pursuing, and particularly relates to an unmanned cluster multi-target searching and pursuing method in a high sea condition environment.
Background
In recent years, with the rapid development of unmanned equipment, unmanned systems are playing an important role in both civilian and military applications. Facing complex environments, however, it is increasingly difficult for a single unmanned platform to handle tasks efficiently. Heterogeneous unmanned system cooperation has therefore become an effective means of raising the intelligence of unmanned clusters and achieving efficient task processing: different kinds of agents divide the work according to their respective strengths, which markedly improves task-processing efficiency.
Unmanned boats have developed rapidly in offshore sea-surface search, water-area exploration and similar applications. In a high sea state environment, however, a pitching unmanned boat struggles to acquire accurate information about the surrounding sea area and targets, whereas an unmanned aerial vehicle can exploit its aerial vantage to keep searching a complex and changeable environment. The unmanned aerial vehicle, in turn, suffers from limited flight endurance and payload. The two kinds of device can therefore be combined: unmanned aerial vehicles act as the cluster's eyes and take charge of target search, communication unmanned boats act as the cluster's brain and take charge of cluster control, data processing and target allocation, and pursuit unmanned boats execute the target pursuit task. Existing collaborative search and pursuit research pays little attention to cross-domain cooperation of heterogeneous unmanned systems; most cooperative tasks involve only unmanned aerial vehicle clusters or only unmanned boat clusters, and in a high sea state environment conventional positioning techniques can hardly guarantee that the cluster devices localize targets accurately in complex, changeable sea areas. In addition, most existing research on cooperative search or pursuit with heterogeneous unmanned systems adopts a centralized algorithm in which a central server allocates tasks to every member of the cluster, a mode that is unfavorable to the robustness of the unmanned cluster and its adaptability to the environment.
Disclosure of Invention
To solve the technical problems mentioned in the background art, the invention provides an unmanned cluster multi-target search and pursuit method for high sea state environments. Unmanned aerial vehicles serve as the cluster's eyes and carry out the target search task; communication unmanned boats serve as the cluster's brain and handle cluster control, data processing and target allocation; pursuit unmanned boats execute the target pursuit task; and the search and pursuit tasks of the cluster are completed through the cooperation of these different unmanned devices. Building on this cooperative execution, the invention also addresses the maneuver decision problem of unmanned clusters in high sea states. On the one hand, the unmanned devices can accurately localize targets even when conventional communication is limited and positioning is interfered with; on the other hand, to strengthen information interaction among the devices in the cluster and to ensure that each unmanned device completes its pursuit task while the global benefit is maximized, an unmanned cluster training method based on distributed multi-agent reinforcement learning is provided.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
An unmanned cluster multi-target search and pursuit method in a high sea state environment comprises the following steps:
S1, discretize the sea area to be searched into grids and model the search environment with the grid method;
S2, treat each unmanned aerial vehicle as a particle moving on a two-dimensional aerial plane and optimize a search path for it with a collaborative coverage search algorithm based on an environmental stimulus function: select the next best track point from the UAV's state information and the stimulus function, update the UAV's motion state and move it to the corresponding position, search the grids within the UAV's range at each time step, and send the perception information to the communication unmanned boat;
S3, after a UAV finds a target, track it; at each time step measure the relative distance, relative-distance change rate and relative speed between the UAV and the target, compute the relative localization estimate between the UAV and the target, and from it compute the relative localization estimate between each pursuit unmanned boat and the target;
S4, the UAV records the target state information and transmits it to the communication unmanned boat; construct a target allocation matrix and allocate target tasks to the pursuit unmanned boats according to the current target state information and the boats' existing pursuit states;
S5, on the basis of the target task allocation, establish a pursuit decision model and a decision learning model for the pursuit unmanned boats; once training is completed, each pursuit unmanned boat executes its pursuit task.
Preferably, in step S1 the whole environment is regarded as a planar rectangular area divided into L_x × L_y discrete grids, where Grid_{(x,y)} denotes the grid in row x and column y of the rectangle and Δx and Δy denote the length and width of a unit grid. The whole search environment E is formulated as:

E = {Grid_{(x,y)} | x = 1,2,…,L_x, y = 1,2,…,L_y}

At time t, the state of Grid_{(x,y)} is represented as:

s_{(x,y)}(t) = [μ_{(x,y)}, ζ_{(x,y)}, η_{(x,y)}(t), c_{(x,y)}]

where μ_{(x,y)} is the center-point coordinate of Grid_{(x,y)}; ζ_{(x,y)} ∈ {0,1} indicates whether a target is present in Grid_{(x,y)} (ζ_{(x,y)} = 1 means a search target is present, ζ_{(x,y)} = 0 means no target); η_{(x,y)}(t) ∈ {0,1,2,…,h} is the number of times Grid_{(x,y)} has been searched up to time t; and c_{(x,y)} is the search stimulus function, representing how strongly Grid_{(x,y)} attracts the unmanned aerial vehicles.
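As a concrete illustration of this grid model, the following Python sketch builds the environment E and the per-grid state s_{(x,y)}; all names and default values are illustrative assumptions, not part of the patent:

```python
from dataclasses import dataclass

@dataclass
class Grid:
    """State s_(x,y)(t) = [mu, zeta, eta(t), c] of one grid cell."""
    mu: tuple          # center-point coordinate mu_(x,y)
    zeta: int = 0      # 1 if a search target is present, 0 otherwise
    eta: int = 0       # number of times searched up to time t
    c: float = 1.0     # search stimulus value (attractiveness to UAVs)

def build_environment(Lx, Ly, dx, dy):
    """Discretize the rectangular sea area into Lx x Ly unit grids."""
    return {(x, y): Grid(mu=((x - 0.5) * dx, (y - 0.5) * dy))
            for x in range(1, Lx + 1) for y in range(1, Ly + 1)}

env = build_environment(Lx=20, Ly=20, dx=100.0, dy=100.0)  # e.g. 100 m cells
```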
Preferably, in step S2 the collaborative coverage search algorithm based on the environmental stimulus function optimizes the search path of each unmanned aerial vehicle as follows:

S21, initialize the position and state of each unmanned aerial vehicle; the state information of UAV i at time t is expressed as:

s_i(t) = [λ_i(t), o_i(t)]

where λ_i(t) = (x_i(t), y_i(t)) is the position coordinate of UAV i in environment E at time t and o_i(t) is the heading angle of UAV i at time t;

S22, compute the stimulus function c_{(x,y)} of each grid; during the cluster search, c_{(x,y)} is updated as:

c_{(x,y)}(t) = c_{(x,y)}(0) · α^{η_{(x,y)}(t)}

where c_{(x,y)}(0) is the initial stimulus value of Grid_{(x,y)} and α ∈ (0,1) is a decay coefficient, so the more often Grid_{(x,y)} has been searched, the smaller its search stimulus value;

UAV i selects, among its adjacent grids, the grid with the largest search stimulus value as the next search point:

Grid_i^*(t+1) = arg max_{Grid_{(x,y)} ∈ N_i(t)} c_{(x,y)}(t)

where N_i(t) is the set of grids adjacent to UAV i;

after UAV i finds a target, it records and computes the target state and sends it to the communication unmanned boat of its own communication group, the target state being expressed as:

s_{target,j}(t) = [λ_j(t), v_j(t), θ_{j,i}(t)]

where λ_j(t) = (x_j(t), y_j(t)) is the position coordinate of target j in environment E at time t, v_j(t) is the speed of target j, and θ_{j,i}(t) is the deviation angle of target j relative to UAV i.
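Continuing the sketch above, the decay-and-select logic might look as follows. The exponential decay c(0)·α^η matches the stated property that more searches give a smaller stimulus, and the 4-neighborhood is an assumption (the patent does not fix the adjacency):

```python
def stimulus(grid, c0=1.0, alpha=0.5):
    """c_(x,y)(t) = c_(x,y)(0) * alpha**eta_(x,y)(t), with alpha in (0, 1)."""
    return c0 * alpha ** grid.eta

def next_search_point(env, pos):
    """UAV picks the adjacent grid with the largest search stimulus value."""
    x, y = pos
    neighbours = [(x + u, y + v) for u, v in ((1, 0), (-1, 0), (0, 1), (0, -1))
                  if (x + u, y + v) in env]
    return max(neighbours, key=lambda g: stimulus(env[g]))
```

After a grid is visited its eta counter is incremented, so repeatedly searched cells lose attractiveness and the coverage spreads toward unsearched areas.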
Preferably, in step S3, after the unmanned aerial vehicle finds the target, it uses ultra-wideband (UWB) ranging and a visual odometer for speed measurement, measuring in real time at each time step the relative distance d̂_{i,j}(t) and relative speed v̂_{i,j}(t) between the unmanned aerial vehicle and the unmanned boat. From these measurements, the relative localization estimate λ̂_{i,j}(t) between UAV i and target j at the t-th time step is computed by a recursive estimator [formula given as an image in the original], in which ḋ_{i,j}(t) denotes the rate of change of the relative distance; ε_t^{v}, ε_t^{d} and ε_t^{ḋ} are the measurement errors of the relative speed v̂_{i,j}(t), the relative distance d̂_{i,j}(t) and the relative-distance change rate ḋ_{i,j}(t) at the t-th time step; T is the sampling period of the UWB sensor; and γ ∈ R_+ is a tunable constant gain.

From the state information of the pursuit unmanned boats and UAVs given in the cluster, the relative distance d̂_{k,i}(t), relative speed v̂_{k,i}(t) and relative-distance change rate ḋ_{k,i}(t) between pursuit boat k and UAV i are obtained; from them the relative distance d̂_{k,j}(t), relative speed v̂_{k,j}(t) and relative-distance change rate ḋ_{k,j}(t) between each pursuit boat and the target are computed, and the relative localization estimate λ̂_{k,j}(t) between pursuit boat k and target j at the same time step is calculated in the same way.
preferably, in step S4, a total of l unmanned boats for pursuit are pursued on p targets, where l ≧ p, and a target allocation matrix a = [ a ] =isset ij ]When a is ij =1, the target j is assigned to the pursuit unmanned boat i when a ij =0, this means that the target j is not assigned to the pursuit unmanned ship i, and in the target assignment at least one pursuit unmanned ship should be assigned to each target, that is to say
Figure BDA00040672562400000415
Furthermore, all pursuit unmanned boats should eventually be subjected to a pursuit task, i.e. </or>
Figure BDA00040672562400000416
Establishing a target distribution model for a distribution target by minimizing the initial relative distance between the unmanned ship and the target, wherein the target distribution model is expressed as follows:
Figure BDA00040672562400000417
Figure BDA00040672562400000418
Figure BDA00040672562400000419
a ij ∈{0,1}
in the formula (I), the compound is shown in the specification,
Figure BDA00040672562400000420
and the initial relative distance between the unmanned boat and the target is shown, the matching degree of each unmanned boat with the target is calculated by each unmanned boat, and the unmanned boat with the highest matching degree is subjected to the pursuit task.
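A minimal greedy sketch of this allocation follows (illustrative only; the 0-1 program could equally be solved exactly with an integer-programming solver): first every target claims its nearest still-free boat, then every remaining boat joins the pursuit of its nearest target, so both constraints hold whenever l ≥ p.

```python
import math

def assign_targets(boats, targets):
    """boats, targets: lists of (x, y) positions. Returns {boat: target}
    satisfying 'every target gets >= 1 boat' and 'every boat gets a task'."""
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    assignment, free = {}, set(range(len(boats)))
    for j, tgt in enumerate(targets):          # each target: nearest free boat
        i = min(free, key=lambda b: dist(boats[b], tgt))
        assignment[i] = j
        free.remove(i)
    for i in free:                             # leftover boats: nearest target
        assignment[i] = min(range(len(targets)),
                            key=lambda j: dist(boats[i], targets[j]))
    return assignment
```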
Preferably, in step S5 a target pursuit model for the pursuit unmanned boats is established and represented by the tuple:

⟨S, A_1, …, A_l, T, R_1, …, R_l⟩

where S is the current pursuit state space, shared by all devices in the cluster; A_i is the action space of pursuit boat i; T: S × A^l → S is the deterministic transition function of the environment; and R_i: S × A^l → R is the reward function of pursuit boat i.

The global reward of the pursuit boat formation is defined as the average of all pursuit boats' rewards:

r_t(s, a) = (1/l) Σ_{i=1}^{l} r_i(s, a)

where r_t(s, a) is the reward obtained by the pursuit boat formation in state s at time t.

The maximizing strategy is expressed as:

π^*(s) = arg max_{a} [ r_t(s, a) + λ V(s′) ]

where s′ ≡ s_{t+1} denotes the state at time t + 1 and V is the state value function.

The reward of each pursuit boat is set as:

r_i(s, a) = r_cap + r_help + r_step

where r_cap is the capture reward obtained when the distance between a pursuit boat and its target falls below the capture distance, i.e. the boat catches the target; r_help is the assistance reward obtained, once the target is caught, when several pursuit boats pursue the same target; and r_step is a per-step reward:

r_step = ω_1 r_1 + ω_2 r_2,  ω_1 + ω_2 = 1

where r_1 is the pursuit distance reward and r_2 is the collision reward.

The pursuit distance reward r_1 is a negative return, linear in the remaining pursuit distance:

r_1 = −k_1 d̃_{i,j}

where d̃_{i,j} is the remaining pursuit distance and k_1 is the adjustment coefficient of reward r_1.

The collision reward r_2 satisfies r_2 ∈ (−1, 0] and penalizes pursuit boats that sail too close to each other [formula given as an image in the original], where d_min = min d_{i,j} denotes the minimum sailing distance between pursuit boats and k_2 is the adjustment coefficient of reward r_2.
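The reward decomposition can be sketched as below. r_1 = −k_1·d̃ follows the stated linear form; the exponential used for r_2 is only an assumed stand-in consistent with r_2 ∈ (−1, 0], since the original formula is given as an image; all weights and coefficients are illustrative:

```python
import math

def step_reward(remaining, d_min, k1=0.01, k2=1.0, w1=0.7, w2=0.3):
    """r_step = w1*r1 + w2*r2 with w1 + w2 = 1."""
    r1 = -k1 * remaining                 # pursuit distance reward (linear)
    r2 = -math.exp(-k2 * d_min)          # collision reward (assumed form)
    return w1 * r1 + w2 * r2

def boat_reward(caught, assisted, remaining, d_min, r_cap=10.0, r_help=5.0):
    """r_i(s, a) = r_cap + r_help + r_step, per the patent's decomposition."""
    return ((r_cap if caught else 0.0)
            + (r_help if assisted else 0.0)
            + step_reward(remaining, d_min))
```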
Preferably, in step S5 a multi-boat pursuit maneuver decision model is established with an Actor-Critic structure: the Actor network and Critic network of each pursuit boat are connected through a bidirectional recurrent neural network, the hidden layers in the Actor and Critic networks of a single boat's decision model serve as the recurrent units of the bidirectional recurrent neural network, and the network is unrolled according to the number of pursuit boats. Specifically,

the individual objective function of a pursuit boat is expressed as:

J_i(θ) = E_{s∼ρ_θ^{T}} [ Σ_{t} λ^{t} r_i(s_t, a_t) ]

where ρ_θ^{T} is the state distribution produced by taking actions a_θ under the state transition function T, and E denotes expectation;

the objective function of the pursuit boat formation is expressed as:

J(θ) = (1/l) Σ_{i=1}^{l} J_i(θ)

the gradient of the policy network parameters θ is expressed as:

∇_θ J(θ) = E [ ∇_θ π_θ(s) ∇_a Q_ξ(s, a) |_{a=π_θ(s)} ]

A parameterized critic function Q_ξ(s, a) is used to estimate the state-action function in the above equation, and the Critic is trained with a sum-of-squares loss function; the gradient of Q_ξ(s, a) is expressed as:

∇_ξ L(ξ) = E [ (y − Q_ξ(s, a)) ∇_ξ Q_ξ(s, a) ]

where ξ is the Q-network parameter and y is the target value.

The Actor and Critic networks are optimized by stochastic gradient descent, and the network parameters are updated with data obtained by trial and error during interactive learning, completing the optimization of collaborative search and pursuit.
Preferably, the training and learning process of the multi-boat cooperative target pursuit decision model comprises the following steps:

S51, initialize the online network parameters of the Actor and the Critic and assign them to the corresponding target network parameters, i.e. θ′ ← θ and ξ′ ← ξ, where θ′ and ξ′ are the target parameters of the Actor and the Critic respectively; initialize an experience replay space D to store the data obtained during exploration;

S52, determine the initial state of training, setting the initial position and velocity states of the pursuit boat formation and of the targets;

S53, repeat multiple episodes of training from the initial state, each episode simulating the following operations:

each pursuit boat generates an action a_t^{i} = π_θ(s_t) + ε from the state s_t and a random process ε, and executes it;

after all actions are executed, the state transfers to s_{t+1}; the reward r_t^{i} is computed and the transition tuple (s_t, a_t, r_t, s_{t+1}) is stored in the experience replay space D;

during learning, a minibatch of M experience tuples (s^m, a^m, r^m, s′^m) is drawn at random to compute the target Q value of each pursuit boat:

y^m = r^m + λ Q_{ξ′}(s′^m, π_{θ′}(s′^m))

the gradient estimate of the Critic is computed as:

Δξ = (1/M) Σ_{m=1}^{M} ∇_ξ ( y^m − Q_ξ(s^m, a^m) )²

the online network parameters of the Actor and the Critic are updated from the obtained gradient estimates Δξ and Δθ, and the target network parameters are then updated as:

θ′ ← kθ + (1 − k)θ′,  ξ′ ← kξ + (1 − k)ξ′

where k ∈ (0, 1).
The above technical scheme brings the following beneficial effects:
(1) The invention models the search environment of the unmanned cluster with a rasterization method, which makes the environment information easy to describe and reduces the amount of computation;
(2) The invention designs a collaborative coverage search algorithm based on an environmental stimulus function, which integrates the locations where targets may appear with the current state of the unmanned aerial vehicles and optimizes their search paths;
(3) The invention adopts a relative localization method with persistent reward that measures the relative position and relative speed between the unmanned devices and the target in real time without relying on external infrastructure; it guarantees accurate localization in denied environments, copes with interference to conventional positioning systems in high sea states, and ensures that the unmanned aerial vehicles and unmanned boats localize targets accurately;
(4) The invention establishes an unmanned cluster cooperative communication model that divides the cluster into several communication groups, enabling cooperative target search over sea surfaces that lack communication infrastructure and resources while avoiding collisions between unmanned aerial vehicles and unmanned boats; during target search, the UAV search group can quickly transmit its own state, environment and target state information to the base station deployed on the unmanned boat;
(5) During cooperative search and pursuit, target tasks are allocated to the unmanned boats while the UAVs in the cluster are still searching, coordinating the target search and target pursuit tasks; a target task allocation method is designed accordingly;
(6) The invention organizes the individual learning behaviors of the unmanned boats into group cooperation of the boat cluster through a coordination mechanism and designs a distributed multi-agent reinforcement learning method based on inter-device communication, which ensures that every boat completes its pursuit task while maximizing the global benefit of the cluster pursuit, achieves efficient cooperative pursuit decisions in complex and changeable high sea state environments, and guarantees the stability and reliability of the cluster's target pursuit.
drawings
FIG. 1 is the unmanned-cluster-based collaborative search and pursuit system model of the present invention, the system comprising an unmanned aerial vehicle group, a communication unmanned boat group, a pursuit unmanned boat group and targets to be searched;
FIG. 2 is the flow chart of the present invention;
FIG. 3 is the unmanned boat pursuit maneuver model based on a bidirectional recurrent neural network.
Detailed Description
The technical scheme of the invention is explained in detail below with reference to the accompanying drawings.
Fig. 1 shows the system model of unmanned-cluster-based collaborative search and pursuit, comprising an unmanned aerial vehicle group, a communication unmanned boat group, a pursuit unmanned boat group and targets to be searched. This embodiment provides a multi-target search and pursuit method for an unmanned cluster in a high sea state environment; the flow is shown in Fig. 2, and the method is implemented as follows:
1. Model the search environment with the grid method. The whole environment is regarded as a planar rectangular area divided into L_x × L_y discrete grids, where Grid_{(x,y)} denotes the grid in row x and column y of the rectangle and Δx and Δy denote the length and width of a unit grid. The whole search environment E is represented by the grid set:

E = {Grid_{(x,y)} | x = 1,2,…,L_x, y = 1,2,…,L_y}

At time t, the state of Grid_{(x,y)} is represented as:

s_{(x,y)}(t) = [μ_{(x,y)}, ζ_{(x,y)}, η_{(x,y)}(t), c_{(x,y)}]

where μ_{(x,y)} is the center-point coordinate of Grid_{(x,y)}; ζ_{(x,y)} ∈ {0,1} indicates whether a target is present in Grid_{(x,y)} (ζ_{(x,y)} = 1 means a search target is present, ζ_{(x,y)} = 0 means no target); η_{(x,y)}(t) ∈ {0,1,2,…,h} is the number of times Grid_{(x,y)} has been searched up to time t; and c_{(x,y)} is the search stimulus function, representing how strongly Grid_{(x,y)} attracts the unmanned aerial vehicles.
2. Initialize the number of devices in the unmanned cluster: m unmanned aerial vehicles, n communication unmanned boats, l pursuit unmanned boats and p targets. Within the cluster, the UAVs and communication boats form communication groups, each containing one communication unmanned boat and several UAVs. According to the communicating parties, the UAV links within a group are divided into A2S (UAV-to-USV) communication and A2A (UAV-to-UAV) communication. On the in-group backhaul uplink, the UAVs share uplink spectrum resources and reuse resource blocks within the group.
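For illustration, the communication grouping could be initialized as in the sketch below; the round-robin membership rule is an assumption, since the patent only fixes one communication boat plus several UAVs per group:

```python
from dataclasses import dataclass

@dataclass
class CommGroup:
    """One group: a communication USV plus several UAVs; in-group links are
    A2S (UAV-to-USV backhaul) or A2A (UAV-to-UAV)."""
    comm_boat_id: int
    uav_ids: list

def form_groups(n_uavs, n_comm_boats):
    """Round-robin partition of the m UAVs over the n communication boats."""
    return [CommGroup(b, [u for u in range(n_uavs) if u % n_comm_boats == b])
            for b in range(n_comm_boats)]

groups = form_groups(n_uavs=9, n_comm_boats=3)  # three groups of three UAVs
```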
3. Design a collaborative coverage search algorithm based on an environmental stimulus function to optimize the search path of each unmanned aerial vehicle. The specific steps are:
1) Initialize the position and state of each UAV, where the state information of UAV i at time t is represented as:

s_i(t) = [λ_i(t), o_i(t)]

where λ_i(t) = (x_i(t), y_i(t)) is the position coordinate of UAV i in environment E at time t and o_i(t) is the heading angle of UAV i at time t.
2) Compute the stimulus function c_{(x,y)} of each grid. During the cluster search, c_{(x,y)} is updated as:

c_{(x,y)}(t) = c_{(x,y)}(0) · α^{η_{(x,y)}(t)}

where c_{(x,y)}(0) is the initial stimulus value of Grid_{(x,y)} and α ∈ (0,1) is a decay coefficient; the more often Grid_{(x,y)} has been searched, the smaller its search stimulus value.
UAV i selects, among its adjacent grids, the grid with the largest search stimulus value as the next search point, i.e.

Grid_i^*(t+1) = arg max_{Grid_{(x,y)} ∈ N_i(t)} c_{(x,y)}(t)

where N_i(t) is the set of grids adjacent to UAV i. Under the environmental search stimulus function, a UAV always tends to move to grids with larger stimulus values: unsearched grids are more likely to be selected, and a UAV effectively avoids areas it has repeatedly searched. This guarantees a higher search coverage rate and efficiency for the cluster.
After UAV i finds a target, it records and computes the target state and sends it to the communication unmanned boat of its own communication group; the target state can be represented as:

s_{target,j}(t) = [λ_j(t), ν_j(t), θ_{j,i}(t)]

where λ_j(t) = (x_j(t), y_j(t)) is the position coordinate of target j in environment E at time t, ν_j(t) is the speed of target j, and θ_{j,i}(t) is the deflection angle of target j relative to UAV i.
4. After the unmanned aerial vehicle finds the target, it uses ultra-wideband (UWB) ranging and a visual odometer for speed measurement, measuring in real time at each time step the relative distance d̂_{i,j}(t) and relative speed v̂_{i,j}(t) between the unmanned aerial vehicle and the unmanned boat. From these measurements, the relative localization estimate λ̂_{i,j}(t) between UAV i and target j at the t-th time step is computed by a recursive estimator [formula given as an image in the original], in which ḋ_{i,j}(t) denotes the rate of change of the relative distance; ε_t^{v}, ε_t^{d} and ε_t^{ḋ} are the measurement errors of the relative speed v̂_{i,j}(t), relative distance d̂_{i,j}(t) and relative-distance change rate ḋ_{i,j}(t) at the t-th time step; T is the sampling period of the UWB sensor; and γ ∈ R_+ is a tunable constant gain.
Further, from the state information of the pursuit unmanned boats and UAVs given in the cluster, the relative distance d̂_{k,i}(t), relative speed v̂_{k,i}(t) and relative-distance change rate ḋ_{k,i}(t) between pursuit boat k and UAV i are obtained; from them the relative distance d̂_{k,j}(t), relative speed v̂_{k,j}(t) and relative-distance change rate ḋ_{k,j}(t) between each pursuit boat and the target are computed, and the relative localization estimate λ̂_{k,j}(t) between pursuit boat k and target j at the same time step is calculated in the same way.
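The chaining from UAV-relative to boat-relative quantities rests on vector addition of relative states: the boat-to-target vector is the boat-to-UAV vector plus the UAV-to-target vector. A sketch follows, with the range rate obtained from the identity ḋ = (p·v)/‖p‖; the estimator producing the inputs is the image formula above and is not reproduced here:

```python
import numpy as np

def chain_relative_estimate(p_ki, v_ki, p_ij, v_ij):
    """Combine boat-k-to-UAV-i and UAV-i-to-target-j relative estimates
    into a boat-k-to-target-j estimate."""
    p_kj = np.asarray(p_ki) + np.asarray(p_ij)       # relative position
    v_kj = np.asarray(v_ki) + np.asarray(v_ij)       # relative velocity
    d_kj = float(np.linalg.norm(p_kj))               # relative distance
    d_dot_kj = float(p_kj @ v_kj) / max(d_kj, 1e-9)  # range rate d'(t)
    return p_kj, v_kj, d_kj, d_dot_kj
```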
5. In target allocation, a total of l pursuit unmanned boats pursue p targets, where l ≥ p. A target allocation matrix A = [a_{ij}] is set, where a_{ij} = 1 means target j is assigned to pursuit boat i and a_{ij} = 0 means target j is not assigned to pursuit boat i. In the allocation, each target should be assigned at least one pursuit boat, i.e. Σ_{i=1}^{l} a_{ij} ≥ 1 for every target j; furthermore, every pursuit boat should eventually undertake a pursuit task, i.e. Σ_{j=1}^{p} a_{ij} = 1 for every boat i. Taking minimization of the initial relative distance between the pursuit boats and the targets as the allocation objective, the target allocation model is established as:

min_{A} Σ_{i=1}^{l} Σ_{j=1}^{p} a_{ij} d_{ij}^{0}
s.t. Σ_{i=1}^{l} a_{ij} ≥ 1, j = 1,2,…,p
     Σ_{j=1}^{p} a_{ij} = 1, i = 1,2,…,l
     a_{ij} ∈ {0,1}

where d_{ij}^{0} is the initial relative distance between pursuit boat i and target j. Each pursuit boat computes its degree of match with each target, and the boat with the highest degree of match undertakes the corresponding pursuit task.
6. The invention designs a distributed multi-agent reinforcement learning method based on inter-device communication to realize maneuver decisions for cooperative multi-boat pursuit. The specifics are as follows:
(1) Policy coordination mechanism
During target pursuit, each pursuit boat makes maneuver decisions according to its own situation in the high sea state environment; the pursuit by the multi-boat system can be regarded as a competitive game between the boats and the targets. A target pursuit model of the unmanned boats is established and represented by the tuple:

⟨S, A_1, …, A_l, T, R_1, …, R_l⟩

where S is the current pursuit state space, shared by all devices in the cluster; A_i is the action space of pursuit boat i; T: S × A^l → S is the deterministic transition function of the environment; and R_i: S × A^l → R is the reward function of pursuit boat i.
The global reward of the pursuit boat formation is defined as the average of all boats' rewards, which can be expressed as:

r_t(s, a) = (1/l) Σ_{i=1}^{l} r_i(s, a)

where r_t(s, a) is the reward obtained by the pursuit boat formation in state s at time t.
The goal of the pursuit boat formation is to learn a strategy that maximizes the expected discounted reward, i.e.

J(π) = E_π [ Σ_{t=0}^{∞} λ^{t} r_t(s_t, a_t) ]

where 0 < λ < 1 is the discount factor.
In summary, the maximizing strategy is obtained as:

π^*(s) = arg max_{a} [ r_t(s, a) + λ V(s′) ]

where s′ ≡ s_{t+1} is the state at the next time, determined by the state transition function T(s, a).
To reflect the role of each individual pursuit boat in the cooperative pursuit, a reward is set for each boat, which can be expressed as:

r_i(s, a) = r_cap + r_help + r_step

where r_cap is the capture reward obtained when the distance between a pursuit boat and its target falls below the capture distance, i.e. the boat catches the target; r_help is the assistance reward obtained, once the target is caught, when several pursuit boats pursue the same target; and r_step is a step reward composed of weighted sub-rewards:

r_step = ω_1 r_1 + ω_2 r_2,  ω_1 + ω_2 = 1

where r_1 and r_2 are defined as follows:
1. Pursuit distance reward r_1. At each time step the pursuit boat receives a negative return r_1, linear in the remaining pursuit distance d̃_{i,j}:

r_1 = −k_1 d̃_{i,j}

where k_1 is the adjustment coefficient.
2. Collision reward r_2. The collision reward satisfies r_2 ∈ (−1, 0] and penalizes boats that sail too close to each other [formula given as an image in the original], where d_min = min d_{i,j} denotes the minimum sailing distance between the pursuit boats and k_2 is the adjustment coefficient.
For m pursuing boats there are m Bellman equations, i.e.

Q_i(s, a) = r_i(s, a) + λ max_{a′} Q_i(s′, a′), i = 1, 2, …, m

During reinforcement learning training, the defined feedback of each pursuit boat on target allocation, collision avoidance and other aspects is given through the assignment of these rewards. After training, the pursuit boats achieve decision coordination, so that their behaviors become tacitly coordinated.
(2) Decision learning mechanism
A multi-boat pursuit maneuver decision model is established to guarantee information interaction between the boats and realize cooperative cluster maneuvers. The model adopts an Actor-Critic structure in which the Actor network and Critic network of each pursuit boat are connected through a bidirectional recurrent neural network. Specifically, as shown in Fig. 3, the hidden layers in the policy network (Actor) and Q network (Critic) of a single boat's decision model serve as the recurrent units of the bidirectional recurrent neural network, and the policy and Q networks are unrolled according to the number of pursuit boats.
The individual objective function of a pursuit boat can be defined as:

J_i(θ) = E_{s∼ρ_θ^{T}} [ Σ_{t} λ^{t} r_i(s_t, a_t) ]

where ρ_θ^{T} is the state distribution produced by taking actions a_θ under the state transition function T.
The objective function of the pursuit boat formation is expressed as:

J(θ) = (1/l) Σ_{i=1}^{l} J_i(θ)

According to the multi-agent deterministic policy gradient theorem, the gradient of the policy network parameters θ is:

∇_θ J(θ) = E [ ∇_θ π_θ(s) ∇_a Q_ξ(s, a) |_{a=π_θ(s)} ]

A parameterized critic function Q_ξ(s, a) is used to estimate the state-action function in the above equation, and the Critic is trained with a sum-of-squares loss function. The gradient of Q_ξ(s, a) can be expressed as:

∇_ξ L(ξ) = E [ (y − Q_ξ(s, a)) ∇_ξ Q_ξ(s, a) ]

where ξ is the Q-network parameter and y is the target value.
The Actor and Critic networks are optimized by stochastic gradient descent, and the network parameters are updated with data obtained by trial and error during interactive learning, completing the optimization of collaborative search and pursuit.
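A compact PyTorch sketch of this architecture follows; the dimensions, the GRU cell choice and the tanh action squashing are illustrative assumptions. The per-boat hidden layers form the recurrent units of a bidirectional GRU that is unrolled over the boat axis, so each boat's Actor and Critic see information from the whole formation:

```python
import torch
import torch.nn as nn

class BRNNActorCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hid=64):
        super().__init__()
        self.enc = nn.Linear(obs_dim, hid)             # per-boat hidden layer
        self.comm = nn.GRU(hid, hid, bidirectional=True, batch_first=True)
        self.actor = nn.Linear(2 * hid, act_dim)       # policy head (Actor)
        self.critic = nn.Linear(2 * hid + act_dim, 1)  # Q head (Critic)

    def forward(self, obs, act=None):
        # obs: (batch, n_boats, obs_dim); the GRU runs over the boat axis,
        # passing information forward and backward along the formation.
        h, _ = self.comm(torch.relu(self.enc(obs)))
        a = torch.tanh(self.actor(h))                  # joint action
        q = None
        if act is not None:
            q = self.critic(torch.cat([h, act], dim=-1))
        return a, q
```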
(3) Training and learning process of the multi-boat cooperative target pursuit decision model
a. Initialize the online network parameters of the Actor and the Critic and assign them to the corresponding target network parameters, i.e. θ′ ← θ and ξ′ ← ξ, where θ′ and ξ′ are the target parameters of the Actor and the Critic respectively. Initialize an experience replay space D to store the data obtained during exploration;
b. Determine the initial state of training, setting the initial position and velocity states of the pursuit boat formation and of the targets;
c. Repeat multiple episodes of training from the initial state; each pursuit episode simulates the following operations:
each pursuit boat generates an action a_t^{i} = π_θ(s_t) + ε from the state s_t and a random process ε, and executes it;
after all actions are executed, the state transfers to s_{t+1}; the reward r_t^{i} is computed and the transition tuple (s_t, a_t, r_t, s_{t+1}) is stored in the experience replay space D;
during learning, a minibatch of M experience tuples (s^m, a^m, r^m, s′^m) is drawn at random to compute the target Q value of each pursuit boat, i.e.

y^m = r^m + λ Q_{ξ′}(s′^m, π_{θ′}(s′^m))

the gradient estimate of the Critic is computed, i.e.

Δξ = (1/M) Σ_{m=1}^{M} ∇_ξ ( y^m − Q_ξ(s^m, a^m) )²

the gradient estimate of the Actor is computed, i.e.

Δθ = (1/M) Σ_{m=1}^{M} ∇_θ π_θ(s^m) ∇_a Q_ξ(s^m, a) |_{a=π_θ(s^m)}

from the obtained gradient estimates Δξ and Δθ, the online network parameters of the Actor and the Critic are updated, and the target network parameters are then updated, i.e.

θ′ ← kθ + (1 − k)θ′,  ξ′ ← kξ + (1 − k)ξ′

where k ∈ (0, 1).
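The target-network bookkeeping of step c can be sketched as below, reusing the BRNNActorCritic and imports above: the soft update implements θ′ ← kθ + (1 − k)θ′ and the loss implements the target value y^m = r^m + λQ′(s′, π′(s′)). The hyperparameters are illustrative:

```python
def soft_update(online, target, k=0.01):
    """theta' <- k * theta + (1 - k) * theta', with k in (0, 1)."""
    with torch.no_grad():
        for p, p_t in zip(online.parameters(), target.parameters()):
            p_t.mul_(1.0 - k).add_(k * p)

def critic_loss(model, target_model, batch, lam=0.95):
    """Sum-of-squares TD loss over a minibatch (s, a, r, s')."""
    s, a, r, s2 = batch
    with torch.no_grad():
        a2, _ = target_model(s2)          # target policy action pi'(s')
        _, q2 = target_model(s2, a2)      # target Q value Q'(s', a')
        y = r + lam * q2                  # y = r + lambda * Q'(s', pi'(s'))
    _, q = model(s, a)
    return ((y - q) ** 2).mean()
```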
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein. The scheme in the embodiments of the application can be implemented in various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (8)

1. A multi-target search and pursuit method for an unmanned cluster in a high sea state environment, characterized by comprising the following steps:
S1, discretizing the sea area to be searched into grids and modeling the search environment with the grid method;
S2, treating each unmanned aerial vehicle as a particle moving on a two-dimensional aerial plane and optimizing a search path for it with a collaborative coverage search algorithm based on an environmental stimulus function: selecting the next best track point from the UAV's state information and the stimulus function, updating the UAV's motion state and moving it to the corresponding position, searching the grids within the UAV's range at each time step, and sending the perception information to the communication unmanned boat;
S3, after a UAV finds a target, tracking it; at each time step measuring the relative distance, relative-distance change rate and relative speed between the UAV and the target, computing the relative localization estimate between the UAV and the target, and from it computing the relative localization estimate between each pursuit unmanned boat and the target;
S4, the UAV recording the target state information and transmitting it to the communication unmanned boat, constructing a target allocation matrix, and allocating target tasks to the pursuit unmanned boats according to the current target state information and the boats' existing pursuit states;
S5, on the basis of the target task allocation, establishing a pursuit decision model and a decision learning model for the pursuit unmanned boats; once training is completed, each pursuit unmanned boat executes its pursuit task.
2. The unmanned cluster multi-target search and pursuit method in a high sea state environment according to claim 1, characterized in that in step S1 the whole environment is regarded as a planar rectangular area divided into L_x × L_y discrete grids, where Grid_{(x,y)} denotes the grid in row x and column y of the rectangle and Δx and Δy denote the length and width of a unit grid; the whole search environment E is expressed by the grid set:

E = {Grid_{(x,y)} | x = 1,2,…,L_x, y = 1,2,…,L_y}

at time t, the state of Grid_{(x,y)} is represented as:

s_{(x,y)}(t) = [μ_{(x,y)}, ζ_{(x,y)}, η_{(x,y)}(t), c_{(x,y)}]

where μ_{(x,y)} is the center-point coordinate of Grid_{(x,y)}; ζ_{(x,y)} ∈ {0,1} indicates whether a target is present in Grid_{(x,y)} (ζ_{(x,y)} = 1 means a search target is present, ζ_{(x,y)} = 0 means no target); η_{(x,y)}(t) ∈ {0,1,2,…,h} is the number of times Grid_{(x,y)} has been searched up to time t; and c_{(x,y)} is the search stimulus function, representing how strongly Grid_{(x,y)} attracts the unmanned aerial vehicles.
3. The unmanned cluster multi-target search and pursuit method in a high sea state environment according to claim 1, characterized in that in step S2 the collaborative coverage search algorithm based on the environmental stimulus function optimizes the search path of each unmanned aerial vehicle as follows:
S21, initialize the position and state of each unmanned aerial vehicle; the state information of UAV i at time t is expressed as:

s_i(t) = [λ_i(t), o_i(t)]

where λ_i(t) = (x_i(t), y_i(t)) is the position coordinate of UAV i in environment E at time t and o_i(t) is the heading angle of UAV i at time t;
S22, compute the stimulus function c_{(x,y)} of each grid; during the cluster search, c_{(x,y)} is updated as:

c_{(x,y)}(t) = c_{(x,y)}(0) · α^{η_{(x,y)}(t)}

where c_{(x,y)}(0) is the initial stimulus value of Grid_{(x,y)} and α ∈ (0,1) is a decay coefficient, so the more often Grid_{(x,y)} has been searched, the smaller its search stimulus value;
UAV i selects, among its adjacent grids, the grid with the largest search stimulus value as the next search point:

Grid_i^*(t+1) = arg max_{Grid_{(x,y)} ∈ N_i(t)} c_{(x,y)}(t)

where N_i(t) is the set of grids adjacent to UAV i;
after UAV i finds a target, it records and computes the target state and sends it to the communication unmanned boat of its own communication group, the target state being expressed as:

s_{target,j}(t) = [λ_j(t), ν_j(t), θ_{j,i}(t)]

where λ_j(t) = (x_j(t), y_j(t)) is the position coordinate of target j in environment E at time t, ν_j(t) is the speed of target j, and θ_{j,i}(t) is the deviation angle of target j relative to UAV i.
4. The unmanned cluster multi-target search and pursuit method in a high sea state environment according to claim 1, characterized in that in step S3, after the unmanned aerial vehicle finds the target, it uses ultra-wideband (UWB) ranging and a visual odometer for speed measurement, measuring in real time at each time step the relative distance d̂_{i,j}(t) and relative speed v̂_{i,j}(t) between the unmanned aerial vehicle and the unmanned boat; from these measurements, the relative localization estimate λ̂_{i,j}(t) between UAV i and target j at the t-th time step is computed by a recursive estimator [formula given as an image in the original], in which ḋ_{i,j}(t) denotes the rate of change of the relative distance; ε_t^{v}, ε_t^{d} and ε_t^{ḋ} are the measurement errors of the relative speed v̂_{i,j}(t), relative distance d̂_{i,j}(t) and relative-distance change rate ḋ_{i,j}(t) at the t-th time step; T is the sampling period of the UWB sensor; and γ ∈ R_+ is a tunable constant gain;
from the state information of the pursuit unmanned boats and UAVs given in the cluster, the relative distance d̂_{k,i}(t), relative speed v̂_{k,i}(t) and relative-distance change rate ḋ_{k,i}(t) between pursuit boat k and UAV i are obtained; from them the relative distance d̂_{k,j}(t), relative speed v̂_{k,j}(t) and relative-distance change rate ḋ_{k,j}(t) between each pursuit boat and the target are computed, and the relative localization estimate λ̂_{k,j}(t) between pursuit boat k and target j at the same time step is calculated in the same way.
5. the unmanned cluster multi-target searching and pursuing method in the high sea condition environment as claimed in claim 1, wherein in step S4, a total of l pursuit unmanned boats pursue p targets, wherein l is larger than or equal to p, and a target distribution matrix a = [ a ] = is set ij ]When a is ij =1, the target j is assigned to the pursuit unmanned boat i when a ij By =0, it is meant that the target j is not assigned to the pursuit unmanned boat i, and in the target assignment, each target should be assigned at least one pursuit unmanned boat, that is, at least one pursuit unmanned boat is assigned
Figure FDA0004067256230000041
Furthermore, all pursuit unmanned boats should eventually be subjected to the pursuit task, i.e. <' > in & -5 & ->
Figure FDA0004067256230000042
Establishing a target distribution model for a distribution target by minimizing the initial relative distance between the unmanned ship and the target, wherein the target distribution model is expressed as follows:
Figure FDA0004067256230000043
Figure FDA0004067256230000044
Figure FDA0004067256230000045
a ij ∈{0,1}
in the formula (I), the compound is shown in the specification,
Figure FDA0004067256230000046
and the initial relative distance between the pursuit unmanned boat and the target is shown, the matching degree of each pursuit unmanned boat with the target is calculated, and the unmanned boat with the highest matching degree is subjected to pursuit tasks. />
6. The unmanned cluster multi-target search and pursuit method in a high sea state environment according to claim 1, characterized in that in step S5 a target pursuit model for the pursuit unmanned boats is established and represented by the tuple:

⟨S, A_1, …, A_l, T, R_1, …, R_l⟩

where S is the current pursuit state space, shared by all devices in the cluster; A_i is the action space of pursuit boat i; T: S × A^l → S is the deterministic transition function of the environment; and R_i: S × A^l → R is the reward function of pursuit boat i;
the global reward of the pursuit boat formation is defined as the average of all pursuit boats' rewards:

r_t(s, a) = (1/l) Σ_{i=1}^{l} r_i(s, a)

where r_t(s, a) is the reward obtained by the pursuit boat formation in state s at time t;
the maximizing strategy is expressed as:

π^*(s) = arg max_{a} [ r_t(s, a) + λ V(s′) ]

where s′ ≡ s_{t+1} denotes the state at time t + 1 and V is the state value function;
the reward of each pursuit boat is set as:

r_i(s, a) = r_cap + r_help + r_step

where r_cap is the capture reward obtained when the distance between a pursuit boat and its target falls below the capture distance, i.e. the boat catches the target; r_help is the assistance reward obtained, once the target is caught, when several pursuit boats pursue the same target; and r_step is a per-step reward:

r_step = ω_1 r_1 + ω_2 r_2,  ω_1 + ω_2 = 1

where r_1 is the pursuit distance reward and r_2 is the collision reward;
the pursuit distance reward r_1 is a negative return, linear in the remaining pursuit distance:

r_1 = −k_1 d̃_{i,j}

where d̃_{i,j} is the remaining pursuit distance and k_1 is the adjustment coefficient of reward r_1;
the collision reward r_2 satisfies r_2 ∈ (−1, 0] and penalizes pursuit boats that sail too close to each other [formula given as an image in the original], where d_min = min d_{i,j} denotes the minimum sailing distance between pursuit boats and k_2 is the adjustment coefficient of reward r_2.
7. The unmanned cluster multi-target search and pursuit method in a high sea state environment according to claim 1, characterized in that in step S5 a multi-boat pursuit maneuver decision model is established with an Actor-Critic structure: the Actor network and Critic network of each pursuit boat are connected through a bidirectional recurrent neural network, the hidden layers in the Actor and Critic networks of a single boat's decision model serve as the recurrent units of the bidirectional recurrent neural network, and the network is unrolled according to the number of pursuit boats;
wherein
the individual objective function of a pursuit boat is expressed as:

J_i(θ) = E_{s∼ρ_θ^{T}} [ Σ_{t} λ^{t} r_i(s_t, a_t) ]

where ρ_θ^{T} is the state distribution produced by taking actions a_θ under the state transition function T, and E denotes expectation;
the objective function of the pursuit boat formation is expressed as:

J(θ) = (1/l) Σ_{i=1}^{l} J_i(θ)

the gradient of the policy network parameters θ is expressed as:

∇_θ J(θ) = E [ ∇_θ π_θ(s) ∇_a Q_ξ(s, a) |_{a=π_θ(s)} ]

a parameterized critic function Q_ξ(s, a) is used to estimate the state-action function in the above equation, and the Critic is trained with a sum-of-squares loss function; the gradient of Q_ξ(s, a) is expressed as:

∇_ξ L(ξ) = E [ (y − Q_ξ(s, a)) ∇_ξ Q_ξ(s, a) ]

where ξ is the Q-network parameter and y is the target value;
the Actor and Critic networks are optimized by stochastic gradient descent, and the network parameters are updated with data obtained by trial and error during interactive learning, completing the optimization of collaborative search and pursuit.
8. The unmanned cluster multi-target search and pursuit method in a high sea state environment according to claim 7, characterized in that the training and learning process of the multi-boat cooperative target pursuit decision model comprises the following steps:
S51, initialize the online network parameters of the Actor and the Critic and assign them to the corresponding target network parameters, i.e. θ′ ← θ and ξ′ ← ξ, where θ′ and ξ′ are the target parameters of the Actor and the Critic respectively; initialize an experience replay space D to store the data obtained during exploration;
S52, determine the initial state of training, setting the initial position and velocity states of the pursuit boat formation and of the targets;
S53, repeat multiple episodes of training from the initial state, each episode simulating the following operations:
each pursuit boat generates an action a_t^{i} = π_θ(s_t) + ε from the state s_t and a random process ε, and executes it;
after all actions are executed, the state transfers to s_{t+1}; the reward r_t^{i} is computed and the transition tuple (s_t, a_t, r_t, s_{t+1}) is stored in the experience replay space D;
during learning, a minibatch of M experience tuples (s^m, a^m, r^m, s′^m) is drawn at random to compute the target Q value of each pursuit boat:

y^m = r^m + λ Q_{ξ′}(s′^m, π_{θ′}(s′^m))

the gradient estimate of the Critic is computed as:

Δξ = (1/M) Σ_{m=1}^{M} ∇_ξ ( y^m − Q_ξ(s^m, a^m) )²

from the obtained gradient estimates Δξ and Δθ, the online network parameters of the Actor and the Critic are updated, and the target network parameters are then updated as:

θ′ ← kθ + (1 − k)θ′,  ξ′ ← kξ + (1 − k)ξ′

where k ∈ (0, 1).
CN202310080412.XA 2023-02-08 2023-02-08 Unmanned cluster multi-target searching and catching method in high sea condition environment Pending CN115951711A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310080412.XA CN115951711A (en) 2023-02-08 2023-02-08 Unmanned cluster multi-target searching and catching method in high sea condition environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310080412.XA CN115951711A (en) 2023-02-08 2023-02-08 Unmanned cluster multi-target searching and catching method in high sea condition environment

Publications (1)

Publication Number Publication Date
CN115951711A true 2023-04-11

Family

ID=87291253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310080412.XA Pending CN115951711A (en) 2023-02-08 2023-02-08 Unmanned cluster multi-target searching and catching method in high sea condition environment

Country Status (1)

Country Link
CN (1) CN115951711A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117008464A (en) * 2023-10-07 2023-11-07 广东海洋大学 Unmanned ship navigation method based on attitude control
CN117008464B (en) * 2023-10-07 2023-12-15 广东海洋大学 Unmanned ship navigation method based on attitude control

Similar Documents

Publication Publication Date Title
CN111667513B (en) Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN110989576B (en) Target following and dynamic obstacle avoidance control method for differential slip steering vehicle
CN113110509B (en) Warehousing system multi-robot path planning method based on deep reinforcement learning
CN112465151A (en) Multi-agent federal cooperation method based on deep reinforcement learning
CN110488859B (en) Unmanned aerial vehicle route planning method based on improved Q-learning algorithm
Russell et al. Q-decomposition for reinforcement learning agents
CN110134140B (en) Unmanned aerial vehicle path planning method based on potential function reward DQN under continuous state of unknown environmental information
Ma et al. Multi-robot target encirclement control with collision avoidance via deep reinforcement learning
CN112180967B (en) Multi-unmanned aerial vehicle cooperative countermeasure decision-making method based on evaluation-execution architecture
CN113791634A (en) Multi-aircraft air combat decision method based on multi-agent reinforcement learning
CN111859541B (en) PMADDPG multi-unmanned aerial vehicle task decision method based on transfer learning improvement
CN114625151A (en) Underwater robot obstacle avoidance path planning method based on reinforcement learning
CN115951711A (en) Unmanned cluster multi-target searching and catching method in high sea condition environment
CN113504798A (en) Unmanned aerial vehicle cluster cooperative target searching method imitating biological group negotiation behaviors
CN112651486A (en) Method for improving convergence rate of MADDPG algorithm and application thereof
Xue et al. Multi-agent deep reinforcement learning for UAVs navigation in unknown complex environment
CN117606489B (en) Unmanned aerial vehicle flight survey method, unmanned aerial vehicle flight survey equipment and unmanned aerial vehicle flight survey medium
CN117724524A (en) Unmanned aerial vehicle route planning method based on improved spherical vector particle swarm algorithm
Sun et al. Multi-agent cooperative search based on reinforcement learning
CN114609925B (en) Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish
CN116227622A (en) Multi-agent landmark coverage method and system based on deep reinforcement learning
CN113959446B (en) Autonomous logistics transportation navigation method for robot based on neural network
Yang Reinforcement learning for multi-robot system: A review
CN115097861A (en) Multi-Unmanned Aerial Vehicle (UAV) capture strategy method based on CEL-MADDPG
Diallo et al. Coordination in adversarial multi-agent with deep reinforcement learning under partial observability

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination