CN114625167A - Unmanned aerial vehicle collaborative search method and system based on heuristic Q-learning algorithm - Google Patents

Unmanned aerial vehicle collaborative search method and system based on heuristic Q-learning algorithm

Info

Publication number
CN114625167A
CN114625167A (application CN202210281173.XA)
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
search
heuristic
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210281173.XA
Other languages
Chinese (zh)
Inventor
王怀震
王小谦
高明
王建华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong New Generation Information Industry Technology Research Institute Co Ltd
Original Assignee
Shandong New Generation Information Industry Technology Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong New Generation Information Industry Technology Research Institute Co Ltd filed Critical Shandong New Generation Information Industry Technology Research Institute Co Ltd
Priority to CN202210281173.XA priority Critical patent/CN114625167A/en
Publication of CN114625167A publication Critical patent/CN114625167A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10 - Simultaneous control of position or course in three dimensions
    • G05D1/101 - Simultaneous control of position or course in three dimensions specially adapted for aircraft

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses an unmanned aerial vehicle collaborative search method and system based on a heuristic Q-learning algorithm, belonging to the field of unmanned aerial vehicle collaborative search and aiming at solving the technical problems of low efficiency and low reliability of existing unmanned aerial vehicle search methods. The technical scheme comprises the following specific steps: S1, establishing an unmanned aerial vehicle dynamics model and a search environment perception map; S2, establishing an unmanned aerial vehicle search reward and heuristic reward mechanism: designing an effective unmanned aerial vehicle search reward mechanism based on the unmanned aerial vehicle dynamics model, the environment perception map and the unmanned aerial vehicle state established in step S1; S3, the unmanned aerial vehicle adopts a heuristic Q-learning algorithm, explores the search area by using the unmanned aerial vehicle search reward mechanism established in step S2, plans the search path and realizes the optimal search of the unmanned aerial vehicle; S4, the unmanned aerial vehicle updates the search environment perception map according to the search result of step S3; S5, fusing attribute values of the multi-unmanned-aerial-vehicle environment perception maps; and S6, repeating steps S3-S5 until all targets are searched.

Description

Unmanned aerial vehicle collaborative search method and system based on heuristic Q-learning algorithm
Technical Field
The invention relates to the field of unmanned aerial vehicle collaborative search, in particular to an unmanned aerial vehicle collaborative search method and system based on a heuristic Q-learning algorithm.
Background
In recent years, unmanned aerial vehicles have been widely used in fields such as post-disaster rescue, target strike and military confrontation. Target search is an important application of unmanned aerial vehicles: the unmanned aerial vehicle flies above a search sub-region and explores the region with its airborne sensors, and a well-designed search strategy enables the unmanned aerial vehicle to capture as many moving targets in the region as possible. Therefore, in order to improve unmanned aerial vehicle search efficiency, various strategies have been proposed; among them, classical receding horizon (rolling time domain) planning generates an optimal search path online and is widely applied in the unmanned aerial vehicle search process.
Classical receding horizon planning screens only the extreme value of the predicted path and does not consider the value of adjacent grids. When the prediction horizon is too long, the single-step planning time becomes too long and the real-time requirement cannot be met. In addition, it depends on various numerical or heuristic solvers, which often require the objective function to be continuous or differentiable, so it suffers from low reliability in use.
Disclosure of Invention
The invention provides an unmanned aerial vehicle collaborative searching method and system based on a heuristic Q-learning algorithm, and aims to solve the problems of low efficiency and low reliability of the existing unmanned aerial vehicle searching method.
The technical task of the invention is realized in the following way, and the unmanned aerial vehicle collaborative searching method based on the heuristic Q-learning algorithm specifically comprises the following steps:
S1, establishing an unmanned aerial vehicle dynamics model and a search environment perception map;
S2, establishing an unmanned aerial vehicle search reward and heuristic reward mechanism: designing an effective unmanned aerial vehicle search reward mechanism based on the unmanned aerial vehicle dynamics model, the environment perception map and the unmanned aerial vehicle state established in step S1;
S3, the unmanned aerial vehicle adopts the heuristic Q-learning algorithm, explores the search area by using the unmanned aerial vehicle search reward mechanism established in step S2, plans the search path and realizes the optimal search of the unmanned aerial vehicle;
S4, the unmanned aerial vehicle updates the search environment perception map according to the search result of step S3;
S5, fusing attribute values of the multi-unmanned-aerial-vehicle environment perception maps: each unmanned aerial vehicle communicates with the unmanned aerial vehicles within its communication range, exchanging and fusing the perception information of the search environment map;
and S6, repeating steps S3-S5 until all targets are searched.
Preferably, the environment search map establishment in step S1 is specifically as follows:
rasterizing the search area into an N × M discretized grid map;
the length and width of each grid cell are dx and dy respectively; dx and dy are the distance the unmanned aerial vehicle flies at its average level flight speed within one planning period.
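As an illustrative sketch of this rasterization (not part of the original disclosure), the grid dimensions and an initial target existence probability map can be built as follows; the function name, the uniform 0.5 initialization and the parameter names v_avg and t_plan are assumptions:
import numpy as np

def build_grid_map(area_width, area_height, v_avg, t_plan):
    # Grid cell edge length: distance flown at average level flight speed
    # during one planning period (square cells assumed, so dx = dy).
    dx = dy = v_avg * t_plan
    N = int(np.ceil(area_width / dx))
    M = int(np.ceil(area_height / dy))
    # Target existence probability map; 0.5 means "target present" and
    # "target absent" are equally likely at the initial time.
    p_map = np.full((N, M), 0.5)
    return p_map, dx, dy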
Preferably, the environment search map in step S1 is divided into a target existence probability map and an uncertainty map;
the establishment of the target existence probability map model is as follows:
(1) defining p(x, y, k) ∈ (0, 1) to represent the probability that a target is located at (x, y) at time k; p(x, y, 0) = 0.5 indicates that at the initial time the target is equally likely to exist or not exist at (x, y); p(x, y, 0) is given as follows:
[formula given as image BDA0003557862990000021 in the original]
where v_n represents the peak width of the target existence probability; c_n represents the peak value of the target existence probability; (x_n, y_n) represents a location where a target may appear;
(2) the unmanned aerial vehicle has a detection probability p_d and a false alarm probability p_f in the detection process, defined as:
p(Z(x, y, k) = 1 | δ(x, y, k) = 1) = p_d
p(Z(x, y, k) = 1 | δ(x, y, k) = 0) = p_f
where Z(x, y, k) represents the target detection result of the unmanned aerial vehicle at (x, y) at time k; δ(x, y, k) indicates whether a target is present at (x, y) at time k;
(3) as the unmanned aerial vehicle searches the area, the attribute values of the target existence probability map are continuously updated according to Bayes' theorem; the update formula of the target existence probability map is as follows:
[formulas given as images BDA0003557862990000022 and BDA0003557862990000031 in the original]
where Φ_k represents the detection range of the unmanned aerial vehicle at time k;
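Since the update formulas above appear only as images, the following sketch assumes the standard Bayesian update with detection probability p_d and false alarm probability p_f; it is an illustrative assumption, not a reproduction of the patented formula:
import numpy as np

def bayes_update(p_prev, detected, in_footprint, p_d=0.95, p_f=0.015):
    # p_prev: prior target existence probability map (N x M array)
    # detected: boolean array, sensor reading Z(x, y, k)
    # in_footprint: boolean array, True where the cell lies in the detection range Phi_k
    p = p_prev.copy()
    hit = in_footprint & detected
    miss = in_footprint & ~detected
    p[hit] = p_d * p_prev[hit] / (p_d * p_prev[hit] + p_f * (1 - p_prev[hit]))
    p[miss] = (1 - p_d) * p_prev[miss] / (
        (1 - p_d) * p_prev[miss] + (1 - p_f) * (1 - p_prev[miss]))
    return p  # cells outside the footprint keep their previous value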
the uncertainty map model is established as follows:
(1) defining χ(x, y, k) as the uncertainty information at time k, and initializing the uncertainty map with the following formula:
χ(x,y,0)=-p(x,y,0)log2 p(x,y,0);
(2) the uncertainty map attribute value decays exponentially as the number of unmanned aerial vehicle visits increases, with the following formula:
χ(x, y, k) = τ·χ(x, y, k)
where τ ∈ (0, 1) represents the decay factor.
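A minimal sketch of the uncertainty map initialization and per-visit decay described above (the function names and the array-mask interface are illustrative):
import numpy as np

def init_uncertainty(p0):
    # chi(x, y, 0) = -p(x, y, 0) * log2 p(x, y, 0)
    return -p0 * np.log2(p0)

def decay_uncertainty(chi, visited, tau=0.5):
    # Each visit multiplies the cell's uncertainty by tau in (0, 1),
    # so uncertainty decays exponentially with the number of visits.
    chi = chi.copy()
    chi[visited] *= tau
    return chi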
Preferably, the unmanned aerial vehicle search reward and heuristic reward mechanism in step S2 is specifically as follows:
S201, the search reward r consists of four parts r_A, r_B, r_C and r_D, with the following formula:
r = w_1·r_A + w_2·r_B + w_3·r_C + w_4·r_D
where r_A denotes the target existence probability reward; r_B denotes the uncertainty reward; r_C denotes the flight cost reward; r_D denotes the expected search reward;
S202, the target existence probability reward r_A is set as follows:
r_A = (1 − σ(x, y, k))·p(x, y, k);
σ(x, y, k) = 1 if p(x, y, k) > p_max, and σ(x, y, k) = 0 otherwise;
where σ(x, y, k) indicates whether the unmanned aerial vehicle considers a target to be present at (x, y); when p(x, y, k) > p_max, the unmanned aerial vehicle considers that a target exists at (x, y);
S203, the uncertainty reward r_B is set as follows:
r_B = χ(x, y, k+1) − χ(x, y, k);
S204, the flight cost reward r_C is set as follows:
[formula given as image BDA0003557862990000041 in the original]
S205, the expected search reward r_D is set as follows:
[formula given as image BDA0003557862990000042 in the original]
S206, setting a heuristic reward mechanism: the heuristic reward attracts the unmanned aerial vehicle to move toward the location with the maximum global target existence probability and to stay away from the map boundary;
S207, the heuristic reward and the search reward jointly form the reward mechanism of heuristic reinforcement learning, with the following formula:
[formula given as image BDA0003557862990000043 in the original]
where D and F represent heuristic factors: D represents the distance from any map position to the position with the maximum global target existence probability, and F represents the minimum distance from the unmanned aerial vehicle position to the four map edges; c and d represent adjustment coefficients.
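Purely as an illustrative wiring of the weighted reward in S201 (the flight cost term, the expected search term and the heuristic term are given only as images in the original, so they are passed in here as opaque values; the weights w1 to w4 and the function name are assumptions):
def search_reward(p, chi_now, chi_next, sigma, flight_cost_term, expected_term,
                  w1=1.0, w2=1.0, w3=1.0, w4=1.0):
    # r = w1*rA + w2*rB + w3*rC + w4*rD
    r_a = (1.0 - sigma) * p          # target existence probability reward rA
    r_b = chi_next - chi_now         # uncertainty reward rB
    r_c = flight_cost_term           # flight cost reward rC (formula not reproduced)
    r_d = expected_term              # expected search reward rD (formula not reproduced)
    return w1 * r_a + w2 * r_b + w3 * r_c + w4 * r_d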
Preferably, the heuristic Q-learning algorithm in step S3 is specifically as follows:
S301, building a Q table, initializing Q(s, a) to arbitrary values, and setting the eligibility trace e(s, a) to 0;
S302, repeat:
initialize the unmanned aerial vehicle state s;
repeat (for each step within the round):
select action a according to the learned Q table and the heuristic action selection strategy
take action a to obtain the heuristic reward r and the next state s′
δ ← r + γ·max_{a′} Q(s′, a′) − Q(s, a)
e(s,a)←1
s←s′
where e(s, a) is the eligibility trace of the unmanned aerial vehicle taking action a in state s; γ is the discount factor; δ is the increment of Q.
For all states s and actions a, update the value function and eligibility trace of the unmanned aerial vehicle:
Q(s,a)←Q(s,a)+αδe(s,a)
e(s,a)←γλe(s,a)
where α is the learning rate; λ is the trace discount factor.
until the round step count is reached
until all Q(s, a) converge;
S303, outputting the final policy, with the following formula:
π*(s) = argmax_a Q(s, a)
where π* is the optimal policy of the unmanned aerial vehicle; Q(s, a) is the action-value of the unmanned aerial vehicle executing action a in state s;
setting a heuristic action selection mechanism: the action selection strategy is improved by utilizing the heuristic information of a harmonic function E(s, a), so that the unmanned aerial vehicle tends to move toward the region with the maximum target existence probability; the formula is as follows:
a = argmax_a [Q(s, a) + E(s, a)] with probability ε, and a = a_random with probability 1 − ε
where E(s, a) is the harmonic function value of the unmanned aerial vehicle executing action a in state s; ε ∈ (0, 1) is a random probability value; a_random is an action selected at random from the action set;
when the agent selects an action in state s, with probability ε it selects the action with the maximum sum of the value function and the harmonic function in that state, and with probability 1 − ε it selects an action at random; the formula is as follows:
[formula given as image BDA0003557862990000053 in the original]
where p_k represents the position of the unmanned aerial vehicle at time k; p_{p_max} represents the position with the maximum target existence probability; η represents a positive coefficient for adjusting the magnitude of the harmonic function E(s, a).
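The following sketch shows one way steps S301 to S303 could be realized in code; the environment interface, the Watkins-style TD target and the harmonic table E are assumptions (the patent gives the corresponding formulas as images), and the numerical parameters are taken from embodiment 1:
import numpy as np

def heuristic_q_lambda(env, E, n_states, n_actions,
                       alpha=0.01, gamma=0.9, lam=0.9, eps=0.6, episodes=500):
    # env.reset() -> s and env.step(s, a) -> (r, s_next, done) are assumed interfaces.
    # E[s, a] is the harmonic (heuristic) function table.
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        e = np.zeros_like(Q)                 # eligibility traces, reset each round
        s = env.reset()
        done = False
        while not done:
            if np.random.rand() < eps:       # heuristic-greedy with probability eps
                a = int(np.argmax(Q[s] + E[s]))
            else:                            # random action with probability 1 - eps
                a = np.random.randint(n_actions)
            r, s_next, done = env.step(s, a)
            delta = r + gamma * np.max(Q[s_next]) - Q[s, a]   # assumed TD target
            e[s, a] = 1.0                    # replacing trace, as in the pseudocode
            Q += alpha * delta * e           # update all state-action pairs
            e *= gamma * lam                 # decay eligibility traces
            s = s_next
    policy = np.argmax(Q, axis=1)            # pi*(s) = argmax_a Q(s, a)
    return Q, policy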
Preferably, the environment perception map update in step S4 is specifically as follows:
S401, the unmanned aerial vehicle explores the environment under the guidance of the unmanned aerial vehicle search method based on the heuristic Q-learning algorithm and enters a new exploration area;
S402, the unmanned aerial vehicle updates the target existence probability map and the uncertainty map according to its exploration results and the update rules of the environment perception map.
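As a usage sketch of the S401-S402 map update, reusing the bayes_update and decay_uncertainty helpers sketched above (the function name and its arguments are illustrative assumptions):
def update_maps_one_step(p_map, chi_map, detected, in_view,
                         p_d=0.95, p_f=0.015, tau=0.5):
    # S401-S402: after exploring a new area, refresh both perception maps.
    p_map = bayes_update(p_map, detected, in_view, p_d=p_d, p_f=p_f)
    chi_map = decay_uncertainty(chi_map, in_view, tau=tau)
    return p_map, chi_map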
Preferably, the attribute value fusion of the multi-drone environment perception map in step S5 is specifically as follows:
during flight, each distributed unmanned aerial vehicle exchanges environment perception map information with the unmanned aerial vehicles within its communication range and fuses the attribute values of the environment perception map, with the following formulas:
[formulas given as images BDA0003557862990000061 and BDA0003557862990000062 in the original]
where n represents the number of unmanned aerial vehicles (including itself) within the communication range; v_{i,k}(x, y) represents the attribute value (target existence probability value, uncertainty value, etc.) of the environment perception map of unmanned aerial vehicle i at (x, y) at time k; V_k(x, y) represents the attribute value of the environment perception map at (x, y) after fusion at time k.
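A minimal fusion sketch; plain averaging over the n maps is an assumption made for illustration, since the fusion formulas appear only as images in the original:
import numpy as np

def fuse_maps(maps):
    # maps: list of attribute-value arrays v_{i,k}(x, y) received from the n UAVs
    # within communication range, including the UAV's own map.
    return np.mean(np.stack(maps, axis=0), axis=0)   # fused V_k(x, y)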
An unmanned aerial vehicle collaborative search system based on heuristic Q-learning algorithm comprises,
the building module I is used for building an unmanned aerial vehicle dynamics model and a search environment perception map;
the establishment module II is used for establishing an unmanned aerial vehicle search reward and heuristic reward mechanism;
the planning module is used for exploring the search area by using an unmanned aerial vehicle search reward mechanism by adopting a heuristic Q-learning algorithm, planning a search path and realizing the optimal search of the unmanned aerial vehicle;
the updating module is used for updating the search environment perception map according to the search result;
and the fusion module is used for fusing the attribute values of the multi-unmanned aerial vehicle environment perception map, namely, the unmanned aerial vehicles and the unmanned aerial vehicles in the communication range perform information intercommunication, exchange and fuse the perception information of the search environment map.
An electronic device, comprising: a memory and at least one processor;
wherein the memory has stored thereon a computer program;
the at least one processor executes the computer program stored by the memory, causing the at least one processor to perform a heuristic Q-learning algorithm based unmanned aerial vehicle collaborative search method as described above.
A computer-readable storage medium having stored thereon a computer program executable by a processor to implement a method for collaborative search of unmanned aerial vehicles based on a heuristic Q-learning algorithm as described above.
The unmanned aerial vehicle collaborative search method and system based on the heuristic Q-learning algorithm have the following advantages:
firstly, the value of adjacent grids is considered; the method is data-driven, independent of derivatives, requires no prediction, and has excellent convergence and robustness; meanwhile, a heuristic reward mechanism is adopted to design reasonable rewards, and prior knowledge is utilized to guide the agent to better complete the task during operation; the optimal search path is decided online, which improves the search efficiency;
secondly, the invention provides a multi-unmanned-aerial-vehicle search scheme with strong robustness, short single-step decision time and high search efficiency, thereby guiding the unmanned aerial vehicles to complete the search task quickly;
thirdly, compared with the prior art, the single-step search time is shorter, and the search efficiency of the unmanned aerial vehicle is improved;
fourthly, the invention realizes rapid collaborative search of a fixed area by unmanned aerial vehicles; each unmanned aerial vehicle scans the environment sub-area information and self-learns the optimal search path in the environment; the method is strongly robust and improves the unmanned aerial vehicle search efficiency;
and fifthly, the preset heuristic information guides the unmanned aerial vehicle away from the boundary and toward areas with higher target existence probability more quickly, which increases the probability of finding targets, improves the unmanned aerial vehicle search efficiency, and reduces the time for the unmanned aerial vehicle to complete the task.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flow chart of a collaborative search method for an unmanned aerial vehicle based on a heuristic Q-learning algorithm;
FIG. 2 is an environmental grid map and a model diagram of an unmanned aerial vehicle dynamics;
FIG. 3 is a diagram of a reinforcement learning framework;
FIG. 4 is a block diagram of a heuristic Q-learning algorithm;
FIG. 5 is a diagram of an initial position distribution of the unmanned aerial vehicle and the search target;
fig. 6 is an unmanned aerial vehicle initial target existence probability map;
fig. 7 is an initial uncertainty map of the drone;
fig. 8 is a target existence probability map of the unmanned aerial vehicle 1 when a task is completed;
fig. 9 is a target existence probability map of the drone 2 at the completion of the task;
fig. 10 is a diagram of the target existence probability of the drone 3 at the completion of the mission.
Detailed Description
The unmanned aerial vehicle collaborative search method and system based on the heuristic Q-learning algorithm of the invention are described in detail below with reference to the drawings and specific embodiments of the specification.
Example 1:
as shown in fig. 1, this embodiment provides an unmanned aerial vehicle collaborative search method based on a heuristic Q-learning algorithm, and the method specifically includes:
S1, establishing an unmanned aerial vehicle dynamics model and a search environment perception map;
S2, establishing an unmanned aerial vehicle search reward and heuristic reward mechanism: designing an effective unmanned aerial vehicle search reward mechanism based on the unmanned aerial vehicle dynamics model, the environment perception map and the unmanned aerial vehicle state established in step S1;
S3, the unmanned aerial vehicle adopts the heuristic Q-learning algorithm, explores the search area by using the unmanned aerial vehicle search reward mechanism established in step S2, plans the search path and realizes the optimal search of the unmanned aerial vehicle;
S4, the unmanned aerial vehicle updates the search environment perception map according to the search result of step S3;
S5, fusing attribute values of the multi-unmanned-aerial-vehicle environment perception maps: each unmanned aerial vehicle communicates with the unmanned aerial vehicles within its communication range, exchanging and fusing the perception information of the search environment map;
and S6, repeating steps S3-S5 until all targets are searched.
The environment search map establishment in step S1 in this embodiment is specifically as follows:
rasterizing the search area into an N × M discretized grid map;
the length and width of each grid cell are dx and dy respectively; dx and dy are the distance the unmanned aerial vehicle flies at its average level flight speed within one planning period.
The unmanned aerial vehicle is a fixed-wing unmanned aerial vehicle constrained by its dynamics; within each planning period it can turn left 45 degrees, go straight, or turn right 45 degrees, as shown in FIG. 2.
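An illustrative kinematic sketch of this three-action motion model, assuming the heading is quantized to 45-degree increments and the vehicle advances roughly one grid cell per planning period (the action codes and function name are assumptions):
import numpy as np

LEFT_45, STRAIGHT, RIGHT_45 = 0, 1, 2    # illustrative action codes

def step_uav(x, y, heading_deg, action, dx, dy):
    # Turn left 45 deg, keep straight, or turn right 45 deg, then advance.
    if action == LEFT_45:
        heading_deg = (heading_deg + 45) % 360
    elif action == RIGHT_45:
        heading_deg = (heading_deg - 45) % 360
    x += dx * np.cos(np.radians(heading_deg))
    y += dy * np.sin(np.radians(heading_deg))
    return x, y, heading_deg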
The environment search map in step S1 of the present embodiment is divided into a target existence probability map and an uncertainty map;
as shown in fig. 6, the establishment of the target existence probability map model is specifically as follows:
(1) defining p(x, y, k) ∈ (0, 1) to represent the probability that a target is located at (x, y) at time k; p(x, y, 0) = 0.5 indicates that at the initial time the target is equally likely to exist or not exist at (x, y); p(x, y, 0) is given as follows:
[formula given as image BDA0003557862990000091 in the original]
where v_n represents the peak width of the target existence probability; c_n represents the peak value of the target existence probability; (x_n, y_n) represents a location where a target may appear;
(2) the unmanned aerial vehicle has a detection probability p_d and a false alarm probability p_f in the detection process, defined as:
p(Z(x, y, k) = 1 | δ(x, y, k) = 1) = p_d
p(Z(x, y, k) = 1 | δ(x, y, k) = 0) = p_f
where Z(x, y, k) represents the target detection result of the unmanned aerial vehicle at (x, y) at time k; δ(x, y, k) indicates whether a target is present at (x, y) at time k;
(3) as the unmanned aerial vehicle searches the area, the attribute values of the target existence probability map are continuously updated according to Bayes' theorem; the update formula of the target existence probability map is as follows:
[formula given as image BDA0003557862990000092 in the original]
where Φ_k represents the detection range of the unmanned aerial vehicle at time k;
as shown in fig. 7, the uncertainty map model is built as follows:
(1) defining χ(x, y, k) as the uncertainty information at time k, and initializing the uncertainty map with the following formula:
χ(x,y,0)=-p(x,y,0)log2 p(x,y,0);
(2) the uncertainty map attribute value decays exponentially as the number of unmanned aerial vehicle visits increases, with the following formula:
χ(x, y, k) = τ·χ(x, y, k)
where τ ∈ (0, 1) represents the decay factor.
As shown in FIG. 3, the unmanned aerial vehicle search reward and heuristic reward mechanism in step S2 of this embodiment is specifically as follows:
S201, the search reward r consists of four parts r_A, r_B, r_C and r_D, with the following formula:
r = w_1·r_A + w_2·r_B + w_3·r_C + w_4·r_D
where r_A denotes the target existence probability reward; r_B denotes the uncertainty reward; r_C denotes the flight cost reward; r_D denotes the expected search reward;
S202, the target existence probability reward r_A is set as follows:
r_A = (1 − σ(x, y, k))·p(x, y, k);
σ(x, y, k) = 1 if p(x, y, k) > p_max, and σ(x, y, k) = 0 otherwise;
where σ(x, y, k) indicates whether the unmanned aerial vehicle considers a target to be present at (x, y); when p(x, y, k) > p_max, the unmanned aerial vehicle considers that a target exists at (x, y);
S203, the uncertainty reward r_B is set as follows:
r_B = χ(x, y, k+1) − χ(x, y, k);
S204, the flight cost reward r_C is set as follows:
[formula given as image BDA0003557862990000102 in the original]
S205, the expected search reward r_D is set as follows:
[formula given as image BDA0003557862990000103 in the original]
S206, setting a heuristic reward mechanism: the heuristic reward attracts the unmanned aerial vehicle to move toward the location with the maximum global target existence probability and to stay away from the map boundary;
S207, the heuristic reward and the search reward jointly form the reward mechanism of heuristic reinforcement learning, with the following formula:
[formula given as image BDA0003557862990000104 in the original]
where D and F represent heuristic factors: D represents the distance from any map position to the position with the maximum global target existence probability, and F represents the minimum distance from the unmanned aerial vehicle position on the map to the four map edges; c and d represent adjustment coefficients.
As shown in fig. 4, the heuristic Q-learning algorithm in step S3 of the present embodiment is as follows:
S301, building a Q table, initializing Q(s, a) to arbitrary values, and setting the eligibility trace e(s, a) to 0;
S302, repeat:
initialize the unmanned aerial vehicle state s;
repeat (for each step within the round):
select action a according to the learned Q table and the heuristic action selection strategy
take action a to obtain the heuristic reward r and the next state s′
δ ← r + γ·max_{a′} Q(s′, a′) − Q(s, a)
e(s,a)←1
s←s′
where e(s, a) is the eligibility trace of the unmanned aerial vehicle taking action a in state s; γ is the discount factor; δ is the increment of Q.
For all states s and actions a, update the value function and eligibility trace of the unmanned aerial vehicle:
Q(s,a)←Q(s,a)+αδe(s,a)
e(s,a)←γλe(s,a)
where α is the learning rate; λ is the trace discount factor.
until the round step count is reached
until all Q(s, a) converge;
the key codes are as follows:
Repeat:
    Initialize s
    Repeat (for each step in the round):
        Select action a according to the Q table and the heuristic action selection strategy
        Take action a to obtain the heuristic reward r and the next state s′
        δ ← r + γ·max_{a′} Q(s′, a′) − Q(s, a)
        e(s, a) ← 1
        s ← s′
        For all states s and actions a:
            Q(s, a) ← Q(s, a) + α·δ·e(s, a)
            e(s, a) ← γ·λ·e(s, a)
    Until the round steps end
Until all Q(s, a) converge;
S303, outputting the final policy, with the following formula:
π*(s) = argmax_a Q(s, a)
where π* is the optimal policy of the unmanned aerial vehicle; Q(s, a) is the action-value of the unmanned aerial vehicle executing action a in state s;
setting a heuristic action selection mechanism: the action selection strategy is improved by utilizing the heuristic information of a harmonic function E(s, a), so that the unmanned aerial vehicle tends to move toward the region with the maximum target existence probability; the formula is as follows:
a = argmax_a [Q(s, a) + E(s, a)] with probability ε, and a = a_random with probability 1 − ε
where E(s, a) is the harmonic function value of the unmanned aerial vehicle executing action a in state s; ε ∈ (0, 1) is a random probability value; a_random is an action selected at random from the action set;
when the agent selects an action in state s, with probability ε it selects the action with the maximum sum of the value function and the harmonic function in that state, and with probability 1 − ε it selects an action at random; the formula is as follows:
[formula given as image BDA0003557862990000124 in the original]
where p_k represents the position of the unmanned aerial vehicle at time k; p_{p_max} represents the position with the maximum target existence probability; η represents a positive coefficient for adjusting the magnitude of the harmonic function E(s, a).
The environment perception map update in step S4 of this embodiment is specifically as follows:
S401, the unmanned aerial vehicle explores the environment under the guidance of the unmanned aerial vehicle search method based on the heuristic Q-learning algorithm and enters a new exploration area;
S402, the unmanned aerial vehicle updates the target existence probability map and the uncertainty map according to its exploration results and the update rules of the environment perception map.
In this embodiment, the attribute value fusion of the multi-drone environment perception map in step S5 is specifically as follows:
during flight, each distributed unmanned aerial vehicle exchanges environment perception map information with the unmanned aerial vehicles within its communication range and fuses the attribute values of the environment perception map, with the following formulas:
[formulas given as images BDA0003557862990000131 and BDA0003557862990000132 in the original]
where n represents the number of unmanned aerial vehicles (including itself) within the communication range; v_{i,k}(x, y) represents the attribute value (target existence probability value, uncertainty value, etc.) of the environment perception map of unmanned aerial vehicle i at (x, y) at time k; V_k(x, y) represents the attribute value of the environment perception map at (x, y) after fusion at time k.
In order to verify the search effect of the heuristic Q-learning algorithm-based unmanned aerial vehicle search method, Matlab simulation is carried out on the method, and the feasibility and the search efficiency of the method are verified, which are specifically as follows:
(1) taking a rectangular search area with the search area of 2km by 2km, and dividing the search area into 20 by 20 square grids with the grid size of 100m by 100 m;
(2) the initial positions of the 3 unmanned aerial vehicles are [450,250], [1750,1250], [950, 450], and the initial positions of the 3 targets are [1450, 250], [800,1500], [1300,700 ];
(3) the unmanned aerial vehicle and the target initial position are distributed as shown in figure 5, the detection range of the unmanned aerial vehicle is 300m x 300m, the detection period is set to be 4s, the detection probability is 0.95, and the false alarm probability is 0.015;
(4) in the heuristic Q-learning algorithm, the discount factor gamma is 0.9, the learning rate alpha is 0.01, the greedy selection coefficient epsilon is 0.6, and the trace discount factor lambda is 0.9; uncertainty map decay index τ is 0.5.
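For convenience, the simulation settings of this embodiment can be collected in a single configuration structure; this is only an illustrative grouping, and the key names are assumptions:
sim_config = {
    "area_size_m": (2000, 2000),            # 2 km x 2 km rectangular search area
    "grid": (20, 20),                       # 20 x 20 cells of 100 m x 100 m
    "uav_start": [(450, 250), (1750, 1250), (950, 450)],
    "target_positions": [(1450, 250), (800, 1500), (1300, 700)],
    "sensor_footprint_m": (300, 300),
    "detection_period_s": 4,
    "p_detect": 0.95,
    "p_false_alarm": 0.015,
    "gamma": 0.9,                           # discount factor
    "alpha": 0.01,                          # learning rate
    "epsilon": 0.6,                         # greedy selection coefficient
    "lambda": 0.9,                          # trace discount factor
    "tau": 0.5,                             # uncertainty map decay factor
}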
The simulation results show that after 164 s the unmanned aerial vehicles successfully search all targets in the area. FIGS. 8, 9 and 10 show the target existence probability maps of unmanned aerial vehicles 1, 2 and 3 after the task is finished. The unmanned aerial vehicles collaboratively cover most of the area to be searched and complete the coverage search of the area. The unmanned aerial vehicle collaborative search method based on the heuristic Q-learning algorithm can effectively guide the unmanned aerial vehicles to complete the search task, has good convergence and robustness, and improves the unmanned aerial vehicle search efficiency.
Example 2:
the embodiment provides an unmanned aerial vehicle collaborative search system based on heuristic Q-learning algorithm, which comprises,
the building module I is used for building an unmanned aerial vehicle dynamics model and a search environment perception map;
the establishment module II is used for establishing an unmanned aerial vehicle search reward and heuristic reward mechanism;
the planning module is used for exploring the search area by using an unmanned aerial vehicle search reward mechanism by adopting a heuristic Q-learning algorithm, planning a search path and realizing the optimal search of the unmanned aerial vehicle;
the updating module is used for updating the search environment perception map according to the search result;
and the fusion module is used for fusing the attribute values of the multi-unmanned aerial vehicle environment perception map, namely, the unmanned aerial vehicle and the unmanned aerial vehicle in a communication range perform information intercommunication, exchange and fuse the perception information of the search environment map.
Example 3:
the present embodiment also provides an electronic device, including: a memory and a processor;
wherein the memory stores computer execution instructions;
the processor executes the computer execution instructions stored in the memory, so that the processor executes the unmanned aerial vehicle collaborative search method based on the heuristic Q-learning algorithm in any embodiment of the invention.
Example 4:
the embodiment also provides a computer-readable storage medium, wherein a plurality of instructions are stored, and the instructions are loaded by the processor, so that the processor executes the unmanned aerial vehicle collaborative search method based on the heuristic Q-learning algorithm in any embodiment of the invention. Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An unmanned aerial vehicle collaborative search method based on heuristic Q-learning algorithm is characterized by comprising the following steps:
S1, establishing an unmanned aerial vehicle dynamics model and a search environment perception map;
S2, establishing an unmanned aerial vehicle search reward and heuristic reward mechanism: designing an effective unmanned aerial vehicle search reward mechanism based on the unmanned aerial vehicle dynamics model, the environment perception map and the unmanned aerial vehicle state established in step S1;
S3, the unmanned aerial vehicle adopts the heuristic Q-learning algorithm, explores the search area by using the unmanned aerial vehicle search reward mechanism established in step S2, plans the search path and realizes the optimal search of the unmanned aerial vehicle;
S4, the unmanned aerial vehicle updates the search environment perception map according to the search result of step S3;
S5, fusing attribute values of the multi-unmanned-aerial-vehicle environment perception maps: each unmanned aerial vehicle communicates with the unmanned aerial vehicles within its communication range, exchanging and fusing the perception information of the search environment map;
S6, repeating steps S3-S5 until all targets are searched.
2. The collaborative unmanned aerial vehicle searching method based on the heuristic Q-learning algorithm of claim 1, wherein the environment search map establishment in step S1 is specifically as follows:
rasterizing the search area into an N × M discretized grid map;
the length and width of each grid cell are dx and dy respectively; dx and dy are the distance the unmanned aerial vehicle flies at its average level flight speed within one planning period.
3. The collaborative unmanned aerial vehicle searching method based on the heuristic Q-learning algorithm of claim 1, wherein the environment search map in step S1 is divided into a target existence probability map and an uncertainty map;
the establishment of the target existence probability map model is as follows:
(1) defining p(x, y, k) ∈ (0, 1) to represent the probability that a target is located at (x, y) at time k; p(x, y, 0) = 0.5 indicates that at the initial time the target is equally likely to exist or not exist at (x, y); p(x, y, 0) is given as follows:
[formula given as image FDA0003557862980000021 in the original]
where v_n represents the peak width of the target existence probability; c_n represents the peak value of the target existence probability; (x_n, y_n) represents a location where a target may appear;
(2) the unmanned aerial vehicle has a detection probability p_d and a false alarm probability p_f in the detection process, defined as:
p(Z(x, y, k) = 1 | δ(x, y, k) = 1) = p_d
p(Z(x, y, k) = 1 | δ(x, y, k) = 0) = p_f
where Z(x, y, k) represents the target detection result of the unmanned aerial vehicle at (x, y) at time k; δ(x, y, k) indicates whether a target is present at (x, y) at time k;
(3) as the unmanned aerial vehicle searches the area, the attribute values of the target existence probability map are continuously updated according to Bayes' theorem; the update formula of the target existence probability map is as follows:
[formula given as image FDA0003557862980000022 in the original]
where Φ_k represents the detection range of the unmanned aerial vehicle at time k;
the uncertainty map model is established as follows:
(1) defining χ(x, y, k) as the uncertainty information at time k, and initializing the uncertainty map with the following formula:
χ(x,y,0)=-p(x,y,0)log2p(x,y,0);
(2) the uncertainty map attribute value decays exponentially as the number of unmanned aerial vehicle visits increases, with the following formula:
χ(x, y, k) = τ·χ(x, y, k)
where τ ∈ (0, 1) represents the decay factor.
4. The collaborative unmanned aerial vehicle searching method based on the heuristic Q-learning algorithm of claim 1, wherein the unmanned aerial vehicle search reward and heuristic reward mechanism in step S2 is specifically as follows:
S201, the search reward r consists of four parts r_A, r_B, r_C and r_D, with the following formula:
r = w_1·r_A + w_2·r_B + w_3·r_C + w_4·r_D
where r_A denotes the target existence probability reward; r_B denotes the uncertainty reward; r_C denotes the flight cost reward; r_D denotes the expected search reward;
S202, the target existence probability reward r_A is set as follows:
r_A = (1 − σ(x, y, k))·p(x, y, k);
σ(x, y, k) = 1 if p(x, y, k) > p_max, and σ(x, y, k) = 0 otherwise;
where σ(x, y, k) indicates whether the unmanned aerial vehicle considers a target to be present at (x, y); when p(x, y, k) > p_max, the unmanned aerial vehicle considers that a target exists at (x, y);
S203, the uncertainty reward r_B is set as follows:
r_B = χ(x, y, k+1) − χ(x, y, k);
S204, the flight cost reward r_C is set as follows:
[formula given as image FDA0003557862980000032 in the original]
S205, the expected search reward r_D is set as follows:
[formula given as image FDA0003557862980000033 in the original]
S206, setting a heuristic reward mechanism: the heuristic reward attracts the unmanned aerial vehicle to move toward the location with the maximum global target existence probability and to stay away from the map boundary;
S207, the heuristic reward and the search reward jointly form the reward mechanism of heuristic reinforcement learning, with the following formula:
[formula given as image FDA0003557862980000041 in the original]
where D and F represent heuristic factors: D represents the distance from any map position to the position with the maximum global target existence probability, and F represents the minimum distance from the unmanned aerial vehicle position to the four map edges; c and d represent adjustment coefficients.
5. The collaborative search method for unmanned aerial vehicles based on heuristic Q-learning algorithm as claimed in claim 1, wherein the heuristic Q-learning algorithm in step S3 is as follows:
S301, building a Q table, initializing Q(s, a) to arbitrary values, and setting the eligibility trace e(s, a) to 0;
S302, repeat:
initialize the unmanned aerial vehicle state s;
repeat (for each step within the round):
select action a according to the learned Q table and the heuristic action selection strategy
take action a to obtain the heuristic reward r and the next state s′
δ ← r + γ·max_{a′} Q(s′, a′) − Q(s, a)
e(s,a)←1
s←s′
where e(s, a) is the eligibility trace of the unmanned aerial vehicle taking action a in state s; γ is the discount factor; δ is the increment of Q.
For all states s and actions a, update the value function and eligibility trace of the unmanned aerial vehicle:
Q(s,a)←Q(s,a)+αδe(s,a)
e(s,a)←γλe(s,a)
where α is the learning rate; λ is the trace discount factor.
until the round step count is reached
until all Q(s, a) converge;
S303, outputting the final policy, with the following formula:
π*(s) = argmax_a Q(s, a)
where π* is the optimal policy of the unmanned aerial vehicle; Q(s, a) is the action-value of the unmanned aerial vehicle executing action a in state s;
setting a heuristic action selection mechanism: the action selection strategy is improved by utilizing the heuristic information of a harmonic function E(s, a), so that the unmanned aerial vehicle tends to move toward the region with the maximum target existence probability; the formula is as follows:
a = argmax_a [Q(s, a) + E(s, a)] with probability ε, and a = a_random with probability 1 − ε
where E(s, a) is the harmonic function value of the unmanned aerial vehicle executing action a in state s; ε ∈ (0, 1) is a random probability value; a_random is an action selected at random from the action set;
when the agent selects an action in state s, with probability ε it selects the action with the maximum sum of the value function and the harmonic function in that state, and with probability 1 − ε it selects an action at random; the formula is as follows:
[formula given as image FDA0003557862980000053 in the original]
where p_k represents the position of the unmanned aerial vehicle at time k; p_{p_max} represents the position with the maximum target existence probability; η represents a positive coefficient for adjusting the magnitude of the harmonic function E(s, a).
6. The collaborative unmanned aerial vehicle searching method based on the heuristic Q-learning algorithm of claim 1, wherein the environment-aware map update in step S4 is specifically as follows:
S401, the unmanned aerial vehicle explores the environment under the guidance of the unmanned aerial vehicle search method based on the heuristic Q-learning algorithm and enters a new exploration area;
S402, the unmanned aerial vehicle updates the target existence probability map and the uncertainty map according to its exploration results and the update rules of the environment perception map.
7. The collaborative search method for unmanned aerial vehicles based on heuristic Q-learning algorithm of any of claims 1-6, wherein the attribute value fusion of the multi-unmanned aerial vehicle environment perception map in step S5 is specifically as follows:
during flight, each distributed unmanned aerial vehicle exchanges environment perception map information with the unmanned aerial vehicles within its communication range and fuses the attribute values of the environment perception map, with the following formulas:
[formulas given as images FDA0003557862980000061 and FDA0003557862980000062 in the original]
where n represents the number of unmanned aerial vehicles within the communication range; v_{i,k}(x, y) represents the attribute value of the environment perception map of unmanned aerial vehicle i at (x, y) at time k; V_k(x, y) represents the attribute value of the environment perception map at (x, y) after fusion at time k.
8. An unmanned aerial vehicle collaborative search system based on heuristic Q-learning algorithm is characterized by comprising,
the building module I is used for building an unmanned aerial vehicle dynamics model and a search environment perception map;
the establishment module II is used for establishing an unmanned aerial vehicle search reward and heuristic reward mechanism;
the planning module is used for exploring the search area by using a heuristic Q-learning algorithm and an unmanned aerial vehicle search reward mechanism, planning a search path and realizing optimal search of the unmanned aerial vehicle;
the updating module is used for updating the search environment perception map according to the search result;
and the fusion module is used for fusing the attribute values of the multi-unmanned aerial vehicle environment perception map, namely, the unmanned aerial vehicle and the unmanned aerial vehicle in a communication range perform information intercommunication, exchange and fuse the perception information of the search environment map.
9. An electronic device, comprising: a memory and at least one processor;
wherein the memory has stored thereon a computer program;
the at least one processor executing the memory-stored computer program causes the at least one processor to perform the unmanned aerial vehicle collaborative search method based on the heuristic Q-learning algorithm of any of claims 1 to 7.
10. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program is executable by a processor to implement the unmanned aerial vehicle collaborative search method based on the heuristic Q-learning algorithm according to any one of claims 1 to 7.
CN202210281173.XA 2022-03-22 2022-03-22 Unmanned aerial vehicle collaborative search method and system based on heuristic Q-learning algorithm Pending CN114625167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210281173.XA CN114625167A (en) 2022-03-22 2022-03-22 Unmanned aerial vehicle collaborative search method and system based on heuristic Q-learning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210281173.XA CN114625167A (en) 2022-03-22 2022-03-22 Unmanned aerial vehicle collaborative search method and system based on heuristic Q-learning algorithm

Publications (1)

Publication Number Publication Date
CN114625167A true CN114625167A (en) 2022-06-14

Family

ID=81904453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210281173.XA Pending CN114625167A (en) 2022-03-22 2022-03-22 Unmanned aerial vehicle collaborative search method and system based on heuristic Q-learning algorithm

Country Status (1)

Country Link
CN (1) CN114625167A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117472083A (en) * 2023-12-27 2024-01-30 南京邮电大学 Multi-unmanned aerial vehicle collaborative marine search path planning method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171315A (en) * 2017-12-27 2018-06-15 南京邮电大学 Multiple no-manned plane method for allocating tasks based on SMC particle cluster algorithms
CN110196605A (en) * 2019-04-26 2019-09-03 大连海事大学 A kind of more dynamic object methods of the unmanned aerial vehicle group of intensified learning collaboratively searching in unknown sea area
CN111880565A (en) * 2020-07-22 2020-11-03 电子科技大学 Q-Learning-based cluster cooperative countermeasure method
CN111896006A (en) * 2020-08-11 2020-11-06 燕山大学 Path planning method and system based on reinforcement learning and heuristic search
CN112817327A (en) * 2020-12-30 2021-05-18 北京航空航天大学 Multi-unmanned aerial vehicle collaborative search method under communication constraint
CN113110463A (en) * 2021-04-22 2021-07-13 山东新一代信息产业技术研究院有限公司 Delivery service robot system based on intelligent disinfection cabin

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171315A (en) * 2017-12-27 2018-06-15 南京邮电大学 Multiple no-manned plane method for allocating tasks based on SMC particle cluster algorithms
CN110196605A (en) * 2019-04-26 2019-09-03 大连海事大学 A kind of more dynamic object methods of the unmanned aerial vehicle group of intensified learning collaboratively searching in unknown sea area
CN111880565A (en) * 2020-07-22 2020-11-03 电子科技大学 Q-Learning-based cluster cooperative countermeasure method
CN111896006A (en) * 2020-08-11 2020-11-06 燕山大学 Path planning method and system based on reinforcement learning and heuristic search
CN112817327A (en) * 2020-12-30 2021-05-18 北京航空航天大学 Multi-unmanned aerial vehicle collaborative search method under communication constraint
CN113110463A (en) * 2021-04-22 2021-07-13 山东新一代信息产业技术研究院有限公司 Delivery service robot system based on intelligent disinfection cabin

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
NA XIA, et al.: "Optimization algorithms in wireless monitoring networks: A survey", NEUROCOMPUTING, 31 December 2022 (2022-12-31), pages 584 *
FANG Lijin, et al.: "Multi-scenario motion planning of a manipulator based on an improved RRT*FN algorithm", China Mechanical Engineering, vol. 32, no. 21, 30 November 2021 (2021-11-30), pages 2590-2597 *
FANG Min, LI Hao: "Heuristic Q-learning based on state backtracking cost analysis", Pattern Recognition and Artificial Intelligence, vol. 26, no. 9, 30 September 2013 (2013-09-30), pages 838-844 *
CHENG Chuanbin, et al.: "Improved dynamic A*-Q-Learning algorithm and its application in UAV trajectory planning", Modern Information Technology, vol. 5, no. 9, 10 May 2021 (2021-05-10), pages 1-6 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117472083A (en) * 2023-12-27 2024-01-30 南京邮电大学 Multi-unmanned aerial vehicle collaborative marine search path planning method
CN117472083B (en) * 2023-12-27 2024-02-23 南京邮电大学 Multi-unmanned aerial vehicle collaborative marine search path planning method

Similar Documents

Publication Publication Date Title
Faust et al. Prm-rl: Long-range robotic navigation tasks by combining reinforcement learning and sampling-based planning
CN107169608B (en) Distribution method and device for multiple unmanned aerial vehicles to execute multiple tasks
WO2022007179A1 (en) Multi-agv motion planning method, apparatus, and system
CN109597425B (en) Unmanned aerial vehicle navigation and obstacle avoidance method based on reinforcement learning
CN107103164B (en) Distribution method and device for unmanned aerial vehicle to execute multiple tasks
CN110926477A (en) Unmanned aerial vehicle route planning and obstacle avoidance method
CN112859912A (en) Adaptive optimization method and system for unmanned aerial vehicle path planning in relay charging mode
CN114740846A (en) Hierarchical path planning method for topology-grid-metric hybrid map
Sun et al. A cooperative target search method based on intelligent water drops algorithm
CN114261400A (en) Automatic driving decision-making method, device, equipment and storage medium
US20240239484A1 (en) Fast path planning for dynamic avoidance in partially known environments
CN114879716B (en) Law enforcement unmanned aerial vehicle path planning method for countering low-altitude airspace aircraft
CN114625167A (en) Unmanned aerial vehicle collaborative search method and system based on heuristic Q-learning algorithm
CN110647162B (en) Route planning method for tour guide unmanned aerial vehicle, terminal equipment and storage medium
CN116203990A (en) Unmanned plane path planning method and system based on gradient descent method
CN116772846A (en) Unmanned aerial vehicle track planning method, unmanned aerial vehicle track planning device, unmanned aerial vehicle track planning equipment and unmanned aerial vehicle track planning medium
CN114840016B (en) Multi-ant colony search submarine target cooperative path optimization method based on rule heuristic method
Qiu et al. Obstacle avoidance planning combining reinforcement learning and RRT* applied to underwater operations
CN116578080A (en) Local path planning method based on deep reinforcement learning
CN116048126A (en) ABC rapid convergence-based unmanned aerial vehicle real-time path planning method
CN115202359A (en) Unmanned ship path planning method based on reinforcement learning and rapid expansion of random tree
CN114637331A (en) Unmanned aerial vehicle multi-task path planning method and system based on ant colony algorithm
Song et al. UAV Path Planning Based on an Improved Ant Colony Algorithm
Yao et al. Path Planning of Unmanned Helicopter in Complex Environment Based on Heuristic Deep Q‐Network
CN117032247B (en) Marine rescue search path planning method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination