CN114997617B

CN114997617B - Multi-unmanned platform multi-target combined detection task allocation method and system

Info

Publication number: CN114997617B
Application number: CN202210566512.9A
Authority: CN
Inventors: 杨卫东; 王棋; 钟胜; 颜露新; 邹旭; 王伟
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2022-05-23
Filing date: 2022-05-23
Publication date: 2024-06-07
Anticipated expiration: 2042-05-23
Also published as: CN114997617A

Abstract

The invention discloses a multi-unmanned platform multi-target combined detection task distribution method and a system, wherein the method comprises the following steps: acquiring a target existence probability map of a task area, and carrying out data clustering by using a Gaussian mixture model to divide the task area into a plurality of key subregions; obtaining the area, position and other information of each subarea by utilizing the dividing result; designing a reward function according to the area of each area and the fight capability of the two sides of the friend or foe such as defensive power, attack power and the like; and designing a dynamic update strategy on the basis of the reinforcement learning DDQN network, and distributing task areas for each unmanned platform. The task allocation method provided by the invention converts the allocation of the multi-unmanned platform to the multi-target into the allocation of the multi-unmanned platform to the sub-task area, so that the task scale is reduced, and the complexity is reduced; and comprehensively considering factors such as search time, attack and damage proportion, task distribution uniformity degree and the like, distributing tasks for each unmanned platform, ensuring self safety and ensuring that the unmanned platform can rapidly detect enemy targets.

Description

Multi-unmanned platform multi-target combined detection task allocation method and system

Technical Field

The invention belongs to the technical field of multi-agent task planning, and particularly relates to a multi-unmanned platform multi-target combined detection task distribution method and system.

Background

The multi-unmanned platform collaboration system is superior to a single unmanned platform (e.g., unmanned ship, etc.) in terms of efficiency, performance, robustness, etc. By utilizing the cooperation of multiple unmanned platforms, the method can integrate the search and the enclosure of the enemy targets, improve the task execution efficiency and maximize the income, and has important significance for the maintenance of rights and interests such as the territory, the territory and the like in China. The distribution of enemy targets in performing a probing task tends to be diffuse and uneven, which can reduce the collaborative performance of the unmanned platform. Therefore, it is very important to perform a reasonable task planning before executing the task, so that the advantages and resources of each unmanned platform can be effectively utilized, the system efficiency is maximized, and the task execution efficiency is improved.

The task allocation of the multi-unmanned platform cooperative system is a multi-objective optimization problem, and not only the attack and damage ratio of the two parties of the friend and the foe but also factors of various aspects such as search efficiency, resource consumption and the like are considered. The complexity of the multi-objective optimization problem increases exponentially as the task size becomes larger, and the actual task scenario is complex and variable, and task allocation by the conventional method often needs to be solved from scratch, which can take a lot of time and resources.

Disclosure of Invention

Aiming at the defects and improvement demands of the prior art, the invention discloses a multi-unmanned platform multi-target combined detection task distribution method and a system, and aims to solve the technical problems of high solving complexity and large calculation amount of the existing task distribution method.

In order to achieve the above purpose, the invention provides a multi-unmanned platform multi-target combined detection task allocation method, which comprises the following steps:

s1, acquiring a target existence probability map of a task area, and fitting the target existence probability map by using m Gaussian models, so that the task area is divided into m subtask areas;

S2, obtaining an optimal task allocation result by maximizing accumulated rewards based on a reinforcement learning DDQN network; for any subtask region, the bonus function R _t is expressed as:

Wherein L is a positive number, f _r represents decision correctness, when the number of various unmanned platforms dispatched by the decision exceeds the actual remaining number, or the total number of unmanned platforms dispatched by the current subtask area is 0, the decision error f _r =0, otherwise, the decision correctness f _r＝1;f_a represents attack benefit, f _l represents loss amount, f _s represents search benefit, and w ₁、w₂、w₃、w₄ represents weight coefficient;

the attack benefit f _a is calculated as follows:

Wherein, For the enemy target value of the jth subtask area, j E [1, m ], n is the class number of the unmanned platform,/>For the attack success rate of the ith unmanned platform to the enemy target in the jth subtask area, k _ij is the number of the ith unmanned platforms distributed in the jth subtask area;

The loss f _l is calculated as follows:

Wherein, For the value of class i unmanned platform,/>The attack success rate of an enemy target in the jth subtask area to the ith unmanned platform is achieved;

the search benefit f _s is calculated as follows:

Wherein spow _i is the searching capability of the i-th unmanned platform, S _j is the area of the j-th subtask region, R _j is the distance between the j-th subtask region and the starting point of the unmanned platform, and w ₅ represents the weight coefficient.

Further, dynamically changing DDQN a parameter update interval of a target value network of the network according to rewards obtained by the agent;

the parameter update interval is calculated by the following steps:

Wherein, N _t is the update interval of the time t, step _t is the iteration times of the time t, r _t is the reward obtained at the time t, and r _th is the reward threshold; η is a weight coefficient greater than 0 and equal to or less than 1.

Further, the step S1 includes:

Acquiring a target existence probability map of a task area; roughly dividing a task area by adopting a K-Means method, and obtaining an initial value of a Gaussian mixture model parameter; and estimating each parameter by using a maximum expected algorithm, so as to divide the task area into m subtask areas.

Further, after dividing the m subtask areas, the area of each subtask area, the enemy target value and the distance from the enemy target value to the starting point of the unmanned platform are obtained according to the Gaussian function corresponding to each subtask area.

In another aspect of the present invention, a multi-unmanned platform multi-target joint detection task allocation system is provided, including:

The region dividing module is used for acquiring a target existence probability map of a task region, and fitting the target existence probability map by using m Gaussian models so as to divide the task region into m subtask regions;

the task allocation module is used for obtaining an optimal task allocation result by maximizing accumulated rewards based on the reinforcement learning DDQN network; for any subtask region, the bonus function R _t is expressed as:

the attack benefit f _a is calculated as follows:

The loss f _l is calculated as follows:

the search benefit f _s is calculated as follows:

the parameter update interval is calculated by the following steps:

Further, the area dividing module is specific to acquiring a target existence probability map of the task area; roughly dividing a task area by adopting a K-Means method, and obtaining an initial value of a Gaussian mixture model parameter; and estimating each parameter by using a maximum expected algorithm, so as to divide the task area into m subtask areas.

In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:

(1) According to the invention, the known task area target existence probability map is fitted and clustered, the task area is divided into a plurality of subtask areas containing a plurality of targets, the task distribution problem of the multi-unmanned platform to the multi-enemy targets is converted into the task distribution problem of the multi-unmanned platform to the multi-subtask areas, the task scale is reduced, the calculated amount is reduced, and the solving complexity is reduced. The rewarding function is designed by comprehensively considering various factors such as the attack and defense attributes, the searching capability and the like, so that the intelligent agent can minimize own loss and maximize attack benefits while efficiently completing tasks.

(2) According to the invention, an intelligent updating strategy is designed on the basis of the original DDQN network, the parameter updating interval of the target value network is dynamically changed according to the real-time rewards, the stability of the network is improved, and the task allocation strategy with higher rewards is learned by an intelligent agent.

Drawings

FIG. 1 is a schematic flow chart of a multi-unmanned platform multi-target joint detection task allocation method according to an embodiment of the present invention;

FIG. 2 is a second schematic flow chart of a multi-unmanned platform multi-target joint detection task allocation method according to an embodiment of the present invention;

FIG. 3 is a third flow chart of a multi-unmanned platform multi-target joint detection task allocation method according to an embodiment of the present invention;

FIG. 4 is a graph of initial target existence probabilities provided by an embodiment of the present invention;

FIG. 5 is a graph of probability of existence of a target obtained by Gaussian fitting according to an embodiment of the invention;

FIG. 6 is a plot of the resulting partitions of a Gaussian fit provided by an embodiment of the invention;

FIG. 7 is a graph of average instant prize provided by an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.

Deep reinforcement learning is a combination of deep learning and reinforcement learning, and is a method for adjusting self strategies by continuously performing trial and error with environment through an agent. The dynamic planning method can cope with environmental changes and is more suitable for task allocation than the traditional method. Therefore, the invention provides a multi-unmanned platform multi-target joint detection task distribution method and system based on deep reinforcement learning, different profit functions are designed aiming at the movement characteristics of the unmanned platform, the searching efficiency and the attack and damage advantages are considered at the same time when the decision is made, and the high efficiency and the synergy of executing tasks are ensured.

Referring to fig. 1, in combination with fig. 2 and fig. 3, the present invention provides a multi-unmanned platform multi-target joint detection task allocation method, which includes the following steps:

s1, acquiring a target existence probability map of a task area, and fitting the target existence probability map by using m Gaussian models, so that the task area is divided into m subtask areas.

In this embodiment, an initial target existence probability map of the task area is obtained by means of reconnaissance, detection, and the like, as shown in fig. 4, and is rasterized. And setting a plurality of sampling points for the task area, and dividing the sampling points into grids according to the probability of each grid. M gaussian models G _n (x, y) (n=1, 2..m) were used to fit the target presence probability map for the task area, as shown in fig. 5. Each Gaussian model has a specific gravity of w _n andThe target existence probability of the grid (x, y) can be expressed as follows by a gaussian mixture model:

Wherein, the formula of the gaussian density function G _n (x, y) is as follows:

Where p= [ x, y ] ^T represents the center position vector of the grid (x, y). Firstly, a K-Means method is adopted to roughly divide a task area, and an initial value of a Gaussian mixture model parameter w _n、μ_n、C_n is obtained. The final partition is then obtained by estimating the various parameters using the maximum expectation algorithm, as shown in fig. 6. After the partitions are obtained, the area and the regional value of each subtask region and the distance from the starting point of the unmanned platform, namely the origin, can be obtained by the Gaussian function corresponding to each region, as shown in Table 1.

TABLE 1 enemy target information table in subareas

Project	Zone 1	Zone 2	Zone 3
				Area of	108.9372	130.0092	267.7715
Distance of	34.9	56.9	40.2
				Value of	21.44	25.56	52.99
Attack success rate	[0.7,0.7,0.8]	[0.8,0.7,0.8]	[0.8,0.7,0.9]

S2, obtaining the optimal task allocation result by maximizing the accumulated rewards based on the reinforcement learning DDQN network.

Specifically, comprehensively evaluating the detection range of the unmanned platform carrying the detector in unit time to obtain the searching capability of various unmanned platforms; and comprehensively evaluating the attack and defense capabilities of the enemy targets and various unmanned platforms on the my sides to obtain the attack success rate of the various unmanned platforms on the enemy targets in the area, as shown in table 2. And obtaining the attack success rate of the enemy targets in each area to various unmanned platforms on the my side, as shown in table 1.

Table 2 unmanned platform information table

Project	Type 1	Type 2	Type 3
				Search capability	2	1	1
Value of	20	10	30
				Quantity of	2	2	2
Attack success rate	[0.8,0.7,0.7]	[0.7,0.8,0.6]	[0.8,0.7,0.8]

For any subtask region, the bonus function R _t is expressed as:

Wherein L is a positive number, in this embodiment, l=100; f _r represents decision correctness, when the number of various unmanned platforms dispatched by the decision exceeds the actual remaining number, or the total number of unmanned platforms dispatched by the current subtask area is 0, the decision error f _r =0, otherwise, the decision correctness f _r＝1;f_a represents attack benefit, f _l represents loss amount, f _s represents search benefit, and w ₁、w₂、w₃、w₄ represents weight coefficient;

the attack benefit f _a is calculated as follows:

The loss f _l is calculated as follows:

the search benefit f _s is calculated as follows:

Wherein spow _i is the searching capability of the i-th unmanned platform, S _j is the area of the j-th subtask region, R _j is the distance between the j-th subtask region and the starting point of the unmanned platform, and w ₅ represents the weight coefficient. For subtask areas with large areas or long distances, unmanned platforms with higher searching capability or more unmanned platforms are considered to be allocated.

Further, the number of the subtask area to be allocated and the number of the unmanned platforms are combined into a vector, namely a1×4 vector, which is used as the input of the network. The network calculates and outputs a vector of 1×27 dimensions, namely the value of all possible decisions in the current state, and selects the maximum value and converts the corresponding subscript and action. The actions are the number of unmanned platforms of each type dispatched by the subtask area. The network prototype is DDQN network, and the update interval of the target value network is fixed. The exploration force of the early-stage intelligent agent is large, and the network can be converged towards the direction of higher rewards by the small parameter updating interval; however, in the later stage of training, the smaller updating interval makes the parameters of the target value network updated frequently, and network convergence is difficult. Therefore, the invention provides an intelligent updating strategy, namely a strategy that a smaller parameter is given at a more interval in the early stage of network training, so that an intelligent agent can learn a higher rewarding more easily; when the instant benefit increases to a certain value, the parameter updating interval is gradually increased, so that frequent updating of the parameters is avoided, and the stability of the network is improved. The calculation formula is as follows:

Wherein, N _t is the update interval of the time t, step _t is the iteration times of the time t, r _t is the reward obtained at the time t, and r _th is the reward threshold; η is a weight coefficient greater than 0 and equal to or less than 1. In this embodiment, an initial parameter update interval N ₀ =50 is set, and the prize threshold r _th = -2, η=1.

The final distribution results are shown in Table 3, and the final profit map is shown in FIG. 7.

Table 3 distribution results table

The invention also provides a multi-unmanned platform multi-target combined detection task distribution system, which comprises:

And the task allocation module is used for obtaining the optimal task allocation result by maximizing the accumulated rewards based on the reinforcement learning DDQN network. The calculation mode of the reward function for any subtask region is the same as the multi-unmanned platform multi-target combined detection task allocation method, and is not repeated here.

It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. The multi-unmanned platform multi-target combined detection task allocation method is characterized by comprising the following steps of:

the attack benefit f _a is calculated as follows:

Wherein, For the enemy target value of the jth subtask area, j epsilon [1, m ], n is the class number of the unmanned platform,For the attack success rate of the ith unmanned platform to the enemy target in the jth subtask area, k _ij is the number of the ith unmanned platforms distributed in the jth subtask area;

The loss f _l is calculated as follows:

the search benefit f _s is calculated as follows:

2. The multi-unmanned platform multi-target joint detection task allocation method according to claim 1, wherein the parameter update interval of the target value network of the DDQN network is dynamically changed according to rewards obtained by the agent;

the parameter update interval is calculated by the following steps:

3. The multi-unmanned platform multi-target joint detection task allocation method according to claim 1 or 2, wherein the step S1 comprises:

4. The multi-unmanned platform multi-target joint detection task allocation method according to claim 3, wherein after dividing m subtask areas, the area of each subtask area, the enemy target value and the distance from the enemy target value to the starting point of the unmanned platform are obtained according to a Gaussian function corresponding to each subtask area.

5. A multi-unmanned platform multi-target joint detection task allocation system, comprising:

the attack benefit f _a is calculated as follows:

The loss f _l is calculated as follows:

the search benefit f _s is calculated as follows:

6. The multi-unmanned platform multi-target joint detection task allocation system according to claim 5, wherein the parameter update interval of the target value network of the DDQN network is dynamically changed according to rewards obtained by the agent;

the parameter update interval is calculated by the following steps:

7. The multi-unmanned platform multi-target joint detection task allocation system according to claim 5 or 6, wherein the region dividing module is specifically configured to obtain a target existence probability map of a task region; roughly dividing a task area by adopting a K-Means method, and obtaining an initial value of a Gaussian mixture model parameter; and estimating each parameter by using a maximum expected algorithm, so as to divide the task area into m subtask areas.

8. The multi-unmanned platform multi-target joint detection task allocation system according to claim 7, wherein after dividing the m subtask areas, the area of each subtask area, the enemy target value and the distance to the starting point of the unmanned platform are calculated according to the gaussian function corresponding to each subtask area.