CN111414006B - Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation - Google Patents

Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation

Info

Publication number
CN111414006B
Authority
CN
China
Prior art keywords
unmanned aerial
information
aerial vehicle
generating
value function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010232017.5A
Other languages
Chinese (zh)
Other versions
CN111414006A (en)
Inventor
王维平
周鑫
王彦锋
井田
王涛
李小波
黄美根
杨松
李童心
段婷
刘国杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202010232017.5A priority Critical patent/CN111414006B/en
Publication of CN111414006A publication Critical patent/CN111414006A/en
Application granted granted Critical
Publication of CN111414006B publication Critical patent/CN111414006B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10 Simultaneous control of position or course in three dimensions
    • G05D1/101 Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/104 Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The scheme relates to an unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation. The method comprises the following steps: acquiring environment information and generating an undirected graph according to the environment information; generating an information state transition model according to the environment information and the undirected graph, and acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model; generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information, the global value function being used for calculating the execution strategies of the unmanned aerial vehicles; and acquiring a planning algorithm, and calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function. Because the undirected graph consists of vertices and edges, the unmanned aerial vehicles execute tasks at designated vertices or within the motion boundaries and can collect more valuable information; the global value function is generated through the information state transition model, and the target execution strategies are calculated according to the planning algorithm to obtain the reconnaissance route of each unmanned aerial vehicle, thereby improving the accuracy of unmanned aerial vehicle cluster task planning.

Description

Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation
Technical Field
The invention relates to the technical field of unmanned aerial vehicle task planning, in particular to an unmanned aerial vehicle cluster reconnaissance task planning method, system, computer equipment and storage medium based on distributed sequential allocation.
Background
With the continuous development of unmanned aerial vehicle technology, unmanned aerial vehicles play an increasingly important role in both the civil and military fields. An unmanned aerial vehicle cluster is a typical multi-agent system that can be controlled autonomously or remotely to perform tasks without pilots. Compared with manned aircraft, unmanned aerial vehicles have outstanding advantages in performing dull, dirty and dangerous tasks, and they also have the characteristics of low cost, small volume and strong survivability; these characteristics give unmanned aerial vehicles broad prospects in emergency rescue. With deepening practical application, unmanned aerial vehicle emergency rescue is developing towards clustering and specialization, and the rescue tasks undertaken are becoming increasingly difficult and complex. Multi-unmanned-aerial-vehicle autonomous cooperative control structures are generally classified into two types: centralized control architectures and distributed control architectures. The centralized control method has the advantage of obtaining the globally optimal solution, while the distributed control method has the advantages of high reliability, low computational load, small communication load and the like.
Because the working environment of an unmanned aerial vehicle often changes dynamically and rapidly, especially under complex conditions such as poor communication, the unmanned aerial vehicle cluster is often required to make decisions and execute actions rapidly, and therefore task planning needs to be performed for the unmanned aerial vehicle cluster in advance. Traditional methods for unmanned aerial vehicle cluster task planning generally use centralized or distributed methods and establish different optimization models to solve the task planning problem of multiple tasks for several simple unmanned aerial vehicles; such methods are only suitable for small-scale unmanned aerial vehicle clusters or clusters with weakly coupled structures.
Disclosure of Invention
Based on the above, in order to solve the above technical problems, the present invention provides a method, a system, a computer device and a storage medium for unmanned aerial vehicle cluster reconnaissance task planning based on distributed sequential allocation, which can improve the precision of unmanned aerial vehicle cluster task planning.
An unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation, the method comprising:
acquiring environment information and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating target execution strategies of each unmanned aerial vehicle according to the planning algorithm and the global value function.
In one embodiment, the generating an undirected graph according to the environmental information includes:
extracting environmental space characteristics in the environmental information;
determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics;
and generating the undirected graph according to the motion boundary and the motion vertex.
In one embodiment, the generating an information state transition model according to the environment information and the undirected graph includes:
acquiring a time step according to the environmental information;
acquiring environmental state change information according to the time step and the undirected graph;
and generating a state transition matrix based on the Markov chain and the environmental state change information, and obtaining the information state transition model.
In one embodiment, the generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information includes:
generating total state information of the unmanned aerial vehicle cluster according to the state information;
generating a local return value function of each unmanned aerial vehicle through the total state information and the state information;
and generating the global value function according to each local return value function.
In one embodiment, the method further comprises:
establishing a TD-POMDP framework according to the state information and the global value function;
the calculating, according to the planning algorithm and the global value function, the target execution policy of each unmanned aerial vehicle includes:
and respectively calculating target execution strategies of the unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
An unmanned aerial vehicle cluster mission planning system, the system comprising:
the undirected graph generating module is used for acquiring the environment information and generating an undirected graph according to the environment information;
the model generation module is used for generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
The function generation module is used for generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and the calculation module is used for acquiring a planning algorithm and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring environment information and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating target execution strategies of each unmanned aerial vehicle according to the planning algorithm and the global value function.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring environment information and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating target execution strategies of each unmanned aerial vehicle according to the planning algorithm and the global value function.
According to the unmanned aerial vehicle cluster reconnaissance task planning method, system, computer equipment and storage medium based on distributed sequential allocation, environment information is acquired and an undirected graph is generated according to the environment information; an information state transition model is generated according to the environment information and the undirected graph, and the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster is acquired according to the information state transition model; a global value function corresponding to the unmanned aerial vehicle cluster is generated according to the state information, the global value function being used for calculating the execution strategies of the unmanned aerial vehicles; and a planning algorithm is acquired, and the target execution strategy of each unmanned aerial vehicle is calculated according to the planning algorithm and the global value function. Because the undirected graph consists of vertices and edges, the unmanned aerial vehicles execute tasks at designated vertices or within the motion boundaries and can collect more valuable information; the global value function is generated through the information state transition model, and the target execution strategies are calculated according to the planning algorithm to obtain the reconnaissance route of each unmanned aerial vehicle, so that the accuracy of unmanned aerial vehicle cluster task planning can be improved.
Drawings
FIG. 1 is an application environment diagram of unmanned aerial vehicle cluster mission planning in one embodiment;
fig. 2 is a flow chart of a method for planning a cluster reconnaissance task of an unmanned aerial vehicle based on distributed sequential allocation in one embodiment;
FIG. 3 is a schematic diagram of a Markov chain-based information state transition model in one embodiment;
FIG. 4 is a schematic illustration of a different number of drone reconnaissance areas in one embodiment;
FIG. 5 is a diagram showing a comparison of the average return values of the algorithms in scenario one in the experiment;
FIG. 6 is a diagram showing a comparison of the average return values of the algorithms in scenarios two and three in the experiment;
FIG. 7 is a block diagram of an unmanned aerial vehicle cluster mission planning system in one embodiment;
fig. 8 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation provided by the embodiment of the application can be applied to an application environment shown in fig. 1. As shown in fig. 1, the application environment includes a computer device 110 and a drone 120, where the computer device 110 and the drone 120 may be connected wirelessly. The computer device 110 may obtain the environmental information and generate an undirected graph based on the environmental information; the computer device 110 may generate an information state transition model according to the environmental information and the undirected graph, and obtain the state information of each unmanned aerial vehicle 120 in the unmanned aerial vehicle 120 cluster according to the information state transition model; the computer device 110 may generate a global value function corresponding to the unmanned aerial vehicle cluster from the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle 120; the computer device 110 may obtain the planning algorithm, and calculate the target execution policy of each of the unmanned aerial vehicles 120 according to the planning algorithm and the global value function. The computer device 110 may be, but is not limited to, various personal computers, notebook computers, smart phones, robots, tablet computers, portable wearable devices, and the like.
In one embodiment, as shown in fig. 2, a method for planning a scout task of a cluster of unmanned aerial vehicles based on distributed sequential allocation is provided, including the following steps:
and 202, acquiring environment information and generating an undirected graph according to the environment information.
The environmental information may include characteristics of the physical environment, which may be determined by the spatiotemporal characteristics of the physical environment, such as spatial features and temporal features. In an undirected graph, every edge has no direction; the computer device may generate the undirected graph according to the spatial features in the environmental information, i.e., the undirected graph is used to represent the spatial features of the environment.
Step 204, generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model.
The environmental information may be affected by various factors, such as cloud cover area, rainfall and actual temperature. When the environmental information differs, the state information of the unmanned aerial vehicles also differs. The information is used to indicate the degree of change of the data of interest: when the data of interest in an area changes, the uncertainty of the difference between the recorded data and the unknown data increases, and the acquired state information of the unmanned aerial vehicles changes accordingly. The information state transition model can calculate the state information of each unmanned aerial vehicle according to the collected environmental information.
Step 206, generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle.
The global value function can be used for calculating an execution strategy of the unmanned aerial vehicle, wherein the execution strategy of the unmanned aerial vehicle can comprise a movement area, movement time, movement route and the like of the unmanned aerial vehicle. After the state information of each unmanned aerial vehicle is obtained, the information value of each unmanned aerial vehicle can be calculated, so that the sum of the information values of the unmanned aerial vehicle clusters is obtained, and then a global value function corresponding to the unmanned aerial vehicle clusters is generated according to the sum of the information values.
Step 208, a planning algorithm is obtained, and target execution strategies of the unmanned aerial vehicles are calculated according to the planning algorithm and the global value function.
The planning algorithm may be the Factored Belief based Sequential Allocated Monte Carlo Planning (FB-SAMCP) algorithm; the FB-SAMCP algorithm can effectively resolve conflicts among the unmanned aerial vehicles and improve the return value of the unmanned aerial vehicle cluster. The target execution policy may be used to represent the optimal policy under the global value function.
In this embodiment, environment information is acquired and an undirected graph is generated according to the environment information; an information state transition model is generated according to the environment information and the undirected graph, and the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster is acquired according to the information state transition model; a global value function corresponding to the unmanned aerial vehicle cluster is generated according to the state information, the global value function being used for calculating the execution strategies of the unmanned aerial vehicles; and a planning algorithm is acquired, and the target execution strategy of each unmanned aerial vehicle is calculated according to the planning algorithm and the global value function. Because the undirected graph consists of vertices and edges, the unmanned aerial vehicles execute tasks at designated vertices or within the motion boundaries and can collect more valuable information; the global value function is generated through the information state transition model, and the target execution strategies are calculated according to the planning algorithm to obtain the reconnaissance route of each unmanned aerial vehicle, so that the accuracy of unmanned aerial vehicle cluster task planning can be improved.
In one embodiment, the unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation may further include a process of generating an undirected graph, and the specific process includes: extracting environmental space features in the environmental information; determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics; and generating an undirected graph according to the motion boundary and the motion vertex.
The environmental spatial features may be represented as an undirected graph, denoted G = <V, E>, where the vertex set V consists of Euclidean space coordinates, and the edge set E is the set of edges of all unmanned aerial vehicle motion boundaries, along which the unmanned aerial vehicles can move back and forth. The motion vertices may be used to represent important point targets or area targets, the size of a target area can be divided manually according to the real scene, and the number of motion vertices is recorded as |V|. In a practical environment, adjacent motion vertices may not be mutually reachable due to weather and terrain constraints.
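As an illustration of this graph representation, the following Python sketch builds an undirected graph from spatial features; the vertex identifiers, coordinate format and the reachability predicate are illustrative assumptions rather than details taken from the patent.

from dataclasses import dataclass, field

@dataclass
class UndirectedGraph:
    coords: dict = field(default_factory=dict)   # vertex id -> Euclidean (x, y) coordinate
    adj: dict = field(default_factory=dict)      # vertex id -> set of adjacent vertices

    def add_vertex(self, v, xy):
        self.coords[v] = xy
        self.adj.setdefault(v, set())

    def add_edge(self, u, v):
        # a motion boundary along which a drone can move back and forth
        self.adj[u].add(v)
        self.adj[v].add(u)

def build_graph(vertices, candidate_edges, reachable):
    """vertices: {id: (x, y)}; candidate_edges: iterable of (u, v) pairs;
    reachable(u, v): hypothetical predicate encoding weather/terrain constraints."""
    g = UndirectedGraph()
    for v, xy in vertices.items():
        g.add_vertex(v, xy)
    for u, v in candidate_edges:
        if reachable(u, v):          # adjacent vertices may be unreachable in practice
            g.add_edge(u, v)
    return g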
In one embodiment, the unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation may further include a process of generating an information state transition model, and the specific process includes: acquiring a time step according to the environmental information; acquiring environmental state change information according to the time step and the undirected graph; based on the Markov chain and the environmental state change information, a state transition matrix is generated, and an information state transition model is obtained.
Since the spatiotemporal features of the physical environment include temporal features, the temporal features may be abstracted into discrete time steps, denoted t ∈ {0, 1, 2, ...}. The information state may change once per time step, and at each time step every drone moves to an adjacent vertex or stays in place. To ensure that each drone is at a vertex at any time t, the time step may be set to the longest time in which all drones complete an observe-orient-decide-act (OODA) cycle. For example, if each drone is able to complete data collection, data preprocessing, decision making and flight to the next target area within 10 minutes, 1 time step in the simulation environment may be set to 10 minutes in the real environment.
The computer device may obtain the time step according to the environmental information, and thereby obtain the environmental state change information according to the time step and the undirected graph. Specifically, the environmental state change information may include a plurality of levels, denoted I_k ∈ {I_1, I_2, ..., I_N}, where I_k represents the kth environmental state change information level and N represents the number of levels. The information state value may be used to quantitatively describe the level of environmental state change information, denoted F_k ∈ {F_1, F_2, ..., F_N}, where F_k = f(I_k), f: I_k → R+. The larger k is, the higher the environmental state change information level I_k and the more unknown data it contains, i.e., F_1 < F_2 < ... < F_N. A Markov chain is a stochastic process in probability theory and mathematical statistics that has the Markov property and is defined on discrete index and state spaces; it is a common way of describing environmental dynamics. In this embodiment, it may be assumed that the environmental state change information transition of each motion vertex follows an independent, discrete-time, multi-state Markov chain. The computer device may generate a state transition matrix based on the multi-state Markov chain and the environmental state change information, where the state transition matrix is P = (p_ij), i, j ∈ {1, ..., N}; P is a stochastic matrix, and p_ij represents the probability of transitioning from state I_i to state I_j. In this embodiment, prior information needs to be collected from different information sources before the unmanned aerial vehicles are dispatched to perform tasks, and the state transition matrix can be assigned after the prior information is preprocessed by machine learning techniques.
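A minimal sketch of this information state model follows; the numerical entries of P and F are placeholders (F = [0, 1, 2] mirrors the experimental setting described later), not values prescribed by the patent.

import numpy as np

P = np.array([[0.6, 0.3, 0.1],     # example row-stochastic transition matrix, N = 3 levels
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
F = np.array([0.0, 1.0, 2.0])      # information state values, F_1 < F_2 < ... < F_N

def step_information_level(level, rng):
    """Sample the next information level given the current 0-based level index."""
    return int(rng.choice(len(F), p=P[level]))

def information_value(level):
    """Map an information level index to its information state value f(I_k)."""
    return float(F[level])

rng = np.random.default_rng(0)
level = 0
for t in range(5):                  # the information state changes once per time step
    level = step_information_level(level, rng)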
In this embodiment, if some motion vertices are not visited by any drone, their unknown data and information values may increase over time. Typically, for two different motion vertices, if one has a higher information value at the current time, it will also tend to have a higher information value at the next time. Therefore, the state transition matrix P in this embodiment may be a monotone stochastic matrix. That is, for two N-dimensional probability vectors x and y, if every tail sum of x is at least the corresponding tail sum of y, then x stochastically dominates y, which can be denoted x > y; and if the rows of P satisfy P_N > P_{N-1} > ... > P_1 under this dominance relation, then P is a monotone stochastic matrix.
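For illustration, the sketch below tests that monotonicity property; the tail-sum formulation of first-order stochastic dominance is standard and is assumed here to match the patent's intent.

import numpy as np

def dominates(x, y):
    """First-order stochastic dominance x > y for two N-dimensional probability vectors:
    every tail sum of x is at least the corresponding tail sum of y."""
    tail_x = np.cumsum(x[::-1])[::-1]
    tail_y = np.cumsum(y[::-1])[::-1]
    return bool(np.all(tail_x >= tail_y - 1e-12))

def is_monotone_stochastic(P):
    """True when each row of P stochastically dominates the row above it."""
    return all(dominates(P[i + 1], P[i]) for i in range(len(P) - 1))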
In one embodiment, a Markov chain-based information state transition model is shown in FIG. 3, where I_1, I_2 and I_3 indicate the environmental state change information levels.
In one embodiment, the drone may be a mobile autonomous entity capable of making decisions and performing actions, with the purpose of providing accurate and up-to-date situational information. Denote the set of unmanned aerial vehicles as M and the kth unmanned aerial vehicle as m_k; drone m_k collects information in a predetermined area G_k = <V_k, E_k>, where G_k is a subgraph of G, and the reconnaissance areas of different drones may overlap each other. As shown in fig. 4, there are 4 unmanned aerial vehicles and 8 unmanned aerial vehicle reconnaissance areas, where black dots represent motion vertices, black lines represent motion boundaries, triangles represent unmanned aerial vehicles, and ellipses represent unmanned aerial vehicle reconnaissance areas.
In this embodiment, at any time, each unmanned aerial vehicle is located on some motion vertex of the graph G, and different unmanned aerial vehicles may occupy the same motion vertex at the same time. Each drone moves among the motion vertices and along the motion boundaries of its predetermined area, and at each time step it may move from its current motion vertex to an adjacent one. When the unmanned aerial vehicle moves to a motion vertex, it automatically collects the information of that vertex, and the environmental state change information level of the vertex is reset to I_1, where I_1 indicates that there is no new information at the current time. Because of its limited observation capability, the unmanned aerial vehicle can only observe the information of its current motion vertex at the current moment.
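Building on the graph and Markov-chain sketches above, the following hypothetical single-drone simulator step illustrates this move/collect/reset cycle; the exact ordering of the reset and the Markov transitions is an assumption.

def simulate_drone_step(position, action, graph, levels, rng):
    """action is either the current vertex (stay in place) or one of its neighbours;
    levels maps each vertex to its current 0-based information level."""
    assert action == position or action in graph.adj[position]
    new_position = action
    reward = information_value(levels[new_position])     # only the occupied vertex is observed
    levels[new_position] = 0                             # reset the visited vertex to I_1
    for v in levels:                                     # unvisited vertices evolve by the Markov chain
        if v != new_position:
            levels[v] = step_information_level(levels[v], rng)
    return new_position, reward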
In this embodiment, the cooperative performance may be used to represent the ratio of the return value obtained by each unmanned aerial vehicle to the total return value when multiple unmanned aerial vehicles access the same motion vertex at the same time, denoted g: m_k → R+, m_k ∈ M. The cooperative performance may be expressed as g(m_k) = 1 if m_k = m_first and g(m_k) = 0 otherwise, where m_first denotes the first drone assigned to scout the motion vertex. This expression shows that if multiple unmanned aerial vehicles reconnoiter the same motion vertex at the same time, the effect is equivalent to a single unmanned aerial vehicle reconnoitering that vertex.
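A one-line sketch of this cooperative-performance term follows; the indicator form is reconstructed from the surrounding description and should be read as an assumption.

def cooperative_performance(drone, drones_at_vertex):
    """drones_at_vertex is ordered by assignment; only the first assigned drone scores."""
    return 1.0 if drones_at_vertex and drone == drones_at_vertex[0] else 0.0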
In one embodiment, the unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation may further include a process of generating a global value function, and the specific process includes: generating total state information of the unmanned aerial vehicle cluster according to the state information; generating local return value functions of each unmanned aerial vehicle respectively through the total state information and the state information; and generating a global value function according to each local return value function.
The computer device may aggregate the collected state information of each unmanned aerial vehicle to generate the total state information of the unmanned aerial vehicle cluster.
In one embodiment, the unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation may further include a process of calculating a target execution policy according to a TD-POMDP framework, and the specific process includes: establishing a TD-POMDP framework according to the state information and the global value function; and respectively calculating target execution strategies of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
Here, the partially observable Markov decision process (POMDP) is a generalization of the Markov decision process, and the POMDP framework can model various real-world decision processes. The TD-POMDP framework may be denoted as <M, S, A, O, T, Z, R, B, H>.
In the TD-POMDP framework, M = {m_1, ..., m_K} is the set of all drones, where m_k represents the kth unmanned aerial vehicle and K represents the number of unmanned aerial vehicles in the set.
S represents the state set, which can be decomposed into the position state features of the unmanned aerial vehicles and the information state features of the motion vertices, denoted S = <S^V, S^I>. The global state consists of the local states of all drones, and different drones may share local states. Specifically, s_k^{I,i} represents the information state of the ith motion vertex in the reconnaissance area of drone m_k, and all information states in the reconnaissance area of drone m_k are denoted s_k^I = [s_k^{I,1}, ..., s_k^{I,|V_k|}], where |V_k| is the number of motion vertices in the reconnaissance area G_k. In addition, let s^I = [s_1^I, ..., s_K^I], s^V = [s_1^V, ..., s_K^V], and s = [s^I, s^V] ∈ S. The local state of drone m_k consists of the information state of its reconnaissance area, s_k^I, and its position state, s_k^V. In order to obtain a higher global return value, the actions of each drone need to be coordinated with the actions of the other drones.
A = ×_k A_k is the joint action set of all unmanned aerial vehicles, where A_k represents the action space of drone m_k. A joint action is denoted a = [a_1, ..., a_K], a_k ∈ A_k. The decision variable a_k represents the action of drone m_k at its current vertex, and the available actions a_k are determined by the topology of the graph.
O = ×_k O_k is the joint observation set of all unmanned aerial vehicles, where O_k represents the observation space of drone m_k. A joint observation is denoted o = [o_1, ..., o_K], o_k ∈ O_k, and each unmanned aerial vehicle independently determines its own actions according to its local information and the interaction information. The position states of all unmanned aerial vehicles are fully observable, i.e., o_k^V(t) = s_k^V(t). When drone m_k moves to a motion vertex at time step t, it automatically collects the information state of that vertex, i.e., o_k^I(t) equals the information state of the vertex it occupies; however, the drone cannot acquire the information state in other situations, such as at other time steps or for other motion vertices.
T is the joint state transition probability function, T(s(t+1) | s(t), a(t)) = ∏_k T_k(s_k(t+1) | s_k(t), a_k(t)). Here, the local information state transition of drone m_k, T_k^I(s_k^I(t+1) | s_k^I(t)), follows the multi-state discrete-time Markov chain, and T_k^V(s_k^V(t+1) | s_k^V(t), a_k(t)) is the local position state transition probability of drone m_k: if s_k^V(t+1) is the target vertex of action a_k(t) taken in position state s_k^V(t), then T_k^V = 1; otherwise, T_k^V = 0.
Z is the joint observation probability function, Z(o(t) | a(t), s(t)) = ∏_k Z_k(o_k(t) | a_k(t), s_k(t)). If o_k(t) = s_k(t), then Z_k(o_k(t) | a_k(t), s_k(t)) = 1; otherwise, Z_k(o_k(t) | a_k(t), s_k(t)) = 0.
R: S × A → R+ is a decomposable global return value function, where the global return is the sum of the information values collected by all unmanned aerial vehicles. The local return value function of drone m_k is R_k(a_k, o_k) = g(m_k) f(I_k). In this embodiment, the maximization of the original value function V^π needs to be solved. Due to the decomposability of the return value function, the global value function V^π can be factored into the sum of the local value functions: V^π(h) = Σ_k V_k^{π_k}(h_k), where h_k = [a_k(0), o_k(0), ..., a_k(t), o_k(t), ..., a_k(t+T), o_k(t+T)] is the local history of drone m_k, whose dimension 2(t+T+1) is determined by the current time step t and the simulation step T; π_k = [a_k(0), a_k(1), ..., a_k(H-1)] is the policy of drone m_k; π = [π_1, π_2, ..., π_K] is the joint policy of all unmanned aerial vehicles; and V_k^{π_k}(h_k) is the expected return value of drone m_k when executing policy π_k, i.e., the local value function under policy π_k.
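As a small illustration of this decomposition, the sketch below accumulates the local returns R_k = g(m_k) f(I_k) over the planning horizon and sums them across drones, reusing the cooperative_performance and information_value helpers sketched earlier; the data layout is an assumption.

def local_return(drone, drones_at_vertex, level):
    """Local return of one drone at one step: g(m_k) * f(I_k)."""
    return cooperative_performance(drone, drones_at_vertex) * information_value(level)

def global_value(local_histories, horizon):
    """local_histories[k][t] = (drone, drones_at_vertex, level) for drone k at step t;
    the global value is the sum of the drones' local values."""
    total = 0.0
    for history in local_histories:
        total += sum(local_return(*history[t]) for t in range(horizon))
    return total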
B is the belief, including the information belief and the position belief, denoted B = <B^V, B^I>. Let B_k be the local belief of drone m_k; B^I is uncertain, while B^V is deterministic. At any time step t, the belief is a sufficient statistic for calculating the optimal policy, and the information states of all motion vertices change independently. The factored information belief is written as B^I(t) = [b_1^I(t), ..., b_{|V|}^I(t)], and the information belief of motion vertex v_i is b_i^I(t) = [b_i^{I_1}(t), ..., b_i^{I_N}(t)], where the variable b_i^{I_k}(t) is the conditional probability that the information state of vertex v_i is I_k. The factored belief grows only linearly with the number of motion vertices, which greatly reduces the computational complexity. Further, the prediction formula for the information belief of motion vertex v_i is: b_i^I(t+1) = Λ if v_i = v', and b_i^I(t+1) = b_i^I(t) P otherwise, where Λ denotes the belief after the information state has been reset to I_1, and v' represents a motion vertex visited by any of the drones at time step t.
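A minimal sketch of this belief prediction step follows; representing Λ as the distribution concentrated on I_1 is an assumption consistent with the reset rule described above.

import numpy as np

LAMBDA = np.array([1.0, 0.0, 0.0])   # assumed reset belief: all probability mass on I_1

def predict_vertex_belief(b, visited, P):
    """b(t+1) = Lambda if the vertex was visited at step t, otherwise b(t+1) = b(t) P."""
    return LAMBDA.copy() if visited else b @ P

def predict_information_belief(beliefs, visited_vertices, P):
    """beliefs: {vertex: length-N probability vector}; returns the predicted next-step beliefs."""
    return {v: predict_vertex_belief(b, v in visited_vertices, P)
            for v, b in beliefs.items()}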
H ∈ Z+ represents the planning step size.
In this embodiment, the planning algorithm may be the Factored Belief based Sequential Allocated Monte Carlo Planning (FB-SAMCP) algorithm. The set of priority unmanned aerial vehicles of drone m_j is denoted C_j, and Π_{C_j} denotes the corresponding priority policy set; that is, unmanned aerial vehicle m_j should take the policies of the priority drones into account when making its decision. In this embodiment, the sum of the revised expected return values obtained after executing the policy π_k of each drone in sequence is taken as the revised global value function, denoted V'(h). The revised global value function is calculated as V'(h) = Σ_k V'_k(h_k), where V'_k(h_k) is the revised local value function obtained when executing π_k, i.e., the expectation of the revised return values. The revised global value function V'(h) is equivalent to the original global value function V^π(h); the difference between them lies in the way they are computed. First, the revised local value functions V'_k are calculated in turn according to the order of the drones, and the revised global value function equals the sum of the revised local value functions of all drones. Second, the original global value function V^π is calculated over time: specifically, the expected local return value E_π(R_k(t)) of each unmanned aerial vehicle at time step t and the expected global return value E_π(R(t)) of all unmanned aerial vehicles are calculated, and the original global value function V^π is the sum of the expected return values E_π(R(t)) from t = 0 to t = H-1.
Each revised local value function depends on local state features, which may be affected by other drones. In this embodiment, the effect of other drones is reflected in the return value through penalty factors. The factored revised global value function decomposes the global look-ahead tree into several local look-ahead trees.
In this embodiment, the FB-SAMCP algorithm consists of three procedures: a sequential allocation procedure, a search procedure, and a simulation procedure. Each unmanned aerial vehicle executes the FB-SAMCP algorithm in parallel at each time step, and coordination of actions is completed after several iterations; that is, the actions of the drones are coordinated after each iteration, once the search and expansion of the look-ahead tree are completed.
Drone m_k first executes the sequential allocation procedure, and the number of iterations does not exceed K. In each iteration, after initializing h_k, the drone executes the search procedure to obtain the optimal policy π_k and the value function V_k under the condition of the priority policy set Π_{C_k}. Drone m_k transmits π_k and V_k to the other unmanned aerial vehicles and receives π_(k) and V_(k) from them; after the nth iteration, drone m_k needs to wait for messages from the remaining K-n drones. After comparing V_(k) with V_k, drone m_k stores the unmanned aerial vehicle corresponding to the maximum value function and its policy, denoted m* and π* respectively. If the unmanned aerial vehicle corresponding to the maximum value function is itself, drone m_k finishes the search and takes π_k as its policy for the current time step; otherwise, drone m_k adds m* and π* to C_k and Π_{C_k} respectively, and its belief B_k(h_k) is updated again according to Π_{C_k}.
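The following centralized Python sketch mimics this sequential allocation loop in a single process (the actual procedure runs in parallel on every drone and exchanges π_k and V_k messages); the search callback stands in for the per-drone search procedure and is an assumption.

def sequential_allocation(drones, search):
    """search(k, priority_policies) -> (policy_k, value_k); returns one fixed policy per drone."""
    remaining = set(drones)
    priority_policies = {}                 # Pi_C: policies of drones that have already been fixed
    final_policies = {}
    for _ in range(len(drones)):           # at most K iterations
        results = {k: search(k, priority_policies) for k in remaining}
        best = max(results, key=lambda k: results[k][1])   # drone with the maximum value function
        final_policies[best] = results[best][0]
        priority_policies[best] = results[best][0]          # the others re-plan against this policy
        remaining.discard(best)
        if not remaining:
            break
    return final_policies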
In the search procedure, the local look-ahead tree is expanded based on the priority policy set Π*. First, the local belief B_k(h_k) is updated, and the local information state beliefs of the vertices visited under Π* are reset to Λ. Second, the H-step optimal policy π_k* is calculated; specifically, the information state is sampled and simulated until the termination condition is reached. If the action of the drone is determined, its position state and the observation of the position state are also determined; in addition, observations of the information state are reflected directly in the return value. The planning step size is successively reduced in the loop to ensure that, when executing the simulator G, the priority policy set Π* has the same depth as the policy of drone m_k. In the simulator G, two types of conflicts need to be considered: synchronous repeated counting and asynchronous repeated counting. Synchronous repeated counting means that when multiple unmanned aerial vehicles access the same motion vertex at the same time, each of them receives the return value of that vertex. In asynchronous repeated counting, drone m_k decides to access motion vertex v at time step t_1 ∈ {0, 1, ..., H-1}, while drone m_j ∈ C has already decided to access the same motion vertex at time step t_2 ∈ {0, 1, ..., H-1}, t_1 < t_2. In this case, drone m_k overestimates the expected return of accessing the motion vertex, because it does not consider that the higher-priority drone m_j has already decided to access it. To resolve the conflict, a penalty factor p_e is introduced to penalize the overestimated return value of drone m_k; that is, p_e is the loss caused to drone m_j after drone m_k accesses motion vertex v at t_1. The penalty factor at time step t_1 is p_e(t_1) = R(t_2) - R'(t_2), where R(t_2) is the sampled return value of drone m_j accessing motion vertex v at time step t_2 without considering drone m_k, and R'(t_2) is that sampled return value when drone m_k is considered. Let t_2 be the time step closest to t_1 in the asynchronous repeated counting. Assume that at time step t_1 the information state of motion vertex v is I_i ∈ I; then the return value is R(t_1) = f(I_i). After the access by drone m_k, the information state is reset to I_1, and both I_i and I_1 undergo Δt_2 = t_2 - t_1 state transitions. Denote the information states obtained at time step t_2 from I_i and I_1 as I_i(t_2) and I_1(t_2), respectively; then R(t_2) = f(I_i(t_2)) and R'(t_2) = f(I_1(t_2)), and the revised return value of drone m_k at t_1 is its sampled return value minus the penalty factor. In this embodiment, as the number of samples increases, the expected sampling belief tends to the true belief. Furthermore, let the belief of motion vertex v at time step t_1 be b(t_1). The expected original return value at time step t_2 is obtained by propagating b(t_1) forward Δt_2 steps through P and taking the expectation of the information values, the expected revised return value at time step t_2 is obtained in the same way starting from the reset belief Λ, and the expected penalty factor is the difference between the two.
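A hedged sketch of the expected penalty computation follows, reusing P, F and LAMBDA from the earlier sketches; propagating the beliefs with a matrix power is inferred from the description above rather than quoted from the patent.

import numpy as np

def expected_penalty(b_t1, P, F, dt):
    """Expected penalty factor for an asynchronous repeated count, dt = t2 - t1."""
    propagated = b_t1 @ np.linalg.matrix_power(P, dt)    # belief at t2 if m_k does not visit v
    reset = LAMBDA @ np.linalg.matrix_power(P, dt)       # belief at t2 after m_k's visit resets v
    return float(propagated @ F - reset @ F)

def revised_return(sampled_return_t1, penalty):
    """The overestimated sampled return of m_k at t1 is reduced by the penalty factor."""
    return sampled_return_t1 - penalty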
In one embodiment, the process and results of the test performed using the present technique are as follows:
The experiments compare FB-SAMCP with POMCP, TD-FMOP and SA-POMCP. POMCP is currently a state-of-the-art general online planning algorithm. The TD-FMOP algorithm combines the MCTS method and the Max-Sum method to solve the online planning problem of loosely coupled, distributed, fixedly connected unmanned aerial vehicle clusters, and frequent communication between unmanned aerial vehicles is needed to ensure that the overall performance is approximately optimal. SA-POMCP is an extension of FB-SAMCP; the distinction between them is the way the belief is represented: SA-POMCP uses a particle filter, while FB-SAMCP uses a factored representation. SA-POMCP is a general algorithm for solving TD-POMDP and can be applied to more complex problems where the belief is difficult to express. Each simulation run was 100 time steps long, and each algorithm was run 50 times in each scenario. The experiments evaluate the performance of each algorithm by the average return value and average run time per run. The run time of each run was limited to 30 minutes. All experiments were run on a computer with a 2.6 GHz Intel dual-core CPU and 4 GB of memory.
The experiments mainly evaluate the influence of scalability on FB-SAMCP, POMCP, TD-FMOP and SA-POMCP, and three scenarios are constructed. Scenario one: as shown in fig. 5, the graph has 14 motion vertices and 25 motion boundaries, and 4 drones, each with 2 neighbors, perform reconnaissance tasks in designated areas. Scenario two: as shown in fig. 6, the graph has 40 motion vertices and 83 motion boundaries, and 12 drones, each with 2 neighbors, perform reconnaissance tasks in designated areas. Scenario three: as shown in fig. 6, the graph has 40 motion vertices and 83 motion boundaries, and 12 drones, each with 11 neighbors, perform reconnaissance tasks in designated areas.
Scenario one is a small-scale unmanned aerial vehicle reconnaissance scenario with a weakly coupled, distributed, fixedly connected structure. Compared with scenario one, the cluster in scenario two still has a weakly coupled, distributed, fixedly connected structure, but the number of unmanned aerial vehicles is expanded. In scenario three, compared with scenario two, the coupling degree of the drones is extended from weak coupling to tight coupling. The planning step size H of the drones is 3 time steps for all scenarios. Each motion vertex has three information states, and the information state value vector is set to F = [0, 1, 2], corresponding to the information states I = [I_1, I_2, I_3].
For scenario one, FIG. 5 depicts an average return value. Experimental results show that the return values of FB-SAMCP are respectively better than the return values of POMCP by 6.0%, 15.0% and 8.3% in 50 samples, 500 samples and 5000 samples. In addition, the return value for FB-SAMCP was slightly lower than TD-FMOP in 50 samples, but exceeded TD-FMOP by about 2.4% in 100 samples and by about 5.4% in 1000 simulations. For all scenarios, FB-SAMCP performs slightly better than SA-POMCP.
Table 1 shows the run time of these algorithms in scenario one, where the symbol "-" indicates that the time limit was exceeded or memory overflowed, and NoS indicates the number of samples. POMCP has a much lower run time than FB-SAMCP. The run time of FB-SAMCP is about one third of that of TD-FMOP, and about twice that of SA-POMCP in all simulations. In addition, TD-FMOP exceeded the time limit in the 5000-sample simulation experiment.
Table 1:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 23.9 9.1 0.8 4.5
100 50.0 17.9 1.5 9.5
500 267.7 89.0 7.2 45.7
1000 549.2 176.3 14.0 97.4
5000 - 777.5 75.3 473.4
Scenario two evaluates the influence of the scalability of the number of unmanned aerial vehicles on algorithm performance. Fig. 6 depicts the average return value of each algorithm at different numbers of samples in scenario two. When POMCP was run, the result could not be calculated because the memory of the computer was insufficient. Although the average return value of FB-SAMCP is 97.0% of that of TD-FMOP in 50 samples, it is 3.5% higher than the average return value of TD-FMOP in 500 samples and 1000 samples. In addition, the average return value of FB-SAMCP is similar to that of SA-POMCP.
Table 2 depicts the average run time of several algorithms in scenario two, where the symbol "-" indicates the result of exceeding the time limit and memory overflow. Similar to the results in scenario one, the runtime of FB-SAMCP is about twice the runtime of SA-POMCP, but about one third of TD-FMOP. In fact, performing TD-FMOP requires a significant amount of time because of the frequent communication and action synchronization that the drone needs to make in making joint decisions.
Table 2:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 84.6 25.9 - 13.9
100 157.4 50.3 - 26.9
500 841.3 238.6 - 143.8
1000 1716.3 456.3 - 272.2
and thirdly, evaluating the influence of the expandability of the coupling degree of the unmanned aerial vehicle on the algorithm. Tables 3 and 4 show the average run time and average return values, respectively. When POMCP is operated, the result cannot be calculated due to insufficient memory of a computer. From the results, it is known that the average return of FB-SAMCP is similar to that of SA-POMCP, and the average running time is higher than that of SA-POMCP.
Table 3:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 - 106.4 - 63.9
100 - 208.5 - 126.3
500 - 1043.0 - 584.7
table 4:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 - 1115.5 - 1108.1
100 - 1153.5 - 1150.2
500 - 1206.4 - 1197.5
when constructing the look-ahead tree of the POMCP, the joint action of all unmanned aerial vehicles needs to be considered; and when constructing the look-ahead tree of the TD-FMOP, the actions of the neighbor unmanned aerial vehicle need to be considered. For FB-SAMCP and SA-POMCP, each unmanned aerial vehicle's local look-ahead tree has a lower branching factor, because it only includes the unmanned aerial vehicle's own actions. Thus, FB-SAMCP and SA-POMCP still have excellent performance in a small number of sampling times. In addition, due to the lower branching factor, FB-SAMCP and SA-POMCP have better scalability than POMCP in terms of number and coupling of unmanned aerial vehicles.
It should be understood that, although the steps in the above-described flowcharts are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described above may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, and the order of execution of the sub-steps or stages is not necessarily sequential, but may be performed alternately or alternately with at least a part of the sub-steps or stages of other steps or other steps.
In one embodiment, as shown in fig. 7, there is provided an unmanned aerial vehicle cluster mission planning system, including: an undirected graph generation module 710, a model generation module 720, a function generation module 730, and a calculation module 740, wherein:
the undirected graph generating module 710 is configured to obtain the environmental information, and generate an undirected graph according to the environmental information.
The model generating module 720 is configured to generate an information state transition model according to the environmental information and the undirected graph, and obtain state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model.
A function generating module 730, configured to generate a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle.
The calculating module 740 is configured to obtain a planning algorithm, and calculate target execution policies of each unmanned aerial vehicle according to the planning algorithm and the global value function.
In one embodiment, the undirected graph generation module 710 includes a feature extraction module, an information determination module, and an image generation module, wherein:
and the feature extraction module is used for extracting the environmental spatial features in the environmental information.
And the information determining module is used for determining the movement boundary and the movement vertex of the unmanned aerial vehicle according to the environmental spatial characteristics.
And the image generation module is used for generating an undirected graph according to the motion boundary and the motion vertex.
In one embodiment, the function generation module 730 includes a step size acquisition module, an information acquisition module, and a matrix generation module, where:
and the step length acquisition module is used for acquiring the time step length according to the environment information.
And the information acquisition module is used for acquiring the environment state change information according to the time step and the undirected graph.
And the matrix generation module is used for generating a state transition matrix based on the Markov chain and the environmental state change information and obtaining an information state transition model.
In one embodiment, the function generating module 730 is further configured to generate total status information of the unmanned aerial vehicle cluster according to the status information; generating local return value functions of each unmanned aerial vehicle respectively through the total state information and the state information; and generating a global value function according to each local return value function.
In one embodiment, the provided unmanned aerial vehicle cluster task planning system further comprises a framework establishing module for establishing a TD-POMDP framework according to the state information and the global value function; the calculation module 740 is further configured to calculate, according to the planning algorithm and the global value function, the target execution policy of each unmanned aerial vehicle through the TD-POMDP framework.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure thereof may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, keys, a track ball or a touch pad arranged on the shell of the computer device, or an external keyboard, touch pad or mouse.
It will be appreciated by those skilled in the art that the structure shown in FIG. 8 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring environment information and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating target execution strategies of each unmanned aerial vehicle according to the planning algorithm and the global value function.
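Taken together, these four steps form a pipeline from raw environment information to a policy per vehicle. The following sketch only wires such a pipeline together; every injected callable and parameter name is an assumption introduced here for illustration, not an interface defined by the embodiment.

```python
from typing import Callable, List, Sequence

def plan_uav_cluster(environment_info: dict,
                     build_graph: Callable[[dict], dict],
                     build_transition_model: Callable[[dict, dict], dict],
                     observe_states: Callable[[dict], List[dict]],
                     build_global_value: Callable[[Sequence[dict]], Callable],
                     planner: Callable[[Sequence[dict], Callable], List[dict]]) -> List[dict]:
    """Orchestrate the four steps; each callable stands in for one step."""
    graph = build_graph(environment_info)                     # undirected graph
    model = build_transition_model(environment_info, graph)   # information state transition model
    states = observe_states(model)                            # state information per UAV
    global_value = build_global_value(states)                 # global value function
    return planner(states, global_value)                      # target execution strategies
```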
In one embodiment, the processor when executing the computer program further performs the steps of: extracting environmental space features in the environmental information; determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics; and generating an undirected graph according to the motion boundary and the motion vertex.
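Assuming the motion vertices and motion boundaries have already been extracted from the environmental space features, one plausible (purely illustrative) representation of the undirected graph is an adjacency map keyed by vertex:

```python
from typing import Dict, Iterable, Set, Tuple

def build_undirected_graph(vertices: Iterable[str],
                           boundaries: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected graph as an adjacency map from motion vertices and
    motion boundaries (pairs of vertices the UAVs may move between). The data
    structure and names are assumptions; the embodiment fixes neither."""
    graph: Dict[str, Set[str]] = {v: set() for v in vertices}
    for a, b in boundaries:
        graph.setdefault(a, set()).add(b)   # undirected: record the edge in both directions
        graph.setdefault(b, set()).add(a)
    return graph

# Example: three reconnaissance vertices connected in a chain.
g = build_undirected_graph(["v1", "v2", "v3"], [("v1", "v2"), ("v2", "v3")])
```

An adjacency matrix or a dedicated graph library object would serve equally well; only the vertex and edge sets matter to the later steps.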
In one embodiment, the processor when executing the computer program further performs the steps of: acquiring a time step according to the environmental information; acquiring environmental state change information according to the time step and the undirected graph; based on the Markov chain and the environmental state change information, a state transition matrix is generated, and an information state transition model is obtained.
In one embodiment, the processor when executing the computer program further performs the steps of: generating total state information of the unmanned aerial vehicle cluster according to the state information; generating local return value functions of each unmanned aerial vehicle respectively through the total state information and the state information; and generating a global value function according to each local return value function.
In one embodiment, the processor when executing the computer program further performs the steps of: establishing a TD-POMDP framework according to the state information and the global value function; and respectively calculating target execution strategies of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which, when executed by a processor, performs the steps of:
acquiring environment information and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating target execution strategies of each unmanned aerial vehicle according to the planning algorithm and the global value function.
In one embodiment, the computer program when executed by the processor further performs the steps of: extracting environmental space features in the environmental information; determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics; and generating an undirected graph according to the motion boundary and the motion vertex.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a time step according to the environmental information; acquiring environmental state change information according to the time step and the undirected graph; based on the Markov chain and the environmental state change information, a state transition matrix is generated, and an information state transition model is obtained.
In one embodiment, the computer program when executed by the processor further performs the steps of: generating total state information of the unmanned aerial vehicle cluster according to the state information; generating local return value functions of each unmanned aerial vehicle respectively through the total state information and the state information; and generating a global value function according to each local return value function.
In one embodiment, the computer program when executed by the processor further performs the steps of: establishing a TD-POMDP framework according to the state information and the global value function; and respectively calculating target execution strategies of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program instructing the relevant hardware, the computer program being stored on a non-volatile computer-readable storage medium; when executed, the computer program may include the flows of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the application, and their description is relatively specific and detailed, but they are not to be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the application, all of which fall within the protection scope of the application. Accordingly, the protection scope of the present application shall be subject to the appended claims.

Claims (6)

1. An unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation, characterized by comprising the following steps:
acquiring environment information and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
acquiring a planning algorithm, and respectively calculating a target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function; wherein generating the global value function corresponding to the unmanned aerial vehicle cluster according to the state information comprises: generating total state information of the unmanned aerial vehicle cluster according to the state information; generating a local return value function of each unmanned aerial vehicle respectively through the total state information and the state information; and generating the global value function according to each local return value function;
wherein generating the information state transition model according to the environment information and the undirected graph comprises: acquiring a time step according to the environment information;
acquiring environmental state change information according to the time step and the undirected graph;
generating a state transition matrix based on the Markov chain and the environmental state change information, and obtaining an information state transition model;
establishing a TD-POMDP framework according to the state information and the global value function;
wherein respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function comprises:
and respectively calculating target execution strategies of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
2. The method of claim 1, wherein generating the undirected graph based on the environmental information comprises:
extracting environmental space features in the environmental information;
determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics; and generating an undirected graph according to the motion boundary and the motion vertex.
3. An unmanned aerial vehicle cluster reconnaissance task planning system based on distributed sequential allocation, characterized in that the system comprises:
the undirected graph generating module is used for acquiring the environment information and generating an undirected graph according to the environment information;
the model generation module is used for generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
the function generation module is used for generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
the calculation module is used for acquiring a planning algorithm and respectively calculating a target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function; wherein generating the global value function corresponding to the unmanned aerial vehicle cluster according to the state information comprises: generating total state information of the unmanned aerial vehicle cluster according to the state information; generating a local return value function of each unmanned aerial vehicle respectively through the total state information and the state information; and generating the global value function according to each local return value function;
establishing a TD-POMDP framework according to the state information and the global value function;
wherein respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function comprises:
respectively calculating the target execution strategy of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function;
wherein the model generation module comprises:
the time step acquisition module is used for acquiring a time step according to the environment information;
the information acquisition module is used for acquiring environmental state change information according to the time step and the undirected graph;
and the matrix generation module is used for generating a state transition matrix based on the Markov chain and the environmental state change information and obtaining an information state transition model.
4. A system according to claim 3, characterized in that the undirected graph generating module comprises: the feature extraction module is used for extracting the environmental space features in the environmental information;
the information determining module is used for determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics;
and the graph generation module is used for generating an undirected graph according to the motion boundary and the motion vertex.
5. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 2 when executing the computer program.
6. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 2.
CN202010232017.5A 2020-03-27 2020-03-27 Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation Active CN111414006B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010232017.5A CN111414006B (en) 2020-03-27 2020-03-27 Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation


Publications (2)

Publication Number Publication Date
CN111414006A CN111414006A (en) 2020-07-14
CN111414006B true CN111414006B (en) 2023-09-08

Family

ID=71494617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010232017.5A Active CN111414006B (en) 2020-03-27 2020-03-27 Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation

Country Status (1)

Country Link
CN (1) CN111414006B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784211B (en) * 2020-08-04 2021-04-27 中国人民解放军国防科技大学 Cluster-based group multitask allocation method and storage medium
CN112131730B (en) * 2020-09-14 2024-04-30 中国人民解放军军事科学院评估论证研究中心 Fixed-grid analysis method and device for intelligent unmanned system of group
CN113111441B (en) * 2021-04-26 2023-01-31 河北交通职业技术学院 Method for constructing cluster unmanned aerial vehicle task model based on adjacency relation
CN114722946B (en) * 2022-04-12 2022-12-20 中国人民解放军国防科技大学 Unmanned aerial vehicle asynchronous action and cooperation strategy synthesis method based on probability model detection


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9464902B2 (en) * 2013-09-27 2016-10-11 Regents Of The University Of Minnesota Symbiotic unmanned aerial vehicle and unmanned surface vehicle system
US10692385B2 (en) * 2017-03-14 2020-06-23 Tata Consultancy Services Limited Distance and communication costs based aerial path planning
US11782141B2 (en) * 2018-02-05 2023-10-10 Centre Interdisciplinaire De Developpement En Cartographie Des Oceans (Cidco) Method and apparatus for automatic calibration of mobile LiDAR systems

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101286071A (en) * 2008-04-24 2008-10-15 北京航空航天大学 Multiple no-manned plane three-dimensional formation reconfiguration method based on particle swarm optimization and genetic algorithm
WO2017177533A1 (en) * 2016-04-12 2017-10-19 深圳市龙云创新航空科技有限公司 Method and system for controlling laser radar based micro unmanned aerial vehicle
CN106705970A (en) * 2016-11-21 2017-05-24 中国航空无线电电子研究所 Multi-UAV(Unmanned Aerial Vehicle) cooperation path planning method based on ant colony algorithm
EP3349086A1 (en) * 2017-01-17 2018-07-18 Thomson Licensing Method and device for determining a trajectory within a 3d scene for a camera
CN107632614A (en) * 2017-08-14 2018-01-26 广东技术师范学院 A kind of multiple no-manned plane formation self-organizing cooperative control method theoretical based on rigidity figure
KR20190086081A (en) * 2018-01-12 2019-07-22 한국과학기술원 Multi­layer­based coverage path planning algorithm method of unmanned aerial vehicle for three dimensional structural inspection and the system thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Online Planning for Multiagent Situational Information Gathering in the Markov Environment; Xin Zhou; IEEE; full text *

Also Published As

Publication number Publication date
CN111414006A (en) 2020-07-14

Similar Documents

Publication Publication Date Title
CN111414006B (en) Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation
CN111367317A (en) Unmanned aerial vehicle cluster online task planning method based on Bayesian learning
Zhao et al. Systemic design of distributed multi-UAV cooperative decision-making for multi-target tracking
CN111126668B (en) Spark operation time prediction method and device based on graph convolution network
CN110059385B (en) Grid dynamics scenario simulation method and terminal equipment coupled with different-speed growth
Kyriakakis et al. A cumulative unmanned aerial vehicle routing problem approach for humanitarian coverage path planning
CN113780584B (en) Label prediction method, label prediction device, and storage medium
EP3789938A1 (en) Virtual intelligence and optimization through multi-source, real-time, and context-aware real-world data
Ma et al. Hierarchical reinforcement learning via dynamic subspace search for multi-agent planning
CN114578860A (en) Large-scale unmanned aerial vehicle cluster flight method based on deep reinforcement learning
Zhou et al. Online planning for multiagent situational information gathering in the Markov environment
Chen et al. Multi-agent patrolling under uncertainty and threats
CN113566831A (en) Unmanned aerial vehicle cluster navigation method, device and equipment based on human-computer interaction
CN113554680A (en) Target tracking method and device, unmanned aerial vehicle and storage medium
CN110824496B (en) Motion estimation method, motion estimation device, computer equipment and storage medium
CN116468831A (en) Model processing method, device, equipment and storage medium
CN114445692B (en) Image recognition model construction method and device, computer equipment and storage medium
Wu et al. Adaptive submodular inverse reinforcement learning for spatial search and map exploration
US20220107628A1 (en) Systems and methods for distributed hierarchical control in multi-agent adversarial environments
Parisotto Meta reinforcement learning through memory
CN115327926A (en) Multi-agent dynamic coverage control method and system based on deep reinforcement learning
CN110727291B (en) Centralized cluster reconnaissance task planning method based on variable elimination
Weng et al. Big data and deep learning platform for terabyte-scale renewable datasets
Khanzhahi et al. Deep reinforcement learning issues and approaches for the multi-agent centric problems
Toubeh et al. Risk-aware planning by confidence estimation using deep learning-based perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant