CN111414006A - Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential distribution - Google Patents


Publication number
CN111414006A
Authority
CN
China
Prior art keywords
information
unmanned aerial vehicle
generating
value function
Prior art date
Legal status
Granted
Application number
CN202010232017.5A
Other languages
Chinese (zh)
Other versions
CN111414006B (en)
Inventor
王维平
周鑫
王彦锋
井田
王涛
李小波
黄美根
杨松
李童心
段婷
刘国杰
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202010232017.5A
Publication of CN111414006A
Application granted
Publication of CN111414006B
Active (current legal status)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10: Simultaneous control of position or course in three dimensions
    • G05D1/101: Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/104: Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The scheme relates to an unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential distribution. The method comprises the following steps: acquiring environment information and generating an undirected graph according to the environment information; generating an information state transition model according to the environment information and the undirected graph, and acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model; generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information, the global value function being used for calculating an execution strategy for each unmanned aerial vehicle; and acquiring a planning algorithm and calculating a target execution strategy for each unmanned aerial vehicle according to the planning algorithm and the global value function. Because the undirected graph consists of vertices and edges, each unmanned aerial vehicle executes its task on the specified vertices or along the motion boundaries and can therefore collect more valuable information. The global value function is generated through the information state transition model and the target execution strategy is then calculated with the planning algorithm, which yields the reconnaissance route of each unmanned aerial vehicle and improves the precision of unmanned aerial vehicle cluster task planning.

Description

Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential distribution
Technical Field
The invention relates to the technical field of unmanned aerial vehicle mission planning, in particular to an unmanned aerial vehicle cluster reconnaissance mission planning method and system based on distributed sequential distribution, computer equipment and a storage medium.
Background
With the continuous development of unmanned aerial vehicle technology, unmanned aerial vehicles play an increasingly important role in both the civilian and military fields. A drone swarm is a typical multi-agent system that can be controlled autonomously or remotely and can perform tasks without a pilot. Unmanned aerial vehicles have significant advantages over manned aircraft in performing dull, dirty, and dangerous tasks; compared with a manned airplane, an unmanned aerial vehicle is low in cost, small in size, and highly survivable, characteristics that make it well suited to emergency rescue and give it broad prospects. As practical applications deepen, unmanned aerial vehicle emergency rescue is developing toward clustered and specialized operation, and the rescue tasks it undertakes are becoming harder and more complicated. Multi-UAV autonomous cooperative control architectures are usually divided into two types: centralized control architectures and distributed control architectures. Centralized control has the advantage of obtaining a globally optimal solution, while distributed control has the advantages of high reliability, a small computational burden, and low communication traffic.
Because the working environment of unmanned aerial vehicles often changes rapidly and dynamically, especially under complex conditions such as poor communication, an unmanned aerial vehicle cluster often needs to make decisions and execute actions rapidly, and task planning therefore needs to be performed for the cluster in advance. The traditional approach to unmanned aerial vehicle cluster task planning generally builds different optimization models with centralized or distributed methods and handles the task planning problem of simple multi-UAV, multi-task scenarios.
Disclosure of Invention
Based on this, in order to solve the above technical problems, a distributed sequential allocation-based unmanned aerial vehicle cluster reconnaissance task planning method, a distributed sequential allocation-based unmanned aerial vehicle cluster reconnaissance task planning system, a computer device and a storage medium are provided, so that the precision of unmanned aerial vehicle cluster task planning can be improved.
A distributed sequential distribution-based unmanned aerial vehicle cluster reconnaissance mission planning method comprises the following steps:
acquiring environment information, and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
In one embodiment, the generating an undirected graph according to the environment information includes:
extracting environmental space characteristics in the environmental information;
determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics;
and generating the undirected graph according to the motion boundary and the motion vertex.
In one embodiment, the generating an information state transition model according to the environment information and the undirected graph includes:
acquiring a time step according to the environment information;
acquiring environment state change information according to the time step and the undirected graph;
and generating a state transition matrix based on the Markov chain and the environment state change information, and obtaining the information state transition model.
In one embodiment, the generating a global value function corresponding to the cluster of drones according to the state information includes:
generating total state information of the unmanned aerial vehicle cluster according to the state information;
respectively generating a local return value function of each unmanned aerial vehicle according to the total state information and the state information;
and generating the global value function according to each local return value function.
In one embodiment, the method further comprises:
establishing a TD-POMDP frame according to the state information and the global value function;
the calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function respectively comprises:
and respectively calculating the target execution strategy of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
An unmanned aerial vehicle cluster mission planning system, the system comprising:
the undirected graph generating module is used for acquiring environment information and generating an undirected graph according to the environment information;
the model generation module is used for generating an information state transition model according to the environment information and the undirected graph and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
the function generation module is used for generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and the calculation module is used for acquiring a planning algorithm and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring environment information, and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring environment information, and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
According to the above unmanned aerial vehicle cluster reconnaissance task planning method, system, computer device and storage medium based on distributed sequential distribution, environment information is acquired and an undirected graph is generated according to the environment information; an information state transition model is generated according to the environment information and the undirected graph, and the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster is acquired according to the information state transition model; a global value function corresponding to the unmanned aerial vehicle cluster is generated according to the state information, the global value function being used for calculating an execution strategy for each unmanned aerial vehicle; and a planning algorithm is acquired and the target execution strategy of each unmanned aerial vehicle is calculated according to the planning algorithm and the global value function. Because the undirected graph consists of vertices and edges, each unmanned aerial vehicle executes its task on the specified vertices or along the motion boundaries and can collect more valuable information. The global value function is generated through the information state transition model and the target execution strategy is calculated according to the planning algorithm, which yields the reconnaissance route of each unmanned aerial vehicle and improves the precision of unmanned aerial vehicle cluster task planning.
Drawings
FIG. 1 is a diagram of an application environment for unmanned aerial vehicle cluster mission planning in one embodiment;
fig. 2 is a schematic flow chart of a method for planning a cluster scout mission of an unmanned aerial vehicle based on distributed sequential allocation in one embodiment;
FIG. 3 is a diagram of a Markov chain-based information state transition model in one embodiment;
FIG. 4 is a schematic diagram of a different number of drone reconnaissance areas in one embodiment;
FIG. 5 is a diagram comparing the average return values of the algorithms in experimental scenario one;
FIG. 6 is a diagram comparing the average return values of the algorithms in experimental scenarios two and three;
FIG. 7 is a block diagram of an embodiment of unmanned aerial vehicle cluster mission planning architecture;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential distribution can be applied to the application environment shown in fig. 1. As shown in fig. 1, the application environment includes a computer device 110 and a drone 120, wherein the computer device 110 and the drone 120 may be connected by a wireless connection. The computer device 110 may obtain the environment information and generate an undirected graph according to the environment information; the computer device 110 may generate an information state transition model according to the environmental information and the undirected graph, and respectively obtain state information of each unmanned aerial vehicle 120 in the unmanned aerial vehicle 120 cluster according to the information state transition model; the computer device 110 may generate a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used to calculate the execution policy of the drone 120; the computer device 110 may obtain the planning algorithm, and respectively calculate the target execution policy of each drone 120 according to the planning algorithm and the global value function. The computer device 110 may be, but is not limited to, various personal computers, notebook computers, smart phones, robots, tablet computers, portable wearable devices, and the like.
In one embodiment, as shown in fig. 2, there is provided a method for unmanned aerial vehicle cluster scout mission planning based on distributed sequential allocation, including the following steps:
step 202, obtaining environment information, and generating an undirected graph according to the environment information.
The environmental information may include characteristics of the physical environment, which may be determined by the spatiotemporal characteristics of the physical environment. The spatiotemporal characteristics of the physical environment may include spatial characteristics, temporal characteristics, and the like. An undirected graph is a graph in which every edge has no direction; the computer device can generate the undirected graph from the spatial features in the environment information, that is, the undirected graph represents the spatial features of the environment.
And 204, generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model.
The environmental information may be affected by various factors, such as cloud cover, rainfall, and actual temperature. When the environmental information differs, the state information of the unmanned aerial vehicles also differs. Information can be used to represent the degree of change of the data of interest: when the data of interest in a region changes, the uncertainty about the difference between the recorded data and the unknown data increases, and the acquired state information of the unmanned aerial vehicles changes accordingly. The information state transition model can calculate the state information of each unmanned aerial vehicle from the collected environment information.
Step 206, generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating the execution strategy of the unmanned aerial vehicle.
The global value function may be used to calculate an execution policy of the drone, where the execution policy of the drone may include a movement area, a movement time, a movement route, and the like of the drone. After the state information of each unmanned aerial vehicle is obtained, the information value of each unmanned aerial vehicle can be calculated, so that the sum of the information values of the unmanned aerial vehicle cluster is obtained, and then a global value function corresponding to the unmanned aerial vehicle cluster is generated according to the sum of the information values.
And 208, acquiring a planning algorithm, and respectively calculating a target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
The planning algorithm may be the Factorized Belief-based Sequential Allocation Monte Carlo Planning (FB-SAMCP) algorithm; FB-SAMCP can effectively resolve conflicts between unmanned aerial vehicles and improve the return value of the cluster. The target execution policy may be used to represent the optimal policy under the global value function.
In the embodiment, an undirected graph is generated by acquiring the environment information and according to the environment information; generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model; generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle; and acquiring a planning algorithm, and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function. Because the undirected graph consists of vertexes and edges, the unmanned aerial vehicle executes tasks on the specified vertexes or in the boundary, and more valuable information can be collected; the global value function is generated through the information state transition model, the target execution strategy is calculated according to the planning algorithm, the reconnaissance route of each unmanned aerial vehicle can be obtained, and therefore the precision of unmanned aerial vehicle cluster task planning can be improved.
In one embodiment, the provided unmanned aerial vehicle cluster scout mission planning method based on distributed sequential allocation may further include a process of generating an undirected graph, where the specific process includes: extracting environmental space characteristics in the environmental information; determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics; and generating an undirected graph according to the motion boundary and the motion vertex.
The environmental spatial features may be represented as an undirected graph, denoted G = <V, E>. The vertex set V is a set of Euclidean space coordinates; the edge set E represents the set of motion boundaries along which the drones can move back and forth. A motion vertex can be used to represent an important point target or an area target, the size of a target area can be partitioned manually according to the real scene, and the number of motion vertices is denoted |V|. In a real environment, adjacent motion vertices may be mutually unreachable due to weather and terrain constraints.
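The construction of such a graph can be illustrated with a small sketch. The following Python snippet is a minimal, hypothetical example; the vertex coordinates, the edge list, and the class and method names are assumptions for illustration and not part of the patent.

```python
# Minimal sketch of building the reconnaissance graph G = <V, E>.
# Vertex coordinates and the edge list below are illustrative assumptions.

class ReconGraph:
    def __init__(self, coords, edges):
        self.coords = coords                      # V: vertex id -> Euclidean coordinate
        self.adj = {v: set() for v in coords}     # adjacency built from motion boundaries E
        for u, v in edges:                        # undirected: add both directions
            self.adj[u].add(v)
            self.adj[v].add(u)

    def neighbors(self, v):
        """Vertices reachable from v in one time step."""
        return self.adj[v]

# Example environment: 4 motion vertices, 4 motion boundaries.
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
G = ReconGraph(coords, edges)
print(G.neighbors(1))   # {0, 2}
```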
In one embodiment, the provided unmanned aerial vehicle cluster scout mission planning method based on distributed sequential allocation may further include a process of generating an information state transition model, where the specific process includes: acquiring a time step according to the environment information; acquiring environment state change information according to the time step and the undirected graph; and generating a state transition matrix based on the Markov chain and the environment state change information, and obtaining an information state transition model.
Since the spatiotemporal features of the physical environment may include temporal features, the temporal features may be abstracted into discrete time steps, denoted t ∈ {0, 1, 2, ...}.
The computer device can acquire the time step from the environment information, and then acquire the environment state change information according to the time step and the undirected graph. Specifically, the environment state change information may include a plurality of levels, denoted I_k ∈ {I_1, I_2, ..., I_N}, where I_k indicates the k-th environmental state change information level and N indicates the number of levels. The information state value can be used to quantitatively describe the level of environmental state change information, denoted F_k ∈ {F_1, F_2, ..., F_N}, where F_k = f(I_k) and f: I_k → R^+. The larger k is, the more unknown data the environmental state change information level I_k contains, i.e. F_1 < F_2 < ... < F_N. A Markov chain is a stochastic process in probability theory and mathematical statistics that has the Markov property and is defined on a discrete index set and state space; it is a common way of describing the dynamics of an environment. In this embodiment, it may be assumed that the environmental state change information transition of each motion vertex follows a distinct, independent, discrete-time multi-state Markov chain. The computer device may generate a state transition matrix based on the multi-state Markov chain and the environmental state change information, where the generated state transition matrix is

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & \cdots & p_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{pmatrix}.$$
the state transition matrix may be a random matrix, where pijRepresents slave state IiTransition to State IjThe probability of (c). In this embodiment, prior information needs to be collected from different information sources before the unmanned aerial vehicle is dispatched to execute a task, and the state transition matrix is assigned after the prior information is preprocessed by a machine learning technique.
In this embodiment, if some motion vertices are not visited by any drone, their unknown information and information values may increase over time. In general, for two different motion vertices, if one has a higher information value at the current time, it will tend to have a higher information value at the next time as well. Therefore, the state transition matrix P in this embodiment may be a monotone stochastic matrix. That is, if two N-dimensional probability vectors x and y satisfy

$$\sum_{j=k}^{N} x_j \ge \sum_{j=k}^{N} y_j, \quad k = 1, \ldots, N,$$

then x stochastically dominates y, which can be written x ≻ y. Furthermore, if the rows of P satisfy P_N ≻ P_{N-1} ≻ ... ≻ P_1, then P is a monotone stochastic matrix.
In one embodiment, a Markov chain-based information state transition model is illustrated in FIG. 3, where I_1, I_2, and I_3 indicate the environmental state change information levels.
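As a concrete illustration of this Markov-chain transition, the sketch below samples the next information level of a motion vertex from a row of the transition matrix P and resets it to I_1 when the vertex is visited. The three-level matrix values and the function name are invented for illustration only.

```python
import random

# Hypothetical 3-level monotone transition matrix P (each row sums to 1).
# P[i][j] = probability of moving from level I_{i+1} to level I_{j+1}.
P = [
    [0.7, 0.2, 0.1],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
F = [0.0, 1.0, 2.0]   # information state values F_k = f(I_k), with F_1 < F_2 < F_3

def step_information_level(level, visited):
    """Advance one time step: reset to I_1 if a drone visits the vertex,
    otherwise sample the next level from the Markov chain."""
    if visited:
        return 0                                   # reset to I_1 (no new information)
    return random.choices(range(len(P[level])), weights=P[level])[0]

level = 0
for t in range(5):
    level = step_information_level(level, visited=False)
print("information value after 5 steps:", F[level])
```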
In one embodiment, a drone may be a mobile autonomous entity capable of making decisions and performing actions, with the goal of providing accurate and up-to-date situational information. Denote the set of drones by M and a drone by m_k; drone m_k collects information in a predetermined area, denoted G_k = <V_k, E_k>, where G_k is a subgraph of G, and the reconnaissance areas of different drones may overlap each other. FIG. 4 shows the reconnaissance areas of 4 drones and of 8 drones, respectively, where the black dots represent motion vertices, the black lines represent motion boundaries, the triangles represent drones, and the ellipses represent each drone's reconnaissance area.
In this embodiment, at any moment each drone is on some motion vertex of graph G, and different drones may occupy the same motion vertex at the same time. Each drone moves over the motion vertices and motion boundaries of its predetermined area, and at every time step it may move from its current motion vertex to one of its neighbors. When a drone moves to a motion vertex, the information of that vertex is automatically collected, and at the same time the environmental state change information level of the vertex is reset to I_1, where I_1 indicates that there is no new information at the current time. Due to the drone's limited observation capability, only the information of its current motion vertex at the current moment can be observed.
In this embodiment, the cooperation performance may be used to represent the share of the total return value obtained by each drone when multiple drones visit the same motion vertex at the same time, denoted g: m_k → R^+, m_k ∈ M. The cooperation performance may be expressed as

$$g(m_k) = \begin{cases} 1, & m_k = m_{\text{first}} \\ 0, & \text{otherwise,} \end{cases}$$

where m_first represents the first drone assigned to scout the motion vertex. This expression shows that if several drones scout the same motion vertex simultaneously, the effect equals the reconnaissance effect of a single drone on that vertex.
In one embodiment, the provided unmanned aerial vehicle cluster scout mission planning method based on distributed sequential allocation may further include a process of generating a global value function, where the specific process includes: generating total state information of the unmanned aerial vehicle cluster according to the state information; respectively generating a local return value function of each unmanned aerial vehicle according to the total state information and the state information; and generating a global value function according to each local return value function.
The computer device can aggregate the collected state information of each unmanned aerial vehicle to generate the total state information of the unmanned aerial vehicle cluster.
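The aggregation of local returns into a global value can be sketched as follows. The sketch assumes the cooperation-performance rule reconstructed above (the first drone assigned to a shared vertex receives the full return); the function and variable names are illustrative assumptions, not the patent's implementation.

```python
# Sketch: local return R_k = g(m_k) * f(I_k) and the decomposable global return.
def cooperation_performance(drone, first_assigned):
    """g(m_k): 1 for the first drone assigned to the vertex, 0 for the others."""
    return 1.0 if drone == first_assigned else 0.0

def local_return(drone, vertex_level, first_assigned, F):
    return cooperation_performance(drone, first_assigned) * F[vertex_level]

def global_return(assignments, vertex_levels, F):
    """Sum of local returns over the cluster; `assignments` maps a vertex to the
    list of drones visiting it this step (the first entry is the first assigned)."""
    total = 0.0
    for vertex, drones in assignments.items():
        for d in drones:
            total += local_return(d, vertex_levels[vertex], drones[0], F)
    return total

F = [0.0, 1.0, 2.0]
print(global_return({2: ["m1", "m3"], 5: ["m2"]}, {2: 2, 5: 1}, F))   # 3.0 (= 2.0 + 1.0)
```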
In an embodiment, the unmanned aerial vehicle cluster scout mission planning method based on distributed sequential allocation further includes a process of calculating a target execution policy according to a TD-POMDP framework, where the specific process includes: establishing a TD-POMDP frame according to the state information and the global value function; and respectively calculating the target execution strategy of each unmanned aerial vehicle through the TD-POMDP framework according to a planning algorithm and a global value function.
A Partially Observable Markov Decision Process (POMDP) is a generalization of the Markov decision process, and the POMDP framework can model a variety of real-world sequential processes. The TD-POMDP (transition-decoupled POMDP) framework can be written as <M, S, A, O, T, Z, R, B, H>.
In the TD-POMDP framework, M = {m_1, ..., m_K} is the set of all drones, where m_k represents the k-th drone and K represents the number of drones in the set.
S can be used to represent the state set, which can be decomposed into the position state features of the drones and the information state features of the motion vertices, denoted S = <S^V, S^I>. The global state consists of the local states of all drones, and different drones may share local states. Specifically, s_k^I denotes the information state of each motion vertex in drone m_k's reconnaissance area, where |V_k| is the number of motion vertices in reconnaissance area G_k, and s_k^V denotes drone m_k's position state. Drone m_k's local state is s_k = [s_k^I, s_k^V], and the global state is s = [s^I, s^V] ∈ S. Drone m_k's information state feature is thus the information state of its reconnaissance area, and its position state feature is the motion vertex it currently occupies.
To obtain more global return values, the actions of each drone need to be coordinated with the actions of the other drones.
A = ×_k A_k is the set of joint actions of all drones, where A_k represents the action space of drone m_k. A joint action is denoted a = [a_1, ..., a_K], a_k ∈ A_k. The decision variable a_k represents the action of drone m_k at its current vertex, and the available values of a_k are determined by the topology of the graph.
O = ×_k O_k is the set of joint observations of all drones, where O_k represents the observation space of drone m_k. A joint observation is denoted o = [o_1, ..., o_K], o_k ∈ O_k. Each drone determines its own action independently according to its local information and the exchanged information. The position states of all drones are fully observable. When a drone moves to a motion vertex at time step t, it automatically collects (observes) the information state of that motion vertex; the drone cannot acquire the information state in other situations, for example at other time steps or at other motion vertices.
T is the set of joint state transition probabilities, T(s(t+1) | s(t), a(t)) = ∏_k T_k(s_k(t+1) | s_k(t), a(t)). Drone m_k's information state transition obeys the multi-state discrete-time Markov chain described above, while T_k^V(s_k^V(t+1) | s_k^V(t), a_k(t)) is drone m_k's local position state transition probability: if s_k^V(t+1) is the position state reached from s_k^V(t) under action a_k(t), then T_k^V(s_k^V(t+1) | s_k^V(t), a_k(t)) = 1; otherwise it is 0.
z is a set of joint observation transition probabilities, Z (o (t +1) | a (t), s (t) | ΠkZk(ok(t)|ak(t),sk(t)). If o isk(t)=sk(t), then Zk(ok(t)|ak(t),sk(t)) ═ 1; otherwise Zk(ok(t)|ak(t),sk(t))=0。
R: S×A → R^+ is a decomposable global return value function, R(s, a) = Σ_k R_k(a_k, o_k), the sum of the information values collected by all drones. Drone m_k's local return value function is R_k(a_k, o_k) = g(m_k) f(I_k). In this embodiment the maximized original value function V^π must be solved. Because the return value function is decomposable, the global value function V^π may be factored into the sum of local value functions:

$$V^{\pi}(h) = \sum_{k=1}^{K} V_k^{\pi_k}(h_k),$$

where h_k = [a_k(0), o_k(0), ..., a_k(t), o_k(t), ..., a_k(t+T), o_k(t+T)] is drone m_k's local history, whose dimension 2(t+T+1) is determined by the current time step t and the simulation step T; π_k = [a_k(0), a_k(1), ..., a_k(H-1)] is drone m_k's policy; π = [π_1, π_2, ..., π_K] is the joint policy of all drones; and V_k^{π_k}(h_k) is the expected return value of drone m_k executing policy π_k, i.e. the local value function of executing policy π_k.
B is the belief, comprising an information belief and a position belief, denoted B = <B^V, B^I>. Let B_k be drone m_k's local belief. B^I is uncertain, while B^V is deterministic. At any time step t the belief is a sufficient statistic for computing the optimal policy, and the information states of all motion vertices change independently. The factored information belief is written b = [b_{v_1}, ..., b_{v_|V|}], where the belief of motion vertex v_i is b_{v_i} = [b_{v_i}(I_1), ..., b_{v_i}(I_N)] and the variable b_{v_i}(I_k) refers to the conditional probability that vertex v_i has information state I_k. The factored belief grows only linearly with the number of motion vertices, which greatly reduces the computational complexity. Further, the prediction formula for the information belief of motion vertex v_i is

$$b_{v_i}(t+1) = \begin{cases} [1, 0, \ldots, 0], & v_i = v' \\ b_{v_i}(t)\,P, & \text{otherwise,} \end{cases}$$

where v' represents a motion vertex visited by some drone at time step t.
H ∈ Z^+ represents the planning horizon, i.e. the number of planning steps.
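The TD-POMDP tuple described above can be collected into a single container for planning. The sketch below is only an illustrative skeleton under the definitions above; the field names, example values, and the helper methods are assumptions, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TDPOMDP:
    """Skeleton of the <M, S, A, O, T, Z, R, B, H> tuple for the reconnaissance task."""
    drones: List[str]                    # M: drone identifiers m_1..m_K
    graph: Dict[int, List[int]]          # topology that determines A_k at each vertex
    transition: List[List[float]]        # P: information-state Markov chain
    info_values: List[float]             # F: value of each information level
    horizon: int                         # H: planning steps

    def actions(self, drone: str, vertex: int) -> List[int]:
        """A_k: stay on the current motion vertex or move to a neighbour."""
        return [vertex] + self.graph[vertex]

    def reward(self, level: int, is_first: bool) -> float:
        """R_k(a_k, o_k) = g(m_k) f(I_k)."""
        return self.info_values[level] if is_first else 0.0

model = TDPOMDP(
    drones=["m1", "m2"],
    graph={0: [1], 1: [0, 2], 2: [1]},
    transition=[[0.7, 0.2, 0.1], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],
    info_values=[0.0, 1.0, 2.0],
    horizon=3,
)
print(model.actions("m1", 1))   # [1, 0, 2]
```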
In this embodiment, the planning algorithm may be the Factorized Belief-based Sequential Allocation Monte Carlo Planning (FB-SAMCP) algorithm. Denote by C_k the set of priority drones of drone m_k, and by Π_{C_k} the set of their policies, called drone m_k's priority policy set; drone m_k should consider the policies of the priority drones when making its decision. In this embodiment, the sum of the revised expected return values obtained by executing the drones' policies π_k in sequence is taken as the revised global value function, denoted Ṽ^π(h), and is calculated as

$$\tilde{V}^{\pi}(h) = \sum_{k=1}^{K} \tilde{V}_k^{\pi_k}(h_k),$$

where Ṽ_k^{π_k}(h_k) represents the revised local value function of executing π_k, and R̃_k is the revised return value. The revised global value function Ṽ^π(h) is equivalent to the original global value function V^π(h); the difference between Ṽ^π and V^π lies only in the way they are calculated. First, the revised local value functions are calculated sequentially in the order of the drones, and the revised global value function equals the sum of the revised local value functions of all drones. Second, the original global value function V^π is calculated over time: at each time step t the expected local return value E^π(R_k(t)) of each drone and the expected global return value E^π(R(t)) of all drones are calculated, and V^π is the sum of the expected return values E^π(R(t)) from t = 0 to t = H-1.
Each revised local value function depends on local state characteristics that may be affected by other drones. In this embodiment, the impact of other drones is reflected in the return value with a penalty factor. The factorized revised global value function decomposes the global prediction tree into a number of local look-ahead trees.
In this embodiment, the FB-SAMCP algorithm consists of three procedures: the sequential allocation procedure, the search procedure, and the simulation procedure. Each unmanned aerial vehicle executes the FB-SAMCP algorithm in parallel at each time step, and the coordination of actions is completed after multiple iterations; the drones' actions are coordinated after each iteration, i.e. after the search and expansion of the look-ahead tree is completed.
Drone m_k first executes the sequential allocation procedure, and the number of iterations executed does not exceed K. In each iteration, h_k is initialized, and the drone then executes the search procedure to obtain the optimal policy π_k and value function V_k conditioned on its priority policy set Π_{C_k}. Drone m_k transmits π_k and V_k to the other drones and receives π_(k) and V_(k) from the other drones; after the n-th iteration, drone m_k needs to wait for messages from K-n drones. After comparing V_(k) with V_k, drone m_k stores the drone corresponding to the maximum value function and its policy, denoted m* and π*, respectively. If the drone corresponding to the maximum value function is itself, drone m_k finishes the search and takes π_k as its policy for the current time step; otherwise, drone m_k adds m* and π* to C_k and Π_{C_k}, respectively, and its belief B_k(h_k) is updated according to Π_{C_k}.
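The sequential allocation procedure can be sketched as the following loop. Here `local_search` stands in for the Monte Carlo search procedure described next and `exchange` stands in for communication with the other drones; both are hypothetical placeholders, so the sketch illustrates the coordination pattern rather than the patent's code. In the full algorithm a drone that has already committed its policy no longer competes in later iterations; the sketch omits that bookkeeping for brevity.

```python
# Sketch of the FB-SAMCP sequential allocation loop for drone m_k.
def sequential_allocation(me, all_drones, local_search, exchange, K):
    priority_drones = []          # C_k: drones whose policies have priority
    priority_policies = {}        # Pi_{C_k}: their committed policies
    policy, value = None, float("-inf")
    for _ in range(K):            # at most K iterations
        policy, value = local_search(me, priority_policies)   # best pi_k, V_k given Pi_{C_k}
        others = exchange(me, policy, value)                   # send (pi_k, V_k), receive others'
        best_drone, best_policy, best_value = max(
            [(me, policy, value)] + others, key=lambda x: x[2])
        if best_drone == me:
            return policy          # m_k commits pi_k for the current time step
        priority_drones.append(best_drone)                     # add m* to C_k
        priority_policies[best_drone] = best_policy            # add pi* to Pi_{C_k}
    return policy
```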
In the search procedure, the local look-ahead tree is expanded based on the priority policy set Π*. First, the local belief B_k(h_k) is updated: the local information state beliefs of the vertices visited under Π* are reset to the deterministic belief concentrated on I_1. Then the information state is sampled and simulated until a termination condition is reached. If the drone's action is deterministic, the position state and its observation are also deterministic; moreover, the observation of the information state is directly reflected in the return value. The planning step size is successively reduced in the loop to ensure that, when the simulator G is executed, the priority policy set Π* has the same depth as drone m_k's policy. In the simulator G, two types of conflicts need to be considered: synchronous repeat counting and asynchronous repeat counting. Synchronous repeat counting means that when several drones visit the same motion vertex at the same time, each of them receives the return value of that motion vertex. In asynchronous repeat counting, drone m_k decides at time step t_1 ∈ {0, 1, ..., H-1} to visit a motion vertex v that a priority drone m_j ∈ C_k has already decided, at time step t_2 ∈ {0, 1, ..., H-1} with t_1 < t_2, to visit. In this case drone m_k's expected return for visiting the motion vertex is overestimated, because it does not take into account that the higher-priority drone m_j has already decided to visit it. To resolve this conflict, a penalty factor p_e is introduced to penalize drone m_k's overestimated return value; that is, p_e denotes the loss in drone m_j's return value caused by drone m_k visiting motion vertex v at t_1. The penalty factor p_e at time step t_1 is

$$p_e(t_1) = R(t_2) - \tilde{R}(t_2),$$

where R(t_2) is the sampled return value of drone m_j visiting motion vertex v at time step t_2 without considering drone m_k, while R̃(t_2) is the sampled return value of drone m_j visiting motion vertex v at time step t_2 when drone m_k is considered. Let t_2 be the time step closest to t_1 in the asynchronous-iteration calculation. Suppose that at time step t_1 the information state of motion vertex v is I_i ∈ I; then the return value is R(t_1) = f(I_i). When drone m_k visits the vertex, the information state is reset to I_1, and both I_i and I_1 undergo Δt = t_2 - t_1 state transitions. Denote the two resulting information states at time step t_2 by I_i' and I_1'. Therefore R(t_2) = f(I_i') and R̃(t_2) = f(I_1'). Drone m_k's revised return value at t_1 is

$$\tilde{R}(t_1) = R(t_1) - p_e(t_1).$$

In this embodiment, the expected sampled belief tends to the true belief as the number of samples increases. Furthermore, let the belief of motion vertex v at time step t_1 be b(t_1), let e_1 = [1, 0, ..., 0] be the belief after the reset to I_1, and let F = [F_1, ..., F_N]^⊤ be the vector of information state values. The expected original return value at time step t_2 is then equal to b(t_1) P^{Δt} F, the expected revised return value at time step t_2 is equal to e_1 P^{Δt} F, and the expected penalty factor is (b(t_1) - e_1) P^{Δt} F.
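The expected penalty for asynchronous repeat counting can be computed directly from the Markov chain, as in the sketch below, which follows the expected-penalty expression reconstructed above. The matrix values, belief vector, and helper names are illustrative assumptions, not the patent's implementation.

```python
# Sketch of the expected penalty factor for asynchronous repeat counting:
# E[p_e] = (b(t1) - e1) P^{dt} F, where dt = t2 - t1 and e1 = [1, 0, ..., 0].
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])        # illustrative information-state transition matrix
F = np.array([0.0, 1.0, 2.0])          # information state values

def expected_penalty(belief_t1, t1, t2):
    dt = t2 - t1
    Pdt = np.linalg.matrix_power(P, dt)
    e1 = np.zeros(len(F)); e1[0] = 1.0                 # belief after m_k resets the vertex to I_1
    original = belief_t1 @ Pdt @ F                     # E[R(t2)] without considering m_k
    revised = e1 @ Pdt @ F                             # E[R~(t2)] considering m_k
    return original - revised                          # E[p_e]

b = np.array([0.2, 0.5, 0.3])          # belief over {I_1, I_2, I_3} at t1
print(expected_penalty(b, t1=0, t2=2))
```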
In one embodiment, the process and results of the experiment using the present solution are as follows:
In the experiments, FB-SAMCP was compared with POMCP, TD-FMOP and SA-POMCP. POMCP is currently the most advanced general online planning algorithm. The TD-FMOP algorithm combines MCTS and Max-Sum methods to solve the online planning problem of a loosely coupled, distributed, fixed-connection unmanned aerial vehicle cluster, and requires frequent communication among the unmanned aerial vehicles to ensure that the overall performance is approximately optimal. SA-POMCP is an extension of FB-SAMCP; the distinction between SA-POMCP and FB-SAMCP is the way the belief is represented: SA-POMCP uses a particle filter, while FB-SAMCP uses the factored representation. SA-POMCP is a general algorithm for solving TD-POMDPs and can be applied to more complex problems in which the belief is difficult to express. Each round of simulation was 100 time steps long, and each algorithm was run for 50 rounds per scenario. The experiments evaluated the performance of each algorithm by recording the average return value and the average run time of each round. The run time of each round of an algorithm was limited to 30 minutes. All experiments were run on a computer with a 2.6 GHz Intel dual-core CPU and 4 GB of memory.
The experiments mainly evaluate the influence of scalability on FB-SAMCP, POMCP, TD-FMOP and SA-POMCP, and three scenarios are constructed. Scenario one: as shown in fig. 5, the area has 14 motion vertices and 25 motion boundaries, and 4 drones perform a reconnaissance mission in the designated area, each drone having 2 neighbors. Scenario two: as shown in fig. 6, the area has 40 motion vertices and 83 motion boundaries, and 12 drones perform a reconnaissance mission in the designated area, each drone having 2 neighbors. Scenario three: as shown in fig. 6, the area has 40 motion vertices and 83 motion boundaries, and 12 drones perform a reconnaissance mission in the designated area, each drone having 11 neighbors.
Scenario one is a small-scale unmanned aerial vehicle reconnaissance scenario with a weakly coupled, distributed, fixed-connection structure. Compared with scenario one, the cluster in scenario two still has a weakly coupled, distributed, fixed-connection structure, but the number of unmanned aerial vehicles is larger. Compared with scenario two, in scenario three the coupling degree of the unmanned aerial vehicles is expanded from weak coupling to tight coupling. For all scenarios, the planning horizon H of the drones is 3 time steps. Each motion vertex has three information states, with the information state value vector set to F = [0, 1, 2], corresponding to the information states I = [I_1, I_2, I_3].
For scenario one, FIG. 5 depicts the average return value. The experimental results show that the return values of FB-SAMCP are respectively 6.0%, 15.0% and 8.3% better than those of POMCP in simulations with 50, 500 and 5000 samples. In addition, the return value of FB-SAMCP is slightly lower than that of TD-FMOP with 50 samples, but about 2.4% higher than TD-FMOP with 100 samples and about 5.4% higher with 1000 simulations. The performance of FB-SAMCP is slightly better than that of SA-POMCP in all of these settings.
Table 1 reports the run times of these algorithms in scenario one, where the "-" symbol indicates that the run exceeded the time limit or overflowed memory, and NoS indicates the number of samples. The run time of POMCP is much lower than that of FB-SAMCP. FB-SAMCP's run time is about one third that of TD-FMOP and about twice that of SA-POMCP in all simulations. Furthermore, TD-FMOP exceeded the time limit in the 5000-sample simulation experiment.
Table 1:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 23.9 9.1 0.8 4.5
100 50.0 17.9 1.5 9.5
500 267.7 89.0 7.2 45.7
1000 549.2 176.3 14.0 97.4
5000 - 777.5 75.3 473.4
and the second scenario evaluates the influence of the expandability of the number of the unmanned aerial vehicles on the performance of the algorithm. FIG. 6 depicts the average reward value for each algorithm at different sampling times in scenario two. When the POMCP is operated, the result cannot be calculated due to insufficient memory of a computer. Although the average reward value of FB-SAMCP is 97.0% of TD-FMOP in 50 samples, it is 3.5% higher than the average reward value of TD-FMOP in 500 samples and 1000 samples. The average recovery value of FB-SAMCP is similar to that of SA-POMCP.
Table 2 reports the average run times of the algorithms in scenario two, where the symbol "-" indicates that the run exceeded the time limit or overflowed memory. Similar to the results in scenario one, FB-SAMCP's run time is approximately twice that of SA-POMCP, but approximately one third that of TD-FMOP. In fact, executing TD-FMOP takes a lot of time because the drones need frequent communication and action synchronization when making joint decisions.
Table 2:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 84.6 25.9 - 13.9
100 157.4 50.3 - 26.9
500 841.3 238.6 - 143.8
1000 1716.3 456.3 - 272.2
and a third scene evaluates the influence of the expandability of the coupling degree of the unmanned aerial vehicle on the algorithm. Table 3 and table 4 show the average run time and average return values, respectively. When the POMCP is operated, the result cannot be calculated due to insufficient memory of a computer. From the results, it is known that the average return of FB-SAMCP is similar to that of SA-POMCP, while the average operation time is higher than that of SA-POMCP.
Table 3:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 - 106.4 - 63.9
100 - 208.5 - 126.3
500 - 1043.0 - 584.7
table 4:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 - 1115.5 - 1108.1
100 - 1153.5 - 1150.2
500 - 1206.4 - 1197.5
when constructing the look-ahead tree of the POMCP, the joint action of all unmanned aerial vehicles needs to be considered; while the actions of the neighbor drones need to be considered when building the look-ahead tree of TD-FMOP. For FB-SAMCP and SA-popcp, the local look-ahead tree for each drone has a lower branching factor because it only includes the actions of the drone itself. Therefore, FB-SAMCP and SA-POMCP still have excellent performance in a small number of sampling times. In addition, due to the lower branching factor, FB-SAMCP and SA-POMCP have better scalability in the number and coupling degree of the drones than POMCP.
It should be understood that, although the steps in the above-described flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the above flowcharts may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and the order of performing these sub-steps or stages is not necessarily sequential; they may be performed alternately or in turn with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 7, there is provided an unmanned aerial vehicle cluster mission planning system, including: undirected graph generation module 710, model generation module 720, function generation module 730, and computation module 740, wherein:
and an undirected graph generating module 710, configured to obtain the environment information, and generate an undirected graph according to the environment information.
And the model generation module 720 is configured to generate an information state transition model according to the environmental information and the undirected graph, and obtain state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model.
The function generation module 730 is configured to generate a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating the execution strategy of the unmanned aerial vehicle.
The calculating module 740 is configured to obtain a planning algorithm, and calculate a target execution policy of each drone according to the planning algorithm and the global value function.
In one embodiment, the undirected graph generation module 710 comprises a feature extraction module, an information determination module, and an image generation module, wherein:
and the characteristic extraction module is used for extracting the environmental space characteristics in the environmental information.
And the information determining module is used for determining the motion boundary and the motion vertex of the unmanned aerial vehicle according to the environmental space characteristics.
And the image generation module is used for generating an undirected graph according to the motion boundary and the motion vertex.
In one embodiment, the function generation module 730 includes a step size obtaining module, an information obtaining module, and a matrix generation module, wherein:
and the step length obtaining module is used for obtaining the time step length according to the environment information.
And the information acquisition module is used for acquiring the environment state change information according to the time step and the undirected graph.
And the matrix generation module is used for generating a state transition matrix based on the Markov chain and the environment state change information and obtaining an information state transition model.
In one embodiment, the function generation module 730 is further configured to generate total status information of the drone cluster according to the status information; respectively generating a local return value function of each unmanned aerial vehicle according to the total state information and the state information; and generating a global value function according to each local return value function.
In an embodiment, the provided unmanned aerial vehicle cluster mission planning system further includes a frame establishing module, configured to establish a TD-POMDP frame according to the state information and the global value function; the calculating module 740 is further configured to calculate the target execution policy of each drone through the TD-POMDP framework according to the planning algorithm and the global value function.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method for unmanned aerial vehicle cluster scout mission planning based on distributed sequential allocation. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring environment information, and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
In one embodiment, the processor, when executing the computer program, further performs the steps of: extracting environmental space characteristics in the environmental information; determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics; and generating an undirected graph according to the motion boundary and the motion vertex.
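Purely for illustration, the undirected graph over the motion vertices can be held as a simple adjacency structure whose edges respect the motion boundary; the vertex labels and helper name below are hypothetical.

```python
def build_undirected_graph(vertices, boundary_edges):
    """Build an undirected graph as an adjacency dictionary from the motion
    vertices and the edges permitted by the motion boundary, both of which
    are assumed to come from the environmental space feature extraction."""
    graph = {v: set() for v in vertices}
    for u, w in boundary_edges:
        graph[u].add(w)   # undirected: record the edge in both directions
        graph[w].add(u)
    return graph

# Example: four reconnaissance vertices connected along the motion boundary
vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
G = build_undirected_graph(vertices, edges)
assert "B" in G["A"] and "A" in G["B"]
```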
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring a time step according to the environment information; acquiring environment state change information according to the time step and the undirected graph; and generating a state transition matrix based on the Markov chain and the environment state change information, and obtaining an information state transition model.
In one embodiment, the processor, when executing the computer program, further performs the steps of: generating total state information of the unmanned aerial vehicle cluster according to the state information; respectively generating a local return value function of each unmanned aerial vehicle according to the total state information and the state information; and generating a global value function according to each local return value function.
In one embodiment, the processor, when executing the computer program, further performs the steps of: establishing a TD-POMDP framework according to the state information and the global value function; and respectively calculating the target execution strategy of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring environment information, and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
In one embodiment, the computer program when executed by the processor further performs the steps of: extracting environmental space characteristics in the environmental information; determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics; and generating an undirected graph according to the motion boundary and the motion vertex.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a time step according to the environment information; acquiring environment state change information according to the time step and the undirected graph; and generating a state transition matrix based on the Markov chain and the environment state change information, and obtaining an information state transition model.
In one embodiment, the computer program when executed by the processor further performs the steps of: generating total state information of the unmanned aerial vehicle cluster according to the state information; respectively generating a local return value function of each unmanned aerial vehicle according to the total state information and the state information; and generating a global value function according to each local return value function.
In one embodiment, the computer program when executed by the processor further performs the steps of: establishing a TD-POMDP framework according to the state information and the global value function; and respectively calculating the target execution strategy of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
It will be understood by those of ordinary skill in the art that all or a portion of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored on a non-volatile computer-readable storage medium and which, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An unmanned aerial vehicle cluster reconnaissance mission planning method based on distributed sequential distribution is characterized by comprising the following steps:
acquiring environment information, and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
2. The method of claim 1, wherein the generating an undirected graph according to the environment information comprises:
extracting environmental space characteristics in the environmental information;
determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics;
and generating the undirected graph according to the motion boundary and the motion vertex.
3. The method of claim 1, wherein generating an information state transition model based on the environmental information and the undirected graph comprises:
acquiring a time step according to the environment information;
acquiring environment state change information according to the time step and the undirected graph;
and generating a state transition matrix based on the Markov chain and the environment state change information, and obtaining the information state transition model.
4. The method of claim 1, wherein the generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information comprises:
generating total state information of the unmanned aerial vehicle cluster according to the state information;
respectively generating a local return value function of each unmanned aerial vehicle according to the total state information and the state information;
and generating the global value function according to each local return value function.
5. The method of claim 1, further comprising:
establishing a TD-POMDP framework according to the state information and the global value function;
the calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function respectively comprises:
and respectively calculating the target execution strategy of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
6. An unmanned aerial vehicle cluster mission planning system, the system comprising:
the undirected graph generating module is used for acquiring environment information and generating an undirected graph according to the environment information;
the model generation module is used for generating an information state transition model according to the environment information and the undirected graph and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
the function generation module is used for generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and the calculation module is used for acquiring a planning algorithm and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
7. The system according to claim 6, wherein the undirected graph generation module comprises:
the characteristic extraction module is used for extracting environmental space characteristics in the environmental information;
the information determining module is used for determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics;
and the graph generation module is used for generating the undirected graph according to the motion boundary and the motion vertex.
8. The system of claim 6, wherein the model generation module comprises:
the step length obtaining module is used for obtaining a time step length according to the environment information;
the information acquisition module is used for acquiring environment state change information according to the time step length and the undirected graph;
and the matrix generation module is used for generating a state transition matrix based on the Markov chain and the environment state change information and obtaining the information state transition model.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
CN202010232017.5A 2020-03-27 2020-03-27 Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation Active CN111414006B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010232017.5A CN111414006B (en) 2020-03-27 2020-03-27 Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010232017.5A CN111414006B (en) 2020-03-27 2020-03-27 Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation

Publications (2)

Publication Number Publication Date
CN111414006A (en) 2020-07-14
CN111414006B CN111414006B (en) 2023-09-08

Family

ID=71494617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010232017.5A Active CN111414006B (en) 2020-03-27 2020-03-27 Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation

Country Status (1)

Country Link
CN (1) CN111414006B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101286071A (en) * 2008-04-24 2008-10-15 北京航空航天大学 Multi-UAV three-dimensional formation reconfiguration method based on particle swarm optimization and genetic algorithm
US20160018224A1 (en) * 2013-09-27 2016-01-21 Regents Of The University Of Minnesota Symbiotic Unmanned Aerial Vehicle and Unmanned Surface Vehicle System
WO2017177533A1 (en) * 2016-04-12 2017-10-19 深圳市龙云创新航空科技有限公司 Method and system for controlling laser radar based micro unmanned aerial vehicle
CN106705970A (en) * 2016-11-21 2017-05-24 中国航空无线电电子研究所 Multi-UAV (unmanned aerial vehicle) cooperative path planning method based on ant colony algorithm
EP3349086A1 (en) * 2017-01-17 2018-07-18 Thomson Licensing Method and device for determining a trajectory within a 3d scene for a camera
US20180268720A1 (en) * 2017-03-14 2018-09-20 Tata Consultancy Services Limited Distance and communication costs based aerial path planning
CN107632614A (en) * 2017-08-14 2018-01-26 广东技术师范学院 Multi-UAV formation self-organizing cooperative control method based on rigid graph theory
KR20190086081A (en) * 2018-01-12 2019-07-22 한국과학기술원 Multi-layer-based coverage path planning algorithm method of unmanned aerial vehicle for three-dimensional structural inspection and the system thereof
US20200041623A1 (en) * 2018-02-05 2020-02-06 Centre Interdisciplinaire De Developpement En Cartographie Des Oceans (Cidco) Method and apparatus for automatic calibration of mobile LiDAR systems

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIN ZHOU: "Online Planning for Multiagent Situational Information Gathering in the Markov Environment", IEEE *
陈少飞 (CHEN Shaofei): "Reconnaissance and surveillance mission planning method for unmanned aerial vehicle cluster systems", pages 21-64 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784211A (en) * 2020-08-04 2020-10-16 中国人民解放军国防科技大学 Cluster-based group multitask allocation method and storage medium
CN111784211B (en) * 2020-08-04 2021-04-27 中国人民解放军国防科技大学 Cluster-based group multitask allocation method and storage medium
CN112131730A (en) * 2020-09-14 2020-12-25 中国人民解放军军事科学院评估论证研究中心 Freezing analysis method and device for group intelligent unmanned system
CN112131730B (en) * 2020-09-14 2024-04-30 中国人民解放军军事科学院评估论证研究中心 Fixed-grid analysis method and device for intelligent unmanned system of group
CN113111441A (en) * 2021-04-26 2021-07-13 河北交通职业技术学院 Method for constructing cluster unmanned aerial vehicle task model based on adjacency relation
CN113111441B (en) * 2021-04-26 2023-01-31 河北交通职业技术学院 Method for constructing cluster unmanned aerial vehicle task model based on adjacency relation
CN114722946A (en) * 2022-04-12 2022-07-08 中国人民解放军国防科技大学 Unmanned aerial vehicle asynchronous action and cooperation strategy synthesis method based on probabilistic model checking

Also Published As

Publication number Publication date
CN111414006B (en) 2023-09-08

Similar Documents

Publication Publication Date Title
CN111414006B (en) Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation
Zhao et al. Systemic design of distributed multi-UAV cooperative decision-making for multi-target tracking
CN110059385B (en) Grid dynamics scenario simulation method and terminal equipment coupled with different-speed growth
Zeng et al. Exploiting model equivalences for solving interactive dynamic influence diagrams
CN111367317A (en) Unmanned aerial vehicle cluster online task planning method based on Bayesian learning
Lanillos et al. Minimum time search for lost targets using cross entropy optimization
Zhou et al. Bayesian reinforcement learning for multi-robot decentralized patrolling in uncertain environments
Kyriakakis et al. A cumulative unmanned aerial vehicle routing problem approach for humanitarian coverage path planning
CN114261400B (en) Automatic driving decision method, device, equipment and storage medium
US11763191B2 (en) Virtual intelligence and optimization through multi-source, real-time, and context-aware real-world data
CN111983923B (en) Formation control method, system and equipment for limited multi-agent system
Zhou et al. Online planning for multiagent situational information gathering in the Markov environment
Fu et al. Mobile robot object recognition in the internet of things based on fog computing
Chen et al. Multi-agent patrolling under uncertainty and threats
CN113566831B (en) Unmanned aerial vehicle cluster navigation method, device and equipment based on human-computer interaction
CN117195976A (en) Traffic flow prediction method and system based on layered attention
Wu et al. Adaptive submodular inverse reinforcement learning for spatial search and map exploration
Panov Simultaneous learning and planning in a hierarchical control system for a cognitive agent
Thangeda et al. Adaptive sampling site selection for robotic exploration in unknown environments
CN110727291B (en) Centralized cluster reconnaissance task planning method based on variable elimination
US20220107628A1 (en) Systems and methods for distributed hierarchical control in multi-agent adversarial environments
Weng et al. Big data and deep learning platform for terabyte-scale renewable datasets
CN113642592B (en) Training method of training model, scene recognition method and computer equipment
Felicioni et al. Goln: Graph object-based localization network
Haggerty et al. What, how, and when? A hybrid system approach to multi-region search and rescue

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant