CN111414006B - Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation - Google Patents

Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation

Info

Publication number
CN111414006B
Authority
CN
China
Prior art keywords
unmanned aerial
information
aerial vehicle
generating
value function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010232017.5A
Other languages
Chinese (zh)
Other versions
CN111414006A (en)
Inventor
王维平
周鑫
王彦锋
井田
王涛
李小波
黄美根
杨松
李童心
段婷
刘国杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202010232017.5A priority Critical patent/CN111414006B/en
Publication of CN111414006A publication Critical patent/CN111414006A/en
Application granted granted Critical
Publication of CN111414006B publication Critical patent/CN111414006B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10 Simultaneous control of position or course in three dimensions
    • G05D1/101 Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/104 Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The scheme relates to an unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation. The method comprises the following steps: acquiring environment information and generating an undirected graph according to the environment information; generating an information state transition model according to the environment information and the undirected graph, and acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model; generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information, the global value function being used for calculating the execution strategies of the unmanned aerial vehicles; and acquiring a planning algorithm, and calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function. Because the undirected graph consists of vertices and edges, the unmanned aerial vehicles execute tasks at designated vertices or within the motion boundaries and can collect more valuable information; the global value function is generated through the information state transition model, and the target execution strategies are calculated according to the planning algorithm to obtain the reconnaissance route of each unmanned aerial vehicle, thereby improving the accuracy of unmanned aerial vehicle cluster task planning.

Description

Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation
Technical Field
The invention relates to the technical field of unmanned aerial vehicle task planning, in particular to an unmanned aerial vehicle cluster reconnaissance task planning method, system, computer equipment and storage medium based on distributed sequential allocation.
Background
With the continuous development of unmanned aerial vehicle technology, unmanned aerial vehicles play an increasingly important role in both the civil and military fields. An unmanned aerial vehicle cluster is a typical multi-agent system that can be controlled autonomously or remotely to perform tasks without pilots. Compared with manned aircraft, unmanned aerial vehicles have outstanding advantages in performing dull, dirty and dangerous tasks, and they also have the characteristics of low cost, small volume and strong survivability; these characteristics give unmanned aerial vehicles broad prospects in emergency rescue. With deepening practical application, unmanned aerial vehicle emergency rescue is developing towards clustering and specialization, and the rescue tasks undertaken are becoming increasingly difficult and complex. Multi-unmanned-aerial-vehicle autonomous cooperative control structures are generally classified into two types: centralized control architectures and distributed control architectures. The centralized control method has the advantage of obtaining the globally optimal solution, while the distributed control method has the advantages of high reliability, low computational load, small communication load and the like.
Because the working environment of an unmanned aerial vehicle often changes dynamically and rapidly, especially under complex conditions such as poor communication, the unmanned aerial vehicle cluster is often required to make decisions and execute actions rapidly, and therefore task planning needs to be performed for the unmanned aerial vehicle cluster in advance. Traditional methods for unmanned aerial vehicle cluster task planning generally use centralized or distributed methods and establish different optimization models to solve the task planning problem of multiple tasks for several simple unmanned aerial vehicles; such methods are only suitable for small-scale unmanned aerial vehicle clusters or clusters with weakly coupled structures.
Disclosure of Invention
Based on the above, in order to solve the above technical problems, the present invention provides a method, a system, a computer device and a storage medium for unmanned aerial vehicle cluster reconnaissance task planning based on distributed sequential allocation, which can improve the precision of unmanned aerial vehicle cluster task planning.
An unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation, the method comprising:
acquiring environment information and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating target execution strategies of each unmanned aerial vehicle according to the planning algorithm and the global value function.
In one embodiment, the generating an undirected graph according to the environmental information includes:
extracting environmental space characteristics in the environmental information;
determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics;
and generating the undirected graph according to the motion boundary and the motion vertex.
In one embodiment, the generating an information state transition model according to the environment information and the undirected graph includes:
acquiring a time step according to the environmental information;
acquiring environmental state change information according to the time step and the undirected graph;
and generating a state transition matrix based on the Markov chain and the environmental state change information, and obtaining the information state transition model.
In one embodiment, the generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information includes:
generating total state information of the unmanned aerial vehicle cluster according to the state information;
generating a local return value function of each unmanned aerial vehicle through the total state information and the state information;
and generating the global value function according to each local return value function.
In one embodiment, the method further comprises:
establishing a TD-POMDP framework according to the state information and the global value function;
the calculating, according to the planning algorithm and the global value function, the target execution policy of each unmanned aerial vehicle includes:
and respectively calculating target execution strategies of the unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
An unmanned aerial vehicle cluster mission planning system, the system comprising:
the undirected graph generating module is used for acquiring the environment information and generating an undirected graph according to the environment information;
the model generation module is used for generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
The function generation module is used for generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and the calculation module is used for acquiring a planning algorithm and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring environment information and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating target execution strategies of each unmanned aerial vehicle according to the planning algorithm and the global value function.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring environment information and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating target execution strategies of each unmanned aerial vehicle according to the planning algorithm and the global value function.
According to the unmanned aerial vehicle cluster reconnaissance task planning method, system, computer equipment and storage medium based on distributed sequential allocation, environment information is acquired and an undirected graph is generated according to the environment information; an information state transition model is generated according to the environment information and the undirected graph, and the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster is acquired according to the information state transition model; a global value function corresponding to the unmanned aerial vehicle cluster is generated according to the state information, the global value function being used for calculating the execution strategies of the unmanned aerial vehicles; and a planning algorithm is acquired, and the target execution strategy of each unmanned aerial vehicle is calculated according to the planning algorithm and the global value function. Because the undirected graph consists of vertices and edges, the unmanned aerial vehicles execute tasks at designated vertices or within the motion boundaries and can collect more valuable information; the global value function is generated through the information state transition model, and the target execution strategies are calculated according to the planning algorithm to obtain the reconnaissance route of each unmanned aerial vehicle, so that the accuracy of unmanned aerial vehicle cluster task planning can be improved.
Drawings
FIG. 1 is an application environment diagram of unmanned aerial vehicle cluster mission planning in one embodiment;
fig. 2 is a flow chart of a method for planning a cluster reconnaissance task of an unmanned aerial vehicle based on distributed sequential allocation in one embodiment;
FIG. 3 is a schematic diagram of a Markov chain-based information state transition model in one embodiment;
FIG. 4 is a schematic illustration of a different number of drone reconnaissance areas in one embodiment;
FIG. 5 is a diagram showing a comparison of the average return values of the algorithms in scenario one in the experiment;
FIG. 6 is a diagram showing a comparison of the average return values of the algorithms in scenarios two and three in the experiment;
FIG. 7 is a block diagram of an unmanned aerial vehicle cluster mission planning system in one embodiment;
fig. 8 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation provided by the embodiment of the application can be applied to an application environment shown in fig. 1. As shown in fig. 1, the application environment includes a computer device 110 and a drone 120, where the computer device 110 and the drone 120 may be connected wirelessly. The computer device 110 may obtain the environmental information and generate an undirected graph based on the environmental information; the computer device 110 may generate an information state transition model according to the environmental information and the undirected graph, and obtain the state information of each unmanned aerial vehicle 120 in the unmanned aerial vehicle 120 cluster according to the information state transition model; the computer device 110 may generate a global value function corresponding to the unmanned aerial vehicle cluster from the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle 120; the computer device 110 may obtain the planning algorithm, and calculate the target execution policy of each of the unmanned aerial vehicles 120 according to the planning algorithm and the global value function. The computer device 110 may be, but is not limited to, various personal computers, notebook computers, smart phones, robots, tablet computers, portable wearable devices, and the like.
In one embodiment, as shown in fig. 2, a method for planning a scout task of a cluster of unmanned aerial vehicles based on distributed sequential allocation is provided, including the following steps:
and 202, acquiring environment information and generating an undirected graph according to the environment information.
The environmental information may include characteristics of the physical environment, which may be determined by the spatiotemporal characteristics of the physical environment, such as spatial features and temporal features. In an undirected graph, every edge has no direction; the computer device may generate the undirected graph according to the spatial features in the environmental information, i.e., the undirected graph is used to represent the spatial features of the environment.
Step 204, generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model.
The environmental information may be affected by various factors, such as cloud cover area, rainfall and actual temperature. When the environmental information differs, the state information of the unmanned aerial vehicles also differs. The information is used to indicate the degree of change of the data of interest: when the data of interest in an area changes, the uncertainty of the difference between the recorded data and the unknown data increases, and the acquired state information of the unmanned aerial vehicles changes accordingly. The information state transition model can calculate the state information of each unmanned aerial vehicle according to the collected environmental information.
Step 206, generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle.
The global value function can be used for calculating an execution strategy of the unmanned aerial vehicle, wherein the execution strategy of the unmanned aerial vehicle can comprise a movement area, movement time, movement route and the like of the unmanned aerial vehicle. After the state information of each unmanned aerial vehicle is obtained, the information value of each unmanned aerial vehicle can be calculated, so that the sum of the information values of the unmanned aerial vehicle clusters is obtained, and then a global value function corresponding to the unmanned aerial vehicle clusters is generated according to the sum of the information values.
Step 208, a planning algorithm is obtained, and target execution strategies of the unmanned aerial vehicles are calculated according to the planning algorithm and the global value function.
The planning algorithm may be the Factored Belief based Sequential Allocated Monte Carlo Planning (FB-SAMCP) algorithm; the FB-SAMCP algorithm can effectively resolve conflicts among the unmanned aerial vehicles and improve the return value of the unmanned aerial vehicle cluster. The target execution policy may be used to represent the optimal policy under the global value function.
In this embodiment, environment information is acquired and an undirected graph is generated according to the environment information; an information state transition model is generated according to the environment information and the undirected graph, and the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster is acquired according to the information state transition model; a global value function corresponding to the unmanned aerial vehicle cluster is generated according to the state information, the global value function being used for calculating the execution strategies of the unmanned aerial vehicles; and a planning algorithm is acquired, and the target execution strategy of each unmanned aerial vehicle is calculated according to the planning algorithm and the global value function. Because the undirected graph consists of vertices and edges, the unmanned aerial vehicles execute tasks at designated vertices or within the motion boundaries and can collect more valuable information; the global value function is generated through the information state transition model, and the target execution strategies are calculated according to the planning algorithm to obtain the reconnaissance route of each unmanned aerial vehicle, so that the accuracy of unmanned aerial vehicle cluster task planning can be improved.
In one embodiment, the unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation may further include a process of generating an undirected graph, and the specific process includes: extracting environmental space features in the environmental information; determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics; and generating an undirected graph according to the motion boundary and the motion vertex.
The environmental spatial features may be represented as an undirected graph, denoted G = <V, E>, where the vertex set V consists of Euclidean space coordinates, and the edge set E is the set of edges of all unmanned aerial vehicle motion boundaries, along which the unmanned aerial vehicles can move back and forth. The motion vertices may be used to represent important point targets or area targets, the size of a target area can be divided manually according to the real scene, and the number of motion vertices is recorded as |V|. In a practical environment, adjacent motion vertices may not be mutually reachable due to weather and terrain constraints.
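As an illustration of this graph representation, the following Python sketch builds an undirected graph from spatial features; the vertex identifiers, coordinate format and the reachability predicate are illustrative assumptions rather than details taken from the patent.

from dataclasses import dataclass, field

@dataclass
class UndirectedGraph:
    coords: dict = field(default_factory=dict)   # vertex id -> Euclidean (x, y) coordinate
    adj: dict = field(default_factory=dict)      # vertex id -> set of adjacent vertices

    def add_vertex(self, v, xy):
        self.coords[v] = xy
        self.adj.setdefault(v, set())

    def add_edge(self, u, v):
        # a motion boundary along which a drone can move back and forth
        self.adj[u].add(v)
        self.adj[v].add(u)

def build_graph(vertices, candidate_edges, reachable):
    """vertices: {id: (x, y)}; candidate_edges: iterable of (u, v) pairs;
    reachable(u, v): hypothetical predicate encoding weather/terrain constraints."""
    g = UndirectedGraph()
    for v, xy in vertices.items():
        g.add_vertex(v, xy)
    for u, v in candidate_edges:
        if reachable(u, v):          # adjacent vertices may be unreachable in practice
            g.add_edge(u, v)
    return g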
In one embodiment, the unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation may further include a process of generating an information state transition model, and the specific process includes: acquiring a time step according to the environmental information; acquiring environmental state change information according to the time step and the undirected graph; based on the Markov chain and the environmental state change information, a state transition matrix is generated, and an information state transition model is obtained.
Since the spatiotemporal features of the physical environment include temporal features, the temporal features may be abstracted into discrete time steps, denoted t ∈ {0, 1, 2, ...}. The information state may change once per time step, and at each time step every drone moves to an adjacent vertex or stays in place. To ensure that each drone is at a vertex at any time t, the time step may be set to the longest time in which all drones complete an observe-orient-decide-act (OODA) cycle. For example, if each drone is able to complete data collection, data preprocessing, decision making and flight to the next target area within 10 minutes, 1 time step in the simulation environment may be set to 10 minutes in the real environment.
The computer device may obtain the time step according to the environmental information, and thereby obtain the environmental state change information according to the time step and the undirected graph. Specifically, the environmental state change information may include a plurality of levels, denoted I_k ∈ {I_1, I_2, ..., I_N}, where I_k represents the kth environmental state change information level and N represents the number of levels. The information state value may be used to quantitatively describe the level of environmental state change information, denoted F_k ∈ {F_1, F_2, ..., F_N}, where F_k = f(I_k), f: I_k → R+. The larger k is, the higher the environmental state change information level I_k and the more unknown data it contains, i.e., F_1 < F_2 < ... < F_N. A Markov chain is a stochastic process in probability theory and mathematical statistics that has the Markov property and is defined on discrete index and state spaces; it is a common way of describing environmental dynamics. In this embodiment, it may be assumed that the environmental state change information transition of each motion vertex follows an independent, discrete-time, multi-state Markov chain. The computer device may generate a state transition matrix based on the multi-state Markov chain and the environmental state change information, where the state transition matrix is P = (p_ij), i, j ∈ {1, ..., N}; P is a stochastic matrix, and p_ij represents the probability of transitioning from state I_i to state I_j. In this embodiment, prior information needs to be collected from different information sources before the unmanned aerial vehicles are dispatched to perform tasks, and the state transition matrix can be assigned after the prior information is preprocessed by machine learning techniques.
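A minimal sketch of this information state model follows; the numerical entries of P and F are placeholders (F = [0, 1, 2] mirrors the experimental setting described later), not values prescribed by the patent.

import numpy as np

P = np.array([[0.6, 0.3, 0.1],     # example row-stochastic transition matrix, N = 3 levels
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
F = np.array([0.0, 1.0, 2.0])      # information state values, F_1 < F_2 < ... < F_N

def step_information_level(level, rng):
    """Sample the next information level given the current 0-based level index."""
    return int(rng.choice(len(F), p=P[level]))

def information_value(level):
    """Map an information level index to its information state value f(I_k)."""
    return float(F[level])

rng = np.random.default_rng(0)
level = 0
for t in range(5):                  # the information state changes once per time step
    level = step_information_level(level, rng)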
In this embodiment, if some motion vertices are not visited by any drone, their unknown data and information values may increase over time. Typically, for two different motion vertices, if one has a higher information value at the current time, it will also tend to have a higher information value at the next time. Therefore, the state transition matrix P in this embodiment may be a monotone stochastic matrix. That is, for two N-dimensional probability vectors x and y, if every tail sum of x is at least the corresponding tail sum of y, then x stochastically dominates y, which can be denoted x > y; and if the rows of P satisfy P_N > P_{N-1} > ... > P_1 under this dominance relation, then P is a monotone stochastic matrix.
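For illustration, the sketch below tests that monotonicity property; the tail-sum formulation of first-order stochastic dominance is standard and is assumed here to match the patent's intent.

import numpy as np

def dominates(x, y):
    """First-order stochastic dominance x > y for two N-dimensional probability vectors:
    every tail sum of x is at least the corresponding tail sum of y."""
    tail_x = np.cumsum(x[::-1])[::-1]
    tail_y = np.cumsum(y[::-1])[::-1]
    return bool(np.all(tail_x >= tail_y - 1e-12))

def is_monotone_stochastic(P):
    """True when each row of P stochastically dominates the row above it."""
    return all(dominates(P[i + 1], P[i]) for i in range(len(P) - 1))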
In one embodiment, a Markov chain-based information state transition model is shown in FIG. 3, where I_1, I_2 and I_3 indicate the environmental state change information levels.
In one embodiment, the drone may be a mobile autonomous entity capable of making decisions and performing actions, with the purpose of providing accurate and up-to-date situational information. Denote the set of unmanned aerial vehicles as M and the kth unmanned aerial vehicle as m_k; drone m_k collects information in a predetermined area G_k = <V_k, E_k>, where G_k is a subgraph of G, and the reconnaissance areas of different drones may overlap each other. As shown in fig. 4, there are 4 unmanned aerial vehicles and 8 unmanned aerial vehicle reconnaissance areas, where black dots represent motion vertices, black lines represent motion boundaries, triangles represent unmanned aerial vehicles, and ellipses represent unmanned aerial vehicle reconnaissance areas.
In this embodiment, at any time, each unmanned aerial vehicle is located on some motion vertex of the graph G, and different unmanned aerial vehicles may occupy the same motion vertex at the same time. Each drone moves among the motion vertices and along the motion boundaries of its predetermined area, and at each time step it may move from its current motion vertex to an adjacent one. When the unmanned aerial vehicle moves to a motion vertex, it automatically collects the information of that vertex, and the environmental state change information level of the vertex is reset to I_1, where I_1 indicates that there is no new information at the current time. Because of its limited observation capability, the unmanned aerial vehicle can only observe the information of its current motion vertex at the current moment.
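Building on the graph and Markov-chain sketches above, the following hypothetical single-drone simulator step illustrates this move/collect/reset cycle; the exact ordering of the reset and the Markov transitions is an assumption.

def simulate_drone_step(position, action, graph, levels, rng):
    """action is either the current vertex (stay in place) or one of its neighbours;
    levels maps each vertex to its current 0-based information level."""
    assert action == position or action in graph.adj[position]
    new_position = action
    reward = information_value(levels[new_position])     # only the occupied vertex is observed
    levels[new_position] = 0                             # reset the visited vertex to I_1
    for v in levels:                                     # unvisited vertices evolve by the Markov chain
        if v != new_position:
            levels[v] = step_information_level(levels[v], rng)
    return new_position, reward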
In this embodiment, the cooperative performance may be used to represent the ratio of the return value obtained by each unmanned aerial vehicle to the total return value when multiple unmanned aerial vehicles access the same motion vertex at the same time, denoted g: m_k → R+, m_k ∈ M. The cooperative performance may be expressed as g(m_k) = 1 if m_k = m_first and g(m_k) = 0 otherwise, where m_first denotes the first drone assigned to scout the motion vertex. This expression shows that if multiple unmanned aerial vehicles reconnoiter the same motion vertex at the same time, the effect is equivalent to a single unmanned aerial vehicle reconnoitering that vertex.
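A one-line sketch of this cooperative-performance term follows; the indicator form is reconstructed from the surrounding description and should be read as an assumption.

def cooperative_performance(drone, drones_at_vertex):
    """drones_at_vertex is ordered by assignment; only the first assigned drone scores."""
    return 1.0 if drones_at_vertex and drone == drones_at_vertex[0] else 0.0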
In one embodiment, the unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation may further include a process of generating a global value function, and the specific process includes: generating total state information of the unmanned aerial vehicle cluster according to the state information; generating local return value functions of each unmanned aerial vehicle respectively through the total state information and the state information; and generating a global value function according to each local return value function.
The computer device may aggregate the collected state information of each unmanned aerial vehicle to generate the total state information of the unmanned aerial vehicle cluster.
In one embodiment, the unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation may further include a process of calculating a target execution policy according to a TD-POMDP framework, and the specific process includes: establishing a TD-POMDP framework according to the state information and the global value function; and respectively calculating target execution strategies of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
Here, the partially observable Markov decision process (POMDP) is a generalization of the Markov decision process, and the POMDP framework can model various real-world decision processes. The TD-POMDP framework may be denoted as <M, S, A, O, T, Z, R, B, H>.
In the TD-POMDP framework, M = {m_1, ..., m_K} is the set of all drones, where m_k represents the kth unmanned aerial vehicle and K represents the number of unmanned aerial vehicles in the set.
S represents the state set, which can be decomposed into the position state features of the unmanned aerial vehicles and the information state features of the motion vertices, denoted S = <S^V, S^I>. The global state consists of the local states of all drones, and different drones may share local states. Specifically, s_k^{I,i} represents the information state of the ith motion vertex in the reconnaissance area of drone m_k, and all information states in the reconnaissance area of drone m_k are denoted s_k^I = [s_k^{I,1}, ..., s_k^{I,|V_k|}], where |V_k| is the number of motion vertices in the reconnaissance area G_k. In addition, let s^I = [s_1^I, ..., s_K^I], s^V = [s_1^V, ..., s_K^V], and s = [s^I, s^V] ∈ S. The local state of drone m_k consists of the information state of its reconnaissance area, s_k^I, and its position state, s_k^V. In order to obtain a higher global return value, the actions of each drone need to be coordinated with the actions of the other drones.
A = ×_k A_k is the joint action set of all unmanned aerial vehicles, where A_k represents the action space of drone m_k. A joint action is denoted a = [a_1, ..., a_K], a_k ∈ A_k. The decision variable a_k represents the action of drone m_k at its current vertex, and the available actions a_k are determined by the topology of the graph.
O = ×_k O_k is the joint observation set of all unmanned aerial vehicles, where O_k represents the observation space of drone m_k. A joint observation is denoted o = [o_1, ..., o_K], o_k ∈ O_k, and each unmanned aerial vehicle independently determines its own actions according to its local information and the interaction information. The position states of all unmanned aerial vehicles are fully observable, i.e., o_k^V(t) = s_k^V(t). When drone m_k moves to a motion vertex at time step t, it automatically collects the information state of that vertex, i.e., o_k^I(t) equals the information state of the vertex it occupies; however, the drone cannot acquire the information state in other situations, such as at other time steps or for other motion vertices.
T is the joint state transition probability function, T(s(t+1) | s(t), a(t)) = ∏_k T_k(s_k(t+1) | s_k(t), a_k(t)). Here, the local information state transition of drone m_k, T_k^I(s_k^I(t+1) | s_k^I(t)), follows the multi-state discrete-time Markov chain, and T_k^V(s_k^V(t+1) | s_k^V(t), a_k(t)) is the local position state transition probability of drone m_k: if s_k^V(t+1) is the target vertex of action a_k(t) taken in position state s_k^V(t), then T_k^V = 1; otherwise, T_k^V = 0.
Z is the joint observation probability function, Z(o(t) | a(t), s(t)) = ∏_k Z_k(o_k(t) | a_k(t), s_k(t)). If o_k(t) = s_k(t), then Z_k(o_k(t) | a_k(t), s_k(t)) = 1; otherwise, Z_k(o_k(t) | a_k(t), s_k(t)) = 0.
R: S × A → R+ is a decomposable global return value function, where the global return is the sum of the information values collected by all unmanned aerial vehicles. The local return value function of drone m_k is R_k(a_k, o_k) = g(m_k) f(I_k). In this embodiment, the maximization of the original value function V^π needs to be solved. Due to the decomposability of the return value function, the global value function V^π can be factored into the sum of the local value functions: V^π(h) = Σ_k V_k^{π_k}(h_k), where h_k = [a_k(0), o_k(0), ..., a_k(t), o_k(t), ..., a_k(t+T), o_k(t+T)] is the local history of drone m_k, whose dimension 2(t+T+1) is determined by the current time step t and the simulation step T; π_k = [a_k(0), a_k(1), ..., a_k(H-1)] is the policy of drone m_k; π = [π_1, π_2, ..., π_K] is the joint policy of all unmanned aerial vehicles; and V_k^{π_k}(h_k) is the expected return value of drone m_k when executing policy π_k, i.e., the local value function under policy π_k.
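As a small illustration of this decomposition, the sketch below accumulates the local returns R_k = g(m_k) f(I_k) over the planning horizon and sums them across drones, reusing the cooperative_performance and information_value helpers sketched earlier; the data layout is an assumption.

def local_return(drone, drones_at_vertex, level):
    """Local return of one drone at one step: g(m_k) * f(I_k)."""
    return cooperative_performance(drone, drones_at_vertex) * information_value(level)

def global_value(local_histories, horizon):
    """local_histories[k][t] = (drone, drones_at_vertex, level) for drone k at step t;
    the global value is the sum of the drones' local values."""
    total = 0.0
    for history in local_histories:
        total += sum(local_return(*history[t]) for t in range(horizon))
    return total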
B is the belief, including the information belief and the position belief, denoted B = <B^V, B^I>. Let B_k be the local belief of drone m_k; B^I is uncertain, while B^V is deterministic. At any time step t, the belief is a sufficient statistic for calculating the optimal policy, and the information states of all motion vertices change independently. The factored information belief is written as B^I(t) = [b_1^I(t), ..., b_{|V|}^I(t)], and the information belief of motion vertex v_i is b_i^I(t) = [b_i^{I_1}(t), ..., b_i^{I_N}(t)], where the variable b_i^{I_k}(t) is the conditional probability that the information state of vertex v_i is I_k. The factored belief grows only linearly with the number of motion vertices, which greatly reduces the computational complexity. Further, the prediction formula for the information belief of motion vertex v_i is: b_i^I(t+1) = Λ if v_i = v', and b_i^I(t+1) = b_i^I(t) P otherwise, where Λ denotes the belief after the information state has been reset to I_1, and v' represents a motion vertex visited by any of the drones at time step t.
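A minimal sketch of this belief prediction step follows; representing Λ as the distribution concentrated on I_1 is an assumption consistent with the reset rule described above.

import numpy as np

LAMBDA = np.array([1.0, 0.0, 0.0])   # assumed reset belief: all probability mass on I_1

def predict_vertex_belief(b, visited, P):
    """b(t+1) = Lambda if the vertex was visited at step t, otherwise b(t+1) = b(t) P."""
    return LAMBDA.copy() if visited else b @ P

def predict_information_belief(beliefs, visited_vertices, P):
    """beliefs: {vertex: length-N probability vector}; returns the predicted next-step beliefs."""
    return {v: predict_vertex_belief(b, v in visited_vertices, P)
            for v, b in beliefs.items()}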
H ∈ Z+ represents the planning step size.
In this embodiment, the planning algorithm may be the Factored Belief based Sequential Allocated Monte Carlo Planning (FB-SAMCP) algorithm. The set of priority unmanned aerial vehicles of drone m_j is denoted C_j, and Π_{C_j} denotes the corresponding priority policy set; that is, unmanned aerial vehicle m_j should take the policies of the priority drones into account when making its decision. In this embodiment, the sum of the revised expected return values obtained after executing the policy π_k of each drone in sequence is taken as the revised global value function, denoted V'(h). The revised global value function is calculated as V'(h) = Σ_k V'_k(h_k), where V'_k(h_k) is the revised local value function obtained when executing π_k, i.e., the expectation of the revised return values. The revised global value function V'(h) is equivalent to the original global value function V^π(h); the difference between them lies in the way they are computed. First, the revised local value functions V'_k are calculated in turn according to the order of the drones, and the revised global value function equals the sum of the revised local value functions of all drones. Second, the original global value function V^π is calculated over time: specifically, the expected local return value E_π(R_k(t)) of each unmanned aerial vehicle at time step t and the expected global return value E_π(R(t)) of all unmanned aerial vehicles are calculated, and the original global value function V^π is the sum of the expected return values E_π(R(t)) from t = 0 to t = H-1.
Each revised local value function depends on local state features, which may be affected by other drones. In this embodiment, the effect of other drones is reflected in the return value through penalty factors. The factored revised global value function decomposes the global look-ahead tree into several local look-ahead trees.
In this embodiment, the FB-SAMCP algorithm consists of three procedures: a sequential allocation procedure, a search procedure, and a simulation procedure. Each unmanned aerial vehicle executes the FB-SAMCP algorithm in parallel at each time step, and coordination of actions is completed after several iterations; that is, the actions of the drones are coordinated after each iteration, once the search and expansion of the look-ahead tree are completed.
Drone m_k first executes the sequential allocation procedure, and the number of iterations does not exceed K. In each iteration, after initializing h_k, the drone executes the search procedure to obtain the optimal policy π_k and the value function V_k under the condition of the priority policy set Π_{C_k}. Drone m_k transmits π_k and V_k to the other unmanned aerial vehicles and receives π_(k) and V_(k) from them; after the nth iteration, drone m_k needs to wait for messages from the remaining K-n drones. After comparing V_(k) with V_k, drone m_k stores the unmanned aerial vehicle corresponding to the maximum value function and its policy, denoted m* and π* respectively. If the unmanned aerial vehicle corresponding to the maximum value function is itself, drone m_k finishes the search and takes π_k as its policy for the current time step; otherwise, drone m_k adds m* and π* to C_k and Π_{C_k} respectively, and its belief B_k(h_k) is updated again according to Π_{C_k}.
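The following centralized Python sketch mimics this sequential allocation loop in a single process (the actual procedure runs in parallel on every drone and exchanges π_k and V_k messages); the search callback stands in for the per-drone search procedure and is an assumption.

def sequential_allocation(drones, search):
    """search(k, priority_policies) -> (policy_k, value_k); returns one fixed policy per drone."""
    remaining = set(drones)
    priority_policies = {}                 # Pi_C: policies of drones that have already been fixed
    final_policies = {}
    for _ in range(len(drones)):           # at most K iterations
        results = {k: search(k, priority_policies) for k in remaining}
        best = max(results, key=lambda k: results[k][1])   # drone with the maximum value function
        final_policies[best] = results[best][0]
        priority_policies[best] = results[best][0]          # the others re-plan against this policy
        remaining.discard(best)
        if not remaining:
            break
    return final_policies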
In the search procedure, the local look-ahead tree is expanded based on the priority policy set Π*. First, the local belief B_k(h_k) is updated, and the local information state beliefs of the vertices visited under Π* are reset to Λ. Second, the H-step optimal policy π_k* is calculated; specifically, the information state is sampled and simulated until the termination condition is reached. If the action of the drone is determined, its position state and the observation of the position state are also determined; in addition, observations of the information state are reflected directly in the return value. The planning step size is successively reduced in the loop to ensure that, when executing the simulator G, the priority policy set Π* has the same depth as the policy of drone m_k. In the simulator G, two types of conflicts need to be considered: synchronous repeated counting and asynchronous repeated counting. Synchronous repeated counting means that when multiple unmanned aerial vehicles access the same motion vertex at the same time, each of them receives the return value of that vertex. In asynchronous repeated counting, drone m_k decides to access motion vertex v at time step t_1 ∈ {0, 1, ..., H-1}, while drone m_j ∈ C has already decided to access the same motion vertex at time step t_2 ∈ {0, 1, ..., H-1}, t_1 < t_2. In this case, drone m_k overestimates the expected return of accessing the motion vertex, because it does not consider that the higher-priority drone m_j has already decided to access it. To resolve the conflict, a penalty factor p_e is introduced to penalize the overestimated return value of drone m_k; that is, p_e is the loss caused to drone m_j after drone m_k accesses motion vertex v at t_1. The penalty factor at time step t_1 is p_e(t_1) = R(t_2) - R'(t_2), where R(t_2) is the sampled return value of drone m_j accessing motion vertex v at time step t_2 without considering drone m_k, and R'(t_2) is that sampled return value when drone m_k is considered. Let t_2 be the time step closest to t_1 in the asynchronous repeated counting. Assume that at time step t_1 the information state of motion vertex v is I_i ∈ I; then the return value is R(t_1) = f(I_i). After the access by drone m_k, the information state is reset to I_1, and both I_i and I_1 undergo Δt_2 = t_2 - t_1 state transitions. Denote the information states obtained at time step t_2 from I_i and I_1 as I_i(t_2) and I_1(t_2), respectively; then R(t_2) = f(I_i(t_2)) and R'(t_2) = f(I_1(t_2)), and the revised return value of drone m_k at t_1 is its sampled return value minus the penalty factor. In this embodiment, as the number of samples increases, the expected sampling belief tends to the true belief. Furthermore, let the belief of motion vertex v at time step t_1 be b(t_1). The expected original return value at time step t_2 is obtained by propagating b(t_1) forward Δt_2 steps through P and taking the expectation of the information values, the expected revised return value at time step t_2 is obtained in the same way starting from the reset belief Λ, and the expected penalty factor is the difference between the two.
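A hedged sketch of the expected penalty computation follows, reusing P, F and LAMBDA from the earlier sketches; propagating the beliefs with a matrix power is inferred from the description above rather than quoted from the patent.

import numpy as np

def expected_penalty(b_t1, P, F, dt):
    """Expected penalty factor for an asynchronous repeated count, dt = t2 - t1."""
    propagated = b_t1 @ np.linalg.matrix_power(P, dt)    # belief at t2 if m_k does not visit v
    reset = LAMBDA @ np.linalg.matrix_power(P, dt)       # belief at t2 after m_k's visit resets v
    return float(propagated @ F - reset @ F)

def revised_return(sampled_return_t1, penalty):
    """The overestimated sampled return of m_k at t1 is reduced by the penalty factor."""
    return sampled_return_t1 - penalty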
In one embodiment, the process and results of the test performed using the present technique are as follows:
The experiments compare FB-SAMCP with POMCP, TD-FMOP and SA-POMCP. POMCP is currently a state-of-the-art general online planning algorithm. The TD-FMOP algorithm combines the MCTS method and the Max-Sum method to solve the online planning problem of loosely coupled, distributed, fixedly connected unmanned aerial vehicle clusters, and frequent communication between unmanned aerial vehicles is needed to ensure that the overall performance is approximately optimal. SA-POMCP is an extension of FB-SAMCP; the distinction between them is the way the belief is represented: SA-POMCP uses a particle filter, while FB-SAMCP uses a factored representation. SA-POMCP is a general algorithm for solving TD-POMDP and can be applied to more complex problems where the belief is difficult to express. Each simulation run was 100 time steps long, and each algorithm was run 50 times in each scenario. The experiments evaluate the performance of each algorithm by the average return value and average run time per run. The run time of each run was limited to 30 minutes. All experiments were run on a computer with a 2.6 GHz Intel dual-core CPU and 4 GB of memory.
The experiments mainly evaluate the influence of scalability on FB-SAMCP, POMCP, TD-FMOP and SA-POMCP, and three scenarios are constructed. Scenario one: as shown in fig. 5, the graph has 14 motion vertices and 25 motion boundaries, and 4 drones, each with 2 neighbors, perform reconnaissance tasks in designated areas. Scenario two: as shown in fig. 6, the graph has 40 motion vertices and 83 motion boundaries, and 12 drones, each with 2 neighbors, perform reconnaissance tasks in designated areas. Scenario three: as shown in fig. 6, the graph has 40 motion vertices and 83 motion boundaries, and 12 drones, each with 11 neighbors, perform reconnaissance tasks in designated areas.
Scenario one is a small-scale unmanned aerial vehicle reconnaissance scenario with a weakly coupled, distributed, fixedly connected structure. Compared with scenario one, the cluster in scenario two still has a weakly coupled, distributed, fixedly connected structure, but the number of unmanned aerial vehicles is expanded. In scenario three, compared with scenario two, the coupling degree of the drones is extended from weak coupling to tight coupling. The planning step size H of the drones is 3 time steps for all scenarios. Each motion vertex has three information states, and the information state value vector is set to F = [0, 1, 2], corresponding to the information states I = [I_1, I_2, I_3].
For scenario one, FIG. 5 depicts an average return value. Experimental results show that the return values of FB-SAMCP are respectively better than the return values of POMCP by 6.0%, 15.0% and 8.3% in 50 samples, 500 samples and 5000 samples. In addition, the return value for FB-SAMCP was slightly lower than TD-FMOP in 50 samples, but exceeded TD-FMOP by about 2.4% in 100 samples and by about 5.4% in 1000 simulations. For all scenarios, FB-SAMCP performs slightly better than SA-POMCP.
Table 1 shows the run time of these algorithms in scenario one, where the symbol "-" indicates that the time limit was exceeded or memory overflowed, and NoS indicates the number of samples. POMCP has a much lower run time than FB-SAMCP. The run time of FB-SAMCP is about one third of that of TD-FMOP, and about twice that of SA-POMCP in all simulations. In addition, TD-FMOP exceeded the time limit in the 5000-sample simulation experiment.
Table 1:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 23.9 9.1 0.8 4.5
100 50.0 17.9 1.5 9.5
500 267.7 89.0 7.2 45.7
1000 549.2 176.3 14.0 97.4
5000 - 777.5 75.3 473.4
Scenario two evaluates the influence of the scalability of the number of unmanned aerial vehicles on algorithm performance. Fig. 6 depicts the average return value of each algorithm at different numbers of samples in scenario two. When POMCP was run, the result could not be calculated because the memory of the computer was insufficient. Although the average return value of FB-SAMCP is 97.0% of that of TD-FMOP in 50 samples, it is 3.5% higher than the average return value of TD-FMOP in 500 samples and 1000 samples. In addition, the average return value of FB-SAMCP is similar to that of SA-POMCP.
Table 2 depicts the average run time of several algorithms in scenario two, where the symbol "-" indicates the result of exceeding the time limit and memory overflow. Similar to the results in scenario one, the runtime of FB-SAMCP is about twice the runtime of SA-POMCP, but about one third of TD-FMOP. In fact, performing TD-FMOP requires a significant amount of time because of the frequent communication and action synchronization that the drone needs to make in making joint decisions.
Table 2:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 84.6 25.9 - 13.9
100 157.4 50.3 - 26.9
500 841.3 238.6 - 143.8
1000 1716.3 456.3 - 272.2
and thirdly, evaluating the influence of the expandability of the coupling degree of the unmanned aerial vehicle on the algorithm. Tables 3 and 4 show the average run time and average return values, respectively. When POMCP is operated, the result cannot be calculated due to insufficient memory of a computer. From the results, it is known that the average return of FB-SAMCP is similar to that of SA-POMCP, and the average running time is higher than that of SA-POMCP.
Table 3:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 - 106.4 - 63.9
100 - 208.5 - 126.3
500 - 1043.0 - 584.7
table 4:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 - 1115.5 - 1108.1
100 - 1153.5 - 1150.2
500 - 1206.4 - 1197.5
when constructing the look-ahead tree of the POMCP, the joint action of all unmanned aerial vehicles needs to be considered; and when constructing the look-ahead tree of the TD-FMOP, the actions of the neighbor unmanned aerial vehicle need to be considered. For FB-SAMCP and SA-POMCP, each unmanned aerial vehicle's local look-ahead tree has a lower branching factor, because it only includes the unmanned aerial vehicle's own actions. Thus, FB-SAMCP and SA-POMCP still have excellent performance in a small number of sampling times. In addition, due to the lower branching factor, FB-SAMCP and SA-POMCP have better scalability than POMCP in terms of number and coupling of unmanned aerial vehicles.
It should be understood that, although the steps in the above-described flowcharts are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described above may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, and the order of execution of the sub-steps or stages is not necessarily sequential, but may be performed alternately or alternately with at least a part of the sub-steps or stages of other steps or other steps.
In one embodiment, as shown in fig. 7, there is provided an unmanned aerial vehicle cluster mission planning system, including: an undirected graph generation module 710, a model generation module 720, a function generation module 730, and a calculation module 740, wherein:
the undirected graph generating module 710 is configured to obtain the environmental information, and generate an undirected graph according to the environmental information.
The model generating module 720 is configured to generate an information state transition model according to the environmental information and the undirected graph, and obtain state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model.
A function generating module 730, configured to generate a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle.
The calculating module 740 is configured to obtain a planning algorithm, and calculate target execution policies of each unmanned aerial vehicle according to the planning algorithm and the global value function.
In one embodiment, the undirected graph generation module 710 includes a feature extraction module, an information determination module, and an image generation module, wherein:
and the feature extraction module is used for extracting the environmental spatial features in the environmental information.
And the information determining module is used for determining the movement boundary and the movement vertex of the unmanned aerial vehicle according to the environmental spatial characteristics.
And the image generation module is used for generating an undirected graph according to the motion boundary and the motion vertex.
In one embodiment, the function generation module 730 includes a step size acquisition module, an information acquisition module, and a matrix generation module, where:
and the step length acquisition module is used for acquiring the time step length according to the environment information.
And the information acquisition module is used for acquiring the environment state change information according to the time step and the undirected graph.
And the matrix generation module is used for generating a state transition matrix based on the Markov chain and the environmental state change information and obtaining an information state transition model.
In one embodiment, the function generating module 730 is further configured to generate total status information of the unmanned aerial vehicle cluster according to the status information; generating local return value functions of each unmanned aerial vehicle respectively through the total state information and the state information; and generating a global value function according to each local return value function.
In one embodiment, the provided unmanned aerial vehicle cluster task planning system further comprises a framework establishing module for establishing a TD-POMDP framework according to the state information and the global value function; the calculation module 740 is further configured to calculate, according to the planning algorithm and the global value function, the target execution policy of each unmanned aerial vehicle through the TD-POMDP framework.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure thereof may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, keys, a track ball or a touch pad arranged on the shell of the computer device, or an external keyboard, touch pad or mouse.
It will be appreciated by those skilled in the art that the structure shown in FIG. 8 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring environment information and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating target execution strategies of each unmanned aerial vehicle according to the planning algorithm and the global value function.
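Taken together, these four steps form a pipeline from raw environment information to a policy per vehicle. The following sketch only wires such a pipeline together; every injected callable and parameter name is an assumption introduced here for illustration, not an interface defined by the embodiment.

```python
from typing import Callable, List, Sequence

def plan_uav_cluster(environment_info: dict,
                     build_graph: Callable[[dict], dict],
                     build_transition_model: Callable[[dict, dict], dict],
                     observe_states: Callable[[dict], List[dict]],
                     build_global_value: Callable[[Sequence[dict]], Callable],
                     planner: Callable[[Sequence[dict], Callable], List[dict]]) -> List[dict]:
    """Orchestrate the four steps; each callable stands in for one step."""
    graph = build_graph(environment_info)                     # undirected graph
    model = build_transition_model(environment_info, graph)   # information state transition model
    states = observe_states(model)                            # state information per UAV
    global_value = build_global_value(states)                 # global value function
    return planner(states, global_value)                      # target execution strategies
```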
In one embodiment, the processor when executing the computer program further performs the steps of: extracting environmental space features in the environmental information; determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics; and generating an undirected graph according to the motion boundary and the motion vertex.
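Assuming the motion vertices and motion boundaries have already been extracted from the environmental space features, one plausible (purely illustrative) representation of the undirected graph is an adjacency map keyed by vertex:

```python
from typing import Dict, Iterable, Set, Tuple

def build_undirected_graph(vertices: Iterable[str],
                           boundaries: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected graph as an adjacency map from motion vertices and
    motion boundaries (pairs of vertices the UAVs may move between). The data
    structure and names are assumptions; the embodiment fixes neither."""
    graph: Dict[str, Set[str]] = {v: set() for v in vertices}
    for a, b in boundaries:
        graph.setdefault(a, set()).add(b)   # undirected: record the edge in both directions
        graph.setdefault(b, set()).add(a)
    return graph

# Example: three reconnaissance vertices connected in a chain.
g = build_undirected_graph(["v1", "v2", "v3"], [("v1", "v2"), ("v2", "v3")])
```

An adjacency matrix or a dedicated graph library object would serve equally well; only the vertex and edge sets matter to the later steps.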
In one embodiment, the processor when executing the computer program further performs the steps of: acquiring a time step according to the environmental information; acquiring environmental state change information according to the time step and the undirected graph; based on the Markov chain and the environmental state change information, a state transition matrix is generated, and an information state transition model is obtained.
In one embodiment, the processor when executing the computer program further performs the steps of: generating total state information of the unmanned aerial vehicle cluster according to the state information; generating local return value functions of each unmanned aerial vehicle respectively through the total state information and the state information; and generating a global value function according to each local return value function.
In one embodiment, the processor when executing the computer program further performs the steps of: establishing a TD-POMDP framework according to the state information and the global value function; and respectively calculating target execution strategies of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which, when executed by a processor, performs the steps of:
acquiring environment information and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating target execution strategies of each unmanned aerial vehicle according to the planning algorithm and the global value function.
In one embodiment, the computer program when executed by the processor further performs the steps of: extracting environmental space features in the environmental information; determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics; and generating an undirected graph according to the motion boundary and the motion vertex.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a time step according to the environmental information; acquiring environmental state change information according to the time step and the undirected graph; based on the Markov chain and the environmental state change information, a state transition matrix is generated, and an information state transition model is obtained.
In one embodiment, the computer program when executed by the processor further performs the steps of: generating total state information of the unmanned aerial vehicle cluster according to the state information; generating local return value functions of each unmanned aerial vehicle respectively through the total state information and the state information; and generating a global value function according to each local return value function.
In one embodiment, the computer program when executed by the processor further performs the steps of: establishing a TD-POMDP framework according to the state information and the global value function; and respectively calculating target execution strategies of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program instructing the relevant hardware, the computer program being stored on a non-volatile computer-readable storage medium; when executed, the computer program may include the flows of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the application, and their description is relatively specific and detailed, but they are not to be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the application, all of which fall within the protection scope of the application. Accordingly, the protection scope of the present application shall be subject to the appended claims.

Claims (6)

1. An unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation, characterized by comprising the following steps:
acquiring environment information and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
acquiring a planning algorithm, and respectively calculating a target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function; wherein generating the global value function corresponding to the unmanned aerial vehicle cluster according to the state information comprises: generating total state information of the unmanned aerial vehicle cluster according to the state information; generating a local return value function of each unmanned aerial vehicle respectively through the total state information and the state information; and generating the global value function according to each local return value function;
wherein generating the information state transition model according to the environment information and the undirected graph comprises: acquiring a time step according to the environment information;
acquiring environmental state change information according to the time step and the undirected graph;
generating a state transition matrix based on the Markov chain and the environmental state change information, and obtaining an information state transition model;
establishing a TD-POMDP framework according to the state information and the global value function;
wherein respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function comprises:
and respectively calculating target execution strategies of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
2. The method of claim 1, wherein generating the undirected graph based on the environmental information comprises:
extracting environmental space features in the environmental information;
determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics; and generating an undirected graph according to the motion boundary and the motion vertex.
3. An unmanned aerial vehicle cluster reconnaissance task planning system based on distributed sequential allocation, characterized in that the system comprises:
the undirected graph generating module is used for acquiring the environment information and generating an undirected graph according to the environment information;
the model generation module is used for generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
the function generation module is used for generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
the calculation module is used for acquiring a planning algorithm and respectively calculating a target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function; wherein generating the global value function corresponding to the unmanned aerial vehicle cluster according to the state information comprises: generating total state information of the unmanned aerial vehicle cluster according to the state information; generating a local return value function of each unmanned aerial vehicle respectively through the total state information and the state information; and generating the global value function according to each local return value function;
establishing a TD-POMDP framework according to the state information and the global value function;
wherein respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function comprises:
respectively calculating the target execution strategy of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function;
wherein the model generation module comprises:
the time step acquisition module is used for acquiring a time step according to the environment information;
the information acquisition module is used for acquiring environmental state change information according to the time step and the undirected graph;
and the matrix generation module is used for generating a state transition matrix based on the Markov chain and the environmental state change information and obtaining an information state transition model.
4. A system according to claim 3, characterized in that the undirected graph generating module comprises: the feature extraction module is used for extracting the environmental space features in the environmental information;
the information determining module is used for determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics;
and the graph generation module is used for generating an undirected graph according to the motion boundary and the motion vertex.
5. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 2 when executing the computer program.
6. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 2.
CN202010232017.5A 2020-03-27 2020-03-27 Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation Active CN111414006B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010232017.5A CN111414006B (en) 2020-03-27 2020-03-27 Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation


Publications (2)

Publication Number Publication Date
CN111414006A CN111414006A (en) 2020-07-14
CN111414006B true CN111414006B (en) 2023-09-08

Family

ID=71494617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010232017.5A Active CN111414006B (en) 2020-03-27 2020-03-27 Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation

Country Status (1)

Country Link
CN (1) CN111414006B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784211B (en) * 2020-08-04 2021-04-27 中国人民解放军国防科技大学 Cluster-based group multitask allocation method and storage medium
CN112131730B (en) * 2020-09-14 2024-04-30 中国人民解放军军事科学院评估论证研究中心 Fixed-grid analysis method and device for intelligent unmanned system of group
CN113111441B (en) * 2021-04-26 2023-01-31 河北交通职业技术学院 Method for constructing cluster unmanned aerial vehicle task model based on adjacency relation
CN114722946B (en) * 2022-04-12 2022-12-20 中国人民解放军国防科技大学 Unmanned aerial vehicle asynchronous action and cooperation strategy synthesis method based on probability model detection


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9464902B2 (en) * 2013-09-27 2016-10-11 Regents Of The University Of Minnesota Symbiotic unmanned aerial vehicle and unmanned surface vehicle system
US10692385B2 (en) * 2017-03-14 2020-06-23 Tata Consultancy Services Limited Distance and communication costs based aerial path planning
US11782141B2 (en) * 2018-02-05 2023-10-10 Centre Interdisciplinaire De Developpement En Cartographie Des Oceans (Cidco) Method and apparatus for automatic calibration of mobile LiDAR systems

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101286071A (en) * 2008-04-24 2008-10-15 北京航空航天大学 Multiple no-manned plane three-dimensional formation reconfiguration method based on particle swarm optimization and genetic algorithm
WO2017177533A1 (en) * 2016-04-12 2017-10-19 深圳市龙云创新航空科技有限公司 Method and system for controlling laser radar based micro unmanned aerial vehicle
CN106705970A (en) * 2016-11-21 2017-05-24 中国航空无线电电子研究所 Multi-UAV(Unmanned Aerial Vehicle) cooperation path planning method based on ant colony algorithm
EP3349086A1 (en) * 2017-01-17 2018-07-18 Thomson Licensing Method and device for determining a trajectory within a 3d scene for a camera
CN107632614A (en) * 2017-08-14 2018-01-26 广东技术师范学院 A kind of multiple no-manned plane formation self-organizing cooperative control method theoretical based on rigidity figure
KR20190086081A (en) * 2018-01-12 2019-07-22 한국과학기술원 Multi­layer­based coverage path planning algorithm method of unmanned aerial vehicle for three dimensional structural inspection and the system thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Online Planning for Multiagent Situational Information Gathering in the Markov Environment; Xin Zhou; IEEE; full text *

Also Published As

Publication number Publication date
CN111414006A (en) 2020-07-14

Similar Documents

Publication Publication Date Title
CN111414006B (en) Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation
CN111367317A (en) Unmanned aerial vehicle cluster online task planning method based on Bayesian learning
Zhao et al. Systemic design of distributed multi-UAV cooperative decision-making for multi-target tracking
CN111126668B (en) Spark operation time prediction method and device based on graph convolution network
CN110059385B (en) Grid dynamics scenario simulation method and terminal equipment coupled with different-speed growth
Kyriakakis et al. A cumulative unmanned aerial vehicle routing problem approach for humanitarian coverage path planning
CN113780584B (en) Label prediction method, label prediction device, and storage medium
EP3789938A1 (en) Virtual intelligence and optimization through multi-source, real-time, and context-aware real-world data
Ma et al. Hierarchical reinforcement learning via dynamic subspace search for multi-agent planning
CN114578860A (en) Large-scale unmanned aerial vehicle cluster flight method based on deep reinforcement learning
Zhou et al. Online planning for multiagent situational information gathering in the Markov environment
Chen et al. Multi-agent patrolling under uncertainty and threats
CN113566831A (en) Unmanned aerial vehicle cluster navigation method, device and equipment based on human-computer interaction
CN113554680A (en) Target tracking method and device, unmanned aerial vehicle and storage medium
CN110824496B (en) Motion estimation method, motion estimation device, computer equipment and storage medium
CN116468831A (en) Model processing method, device, equipment and storage medium
CN114445692B (en) Image recognition model construction method and device, computer equipment and storage medium
Wu et al. Adaptive submodular inverse reinforcement learning for spatial search and map exploration
US20220107628A1 (en) Systems and methods for distributed hierarchical control in multi-agent adversarial environments
Parisotto Meta reinforcement learning through memory
CN115327926A (en) Multi-agent dynamic coverage control method and system based on deep reinforcement learning
CN110727291B (en) Centralized cluster reconnaissance task planning method based on variable elimination
Weng et al. Big data and deep learning platform for terabyte-scale renewable datasets
Khanzhahi et al. Deep reinforcement learning issues and approaches for the multi-agent centric problems
Toubeh et al. Risk-aware planning by confidence estimation using deep learning-based perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant