CN111414006A - Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential distribution - Google Patents


Publication number
CN111414006A
Authority
CN
China
Prior art keywords
information
unmanned aerial vehicle
generating
value function
Prior art date
Legal status
Granted
Application number
CN202010232017.5A
Other languages
Chinese (zh)
Other versions
CN111414006B (en)
Inventor
王维平
周鑫
王彦锋
井田
王涛
李小波
黄美根
杨松
李童心
段婷
刘国杰
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202010232017.5A
Publication of CN111414006A
Application granted
Publication of CN111414006B
Active (current legal status)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10: Simultaneous control of position or course in three dimensions
    • G05D1/101: Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/104: Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The scheme relates to an unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential distribution. The method comprises the following steps: acquiring environment information and generating an undirected graph according to the environment information; generating an information state transition model according to the environment information and the undirected graph, and acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model; generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information, the global value function being used for calculating an execution strategy for each unmanned aerial vehicle; and acquiring a planning algorithm and calculating a target execution strategy for each unmanned aerial vehicle according to the planning algorithm and the global value function. Because the undirected graph consists of vertices and edges, each unmanned aerial vehicle executes its task on the specified vertices or along the motion boundaries and can therefore collect more valuable information. The global value function is generated through the information state transition model and the target execution strategy is then calculated with the planning algorithm, which yields the reconnaissance route of each unmanned aerial vehicle and improves the precision of unmanned aerial vehicle cluster task planning.

Description

Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential distribution
Technical Field
The invention relates to the technical field of unmanned aerial vehicle mission planning, in particular to an unmanned aerial vehicle cluster reconnaissance mission planning method and system based on distributed sequential distribution, computer equipment and a storage medium.
Background
With the continuous development of unmanned aerial vehicle technology, unmanned aerial vehicles play an increasingly important role in both the civilian and military fields. A drone swarm is a typical multi-agent system that can be controlled autonomously or remotely and can perform tasks without a pilot. Unmanned aerial vehicles have significant advantages over manned aircraft in performing dull, dirty, and dangerous tasks; compared with a manned airplane, an unmanned aerial vehicle is low in cost, small in size, and highly survivable, characteristics that make it well suited to emergency rescue and give it broad prospects. As practical applications deepen, unmanned aerial vehicle emergency rescue is developing toward clustered and specialized operation, and the rescue tasks it undertakes are becoming harder and more complicated. Multi-UAV autonomous cooperative control architectures are usually divided into two types: centralized control architectures and distributed control architectures. Centralized control has the advantage of obtaining a globally optimal solution, while distributed control has the advantages of high reliability, a small computational burden, and low communication traffic.
Because the working environment of unmanned aerial vehicles often changes rapidly and dynamically, especially under complex conditions such as poor communication, an unmanned aerial vehicle cluster often needs to make decisions and execute actions rapidly, and task planning therefore needs to be performed for the cluster in advance. The traditional approach to unmanned aerial vehicle cluster task planning generally builds different optimization models with centralized or distributed methods and handles the task planning problem of simple multi-UAV, multi-task scenarios.
Disclosure of Invention
Based on this, in order to solve the above technical problems, a distributed sequential allocation-based unmanned aerial vehicle cluster reconnaissance task planning method, a distributed sequential allocation-based unmanned aerial vehicle cluster reconnaissance task planning system, a computer device and a storage medium are provided, so that the precision of unmanned aerial vehicle cluster task planning can be improved.
A distributed sequential distribution-based unmanned aerial vehicle cluster reconnaissance mission planning method comprises the following steps:
acquiring environment information, and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
In one embodiment, the generating an undirected graph according to the environment information includes:
extracting environmental space characteristics in the environmental information;
determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics;
and generating the undirected graph according to the motion boundary and the motion vertex.
In one embodiment, the generating an information state transition model according to the environment information and the undirected graph includes:
acquiring a time step according to the environment information;
acquiring environment state change information according to the time step and the undirected graph;
and generating a state transition matrix based on the Markov chain and the environment state change information, and obtaining the information state transition model.
In one embodiment, the generating a global value function corresponding to the cluster of drones according to the state information includes:
generating total state information of the unmanned aerial vehicle cluster according to the state information;
respectively generating a local return value function of each unmanned aerial vehicle according to the total state information and the state information;
and generating the global value function according to each local return value function.
In one embodiment, the method further comprises:
establishing a TD-POMDP frame according to the state information and the global value function;
the calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function respectively comprises:
and respectively calculating the target execution strategy of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
An unmanned aerial vehicle cluster mission planning system, the system comprising:
the undirected graph generating module is used for acquiring environment information and generating an undirected graph according to the environment information;
the model generation module is used for generating an information state transition model according to the environment information and the undirected graph and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
the function generation module is used for generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and the calculation module is used for acquiring a planning algorithm and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring environment information, and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring environment information, and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
According to the above unmanned aerial vehicle cluster reconnaissance task planning method, system, computer device and storage medium based on distributed sequential distribution, environment information is acquired and an undirected graph is generated according to the environment information; an information state transition model is generated according to the environment information and the undirected graph, and the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster is acquired according to the information state transition model; a global value function corresponding to the unmanned aerial vehicle cluster is generated according to the state information, the global value function being used for calculating an execution strategy for each unmanned aerial vehicle; and a planning algorithm is acquired and the target execution strategy of each unmanned aerial vehicle is calculated according to the planning algorithm and the global value function. Because the undirected graph consists of vertices and edges, each unmanned aerial vehicle executes its task on the specified vertices or along the motion boundaries and can collect more valuable information. The global value function is generated through the information state transition model and the target execution strategy is calculated according to the planning algorithm, which yields the reconnaissance route of each unmanned aerial vehicle and improves the precision of unmanned aerial vehicle cluster task planning.
Drawings
FIG. 1 is a diagram of an application environment for unmanned aerial vehicle cluster mission planning in one embodiment;
fig. 2 is a schematic flow chart of a method for planning a cluster scout mission of an unmanned aerial vehicle based on distributed sequential allocation in one embodiment;
FIG. 3 is a diagram of a Markov chain-based information state transition model in one embodiment;
FIG. 4 is a schematic diagram of a different number of drone reconnaissance areas in one embodiment;
FIG. 5 is a diagram comparing the average return values of the algorithms in experimental scenario one;
FIG. 6 is a diagram comparing the average return values of the algorithms in experimental scenarios two and three;
FIG. 7 is a block diagram of an embodiment of unmanned aerial vehicle cluster mission planning architecture;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential distribution can be applied to the application environment shown in fig. 1. As shown in fig. 1, the application environment includes a computer device 110 and a drone 120, wherein the computer device 110 and the drone 120 may be connected by a wireless connection. The computer device 110 may obtain the environment information and generate an undirected graph according to the environment information; the computer device 110 may generate an information state transition model according to the environmental information and the undirected graph, and respectively obtain state information of each unmanned aerial vehicle 120 in the unmanned aerial vehicle 120 cluster according to the information state transition model; the computer device 110 may generate a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used to calculate the execution policy of the drone 120; the computer device 110 may obtain the planning algorithm, and respectively calculate the target execution policy of each drone 120 according to the planning algorithm and the global value function. The computer device 110 may be, but is not limited to, various personal computers, notebook computers, smart phones, robots, tablet computers, portable wearable devices, and the like.
In one embodiment, as shown in fig. 2, there is provided a method for unmanned aerial vehicle cluster scout mission planning based on distributed sequential allocation, including the following steps:
step 202, obtaining environment information, and generating an undirected graph according to the environment information.
The environmental information may include characteristics of the physical environment, which may be determined by the spatiotemporal characteristics of the physical environment. The spatiotemporal characteristics of the physical environment may include spatial characteristics, temporal characteristics, and the like. An undirected graph is a graph in which every edge has no direction; the computer device can generate the undirected graph from the spatial features in the environment information, that is, the undirected graph represents the spatial features of the environment.
And 204, generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model.
The environmental information may be affected by various factors, such as cloud cover, rainfall, and actual temperature. When the environmental information differs, the state information of the unmanned aerial vehicles also differs. Information can be used to represent the degree of change of the data of interest: when the data of interest in a region changes, the uncertainty about the difference between the recorded data and the unknown data increases, and the acquired state information of the unmanned aerial vehicles changes accordingly. The information state transition model can calculate the state information of each unmanned aerial vehicle from the collected environment information.
Step 206, generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating the execution strategy of the unmanned aerial vehicle.
The global value function may be used to calculate an execution policy of the drone, where the execution policy of the drone may include a movement area, a movement time, a movement route, and the like of the drone. After the state information of each unmanned aerial vehicle is obtained, the information value of each unmanned aerial vehicle can be calculated, so that the sum of the information values of the unmanned aerial vehicle cluster is obtained, and then a global value function corresponding to the unmanned aerial vehicle cluster is generated according to the sum of the information values.
And 208, acquiring a planning algorithm, and respectively calculating a target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
The planning algorithm may be the Factorized Belief-based Sequential Allocation Monte Carlo Planning (FB-SAMCP) algorithm; FB-SAMCP can effectively resolve conflicts between unmanned aerial vehicles and improve the return value of the cluster. The target execution policy may be used to represent the optimal policy under the global value function.
In the embodiment, an undirected graph is generated by acquiring the environment information and according to the environment information; generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model; generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle; and acquiring a planning algorithm, and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function. Because the undirected graph consists of vertexes and edges, the unmanned aerial vehicle executes tasks on the specified vertexes or in the boundary, and more valuable information can be collected; the global value function is generated through the information state transition model, the target execution strategy is calculated according to the planning algorithm, the reconnaissance route of each unmanned aerial vehicle can be obtained, and therefore the precision of unmanned aerial vehicle cluster task planning can be improved.
In one embodiment, the provided unmanned aerial vehicle cluster scout mission planning method based on distributed sequential allocation may further include a process of generating an undirected graph, where the specific process includes: extracting environmental space characteristics in the environmental information; determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics; and generating an undirected graph according to the motion boundary and the motion vertex.
The environmental spatial features may be represented as an undirected graph, denoted G = <V, E>. The vertex set V is a set of Euclidean space coordinates; the edge set E represents the set of motion boundaries along which the drones can move back and forth. A motion vertex can be used to represent an important point target or an area target, the size of a target area can be partitioned manually according to the real scene, and the number of motion vertices is denoted |V|. In a real environment, adjacent motion vertices may be mutually unreachable due to weather and terrain constraints.
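The construction of such a graph can be illustrated with a small sketch. The following Python snippet is a minimal, hypothetical example; the vertex coordinates, the edge list, and the class and method names are assumptions for illustration and not part of the patent.

```python
# Minimal sketch of building the reconnaissance graph G = <V, E>.
# Vertex coordinates and the edge list below are illustrative assumptions.

class ReconGraph:
    def __init__(self, coords, edges):
        self.coords = coords                      # V: vertex id -> Euclidean coordinate
        self.adj = {v: set() for v in coords}     # adjacency built from motion boundaries E
        for u, v in edges:                        # undirected: add both directions
            self.adj[u].add(v)
            self.adj[v].add(u)

    def neighbors(self, v):
        """Vertices reachable from v in one time step."""
        return self.adj[v]

# Example environment: 4 motion vertices, 4 motion boundaries.
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
G = ReconGraph(coords, edges)
print(G.neighbors(1))   # {0, 2}
```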
In one embodiment, the provided unmanned aerial vehicle cluster scout mission planning method based on distributed sequential allocation may further include a process of generating an information state transition model, where the specific process includes: acquiring a time step according to the environment information; acquiring environment state change information according to the time step and the undirected graph; and generating a state transition matrix based on the Markov chain and the environment state change information, and obtaining an information state transition model.
Since the spatiotemporal features of the physical environment may include temporal features, the temporal features may be abstracted into discrete time steps, denoted t ∈ {0, 1, 2, ...}.
The computer device can acquire the time step from the environment information, and then acquire the environment state change information according to the time step and the undirected graph. Specifically, the environment state change information may include a plurality of levels, denoted I_k ∈ {I_1, I_2, ..., I_N}, where I_k indicates the k-th environmental state change information level and N indicates the number of levels. The information state value can be used to quantitatively describe the level of environmental state change information, denoted F_k ∈ {F_1, F_2, ..., F_N}, where F_k = f(I_k) and f: I_k → R^+. The larger k is, the more unknown data the environmental state change information level I_k contains, i.e. F_1 < F_2 < ... < F_N. A Markov chain is a stochastic process in probability theory and mathematical statistics that has the Markov property and is defined on a discrete index set and state space; it is a common way of describing the dynamics of an environment. In this embodiment, it may be assumed that the environmental state change information transition of each motion vertex follows a distinct, independent, discrete-time multi-state Markov chain. The computer device may generate a state transition matrix based on the multi-state Markov chain and the environmental state change information, where the generated state transition matrix is

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & \cdots & p_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{pmatrix}.$$
the state transition matrix may be a random matrix, where pijRepresents slave state IiTransition to State IjThe probability of (c). In this embodiment, prior information needs to be collected from different information sources before the unmanned aerial vehicle is dispatched to execute a task, and the state transition matrix is assigned after the prior information is preprocessed by a machine learning technique.
In this embodiment, if some motion vertices are not visited by any drone, their unknown information and information values may increase over time. In general, for two different motion vertices, if one has a higher information value at the current time, it will tend to have a higher information value at the next time as well. Therefore, the state transition matrix P in this embodiment may be a monotone stochastic matrix. That is, if two N-dimensional probability vectors x and y satisfy

$$\sum_{j=k}^{N} x_j \ge \sum_{j=k}^{N} y_j, \quad k = 1, \ldots, N,$$

then x stochastically dominates y, which can be written x ≻ y. Furthermore, if the rows of P satisfy P_N ≻ P_{N-1} ≻ ... ≻ P_1, then P is a monotone stochastic matrix.
In one embodiment, a Markov chain-based information state transition model is illustrated in FIG. 3, where I_1, I_2, and I_3 indicate the environmental state change information levels.
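As a concrete illustration of this Markov-chain transition, the sketch below samples the next information level of a motion vertex from a row of the transition matrix P and resets it to I_1 when the vertex is visited. The three-level matrix values and the function name are invented for illustration only.

```python
import random

# Hypothetical 3-level monotone transition matrix P (each row sums to 1).
# P[i][j] = probability of moving from level I_{i+1} to level I_{j+1}.
P = [
    [0.7, 0.2, 0.1],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
F = [0.0, 1.0, 2.0]   # information state values F_k = f(I_k), with F_1 < F_2 < F_3

def step_information_level(level, visited):
    """Advance one time step: reset to I_1 if a drone visits the vertex,
    otherwise sample the next level from the Markov chain."""
    if visited:
        return 0                                   # reset to I_1 (no new information)
    return random.choices(range(len(P[level])), weights=P[level])[0]

level = 0
for t in range(5):
    level = step_information_level(level, visited=False)
print("information value after 5 steps:", F[level])
```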
In one embodiment, a drone may be a mobile autonomous entity capable of making decisions and performing actions, with the goal of providing accurate and up-to-date situational information. Denote the set of drones by M and a drone by m_k; drone m_k collects information in a predetermined area, denoted G_k = <V_k, E_k>, where G_k is a subgraph of G, and the reconnaissance areas of different drones may overlap each other. FIG. 4 shows the reconnaissance areas of 4 drones and of 8 drones, respectively, where the black dots represent motion vertices, the black lines represent motion boundaries, the triangles represent drones, and the ellipses represent each drone's reconnaissance area.
In this embodiment, at any moment each drone is on some motion vertex of graph G, and different drones may occupy the same motion vertex at the same time. Each drone moves over the motion vertices and motion boundaries of its predetermined area, and at every time step it may move from its current motion vertex to one of its neighbors. When a drone moves to a motion vertex, the information of that vertex is automatically collected, and at the same time the environmental state change information level of the vertex is reset to I_1, where I_1 indicates that there is no new information at the current time. Due to the drone's limited observation capability, only the information of its current motion vertex at the current moment can be observed.
In this embodiment, the cooperation performance may be used to represent the share of the total return value obtained by each drone when multiple drones visit the same motion vertex at the same time, denoted g: m_k → R^+, m_k ∈ M. The cooperation performance may be expressed as

$$g(m_k) = \begin{cases} 1, & m_k = m_{\text{first}} \\ 0, & \text{otherwise,} \end{cases}$$

where m_first represents the first drone assigned to scout the motion vertex. This expression shows that if several drones scout the same motion vertex simultaneously, the effect equals the reconnaissance effect of a single drone on that vertex.
In one embodiment, the provided unmanned aerial vehicle cluster scout mission planning method based on distributed sequential allocation may further include a process of generating a global value function, where the specific process includes: generating total state information of the unmanned aerial vehicle cluster according to the state information; respectively generating a local return value function of each unmanned aerial vehicle according to the total state information and the state information; and generating a global value function according to each local return value function.
The computer device can aggregate the collected state information of each unmanned aerial vehicle to generate the total state information of the unmanned aerial vehicle cluster.
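The aggregation of local returns into a global value can be sketched as follows. The sketch assumes the cooperation-performance rule reconstructed above (the first drone assigned to a shared vertex receives the full return); the function and variable names are illustrative assumptions, not the patent's implementation.

```python
# Sketch: local return R_k = g(m_k) * f(I_k) and the decomposable global return.
def cooperation_performance(drone, first_assigned):
    """g(m_k): 1 for the first drone assigned to the vertex, 0 for the others."""
    return 1.0 if drone == first_assigned else 0.0

def local_return(drone, vertex_level, first_assigned, F):
    return cooperation_performance(drone, first_assigned) * F[vertex_level]

def global_return(assignments, vertex_levels, F):
    """Sum of local returns over the cluster; `assignments` maps a vertex to the
    list of drones visiting it this step (the first entry is the first assigned)."""
    total = 0.0
    for vertex, drones in assignments.items():
        for d in drones:
            total += local_return(d, vertex_levels[vertex], drones[0], F)
    return total

F = [0.0, 1.0, 2.0]
print(global_return({2: ["m1", "m3"], 5: ["m2"]}, {2: 2, 5: 1}, F))   # 3.0 (= 2.0 + 1.0)
```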
In an embodiment, the unmanned aerial vehicle cluster scout mission planning method based on distributed sequential allocation further includes a process of calculating a target execution policy according to a TD-POMDP framework, where the specific process includes: establishing a TD-POMDP frame according to the state information and the global value function; and respectively calculating the target execution strategy of each unmanned aerial vehicle through the TD-POMDP framework according to a planning algorithm and a global value function.
A Partially Observable Markov Decision Process (POMDP) is a generalization of the Markov decision process, and the POMDP framework can model a variety of real-world sequential processes. The TD-POMDP (transition-decoupled POMDP) framework can be written as <M, S, A, O, T, Z, R, B, H>.
In the TD-POMDP framework, M = {m_1, ..., m_K} is the set of all drones, where m_k represents the k-th drone and K represents the number of drones in the set.
S can be used to represent the state set, which can be decomposed into the position state features of the drones and the information state features of the motion vertices, denoted S = <S^V, S^I>. The global state consists of the local states of all drones, and different drones may share local states. Specifically, s_k^I denotes the information state of each motion vertex in drone m_k's reconnaissance area, where |V_k| is the number of motion vertices in reconnaissance area G_k, and s_k^V denotes drone m_k's position state. Drone m_k's local state is s_k = [s_k^I, s_k^V], and the global state is s = [s^I, s^V] ∈ S. Drone m_k's information state feature is thus the information state of its reconnaissance area, and its position state feature is the motion vertex it currently occupies.
To obtain more global return values, the actions of each drone need to be coordinated with the actions of the other drones.
A = ×_k A_k is the set of joint actions of all drones, where A_k represents the action space of drone m_k. A joint action is denoted a = [a_1, ..., a_K], a_k ∈ A_k. The decision variable a_k represents the action of drone m_k at its current vertex, and the available values of a_k are determined by the topology of the graph.
O = ×_k O_k is the set of joint observations of all drones, where O_k represents the observation space of drone m_k. A joint observation is denoted o = [o_1, ..., o_K], o_k ∈ O_k. Each drone determines its own action independently according to its local information and the exchanged information. The position states of all drones are fully observable. When a drone moves to a motion vertex at time step t, it automatically collects (observes) the information state of that motion vertex; the drone cannot acquire the information state in other situations, for example at other time steps or at other motion vertices.
T is the set of joint state transition probabilities, T(s(t+1) | s(t), a(t)) = ∏_k T_k(s_k(t+1) | s_k(t), a(t)). Drone m_k's information state transition obeys the multi-state discrete-time Markov chain described above, while T_k^V(s_k^V(t+1) | s_k^V(t), a_k(t)) is drone m_k's local position state transition probability: if s_k^V(t+1) is the position state reached from s_k^V(t) under action a_k(t), then T_k^V(s_k^V(t+1) | s_k^V(t), a_k(t)) = 1; otherwise it is 0.
z is a set of joint observation transition probabilities, Z (o (t +1) | a (t), s (t) | ΠkZk(ok(t)|ak(t),sk(t)). If o isk(t)=sk(t), then Zk(ok(t)|ak(t),sk(t)) ═ 1; otherwise Zk(ok(t)|ak(t),sk(t))=0。
R: S×A → R^+ is a decomposable global return value function, R(s, a) = Σ_k R_k(a_k, o_k), the sum of the information values collected by all drones. Drone m_k's local return value function is R_k(a_k, o_k) = g(m_k) f(I_k). In this embodiment the maximized original value function V^π must be solved. Because the return value function is decomposable, the global value function V^π may be factored into the sum of local value functions:

$$V^{\pi}(h) = \sum_{k=1}^{K} V_k^{\pi_k}(h_k),$$

where h_k = [a_k(0), o_k(0), ..., a_k(t), o_k(t), ..., a_k(t+T), o_k(t+T)] is drone m_k's local history, whose dimension 2(t+T+1) is determined by the current time step t and the simulation step T; π_k = [a_k(0), a_k(1), ..., a_k(H-1)] is drone m_k's policy; π = [π_1, π_2, ..., π_K] is the joint policy of all drones; and V_k^{π_k}(h_k) is the expected return value of drone m_k executing policy π_k, i.e. the local value function of executing policy π_k.
B is the belief, comprising an information belief and a position belief, denoted B = <B^V, B^I>. Let B_k be drone m_k's local belief. B^I is uncertain, while B^V is deterministic. At any time step t the belief is a sufficient statistic for computing the optimal policy, and the information states of all motion vertices change independently. The factored information belief is written b = [b_{v_1}, ..., b_{v_|V|}], where the belief of motion vertex v_i is b_{v_i} = [b_{v_i}(I_1), ..., b_{v_i}(I_N)] and the variable b_{v_i}(I_k) refers to the conditional probability that vertex v_i has information state I_k. The factored belief grows only linearly with the number of motion vertices, which greatly reduces the computational complexity. Further, the prediction formula for the information belief of motion vertex v_i is

$$b_{v_i}(t+1) = \begin{cases} [1, 0, \ldots, 0], & v_i = v' \\ b_{v_i}(t)\,P, & \text{otherwise,} \end{cases}$$

where v' represents a motion vertex visited by some drone at time step t.
H ∈ Z^+ represents the planning horizon, i.e. the number of planning steps.
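The TD-POMDP tuple described above can be collected into a single container for planning. The sketch below is only an illustrative skeleton under the definitions above; the field names, example values, and the helper methods are assumptions, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TDPOMDP:
    """Skeleton of the <M, S, A, O, T, Z, R, B, H> tuple for the reconnaissance task."""
    drones: List[str]                    # M: drone identifiers m_1..m_K
    graph: Dict[int, List[int]]          # topology that determines A_k at each vertex
    transition: List[List[float]]        # P: information-state Markov chain
    info_values: List[float]             # F: value of each information level
    horizon: int                         # H: planning steps

    def actions(self, drone: str, vertex: int) -> List[int]:
        """A_k: stay on the current motion vertex or move to a neighbour."""
        return [vertex] + self.graph[vertex]

    def reward(self, level: int, is_first: bool) -> float:
        """R_k(a_k, o_k) = g(m_k) f(I_k)."""
        return self.info_values[level] if is_first else 0.0

model = TDPOMDP(
    drones=["m1", "m2"],
    graph={0: [1], 1: [0, 2], 2: [1]},
    transition=[[0.7, 0.2, 0.1], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],
    info_values=[0.0, 1.0, 2.0],
    horizon=3,
)
print(model.actions("m1", 1))   # [1, 0, 2]
```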
In this embodiment, the planning algorithm may be the Factorized Belief-based Sequential Allocation Monte Carlo Planning (FB-SAMCP) algorithm. Denote by C_k the set of priority drones of drone m_k, and by Π_{C_k} the set of their policies, called drone m_k's priority policy set; drone m_k should consider the policies of the priority drones when making its decision. In this embodiment, the sum of the revised expected return values obtained by executing the drones' policies π_k in sequence is taken as the revised global value function, denoted Ṽ^π(h), and is calculated as

$$\tilde{V}^{\pi}(h) = \sum_{k=1}^{K} \tilde{V}_k^{\pi_k}(h_k),$$

where Ṽ_k^{π_k}(h_k) represents the revised local value function of executing π_k, and R̃_k is the revised return value. The revised global value function Ṽ^π(h) is equivalent to the original global value function V^π(h); the difference between Ṽ^π and V^π lies only in the way they are calculated. First, the revised local value functions are calculated sequentially in the order of the drones, and the revised global value function equals the sum of the revised local value functions of all drones. Second, the original global value function V^π is calculated over time: at each time step t the expected local return value E^π(R_k(t)) of each drone and the expected global return value E^π(R(t)) of all drones are calculated, and V^π is the sum of the expected return values E^π(R(t)) from t = 0 to t = H-1.
Each revised local value function depends on local state characteristics that may be affected by other drones. In this embodiment, the impact of other drones is reflected in the return value with a penalty factor. The factorized revised global value function decomposes the global prediction tree into a number of local look-ahead trees.
In this embodiment, the FB-SAMCP algorithm consists of three procedures: the sequential allocation procedure, the search procedure, and the simulation procedure. Each unmanned aerial vehicle executes the FB-SAMCP algorithm in parallel at each time step, and the coordination of actions is completed after multiple iterations; the drones' actions are coordinated after each iteration, i.e. after the search and expansion of the look-ahead tree is completed.
Drone m_k first executes the sequential allocation procedure, and the number of iterations executed does not exceed K. In each iteration, h_k is initialized, and the drone then executes the search procedure to obtain the optimal policy π_k and value function V_k conditioned on its priority policy set Π_{C_k}. Drone m_k transmits π_k and V_k to the other drones and receives π_(k) and V_(k) from the other drones; after the n-th iteration, drone m_k needs to wait for messages from K-n drones. After comparing V_(k) with V_k, drone m_k stores the drone corresponding to the maximum value function and its policy, denoted m* and π*, respectively. If the drone corresponding to the maximum value function is itself, drone m_k finishes the search and takes π_k as its policy for the current time step; otherwise, drone m_k adds m* and π* to C_k and Π_{C_k}, respectively, and its belief B_k(h_k) is updated according to Π_{C_k}.
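The sequential allocation procedure can be sketched as the following loop. Here `local_search` stands in for the Monte Carlo search procedure described next and `exchange` stands in for communication with the other drones; both are hypothetical placeholders, so the sketch illustrates the coordination pattern rather than the patent's code. In the full algorithm a drone that has already committed its policy no longer competes in later iterations; the sketch omits that bookkeeping for brevity.

```python
# Sketch of the FB-SAMCP sequential allocation loop for drone m_k.
def sequential_allocation(me, all_drones, local_search, exchange, K):
    priority_drones = []          # C_k: drones whose policies have priority
    priority_policies = {}        # Pi_{C_k}: their committed policies
    policy, value = None, float("-inf")
    for _ in range(K):            # at most K iterations
        policy, value = local_search(me, priority_policies)   # best pi_k, V_k given Pi_{C_k}
        others = exchange(me, policy, value)                   # send (pi_k, V_k), receive others'
        best_drone, best_policy, best_value = max(
            [(me, policy, value)] + others, key=lambda x: x[2])
        if best_drone == me:
            return policy          # m_k commits pi_k for the current time step
        priority_drones.append(best_drone)                     # add m* to C_k
        priority_policies[best_drone] = best_policy            # add pi* to Pi_{C_k}
    return policy
```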
In the search procedure, the local look-ahead tree is expanded based on the priority policy set Π*. First, the local belief B_k(h_k) is updated: the local information state beliefs of the vertices visited under Π* are reset to the deterministic belief concentrated on I_1. Then the information state is sampled and simulated until a termination condition is reached. If the drone's action is deterministic, the position state and its observation are also deterministic; moreover, the observation of the information state is directly reflected in the return value. The planning step size is successively reduced in the loop to ensure that, when the simulator G is executed, the priority policy set Π* has the same depth as drone m_k's policy. In the simulator G, two types of conflicts need to be considered: synchronous repeat counting and asynchronous repeat counting. Synchronous repeat counting means that when several drones visit the same motion vertex at the same time, each of them receives the return value of that motion vertex. In asynchronous repeat counting, drone m_k decides at time step t_1 ∈ {0, 1, ..., H-1} to visit a motion vertex v that a priority drone m_j ∈ C_k has already decided, at time step t_2 ∈ {0, 1, ..., H-1} with t_1 < t_2, to visit. In this case drone m_k's expected return for visiting the motion vertex is overestimated, because it does not take into account that the higher-priority drone m_j has already decided to visit it. To resolve this conflict, a penalty factor p_e is introduced to penalize drone m_k's overestimated return value; that is, p_e denotes the loss in drone m_j's return value caused by drone m_k visiting motion vertex v at t_1. The penalty factor p_e at time step t_1 is

$$p_e(t_1) = R(t_2) - \tilde{R}(t_2),$$

where R(t_2) is the sampled return value of drone m_j visiting motion vertex v at time step t_2 without considering drone m_k, while R̃(t_2) is the sampled return value of drone m_j visiting motion vertex v at time step t_2 when drone m_k is considered. Let t_2 be the time step closest to t_1 in the asynchronous-iteration calculation. Suppose that at time step t_1 the information state of motion vertex v is I_i ∈ I; then the return value is R(t_1) = f(I_i). When drone m_k visits the vertex, the information state is reset to I_1, and both I_i and I_1 undergo Δt = t_2 - t_1 state transitions. Denote the two resulting information states at time step t_2 by I_i' and I_1'. Therefore R(t_2) = f(I_i') and R̃(t_2) = f(I_1'). Drone m_k's revised return value at t_1 is

$$\tilde{R}(t_1) = R(t_1) - p_e(t_1).$$

In this embodiment, the expected sampled belief tends to the true belief as the number of samples increases. Furthermore, let the belief of motion vertex v at time step t_1 be b(t_1), let e_1 = [1, 0, ..., 0] be the belief after the reset to I_1, and let F = [F_1, ..., F_N]^⊤ be the vector of information state values. The expected original return value at time step t_2 is then equal to b(t_1) P^{Δt} F, the expected revised return value at time step t_2 is equal to e_1 P^{Δt} F, and the expected penalty factor is (b(t_1) - e_1) P^{Δt} F.
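The expected penalty for asynchronous repeat counting can be computed directly from the Markov chain, as in the sketch below, which follows the expected-penalty expression reconstructed above. The matrix values, belief vector, and helper names are illustrative assumptions, not the patent's implementation.

```python
# Sketch of the expected penalty factor for asynchronous repeat counting:
# E[p_e] = (b(t1) - e1) P^{dt} F, where dt = t2 - t1 and e1 = [1, 0, ..., 0].
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])        # illustrative information-state transition matrix
F = np.array([0.0, 1.0, 2.0])          # information state values

def expected_penalty(belief_t1, t1, t2):
    dt = t2 - t1
    Pdt = np.linalg.matrix_power(P, dt)
    e1 = np.zeros(len(F)); e1[0] = 1.0                 # belief after m_k resets the vertex to I_1
    original = belief_t1 @ Pdt @ F                     # E[R(t2)] without considering m_k
    revised = e1 @ Pdt @ F                             # E[R~(t2)] considering m_k
    return original - revised                          # E[p_e]

b = np.array([0.2, 0.5, 0.3])          # belief over {I_1, I_2, I_3} at t1
print(expected_penalty(b, t1=0, t2=2))
```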
In one embodiment, the process and results of the experiment using the present solution are as follows:
In the experiments, FB-SAMCP was compared with POMCP, TD-FMOP and SA-POMCP. POMCP is currently the most advanced general online planning algorithm. The TD-FMOP algorithm combines MCTS and Max-Sum methods to solve the online planning problem of a loosely coupled, distributed, fixed-connection unmanned aerial vehicle cluster, and requires frequent communication among the unmanned aerial vehicles to ensure that the overall performance is approximately optimal. SA-POMCP is an extension of FB-SAMCP; the distinction between SA-POMCP and FB-SAMCP is the way the belief is represented: SA-POMCP uses a particle filter, while FB-SAMCP uses the factored representation. SA-POMCP is a general algorithm for solving TD-POMDPs and can be applied to more complex problems in which the belief is difficult to express. Each round of simulation was 100 time steps long, and each algorithm was run for 50 rounds per scenario. The experiments evaluated the performance of each algorithm by recording the average return value and the average run time of each round. The run time of each round of an algorithm was limited to 30 minutes. All experiments were run on a computer with a 2.6 GHz Intel dual-core CPU and 4 GB of memory.
The experiments mainly evaluate the influence of scalability on FB-SAMCP, POMCP, TD-FMOP and SA-POMCP, and three scenarios are constructed. Scenario one: as shown in fig. 5, the area has 14 motion vertices and 25 motion boundaries, and 4 drones perform a reconnaissance mission in the designated area, each drone having 2 neighbors. Scenario two: as shown in fig. 6, the area has 40 motion vertices and 83 motion boundaries, and 12 drones perform a reconnaissance mission in the designated area, each drone having 2 neighbors. Scenario three: as shown in fig. 6, the area has 40 motion vertices and 83 motion boundaries, and 12 drones perform a reconnaissance mission in the designated area, each drone having 11 neighbors.
Scenario one is a small-scale unmanned aerial vehicle reconnaissance scenario with a weakly coupled, distributed, fixed-connection structure. Compared with scenario one, the cluster in scenario two still has a weakly coupled, distributed, fixed-connection structure, but the number of unmanned aerial vehicles is larger. Compared with scenario two, in scenario three the coupling degree of the unmanned aerial vehicles is expanded from weak coupling to tight coupling. For all scenarios, the planning horizon H of the drones is 3 time steps. Each motion vertex has three information states, with the information state value vector set to F = [0, 1, 2], corresponding to the information states I = [I_1, I_2, I_3].
For scenario one, FIG. 5 depicts the average return value. The experimental results show that the return values of FB-SAMCP are respectively 6.0%, 15.0% and 8.3% better than those of POMCP in simulations with 50, 500 and 5000 samples. In addition, the return value of FB-SAMCP is slightly lower than that of TD-FMOP with 50 samples, but about 2.4% higher than TD-FMOP with 100 samples and about 5.4% higher with 1000 simulations. The performance of FB-SAMCP is slightly better than that of SA-POMCP in all of these settings.
Table 1 reports the run times of these algorithms in scenario one, where the "-" symbol indicates that the run exceeded the time limit or overflowed memory, and NoS indicates the number of samples. The run time of POMCP is much lower than that of FB-SAMCP. FB-SAMCP's run time is about one third that of TD-FMOP and about twice that of SA-POMCP in all simulations. Furthermore, TD-FMOP exceeded the time limit in the 5000-sample simulation experiment.
Table 1:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 23.9 9.1 0.8 4.5
100 50.0 17.9 1.5 9.5
500 267.7 89.0 7.2 45.7
1000 549.2 176.3 14.0 97.4
5000 - 777.5 75.3 473.4
and the second scenario evaluates the influence of the expandability of the number of the unmanned aerial vehicles on the performance of the algorithm. FIG. 6 depicts the average reward value for each algorithm at different sampling times in scenario two. When the POMCP is operated, the result cannot be calculated due to insufficient memory of a computer. Although the average reward value of FB-SAMCP is 97.0% of TD-FMOP in 50 samples, it is 3.5% higher than the average reward value of TD-FMOP in 500 samples and 1000 samples. The average recovery value of FB-SAMCP is similar to that of SA-POMCP.
Table 2 reports the average run times of the algorithms in scenario two, where the symbol "-" indicates that the run exceeded the time limit or overflowed memory. Similar to the results in scenario one, FB-SAMCP's run time is approximately twice that of SA-POMCP, but approximately one third that of TD-FMOP. In fact, executing TD-FMOP takes a lot of time because the drones need frequent communication and action synchronization when making joint decisions.
Table 2:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 84.6 25.9 - 13.9
100 157.4 50.3 - 26.9
500 841.3 238.6 - 143.8
1000 1716.3 456.3 - 272.2
and a third scene evaluates the influence of the expandability of the coupling degree of the unmanned aerial vehicle on the algorithm. Table 3 and table 4 show the average run time and average return values, respectively. When the POMCP is operated, the result cannot be calculated due to insufficient memory of a computer. From the results, it is known that the average return of FB-SAMCP is similar to that of SA-POMCP, while the average operation time is higher than that of SA-POMCP.
Table 3:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 - 106.4 - 63.9
100 - 208.5 - 126.3
500 - 1043.0 - 584.7
table 4:
NoS TD-FMOP FB-SAMCP POMCP SA-POMCP
50 - 1115.5 - 1108.1
100 - 1153.5 - 1150.2
500 - 1206.4 - 1197.5
when constructing the look-ahead tree of the POMCP, the joint action of all unmanned aerial vehicles needs to be considered; while the actions of the neighbor drones need to be considered when building the look-ahead tree of TD-FMOP. For FB-SAMCP and SA-popcp, the local look-ahead tree for each drone has a lower branching factor because it only includes the actions of the drone itself. Therefore, FB-SAMCP and SA-POMCP still have excellent performance in a small number of sampling times. In addition, due to the lower branching factor, FB-SAMCP and SA-POMCP have better scalability in the number and coupling degree of the drones than POMCP.
It should be understood that, although the steps in the above-described flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the above flowcharts may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and the order of performing these sub-steps or stages is not necessarily sequential; they may be performed alternately or in turn with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 7, there is provided an unmanned aerial vehicle cluster mission planning system, including: undirected graph generation module 710, model generation module 720, function generation module 730, and computation module 740, wherein:
and an undirected graph generating module 710, configured to obtain the environment information, and generate an undirected graph according to the environment information.
And the model generation module 720 is configured to generate an information state transition model according to the environmental information and the undirected graph, and obtain state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model.
The function generation module 730 is configured to generate a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating the execution strategy of the unmanned aerial vehicle.
The calculating module 740 is configured to obtain a planning algorithm, and calculate a target execution policy of each drone according to the planning algorithm and the global value function.
In one embodiment, the undirected graph generation module 710 comprises a feature extraction module, an information determination module, and an image generation module, wherein:
and the characteristic extraction module is used for extracting the environmental space characteristics in the environmental information.
And the information determining module is used for determining the motion boundary and the motion vertex of the unmanned aerial vehicle according to the environmental space characteristics.
And the image generation module is used for generating an undirected graph according to the motion boundary and the motion vertex.
In one embodiment, the function generation module 730 includes a step size obtaining module, an information obtaining module, and a matrix generation module, wherein:
and the step length obtaining module is used for obtaining the time step length according to the environment information.
And the information acquisition module is used for acquiring the environment state change information according to the time step and the undirected graph.
And the matrix generation module is used for generating a state transition matrix based on the Markov chain and the environment state change information and obtaining an information state transition model.
In one embodiment, the function generation module 730 is further configured to generate total status information of the drone cluster according to the status information; respectively generating a local return value function of each unmanned aerial vehicle according to the total state information and the state information; and generating a global value function according to each local return value function.
In an embodiment, the provided unmanned aerial vehicle cluster mission planning system further includes a frame establishing module, configured to establish a TD-POMDP frame according to the state information and the global value function; the calculating module 740 is further configured to calculate the target execution policy of each drone through the TD-POMDP framework according to the planning algorithm and the global value function.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method for unmanned aerial vehicle cluster scout mission planning based on distributed sequential allocation. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring environment information, and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
In one embodiment, the processor, when executing the computer program, further performs the steps of: extracting environmental space characteristics in the environmental information; determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics; and generating an undirected graph according to the motion boundary and the motion vertex.
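Purely for illustration, the undirected graph over the motion vertices can be held as a simple adjacency structure whose edges respect the motion boundary; the vertex labels and helper name below are hypothetical.

```python
def build_undirected_graph(vertices, boundary_edges):
    """Build an undirected graph as an adjacency dictionary from the motion
    vertices and the edges permitted by the motion boundary, both of which
    are assumed to come from the environmental space feature extraction."""
    graph = {v: set() for v in vertices}
    for u, w in boundary_edges:
        graph[u].add(w)   # undirected: record the edge in both directions
        graph[w].add(u)
    return graph

# Example: four reconnaissance vertices connected along the motion boundary
vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
G = build_undirected_graph(vertices, edges)
assert "B" in G["A"] and "A" in G["B"]
```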
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring a time step according to the environment information; acquiring environment state change information according to the time step and the undirected graph; and generating a state transition matrix based on the Markov chain and the environment state change information, and obtaining an information state transition model.
In one embodiment, the processor, when executing the computer program, further performs the steps of: generating total state information of the unmanned aerial vehicle cluster according to the state information; respectively generating a local return value function of each unmanned aerial vehicle according to the total state information and the state information; and generating a global value function according to each local return value function.
In one embodiment, the processor, when executing the computer program, further performs the steps of: establishing a TD-POMDP framework according to the state information and the global value function; and respectively calculating the target execution strategy of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring environment information, and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
In one embodiment, the computer program when executed by the processor further performs the steps of: extracting environmental space characteristics in the environmental information; determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics; and generating an undirected graph according to the motion boundary and the motion vertex.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a time step according to the environment information; acquiring environment state change information according to the time step and the undirected graph; and generating a state transition matrix based on the Markov chain and the environment state change information, and obtaining an information state transition model.
In one embodiment, the computer program when executed by the processor further performs the steps of: generating total state information of the unmanned aerial vehicle cluster according to the state information; respectively generating a local return value function of each unmanned aerial vehicle according to the total state information and the state information; and generating a global value function according to each local return value function.
In one embodiment, the computer program when executed by the processor further performs the steps of: establishing a TD-POMDP framework according to the state information and the global value function; and respectively calculating the target execution strategy of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
It will be understood by those of ordinary skill in the art that all or a portion of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored on a non-volatile computer-readable storage medium and which, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An unmanned aerial vehicle cluster reconnaissance mission planning method based on distributed sequential distribution is characterized by comprising the following steps:
acquiring environment information, and generating an undirected graph according to the environment information;
generating an information state transition model according to the environment information and the undirected graph, and respectively acquiring state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and acquiring a planning algorithm, and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
2. The method of claim 1, wherein the generating an undirected graph according to the environment information comprises:
extracting environmental space characteristics in the environmental information;
determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics;
and generating the undirected graph according to the motion boundary and the motion vertex.
3. The method of claim 1, wherein generating an information state transition model based on the environmental information and the undirected graph comprises:
acquiring a time step according to the environment information;
acquiring environment state change information according to the time step and the undirected graph;
and generating a state transition matrix based on the Markov chain and the environment state change information, and obtaining the information state transition model.
4. The method of claim 1, wherein the generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information comprises:
generating total state information of the unmanned aerial vehicle cluster according to the state information;
respectively generating a local return value function of each unmanned aerial vehicle according to the total state information and the state information;
and generating the global value function according to each local return value function.
5. The method of claim 1, further comprising:
establishing a TD-POMDP framework according to the state information and the global value function;
the calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function respectively comprises:
and respectively calculating the target execution strategy of each unmanned aerial vehicle through the TD-POMDP framework according to the planning algorithm and the global value function.
6. An unmanned aerial vehicle cluster mission planning system, the system comprising:
the undirected graph generating module is used for acquiring environment information and generating an undirected graph according to the environment information;
the model generation module is used for generating an information state transition model according to the environment information and the undirected graph and respectively acquiring the state information of each unmanned aerial vehicle in the unmanned aerial vehicle cluster according to the information state transition model;
the function generation module is used for generating a global value function corresponding to the unmanned aerial vehicle cluster according to the state information; the global value function is used for calculating an execution strategy of the unmanned aerial vehicle;
and the calculation module is used for acquiring a planning algorithm and respectively calculating the target execution strategy of each unmanned aerial vehicle according to the planning algorithm and the global value function.
7. The system according to claim 6, wherein the undirected graph generation module comprises:
the characteristic extraction module is used for extracting environmental space characteristics in the environmental information;
the information determining module is used for determining a motion boundary and a motion vertex of the unmanned aerial vehicle according to the environmental space characteristics;
and the graph generation module is used for generating the undirected graph according to the motion boundary and the motion vertex.
8. The system of claim 6, wherein the model generation module comprises:
the step length obtaining module is used for obtaining a time step length according to the environment information;
the information acquisition module is used for acquiring environment state change information according to the time step length and the undirected graph;
and the matrix generation module is used for generating a state transition matrix based on the Markov chain and the environment state change information and obtaining the information state transition model.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
CN202010232017.5A 2020-03-27 2020-03-27 Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation Active CN111414006B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010232017.5A CN111414006B (en) 2020-03-27 2020-03-27 Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010232017.5A CN111414006B (en) 2020-03-27 2020-03-27 Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation

Publications (2)

Publication Number Publication Date
CN111414006A (en) 2020-07-14
CN111414006B CN111414006B (en) 2023-09-08

Family

ID=71494617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010232017.5A Active CN111414006B (en) 2020-03-27 2020-03-27 Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation

Country Status (1)

Country Link
CN (1) CN111414006B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101286071A (en) * 2008-04-24 2008-10-15 北京航空航天大学 Multi-UAV three-dimensional formation reconfiguration method based on particle swarm optimization and genetic algorithm
US20160018224A1 (en) * 2013-09-27 2016-01-21 Regents Of The University Of Minnesota Symbiotic Unmanned Aerial Vehicle and Unmanned Surface Vehicle System
WO2017177533A1 (en) * 2016-04-12 2017-10-19 深圳市龙云创新航空科技有限公司 Method and system for controlling laser radar based micro unmanned aerial vehicle
CN106705970A (en) * 2016-11-21 2017-05-24 中国航空无线电电子研究所 Multi-UAV (unmanned aerial vehicle) cooperative path planning method based on ant colony algorithm
EP3349086A1 (en) * 2017-01-17 2018-07-18 Thomson Licensing Method and device for determining a trajectory within a 3d scene for a camera
US20180268720A1 (en) * 2017-03-14 2018-09-20 Tata Consultancy Services Limited Distance and communication costs based aerial path planning
CN107632614A (en) * 2017-08-14 2018-01-26 广东技术师范学院 Multi-UAV formation self-organizing cooperative control method based on rigid graph theory
KR20190086081A (en) * 2018-01-12 2019-07-22 한국과학기술원 Multi-layer-based coverage path planning algorithm method of unmanned aerial vehicle for three-dimensional structural inspection and the system thereof
US20200041623A1 (en) * 2018-02-05 2020-02-06 Centre Interdisciplinaire De Developpement En Cartographie Des Oceans (Cidco) Method and apparatus for automatic calibration of mobile LiDAR systems

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIN ZHOU: "Online Planning for Multiagent Situational Information Gathering in the Markov Environment", IEEE *
陈少飞 (CHEN Shaofei): "Reconnaissance and surveillance mission planning method for unmanned aerial vehicle cluster systems", pages 21-64 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784211A (en) * 2020-08-04 2020-10-16 中国人民解放军国防科技大学 Cluster-based group multitask allocation method and storage medium
CN111784211B (en) * 2020-08-04 2021-04-27 中国人民解放军国防科技大学 Cluster-based group multitask allocation method and storage medium
CN112131730A (en) * 2020-09-14 2020-12-25 中国人民解放军军事科学院评估论证研究中心 Freezing analysis method and device for group intelligent unmanned system
CN112131730B (en) * 2020-09-14 2024-04-30 中国人民解放军军事科学院评估论证研究中心 Fixed-grid analysis method and device for intelligent unmanned system of group
CN113111441A (en) * 2021-04-26 2021-07-13 河北交通职业技术学院 Method for constructing cluster unmanned aerial vehicle task model based on adjacency relation
CN113111441B (en) * 2021-04-26 2023-01-31 河北交通职业技术学院 Method for constructing cluster unmanned aerial vehicle task model based on adjacency relation
CN114722946A (en) * 2022-04-12 2022-07-08 中国人民解放军国防科技大学 Unmanned aerial vehicle asynchronous action and cooperation strategy synthesis method based on probabilistic model checking

Also Published As

Publication number Publication date
CN111414006B (en) 2023-09-08

Similar Documents

Publication Publication Date Title
CN111414006B (en) Unmanned aerial vehicle cluster reconnaissance task planning method based on distributed sequential allocation
Zhao et al. Systemic design of distributed multi-UAV cooperative decision-making for multi-target tracking
CN110059385B (en) Grid dynamics scenario simulation method and terminal equipment coupled with different-speed growth
Zeng et al. Exploiting model equivalences for solving interactive dynamic influence diagrams
CN111367317A (en) Unmanned aerial vehicle cluster online task planning method based on Bayesian learning
Lanillos et al. Minimum time search for lost targets using cross entropy optimization
Zhou et al. Bayesian reinforcement learning for multi-robot decentralized patrolling in uncertain environments
Kyriakakis et al. A cumulative unmanned aerial vehicle routing problem approach for humanitarian coverage path planning
CN114261400B (en) Automatic driving decision method, device, equipment and storage medium
US11763191B2 (en) Virtual intelligence and optimization through multi-source, real-time, and context-aware real-world data
CN111983923B (en) Formation control method, system and equipment for limited multi-agent system
Zhou et al. Online planning for multiagent situational information gathering in the Markov environment
Fu et al. Mobile robot object recognition in the internet of things based on fog computing
Chen et al. Multi-agent patrolling under uncertainty and threats
CN113566831B (en) Unmanned aerial vehicle cluster navigation method, device and equipment based on human-computer interaction
CN117195976A (en) Traffic flow prediction method and system based on layered attention
Wu et al. Adaptive submodular inverse reinforcement learning for spatial search and map exploration
Panov Simultaneous learning and planning in a hierarchical control system for a cognitive agent
Thangeda et al. Adaptive sampling site selection for robotic exploration in unknown environments
CN110727291B (en) Centralized cluster reconnaissance task planning method based on variable elimination
US20220107628A1 (en) Systems and methods for distributed hierarchical control in multi-agent adversarial environments
Weng et al. Big data and deep learning platform for terabyte-scale renewable datasets
CN113642592B (en) Training method of training model, scene recognition method and computer equipment
Felicioni et al. Goln: Graph object-based localization network
Haggerty et al. What, how, and when? A hybrid system approach to multi-region search and rescue

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant