CN112907015A

CN112907015A - UUV self-adaptive task planning method under uncertain condition

Info

Publication number: CN112907015A
Application number: CN202011100273.5A
Authority: CN
Inventors: 马硕; 杨振宇; 张炜; 胡英娣; 李卉; 许杰; 吴小兵; 韩守鹏; 刘海波
Original assignee: People's Liberation Army 92578
Current assignee: People's Liberation Army 92578
Priority date: 2019-10-15
Filing date: 2020-10-15
Publication date: 2021-06-04

Abstract

The invention discloses a UUV adaptive task planning method under uncertain conditions, comprising the following steps. First, the initial task targets are divided, according to the task requirements, into a target set G₀ to be executed and a spare target set B₀. Second, an initial task plan Π₀ is generated from G₀ and its success probability is evaluated; if the success probability of the initial task plan meets the given threshold, the plan is loaded into the UUV control system for execution; otherwise, the target set must be reset. Third, based on the current state of the UUV system, the success rate of the subsequent task plan is evaluated according to the task type and compared with the set threshold; if the success rate is below the threshold, the task target deletion algorithm RAMG is called, otherwise the task target addition algorithm AAMG is called; a new target set G is generated, a new task plan is produced by re-planning on G, and the UUV continues by executing the new plan. Fourth, the task ends once all actions in the task plan have been executed in sequence. The invention can effectively improve the autonomous capability of the UUV.

Description

UUV self-adaptive task planning method under uncertain condition

Technical Field

The invention belongs to the field of task planning of Unmanned Underwater Vehicles (UUV), and particularly relates to a UUV self-adaptive task planning method under an uncertain condition.

Background

During underwater task execution, factors such as hydrology and geography, task targets and the system working state are all likely to change. A UUV that relies entirely on manual remote control or on a pre-programmed mode cannot respond to a complex, changing environment in a timely and effective way. The UUV must therefore be capable of real-time autonomous situation perception, evaluation and action planning, so that it can make correct decisions without manual intervention. Task planning under uncertainty has been an active research topic in recent years; the main approaches to the uncertain task planning problem include Markov decision process methods, re-planning methods, branch planning methods, planning problem decomposition methods and plan modification methods.

1. Task planning method based on Markov decision process

A Markov Decision Process (MDP) constructs a state transition function in which each uncertain effect is assigned a probability and then, given the state transition probabilities, seeks an optimal policy that maximizes the reward. The main problem of the MDP approach is that the computational cost of solving it is large when the state space is large (an NP-hard problem), especially when the state variables are continuous (the state space is then infinite), which limits its practical range of application. One remedy is to discretize the continuous state space or to reduce the number of states, but such modeling methods depend on the specific problem domain. For example, Capitan et al. used a Partially Observable Markov Decision Process (POMDP) to solve the task planning problem of multiple unmanned aerial vehicles (UAVs) reconnoitering multiple targets; to reduce computational complexity, each UAV plans its local task with the POMDP method, and global task allocation is completed by an auction method. Redding et al. proposed a group-aggregate Markov decision process method, GA-Dec-MMDP, which greatly reduces the number of states and is suitable for large-scale UAV swarm task planning, but it cannot achieve an optimal solution. In addition, value iteration algorithms based on heuristic forward state-space search can also improve computational efficiency, but the returned policy is not necessarily optimal and the algorithm is incomplete.

2. Re-planning method

Yoon et al propose a re-planning method FF-Replan to solve the problem of uncertain task planning.

FF-Replan is built on the Fast-Forward planner FF: when the system enters an unexpected state during task execution, FF is invoked to re-plan and generate a new task plan. Because FF is a deterministic task planning method, FF-Replan handles uncertainty in one of two ways: first, each executed action is given a single deterministic outcome (chosen by a heuristic rule, for example the outcome with the highest probability); second, each executed action is expanded into all of its possible outcomes.

In the UUV task planning problem, system resource usage varies continuously, so each resource-related action has infinitely many possible outcomes. When FF-Replan is used, only the deterministic single-outcome mode is therefore applicable, for example by defining the energy consumption of each action as the mean of the corresponding distribution. However, for a continuous probability distribution the probability that the energy consumption exactly equals the mean is 0, so the FF-Replan planner would judge that the system has entered an unexpected state after every action and trigger re-planning each time. Another characteristic of UUV tasks is that abnormal conditions such as system failures or insufficient resources may occur; FF-Replan can neither evaluate the probability of such special conditions nor plan for handling abnormal events. The FF-Replan method is therefore not suitable for UUV task planning.

3. Branch planning method

The branch planning method handles uncertain planning problems mainly by adding new task plan branches at task execution nodes where uncertain outcomes are likely, and adjusting the original task at those branch nodes according to how the plan actually executes. Coles proposed a branch task planning method under uncertain resource conditions. In the off-line planning stage, the method assumes that the task environment, the task targets, and the emergencies with their possible locations are known, so that the branch nodes and their execution conditions can be fixed in the initial plan. Its drawbacks are poor flexibility and the need for accurate information about the task environment and all possible events. Gough et al. compared the performance of branched and unbranched planning under the same task conditions: the branched method improved the task completion benefit to some extent (18.31%), but consumed a large amount of planning time.

The branch planning method has two problems in practice. First, reasonable branch nodes are difficult to determine: in the off-line planning stage the initial task plan must be evaluated and the required branch nodes computed by some criterion, but during execution the actual outcome of an action may differ greatly from the prediction, so the method is not suited to complex and changeable task scenarios. Second, only a limited number of action outcomes can be handled, which cannot meet the needs of uncertain planning with continuously changing state.

4. Problem decomposition planning method

The basic idea of the decomposition planning method is to decompose a complex problem into a series of relatively independent subtasks, solve them separately, and combine the results into a complete task plan. Yang proposed a task target decomposition method, GDECOMP, which divides the initial task targets into several target subsets with a machine learning algorithm, plans each subset separately to obtain a subtask plan, and resolves conflicts between subtasks with a constraint satisfaction method. Bibai et al. proposed DAEX, a metaheuristic evolutionary planning algorithm based on state decomposition, but combining the subtasks is computationally expensive and the efficiency of online task planning depends heavily on the heuristic algorithm.

In the off-line planning stage, the problem decomposition method must analyze the specified task planning problem in detail, consider all possible task execution outcomes, and decompose the task as a whole into a series of sub-targets; during execution, the corresponding sub-plans are merged according to the actual situation. The main problem with this approach is that the various execution outcomes must be considered comprehensively in advance, so it is not suitable for more complex tasks or for tasks with strict real-time requirements.

5. Plan modification planning method

Unlike task re-planning, the basic idea of the plan modification method is to satisfy new task requirements by modifying some elements of the original task plan. Cashmore et al. proposed a benefit-based AUV task planning system that supports online task scheduling: the execution order of the task targets is adjusted to suit the current working condition of the AUV or to obtain a larger task benefit. However, the task can only be modified on the basis of the current execution state; adjustments cannot be made in advance through predictive reasoning. Woods et al. proposed a template-based plan modification method in which, before the task is executed, a series of plan fragments is generated off-line for each type of emergency and stored in a plan library; when the robot encounters an emergency during execution, the relevant fragment is retrieved from the library and inserted into the current task plan.

Intuitively, the cost (for example, the amount of computation) of modifying a plan should be lower than that of re-planning. However, the research of Nebel et al. shows that re-planning has the advantage, because plan modification involves matching, comparing, adding and deleting actions of the task plan.

Classical planning methods can efficiently generate action sequences that satisfy the planning goal, but UUV task generation is an uncertain planning problem to which classical planning cannot be applied directly. Planning methods based on the Markov decision process suffer from an excessively large state space and low solving efficiency. Traditional re-planning methods offer only a single processing mode and have poor flexibility. Most existing branch planning, problem decomposition and plan modification methods must make adjustments according to preset processing flows; they are computationally expensive and complex to implement, and they adapt poorly to dynamic, complex environments. Comparing the advantages and disadvantages of these methods shows two things. First, compared with analyzing predicted outcomes in the off-line planning stage, online task monitoring can effectively incorporate the actual operating information of the system and so supports more accurate prediction and judgment of the system's operation. Second, task re-planning has relatively good real-time performance and is relatively simple to implement, but it can only plan "passively" in response to outcomes that have already occurred.

Disclosure of Invention

In view of the above, the invention provides a UUV adaptive task planning method under an uncertain condition, which can combine online monitoring, evaluation prediction and task adjustment to adapt to a constantly changing task condition, and can obtain a maximum task benefit, thereby effectively improving the autonomous capability of the UUV.

The technical scheme for realizing the invention is as follows:

a UUV self-adaptive task planning method under an uncertain condition comprises the following steps:

step one, before the UUV is launched, dividing the initial task targets, according to the task requirements, into a target set G₀ to be executed and a spare target set B₀; G₀ contains the targets that must be completed to achieve the purpose of the task and are therefore the task targets to be executed; B₀ contains new task targets that can be switched to under abnormal conditions, or additional targets that can be completed when system resources remain, so as to improve the use benefit;

step two, the task planner generating an initial task plan Π₀ from G₀; evaluating the success probability of Π₀ with the resource calculation formulas; if the success probability of the initial task plan meets the given threshold, loading the plan into the UUV control system for execution; otherwise, resetting the target set;

step three, based on the current state of the UUV system, evaluating the success rate of the subsequent task plan according to the task type and comparing it with the set threshold; if the success rate is below the threshold, calling the task target deletion algorithm RAMG, otherwise calling the task target addition algorithm AAMG; generating a new target set G, re-planning on G to produce a new task plan, and the UUV continuing by executing the new task plan;

and step four, ending the task once all actions in the task plan have been executed in sequence.
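The four steps can be read as a monitor-evaluate-adjust loop. The following Python sketch is only an illustration under simplifying assumptions, not the patented implementation: the "plan" is reduced to the pending targets in sorted order, executing one plan step is assumed to achieve exactly one target, and `success_rate`, `remove_goals` and `add_goals` are hypothetical callbacks standing in for the task evaluation model, RAMG and AAMG.

```python
from typing import Callable, List, Set, Tuple

Goals = Set[str]
Adjust = Callable[[Goals, Goals], Tuple[Goals, Goals]]


def run_adaptive_mission(
    g0: Goals,
    b0: Goals,
    threshold: float,
    success_rate: Callable[[Goals], float],
    remove_goals: Adjust,
    add_goals: Adjust,
) -> List[str]:
    """Return the targets in the order they were executed (toy model)."""
    pending, spare = set(g0), set(b0)

    # Step two: the initial plan must meet the threshold before it is loaded.
    if success_rate(pending) < threshold:
        raise ValueError("initial plan is below the threshold; reset the target set")

    executed: List[str] = []
    while pending:
        goal = sorted(pending)[0]      # stand-in for the planner's next action
        pending.discard(goal)
        executed.append(goal)

        # Step three: re-evaluate the remaining targets and adjust the set.
        if success_rate(pending) < threshold:
            pending, spare = remove_goals(pending, spare)   # RAMG stand-in
        else:
            pending, spare = add_goals(pending, spare)      # AAMG stand-in

    # Step four: every planned action has been executed.
    return executed


# Toy callbacks (hypothetical): each pending target lowers the estimate by 0.2.
def toy_rate(pending: Goals) -> float:
    return 1.0 - 0.2 * len(pending)


def toy_add(pending: Goals, spare: Goals) -> Tuple[Goals, Goals]:
    for g in sorted(spare):
        if toy_rate(pending | {g}) >= 0.5:
            return pending | {g}, spare - {g}
    return pending, spare


def toy_remove(pending: Goals, spare: Goals) -> Tuple[Goals, Goals]:
    g = sorted(pending)[-1]
    return pending - {g}, spare | {g}


order = run_adaptive_mission({"a", "b"}, {"c"}, 0.5, toy_rate, toy_remove, toy_add)
```

In this toy run the spare target "c" is picked up once the first target has been completed and the estimated success rate leaves room for it.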

Furthermore, the self-adaptive task planning method is realized by three functional modules, namely a task planning module, a task target management module and a task online monitoring module.

Furthermore, the task online monitoring module has a data communication interface to the low-level control system of the unmanned system, can acquire the working state and fault information of each system component in real time, and evaluates the task in real time according to the working condition of the system; it monitors the working state of the system in real time, updates the target task set, and adjusts the task plan as the situation requires.

Further, the task target management module adjusts the task target according to the system working information, and starts the task planning module after adjustment; in the task execution process, the self-adaptive task adjustment is realized by selecting a task target to be executed; the goal selection aims at optimizing a task plan, reducing task risks and improving task benefits, and the selection mode comprises an adding mode and a deleting mode; adding the task target refers to selecting a target from the standby task target set to a task target set to be executed; deleting a task object refers to removing a related object in the set of task objects to be executed to the set of standby task objects.

Further, the task planning module generates a task plan according to the current system state and the task target.

Furthermore, the self-adaptive task planning method adopts a deterministic planning algorithm and generates an initial task planning scheme according to a given task target; because more random factors exist in the UUV task planning problem, after the generation of an offline plan or the re-planning of an online task, a task evaluation model is needed to be used for calculating the success rate of the task and evaluating the feasibility of a new plan meeting an expected target; meanwhile, by taking the definition of a return function in a Markov decision process method as reference, assigning values to each task target, and evaluating a task expected return value according to the task success rate;

in the task action execution process, the planning system monitors the residual resources of the system in real time and calculates the success probability of completing the subsequent task target; and if the success probability of the task is lower than a set threshold value, adjusting a task target set according to the properties of various system events, starting re-planning and optimizing a subsequent task scheme.

Advantageous effects:

1. The invention combines online monitoring, evaluation and prediction, and task re-planning adjustment, which overcomes the single processing mode and poor flexibility of traditional re-planning methods.

2. The modeling of the complex uncertain task planning problem does not need to introduce huge state space information, and the problems of overlarge state space quantity, low solving efficiency and the like of the traditional Markov decision process-based method are solved.

3. The system can simultaneously monitor and adaptively adjust the factors such as the service condition of system resources, system faults, external control instructions and the like in real time, and meets the complex and variable real-time task requirements.

Drawings

FIG. 1 is a diagram of an adaptive mission planning system.

Fig. 2 is an information processing flow of the adaptive mission planning system.

Fig. 3 is a data storage space usage staging.

Fig. 4 is a schematic diagram of a process of delivering task loads.

Fig. 5 shows the task success rate variation of simulation example 2.

Fig. 6 shows the variation of the success rate of the task in simulation example 3.

Detailed Description

The invention is described in detail below by way of example with reference to the accompanying drawings.

The invention provides a UUV self-adaptive task planning method under an uncertain condition, which comprises the following specific processes:

1. task representation method

The planning problem is expressed formally with reference to the PDDL definitions, according to the characteristics of UUV underwater tasks. The PDDL world model is represented by system states, which are composed of logical propositions and predicate primitives called state variables. The goals of a planning problem can be expressed as propositional instances; when all propositional conditions of a goal are satisfied, the goal state is reached.

Let the UUV action set be A; the definition of any UUV action a ∈ A is shown in Table 1.

Table 1 UUV action definitions

The system state s ∈ S is defined by a set of state variables; some examples of state variable definitions are shown in Table 2.

TABLE 2 System State variable definitions

2. Overall framework of self-adaptive task planning method

The self-adaptive task planning method is composed of a task planning module, a task target management module, a task on-line monitoring module and the like, and is shown in figure 1. The task planning module generates a task plan according to the current system state and the task target; the task on-line monitoring module is provided with a data communication interface with the bottom layer control system of the unmanned system, can acquire the working state and fault information of each system component in real time, and evaluates a task in real time according to the working condition of the system; and the task target management module adjusts the task target according to the system working information and starts the task planning module after adjustment.

The information processing flow of the adaptive task planning method is shown in Fig. 2. Unlike conventional uncertain planning methods, this method uses a deterministic planning algorithm to generate an initial task plan (an ordered sequence of actions) from the given task targets. Because the UUV task planning problem involves many random factors, a task evaluation model is used after an off-line plan is generated or an online re-planning is completed to calculate the task success rate and assess whether the new plan can feasibly meet the expected targets. In addition, borrowing the definition of the reward function in the Markov decision process method, each task target is assigned a value, and the expected reward of the task is evaluated from the task success rate.

During task execution, the planning system monitors the remaining system resources in real time and calculates the success probability of completing the subsequent task targets; if the task success probability falls below the set threshold, the task target set is adjusted according to the nature of the system events, re-planning is started, and the subsequent task scheme is optimized.

(1) Task planning module

The task planning module implements initial task planning and online task re-planning. In terms of performance it must solve problems efficiently, and in some task scenarios the planner also needs a degree of numerical optimization reasoning (that is, achieving the desired task targets at a lower action cost). When the task targets need to be adjusted, the task planning module is called by the task target management module. To meet the planning efficiency requirement, a mature fast classical planner can be used as the task planner; typical examples are Metric-FF and Fast Downward.

(2) Task online evaluation module

The task on-line evaluation module has the main functions of monitoring the working state of the system in real time, updating a target task set and adjusting a task plan according to the situation. After the task plan is determined, the task success rate mainly depends on the use condition of various system resources in the actual task environment. If the system resources are insufficient to achieve the task goal to be completed, the task fails. The resource usage of each action may be modeled by an associated probability model.

① Modeling of resources that decrease continuously and monotonically

A typical continuously and monotonically decreasing resource is energy. Every action in the task plan consumes energy, and energy cannot be replenished during task execution, so the total energy used by the UUV is the sum of the energy consumption random variables of all actions. The sum of a finite number of independent normal random variables is itself normally distributed, so the total energy consumption is normally distributed. During task execution the UUV control system collects energy usage parameters in real time, so the remaining energy can be calculated accurately. Let the current remaining energy be Ce (a known constant) and the total energy consumption of the subsequent actions be Te, with Te ~ N(μ, σ²). Event S denotes successful completion of the task plan (with respect to energy only). Given the current energy Ce, the subsequent actions complete successfully when 0 ≤ Te ≤ Ce, and the task success rate P(S) is:

P(S) = P(0 ≤ Te ≤ Ce) = Φ((Ce − μ)/σ) − Φ(−μ/σ)

where Φ is the standard normal cumulative distribution function.
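Under these definitions, P(S) = P(0 ≤ Te ≤ Ce) can be evaluated with the standard normal cumulative distribution function once μ and σ are obtained by summing the per-action means and variances. The sketch below is a minimal illustration; the function names and the (mean, standard deviation) input format are assumptions made for this example.

```python
import math
from typing import Iterable, Tuple


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def energy_success_rate(remaining: float,
                        actions: Iterable[Tuple[float, float]]) -> float:
    """P(0 <= Te <= Ce), with Te the sum of independent normal consumptions.

    `remaining` is Ce; `actions` holds one (mean, std) pair per remaining
    action (an assumed input format).
    """
    pairs = list(actions)
    mu = sum(m for m, _ in pairs)
    sigma = math.sqrt(sum(s * s for _, s in pairs))  # variances add
    if sigma == 0.0:
        # Degenerate case: consumption is deterministic.
        return 1.0 if 0.0 <= mu <= remaining else 0.0
    return normal_cdf((remaining - mu) / sigma) - normal_cdf(-mu / sigma)
```

For example, two remaining actions with consumptions N(30, 3²) and N(40, 4²) give Te ~ N(70, 5²), so with Ce = 70 the success rate is about 0.5.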

② Modeling of resources that can both increase and decrease continuously

A typical resource of this kind is data storage space. Unlike energy, data storage space is reusable, so data collection and transmission actions cannot simply be treated as a sum of random variables as in the energy case; the variation over the whole task execution must be considered. Because of factors such as navigation errors, the storage space occupied by collected data can be regarded as a normally distributed random variable. The UUV can control the amount of data it sends precisely, but under special conditions such as poor hydrological or meteorological conditions, the data may fail to transmit, or the transmission may be interrupted after only part of the data has been sent. Assuming that an incomplete data set is unusable, such a case is treated as a transmission failure. The storage space freed by a UUV data transmission can therefore be regarded as following a Bernoulli distribution.
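The two distributional assumptions above (normally distributed recorded volume, Bernoulli transmission outcome) can be explored numerically. The following is a Monte Carlo sketch, not the closed-form expression of the method; the step encoding, the clipping of negative samples to zero and the capping of free space at the total capacity are assumptions introduced for this example.

```python
import random
from typing import List, Tuple

# Assumed encoding: ("collect", mean, std) or ("transmit", amount, p_success).
Step = Tuple[str, float, float]


def storage_success_rate(capacity: float, steps: List[Step],
                         trials: int = 20000, seed: int = 0) -> float:
    """Estimate P(free storage never drops below 0 during the plan)."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        free = capacity
        ok = True
        for kind, a, b in steps:
            if kind == "collect":
                # Recorded volume ~ N(a, b^2); negative samples are clipped.
                free -= max(0.0, rng.gauss(a, b))
            elif kind == "transmit":
                # Bernoulli outcome: a failed transfer frees nothing.
                if rng.random() < b:
                    free = min(capacity, free + a)
            else:
                raise ValueError(f"unknown step kind: {kind!r}")
            if free < 0.0:
                ok = False
                break
        if ok:
            successes += 1
    return successes / trials
```

With deterministic volumes the estimate reduces to the transmission success probability whenever the later collection only fits if the transfer succeeded, which gives a quick sanity check of the model.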

The task plan is processed in stages according to the usage pattern of the data storage space, as in the example of Fig. 3. A stage node is placed before each transmission action. The numbers in the boxes give the change in storage (the mean value, for collected data): negative values indicate storage that is occupied, positive values indicate storage that is freed, and 0 indicates no change. Let event T_n denote that the task is executed successfully up to node T_n (n is a non-negative integer giving the number of stages into which the plan is divided), and let event T_n' denote that the remaining storage space at node T_n is not less than 0. Let C_n be the storage space freed by data transmission in the nth stage (for n = 1, the initial value of the data storage space), μ_n the mean and σ_n the standard deviation of the data volume recorded in the nth stage, and x_n the random variable for the data recorded in the nth stage, with probability density function f(x_n). The data recording processes of the stages are assumed not to affect one another, so x₁, …, x_n are mutually independent.

Let A₁, A₂, …, A_n be the independent events that the task plan actions Transdata(uuv, d) are executed successfully, in order, and let events B₁, …, B_m form a complete event group, where

Event E denotes successful completion of the task plan (with respect to storage space only). By the law of total probability, P(E) is:

P(E) = Σ_{i=1}^{m} P(B_i)·P(T₁'T₂'…T_n' | B_i)

where P(B_i) is calculated from the given success probability of each transmission action, and the expression for P(T₁'T₂'…T_n' | B_i) is:

③ Modeling of resources that decrease discretely and monotonically

A typical discretely and monotonically decreasing resource is the task payload, whose usage distribution depends on the specific task. Taking a weapon-carrying UUV as an example, whether a single attack hits the target follows a Bernoulli distribution, and the number of weapons consumed depends on the target damage criterion and the weapon hit probability. For ease of handling, the number of hits needed to destroy a target with a particular type of weapon is taken to be a constant r (for example, 2 torpedoes are needed to sink a large destroyer), so the number of weapons that must be fired to achieve r hits on the target follows a Pascal distribution. Using the relation between the negative binomial distribution and the Pascal distribution, payload consumption is described with the negative binomial distribution.

Let the current number of remaining payloads be N_p and let n targets need to be hit. Successful destruction of the ith target requires r_i (i = 1, …, n) payload hits (for example, torpedo hits), k_i denotes the number of payloads that miss the ith target, and the hit probability is p. Then k₁+…+k_n ~ NB(r₁+…+r_n, p), with 0 ≤ k₁+…+k_n ≤ N_p − (r₁+…+r_n). Event W denotes successful completion of the task plan (with respect to payload resources only). Given the current payload count N_p, the success rate P(W) of completing the subsequent actions is (where I is the regularized incomplete beta function):

P(W) = I_p(r₁+…+r_n, 1+N_p−r₁−…−r_n)
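The probability P(0 ≤ k₁+…+k_n ≤ N_p − (r₁+…+r_n)) can also be evaluated directly by summing the negative binomial probability mass function over the admissible number of misses. The sketch below does this with the standard library only; it treats p as the per-shot hit probability, and the function and parameter names are assumptions for this example.

```python
from math import comb
from typing import Sequence


def payload_success_rate(n_available: int,
                         required_hits: Sequence[int],
                         p_hit: float) -> float:
    """P(misses <= N_p - R) with misses ~ NB(R, p_hit), R = sum(required_hits)."""
    r_total = sum(required_hits)
    spare = n_available - r_total      # misses the remaining stock can absorb
    if spare < 0:
        return 0.0                     # not enough payloads even with no miss
    if r_total == 0:
        return 1.0                     # nothing needs to be hit
    q = 1.0 - p_hit
    # pmf of k misses before the R-th hit: C(k + R - 1, k) * p^R * q^k
    return sum(
        comb(k + r_total - 1, k) * p_hit ** r_total * q ** k
        for k in range(spare + 1)
    )
```

The result equals the probability of scoring at least r₁+…+r_n hits in N_p shots. For instance, with 3 payloads, one target needing 2 hits and p = 0.5, the value is 0.5.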

(3) Task target management module

During task execution, adaptive task adjustment is achieved by selecting which task targets to execute. Target selection aims to optimize the task plan, reduce task risk and improve task benefit, and has two modes: addition and deletion. Adding a task target means moving a target from the spare task target set into the set of task targets to be executed; deleting a task target means moving a target out of the set to be executed and into the spare set. In addition, under real task conditions the task targets are not necessarily independent of one another and may be associated, so when the task targets are adjusted, associated targets are adjusted together according to their association relationships. The pseudo code of the task target management algorithm MGMA (Mission Goals Management Algorithm) is shown in Table 3. Before a task begins, the UUV operator decides the initial task targets from the specific task requirements. According to those requirements, the initial task targets are divided into a target set G₀ to be executed and a spare target set B₀. G₀ contains the targets that must be completed to achieve the purpose of the task and are therefore the task targets to be executed; B₀ contains new task targets that can be switched to under abnormal conditions, or additional targets that can be completed when system resources remain, so as to improve the use benefit. In the off-line stage, the task planner generates an initial task plan from G₀ (line 3). If the success probability of the initial task plan meets the given threshold, the plan is loaded into the UUV control system for execution (line 11); otherwise the target set must be reset.

Adding task target algorithm

The pseudo code of the basic flow of the task target addition algorithm AAMG (Addition Algorithm of Mission Goals) is shown in Table 4.

Following a greedy principle, the AAMG algorithm adds targets from the spare target set one at a time in descending order of benefit value (line 4). When a target that satisfies the task success rate requirement is found, the search continues with the next target until every candidate has been examined, and the updated task target set is returned (line 15). In the MAX_B function, task targets with the same benefit value are ordered by their mean resource consumption, and the target with the smaller consumption is added first.
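The greedy addition described above can be sketched as follows. This is a hedged reading of the prose, not a transcription of the pseudo code in Table 4: `success_rate` is a hypothetical callback for the task evaluation model, and `benefit` and `mean_cost` are assumed lookup tables for the benefit value and mean resource consumption of each target.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple


def aamg(
    goals: Set[str],
    spare: Set[str],
    benefit: Dict[str, float],
    mean_cost: Dict[str, float],
    success_rate: Callable[[FrozenSet[str]], float],
    threshold: float,
) -> Tuple[Set[str], Set[str]]:
    """Greedily move spare targets into the set to be executed."""
    goals, spare = set(goals), set(spare)
    # Highest benefit first; ties go to the lower mean consumption (MAX_B rule).
    ranked = sorted(spare, key=lambda g: (-benefit[g], mean_cost[g], g))
    for g in ranked:
        candidate = frozenset(goals | {g})
        if success_rate(candidate) >= threshold:
            goals.add(g)
            spare.discard(g)
    return goals, spare


# Toy data (hypothetical): the plan succeeds while total cost stays within 5.
cost = {"a": 2, "c": 3, "d": 2, "e": 1}
gain = {"c": 5, "d": 5, "e": 1}


def within_budget(s: FrozenSet[str]) -> float:
    return 1.0 if sum(cost[g] for g in s) <= 5 else 0.0


new_goals, new_spare = aamg({"a"}, {"c", "d", "e"}, gain, cost, within_budget, 0.9)
```

In the toy run, "d" is tried before "c" because both have the same benefit but "d" consumes less; "c" then no longer fits, and the cheaper "e" is still added.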

Algorithm for deleting task target

According to the characteristics of UUV tasks, a task target deletion algorithm RAMG (Removal Algorithm of Mission Goals) is designed; the pseudo code of its basic flow is shown in Table 5.

RAMG consists of two procedures: shrinking the target set and then expanding it. First, the number of deleted task targets is increased one at a time until a minimum target set that satisfies the task success rate requirement is obtained; following a greedy principle, the target with the smallest benefit value is deleted each time (see lines 6-14). Because the shrinking procedure uses only the size of the target benefit value as its metric, the deleted targets may include some that consume fewer resources and could still satisfy the task success rate requirement. To improve task execution efficiency, all deleted task targets are therefore revisited starting from the minimum target set, and any feasible target that can be added back is found; this expansion search is similar to the AAMG algorithm. In the MIN_B function, task targets with the same benefit value are sorted by mean resource consumption, and the target with the larger consumption is deleted first.
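The shrink-then-expand procedure can be sketched as below. As with any reconstruction from prose, this is an assumption-laden illustration rather than the pseudo code of Table 5: `success_rate`, `benefit` and `mean_cost` are hypothetical inputs, and targets that cannot be added back are placed in the spare set.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple


def ramg(
    goals: Set[str],
    spare: Set[str],
    benefit: Dict[str, float],
    mean_cost: Dict[str, float],
    success_rate: Callable[[FrozenSet[str]], float],
    threshold: float,
) -> Tuple[Set[str], Set[str]]:
    """Delete targets until the threshold is met, then add back what fits."""
    goals, spare = set(goals), set(spare)
    removed: List[str] = []

    # Shrink: drop the lowest-benefit target each time; on ties, drop the
    # target with the larger mean consumption first (MIN_B rule).
    while goals and success_rate(frozenset(goals)) < threshold:
        worst = min(goals, key=lambda g: (benefit[g], -mean_cost[g], g))
        goals.discard(worst)
        removed.append(worst)

    # Expand: revisit every deleted target, highest benefit first, and
    # restore those that still satisfy the success-rate requirement.
    for g in sorted(removed, key=lambda g: (-benefit[g], mean_cost[g], g)):
        if success_rate(frozenset(goals | {g})) >= threshold:
            goals.add(g)
        else:
            spare.add(g)
    return goals, spare


# Toy data (hypothetical): the plan succeeds while total cost stays within 5.
r_cost = {"a": 4, "b": 1, "c": 5}
r_gain = {"a": 10, "b": 2, "c": 3}


def r_budget(s: FrozenSet[str]) -> float:
    return 1.0 if sum(r_cost[g] for g in s) <= 5 else 0.0


kept, shelved = ramg({"a", "b", "c"}, set(), r_gain, r_cost, r_budget, 0.9)
```

Here the shrink phase deletes "b" and then "c"; the expand phase finds that the cheap target "b" can be restored alongside "a", which is exactly the case the backtracking step is meant to catch.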

(3) Task object association

According to the characteristics of the UUV task, four types of association relationships between task targets are defined, as shown in Table 6.

TABLE 6 task object Association relationship definition

From this table, a relationship structure for the task target set can be established. In the RAMG and AAMG algorithms, whenever a target is deleted or added, the related targets should be processed synchronously according to this task relationship structure. According to the definition of the task relationships, target selection in the set of task targets to be executed should conform to the following rules:

Rule 1: For task targets with an independent relationship, adding or deleting one task target has no influence on the other task targets.

Rule 2: For task targets with a one-way dependency g₁→g₂: when g₂ needs to be deleted, g₁ is deleted as well; when g₁ needs to be deleted, whether to delete g₂ is decided according to the specific constraint conditions defined between the task targets; when g₂ needs to be added, whether to add g₁ is decided according to the task target constraint requirements, and g₁ is added if the task success rate remains satisfactory; if g₂ is not in the task target set, g₁ cannot be added alone.

Rule 3: When a task target with a bidirectional dependency relationship needs to be added or deleted, all other task targets related to it are added or deleted synchronously.

Rule 4: For task targets with an incompatibility relationship (which appear in the standby task target set), the task benefit values must not be equal; when a target needs to be added, the target with the largest benefit value is added; when a target needs to be deleted, the target with the smallest benefit value is deleted.

(V) Detailed description of the preferred embodiments

The UUV payload bay carries material payloads and releases them at designated position points, as shown in Figure 4.

The UUV submerges and starts the task at waypoint1, and plans to deliver task payloads at two designated positions, waypoint2 and waypoint4. To improve the delivery position accuracy, a satellite calibration point (waypoint3) is placed before waypoint4, and a recovery point (waypoint5) is also set. The voyage ranges between waypoints are shown in Table 7.

TABLE 7 voyage ranges (unit: kilometer)

The task target set is G = {goal1 = (have_load myauv waypoint2 load1), goal2 = (calibrated myauv waypoint3), goal3 = (have_load myauv waypoint4 load2), goal4 = (mission_completed myauv waypoint5)}.

To further improve the task benefit, two standby task targets are set: a standby payload delivery point (waypoint6) and a standby recovery point (waypoint7). The standby recovery point shortens the return distance compared with the originally planned recovery point, so as to reduce the probability that the UUV cannot be recovered because of insufficient energy. The standby route segments are shown in Table 8.

TABLE 8 Standby voyage ranges (unit: kilometer)

Flight segment	Waypoint4 → Waypoint6	Waypoint6 → Waypoint7	Waypoint6 → Waypoint5
Voyage	200	500	1100

The standby task target set is backup_G = {bgoal1 = (have_load myauv waypoint6 load3), bgoal2 = (mission_completed myauv waypoint7)}. The association relationships between task targets are shown in Table 9.

TABLE 9 task object set Association

The autonomous navigation error of the UUV is set to 0.2% of the distance traveled. To make it easier to search for the payload at sea, the delivery accuracy is required to be within 0.5 kilometer. The one-way dependency goal2 → (goal3, bgoal1) is therefore constrained by the delivery accuracy condition: if goal2 is deleted, the navigation error on reaching waypoint4 is 2.2 kilometers, which exceeds the delivery accuracy requirement, so goal3 is deleted; bgoal1 is deleted for the same reason.

The task target benefit values are assigned by a scoring method on a scale of 0 to 10 points; the results are shown in Table 10. The delivery demand for bgoal1 is relatively weak, and the recovery point of bgoal2 is far from the wharf and difficult to operate at, so the standby targets have relatively low benefit values.
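The delivery accuracy constraint is simple arithmetic and can be checked with a few lines. The sketch below uses only the 0.2% error rate and the 0.5 km limit quoted above; the function names are illustrative, and the 1100 km figure in the usage note is inferred from the stated 2.2 km error rather than read from Table 7.

```python
NAV_ERROR_RATE = 0.002      # 0.2% of the distance traveled since the last position fix
ACCURACY_LIMIT_KM = 0.5     # required delivery accuracy


def position_error_km(distance_since_fix_km: float) -> float:
    # Accumulated dead-reckoning error grows linearly with distance.
    return NAV_ERROR_RATE * distance_since_fix_km


def delivery_allowed(distance_since_fix_km: float) -> bool:
    return position_error_km(distance_since_fix_km) <= ACCURACY_LIMIT_KM


# Longest leg that can be sailed without recalibration before a delivery.
max_leg_km = ACCURACY_LIMIT_KM / NAV_ERROR_RATE
```

Under these numbers a delivery must take place within about 250 km of the last satellite fix. A 2.2 km error corresponds to roughly 1100 km of uncalibrated travel, for which `delivery_allowed` returns `False`, which is why goal3 cannot survive the deletion of goal2.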

TABLE 10 Targeted benefit values for each task

The UUV carries 500 kilowatt-hours of energy when it is launched to perform the mission.

The task success rate threshold P_d is set to 0.841 (corresponding to an energy budget equal to the mean consumption plus one standard deviation), and P_a is set to 0.9 (corresponding to the mean consumption plus 1.282 standard deviations).
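The correspondence between the two thresholds and multiples of the standard deviation can be reproduced with the standard normal distribution. The sketch assumes that total energy consumption is normally distributed, which is what the quoted values imply; `required_energy` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()           # mean 0, standard deviation 1

p_d = std_normal.cdf(1.0)           # budget = mean + 1 standard deviation
p_a = std_normal.cdf(1.282)         # budget = mean + 1.282 standard deviations


def required_energy(mean: float, sigma: float, p: float) -> float:
    """Energy needed so that consumption stays within budget with probability p."""
    return mean + std_normal.inv_cdf(p) * sigma
```

Rounded to three decimals, `p_d` and `p_a` match the thresholds 0.841 and 0.9 used in this embodiment.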

The basic flow is as follows:

Step 1: Generating an initial mission plan

A planner is applied to the initial task target set G to obtain the initial task plan Π₀, as follows:

action1:dive(myauv)

action2:navigate(myauv waypoint1 waypoint2)

action3:releaseload(myauv load1 waypoint2)

action4:navigate(myauv waypoint2 waypoint3)

action5:surface(myauv)

action6:calibrate(myauv waypoint3)

action7:dive(myauv)

action8:navigate(myauv waypoint3 waypoint4)

action9:releaseload(myauv load2 waypoint4)

action10:navigate(myauv waypoint4 waypoint5)

action11:surface(myauv)

action12:completemission(myauv waypoint5)

The total voyage is 2460 km.

Under still-water conditions, the UUV navigation energy consumption rate is 0.2 kilowatt-hour/kilometer, and the energy consumed by the payload release action releaseload is neglected (the release mechanism is generally a mechanical lock or an explosive bolt, which only needs a momentary current to trigger the release switch). The energy consumption of each action is shown in Table 11.

TABLE 11 energy consumption parameters for each action

Step 2: initial mission plan evaluation

From the table above, the mean energy consumed by the actions of task plan Π₀ is 432.5 kilowatt-hours and the standard deviation is 58.6 kilowatt-hours. From formula 5-5, the task success rate of Π₀ is calculated as 0.875 (without regard to reliability), which is greater than the specified minimum threshold P_d, so the execution condition is met.

Step 3: Task execution

For each action in Π₀, an energy consumption value is randomly generated according to the specified probability distribution. Different random event sequences lead to different task execution processes; several typical simulation results are as follows:
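Formula 5-5 is not reproduced in this section. As a hedged sketch, the quoted figure can be recovered by assuming that the total energy consumption of the plan is normally distributed and that the task succeeds when consumption does not exceed the available energy; `task_success_rate` is an illustrative name.

```python
from statistics import NormalDist


def task_success_rate(available_kwh: float, mean_kwh: float, std_kwh: float) -> float:
    """P(total consumption <= available energy) under a normal model."""
    return NormalDist(mean_kwh, std_kwh).cdf(available_kwh)


# Initial plan: 500 kWh on board, mean 432.5 kWh, standard deviation 58.6 kWh.
rate = task_success_rate(500.0, 432.5, 58.6)
```

With these inputs the result rounds to 0.875, in agreement with the value stated for the initial plan.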

1. Simulation example 1 - no task adjustment required

The system energy is sufficient for the task, and the task plan Π₀ does not need to be adjusted. The energy consumed by each action is shown in Table 12; the remaining system energy at the end of the mission is about 39 kilowatt-hours.

TABLE 12 Simulated actual energy consumption of each action of Π₀ (unit: kilowatt-hour)

2. Simulation example 2 - insufficient system resources

After action4 is completed, the remaining system energy is 188.8 kilowatt-hours, while the mean energy required by the actions still to be executed is 206.4 kilowatt-hours with a standard deviation of 50.0036 kilowatt-hours. The task success rate is therefore 0.362, which is less than P_d, so the RAMG module is called to reduce the original task targets. The energy consumed by the executed actions is shown in Table 13. Consumption exceeds expectation mainly because the simulated energy consumption of the action4 navigation leg, 243.1 kilowatt-hours, is far greater than its expected value of 160 kilowatt-hours, which suggests that the UUV encountered an unknown strong counter-current while sailing through that sea area.

TABLE 13 Simulated energy consumption of executed actions (unit: kilowatt-hour)

Action number	1	2	3	4
Energy consumed	0.09	68.1	0	243.1
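The monitoring decision at this point can be sketched from the Table 13 values. The sketch again assumes a normal model for the remaining consumption; `next_step` is a hypothetical helper, and leaving the plan unchanged when the success rate lies between P_d and P_a is an assumption of this sketch.

```python
from statistics import NormalDist

P_D, P_A = 0.841, 0.9
INITIAL_KWH = 500.0
consumed = [0.09, 68.1, 0.0, 243.1]      # Table 13, actions 1 to 4


def next_step(remaining_kwh: float, mean_kwh: float, std_kwh: float):
    rate = NormalDist(mean_kwh, std_kwh).cdf(remaining_kwh)
    if rate < P_D:
        return rate, "RAMG"        # too risky: delete task targets
    if rate > P_A:
        return rate, "AAMG"        # margin available: add task targets
    return rate, "continue"        # assumed: keep the current plan


remaining = INITIAL_KWH - sum(consumed)
rate, action = next_step(remaining, 206.4, 50.0036)
```

Summing the table entries gives a remaining energy of about 188.7 kilowatt-hours, close to the 188.8 quoted in the text (the difference presumably comes from rounding of the table entries); the success rate comes out near 0.362 and the selected action is `"RAMG"`.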

The current state input to the RAMG algorithm is S₁ = {(at_point myauv waypoint3), (under_water myauv), (have_load myauv waypoint2 load1)}, and the current set of task targets to be executed is G₁ = {(calibrated myauv waypoint3), (have_load myauv waypoint4 load2), (mission_completed myauv waypoint5)} (after action4 is executed, target goal1 has already been achieved).

In the current set of task targets, the target with the smallest benefit value is (calibrated myauv waypoint3), so the RAMG algorithm first deletes goal2; according to the task target association relationships, goal3 becomes invalid once goal2 is deleted. Planning for the remaining task target goal4 gives the plan Π₁, as follows:

action1:navigate(myauv waypoint3 waypoint5)

action2:surface(myauv)

action3:completemission(myauv waypoint5)

The task success rate of Π₁, calculated from the current remaining energy, is 0.43, which still does not meet the requirement, so goal4 is deleted as well and the current task target set becomes the empty set. At this point the task success rate is 1, so the target set needs to be expanded. The input parameters of AAMG are: the current task target set (the empty set), the set of task targets that may be added, B₀ = G₁ ∪ backup_G, and the current state S₁. In B₀, the target with the largest benefit value is goal3, so goal3 is added first (goal2 is added synchronously according to task target selection rule 2); the resulting task success rate is 0.999, which is greater than P_a, so new targets continue to be added. In B₀, goal4 and bgoal2 are incompatible; according to task target selection rule 4, goal4, which has the larger benefit value, is selected, but the task success rate then fails to meet the requirement. The AAMG algorithm therefore removes goal4 and goes on to add the task targets bgoal1 and bgoal2, which gives a task success rate of 0.95 and meets the requirement. The standby target set is now exhausted, and the task target adjustment process ends.

After the plan adjustment, the UUV can still complete the satellite calibration and the waypoint4 payload delivery from the initial plan, and can additionally complete the waypoint6 payload delivery; however, because of insufficient energy it cannot return to the planned recovery point and can only use the standby recovery point. The new mission plan is as follows:

action1:dive(myauv)

action2:navigate(myauv waypoint1 waypoint2)

action3:releaseload(myauv load1 waypoint2)

action4:navigate(myauv waypoint2 waypoint3)

action5:surface(myauv)

action6:calibrate(myauv waypoint3)

action7:dive(myauv)

action8:navigate(myauv waypoint3 waypoint4)

action9:releaseload(myauv load2 waypoint4)

action10:navigate(myauv waypoint4 waypoint6)

action11:releaseload(myauv load3 waypoint6)

action12:navigate(myauv waypoint6 waypoint7)

action13:surface(myauv)

action14:completemission(myauv waypoint7)

The variation of the task success probability during task execution is shown in Fig. 5. After the UUV passes through the unforeseen high-energy-consumption navigation leg (action4), the remaining energy is no longer sufficient to complete the subsequent actions of the original plan reliably. If the UUV continued to follow the original task plan, it would run out of energy and terminate the task somewhere along the return leg, far from the planned recovery area, where it could not be located and recovered in time. After the adaptive task planning adjustment, the task success rate again meets the threshold requirement, and the total benefit value after the adjustment is 33, close to the benefit value of 34 for the original plan. Although the recovery area is moved closer, it is still within the effective operating range, and the number of payload delivery points is also increased. Adaptive task planning thus effectively improves the use benefit of the UUV.

3. Simulation example 3 - UUV system failure

Reliability depends on the specific UUV system structure, so UUV reliability modeling and simulation are not carried out here; instead, failures are simplified to fixed system failure events. After action4 is completed, the remaining system energy is 259.1 kilowatt-hours, the mean energy required by the actions still to be executed is 206.4 kilowatt-hours, and the standard deviation is 50.0036 kilowatt-hours, which gives a task success rate of 0.8542. The UUV is then due to perform the satellite calibration operation. Assume that a fault is detected in the UUV's satellite buoy release mechanism; task target goal2 is therefore judged to have failed, and, according to the task target association relationships, goal3 and bgoal1 fail as well. The energy consumed by the executed actions is shown in Table 14.

TABLE 14 Simulated energy consumption of executed actions (unit: kilowatt-hour)

The set of task targets to be executed then becomes {(mission_completed myauv waypoint5)}, and the task online management module starts the task planner to re-plan, with the following result:

action1:navigate(myauv waypoint3 waypoint5)

action2:surface(myauv)

action3:completemission(myauv waypoint5)

The task success rate of this plan is 0.943, which is greater than P_a, so the AAMG algorithm is invoked. Since the benefit value of bgoal2 is less than that of goal4, the target to be executed that AAMG outputs is goal4. Because the satellite calibration function has failed, the UUV cannot deliver the task payload with the specified accuracy, so it finally returns to the original recovery point, waypoint5. The task process actually performed is as follows:

action1:dive(myauv)

action2:navigate(myauv waypoint1 waypoint2)

action3:releaseload(myauv load1 waypoint2)

action4:navigate(myauv waypoint2 waypoint3)

action5:navigate(myauv waypoint3 waypoint5)

action6:surface(myauv)

action7:completemission(myauv waypoint5)

The variation of the task success rate is shown in Fig. 6. After action4, although the remaining energy is sufficient for the task, a change in the task execution conditions invalidates the subsequent actions of the original plan, and after the adaptive task adjustment the UUV returns directly. Because more energy remains, the subsequent plan can be executed with a high task success rate. Owing to the failure of the satellite calibration function, the adaptive task planning system abandons the scheduled delivery tasks that can no longer be completed, allowing the UUV to be recovered more reliably.

In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A UUV self-adaptive task planning method under an uncertain condition is characterized by comprising the following steps:

step one, before the UUV is launched, dividing the initial task targets, according to the task requirements, into a target set to be executed G₀ and a standby target set B₀; G₀ contains the target tasks that must be completed to achieve the purpose of the mission and belongs to the task targets to be executed; B₀ contains new task targets to which the mission can be adjusted under abnormal conditions, or additional tasks to be completed when system resources remain, so as to improve the use benefit;

2. The method for UUV adaptive task planning under uncertain conditions as recited in claim 1, wherein the adaptive task planning method is implemented by three functional modules, namely a task planning module, a task target management module and a task online monitoring module.

3. The method for UUV adaptive task planning under uncertain conditions according to claim 2, wherein the task online monitoring module has a data communication interface with an unmanned system underlying control system, and can acquire the working state and fault information of each system component in real time and evaluate the task in real time according to the system working conditions; and monitoring the working state of the system in real time, updating the target task set and adjusting the task plan according to the situation.

4. The method for UUV adaptive task planning under uncertain conditions as recited in claim 2, wherein the task goal management module adjusts the task goal according to the system work information, and starts the task planning module after the adjustment; in the task execution process, the self-adaptive task adjustment is realized by selecting a task target to be executed; the goal selection aims at optimizing a task plan, reducing task risks and improving task benefits, and the selection mode comprises an adding mode and a deleting mode; adding the task target refers to selecting a target from the standby task target set to a task target set to be executed; deleting a task object refers to removing a related object in the set of task objects to be executed to the set of standby task objects.

5. The method of claim 2, wherein the mission planning module generates the mission plan based on the current system state and mission objective.

6. The UUV adaptive mission planning method under uncertain conditions according to claim 1, wherein the adaptive mission planning method uses a deterministic planning algorithm to generate an initial mission planning solution based on a given mission objective; because more random factors exist in the UUV task planning problem, after the generation of an offline plan or the re-planning of an online task, a task evaluation model is needed to be used for calculating the success rate of the task and evaluating the feasibility of a new plan meeting an expected target; meanwhile, by taking the definition of a return function in a Markov decision process method as reference, assigning values to each task target, and evaluating a task expected return value according to the task success rate;