CN114756017A - Navigation obstacle avoidance method combining unmanned aerial vehicle and unmanned ship - Google Patents

Navigation obstacle avoidance method combining unmanned aerial vehicle and unmanned ship

Info

Publication number
CN114756017A
CN114756017A (application number CN202111544714.5A)
Authority
CN
China
Prior art keywords
obstacle avoidance
unmanned
aerial vehicle
decision
unmanned aerial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111544714.5A
Other languages
Chinese (zh)
Inventor
武星
钟鸣宇
陈成
赵明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Zhongdun Technology Co ltd
Original Assignee
Wuxi Zhongdun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Zhongdun Technology Co ltd filed Critical Wuxi Zhongdun Technology Co ltd
Priority to CN202111544714.5A priority Critical patent/CN114756017A/en
Publication of CN114756017A publication Critical patent/CN114756017A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/0206Control of position or course in two dimensions specially adapted to water vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention provides a navigation obstacle avoidance method combining an unmanned aerial vehicle and an unmanned ship, comprising the following operational units: (1) a map unit; (2) an obstacle unit; (3) an unmanned aerial vehicle decision management unit; (4) an unmanned ship navigation obstacle avoidance execution unit. First, the aerial unmanned aerial vehicle collects marine environment map information, including obstacle information and the positions of the start point and end point; uncertain environmental factors are accounted for by introducing dynamic characteristics of the obstacles. A decision management unit on the unmanned aerial vehicle then integrates all observation information, models the problem as a Markov decision process, constructs a simulation environment for unmanned ship navigation and obstacle avoidance, and trains the unmanned ship to complete the navigation obstacle avoidance task with a deep reinforcement learning algorithm. A corresponding reward and punishment function is designed as the feedback index of the navigation obstacle avoidance effect, and a deep neural network is constructed to fit the navigation obstacle avoidance capability, make intelligent decisions, and generate the unmanned ship's next decision action in the navigation obstacle avoidance process. The unmanned ship executes the sequence of decision actions generated by the unmanned aerial vehicle decision management unit, thereby completing its navigation and obstacle avoidance functions. Unlike traditional path planning methods, this approach improves both the real-time performance and the efficiency of planning, and realizes cooperative operation of heterogeneous multi-agents, giving it great practical significance.

Description

Navigation obstacle avoidance method combining unmanned aerial vehicle and unmanned ship
Technical Field
The invention relates to the field of computer machine learning, in particular to a navigation obstacle avoidance method combining an unmanned aerial vehicle and an unmanned ship, namely a method that fits the unmanned ship's real-time navigation obstacle avoidance capability with a deep neural network.
Background
Compared with manually piloted vessels, unmanned boats have the advantages of small size and flexible operation, cover a wide operating range, and keep operators out of harm's way. At present, unmanned boats have been put into use in relevant military fields, completing different functional missions by carrying equipment with different functions. Their tasks are accordingly diverse, covering battlefield information collection, monitoring and reconnaissance, anti-submarine and anti-terrorism operations, precision strikes, and the like. In recent years, unmanned boats have also played an important role in pollution treatment and environmental monitoring, minimizing the risk of casualties. In supervision tasks, boats carrying sonar, radar, detection instruments, or image and video equipment can patrol effectively at night or in severe weather. In social and economic terms, when a disaster occurs and search and rescue must be carried out in time, relying only on large crewed ships to cover a wide area often incurs high time cost and may fail to reach the rescue site within the golden rescue window; unmanned boats, by contrast, can be deployed in large numbers, cover a wide search range, and complete the search and rescue task quickly. In academic research, the key performance of the unmanned boat is reflected in detection, classification, target identification, and so on; the difficulties lie in multi-source information acquisition and fusion, static and dynamic target identification capability, and motion trajectory prediction capability.
Research on autonomous real-time path planning, automatic obstacle avoidance, cooperative operation, dynamics control, and data communication is therefore extremely important. As the relevant science and technology continue to develop, research on the key technologies of the unmanned boat will accelerate and promote the development of future unmanned systems.
For the problem of planning the unmanned boat's trajectory, factors such as energy consumption and planning efficiency must be considered, and obstacle avoidance with respect to obstacle information is involved; obstacle avoidance is generally integrated as part of path planning. According to the means of acquiring obstacle information, methods divide into local path planning based on sensor information and global path planning based on given environment information. According to the degree to which the environment information is known, planning divides into environment-unknown, partially known, and fully known cases. According to the time at which environment information is acquired, that is, the temporal relation between planning and executing the planned track, methods divide into online planning and offline planning. In addition, according to whether constraints such as acceleration limits and steering-range limits are imposed, planning can be classified as unconstrained or constrained. There is no strict boundary between these categories, and they may overlap.
Among the more classical methods, the visibility graph method and the Voronoi diagram method are based on graph search, finding a path over a constructed graph. In potential field planning, a gradient field is established and the agent is guided toward the target point along the direction of the negative gradient. Depth-first search expands from a node to the deepest level until no forward expansion is possible, then backtracks one level to expand other nodes. Breadth-first search takes the current node as the center and expands layer by layer over all nodes in its neighborhood. The Dijkstra algorithm builds on breadth-first search and introduces a priority queue (heap); in the course of its stepwise exploration it obtains not only the cost of the optimal path from the start point to the end point but also the lowest-cost paths from the start node to every other node in the graph, and so on.
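As an illustration of the graph-search family above, the sketch below runs Dijkstra's algorithm with a binary heap on a small 4-connected grid. The grid layout, unit edge costs, and function name are illustrative assumptions, not taken from the patent.

```python
import heapq

def dijkstra_grid(grid, start, goal):
    """Dijkstra on a 4-connected grid; cells equal to 1 are obstacles.
    Returns the lowest-cost path length from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    pq = [(0, start)]                       # (cost so far, cell) min-heap
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            return d
        if d > dist.get((r, c), float("inf")):
            continue                        # stale heap entry, skip
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1                  # unit edge cost
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(pq, (nd, (nr, nc)))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],   # wall forcing a detour
        [0, 0, 0]]
print(dijkstra_grid(grid, (0, 0), (2, 0)))  # 6
```

With unit costs Dijkstra reduces to breadth-first search, which shows why the patent groups the two methods together.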
Clearly, a certain amount of research on path planning and navigation obstacle avoidance has been carried out in the past, but each method has limitations and its own domain of applicability, and meeting current development demands together with the rise of machine learning has injected new vitality into research on unmanned boat behavior. Deep learning and reinforcement learning, two research hotspots in machine learning, have been applied to a variety of scenarios in the short time since their introduction, have delivered strong and effective results, and have attracted wide attention from scholars. Deep learning, based on multi-layer neural network structures, offers good feature learning and representation ability; reinforcement learning focuses on continuous exploration of the environment, solving problems through experience accumulated by trial and error, which is relatively close to how humans learn. Machine learning was originally intended to give an agent human-like learning and cognitive ability, and deep reinforcement learning, combining the strengths of both, is considered an important path toward general artificial intelligence.
Unmanned boats have therefore become a major trend in modern technological development and are widely used for missions in dangerous or inaccessible areas, greatly expanding their operational capacity and detection range. When the unmanned boat executes its tasks, its own navigation obstacle avoidance capability has the greatest influence on performance and is also the key embodiment of its intelligence. Therefore, to overcome the shortcomings of traditional navigation obstacle avoidance methods (poor real-time performance, low planning efficiency, susceptibility to local optima, and so on), a navigation obstacle avoidance method combining the unmanned aerial vehicle and the unmanned ship is developed for actual conditions, with advantages including a high degree of automation, intelligence, real-time performance, and planning efficiency.
Disclosure of Invention
The invention aims to provide a navigation obstacle avoidance method combining an unmanned aerial vehicle and an unmanned ship, which is a method for combining a map unit, an obstacle unit, an unmanned aerial vehicle decision management unit and an unmanned ship navigation obstacle avoidance execution unit to complete the whole navigation obstacle avoidance process and can greatly improve the intelligence of the unmanned ship.
In order to achieve the purpose of the invention, the conception of the invention is as follows:
the invention provides a navigation obstacle avoidance method combining an unmanned aerial vehicle and an unmanned ship, aiming at achieving the purpose that the unmanned ship can successfully navigate and avoid obstacles in the process of reaching a terminal. Firstly, marine environment map information including information of obstacles and related information of a starting point, an end point and a position is collected by an unmanned aerial vehicle in high altitude in the method, and uncertain factors of the environment are considered, namely dynamic characteristics of the obstacles are introduced. And then integrating all observation information by a decision management unit on the unmanned aerial vehicle, performing problem modeling through a Markov decision process, constructing a simulation environment for navigation and obstacle avoidance of the unmanned ship, and training the unmanned ship to complete a navigation and obstacle avoidance task by utilizing a deep reinforcement learning algorithm. Designing a corresponding reward and punishment function as a feedback index of the navigation obstacle avoidance effect, constructing a deep neural network to fit the navigation obstacle avoidance capability, carrying out intelligent decision, and generating the next decision action of the unmanned ship in the navigation obstacle avoidance process. The unmanned ship executes a sequence of decision actions generated by the unmanned plane decision management unit, so that the navigation and obstacle avoidance functions of the unmanned ship are completed.
According to the inventive concept, the invention adopts the following technical scheme:
a navigation obstacle avoidance method combining an unmanned aerial vehicle and an unmanned ship comprises the following operation steps:
a) map unit
The high-altitude unmanned aerial vehicle collects marine environment map information, including map size and boundary information, and records the positions and extents of the unmanned ship and the obstacles on the map. The map unit is mainly responsible for compiling situational observation information and for updating the map in real time as conditions change.
b) Obstacle unit
In the management of the obstacle unit, uncertainty information about the marine environment is introduced, i.e., the static and dynamic movement characteristics of obstacles occurring in real situations are taken into account.
c) Unmanned aerial vehicle decision management unit
The unmanned aerial vehicle decision management stage is the most critical step. First, the decision management unit on the unmanned aerial vehicle integrates all observation information, including map information, obstacle information, and so on. Second, the problem is modeled as a Markov decision process, and a corresponding reward and punishment function is designed as the feedback index of the navigation obstacle avoidance effect, considering the reward or punishment returned in four different situations: the unmanned ship leaves the map boundary, collides with an obstacle, reaches the end point, or occupies an intermediate state at some legal position in the map. Then a deep neural network is constructed to fit the unmanned ship's navigation obstacle avoidance capability, make intelligent decisions, and generate the unmanned ship's next decision action in the navigation obstacle avoidance process.
d) Unmanned ship navigation obstacle avoidance execution unit
The unmanned ship navigation obstacle avoidance execution unit is the practice stage: it receives and executes the decision actions generated by the decision management unit and completes the whole navigation obstacle avoidance process under the guidance of a sequence of decision actions.
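The four units in steps a) through d) can be sketched as one closed control loop. Everything below is an illustrative assumption: the class names, the drift rule for the dynamic obstacle, and the placeholder greedy policy are invented for the sketch, whereas the patent's decision unit uses a trained deep network.

```python
class MapUnit:
    """Step a): holds map size, unmanned-ship position, obstacles, goal."""
    def __init__(self, size, usv_pos, obstacles, goal):
        self.size, self.usv_pos = size, usv_pos
        self.obstacles, self.goal = obstacles, goal
    def observe(self):
        # situation snapshot the UAV streams to the decision unit
        return {"size": self.size, "usv": self.usv_pos,
                "obstacles": list(self.obstacles), "goal": self.goal}

class ObstacleUnit:
    """Step b): dynamic obstacles; here they drift one cell right (assumed)."""
    def step(self, obstacles):
        return [(r, (c + 1) % 5) for r, c in obstacles]

class DecisionUnit:
    """Step c): placeholder policy, greedy toward the goal (rows, then cols)."""
    def decide(self, obs):
        (r, c), (gr, gc) = obs["usv"], obs["goal"]
        if r != gr:
            return (1 if gr > r else -1, 0)
        return (0, 1 if gc > c else -1)

class ExecutionUnit:
    """Step d): the unmanned ship applies the received action."""
    def execute(self, pos, action):
        return (pos[0] + action[0], pos[1] + action[1])

# one closed-loop tick: observe -> move obstacles -> decide -> execute
m = MapUnit((5, 5), (0, 0), [(2, 2)], (4, 4))
obs = m.observe()
m.obstacles = ObstacleUnit().step(m.obstacles)
m.usv_pos = ExecutionUnit().execute(m.usv_pos, DecisionUnit().decide(obs))
print(m.usv_pos)  # (1, 0): one step toward the goal
```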
Compared with the prior art, the invention has the following obvious and prominent substantive features and remarkable technical progress: the navigation obstacle avoidance method combining the unmanned aerial vehicle and the unmanned ship has great practical significance; on the basis of the innovative division of functions among the designed modules, different reward and punishment function values feed back the navigation obstacle avoidance effect, and a deep neural network is constructed to fit the unmanned ship's navigation obstacle avoidance capability, so that planning efficiency and real-time performance are greatly improved.
Drawings
FIG. 1 is a bottom-layer simulation environment diagram of the unmanned aerial vehicle and unmanned boat combined navigation obstacle avoidance method of the invention
FIG. 2 is a network structure diagram of the navigation obstacle avoidance method combining the unmanned aerial vehicle and the unmanned boat of the invention
Detailed Description
The first embodiment is as follows:
In the process of autonomous navigation and obstacle avoidance, the unmanned ship's most important task is, given a start point, an end point, and the initial positions of the obstacles, to be controlled by the control strategy generated by the decision algorithm so that, under the guidance of a series of decision actions, it successfully avoids the obstacles and reaches the end point. Realizing an effective real-time intelligent algorithm is the root of completing this task, and therefore:
(a) The decision action at the current moment is computed while sensing the unmanned ship's state information, the surrounding environment, and the obstacle information.
(b) The generated series of decision actions realizes the whole navigation obstacle avoidance process.
(c) Each strategy in the set of decision control strategies is efficient and easy for the unmanned boat to execute.
A reinforcement learning method is used to solve the unmanned ship's navigation obstacle avoidance problem.
While avoiding obstacles and approaching the target point, the unmanned boat uses the reinforcement learning method to compute a control strategy from the observed map, the obstacle information, and the current state information of the hull; this control sequence then realizes continuous control of the unmanned ship, so that it keeps approaching and finally reaches the target point while successfully avoiding obstacles. At each time t, the unmanned ship in the current state st selects an action at and transitions to st+1, while Rt denotes the reward or punishment value produced by the corresponding state transition. A = {a1, a2, a3, …, an} is the set of selectable actions. Further, obst = [δt, St, At-1] denotes the set of information observed by the unmanned ship at the current time t, whose components are the surrounding obstacle environment information, the unmanned ship's own state information, and the decision control action of the previous moment. Tp is the number of historical observation time steps involved in decision calculation, and ε is the exploration rate in the decision process.
In addition, in the reward and punishment function design of the reinforcement learning algorithm, corresponding values are set for different situations: rwd_target denotes the reward for the unmanned boat reaching the end point, rwd_outside the penalty for the unmanned boat leaving the map, rwd_collision the penalty for the unmanned boat colliding with an obstacle, and rwd_interim the intermediate-state value used to drive the unmanned boat to reach the end point as quickly as possible along the shortest distance.
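The four reward and punishment cases can be collected into a single function. A minimal sketch follows; the numeric values are illustrative assumptions, since the patent does not fix them.

```python
def reward(pos, goal, obstacles, bounds,
           rwd_target=10.0, rwd_outside=-5.0,
           rwd_collision=-5.0, rwd_interim=-0.1):
    """Reward for one transition: terminal bonus, out-of-map and collision
    penalties, and a small step cost that pushes the boat toward short paths.
    All four values are illustrative, not taken from the patent."""
    r, c = pos
    rows, cols = bounds
    if not (0 <= r < rows and 0 <= c < cols):
        return rwd_outside          # walked off the map
    if pos in obstacles:
        return rwd_collision        # collided with an obstacle
    if pos == goal:
        return rwd_target           # reached the end point
    return rwd_interim              # legal intermediate state

print(reward((4, 4), (4, 4), set(), (5, 5)))     # 10.0
print(reward((2, 2), (4, 4), {(2, 2)}, (5, 5)))  # -5.0
```

The small negative `rwd_interim` is what makes shorter paths accumulate less punishment, matching the "shortest distance" intent described above.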
Example two:
According to the problem description of unmanned ship navigation obstacle avoidance, the mathematical characteristics of the problem can be regarded as a discrete-time Markov process, whose components include states, actions, transition functions, rewards, and so on. Typically, a state corresponds to an action, or to a probability of taking an action, and once the action in that state is determined, the post-transition state is known. To describe, to some extent, how good or bad a certain state of the unmanned ship is, the return Gt is used to denote the return of the unmanned boat state at a certain time t during navigation and obstacle avoidance:
Gt = Rt + λ*Rt+1 + λ^2*Rt+2 + … (1)
where Gt represents the discounted sum of immediate rewards, and λ is the discount factor.
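Equation (1)'s discounted sum can be checked numerically for a finite reward sequence; this helper is an illustrative sketch, not part of the patent.

```python
def discounted_return(rewards, lam):
    """G_t = sum over k of lam^k * R_{t+k}, for a finite reward list."""
    g = 0.0
    for r in reversed(rewards):   # fold from the end: G = R + lam * G_next
        g = r + lam * g
    return g

print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1 + 0.5 + 0.25 = 1.75
```

Folding from the end is the same recursion G_t = R_t + λ*G_{t+1} that equation (2) below exploits.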
In practice, however, when the whole decision process has not yet finished, i.e., the unmanned boat has not reached the end point, collided with an obstacle, or left the map, we cannot obtain all future returns to calculate the reward and punishment value of each state, so a value function is used to represent the future potential value of each state, namely:
v(s) = E[Gt | St=s]
     = E[Rt + λ*(Rt+1 + λ*Rt+2 + …) | St=s]
     = E[Rt + λ*Gt+1 | St=s] = E[Rt + λ*v(St+1) | St=s]   (2)
Considering that the unmanned ship has several selectable movement actions in each state, and that different actions lead to different post-transition states, what we need to solve for is an optimal strategy formed by a series of unmanned ship control actions, which is equivalent to solving for an optimal value function. Therefore, borrowing the idea of Q-learning, on the basis of value iteration the expectation is replaced by the Q value, and the historical Q value is updated with the current Q value and the reward or punishment value, that is:
Q(st,at) ← Q(st,at) + α*(Rt + λ*max_a' Q(st+1,a') - Q(st,at))
st ← st+1   (3)
Here the estimated Q value is not assigned directly as the new Q value; instead, the new value approaches it gradually, where α denotes the learning rate. From the decision point of view, both Q-learning and Sarsa decide from the Q-table: a large action value is selected in the Q-table and applied to the unmanned boat as the decision action in exchange for a reward or punishment. However, Sarsa's update differs slightly from the above:
Q(st,at)←Q(st,at)+α*(Rt+λ*Q(st+1,at+1)-Q(st,at))
st←st+1,at←at+1 (4)
From the above it can be seen that Q-learning bootstraps on max_a' Q(st+1,a'); it is a greedy algorithm that does not concern itself with the "wrong" or "dead" states in which the unmanned boat collides with an obstacle or leaves the map. Sarsa, in contrast, updates Q(st,at) based on Q(st+1,at+1) at the next moment; it is a conservative approach, sensitive to the unmanned boat being in an unreasonable state, which first chooses actions far from dangerous states and only then considers maximizing the reward value.
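The contrast between updates (3) and (4) reduces to the bootstrap term. A minimal tabular sketch follows, with toy states and actions assumed purely for illustration.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, lam=0.9):
    # off-policy (eq. 3): bootstrap on the greedy action in s_next
    target = r + lam * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, lam=0.9):
    # on-policy (eq. 4): bootstrap on the action actually taken next
    target = r + lam * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

Q = defaultdict(float)
Q[("s1", "left")] = 2.0   # greedy action in s1
Q[("s1", "right")] = 0.0  # the action the behaviour policy happens to take
q_learning_update(Q, "s0", "go", 1.0, "s1", ["left", "right"])
print(round(Q[("s0", "go")], 3))  # 0.1*(1 + 0.9*2.0) = 0.28
```

Running `sarsa_update` on the same transition with `a_next="right"` would instead give 0.1*(1 + 0.9*0.0) = 0.1, making Q-learning's greedier bootstrap visible.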
From the above, we can obtain the Q value according to the value iteration mode, but the Q values of all states and corresponding actions in the navigation obstacle avoidance problem need to be updated once every time. In fact, we cannot traverse all states and actions, only operate with limited samples, and the Q table cannot be satisfied when the state space is infinitely expanded or when the state space is continuous. Therefore, the occurrence of DQN (deep Q-Network), namely the application of the deep neural Network, changes the update problem of the Q value matrix into a function fitting problem:
Q(s,a,θ)≈Qπ(s,a) (5)
The function approximator is a parameterized function of the unmanned ship's state and action that can output a result for any state; the Q function is made to approach the optimal Q value by updating the network parameter θ. The objective, i.e., the loss function, takes the updated Q value as the target value and computes its mean square error against the current value to obtain the deviation:
L(θ) = E[(Rt + γ*max_a' Q(st+1,a',θ) - Q(st,at,θ))^2]   (6)
where γ is the discount factor. The optimization problem for the optimal Q value can therefore be solved with a gradient-based approach (e.g., stochastic gradient descent), namely:
∂L(θ)/∂θ = E[(Rt + γ*max_a' Q(st+1,a',θ) - Q(st,at,θ)) * ∂Q(st,at,θ)/∂θ]   (7)
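With a function approximator, the update in (6) and (7) becomes a semi-gradient step on the TD error. The sketch below uses a linear Q(s,a;θ) over hand-made features purely for illustration (the patent uses a deep network); the TD target is held fixed during differentiation.

```python
import numpy as np

def td_loss_and_grad(theta, phi_sa, r, phi_next_all, gamma=0.99):
    """Squared TD error for linear Q(s,a;theta) = theta . phi(s,a), with the
    target r + gamma * max_a' Q(s',a') treated as a constant (semi-gradient)."""
    q = theta @ phi_sa
    target = r + gamma * max(theta @ p for p in phi_next_all)
    delta = target - q
    loss = delta ** 2
    grad = -2.0 * delta * phi_sa        # d loss / d theta, target frozen
    return loss, grad

theta = np.zeros(4)
phi_sa = np.array([1.0, 0.0, 0.0, 0.0])           # feature of (s, a)
phi_next = [np.array([0.0, 1.0, 0.0, 0.0]),       # features of (s', a')
            np.array([0.0, 0.0, 1.0, 0.0])]
for _ in range(200):                    # SGD steps toward the fixed target
    loss, grad = td_loss_and_grad(theta, phi_sa, 1.0, phi_next)
    theta -= 0.1 * grad
print(round(float(theta @ phi_sa), 3))  # converges to the target value 1.0
```

Because the next-state features here are orthogonal to `phi_sa`, the target stays constant and Q(s,a) converges to r = 1.0, which is the fixed point of update (7) in this toy case.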
example three:
The traditional DQN generally overestimates the Q values of the unmanned ship's decision actions, and the estimation errors accumulate as the number of actions grows. Since the overestimation is not uniform, the overestimated Q value of some suboptimal unmanned boat control action may exceed the Q value of the optimal control action, so that the optimal strategy is never found. Therefore Dueling DQN is used on the basis of DQN: a deep competition network (dueling network) is used to fit the Q value in unmanned ship navigation obstacle avoidance, but the network splits into two streams at the end, namely a state value function V(s), representing the value of the static state environment, and an action advantage function A(a), representing the additional value brought by selecting an action. The Q value is obtained by adding the state value V and the action advantage A; the point is that while the state value is shared, each action brings a different advantage. A reinforcement algorithm based on Dueling DQN is therefore proposed to solve the unmanned ship navigation obstacle avoidance problem: a real-time control strategy is computed from the surrounding observation information and the unmanned ship's state information, so that the unmanned ship keeps approaching the target point while successfully avoiding obstacles and completes the navigation task. Through this reinforcement learning algorithm, discrete decision control actions are generated on the rasterized discrete map, and the algorithm can be described as:
At = f_ANOA(obst), obst = [δt, St, At-1]   (8)
where f_ANOA denotes the proposed algorithm and At denotes the decision action at time t.
According to the characteristics of Dueling DQN, the Q value is the value of the unmanned ship's state plus the action advantage value of each decision control action in that state, namely:
Q(s,a;θ,α,β)=V(s;θ,β)+A(s,a;θ,α) (9)
where θ denotes the parameters of the convolutional layers, and α and β respectively denote the parameters of the two fully-connected streams. However, there is an identifiability problem: given a Q value, unique V and A values cannot be recovered. To solve this, the advantage function is corrected with its mean value, which also improves the stability of the algorithm, that is:
Q(s,a;θ,α,β) = V(s;θ,β) + (A(s,a;θ,α) - (1/|A|) * Σ_a' A(s,a';θ,α))   (10)
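The mean-subtracted aggregation in (10) can be verified directly; the V and A values below are arbitrary illustrative numbers, not from the patent.

```python
import numpy as np

def dueling_q(v, advantages):
    """Q(s,a) = V(s) + (A(s,a) - mean_a' A(s,a')), as in eq. (10).
    Subtracting the mean advantage makes the V/A split identifiable."""
    a = np.asarray(advantages, dtype=float)
    return v + (a - a.mean())

q = dueling_q(3.0, [1.0, 2.0, 6.0])   # mean advantage happens to be 3.0
print(q)            # [1. 2. 6.]
print(q.mean())     # 3.0, i.e. the mean of Q recovers V(s)
```

The check `q.mean() == V(s)` shows why the correction resolves the ambiguity: the state value is pinned to the average Q, and the advantages carry only the per-action differences.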
therefore, according to the deviation between the target value and the actual value, a loss function is established to continuously update the network parameters, namely:
L(θ) = E_{s,a,R,s'}[(y - Q(s,a,θ))^2]
y = R + γ*max_a' Q(s',a';θ)   (11)
reuse of gradient updates
Figure BDA0003415417420000094
To update the Q value for solving the optimum. Wherein the general steps of the algorithm are described as:
Figure BDA0003415417420000101
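The algorithm table survives in the source only as an image. A plausible reconstruction of a DQN-style training loop is sketched below; the replay buffer, epsilon-greedy exploration, toy environment, and all hyperparameters are assumptions, not the patent's specification.

```python
import random
from collections import deque

def train(env, q_update, episodes=3, eps=0.2, buffer_size=100, batch=4, seed=0):
    """Generic DQN-style loop (assumed shape of the lost algorithm table):
    act epsilon-greedily, store transitions in a replay buffer, and call
    q_update on sampled minibatches, one gradient step per environment step."""
    random.seed(seed)
    buffer = deque(maxlen=buffer_size)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = env.sample_action() if random.random() < eps else env.greedy(s)
            s2, r, done = env.step(a)
            buffer.append((s, a, r, s2, done))
            if len(buffer) >= batch:
                q_update(random.sample(buffer, batch))   # one fitting step
            s = s2
    return buffer

class LineEnv:
    """Toy 1-D 'sea lane': positions 0..4, goal at 4 (illustrative only)."""
    def reset(self):
        self.pos = 0
        return self.pos
    def sample_action(self):
        return random.choice([-1, 1])
    def greedy(self, s):
        return 1                        # placeholder policy: head for the goal
    def step(self, a):
        self.pos = max(0, min(4, self.pos + a))
        done = self.pos == 4
        return self.pos, (10.0 if done else -0.1), done

updates = []
buf = train(LineEnv(), q_update=lambda batch: updates.append(len(batch)))
print(len(buf) > 0, all(b == 4 for b in updates))  # True True
```

In the patent's setting, `q_update` would be the Dueling DQN gradient step of equation (11) and the environment would be the UAV-observed rasterized sea map.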

Claims (5)

1. a navigation obstacle avoidance method combining an unmanned aerial vehicle and an unmanned ship is characterized by comprising the following operation steps:
a) a map unit: the high-altitude unmanned aerial vehicle collects marine environment map information, including map size and boundary information, records the positions and extents of the unmanned ship and a plurality of obstacles on the map, and updates the map information in real time according to real-time conditions;
b) an obstacle unit: uncertainty factors of the actual marine environment are considered, and the static and dynamic characteristics of the obstacles are introduced;
c) an unmanned aerial vehicle decision management unit: the decision management unit on the unmanned aerial vehicle integrates all observation information, models the problem as a Markov decision process, designs a corresponding reward and punishment function as the feedback index of the navigation obstacle avoidance effect, constructs a deep neural network to fit the navigation obstacle avoidance capability, makes intelligent decisions, and generates the unmanned ship's next decision action in the navigation obstacle avoidance process;
d) an unmanned ship navigation obstacle avoidance execution unit: the unmanned ship receives and executes the action commands sent by the unmanned aerial vehicle decision management unit, thereby realizing the whole navigation obstacle avoidance process.
2. The navigation obstacle avoidance method combining the unmanned aerial vehicle and the unmanned boat according to claim 1, wherein the unmanned aerial vehicle at high altitude collects marine environment map information, including the size and boundary information of the map and the position information of the unmanned boat and a plurality of obstacles in the map, and not only records the map information but also updates it according to real-time changes.
3. The method for navigating and avoiding obstacles by combining the unmanned aerial vehicle and the unmanned boat as claimed in claim 1, wherein step b) introduces uncertainty information of marine environment in the management of the obstacle units, i.e. considering the static and dynamic movement characteristics of the obstacles appearing in practical situations.
4. The navigation obstacle avoidance method combining the unmanned aerial vehicle and the unmanned ship according to claim 1, wherein the unmanned aerial vehicle decision management stage in step c) is the most critical step: first, the decision management unit on the unmanned aerial vehicle integrates all observation information, including map information, obstacle information, and the like; second, the problem is modeled as a Markov decision process, and corresponding reward and punishment functions are designed as feedback indexes of the navigation obstacle avoidance effect, considering the reward or punishment returned in four different situations (the unmanned ship leaves the map boundary, collides with an obstacle, reaches the end point, or occupies an intermediate state at a legal position in the map); then a deep neural network is constructed to fit the navigation obstacle avoidance capability, make intelligent decisions, and generate the unmanned ship's next decision action in the navigation obstacle avoidance process.
5. The navigation obstacle avoidance method combining the unmanned aerial vehicle and the unmanned ship according to claim 1, wherein the unmanned ship navigation obstacle avoidance execution unit in step d) is the practical execution unit, configured to receive and execute the actions sent by the decision management unit and to complete the whole navigation obstacle avoidance process under the guidance of a sequence of decision actions.
CN202111544714.5A 2021-12-16 2021-12-16 Navigation obstacle avoidance method combining unmanned aerial vehicle and unmanned ship Withdrawn CN114756017A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111544714.5A CN114756017A (en) 2021-12-16 2021-12-16 Navigation obstacle avoidance method combining unmanned aerial vehicle and unmanned ship

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111544714.5A CN114756017A (en) 2021-12-16 2021-12-16 Navigation obstacle avoidance method combining unmanned aerial vehicle and unmanned ship

Publications (1)

Publication Number Publication Date
CN114756017A true CN114756017A (en) 2022-07-15

Family

ID=82325990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111544714.5A Withdrawn CN114756017A (en) 2021-12-16 2021-12-16 Navigation obstacle avoidance method combining unmanned aerial vehicle and unmanned ship

Country Status (1)

Country Link
CN (1) CN114756017A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115206157A (en) * 2022-08-05 2022-10-18 白杨时代(北京)科技有限公司 Unmanned underwater vehicle path finding training method and device and unmanned underwater vehicle

Similar Documents

Publication Publication Date Title
Cai ROBOTICS: From Manipulator to Mobilebot
CN113110592B (en) Unmanned aerial vehicle obstacle avoidance and path planning method
CN111399506B (en) Global-local hybrid unmanned ship path planning method based on dynamic constraint
Liu et al. Self-adaptive dynamic obstacle avoidance and path planning for USV under complex maritime environment
CN111780777A (en) Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning
Brittain et al. One to any: Distributed conflict resolution with deep multi-agent reinforcement learning and long short-term memory
CN112034887A (en) Optimal path training method for unmanned aerial vehicle to avoid cylindrical barrier to reach target point
CN114460936B (en) Automatic driving automobile path planning method and system based on offline incremental learning
Li et al. A behavior-based mobile robot navigation method with deep reinforcement learning
CN115016534A (en) Unmanned aerial vehicle autonomous obstacle avoidance navigation method based on memory reinforcement learning
CN113359808A (en) Multi-stage path planning method for power inspection of unmanned aerial vehicle and related device
CN115060263A (en) Flight path planning method considering low-altitude wind and energy consumption of unmanned aerial vehicle
Ming et al. A survey of path planning algorithms for autonomous vehicles
Ramezani et al. UAV path planning employing MPC-reinforcement learning method considering collision avoidance
CN114756017A (en) Navigation obstacle avoidance method combining unmanned aerial vehicle and unmanned ship
CN114355900A (en) Cooperative operation method combining unmanned aerial vehicle and unmanned aerial vehicle
CN115344049B (en) Automatic path planning and vehicle control method and device for passenger boarding vehicle
CN114609925B (en) Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish
Yao et al. UAV/USV Cooperative Trajectory Optimization Based on Reinforcement Learning
Pongpunwattana et al. Evolution-based dynamic path planning for autonomous vehicles
Shen et al. Pigeon-inspired optimisation algorithm with hierarchical topology and receding horizon control for multi-UAV formation
Li et al. DDPG-Based Path Planning Approach for Autonomous Driving
Song et al. Smooth trajectory collision avoidance through deep reinforcement learning
Mallett et al. SM2P: towards a robust co-pilot system for helicopter EMS
Kiam et al. Fast subset path planning/replanning to avoid obstacles with time-varying probabilistic motion patterns

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220715