CN114089762A - Water-air amphibious unmanned aircraft path planning method based on reinforcement learning

Water-air amphibious unmanned aircraft path planning method based on reinforcement learning

Info

Publication number
CN114089762A
Authority
CN
China
Prior art keywords
unmanned aircraft
amphibious unmanned
path planning
point
grid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111381994.2A
Other languages
Chinese (zh)
Other versions
CN114089762B (en)
Inventor
杨晓飞
史逸伦
叶辉
杜昭平
佘宏伟
严鑫
刘伟
冯北镇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology filed Critical Jiangsu University of Science and Technology
Priority to CN202111381994.2A priority Critical patent/CN114089762B/en
Publication of CN114089762A publication Critical patent/CN114089762A/en
Application granted granted Critical
Publication of CN114089762B publication Critical patent/CN114089762B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/0206Control of position or course in two dimensions specially adapted to water vehicles
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a reinforcement-learning-based path planning method for a water-air amphibious unmanned aircraft. The method comprises the following steps: S1, selecting a region S in which the amphibious unmanned aircraft executes a path planning task, and extracting the electronic chart data corresponding to region S to perform three-dimensional environment modeling; S2, constructing a Markov Decision Process (MDP) for amphibious unmanned aircraft path planning; and S3, given a starting point and a target point, completing global path planning for the different working scenes of the amphibious unmanned aircraft based on a Deep Q Network (DQN) algorithm, according to the MDP for amphibious unmanned aircraft path planning. Compared with existing environment modeling methods for amphibious unmanned aircraft path planning, the planning range of the method is extended to tens of kilometers, the motion characteristics of the amphibious unmanned aircraft are effectively considered, and, combined with the DQN algorithm, an optimal path that fits the working scene of the amphibious unmanned aircraft can be found faster and more effectively.

Description

Water-air amphibious unmanned aircraft path planning method based on reinforcement learning
Technical Field
The invention belongs to the technical field of autonomous path planning, and particularly relates to an intelligent path planning method for a water-air amphibious unmanned aircraft.
Background
A water-air amphibious unmanned aircraft can both navigate on water and fly in the air; compared with an ordinary unmanned ship, it reaches a task point faster and has a wider search field of view. It can effectively remedy the slow response, high cost and low sortie frequency of conventional water search and rescue, which relies on rescuers driving patrol boats to the scene of the incident. Path planning is one of the key technologies for making the amphibious unmanned aircraft autonomous. The performance of the path planning module directly determines the quality of the selected route and the smoothness of travel, and whether the aircraft can meet indexes such as minimum energy consumption and fastest arrival while executing a task.
Patent CN109871022A introduces an intelligent path planning and local obstacle avoidance method for an amphibious unmanned aircraft: a three-dimensional grid map is built from working environment information acquired in real time, and an improved A* algorithm performs the path planning. Patent CN112698646A introduces an aircraft path planning method based on reinforcement learning, which constructs a virtual force field from the obstacle information accessed in an electronic chart and sets a virtual-force-field reward function for path planning.
Because they construct the three-dimensional grid map in real time, existing path planning methods for the amphibious unmanned aircraft apply only to local planning within a few tens of meters around the vehicle; the working radius of the amphibious unmanned aircraft, however, can reach tens of kilometers, which such methods cannot cover. Traditional search methods such as A* also cannot exploit the cross-medium motion of the amphibious unmanned aircraft to find the optimal path. Existing reinforcement-learning path planning for the amphibious unmanned aircraft generally plans on a self-built grid environment model, whose disadvantages are a large algorithm search space and a mismatch with the real environment, so it cannot be applied to actual planning tasks; and where the environment model is built from an actual map such as an electronic chart, the chart is not digitally modeled, which harms the training efficiency of the reinforcement-learning path planning algorithm.
Disclosure of Invention
Purpose of the invention: existing path planning methods for the amphibious unmanned aircraft cannot cope with planning tasks that match the aircraft's large working radius, do not effectively consider the aircraft's motion characteristics and so cannot find the optimal path, and, when path planning uses an electronic chart without digitally modeling it, the training efficiency of the reinforcement learning algorithm suffers. To overcome these defects, the invention provides a reinforcement-learning-based path planning method for a water-air amphibious unmanned aircraft.
The method extracts the data of an electronic chart in S-57 format and, combined with actual digital elevation data, establishes an electronic-chart-based environment model for amphibious unmanned aircraft path planning. A reward function is established based on the risk of the aircraft colliding with obstacles and several other rules. Repeated training is then performed following the Deep Q Network (DQN) algorithm. After sufficient training, a path planning agent is obtained that can find a meaningful and reasonable path for the different working scenes of the aircraft.
The technical scheme is as follows: in order to achieve the purpose, the invention adopts the following specific technical scheme:
A water-air amphibious unmanned aircraft path planning method based on reinforcement learning comprises the following steps:
S1, selecting the region S in which the amphibious unmanned aircraft executes the path planning task, and extracting the electronic chart data corresponding to region S to perform three-dimensional environment modeling;
S2, constructing a Markov Decision Process (MDP) for amphibious unmanned aircraft path planning;
S3, given a starting point and a target point, completing global path planning for the different working scenes of the amphibious unmanned aircraft based on a Deep Q Network (DQN) algorithm, according to the MDP for amphibious unmanned aircraft path planning;
In a further improvement of the present invention, step S1 specifically includes:
and S101, selecting a region S in which the amphibious unmanned aircraft needs to execute a task, namely the longitude and latitude range of the task execution region.
S102, according to region S, i.e. according to its longitude and latitude, selecting the S-57 format electronic chart file of the corresponding area, and extracting the electronic chart data required for path planning with S as the extraction range.
S103, extracting the data of the corresponding region S from the electronic chart: selecting the object types needed for path planning, usually land, reefs and the like; obtaining the layer numbers corresponding to these object types by referring to the official electronic chart document IHO S-57 (ENC); and extracting the required chart data with the geospatial vector data open source library (OGR);
S104, extracting the required electronic chart data through the geospatial vector data open source library (OGR): the S-57 format chart file is opened through a function of OGRSFDriver in OGR, and the S57reader is called to read the chart data layer by layer according to the layer numbers of the object types to be extracted.
S105, storing the land, submerged reef and other chart data read layer by layer into an extensible markup language (xml) file, organized as layer (Layer), element (feature), field (field) and geometric object (geometry); the geometric object (geometry) holds the geometric attribute of the element, indicating that the element's type is one of a point (point), a line (line) or a polygon (polygon).
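As an illustration of steps S102 to S105, a minimal sketch using the Python bindings of GDAL/OGR follows. The file name EA200001.000 and the S-57 object class names LNDARE (land area), UWTROC (underwater rock) and DEPCNT (depth contour) are assumptions for this example; the method itself addresses the layers by number.

    # Minimal sketch: read an S-57 electronic chart layer by layer with OGR.
    # File name and layer names are assumed for illustration only.
    from osgeo import ogr

    ds = ogr.Open("EA200001.000")                # GDAL's S57 driver reads .000 cells
    for name in ("LNDARE", "UWTROC", "DEPCNT"):  # land, underwater rock, depth contour
        layer = ds.GetLayerByName(name)
        if layer is None:
            continue
        for feature in layer:
            geom = feature.GetGeometryRef()
            # each geometry is a point, line or polygon, matching the
            # layer/element/field/geometry hierarchy stored in the xml file
            print(name, geom.GetGeometryName())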
S106, performing three-dimensional environment modeling on the data of region S: after the electronic chart data has been extracted and stored as an extensible markup language (xml) file, establish a grid matrix with (elong-wlong)/squaresize+1 columns and (hlati-lati)/squaresize+1 rows according to the longitude range (wlong, elong) and latitude range (hlati, lati) of region S and the grid size squaresize. The longitude and latitude of the center point of row i, column j of the grid matrix can be represented as:
centerpointlon = wlong + j*squaresize, centerpointlat = hlati - i*squaresize (1)
The latitude and longitude of the four vertices a, b, c, d of this grid can be expressed as:
a = (centerpointlon - 0.5*squaresize, centerpointlat + 0.5*squaresize)
b = (centerpointlon + 0.5*squaresize, centerpointlat + 0.5*squaresize)
c = (centerpointlon + 0.5*squaresize, centerpointlat - 0.5*squaresize)
d = (centerpointlon - 0.5*squaresize, centerpointlat - 0.5*squaresize) (2)
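A short sketch of the S106 grid construction under the embodiment's bounds and grid size; the indexing convention (row i counted from the north edge, column j from the west edge) is an assumption consistent with equations (1) and (2).

    # Sketch of the S106 grid matrix (bounds from the embodiment; indexing assumed).
    import numpy as np

    wlong, elong = 119.5325, 119.7325     # west/east longitude bounds of region S
    hlati, lati = 23.729659, 23.529659    # north/south latitude bounds of region S
    squaresize = 0.002

    cols = round((elong - wlong) / squaresize) + 1    # 101
    rows = round((hlati - lati) / squaresize) + 1     # 101
    grid = np.full((rows, cols), np.nan)              # NaN = not yet assigned

    def center(i, j):
        # equation (1): center longitude/latitude of cell (i, j)
        return wlong + j * squaresize, hlati - i * squaresize

    def corners(i, j):
        # equation (2): vertices a, b, c, d of cell (i, j)
        lon, lat = center(i, j)
        h = 0.5 * squaresize
        return ((lon - h, lat + h), (lon + h, lat + h),
                (lon + h, lat - h), (lon - h, lat - h))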
and S107, obtaining an extensible markup language (xml) file by extracting the electronic chart data of the region S, traversing the xml, and accessing each node in sequence. The method mainly comprises two tasks: 1. establishing navigable points and non-navigable points of a grid map; 2. fill the water depth value of each grid.
S107-1, the first task establishes the navigable and non-navigable points of the grid map. When an element lies under the land or submerged reef layer and its type is a polygon (Polygon) or line (line), its child node point set (waypoints) is accessed, the longitude and latitude coordinates of all coordinate point (waypoint) nodes under the point set (waypoints) node are stored, the stored coordinate points are filled into a polygon with a Polygon function, and a polygon intersection test is performed against the rectangle formed by points a, b, c and d; if the two intersect, the grid where points a, b, c, d lie is set as a non-navigable grid. When the element type is a point (point), the longitude and latitude coordinates of the coordinate point (waypoint) nodes under its point set (waypoints) are obtained, and whether each coordinate lies in the rectangular grid formed by a, b, c and d is judged; if so, the grid where the current points a, b, c, d lie is set as a non-navigable grid.
S107-2, the second task fills the water depth value of each grid. When an element child node lies under the isobath layer node, the longitude and latitude coordinates of the coordinate point (waypoint) nodes under its point set (waypoints) are likewise obtained, and whether each coordinate point lies in the rectangular grid formed by a, b, c and d is judged; if it does and the grid is navigable, the water depth value (depth) is assigned to that grid. Because the geometry of an isobath is a line (line), not all navigable grids obtain depth values; the remaining unassigned grids are interpolated incrementally between isobaths, i.e. grids near a small-depth isobath are assigned small values and grids near a large-depth isobath are assigned large values. In this way all navigable grids can be assigned depth values.
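The two traversal tasks reduce to polygon-intersection and point-in-cell tests; a sketch using the shapely library (an assumption, any polygon-clipping routine would do) is given below, where points holds the (longitude, latitude) pairs of one feature's waypoint nodes.

    # Sketch of the S107 navigability test and depth fill; shapely is assumed.
    from shapely.geometry import LineString, Point, Polygon

    def blocks_cell(points, geom_type, corners):
        # task 1: does a land/reef feature make the cell a, b, c, d non-navigable?
        cell = Polygon(corners)
        if geom_type == "polygon":
            return Polygon(points).intersects(cell)
        if geom_type == "line":
            return LineString(points).intersects(cell)
        return any(cell.contains(Point(p)) for p in points)   # point feature

    def fill_depths(grid, contours, locate):
        # task 2: write each isobath's depth into the cells its vertices fall in;
        # locate maps (lon, lat) to a grid index or None (an assumed helper).
        # Cells left unassigned are afterwards interpolated between isobaths.
        for depth, points in contours:
            for p in points:
                idx = locate(*p)
                if idx is not None:
                    grid[idx] = depth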
S108, finally obtaining a grid matrix with actual geographic information; the cells with numerical values in the matrix represent navigable areas with depth values.
S109, acquiring the elevation data of region S, generally in tif format.
S110, acquiring the elevation data of region S: when the elevation data of region S is clipped, the longitude and latitude coordinates of the upper-left vertex and the size of the two-dimensional array are obtained. From the resolution of the elevation data, the longitude and latitude of each pixel can therefore be calculated, and the value of each pixel is an elevation value. Having obtained the two-dimensional elevation array and the elevation value and longitude and latitude coordinates of each cell, the non-navigable area of the grid matrix of step S108 is assigned elevation values according to these coordinates by comparison and assignment, yielding a grid matrix with elevation data.
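A sketch of S109 and S110 with GDAL (the file name dem.tif is an assumption): the geotransform gives the upper-left corner and the per-pixel resolution, from which the longitude, latitude and elevation of every pixel follow; land cells of the S108 grid matrix are then assigned the elevation of the matching pixel.

    # Sketch of reading tif elevation data and locating each pixel; GDAL assumed.
    from osgeo import gdal

    dem = gdal.Open("dem.tif")
    gt = dem.GetGeoTransform()   # (ulx, xres, 0, uly, 0, yres) for a north-up raster
    elev = dem.ReadAsArray()     # two-dimensional array of elevation values

    def pixel_lonlat(row, col):
        lon = gt[0] + (col + 0.5) * gt[1]
        lat = gt[3] + (row + 0.5) * gt[5]   # gt[5] is negative for north-up rasters
        return lon, lat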
In a further improvement of the invention, the specific method for constructing the MDP for amphibious unmanned aircraft path planning in step S2 comprises the following steps:
S201, to construct the Markov Decision Process (MDP) for amphibious unmanned aircraft path planning, first determine the state space of the amphibious unmanned aircraft, defined as its position coordinates (x, y) and height z; the position coordinates (x, y) form a two-dimensional continuous space and, to simplify the training process, the height z is a one-dimensional discrete space. The state space of the amphibious unmanned aircraft is represented as
[(x_1, y_1, z_1), (x_2, y_2, z_2), ..., (x_n, y_n, z_n)] (3)
S202, to construct the MDP for amphibious unmanned aircraft path planning, next determine the action space of the amphibious unmanned aircraft. Considering that the amphibious unmanned aircraft is characterized by both on-water navigation and air flight, its actions are discretized into six: up, down, left, right, takeoff and landing, i.e. action space A = [up, down, left, right, fly, descend].
S203, the moving distance of the four actions up, down, left and right is considered for two situations, navigation and flight. In the navigation situation, a laboratory test of the craft's sailing speed gives the displacement covered in one minute as the moving distance of the up, down, left and right actions (d_sail); in the flight situation, a test of the craft's flight speed likewise gives the one-minute displacement as the moving distance (d_flight). The moving distance of the takeoff and landing actions is simplified: after the takeoff action is executed, the amphibious unmanned aircraft takes off vertically to the maximum height it can reach (h_max), and after the landing action is executed, it lands vertically onto the water surface at height 0. According to the defined state and action spaces, the state transition under a given action can be expressed as
[x' y' z'] =
[x, y+d, z], action = up
[x, y-d, z], action = down
[x-d, y, z], action = left
[x+d, y, z], action = right
[x, y, h_max], action = fly
[x, y, 0], action = descend
with d = d_flight if z > 0 and d = d_sail if z = 0 (4)
where [x' y' z'] is the next state and [x y z] is the current state.
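A sketch of this transition function in Python; the numeric values of d_sail, d_flight and h_max below are placeholders, since the method derives them from laboratory speed tests.

    # Sketch of the S202/S203 action space and state transition (values assumed).
    D_SAIL, D_FLIGHT, H_MAX = 0.002, 0.006, 50.0

    def step(state, action):
        x, y, z = state
        d = D_FLIGHT if z > 0 else D_SAIL    # flight covers more distance per minute
        if action == "fly":
            return (x, y, H_MAX)             # vertical takeoff to h_max
        if action == "descend":
            return (x, y, 0.0)               # vertical landing onto the water surface
        dx, dy = {"up": (0, d), "down": (0, -d),
                  "left": (-d, 0), "right": (d, 0)}[action]
        return (x + dx, y + dy, z)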
S204, constructing the MDP for amphibious unmanned aircraft path planning also requires determining the reward function of the path planning.
S204-1, target zone reward (r_terminal). To improve training efficiency, the task is regarded as completed once the amphibious unmanned aircraft reaches the area of the target point.
S204-2, distance reward function (r_distance). It strengthens the influence of the target area and constrains the amphibious unmanned aircraft to move toward the target area more quickly.
r_distance = λ_distance * (DistanceNow - DistanceFuture) (5)
where DistanceNow represents the distance between the amphibious unmanned aircraft and the target point in the current state, DistanceFuture represents that distance in the next state, and λ_distance is a distance weight coefficient.
S204-3, energy consumption reward function (r_power). When the amphibious unmanned aircraft moves, its flight and navigation states consume different amounts of energy; so that the proportion of flight to navigation in the planned route can meet the requirements of different working scenes, an energy consumption reward function r_power is used. Laboratory energy consumption tests of the amphibious unmanned aircraft give the one-minute flight energy consumption and the one-minute navigation energy consumption, whose ratio is λ_flight/λ_sail, so the energy consumption reward function can be expressed as
r_power = -α * λ_flight/λ_sail, flight state (z > 0)
r_power = -α, navigation state (z = 0) (6)
where α is a proportionality coefficient; whether in the flight state or the navigation state, every action the amphibious unmanned aircraft takes generates a negative energy consumption reward.
S204-4, water depth reward (r_depth). In the environment model parsed from the electronic chart, every coordinate point has a corresponding water depth. Unlike other works, the distance of the amphibious unmanned aircraft from large obstacles such as land, islands and reefs is represented by the depth value Depth of the coordinate point: normally, places with greater water depth are farther from land, and places with smaller water depth are closer to land. The water depth reward function r_depth can be expressed as:
[Equations (7) and (8), the piecewise definition of r_depth in terms of λ_1 to λ_6 and the obstacle flag, are rendered as images in the published document.]
where λ_1~λ_6 are the numerical values of the reward function. To better ensure the safety of the amphibious unmanned aircraft and the appropriateness of the takeoff timing, an obstacle flag bit (obstacle) is used: a 3 x 3 square area surrounding the amphibious unmanned aircraft serves as its detection area, and if an obstacle lies in this area, obstacle is output as 1.
S204-5, collision reward function (r_obstacle). The collision reward is intended to prevent the amphibious unmanned aircraft from colliding with obstacles. During training of the reinforcement learning algorithm, once the amphibious unmanned aircraft collides with an obstacle, the collision reward function returns a large negative reward. The collision reward function can be expressed as:
r_obstacle = -λ_obstacle (Depth > 0 and z = 0) (9)
where λ_obstacle represents the negative reward value returned by the collision reward; when the water depth value at the aircraft's next-state coordinate is positive and the aircraft is not in the flight state, it is considered to have collided with an obstacle, and the collision reward is generated.
S205, the total reward function can be expressed as:
r_total = λ_a*r_terminal + λ_b*r_distance + λ_c*r_power + λ_d*r_depth + λ_e*r_obstacle (10)
where λ_a, λ_b, λ_c, λ_d and λ_e are weight coefficients.
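The reward terms compose into a single routine; a sketch under the expressions above, in which every weight value is a placeholder and the piecewise r_depth term is passed in precomputed, since its exact definition appears only as images.

    # Sketch of the S204/S205 reward composition; all weights are placeholders.
    LAM = dict(a=1.0, b=0.5, c=0.3, d=0.2, e=1.0, distance=0.1, obstacle=10.0)

    def distance_reward(dist_now, dist_future):
        # equation (5): positive when the next state is closer to the target point
        return LAM["distance"] * (dist_now - dist_future)

    def collision_reward(depth, z):
        # equation (9): large negative reward on collision with an obstacle
        return -LAM["obstacle"] if (depth > 0 and z == 0) else 0.0

    def total_reward(r_terminal, r_distance, r_power, r_depth, r_obstacle):
        # equation (10): weighted sum of the five reward terms
        return (LAM["a"] * r_terminal + LAM["b"] * r_distance +
                LAM["c"] * r_power + LAM["d"] * r_depth +
                LAM["e"] * r_obstacle)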
In a further improvement of the invention, step S3, in which a starting point and a target point are given and global path planning is completed for the different working scenes of the amphibious unmanned aircraft based on a Deep Q Network (DQN) algorithm according to the MDP for amphibious unmanned aircraft path planning, specifically comprises:
S301, giving the starting point and target point, i.e. the longitude and latitude coordinates of the starting point and target point of the selected path planning task;
and S302, the MDP for planning the path of the amphibious unmanned aircraft, namely the Markov decision process for constructing the amphibious unmanned aircraft, which is described in S2, comprises a state space, an action space and a reward and punishment function. The method is characterized in that a Depth Q Network (DQN) algorithm is selected as a path planning algorithm, values of Batch size (Batch _ size) (the size of data to be learned by the amphibious unmanned aircraft each time), Learning rate (Learning rate), training times (episode), attenuation factor (ga mma) and memory playback unit size (memory _ size) are set, the number of layers of a Q prediction network is set, and training is performed according to the MDP of the amphibious unmanned aircraft at S2 and the three-dimensional environment model at S1.
S303, setting three different working scenes according to the different working scenarios of the amphibious unmanned aircraft: scene one, an emergency occurs and the task must be executed urgently, requiring the target location to be reached at top speed; scene two, a routine work task, requiring an energy reserve margin; scene three, more than half the energy reserve has been consumed, requiring a return voyage for recharging. The path planning tasks of the different working scenes are realized by modifying the different weight coefficients of the reward function in S205.
Compared with the prior art, the invention has the following remarkable advantages:
1. The electronic-chart-based three-dimensional environment modeling method is simple and effective; it effectively combines the electronic chart with the unmanned aircraft, and uses the chart's rich geographic information to plan routes over a range of tens of kilometers, effectively overcoming the small planning range of the prior art.
2. The invention plans the path of the amphibious unmanned aircraft with reinforcement learning and sets an action space in which takeoff and landing are independent actions, effectively accounting for the motion characteristics of the amphibious unmanned aircraft.
3. The reward function is set in consideration of the working scene of the amphibious unmanned aircraft; the water depth information extracted from the electronic chart effectively constrains its flight and navigation; and the weights of the reward components realize the requirements of different task scenes.
4. The trained model generalizes well: when the area of the path planning task map changes, the existing A* algorithm must search again from scratch, whereas the trained model uses effective prior knowledge to find a suitable path quickly, saving a search period compared with the existing method.
Drawings
FIG. 1 is a logic step diagram of the reinforcement-learning-based amphibious unmanned aircraft path planning method of the invention;
FIG. 2 is a diagram of the structure of the extensible markup language (xml) file saved after parsing the electronic chart;
FIG. 3 is a grid matrix diagram, containing elevation information, of a portion of the electronic-chart-based three-dimensional environment model of the invention;
FIG. 4 is a schematic diagram of the electronic-chart-based three-dimensional environment modeling of the invention;
FIG. 5 is a flow chart of the Deep Q Network (DQN) algorithm of the invention;
FIG. 6 is a schematic view of the working scenes of the amphibious unmanned aircraft.
Detailed Description
In order to enhance the understanding of the present invention, the present invention will be described in further detail with reference to the following examples, which are provided for the purpose of illustration only and are not intended to limit the scope of the present invention.
As shown in FIG. 1, the reinforcement-learning-based path planning method of the invention includes the following three steps.
S1, selecting the region S in which the amphibious unmanned aircraft executes the path planning task, and extracting the electronic chart data corresponding to region S to perform three-dimensional environment modeling.
In this embodiment, S1, selecting the region S in which the amphibious unmanned aircraft performs the path planning task and extracting the corresponding electronic chart data of region S for three-dimensional environment modeling, is implemented as follows:
step1, selecting an amphibious unmanned aircraft to perform a path planning work area S, wherein the longitude range of the S is (wlong ═ 119.5325, elong ═ 119.7325), the latitude range is (hlati ═ 23.729659, lati ═ 23.529659), and the work area S is an island area of the penghua of china on an actual map.
Step2, according to the longitude and latitude coordinates of area S, selecting the S-57 format electronic chart covering area S: the South China Sea electronic chart EA200001. Referring to IHO S-57 (ENC), the layer numbers of the objects to be parsed from the chart are determined; in this embodiment the land, island reef and isobath objects in chart area S are parsed. Lookup shows the land layer number is 71, the reef layer number is 153 and the isobath layer number is 43. The chart data is read layer by layer according to the layer numbers of the object types to be extracted by calling the S57reader class of the geospatial vector data open source library (OGR), and stored in an extensible markup language (xml) file with the structure shown in FIG. 2.
Step3, in this embodiment, the grid size squaresize is chosen as 0.002 according to the maximum one-minute cruising distance of the amphibious unmanned aircraft, generating a grid matrix with 101 rows (row) and 101 columns (col) over the longitude and latitude range of area S. Through the longitude and latitude coordinates of each grid's vertices a, b, c, d and the extensible markup language (xml) file of Step2, polygon intersection tests over the elements (feature) and their geometry (geometry) give each grid its navigable or non-navigable attribute. For example, when an element (feature) in the xml lies under the land (land) layer, its geometry is a polygon composed of several point sets (waypoints); a polygon intersection test is performed in the same longitude and latitude area as the self-built grid map according to the coordinates of the point sets (waypoints), and as long as the polygon composed by the element intersects a grid, that grid is marked non-navigable. Meanwhile, the navigable grids are assigned values according to the coordinate information of the point sets (waypoints) under the isobath layer, and the unassigned grids are interpolated from the surrounding grids. The elevation information of the corresponding area S is obtained through GIS software as a two-dimensional array, and elevation values are assigned to the land area of the grid map according to the corresponding longitude and latitude coordinates. This yields a grid matrix with elevation data and actual geographic information; FIG. 3 shows such a grid matrix for a portion of area S, and FIG. 4 visualizes the three-dimensional environment model.
S2, constructing the Markov Decision Process (MDP) for amphibious unmanned aircraft path planning.
In this embodiment, the specific implementation steps of S2, constructing the Markov Decision Process (MDP) for amphibious unmanned aircraft path planning, are as follows:
step1, the state space is defined as the position coordinate (x, y) and the height z of the amphibious unmanned vehicle, the position coordinate (x, y) is represented as a two-dimensional continuous space, and the height z is represented as a one-dimensional discrete space for simplifying the training process. The state space of the amphibious unmanned aircraft is represented as
[(x_1, y_1, z_1), (x_2, y_2, z_2), ..., (x_n, y_n, z_n)] (1)
Step2, discretizing the motion of the amphibious unmanned aircraft into six actions: up, down, left, right, takeoff and landing, i.e. action space A = [up, down, left, right, fly, descend].
Step3, in the navigation situation, a laboratory test of the craft's sailing speed gives the displacement covered in one minute as the moving distance d_sail of the up, down, left and right actions; in the flight situation, a test of the craft's flight speed gives the one-minute displacement as the moving distance d_flight of those actions. The moving distance of the takeoff and landing actions is simplified: after the takeoff action is executed, the amphibious unmanned aircraft takes off vertically to the maximum height it can reach, h_max, and after the landing action is executed, it lands vertically onto the water surface at height 0. According to the defined state and action spaces, the state transition under a given action can be expressed as
[x' y' z'] =
[x, y+d, z], action = up
[x, y-d, z], action = down
[x-d, y, z], action = left
[x+d, y, z], action = right
[x, y, h_max], action = fly
[x, y, 0], action = descend
with d = d_flight if z > 0 and d = d_sail if z = 0 (2)
where [x' y' z'] is the next state and [x y z] is the current state.
Step4, the reward function is represented as:
r_total = λ_a*r_terminal + λ_b*r_distance + λ_c*r_power + λ_d*r_depth + λ_e*r_obstacle (3)
where λ_a, λ_b, λ_c, λ_d and λ_e are the weight coefficients.
Target zone reward (r_terminal). To improve training efficiency, the task is regarded as completed once the amphibious unmanned aircraft reaches the area of the target point.
Distance reward function (r_distance). It strengthens the influence of the target area and constrains the amphibious unmanned aircraft to move toward the target area more quickly.
r_distance = λ_distance * (DistanceNow - DistanceFuture) (4)
where DistanceNow represents the distance between the amphibious unmanned aircraft and the target point in the current state, DistanceFuture represents that distance in the next state, and λ_distance is a distance weight coefficient.
Energy consumption reward function (r_power). When the amphibious unmanned aircraft moves, its flight and navigation states consume different amounts of energy; so that the proportion of flight to navigation in the planned route can meet the requirements of different working scenes, an energy consumption reward function r_power is used. Laboratory energy consumption tests of the amphibious unmanned aircraft give the one-minute flight energy consumption λ_flight and the one-minute navigation energy consumption λ_sail, whose ratio is λ_flight/λ_sail, so the energy consumption reward function can be expressed as
r_power = -α * λ_flight/λ_sail, flight state (z > 0)
r_power = -α, navigation state (z = 0) (5)
where α is a proportionality coefficient; whether in the flight state or the navigation state, every action the amphibious unmanned aircraft takes generates a negative energy consumption reward.
Water depth reward (r_depth). In the environment model parsed from the electronic chart, every coordinate point has a corresponding water depth. Unlike other works, the distance of the amphibious unmanned aircraft from large obstacles such as land, islands and reefs is represented by the depth value Depth of the coordinate point: normally, places with greater water depth are farther from land, and places with smaller water depth are closer to land. The water depth reward function r_depth can be expressed as:
[Equations (6) and (7), the piecewise definition of r_depth in terms of λ_1 to λ_6 and the obstacle flag, are rendered as images in the published document.]
where λ_1~λ_6 are the numerical values of the reward function. To better ensure the safety of the amphibious unmanned aircraft and the appropriateness of the takeoff timing, an obstacle flag bit (obstacle) is used: a 3 x 3 square area surrounding the amphibious unmanned aircraft serves as its detection area, and if an obstacle lies in this area, obstacle is output as 1.
Collision reward function (r_obstacle). The collision reward is intended to prevent the amphibious unmanned aircraft from colliding with obstacles. During training of the reinforcement learning algorithm, once the amphibious unmanned aircraft collides with an obstacle, the collision reward function returns a large negative reward. The collision reward function can be expressed as:
r_obstacle = -λ_obstacle (Depth > 0 and z = 0) (8)
where λ_obstacle represents the negative reward value returned by the collision reward; when the water depth value at the aircraft's next-state coordinate is positive and the aircraft is not in the flight state, it is considered to have collided with an obstacle, and the collision reward is generated.
S3, given a starting point and a target point, completing global path planning for the different working scenes of the amphibious unmanned aircraft based on a Deep Q Network (DQN) algorithm, according to the MDP for amphibious unmanned aircraft path planning;
In this embodiment, based on the environment modeling of S1 and the MDP construction of S2, S3, completing global path planning for the different working scenes of the amphibious unmanned aircraft with the Deep Q Network (DQN) algorithm, is implemented as follows:
step1, starting point and target point of given path plan
Step2, importing the environment model established in S1 and selecting the Deep Q Network (DQN) algorithm; FIG. 5 shows the flow chart of the DQN algorithm for amphibious unmanned aircraft path planning. As the path planning algorithm, the batch size (Batch_size) is set to 32, the learning rate (Learning rate) to 0.01, the number of training episodes (episode) to 5000, the discount factor (gamma) to 0.9 and the memory replay unit size (memory_size) to 20000; the number of layers of the Q network is set to 3; and training is performed on the MDP of the amphibious unmanned aircraft from S2 and the three-dimensional environment model from S1.
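For concreteness, a sketch of this setup with the embodiment's hyperparameters, assuming PyTorch; it outlines a standard DQN update (with the target network folded into the online network for brevity) rather than reproducing the patented training code.

    # Sketch of a DQN setup with the embodiment's hyperparameters; PyTorch assumed.
    import random
    from collections import deque

    import torch
    import torch.nn as nn

    BATCH, LR, EPISODES, GAMMA, MEMORY = 32, 0.01, 5000, 0.9, 20000

    q_net = nn.Sequential(             # 3-layer Q prediction network
        nn.Linear(3, 64), nn.ReLU(),   # input: state (x, y, z)
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 6),              # output: Q value of each of the six actions
    )
    optimizer = torch.optim.SGD(q_net.parameters(), lr=LR)
    memory = deque(maxlen=MEMORY)      # memory replay unit

    def learn():
        # one gradient step on a random minibatch of (s, a, r, s', done) samples
        if len(memory) < BATCH:
            return
        s, a, r, s2, done = map(torch.tensor, zip(*random.sample(memory, BATCH)))
        q = q_net(s.float())[torch.arange(BATCH), a]
        with torch.no_grad():
            target = r.float() + GAMMA * q_net(s2.float()).max(1).values * (1 - done.float())
        loss = nn.functional.mse_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()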
Step3, setting the three different working scenes of the amphibious unmanned aircraft shown in FIG. 6: scene one, an emergency occurs, the task must be executed urgently and the target location must be reached at top speed; this is realized mainly by adjusting the reward function r_total, setting the weight coefficients λ_c and λ_d equal to 0, i.e. disregarding the energy reserve and water depth limitations;
Scene two is a routine work task requiring an energy reserve margin; all the weight coefficients in the reward function r_total are adjusted.
Scene three: more than half the energy reserve has been consumed and a return voyage for recharging is needed; in the reward function r_total, the weight coefficients λ_b and λ_d are set equal to 0, i.e. the water depth and distance limitations are disregarded.
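The three scenes therefore amount to three weight configurations of r_total; in the sketch below only the zeroed coefficients follow the embodiment, and the nonzero values are placeholders.

    # Sketch of the three working-scene weight configurations (values assumed).
    SCENES = {
        "emergency": dict(a=1.0, b=1.0, c=0.0, d=0.0, e=1.0),  # lambda_c = lambda_d = 0
        "routine":   dict(a=1.0, b=0.5, c=0.5, d=0.5, e=1.0),  # all weights active
        "return":    dict(a=1.0, b=0.0, c=1.0, d=0.0, e=1.0),  # lambda_b = lambda_d = 0
    }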
The foregoing shows and describes the general principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are given by way of illustration of the principles of the present invention, and that various changes and modifications may be made without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (7)

1. A water-air amphibious unmanned aircraft path planning method based on reinforcement learning is characterized by comprising the following steps:
S1, selecting the region S in which the amphibious unmanned aircraft executes the path planning task, and extracting the electronic chart data corresponding to region S to perform three-dimensional environment modeling;
S2, constructing a Markov Decision Process (MDP) for amphibious unmanned aircraft path planning;
and S3, given a starting point and a target point, completing global path planning for the different working scenes of the amphibious unmanned aircraft based on a Deep Q Network (DQN) algorithm, according to the MDP for amphibious unmanned aircraft path planning.
2. The reinforcement learning-based water-air amphibious unmanned aircraft path planning method according to claim 1, characterized in that: the specific content and steps of selecting the area S of the amphibious unmanned aircraft for executing the path planning task in the S1 and extracting the data of the area S corresponding to the electronic chart according to the area S are as follows:
(1) selecting a region S required by the amphibious unmanned aircraft to execute tasks;
(2) selecting an electronic chart with an S-57 format of a corresponding area according to the longitude and latitude of the area S, and extracting the electronic chart data required by path planning by taking the area S as a range for extracting the electronic chart data;
(3) selecting object types which need to be extracted from the electronic chart and are used for path planning, wherein the object types are usually land and reef islands, and acquiring layer numbers corresponding to the object types by referring to official documents IHO S-57(ENC) of the electronic chart;
(4) reading the chart data layer by layer according to the layer number of the type of the object to be extracted through a geospatial vector data open source library (OGR);
(5) storing the land and submerged reef chart data read layer by layer into an extensible markup language (xml) file, organized as layer (Layer), element (feature), field (field) and geometric object (geometry); wherein the geometric object (geometry) holds the geometric attribute of the element, indicating that the element's type is one of a point (point), a line (line) or a polygon (polygon).
3. The reinforcement learning-based water-air amphibious unmanned aircraft path planning method according to claim 1, characterized in that: the specific contents and steps for extracting the data of the corresponding area S in the electronic chart and performing three-dimensional environment modeling in the S1 are as follows:
(1) establishing a grid matrix with (elong-wlong)/squaresize+1 columns and (hlati-lati)/squaresize+1 rows according to the longitude range (wlong, elong), the latitude range (hlati, lati) and the grid size squaresize of region S; the longitude and latitude of the center point of row i, column j of the grid matrix can be represented as:
centerpointlon = wlong + j*squaresize, centerpointlat = hlati - i*squaresize (1)
the latitude and longitude of the four vertices a, b, c, d of the grid is expressed as:
a=(centerpointlon-0.5*squaresize,centerpointlat+0.5*squaresize)
b=(centerpointlon+0.5*squaresize,centerpointlat+0.5*squaresize)
c=(centerpointlon+0.5*squaresize,centerpointlat-0.5*squaresize)
d=(centerpointlon-0.5*squaresize,centerpointlat-0.5*squaresize) (2)
(2) obtaining an extensible markup language (xml) file by extracting the electronic chart data of region S, traversing the xml, and accessing each node in sequence; this is divided into two tasks: task one, establishing the navigable points and non-navigable points of the grid map; task two, filling the water depth value of each grid; the specific method is as follows:
(2.1) the first task establishes the navigable and non-navigable points of the grid map: when an element lies under the land or submerged reef layer and its type is a polygon (Polygon) or line (line), its child node point set (waypoints) is accessed, the longitude and latitude coordinates of all coordinate point (waypoint) nodes under the point set (waypoints) node are stored, the stored coordinate points are filled into a polygon with a Polygon function, and a polygon intersection test is performed against the rectangle formed by points a, b, c and d; if the two intersect, the grid where points a, b, c, d lie is set as a non-navigable grid; when the element type is a point (point), the longitude and latitude coordinates of the coordinate point (waypoint) nodes under its point set (waypoints) are obtained, whether each coordinate lies in the rectangular grid formed by a, b, c and d is judged, and if so, the grid where the current points a, b, c, d lie is set as a non-navigable grid;
(2.2) the second task fills the water depth value of each grid: when an element child node lies under the isobath layer node, the longitude and latitude coordinates of the coordinate point (waypoint) nodes under its point set (waypoints) are likewise obtained, and whether each coordinate point lies in the rectangular grid formed by a, b, c and d, the grid being navigable, is judged; if so, the water depth (depth) value is assigned to the grid;
(3) finally, a grid matrix with actual geographic information can be obtained, and the cells with numerical values in the matrix represent navigable areas with depth values;
(4) obtaining elevation data of the area S, wherein the elevation data is generally in a tif format;
(5) acquiring the elevation data of the area S, namely acquiring longitude and latitude coordinates of a vertex at the upper left corner and the size of a two-dimensional array when intercepting the elevation data of the area S; calculating longitude and latitude information of each pixel point according to the resolution of the elevation data, wherein the value of each pixel point is an elevation value; therefore, after the two-dimensional array of the elevation data and the elevation value and the longitude and latitude coordinates of each unit are obtained, the non-navigable area of the grid matrix in the step (3) is assigned with the elevation value according to the longitude and latitude coordinates by a comparison assignment method, and thus the grid matrix with the elevation data can be obtained.
4. The reinforcement learning-based water-air amphibious unmanned aircraft path planning method according to claim 1, wherein the Markov Decision Process (MDP) for constructing the amphibious unmanned aircraft path planning in step S2 is defined by the following details with respect to the action space and the state space of the amphibious unmanned aircraft:
(1) the state space of the amphibious unmanned aircraft is defined as a position coordinate (x, y) and a height z of the amphibious unmanned aircraft, the position coordinate (x, y) is expressed as a two-dimensional continuous space, and the height z is expressed as a one-dimensional discrete space in order to simplify the training process; the state space of the amphibious unmanned aircraft is thus represented as
[(x_1, y_1, z_1), (x_2, y_2, z_2), ..., (x_n, y_n, z_n)] (3)
(2) considering that the amphibious unmanned aircraft is characterized by both on-water navigation and air flight, its actions are discretized into six: up, down, left, right, takeoff and landing, i.e. action space A = [up, down, left, right, fly, descend];
(3) in the navigation situation, a laboratory test of the craft's sailing speed gives the displacement covered in one minute as the moving distance (d_sail) of the up, down, left and right actions; in the flight situation, a test of the craft's flight speed gives the one-minute displacement as the moving distance (d_flight) of those actions; the moving distance of the takeoff and landing actions is simplified: after the takeoff action is executed, the amphibious unmanned aircraft takes off vertically to the maximum height it can reach (h_max), and after the landing action is executed, it lands vertically onto the water surface at height 0; according to the defined state and action spaces, the state transition under a given action can be expressed as
[x' y' z'] =
[x, y+d, z], action = up
[x, y-d, z], action = down
[x-d, y, z], action = left
[x+d, y, z], action = right
[x, y, h_max], action = fly
[x, y, 0], action = descend
with d = d_flight if z > 0 and d = d_sail if z = 0 (4)
where [x' y' z'] is the next state and [x y z] is the current state.
5. The reinforcement learning-based water-air amphibious unmanned aircraft path planning method according to claim 1, wherein the Markov Decision Process (MDP) for constructing the amphibious unmanned aircraft path planning in step S2 is defined by the following specific contents in terms of amphibious unmanned aircraft reward function:
(1) target zone reward (r_terminal): to improve training efficiency, the task is regarded as completed once the amphibious unmanned aircraft reaches the area of the target point;
(2) distance reward function (r_distance): it strengthens the influence of the target area and constrains the amphibious unmanned aircraft to move toward the target area more quickly;
r_distance = λ_distance * (DistanceNow - DistanceFuture) (5)
where DistanceNow represents the distance between the amphibious unmanned aircraft and the target point in the current state, DistanceFuture represents that distance in the next state, and λ_distance is a distance weight coefficient;
(3) energy consumption reward function (r_power): when the amphibious unmanned aircraft moves, its flight and navigation states consume different amounts of energy; so that the proportion of flight to navigation in the planned route can meet the requirements of different working scenes, an energy consumption reward function r_power is used; laboratory energy consumption tests of the amphibious unmanned aircraft give the one-minute flight energy consumption λ_flight and the one-minute navigation energy consumption λ_sail, whose ratio is λ_flight/λ_sail, so the energy consumption reward function can be expressed as
r_power = -α * λ_flight/λ_sail, flight state (z > 0)
r_power = -α, navigation state (z = 0) (6)
where α is a proportionality coefficient; whether in the flight state or the navigation state, every action taken generates a negative energy consumption reward;
(4) water depth reward (r_depth): in the environment model parsed from the electronic chart, every coordinate point has a corresponding water depth; unlike other works, the distance of the amphibious unmanned aircraft from large obstacles such as land, islands and reefs is represented by the depth value (Depth) of the coordinate point; normally, places with greater water depth are farther from land and places with smaller water depth are closer to land; the water depth reward function r_depth can be expressed as:
[Equations (7) and (8), the piecewise definition of r_depth in terms of λ_1 to λ_6 and the obstacle flag, are rendered as images in the published document.]
where λ_1~λ_6 are the numerical values of the reward function; to better ensure the safety of the amphibious unmanned aircraft and the appropriateness of the takeoff timing, an obstacle flag bit (obstacle) is used: a 3 x 3 square area surrounding the amphibious unmanned aircraft serves as its detection area, and if an obstacle lies in this area, obstacle is output as 1;
(5) collision reward function (r_obstacle): the collision reward is intended to prevent the amphibious unmanned aircraft from colliding with obstacles; during training of the reinforcement learning algorithm, once the amphibious unmanned aircraft collides with an obstacle, the collision reward function returns a large negative reward; the collision reward function can be expressed as:
r_obstacle = -λ_obstacle (Depth > 0 and z = 0) (9)
where λ_obstacle represents the negative reward value returned by the collision reward; when the water depth value at the aircraft's next-state coordinate is positive and the aircraft is not in the flight state, it is considered to have collided with an obstacle and the collision reward is generated;
(6) the overall reward function may be expressed as:
r_total = λ_a*r_terminal + λ_b*r_distance + λ_c*r_power + λ_d*r_depth + λ_e*r_obstacle (10)
where λ_a, λ_b, λ_c, λ_d and λ_e are weight coefficients.
6. The reinforcement-learning-based water-air amphibious unmanned aircraft path planning method according to claim 1, wherein in step S3 the starting point and the target point are given, and, according to the MDP for amphibious unmanned aircraft path planning, the specific process based on the Deep Q Network (DQN) algorithm is as follows:
(1) a starting point and a target point of the given path plan;
(2) importing the environment model established in S1, selecting the Deep Q Network (DQN) algorithm as the path planning algorithm, setting the batch size (Batch_size) to 32, the learning rate (Learning rate) to 0.01, the number of training episodes (episode) to 5000, the discount factor (gamma) to 0.9 and the memory replay unit size (memory_size) to 20000, setting the number of layers of the Q network to 3, and training on the MDP of the amphibious unmanned aircraft from S2 and the three-dimensional environment model from S1.
7. The reinforcement-learning-based water-air amphibious unmanned aircraft path planning method according to claim 1, wherein in step S3 the working scenes of the amphibious unmanned aircraft for global path planning are set as follows:
(1) setting three different working scenes of the amphibious unmanned aircraft: scene one, an emergency occurs, the task must be executed urgently and the target location must be reached at top speed; this is realized mainly by adjusting the reward function r_total, setting the weight coefficients λ_c and λ_d equal to 0, i.e. disregarding the energy reserve and water depth limitations;
(2) scene two is a routine work task requiring an energy reserve margin; all the weight coefficients in the reward function r_total are adjusted;
(3) scene three: more than half the energy reserve has been consumed and a return voyage for recharging is needed; in the reward function r_total, the weight coefficients λ_b and λ_d are set equal to 0, i.e. the water depth and distance limitations are disregarded.
CN202111381994.2A 2021-11-22 2021-11-22 Water-air amphibious unmanned aircraft path planning method based on reinforcement learning Active CN114089762B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111381994.2A CN114089762B (en) 2021-11-22 2021-11-22 Water-air amphibious unmanned aircraft path planning method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111381994.2A CN114089762B (en) 2021-11-22 2021-11-22 Water-air amphibious unmanned aircraft path planning method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN114089762A true CN114089762A (en) 2022-02-25
CN114089762B CN114089762B (en) 2024-06-21

Family

ID=80302350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111381994.2A Active CN114089762B (en) 2021-11-22 2021-11-22 Water-air amphibious unmanned aircraft path planning method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN114089762B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114721409A (en) * 2022-06-08 2022-07-08 山东大学 Underwater vehicle docking control method based on reinforcement learning
CN114924587A (en) * 2022-05-27 2022-08-19 江苏科技大学 Unmanned aerial vehicle path planning method
CN115206157A (en) * 2022-08-05 2022-10-18 白杨时代(北京)科技有限公司 Unmanned underwater vehicle path finding training method and device and unmanned underwater vehicle
CN115657683A (en) * 2022-11-14 2023-01-31 中国电子科技集团公司第十研究所 Unmanned and cableless submersible real-time obstacle avoidance method capable of being used for inspection task
CN115855226A (en) * 2023-02-24 2023-03-28 青岛科技大学 Multi-AUV cooperative underwater data acquisition method based on DQN and matrix completion
CN116880551A (en) * 2023-07-13 2023-10-13 之江实验室 Flight path planning method, system and storage medium based on random event capturing

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130151107A1 (en) * 2011-12-13 2013-06-13 Daniel Nikovski Method for Optimizing Run Curve of Vehicles
CN103900573A (en) * 2014-03-27 2014-07-02 哈尔滨工程大学 Underwater vehicle multi-constrained path planning method based on S57 standard electronic chart
US20190005828A1 (en) * 2017-06-29 2019-01-03 The Boeing Company Method and system for autonomously operating an aircraft
CN108507575A (en) * 2018-03-20 2018-09-07 华南理工大学 A kind of unmanned boat sea paths planning method and system based on RRT algorithms
JP2020030796A (en) * 2018-08-23 2020-02-27 タタ コンサルタンシー サービシズ リミテッドTATA Consultancy Services Limited Systems and methods for predicting structure and properties of atoms and atomic alloy materials
US20200359297A1 (en) * 2018-12-28 2020-11-12 Beijing University Of Posts And Telecommunications Method of Route Construction of UAV Network, UAV and Storage Medium thereof
US20200250486A1 (en) * 2019-01-31 2020-08-06 StradVision, Inc. Learning method and learning device for supporting reinforcement learning by using human driving data as training data to thereby perform personalized path planning
CN109871022A (en) * 2019-03-18 2019-06-11 江苏科技大学 A kind of intelligent path planning and barrier-avoiding method towards amphibious unmanned rescue device
CN110110028A (en) * 2019-05-09 2019-08-09 浪潮软件集团有限公司 A kind of method and system showing map by self defined area towards OGC standard
KR20210063791A (en) * 2019-11-25 2021-06-02 한국기술교육대학교 산학협력단 System for mapless navigation based on dqn and slam considering characteristic of obstacle and processing method thereof
CN112198870A (en) * 2020-06-01 2021-01-08 西北工业大学 Unmanned aerial vehicle autonomous guiding maneuver decision method based on DDQN
CN112698646A (en) * 2020-12-05 2021-04-23 西北工业大学 Aircraft path planning method based on reinforcement learning
KR102529331B1 (en) * 2021-12-29 2023-05-09 서울대학교산학협력단 Method for communication based on UAV(unmanned aerial vehicle) BS(base station) using reinforcement learning and apparatus for performing the method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张晓路; 李斌; 常健; 唐敬阁: "Reinforcement learning method for gliding control of an underwater gliding snake-like robot", Robot (机器人), no. 03, 26 March 2019 (2019-03-26) *
董超; 沈赟; 屈毓锛: "A survey of UAV-based edge intelligent computing", Chinese Journal of Intelligent Science and Technology (智能科学与技术学报), no. 03, 15 September 2020 (2020-09-15) *
赵玉新; 金娜; 刘厂: "Multi-constraint route planning method for AUVs based on electronic charts", Navigation of China (中国航海), no. 02, 25 June 2016 (2016-06-25) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114924587A (en) * 2022-05-27 2022-08-19 江苏科技大学 Unmanned aerial vehicle path planning method
CN114924587B (en) * 2022-05-27 2024-03-19 江苏科技大学 Unmanned aerial vehicle path planning method
CN114721409A (en) * 2022-06-08 2022-07-08 山东大学 Underwater vehicle docking control method based on reinforcement learning
CN115206157A (en) * 2022-08-05 2022-10-18 白杨时代(北京)科技有限公司 Unmanned underwater vehicle path finding training method and device and unmanned underwater vehicle
CN115657683A (en) * 2022-11-14 2023-01-31 中国电子科技集团公司第十研究所 Unmanned and cableless submersible real-time obstacle avoidance method capable of being used for inspection task
CN115855226A (en) * 2023-02-24 2023-03-28 青岛科技大学 Multi-AUV cooperative underwater data acquisition method based on DQN and matrix completion
CN115855226B (en) * 2023-02-24 2023-05-30 青岛科技大学 Multi-AUV cooperative underwater data acquisition method based on DQN and matrix completion
CN116880551A (en) * 2023-07-13 2023-10-13 之江实验室 Flight path planning method, system and storage medium based on random event capturing
CN116880551B (en) * 2023-07-13 2024-06-14 之江实验室 Flight path planning method, system and storage medium based on random event capturing

Also Published As

Publication number Publication date
CN114089762B (en) 2024-06-21

Similar Documents

Publication Publication Date Title
CN114089762B (en) Water-air amphibious unmanned aircraft path planning method based on reinforcement learning
Tang et al. Geometric A-star algorithm: An improved A-star algorithm for AGV path planning in a port environment
CN110108284B (en) Unmanned aerial vehicle three-dimensional flight path rapid planning method considering complex environment constraint
Xiaofei et al. Global path planning algorithm based on double DQN for multi-tasks amphibious unmanned surface vehicle
LU102400B1 (en) Path planning method and system for unmanned surface vehicle based on improved genetic algorithm
CN108459503B (en) Unmanned surface vehicle track planning method based on quantum ant colony algorithm
CN108564202B (en) Unmanned ship route optimization method based on environment forecast information
CN109871022A (en) A kind of intelligent path planning and barrier-avoiding method towards amphibious unmanned rescue device
CN111679692A (en) Unmanned aerial vehicle path planning method based on improved A-star algorithm
CN107816999A (en) A kind of unmanned boat navigation path contexture by self method based on ant group algorithm
CN113505431B (en) Method, device, equipment and medium for searching targets of maritime unmanned aerial vehicle based on ST-DQN
CN111222701A (en) Marine environment map layer-based automatic planning and evaluation method for ship route
Guo et al. An improved a-star algorithm for complete coverage path planning of unmanned ships
CN110889198A (en) Multi-factor joint learning-based dead reckoning probability distribution prediction method and system
CN111665846B (en) Water surface unmanned ship path planning method based on rapid scanning method
Lan et al. Improved RRT algorithms to solve path planning of multi-glider in time-varying ocean currents
CN112859864A (en) Unmanned ship-oriented geometric path planning method
Du et al. An optimized path planning method for coastal ships based on improved DDPG and DP
Wang et al. A novel maritime autonomous navigation decision-making system: Modeling, integration, and real ship trial
CN117193296A (en) Improved A star unmanned ship path planning method based on high safety
Gao et al. An optimized path planning method for container ships in Bohai bay based on improved deep Q-learning
Li et al. Dynamic route planning for a USV-UAV multi-robot system in the rendezvous task with obstacles
Lee et al. Generation of Ship’s passage plan using data-driven shortest path algorithms
Williams et al. A rapid method for planning paths in three dimensions for a small aerial robot
Zhang et al. A MILP model on coordinated coverage path planning system for UAV-ship hybrid team scheduling software

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant