Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Alternative embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the embodiment of the present disclosure provides a layered navigation obstacle avoidance method applied to a server. The system used by the method is a layered navigation obstacle avoidance system based on heuristic exploration, which includes an upper module and a lower obstacle avoidance module running in separate processes. The upper module is responsible for exploration and planning in an unfamiliar map-free environment, while the lower obstacle avoidance module implements local navigation obstacle avoidance. The two modules are tightly coupled: the upper module provides temporary target points to the lower obstacle avoidance module, and the lower obstacle avoidance module provides a safety heuristic factor to the upper module as a reference for path planning.
The layered navigation obstacle avoidance method provided by the embodiment of the disclosure specifically comprises the following steps:
S101: calculating a safety heuristic factor for candidate points through the value evaluation network of the lower-layer reinforcement learning obstacle avoidance model, according to the current state information of the robot and the distribution of surrounding obstacles in the robot's environment; the candidate points are all candidate points on the currently constructed map.
S102: calculating a target heuristic factor for candidate points according to the current map constructed in real time in the unfamiliar map-free environment, the preset target and the current position of the robot; the candidate points are all candidate points on the currently constructed map.
S103: selecting the candidate point with the minimum heuristic value according to the safety heuristic factor and the target heuristic factor, and taking that candidate point as the optimal exploration point.
S104: calculating a plurality of midway points from the optimal exploration point, sending the midway points in sequence to the lower-layer obstacle avoidance module as corresponding temporary target points, and setting each temporary target point as the target point of the lower-layer reinforcement learning obstacle avoidance model.
S105: inputting the environmental data of the robot into the obstacle avoidance model, outputting a navigation instruction, and sending the navigation instruction to the robot.
In a possible implementation manner, the method for layered navigation obstacle avoidance provided by the embodiment of the present disclosure further includes the following steps:
constructing a value evaluation network and an action strategy network of a lower-layer reinforcement learning obstacle avoidance model;
and training a learning obstacle avoidance model for realizing reactive reinforcement learning obstacle avoidance of the robot, based on a preset deep reinforcement learning model, to obtain the lower-layer reinforcement learning obstacle avoidance model.
In a possible implementation manner, the method for layered navigation obstacle avoidance provided by the embodiment of the present disclosure further includes the following steps:
and building a grid map and calculating the current position of the robot by incremental mapping in the unfamiliar map-free environment.
In a possible implementation manner, the method for layered navigation obstacle avoidance provided by the embodiment of the present disclosure further includes the following steps:
and acquiring a preset condition for reselecting a new optimal exploration point, judging whether the current condition meets the preset condition, and reselecting the new optimal exploration point when judging that the current condition meets the preset condition.
In the embodiment of the present application, the preset conditions include: during the exploration process of the robot, the entropy change of the exploration map is larger than a preset entropy change threshold; or, a change in the state of the current exploration point is detected; or, the robot is detected to have reached the optimal exploration point; or, the target point is detected to lie in a known area.
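A minimal sketch of this four-condition replanning check; the function name and boolean interface are illustrative assumptions, not part of the original text:

```python
def should_replan(entropy_change, entropy_threshold,
                  point_state_changed, reached_point, goal_known):
    """Return True if any of the four preset replanning conditions holds:
    (1) the map entropy changed by more than a threshold,
    (2) the state of the current exploration point changed,
    (3) the robot has reached the optimal exploration point,
    (4) the target point now lies in a known area of the map."""
    return (entropy_change > entropy_threshold
            or point_state_changed
            or reached_point
            or goal_known)
```

In practice this predicate would be evaluated continuously while the robot drives toward the current exploration point.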
As shown in fig. 2, a schematic diagram of a layered navigation obstacle avoidance system adopted by the layered navigation obstacle avoidance method in a specific application scenario provided by the embodiment of the present disclosure is shown.
In order to solve the navigation problem under the unknown high-dynamic complex environment, the layered navigation obstacle avoidance system comprises an upper layer and a lower layer, as shown in fig. 2, wherein an upper layer module needs to be responsible for synchronously exploring the unknown environment and realizing path planning, and a lower layer obstacle avoidance module mainly realizes short-distance navigation obstacle avoidance and ensures that the robot safely runs from the current point to a temporary target.
As shown in fig. 2, the upper layer module includes three parts, namely a SLAM-based mapping submodule (SLAM-based mapping), a heuristic exploration submodule (heuristic exploration) and a waypoint planning submodule (waypoint planning) which operate at 30Hz, 0.2Hz and 1Hz respectively. The lower layer obstacle avoidance module is a separately trained reinforcement learning obstacle avoidance module and operates at 30 Hz.
In the layered navigation obstacle avoidance system shown in fig. 2, a reactive obstacle avoidance model based on reinforcement learning is trained separately for the lower layer:
Designing reinforcement learning elements: first, a compact state space is designed as the state input, and the observation vector of the robot is recorded as O = [O_L, O_g, O_m, O_v], where: the robot observes the surrounding environment through the mounted 2D laser radar as the vector O_L; the projection vector of the target point relative to the robot in the carrier coordinate system is O_g; from the laser observation data recorded at the previous n moments, the relative motion vector O_m is calculated, which integrates the robot's own movement and the movement of external obstacles; and finally, the robot's own motion vector in the world coordinate system is denoted O_v. To apply the invention to more robotic platforms, the motion vector is designed as O_v = [v_x, v_y, w_z], representing the motion speed in the x-axis direction, the motion speed in the y-axis direction, and the rotational angular speed of the robot, respectively.
The robot is trained in a simulator to realize short-distance obstacle avoidance navigation based on a parallel PPO reinforcement learning algorithm. As shown in fig. 2, a DPPO (distributed PPO) continuous-control deep reinforcement learning algorithm is adopted, and different scenes are set in the Stage simulator to train the robot to learn short-distance obstacle avoidance navigation. A reward function is calculated according to the distance between the current position and the target position and whether a collision occurs, and is used to guide the updating of the action strategy network and the value evaluation network in the reinforcement learning algorithm. To accelerate the training process, parallel training is adopted to train synchronously in multiple scenes; as shown in fig. 3, the scenes include an open scene, a static obstacle scene, a dynamic obstacle scene, and the like.
In the layered navigation obstacle avoidance system shown in fig. 2, the map building sub-module of the upper module is used for building an environment map in real time, and the process of building the environment map in real time is specifically as follows:
and the upper layer runs a mapping submodule in real time, and a Cartogrhper algorithm running a laser SLAM in the mapping submodule maps and updates the environment in real time to obtain a two-dimensional occupation grid map (occupancy grid map). ) Defining boundary point as the boundary point between known idle point and unknown state point according to the latest network map, and recording as Pboundary。
The mapping submodule is responsible for incremental mapping and positioning: at the beginning, in a completely unknown environment, the robot knows only its own position coordinates and the target coordinates, with no information about the environment. Through continuous exploration, the mapping submodule incrementally maps new areas, and synchronously and accurately estimates the robot's position P, attitude Pose and real-time speed V through the Cartographer algorithm, the angular speed and acceleration data of the inertial device (IMU), and wheel odometer information.
Fig. 4 is a schematic diagram illustrating point selection based on a heuristic exploration mechanism in a specific application scenario provided by the embodiment of the present disclosure.
As shown in fig. 4, in both scenarios the robot is more inclined to select the safer exploration point, namely the points within the black box.
In the hierarchical navigation method provided in the embodiments disclosed in the present application, a multi-factor heuristic exploration mechanism is adopted, which is specifically as follows:
4.1 The exploration submodule calculates the distance (target) heuristic factor D_h:
First, according to the latest map obtained by the mapping submodule, the exploration submodule extracts the edge points between the known area and the unknown area of the map, and after filtering and clustering this series of edge points P_boundary, takes them as candidate points.
Second, according to the real-time latest map from the mapping module, the current position and posture of the robot, and the preset final target point, the exploration submodule calculates the target heuristic factor of each candidate point i as D_h^i = D_goal^i + D_path^i, where D_goal^i represents the Euclidean distance from the candidate point to the target point and D_path^i represents the path length from the current point to the candidate point; the factor thus measures both the distance between the candidate point and the target and how easy the point is to reach. D_path^i is calculated with the A* path planning algorithm: since the area from the current point to the candidate points is fully known on the map, a feasible path can be obtained by A*, and its length is D_path^i.
4.2 The lower-layer reinforcement learning obstacle avoidance module calculates the safety heuristic factor V_h:
The lower-layer reinforcement learning obstacle avoidance module calculates the safety heuristic factor V_h^i corresponding to each candidate point i as follows: V_h^i is computed by the value evaluation network (critic model) of the lower-layer module's reinforcement learning obstacle avoidance model. The candidate point is set as the target point O_g, and together with the current observation O_L, the output value of the critic network is computed as the candidate point's V_h^i. V_h^i measures the safety degree of the candidate point: the denser the obstacles between the robot and the candidate point, the smaller the V_h^i value, and the more dangerous the candidate point.
4.3 The exploration submodule selects the exploration point from the candidate points according to the two types of factors: as shown in fig. 2, the layered navigation obstacle avoidance method provided by the embodiment of the present disclosure designs two types of heuristic factors, and the selection mechanism of the exploration point is as follows. The values of the two types of heuristic factors are calculated for all candidate points, normalized, and then combined; the coefficient γ is a weighting coefficient between the two terms that balances their contributions, and the denominator of each term normalizes it. The heuristic function is thus computed from the two heuristic factors, and the algorithm finally selects the boundary point with the minimum heuristic function value, i.e. the most suitable boundary point is selected according to the heuristic function as the next exploration point P_exploration.
Under the action of the two types of heuristic factors, the robot selects a closer, more easily reached and safer area to explore next. As shown in fig. 4, the robot selects the right branch edge area as the exploration point in the left-image test scene, and the left branch edge area in the right-image test scene, i.e. in both cases it selects the safer area as the next exploration point. The layered navigation obstacle avoidance method provided by the embodiment of the present disclosure introduces a multi-factor heuristic exploration mechanism combined with the characteristics of the critic network of the lower-layer module; the upper and lower layers are tightly coupled so that the whole system is safer and more reliable.
The waypoint planning submodule plans the waypoints and sends the waypoints to the lower layer obstacle avoidance model to execute, and the method is specifically as follows:
at the selection of a suitable exploration point PexplorationAnd the intermediate point planning module plans a shortest path from the current position to the search point. The path is composed of a series of continuous dense path point sequences, and in order to reduce the communication burden of the upper layer and the lower layer, a plurality of intermediate points P are selected from the path point sequenceswaypointAnd connecting the midway point PwaypointSending to the lower module as a temporary target point Og. The criterion for selecting a waypoint from the planned path sequence is that the farthest point that the robot can reach straight is the waypoint PwaypointI.e. representing that no large obstacle blocks the path from the midway point to the current position. As shown in fig. 3: due to the existence of the boundary of the enclosing wall, the robot cannot directly reach the search point from the current point, and the midway point is selected to be at the edge position of the obstacle.
According to this criterion, midway points are selected in sequence from the path point sequence and sent one by one to the lower-layer module as temporary targets O_g, guiding the robot to travel toward the exploration point. When the robot has reached the midway points in sequence and finally reaches the exploration point, the above calculation steps are repeated: a new position and a new exploration point are calculated on the map, midway points are calculated from the exploration point, and they are delivered to the lower-layer obstacle avoidance model for execution.
Replanning of exploration points: as the robot drives toward the exploration point step by step, the map is continuously updated. To improve navigation and exploration efficiency, it must be continuously checked whether the exploration point needs to be replanned. The criteria for replanning are as follows:
(1) The map changes greatly during exploration. The information content of the map is measured by its entropy, H(m) = −Σ_{i,j} [ p(m_{i,j}) log p(m_{i,j}) + (1 − p(m_{i,j})) log(1 − p(m_{i,j})) ], where p(m_{i,j}) is the occupancy probability of the cell in the i-th row and j-th column of the grid map. When the entropy change of the map exceeds a certain threshold, the map has changed substantially; the old exploration point is then not necessarily the optimal choice, and the exploration point should be replanned.
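Assuming the entropy above is the per-cell binary (Shannon) entropy summed over the grid, criterion (1) could be sketched as follows (names illustrative):

```python
import math

def map_entropy(grid_probs):
    """Total binary entropy of an occupancy grid, one term per cell,
    from the per-cell occupancy probability p(m_ij). Cells at exactly
    0 or 1 are fully determined and contribute zero entropy."""
    h = 0.0
    for row in grid_probs:
        for p in row:
            if 0.0 < p < 1.0:
                h -= p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)
    return h

def entropy_triggers_replan(old_map, new_map, threshold):
    """Replanning criterion (1): the absolute entropy change between
    two map snapshots exceeds the preset threshold."""
    return abs(map_entropy(new_map) - map_entropy(old_map)) > threshold
```

A fully unknown cell (p = 0.5) contributes one bit, so a freshly initialized blank map has maximal entropy, which then falls as exploration resolves cells to free or occupied.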
(2) The state of the exploration point changes, i.e. the old exploration point may be occupied by a dynamic obstacle for a long time, and the exploration point should be re-planned at this time.
(3) The robot reaches the exploration point P_exploration; at this time, the next exploration point should be planned.
(4) The target point P_goal appears in a known area of the grid map. At this moment, the robot can drive directly to the target point without further exploration points.
As shown in fig. 5, a schematic flow chart of a layered navigation obstacle avoidance method in a specific application scenario provided by the embodiment of the present disclosure is shown.
As shown in fig. 5, in a specific application scenario, the layered navigation obstacle avoidance method based on heuristic exploration is applied to a server, and includes the following steps:
S501: training a learning obstacle avoidance model for realizing reactive reinforcement learning obstacle avoidance of the robot, based on a model constructed with the deep reinforcement learning PPO algorithm, to obtain the lower-layer reinforcement learning obstacle avoidance model.
In the embodiment of the application, an obstacle avoidance model is trained on a simulator based on a reinforcement learning method, and the specific training process is as follows:
1) Designing multiple training scenes in the Stage simulator:
First, a robot model and obstacle models are established in the Stage simulator, and several training and test scenes are designed; the physical attributes of the robot, such as mass, speed, acceleration and rigidity, are set according to the configuration of the real robot. Second, a 2D laser radar is configured for the robot in the simulator, and the radar's parameters, such as measurement accuracy, angular resolution, measurement range and scanning frequency, are simulated according to the performance indices of real-world consumer-grade laser radars. Finally, several static obstacle scenes and dynamic obstacles are set up: the obstacles differ in size and shape, the maximum moving speed of the obstacles in the environment is at least twice the maximum speed of the robot, and the obstacles walk randomly in the environment with a walking speed sampled randomly from a speed range.
2) Building a PPO-based multi-process reinforcement learning training framework, which comprises a critic network and an actor network:
First, a deep reinforcement learning model is built, implemented with the PPO algorithm and divided into an actor model and a critic model. As shown in fig. 2, both the actor and the critic model of the reinforcement learning algorithm are built with deep neural networks; a buffer storing history tuples is established, the critic model outputs a value from which the advantage value is calculated, and the actor model computes the action output.
3) Training a reinforcement learning algorithm to realize short-distance obstacle avoidance navigation:
first, four types of reward functions Rg, Rc, Rt, Rr are set, where: targeted reward RgThe closer each step is to the target, the larger the reward is, the calculation method is as follows:
wg·(dt-dt-1) Wg is a positive weighting coefficient set in advance, representing the difference between the Euclidean distance between the time t and the target compared with the time t-1.
Collision penalty RcIn a simulation process, when the trolley collides with an obstacle, a certain punishment is given, and the calculation is as follows: w is ac·15,WcIs a preset weighting coefficient less than 0.
Reach reward RrIn the simulation process, the vehicle is rewarded when reaching the terminal point, and the calculation is as follows: w is ar20, Wr is also a preset positive weight coefficient.
Temporal penalty Rt: on behalf of each simulation, a certain reward or penalty is given as long as the car is still alive, calculated as follows: w is at0.01, Wt is also the weight coefficient less than 0 that achieves the preset. This time penalty drives the robot to quickly reach the endpoint.
The reward function R (t) Rg + Rc + Rt + Rr in each simulation process
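A sketch of the combined step reward under the signs and constants stated above; the default weight values are illustrative placeholders, not values fixed by the text:

```python
def step_reward(d_prev, d_curr, collided, reached,
                w_g=2.5, w_c=-1.0, w_r=1.0, w_t=-1.0):
    """One-step reward R(t) = R_g + R_c + R_t + R_r.
    The constants 15, 20 and 0.01 and the signs of the weights follow
    the text; the default weight magnitudes are placeholders."""
    r_g = w_g * (d_prev - d_curr)        # goal reward: positive when approaching
    r_c = w_c * 15 if collided else 0.0  # collision penalty (w_c < 0)
    r_r = w_r * 20 if reached else 0.0   # arrival reward (w_r > 0)
    r_t = w_t * 0.01                     # per-step time penalty (w_t < 0)
    return r_g + r_c + r_r + r_t
```

For example, moving one meter closer to the target yields a positive reward slightly reduced by the time penalty, while a collision dominates the step with a large negative value.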
Second, multi-process training is realized in the simulator. N robots are placed in the training scenes, and at each simulation step the observation states O(t) and rewards R(t) of all robots are collected and sent to the buffer in the reinforcement learning framework for storage. From the N observation states, the PPO algorithm in the reinforcement learning framework calculates N action outputs A(t) and returns them to the N robots; each robot executes its action, collects its O(t+1) and R(t+1) at the new moment, and the cycle continues. When the number of samples collected in the framework's buffer exceeds a certain threshold, the actor and critic network parameters are updated. In addition, when a robot collides in the simulator or reaches the maximum time step, the simulator sends a DONE instruction to the reinforcement learning framework to indicate that the episode is finished, and the framework correspondingly sends a RESET instruction to restart that robot. The whole parallel training environment runs in a synchronous mode, and the scenes include at least three types: static scenes, dynamic scenes and open scenes.
Finally, under the guidance of the four types of reward functions and through multiple rounds of training, the reinforcement learning algorithm learns a flexible short-distance obstacle avoidance strategy.
S502: constructing the value evaluation network of the lower-layer reinforcement learning obstacle avoidance model, and calculating the safety heuristic factor through the value evaluation network according to the current state information of the robot and the distribution of surrounding obstacles in the robot's environment.
S503: the map building submodule of the upper module of the layered navigation obstacle avoidance system builds a grid map and calculates the current position of the robot in real time, based on the robot's environment, in a preset manner based on laser 2D SLAM.
In the layered navigation obstacle avoidance method provided in the embodiment of the present application, the map building submodule constructs the environment map as follows:
1) First, according to the relative distance and azimuth between the target point and the robot's starting point, a circular blank map is initialized, with the robot's starting position as the center and 1.3 times the distance from the robot to the target as the radius; the target point is thus located near the edge area of the map. The states of all points in the map are unknown, and at this moment the robot knows only the coordinates of the target position and of its own position. The Cartographer laser SLAM algorithm runs in the mapping submodule to map the environment in real time and update it, obtaining a two-dimensional occupancy grid map. Each pixel value in the grid map is set between 0 and 1, representing the occupancy probability of that grid area, as follows: a point occupied by an obstacle is set greater than 0.52, a confirmed known idle point is set less than 0.48, and unknown state points lie between 0.48 and 0.52. According to the latest grid map, a boundary point is defined as a point on the boundary between known idle points and unknown state points, and is recorded as P_boundary.
2) The mapping submodule is responsible for incremental mapping and positioning: at the beginning, in a completely unknown environment, the robot knows only its own position coordinates and the target coordinates and has no information about the environment, i.e. the real environment map is continuously constructed on top of the initialized blank map. Through continuous exploration, the mapping submodule incrementally maps new areas, and synchronously and accurately estimates the robot's position P, attitude Pose and real-time speed V through the Cartographer algorithm, the angular speed and acceleration data of the inertial device (IMU), and wheel odometer information.
S504: and an exploration submodule of an upper layer module of the layered navigation obstacle avoidance system calculates a target heuristic factor according to a preset target and the current position of the robot.
S505: the exploration submodule of the upper module of the layered navigation obstacle avoidance system selects the optimal exploration point according to the target heuristic factor, the safety heuristic factor and the current latest map.
In the embodiment of the present application, a specific process for selecting the best candidate point as the exploration point based on the multi-factor heuristic exploration mechanism is as follows:
1) preprocessing boundary points as candidate points by screening and clustering:
First, all boundary points P_boundary are selected; a boundary point is an intersection of the known obstacle-free area and the unknown area. Then the sequence of all boundary points P_boundary is filtered to eliminate false boundaries and extremely narrow boundaries caused by algorithm noise, which arise from large drift or missing points of the map during mapping. The filtering applies morphological processing to the point map, performing dilation first and then erosion; after the smaller noise points are eliminated, a more reliable boundary point set is obtained.
Second, to accelerate the calculation, all boundary points are clustered; the number of clusters N is a hyper-parameter set in advance according to the map size and the platform's computing capacity. After the N cluster centers are obtained, K points are randomly selected from the points belonging to each cluster center, following the principle that their relative distances should be as large as possible, i.e. the K points are distributed around the cluster center as scattered as possible. Finally, N × K boundary points are obtained as candidate boundary points.
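The "as scattered as possible" selection within one cluster can be approximated with greedy farthest-point sampling; the text samples randomly, so this deterministic variant is an illustrative substitute, and the function name is an assumption:

```python
import math

def spread_k_points(points, k):
    """Greedy farthest-point sampling: pick up to k points from one
    cluster so that each new point maximizes its distance to the
    nearest already-chosen point, approximating the 'as scattered as
    possible' rule described in the text."""
    if not points or k <= 0:
        return []
    chosen = [points[0]]
    while len(chosen) < min(k, len(points)):
        best, best_d = None, -1.0
        for p in points:
            if p in chosen:
                continue
            d = min(math.dist(p, c) for c in chosen)  # distance to nearest chosen
            if d > best_d:
                best, best_d = p, d
        chosen.append(best)
    return chosen
```

Running this once per cluster yields the N × K candidate boundary points.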
2) Calculating the target heuristic of all candidate points:
The exploration submodule calculates, according to the real-time latest map from the mapping module, the current position and posture of the robot, and the preset final target point, the target heuristic factor of each candidate point i as D_h^i = D_goal^i + D_path^i, where D_goal^i represents the Euclidean distance from the candidate point to the target point and D_path^i represents the path length from the current point to the candidate point; the factor thus measures both the distance between the candidate point and the target and how easy the point is to reach. D_path^i is calculated with the A* path planning algorithm: since the area from the current point to the candidate points is fully known on the map, a feasible path can be obtained by A*, and its length is D_path^i.
3) Calculating the safety elicitors of all candidate points:
The lower-layer reinforcement learning obstacle avoidance module calculates the safety heuristic factor V_h^i corresponding to each candidate point i as follows: V_h^i is computed by the value evaluation network (critic model) of the lower-layer module's reinforcement learning obstacle avoidance model. The candidate point is set as the target point O_g, and together with the current observation O_L, the output value of the critic network is computed as the candidate point's V_h^i. V_h^i measures the safety degree of the candidate point: the denser the obstacles between the robot and the candidate point, the smaller the V_h^i value, and the more dangerous the candidate point.
4) Selecting the best candidate point as an exploration point:
As shown in fig. 2, the layered navigation obstacle avoidance method provided by the embodiment of the present disclosure designs two types of heuristic factors, and the selection mechanism of the exploration point is as follows. The values of the two types of heuristic factors are calculated for all candidate points, normalized, and then combined; the coefficient γ is a weighting coefficient between the two terms that balances their contributions, and the denominator of each term normalizes it. The heuristic function is thus computed from the two heuristic factors, and the algorithm finally selects the boundary point with the minimum heuristic function value, i.e. the most suitable boundary point is selected according to the heuristic function as the next exploration point P_exploration.
The coefficient γ is set to 0.6, and the candidate boundary point with the minimum heuristic value is selected as the next exploration point P_exploration. Under the action of the two types of heuristic factors, the robot selects a closer, more easily reached and safer area to explore next. As shown in fig. 4, the robot selects the right branch edge area as the exploration point in the left-image test scene, and the left branch edge area in the right-image test scene, i.e. in both cases it selects the safer area as the next exploration point. The invention introduces a multi-factor heuristic exploration mechanism combined with the characteristics of the critic network of the lower-layer module; the upper and lower layers are tightly coupled so that the whole system is safer and more reliable.
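The selection step can be sketched as follows. The exact combination formula is not legible in the text, so this sketch assumes each factor is divided by its maximum absolute value and the safety term enters with a negative sign (a larger V_h means safer, and the candidate with the minimum combined value is chosen); the function name and these details are illustrative assumptions:

```python
def pick_exploration_point(candidates, d_h, v_h, gamma=0.6):
    """Combine the target heuristic d_h and safety heuristic v_h for
    each candidate and return the candidate with the minimum combined
    heuristic. Both factors are max-normalized; the safety term is
    subtracted so that safer (larger v_h) candidates score lower.
    gamma=0.6 follows the weighting stated in the text."""
    d_max = max(abs(d) for d in d_h) or 1.0  # avoid division by zero
    v_max = max(abs(v) for v in v_h) or 1.0

    def h(i):
        return gamma * d_h[i] / d_max - (1.0 - gamma) * v_h[i] / v_max

    best = min(range(len(candidates)), key=h)
    return candidates[best]
```

With equal target heuristics, the candidate whose critic value indicates a safer approach wins, matching the behavior described for fig. 4.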
S506: and a path planning submodule of an upper layer module of the layered navigation obstacle avoidance system calculates a plurality of sparse midway points according to the selected optimal exploration points, and sequentially sends each midway point to a lower layer obstacle avoidance module to serve as a temporary target point, so that the navigation robot sequentially runs to each midway point.
After calculating a plurality of sparse midway points according to the selected optimal exploration point, the layered navigation obstacle avoidance method provided by the embodiment of the present disclosure further comprises: calculating the sparse path points in sequence and detecting whether the exploration point needs to be re-planned. The specific process is as follows:
1) planning a shortest path based on the exploration point and the current point:
After the exploration point P_exploration is found, as shown in fig. 3, it is necessary to determine whether the exploration point can be reached directly at this moment, i.e. whether there is no obstacle on the straight line connecting the exploration point and the current point. If it can be reached directly, the midway point P_waypoint is set as the exploration point P_exploration.
If it cannot be reached in a direct straight line, the A* algorithm is first used to plan the shortest path from the current point to the exploration point. The shortest path is a series of consecutive dense points.
2) Selecting a number of sparse midway points based on the planned path point sequence:
The point set of the planned shortest path is processed, and the farthest point that can be directly reached from the current point is selected as the midway point P_waypoint. The midway point is then sent to the lower-layer reinforcement learning obstacle avoidance network as a temporary target point, and the lower-layer network is responsible for driving the robot to the midway point while avoiding moving obstacles. After the midway point is reached, the next midway point P_waypoint is calculated, until the exploration point is finally reached. After the exploration point is reached, the exploration point selection steps are repeated.
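A minimal sketch of the sparse midway-point selection in step 2), assuming the dense A* path is ordered from the current point toward the exploration point and that a reachability predicate like the one in step 1) is available. All names are illustrative.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int]

def next_waypoint(path: List[Point], current: Point,
                  reachable: Callable[[Point, Point], bool]) -> Point:
    """Farthest point on `path` with an unobstructed straight line from `current`."""
    waypoint = path[0]
    for p in path:
        if reachable(current, p):
            waypoint = p
        else:
            break  # points beyond the first blocked one are farther along; stop
    return waypoint

# toy example: pretend everything within Manhattan distance 3 is reachable
dense_path = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
manhattan_ok = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1]) <= 3
wp = next_waypoint(dense_path, (0, 0), manhattan_ok)  # wp == (3, 0)
```

Repeating this call each time a midway point is reached yields the sparse sequence that the upper module sends to the lower obstacle avoidance module.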
3) Detecting whether an exploration point needs to be re-planned at any moment:
While the robot drives towards the midway points one by one, it is necessary to detect whether the exploration point needs to be re-planned; if a trigger condition is met, the exploration point is re-planned and the exploration point selection step is repeated.
S507: determining whether a new optimal exploration point needs to be recalculated according to the real-time state information of the robot, the latest grid map monitoring data, and preset conditions for triggering recalculation of the optimal exploration point; if a new optimal exploration point needs to be recalculated, repeating steps S501 to S506 until a plurality of sparse new midway points corresponding to the new optimal exploration point are calculated, and sequentially sending each new midway point to the lower layer obstacle avoidance module to serve as a new temporary target point, so that the navigation robot drives to each new midway point in sequence.
As shown in fig. 5, the lower-layer model of the lower obstacle avoidance module calculates the safety heuristic factor V_h and sends it to the exploration submodule of the upper module; the exploration submodule calculates the distance heuristic factor D_h and, based on the two types of heuristic factors and the current latest map, selects a suitable exploration point P_exploration from the map edge candidate points as the next target.
The upper layer module selects a plurality of sparse midway points according to the exploration point and finally transmits the selected midway points to the lower layer obstacle avoidance module in sequence. The lower layer obstacle avoidance module takes each midway point in turn as a temporary target point, drives towards that target, and avoids obstacles at the same time.
For the detailed description of the above steps, refer to the description of the same or similar parts, which is not repeated herein.
In the embodiment of the disclosure, the layered navigation obstacle avoidance method calculates a plurality of midway points according to the optimal exploration point, sequentially sends the midway points to the lower layer obstacle avoidance module as corresponding temporary target points, and sets the temporary target points as target points of the lower layer reinforcement learning obstacle avoidance model; the environmental data of the robot is then input into the navigation obstacle avoidance model, a navigation instruction is output, and the navigation instruction is sent to the robot. Therefore, by adopting the embodiment of the application, the optimal exploration point can be accurately determined, so that the robot can accurately navigate and avoid obstacles.
As shown in fig. 6, an embodiment of the present disclosure provides a layered navigation obstacle avoidance method based on heuristic exploration, where the layered navigation obstacle avoidance method is applied to a robot, and the layered navigation obstacle avoidance method specifically includes the following method steps:
S601: receiving a plurality of midway points sequentially sent by the layered navigation obstacle avoidance system and taking the midway points as corresponding temporary target points;
S602: sequentially driving to the corresponding midway points according to the temporary target points until the optimal exploration point is reached.
In the embodiment of the disclosure, the robot receives a plurality of midway points sequentially sent by the layered navigation obstacle avoidance system, takes them as corresponding temporary target points, and drives to the corresponding midway points in sequence until the optimal exploration point is reached. Because the layered navigation obstacle avoidance system accurately calculates the plurality of sparse midway points corresponding to the selected optimal exploration point, the navigation obstacle avoidance method enables the robot to navigate autonomously in a highly dynamic, complex, map-free scene.
The following is an embodiment of the layered navigation obstacle avoidance apparatus of the present invention, which can be used to implement the embodiment of the layered navigation obstacle avoidance method of the present invention. For details that are not disclosed in the embodiment of the device for layered navigation and obstacle avoidance of the present invention, please refer to the embodiment of the method for layered navigation and obstacle avoidance of the present invention.
Fig. 7 is a schematic structural diagram of a layered navigation obstacle avoidance apparatus according to an exemplary embodiment of the present invention. The layered navigation obstacle avoidance apparatus is applied to a server and comprises a safety heuristic factor calculation module 701, a target heuristic factor calculation module 702, an optimal exploration point calculation module 703, a target point determination module 704, a processing module 705, and a sending module 706.
Specifically, the safety heuristic factor calculation module 701 is configured to calculate a safety heuristic factor of a candidate point through a value evaluation network of a lower-layer reinforcement learning obstacle avoidance model according to current state information of the robot and peripheral obstacle distribution information of the environment where the robot is located;
a target heuristic factor calculation module 702, configured to calculate a target heuristic factor of a candidate point according to a current map constructed in real time in an unfamiliar map-free environment, a preset target, and a current position of the robot;
an optimal exploration point calculation module 703, configured to calculate the candidate point with the minimum heuristic value according to the safety heuristic factor calculated by the safety heuristic factor calculation module 701 and the target heuristic factor calculated by the target heuristic factor calculation module 702, and use the candidate point with the minimum heuristic value as the optimal exploration point;
a target point determination module 704, configured to calculate a plurality of midway points according to the optimal exploration point determined by the optimal exploration point calculation module 703, sequentially send the midway points to the lower obstacle avoidance module as corresponding temporary target points, and set the temporary target points as target points of the lower reinforcement learning obstacle avoidance model;
the processing module 705 is used for inputting environmental data of the robot into the navigation obstacle avoidance model and outputting a navigation instruction;
a sending module 706, configured to send the navigation instruction obtained by the processing module 705 to the robot.
Optionally, the apparatus further comprises:
a building module (not shown in fig. 7) for building a value evaluation network and an action strategy network of the lower-layer reinforcement learning obstacle avoidance model;
and a training module for training, based on the preset deep reinforcement learning model, an obstacle avoidance model for realizing reactive reinforcement learning obstacle avoidance of the robot, so as to obtain the lower-layer reinforcement learning obstacle avoidance model.
Optionally, the apparatus further comprises:
and a construction and calculation module (not shown in fig. 7) for constructing a grid map and calculating the current position of the robot in real time by an incremental mapping method in an unfamiliar map-free environment.
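The incremental mapping method is not detailed in the text; as a hedged illustration, one common choice is a log-odds occupancy-grid update, sketched below with an assumed inverse sensor model.

```python
import math

# inverse sensor model (assumed values): hit and miss log-odds increments
L_OCC = math.log(0.7 / 0.3)
L_FREE = math.log(0.3 / 0.7)

def update_cell(log_odds: float, hit: bool) -> float:
    """Fold one sensor observation into a cell's log-odds occupancy."""
    return log_odds + (L_OCC if hit else L_FREE)

def probability(log_odds: float) -> float:
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))

l = 0.0                # unknown cell: p = 0.5
for _ in range(3):     # three consecutive "hit" measurements
    l = update_cell(l, True)
p = probability(l)     # p rises above 0.9 after three hits
```

The log-odds form makes each incremental update a single addition, which is why it is the usual representation for grid maps built online.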
Optionally, the apparatus further comprises:
an acquisition module (not shown in fig. 7) for acquiring preset conditions for reselecting a new optimal exploration point;
the processing module 705 is configured to determine whether the current condition meets the preset condition obtained by the acquisition module, and to reselect a new optimal exploration point when the current condition meets the preset condition.
Optionally, the preset conditions include:
in the exploration process of the robot, the entropy change of the exploration map is larger than a preset entropy change threshold; or, it is detected that the state of the current exploration point has changed; or, it is detected that the robot has reached the optimal exploration point; or, it is detected that the target point lies in a known area.
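The first condition can be made concrete by measuring the Shannon entropy of the occupancy grid: as unknown cells are resolved, total entropy falls, and a sufficiently large change triggers re-planning. The threshold value and the per-cell probability encoding below are assumptions.

```python
import math
from typing import List

def map_entropy(grid: List[List[float]]) -> float:
    """Total Shannon entropy of an occupancy grid; each cell holds P(occupied)."""
    h = 0.0
    for row in grid:
        for p in row:
            if 0.0 < p < 1.0:  # fully known cells (p = 0 or 1) contribute nothing
                h -= p * math.log2(p) + (1 - p) * math.log2(1 - p)
    return h

ENTROPY_DELTA_THRESHOLD = 0.5  # assumed value; the text does not specify it

def should_replan(prev_entropy: float, grid: List[List[float]]) -> bool:
    """True if the entropy change since the last check exceeds the threshold."""
    return abs(prev_entropy - map_entropy(grid)) > ENTROPY_DELTA_THRESHOLD

unknown = [[0.5, 0.5], [0.5, 0.5]]  # fully unknown map: 4 bits of entropy
partly = [[0.0, 0.5], [1.0, 0.5]]   # two cells resolved: 2 bits of entropy
```

With these toy maps, resolving two cells changes the entropy by 2 bits, well above the assumed threshold, so re-planning would be triggered.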
Fig. 8 is a schematic structural diagram of a layered navigation obstacle avoidance apparatus according to an exemplary embodiment of the present invention. The layered navigation obstacle avoidance device is applied to a robot and comprises a receiving module 801 and a driving module 802.
Specifically, the receiving module 801 is configured to receive a plurality of midway points sequentially sent by the layered navigation obstacle avoidance system and to take the midway points as corresponding temporary target points;
and a driving module 802, configured to drive to the corresponding midway points in sequence according to the temporary target points determined by the receiving module 801 until the optimal exploration point is reached.
It should be noted that, when the layered navigation obstacle avoidance apparatus provided in the foregoing embodiment executes the layered navigation obstacle avoidance method, only the division of the functional modules is taken as an example, and in practical applications, the function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the layered navigation obstacle avoidance device and the layered navigation obstacle avoidance method provided by the above embodiments belong to the same concept, and the implementation process is detailed in the layered navigation obstacle avoidance method embodiment, which is not described herein again.
In the embodiment of the disclosure, the receiving module is configured to receive a plurality of midway points sequentially sent by the layered navigation obstacle avoidance system and take them as corresponding temporary target points; the driving module is configured to drive to the corresponding midway points in sequence according to the temporary target points determined by the receiving module until the optimal exploration point is reached. Because the layered navigation obstacle avoidance system accurately calculates the plurality of sparse midway points corresponding to the selected optimal exploration point, the navigation obstacle avoidance apparatus enables the robot to navigate autonomously in a highly dynamic, complex, map-free scene.
In one embodiment, a computer device is provided, the computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program: calculating a safety heuristic factor of a candidate point through a value evaluation network of a lower-layer reinforcement learning obstacle avoidance model according to current state information of the robot and peripheral obstacle distribution information of the environment where the robot is located; calculating a target heuristic factor of the candidate point according to a current map constructed in real time in an unfamiliar map-free environment, a preset target, and the current position of the robot; calculating the candidate point with the minimum heuristic value according to the safety heuristic factor and the target heuristic factor, and taking the candidate point with the minimum heuristic value as the optimal exploration point; calculating a plurality of midway points according to the optimal exploration point, sequentially sending the midway points to the lower layer obstacle avoidance module to serve as corresponding temporary target points, and setting the temporary target points as target points of the lower layer reinforcement learning obstacle avoidance model; and inputting the environmental data of the robot into the navigation obstacle avoidance model, outputting a navigation instruction, and sending the navigation instruction to the robot.
In one embodiment, a storage medium is provided that stores computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of: calculating a safety heuristic factor of a candidate point through a value evaluation network of a lower-layer reinforcement learning obstacle avoidance model according to current state information of the robot and peripheral obstacle distribution information of the environment where the robot is located; calculating a target heuristic factor of the candidate point according to a current map constructed in real time in an unfamiliar map-free environment, a preset target, and the current position of the robot; calculating the candidate point with the minimum heuristic value according to the safety heuristic factor and the target heuristic factor, and taking the candidate point with the minimum heuristic value as the optimal exploration point; calculating a plurality of midway points according to the optimal exploration point, sequentially sending the midway points to the lower layer obstacle avoidance module to serve as corresponding temporary target points, and setting the temporary target points as target points of the lower layer reinforcement learning obstacle avoidance model; and inputting the environmental data of the robot into the navigation obstacle avoidance model, outputting a navigation instruction, and sending the navigation instruction to the robot.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between them, such combinations should be considered within the scope of the present specification.
The above examples only show some embodiments of the present invention, which are described in a relatively specific and detailed manner, but this should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.