Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Alternative embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the embodiment of the present disclosure provides a layered navigation obstacle avoidance method applied to a server. The system used by the method is a layered navigation obstacle avoidance system based on heuristic exploration, which includes an upper module and a lower obstacle avoidance module running in separate processes. The upper module is responsible for exploration and planning in an unfamiliar map-free environment, while the lower obstacle avoidance module implements local navigation obstacle avoidance. The two modules are tightly coupled: the upper module provides temporary target points to the lower obstacle avoidance module, and the lower obstacle avoidance module provides a safety heuristic factor to the upper module as a reference for path planning.
The layered navigation obstacle avoidance method provided by the embodiment of the disclosure specifically comprises the following steps:
S101: calculating a safety heuristic factor for candidate points through the value evaluation network of the lower-layer reinforcement learning obstacle avoidance model, according to the current state information of the robot and the distribution of surrounding obstacles in the robot's environment; the candidate points are all candidate points on the currently constructed map.
S102: calculating a target heuristic factor for candidate points according to the current map constructed in real time in the unfamiliar map-free environment, the preset target and the current position of the robot; the candidate points are all candidate points on the currently constructed map.
S103: selecting the candidate point with the minimum heuristic value according to the safety heuristic factor and the target heuristic factor, and taking that candidate point as the optimal exploration point.
S104: calculating a plurality of midway points from the optimal exploration point, sending the midway points in sequence to the lower-layer obstacle avoidance module as corresponding temporary target points, and setting each temporary target point as the target point of the lower-layer reinforcement learning obstacle avoidance model.
S105: inputting the environmental data of the robot into the obstacle avoidance model, outputting a navigation instruction, and sending the navigation instruction to the robot.
In a possible implementation manner, the method for layered navigation obstacle avoidance provided by the embodiment of the present disclosure further includes the following steps:
constructing a value evaluation network and an action strategy network of a lower-layer reinforcement learning obstacle avoidance model;
and training a learning obstacle avoidance model for realizing reactive reinforcement learning obstacle avoidance of the robot, based on a preset deep reinforcement learning model, to obtain the lower-layer reinforcement learning obstacle avoidance model.
In a possible implementation manner, the method for layered navigation obstacle avoidance provided by the embodiment of the present disclosure further includes the following steps:
and building a grid map and calculating the current position of the robot by incremental mapping in the unfamiliar map-free environment.
In a possible implementation manner, the method for layered navigation obstacle avoidance provided by the embodiment of the present disclosure further includes the following steps:
and acquiring a preset condition for reselecting a new optimal exploration point, judging whether the current condition meets the preset condition, and reselecting the new optimal exploration point when judging that the current condition meets the preset condition.
In the embodiment of the present application, the preset conditions include: during the exploration process of the robot, the entropy change of the exploration map is larger than a preset entropy change threshold; or, a change in the state of the current exploration point is detected; or, the robot is detected to have reached the optimal exploration point; or, the target point is detected to lie in a known area.
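A minimal sketch of this four-condition replanning check; the function name and boolean interface are illustrative assumptions, not part of the original text:

```python
def should_replan(entropy_change, entropy_threshold,
                  point_state_changed, reached_point, goal_known):
    """Return True if any of the four preset replanning conditions holds:
    (1) the map entropy changed by more than a threshold,
    (2) the state of the current exploration point changed,
    (3) the robot has reached the optimal exploration point,
    (4) the target point now lies in a known area of the map."""
    return (entropy_change > entropy_threshold
            or point_state_changed
            or reached_point
            or goal_known)
```

In practice this predicate would be evaluated continuously while the robot drives toward the current exploration point.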
As shown in fig. 2, a schematic diagram of a layered navigation obstacle avoidance system adopted by the layered navigation obstacle avoidance method in a specific application scenario provided by the embodiment of the present disclosure is shown.
In order to solve the navigation problem under the unknown high-dynamic complex environment, the layered navigation obstacle avoidance system comprises an upper layer and a lower layer, as shown in fig. 2, wherein an upper layer module needs to be responsible for synchronously exploring the unknown environment and realizing path planning, and a lower layer obstacle avoidance module mainly realizes short-distance navigation obstacle avoidance and ensures that the robot safely runs from the current point to a temporary target.
As shown in fig. 2, the upper layer module includes three parts, namely a SLAM-based mapping submodule (SLAM-based mapping), a heuristic exploration submodule (heuristic exploration) and a waypoint planning submodule (waypoint planning) which operate at 30Hz, 0.2Hz and 1Hz respectively. The lower layer obstacle avoidance module is a separately trained reinforcement learning obstacle avoidance module and operates at 30 Hz.
In the layered navigation obstacle avoidance system shown in fig. 2, a reactive obstacle avoidance model based on reinforcement learning is trained separately for the lower layer:
Designing reinforcement learning elements: first, a compact state space is designed as the state input, and the observation vector of the robot is recorded as O = [O_L, O_g, O_m, O_v], where: the robot observes the surrounding environment through the mounted 2D laser radar as the vector O_L; the projection vector of the target point relative to the robot in the carrier coordinate system is O_g; from the laser observation data recorded at the previous n moments, the relative motion vector O_m is calculated, which integrates the robot's own movement and the movement of external obstacles; and finally, the robot's own motion vector in the world coordinate system is denoted O_v. To apply the invention to more robotic platforms, the motion vector is designed as O_v = [v_x, v_y, w_z], representing the motion speed in the x-axis direction, the motion speed in the y-axis direction, and the rotational angular speed of the robot, respectively.
The robot is trained in a simulator to realize short-distance obstacle avoidance navigation based on a parallel PPO reinforcement learning algorithm. As shown in fig. 2, a DPPO (distributed PPO) continuous-control deep reinforcement learning algorithm is adopted, and different scenes are set in the Stage simulator to train the robot to learn short-distance obstacle avoidance navigation. A reward function is calculated according to the distance between the current position and the target position and whether a collision occurs, and is used to guide the updating of the action strategy network and the value evaluation network in the reinforcement learning algorithm. To accelerate the training process, parallel training is adopted to train synchronously in multiple scenes; as shown in fig. 3, the scenes include an open scene, a static obstacle scene, a dynamic obstacle scene, and the like.
In the layered navigation obstacle avoidance system shown in fig. 2, the map building sub-module of the upper module is used for building an environment map in real time, and the process of building the environment map in real time is specifically as follows:
and the upper layer runs a mapping submodule in real time, and a Cartogrhper algorithm running a laser SLAM in the mapping submodule maps and updates the environment in real time to obtain a two-dimensional occupation grid map (occupancy grid map). ) Defining boundary point as the boundary point between known idle point and unknown state point according to the latest network map, and recording as Pboundary。
The mapping submodule is responsible for incremental mapping and positioning: at the beginning, in a completely unknown environment, the robot knows only its own position coordinates and the target coordinates, with no information about the environment. Through continuous exploration, the mapping submodule incrementally maps new areas, and synchronously and accurately estimates the robot's position P, attitude Pose and real-time speed V through the Cartographer algorithm, the angular speed and acceleration data of the inertial device (IMU), and wheel odometer information.
Fig. 4 is a schematic diagram illustrating point selection based on a heuristic exploration mechanism in a specific application scenario provided by the embodiment of the present disclosure.
As shown in fig. 4, in both scenarios the robot is more inclined to select the safer exploration point, namely the points within the black box.
In the hierarchical navigation method provided in the embodiments disclosed in the present application, a multi-factor heuristic exploration mechanism is adopted, which is specifically as follows:
4.1 The exploration submodule calculates the distance (target) heuristic factor D_h:
First, according to the latest map obtained by the mapping submodule, the exploration submodule extracts the edge points between the known area and the unknown area of the map, and after filtering and clustering this series of edge points P_boundary, takes them as candidate points.
Second, according to the real-time latest map from the mapping module, the current position and posture of the robot, and the preset final target point, the exploration submodule calculates the target heuristic factor of each candidate point i as D_h^i = D_goal^i + D_path^i, where D_goal^i represents the Euclidean distance from the candidate point to the target point and D_path^i represents the path length from the current point to the candidate point; the factor thus measures both the distance between the candidate point and the target and how easy the point is to reach. D_path^i is calculated with the A* path planning algorithm: since the area from the current point to the candidate points is fully known on the map, a feasible path can be obtained by A*, and its length is D_path^i.
4.2 The lower-layer reinforcement learning obstacle avoidance module calculates the safety heuristic factor V_h:
The lower-layer reinforcement learning obstacle avoidance module calculates the safety heuristic factor V_h^i corresponding to each candidate point i as follows: V_h^i is computed by the value evaluation network (critic model) of the lower-layer module's reinforcement learning obstacle avoidance model. The candidate point is set as the target point O_g, and together with the current observation O_L, the output value of the critic network is computed as the candidate point's V_h^i. V_h^i measures the safety degree of the candidate point: the denser the obstacles between the robot and the candidate point, the smaller the V_h^i value, and the more dangerous the candidate point.
4.3 The exploration submodule selects the exploration point from the candidate points according to the two types of factors: as shown in fig. 2, the layered navigation obstacle avoidance method provided by the embodiment of the present disclosure designs two types of heuristic factors, and the selection mechanism of the exploration point is as follows. The values of the two types of heuristic factors are calculated for all candidate points, normalized, and then combined; the coefficient γ is a weighting coefficient between the two terms that balances their contributions, and the denominator of each term normalizes it. The heuristic function is thus computed from the two heuristic factors, and the algorithm finally selects the boundary point with the minimum heuristic function value, i.e. the most suitable boundary point is selected according to the heuristic function as the next exploration point P_exploration.
Under the action of the two types of heuristic factors, the robot selects a closer, more easily reached and safer area to explore next. As shown in fig. 4, the robot selects the right branch edge area as the exploration point in the left-image test scene, and the left branch edge area in the right-image test scene, i.e. in both cases it selects the safer area as the next exploration point. The layered navigation obstacle avoidance method provided by the embodiment of the present disclosure introduces a multi-factor heuristic exploration mechanism combined with the characteristics of the critic network of the lower-layer module; the upper and lower layers are tightly coupled so that the whole system is safer and more reliable.
The waypoint planning submodule plans the waypoints and sends the waypoints to the lower layer obstacle avoidance model to execute, and the method is specifically as follows:
at the selection of a suitable exploration point PexplorationAnd the intermediate point planning module plans a shortest path from the current position to the search point. The path is composed of a series of continuous dense path point sequences, and in order to reduce the communication burden of the upper layer and the lower layer, a plurality of intermediate points P are selected from the path point sequenceswaypointAnd connecting the midway point PwaypointSending to the lower module as a temporary target point Og. The criterion for selecting a waypoint from the planned path sequence is that the farthest point that the robot can reach straight is the waypoint PwaypointI.e. representing that no large obstacle blocks the path from the midway point to the current position. As shown in fig. 3: due to the existence of the boundary of the enclosing wall, the robot cannot directly reach the search point from the current point, and the midway point is selected to be at the edge position of the obstacle.
According to this criterion, midway points are selected in sequence from the path point sequence and sent one by one to the lower-layer module as temporary targets O_g, guiding the robot to travel toward the exploration point. When the robot has reached the midway points in sequence and finally reaches the exploration point, the above calculation steps are repeated: a new position and a new exploration point are calculated on the map, midway points are calculated from the exploration point, and they are delivered to the lower-layer obstacle avoidance model for execution.
Replanning of exploration points: as the robot drives toward the exploration point step by step, the map is continuously updated. To improve navigation and exploration efficiency, it must be continuously checked whether the exploration point needs to be replanned. The criteria for replanning are as follows:
(1) The map changes greatly during exploration. The information content of the map is measured by its entropy, H(m) = −Σ_{i,j} [ p(m_{i,j}) log p(m_{i,j}) + (1 − p(m_{i,j})) log(1 − p(m_{i,j})) ], where p(m_{i,j}) is the occupancy probability of the cell in the i-th row and j-th column of the grid map. When the entropy change of the map exceeds a certain threshold, the map has changed substantially; the old exploration point is then not necessarily the optimal choice, and the exploration point should be replanned.
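Assuming the entropy above is the per-cell binary (Shannon) entropy summed over the grid, criterion (1) could be sketched as follows (names illustrative):

```python
import math

def map_entropy(grid_probs):
    """Total binary entropy of an occupancy grid, one term per cell,
    from the per-cell occupancy probability p(m_ij). Cells at exactly
    0 or 1 are fully determined and contribute zero entropy."""
    h = 0.0
    for row in grid_probs:
        for p in row:
            if 0.0 < p < 1.0:
                h -= p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)
    return h

def entropy_triggers_replan(old_map, new_map, threshold):
    """Replanning criterion (1): the absolute entropy change between
    two map snapshots exceeds the preset threshold."""
    return abs(map_entropy(new_map) - map_entropy(old_map)) > threshold
```

A fully unknown cell (p = 0.5) contributes one bit, so a freshly initialized blank map has maximal entropy, which then falls as exploration resolves cells to free or occupied.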
(2) The state of the exploration point changes, i.e. the old exploration point may be occupied by a dynamic obstacle for a long time, and the exploration point should be re-planned at this time.
(3) The robot reaches the exploration point P_exploration; at this time, the next exploration point should be planned.
(4) The target point P_goal appears in a known area of the grid map. At this moment, the robot can drive directly to the target point without further exploration points.
As shown in fig. 5, a schematic flow chart of a layered navigation obstacle avoidance method in a specific application scenario provided by the embodiment of the present disclosure is shown.
As shown in fig. 5, in a specific application scenario, the layered navigation obstacle avoidance method based on heuristic exploration is applied to a server, and includes the following steps:
S501: training a learning obstacle avoidance model for realizing reactive reinforcement learning obstacle avoidance of the robot, based on a model constructed with the deep reinforcement learning PPO algorithm, to obtain the lower-layer reinforcement learning obstacle avoidance model.
In the embodiment of the application, an obstacle avoidance model is trained on a simulator based on a reinforcement learning method, and the specific training process is as follows:
1) Designing multiple training scenes in the Stage simulator:
First, a robot model and obstacle models are established in the Stage simulator, and several training and test scenes are designed; the physical attributes of the robot, such as mass, speed, acceleration and rigidity, are set according to the configuration of the real robot. Second, a 2D laser radar is configured for the robot in the simulator, and the radar's parameters, such as measurement accuracy, angular resolution, measurement range and scanning frequency, are simulated according to the performance indices of real-world consumer-grade laser radars. Finally, several static obstacle scenes and dynamic obstacles are set up: the obstacles differ in size and shape, the maximum moving speed of the obstacles in the environment is at least twice the maximum speed of the robot, and the obstacles walk randomly in the environment with a walking speed sampled randomly from a speed range.
2) Building a PPO-based multi-process reinforcement learning training framework, which comprises a critic network and an actor network:
First, a deep reinforcement learning model is built, implemented with the PPO algorithm and divided into an actor model and a critic model. As shown in fig. 2, both the actor and the critic model of the reinforcement learning algorithm are built with deep neural networks; a buffer storing history tuples is established, the critic model outputs a value from which the advantage value is calculated, and the actor model computes the action output.
3) Training a reinforcement learning algorithm to realize short-distance obstacle avoidance navigation:
first, four types of reward functions Rg, Rc, Rt, Rr are set, where: targeted reward RgThe closer each step is to the target, the larger the reward is, the calculation method is as follows:
wg·(dt-dt-1) Wg is a positive weighting coefficient set in advance, representing the difference between the Euclidean distance between the time t and the target compared with the time t-1.
Collision penalty RcIn a simulation process, when the trolley collides with an obstacle, a certain punishment is given, and the calculation is as follows: w is ac·15,WcIs a preset weighting coefficient less than 0.
Reach reward RrIn the simulation process, the vehicle is rewarded when reaching the terminal point, and the calculation is as follows: w is ar20, Wr is also a preset positive weight coefficient.
Temporal penalty Rt: on behalf of each simulation, a certain reward or penalty is given as long as the car is still alive, calculated as follows: w is at0.01, Wt is also the weight coefficient less than 0 that achieves the preset. This time penalty drives the robot to quickly reach the endpoint.
The reward function R (t) Rg + Rc + Rt + Rr in each simulation process
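A sketch of the combined step reward under the signs and constants stated above; the default weight values are illustrative placeholders, not values fixed by the text:

```python
def step_reward(d_prev, d_curr, collided, reached,
                w_g=2.5, w_c=-1.0, w_r=1.0, w_t=-1.0):
    """One-step reward R(t) = R_g + R_c + R_t + R_r.
    The constants 15, 20 and 0.01 and the signs of the weights follow
    the text; the default weight magnitudes are placeholders."""
    r_g = w_g * (d_prev - d_curr)        # goal reward: positive when approaching
    r_c = w_c * 15 if collided else 0.0  # collision penalty (w_c < 0)
    r_r = w_r * 20 if reached else 0.0   # arrival reward (w_r > 0)
    r_t = w_t * 0.01                     # per-step time penalty (w_t < 0)
    return r_g + r_c + r_r + r_t
```

For example, moving one meter closer to the target yields a positive reward slightly reduced by the time penalty, while a collision dominates the step with a large negative value.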
Second, multi-process training is realized in the simulator. N robots are placed in the training scenes, and at each simulation step the observation states O(t) and rewards R(t) of all robots are collected and sent to the buffer in the reinforcement learning framework for storage. From the N observation states, the PPO algorithm in the reinforcement learning framework calculates N action outputs A(t) and returns them to the N robots; each robot executes its action, collects its O(t+1) and R(t+1) at the new moment, and the cycle continues. When the number of samples collected in the framework's buffer exceeds a certain threshold, the actor and critic network parameters are updated. In addition, when a robot collides in the simulator or reaches the maximum time step, the simulator sends a DONE instruction to the reinforcement learning framework to indicate that the episode is finished, and the framework correspondingly sends a RESET instruction to restart that robot. The whole parallel training environment runs in a synchronous mode, and the scenes include at least three types: static scenes, dynamic scenes and open scenes.
Finally, under the guidance of the four types of reward functions and through multiple rounds of training, the reinforcement learning algorithm learns a flexible short-distance obstacle avoidance strategy.
S502: constructing the value evaluation network of the lower-layer reinforcement learning obstacle avoidance model, and calculating the safety heuristic factor through the value evaluation network according to the current state information of the robot and the distribution of surrounding obstacles in the robot's environment.
S503: the map building submodule of the upper module of the layered navigation obstacle avoidance system builds a grid map and calculates the current position of the robot in real time, based on the robot's environment, in a preset manner based on laser 2D SLAM.
In the layered navigation obstacle avoidance method provided in the embodiment of the present application, the map building submodule constructs the environment map as follows:
1) First, according to the relative distance and azimuth between the target point and the robot's starting point, a circular blank map is initialized, with the robot's starting position as the center and 1.3 times the distance from the robot to the target as the radius; the target point is thus located near the edge area of the map. The states of all points in the map are unknown, and at this moment the robot knows only the coordinates of the target position and of its own position. The Cartographer laser SLAM algorithm runs in the mapping submodule to map the environment in real time and update it, obtaining a two-dimensional occupancy grid map. Each pixel value in the grid map is set between 0 and 1, representing the occupancy probability of that grid area, as follows: a point occupied by an obstacle is set greater than 0.52, a confirmed known idle point is set less than 0.48, and unknown state points lie between 0.48 and 0.52. According to the latest grid map, a boundary point is defined as a point on the boundary between known idle points and unknown state points, and is recorded as P_boundary.
2) The mapping submodule is responsible for incremental mapping and positioning: at the beginning, in a completely unknown environment, the robot knows only its own position coordinates and the target coordinates and has no information about the environment, i.e. the real environment map is continuously constructed on top of the initialized blank map. Through continuous exploration, the mapping submodule incrementally maps new areas, and synchronously and accurately estimates the robot's position P, attitude Pose and real-time speed V through the Cartographer algorithm, the angular speed and acceleration data of the inertial device (IMU), and wheel odometer information.
S504: and an exploration submodule of an upper layer module of the layered navigation obstacle avoidance system calculates a target heuristic factor according to a preset target and the current position of the robot.
S505: the exploration submodule of the upper module of the layered navigation obstacle avoidance system selects the optimal exploration point according to the target heuristic factor, the safety heuristic factor and the current latest map.
In the embodiment of the present application, a specific process for selecting the best candidate point as the exploration point based on the multi-factor heuristic exploration mechanism is as follows:
1) preprocessing boundary points as candidate points by screening and clustering:
First, all boundary points P_boundary are selected; a boundary point is an intersection of the known obstacle-free area and the unknown area. Then the sequence of all boundary points P_boundary is filtered to eliminate false boundaries and extremely narrow boundaries caused by algorithm noise, which arise from large drift or missing points of the map during mapping. The filtering applies morphological processing to the point map, performing dilation first and then erosion; after the smaller noise points are eliminated, a more reliable boundary point set is obtained.
Second, to accelerate the calculation, all boundary points are clustered; the number of clusters N is a hyper-parameter set in advance according to the map size and the platform's computing capacity. After the N cluster centers are obtained, K points are randomly selected from the points belonging to each cluster center, following the principle that their relative distances should be as large as possible, i.e. the K points are distributed around the cluster center as scattered as possible. Finally, N × K boundary points are obtained as candidate boundary points.
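The "as scattered as possible" selection within one cluster can be approximated with greedy farthest-point sampling; the text samples randomly, so this deterministic variant is an illustrative substitute, and the function name is an assumption:

```python
import math

def spread_k_points(points, k):
    """Greedy farthest-point sampling: pick up to k points from one
    cluster so that each new point maximizes its distance to the
    nearest already-chosen point, approximating the 'as scattered as
    possible' rule described in the text."""
    if not points or k <= 0:
        return []
    chosen = [points[0]]
    while len(chosen) < min(k, len(points)):
        best, best_d = None, -1.0
        for p in points:
            if p in chosen:
                continue
            d = min(math.dist(p, c) for c in chosen)  # distance to nearest chosen
            if d > best_d:
                best, best_d = p, d
        chosen.append(best)
    return chosen
```

Running this once per cluster yields the N × K candidate boundary points.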
2) Calculating the target heuristic of all candidate points:
The exploration submodule calculates, according to the real-time latest map from the mapping module, the current position and posture of the robot, and the preset final target point, the target heuristic factor of each candidate point i as D_h^i = D_goal^i + D_path^i, where D_goal^i represents the Euclidean distance from the candidate point to the target point and D_path^i represents the path length from the current point to the candidate point; the factor thus measures both the distance between the candidate point and the target and how easy the point is to reach. D_path^i is calculated with the A* path planning algorithm: since the area from the current point to the candidate points is fully known on the map, a feasible path can be obtained by A*, and its length is D_path^i.
3) Calculating the safety elicitors of all candidate points:
The lower-layer reinforcement learning obstacle avoidance module calculates the safety heuristic factor V_h^i corresponding to each candidate point i as follows: V_h^i is computed by the value evaluation network (critic model) of the lower-layer module's reinforcement learning obstacle avoidance model. The candidate point is set as the target point O_g, and together with the current observation O_L, the output value of the critic network is computed as the candidate point's V_h^i. V_h^i measures the safety degree of the candidate point: the denser the obstacles between the robot and the candidate point, the smaller the V_h^i value, and the more dangerous the candidate point.
4) Selecting the best candidate point as an exploration point:
As shown in fig. 2, the layered navigation obstacle avoidance method provided by the embodiment of the present disclosure designs two types of heuristic factors, and the selection mechanism of the exploration point is as follows. The values of the two types of heuristic factors are calculated for all candidate points, normalized, and then combined; the coefficient γ is a weighting coefficient between the two terms that balances their contributions, and the denominator of each term normalizes it. The heuristic function is thus computed from the two heuristic factors, and the algorithm finally selects the boundary point with the minimum heuristic function value, i.e. the most suitable boundary point is selected according to the heuristic function as the next exploration point P_exploration.
The coefficient γ is set to 0.6, and the candidate boundary point with the minimum heuristic value is selected as the next exploration point P_exploration. Under the action of the two types of heuristic factors, the robot selects a closer, more easily reached and safer area to explore next. As shown in fig. 4, the robot selects the right branch edge area as the exploration point in the left-image test scene, and the left branch edge area in the right-image test scene, i.e. in both cases it selects the safer area as the next exploration point. The invention introduces a multi-factor heuristic exploration mechanism combined with the characteristics of the critic network of the lower-layer module; the upper and lower layers are tightly coupled so that the whole system is safer and more reliable.
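The selection step can be sketched as follows. The exact combination formula is not legible in the text, so this sketch assumes each factor is divided by its maximum absolute value and the safety term enters with a negative sign (a larger V_h means safer, and the candidate with the minimum combined value is chosen); the function name and these details are illustrative assumptions:

```python
def pick_exploration_point(candidates, d_h, v_h, gamma=0.6):
    """Combine the target heuristic d_h and safety heuristic v_h for
    each candidate and return the candidate with the minimum combined
    heuristic. Both factors are max-normalized; the safety term is
    subtracted so that safer (larger v_h) candidates score lower.
    gamma=0.6 follows the weighting stated in the text."""
    d_max = max(abs(d) for d in d_h) or 1.0  # avoid division by zero
    v_max = max(abs(v) for v in v_h) or 1.0

    def h(i):
        return gamma * d_h[i] / d_max - (1.0 - gamma) * v_h[i] / v_max

    best = min(range(len(candidates)), key=h)
    return candidates[best]
```

With equal target heuristics, the candidate whose critic value indicates a safer approach wins, matching the behavior described for fig. 4.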
S506: and a path planning submodule of an upper layer module of the layered navigation obstacle avoidance system calculates a plurality of sparse midway points according to the selected optimal exploration points, and sequentially sends each midway point to a lower layer obstacle avoidance module to serve as a temporary target point, so that the navigation robot sequentially runs to each midway point.
After calculating a plurality of sparse midway points according to the selected optimal exploration point, the layered navigation obstacle avoidance method provided by the embodiment of the present disclosure further comprises: calculating the sparse path points in sequence and detecting whether the exploration point needs to be re-planned. The specific process is as follows:
1) planning a shortest path based on the exploration point and the current point:
After the exploration point P_exploration is found, as shown in fig. 3, it is necessary to determine whether the exploration point can be reached directly at this moment, i.e. whether there is no obstacle on the straight line connecting the exploration point and the current point. If it can be reached directly, the midway point P_waypoint is set as the exploration point P_exploration.
If it cannot be reached in a direct straight line, the A* algorithm is first used to plan the shortest path from the current point to the exploration point. The shortest path is a series of consecutive dense points.
2) Selecting a number of sparse midway points based on the planned path point sequence:
The point set of the planned shortest path is processed, and the farthest point that can be directly reached from the current point is selected as the midway point P_waypoint. The midway point is then sent to the lower-layer reinforcement learning obstacle avoidance network as a temporary target point, and the lower-layer network is responsible for driving the robot to the midway point while avoiding moving obstacles. After the midway point is reached, the next midway point P_waypoint is calculated, until the exploration point is finally reached. After the exploration point is reached, the exploration point selection steps are repeated.
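A minimal sketch of the sparse midway-point selection in step 2), assuming the dense A* path is ordered from the current point toward the exploration point and that a reachability predicate like the one in step 1) is available. All names are illustrative.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int]

def next_waypoint(path: List[Point], current: Point,
                  reachable: Callable[[Point, Point], bool]) -> Point:
    """Farthest point on `path` with an unobstructed straight line from `current`."""
    waypoint = path[0]
    for p in path:
        if reachable(current, p):
            waypoint = p
        else:
            break  # points beyond the first blocked one are farther along; stop
    return waypoint

# toy example: pretend everything within Manhattan distance 3 is reachable
dense_path = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
manhattan_ok = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1]) <= 3
wp = next_waypoint(dense_path, (0, 0), manhattan_ok)  # wp == (3, 0)
```

Repeating this call each time a midway point is reached yields the sparse sequence that the upper module sends to the lower obstacle avoidance module.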
3) Detecting whether an exploration point needs to be re-planned at any moment:
While the robot drives towards the midway points one by one, it is necessary to detect whether the exploration point needs to be re-planned; if a trigger condition is met, the exploration point is re-planned and the exploration point selection step is repeated.
S507: determining whether a new optimal exploration point needs to be recalculated according to the real-time state information of the robot, the latest grid map monitoring data, and preset conditions for triggering recalculation of the optimal exploration point; if a new optimal exploration point needs to be recalculated, repeating steps S501 to S506 until a plurality of sparse new midway points corresponding to the new optimal exploration point are calculated, and sequentially sending each new midway point to the lower layer obstacle avoidance module to serve as a new temporary target point, so that the navigation robot drives to each new midway point in sequence.
As shown in fig. 5, the lower-layer model of the lower obstacle avoidance module calculates the safety heuristic factor V_h and sends it to the exploration submodule of the upper module; the exploration submodule calculates the distance heuristic factor D_h and, based on the two types of heuristic factors and the current latest map, selects a suitable exploration point P_exploration from the map edge candidate points as the next target.
The upper layer module selects a plurality of sparse midway points according to the exploration point and finally transmits the selected midway points to the lower layer obstacle avoidance module in sequence. The lower layer obstacle avoidance module takes each midway point in turn as a temporary target point, drives towards that target, and avoids obstacles at the same time.
For the detailed description of the above steps, refer to the description of the same or similar parts, which is not repeated herein.
In the embodiment of the disclosure, the layered navigation obstacle avoidance method calculates a plurality of midway points according to the optimal exploration point, sequentially sends the midway points to the lower layer obstacle avoidance module as corresponding temporary target points, and sets the temporary target points as target points of the lower layer reinforcement learning obstacle avoidance model; the environmental data of the robot is then input into the navigation obstacle avoidance model, a navigation instruction is output, and the navigation instruction is sent to the robot. Therefore, by adopting the embodiment of the application, the optimal exploration point can be accurately determined, so that the robot can accurately navigate and avoid obstacles.
As shown in fig. 6, an embodiment of the present disclosure provides a layered navigation obstacle avoidance method based on heuristic exploration, where the layered navigation obstacle avoidance method is applied to a robot, and the layered navigation obstacle avoidance method specifically includes the following method steps:
S601: receiving a plurality of midway points sequentially sent by the layered navigation obstacle avoidance system and taking the midway points as corresponding temporary target points;
S602: sequentially driving to the corresponding midway points according to the temporary target points until the optimal exploration point is reached.
In the embodiment of the disclosure, the robot receives a plurality of midway points sequentially sent by the layered navigation obstacle avoidance system, takes them as corresponding temporary target points, and drives to the corresponding midway points in sequence until the optimal exploration point is reached. Because the layered navigation obstacle avoidance system accurately calculates the plurality of sparse midway points corresponding to the selected optimal exploration point, the navigation obstacle avoidance method enables the robot to navigate autonomously in a highly dynamic, complex, map-free scene.
The following is an embodiment of the layered navigation obstacle avoidance apparatus of the present invention, which can be used to implement the embodiment of the layered navigation obstacle avoidance method of the present invention. For details that are not disclosed in the embodiment of the device for layered navigation and obstacle avoidance of the present invention, please refer to the embodiment of the method for layered navigation and obstacle avoidance of the present invention.
Fig. 7 is a schematic structural diagram of a layered navigation obstacle avoidance apparatus according to an exemplary embodiment of the present invention. The layered navigation obstacle avoidance apparatus is applied to a server and comprises a safety heuristic factor calculation module 701, a target heuristic factor calculation module 702, an optimal exploration point calculation module 703, a target point determination module 704, a processing module 705, and a sending module 706.
Specifically, the safety heuristic factor calculation module 701 is configured to calculate a safety heuristic factor of a candidate point through a value evaluation network of a lower-layer reinforcement learning obstacle avoidance model according to current state information of the robot and peripheral obstacle distribution information of the environment where the robot is located;
a target heuristic factor calculation module 702, configured to calculate a target heuristic factor of a candidate point according to a current map constructed in real time in an unfamiliar map-free environment, a preset target, and a current position of the robot;
an optimal exploration point calculation module 703, configured to calculate the candidate point with the minimum heuristic value according to the safety heuristic factor calculated by the safety heuristic factor calculation module 701 and the target heuristic factor calculated by the target heuristic factor calculation module 702, and use the candidate point with the minimum heuristic value as the optimal exploration point;
a target point determination module 704, configured to calculate a plurality of midway points according to the optimal exploration point determined by the optimal exploration point calculation module 703, sequentially send the midway points to the lower obstacle avoidance module as corresponding temporary target points, and set the temporary target points as target points of the lower reinforcement learning obstacle avoidance model;
the processing module 705 is used for inputting environmental data of the robot into the navigation obstacle avoidance model and outputting a navigation instruction;
a sending module 706, configured to send the navigation instruction obtained by the processing module 705 to the robot.
Optionally, the apparatus further comprises:
a building module (not shown in fig. 7) for building a value evaluation network and an action strategy network of the lower-layer reinforcement learning obstacle avoidance model;
and a training module for training, based on the preset deep reinforcement learning model, an obstacle avoidance model for realizing reactive reinforcement learning obstacle avoidance of the robot, so as to obtain the lower-layer reinforcement learning obstacle avoidance model.
Optionally, the apparatus further comprises:
and a construction and calculation module (not shown in fig. 7) for constructing a grid map and calculating the current position of the robot in real time by an incremental mapping method in an unfamiliar map-free environment.
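The incremental mapping method is not detailed in the text; as a hedged illustration, one common choice is a log-odds occupancy-grid update, sketched below with an assumed inverse sensor model.

```python
import math

# inverse sensor model (assumed values): hit and miss log-odds increments
L_OCC = math.log(0.7 / 0.3)
L_FREE = math.log(0.3 / 0.7)

def update_cell(log_odds: float, hit: bool) -> float:
    """Fold one sensor observation into a cell's log-odds occupancy."""
    return log_odds + (L_OCC if hit else L_FREE)

def probability(log_odds: float) -> float:
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))

l = 0.0                # unknown cell: p = 0.5
for _ in range(3):     # three consecutive "hit" measurements
    l = update_cell(l, True)
p = probability(l)     # p rises above 0.9 after three hits
```

The log-odds form makes each incremental update a single addition, which is why it is the usual representation for grid maps built online.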
Optionally, the apparatus further comprises:
an acquisition module (not shown in fig. 7) for acquiring preset conditions for reselecting a new optimal exploration point;
the processing module 705 is configured to determine whether the current condition meets the preset condition obtained by the acquisition module, and to reselect a new optimal exploration point when the current condition meets the preset condition.
Optionally, the preset conditions include:
in the exploration process of the robot, the entropy change of the exploration map is larger than a preset entropy change threshold; or, it is detected that the state of the current exploration point has changed; or, it is detected that the robot has reached the optimal exploration point; or, it is detected that the target point lies in a known area.
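The first condition can be made concrete by measuring the Shannon entropy of the occupancy grid: as unknown cells are resolved, total entropy falls, and a sufficiently large change triggers re-planning. The threshold value and the per-cell probability encoding below are assumptions.

```python
import math
from typing import List

def map_entropy(grid: List[List[float]]) -> float:
    """Total Shannon entropy of an occupancy grid; each cell holds P(occupied)."""
    h = 0.0
    for row in grid:
        for p in row:
            if 0.0 < p < 1.0:  # fully known cells (p = 0 or 1) contribute nothing
                h -= p * math.log2(p) + (1 - p) * math.log2(1 - p)
    return h

ENTROPY_DELTA_THRESHOLD = 0.5  # assumed value; the text does not specify it

def should_replan(prev_entropy: float, grid: List[List[float]]) -> bool:
    """True if the entropy change since the last check exceeds the threshold."""
    return abs(prev_entropy - map_entropy(grid)) > ENTROPY_DELTA_THRESHOLD

unknown = [[0.5, 0.5], [0.5, 0.5]]  # fully unknown map: 4 bits of entropy
partly = [[0.0, 0.5], [1.0, 0.5]]   # two cells resolved: 2 bits of entropy
```

With these toy maps, resolving two cells changes the entropy by 2 bits, well above the assumed threshold, so re-planning would be triggered.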
Fig. 8 is a schematic structural diagram of a layered navigation obstacle avoidance apparatus according to an exemplary embodiment of the present invention. The layered navigation obstacle avoidance device is applied to a robot and comprises a receiving module 801 and a driving module 802.
Specifically, the receiving module 801 is configured to receive a plurality of midway points sequentially sent by the layered navigation obstacle avoidance system and to take the midway points as corresponding temporary target points;
and a driving module 802, configured to drive to the corresponding midway points in sequence according to the temporary target points determined by the receiving module 801 until the optimal exploration point is reached.
It should be noted that, when the layered navigation obstacle avoidance apparatus provided in the foregoing embodiment executes the layered navigation obstacle avoidance method, only the division of the functional modules is taken as an example, and in practical applications, the function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the layered navigation obstacle avoidance device and the layered navigation obstacle avoidance method provided by the above embodiments belong to the same concept, and the implementation process is detailed in the layered navigation obstacle avoidance method embodiment, which is not described herein again.
In the embodiment of the disclosure, the receiving module is configured to receive a plurality of midway points sequentially sent by the layered navigation obstacle avoidance system and take them as corresponding temporary target points; the driving module is configured to drive to the corresponding midway points in sequence according to the temporary target points determined by the receiving module until the optimal exploration point is reached. Because the layered navigation obstacle avoidance system accurately calculates the plurality of sparse midway points corresponding to the selected optimal exploration point, the navigation obstacle avoidance apparatus enables the robot to navigate autonomously in a highly dynamic, complex, map-free scene.
In one embodiment, a computer device is provided, the computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program: calculating a safety heuristic factor of a candidate point through a value evaluation network of a lower-layer reinforcement learning obstacle avoidance model according to current state information of the robot and peripheral obstacle distribution information of the environment where the robot is located; calculating a target heuristic factor of the candidate point according to a current map constructed in real time in an unfamiliar map-free environment, a preset target, and the current position of the robot; calculating the candidate point with the minimum heuristic value according to the safety heuristic factor and the target heuristic factor, and taking the candidate point with the minimum heuristic value as the optimal exploration point; calculating a plurality of midway points according to the optimal exploration point, sequentially sending the midway points to the lower layer obstacle avoidance module to serve as corresponding temporary target points, and setting the temporary target points as target points of the lower layer reinforcement learning obstacle avoidance model; and inputting the environmental data of the robot into the navigation obstacle avoidance model, outputting a navigation instruction, and sending the navigation instruction to the robot.
In one embodiment, a storage medium is provided that stores computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of: calculating a safety heuristic factor of a candidate point through a value evaluation network of a lower-layer reinforcement learning obstacle avoidance model according to current state information of the robot and peripheral obstacle distribution information of the environment where the robot is located; calculating a target heuristic factor of the candidate point according to a current map constructed in real time in an unfamiliar map-free environment, a preset target, and the current position of the robot; calculating the candidate point with the minimum heuristic value according to the safety heuristic factor and the target heuristic factor, and taking the candidate point with the minimum heuristic value as the optimal exploration point; calculating a plurality of midway points according to the optimal exploration point, sequentially sending the midway points to the lower layer obstacle avoidance module to serve as corresponding temporary target points, and setting the temporary target points as target points of the lower layer reinforcement learning obstacle avoidance model; and inputting the environmental data of the robot into the navigation obstacle avoidance model, outputting a navigation instruction, and sending the navigation instruction to the robot.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between them, such combinations should be considered within the scope of the present specification.
The above examples only show some embodiments of the present invention, which are described in a relatively specific and detailed manner, but this should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.