CN114563011A - Active auditory localization method for map-free navigation - Google Patents

Active auditory localization method for map-free navigation

Info

Publication number
CN114563011A
Authority
CN
China
Prior art keywords
information
mobile robot
target
auditory
navigation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210079214.7A
Other languages
Chinese (zh)
Inventor
罗定生
吴玺宏
方帅
张佳男
林惟凯
刘天林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN202210079214.7A priority Critical patent/CN114563011A/en
Publication of CN114563011A publication Critical patent/CN114563011A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34Route searching; Route guidance
    • G01C21/3407Route searching; Route guidance specially adapted for specific applications
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34Route searching; Route guidance
    • G01C21/3446Details of route searching algorithms, e.g. Dijkstra, A*, arc-flags, using precalculated routes
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C22/00Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers, using pedometers
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/02Systems using the reflection of electromagnetic waves other than radio waves
    • G01S17/06Systems determining position data of a target
    • G01S17/08Systems determining position data of a target for measuring distance only
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88Lidar systems specially adapted for specific applications
    • G01S17/93Lidar systems specially adapted for specific applications for anti-collision purposes
    • G01S17/931Lidar systems specially adapted for specific applications for anti-collision purposes of land vehicles
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/186Determination of attitude
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/24Position of single direction-finder fixed by determining direction of a plurality of spaced sources of known location

Abstract

The invention discloses an active auditory localization method for map-free navigation, comprising the following steps: 1) training a mobile robot navigation model on a simulation platform by a reinforcement learning method; 2) at each set time step, the mobile robot collects the ranging information of the laser radar at the current moment, the auditory orientation information derived from a sound source at the target position, and the pose information of the mobile robot odometer; the laser radar is carried on the mobile robot; 3) inputting the ranging information, the auditory orientation information and the pose information into the mobile robot navigation model trained in step 1) to infer the velocity command for the current moment, and navigating the mobile robot to the target position according to that command. The invention adopts a more reliable and effective target localization mode and has high application value for map-free navigation in real scenes.

Description

Active auditory localization method for map-free navigation
Technical Field
The invention belongs to the field of information science, relates to an auditory localization method, and particularly relates to an active auditory localization method for map-free navigation of a mobile robot.
Background
To date, robots have created great value in fields such as industrial manufacturing, home service, interplanetary exploration, and military reconnaissance. Compared with vision-based mobile robot navigation methods, robot navigation based on auditory perception has advantages in privacy protection. Furthermore, when the target is outside the robot's field of view or occluded by an obstacle, auditory localization can provide additional information to help the robot determine the target. Autonomous navigation means that the mobile robot senses the external environment through sensors and, combining this with its own state, completes the motion of reaching a target point without collision. A mobile robot with flexible, efficient and robust navigation capability can be better applied in industry, the service sector, the military, and other areas.
Robot navigation techniques can be divided into two categories: map-dependent and map-independent. Map-dependent navigation requires the robot to map the environment as accurately as possible before navigating. The disadvantages of this approach are that constructing the map takes a long time and that the map must be accurate enough to localize the robot during navigation. Map-independent navigation is also called map-free navigation; traditional algorithms include the dynamic window approach, the D* algorithm, and the vector field histogram algorithm. With the rise of deep learning, learning-based methods have gradually become a popular research direction for map-free navigation; the main approach is to model the robot's navigation process with reinforcement learning and imitation learning. However, when applying a learned navigation strategy in a real environment, an unavoidable problem is how to determine the relative position of the target. Prior work shows that approaches based on Wi-Fi positioning and visible light communication are low-cost, but they require Wi-Fi hotspots or LED lamps, a corresponding receiver attached to the target, and an indoor environment. Vision-based target localization is highly flexible and can handle various targets according to their semantic types; however, it suffers from obstacle occlusion and a limited field of view, and its real-time performance is poor.
To our knowledge, approaches based on active auditory localization have not yet been introduced into research on map-free robot navigation. Auditory localization can overcome obstacle occlusion, requires no signal receiver on the target, can be applied in outdoor environments, and can also complement vision-based localization.
Disclosure of Invention
The invention aims to provide an active auditory localization method applied to map-free navigation. The robot's navigation model is trained with a reinforcement-learning-based navigation strategy, and a continuously converging relative target position is obtained during actual navigation through active auditory localization, yielding a more accurate and robust navigation model.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
an active auditory localization method for map-less navigation, comprising the steps of:
1) training a mobile robot navigation model on a simulation platform by a reinforcement learning method;
2) at each set time step, the mobile robot collects the ranging information of the laser radar at the current moment, the auditory orientation information derived from a sound source at the target position, and the pose information of the mobile robot odometer; the laser radar is carried on the mobile robot;
3) inputting the ranging information, the auditory orientation information and the pose information into the mobile robot navigation model trained in step 1) to infer the velocity command for the current moment, and navigating the mobile robot to the target position according to that command.
Further, the mobile robot navigation model comprises an Actor network and a Critic network. The Actor network outputs the action that maximizes return given the observed state, where the state comprises the ranging information, the auditory orientation information and the pose information, and the action is the linear velocity and angular velocity of the mobile robot. The Critic network outputs the value of the <state, action> pair given the action information output by the Actor network and the observation information of the current state.
Further, the navigation model of the mobile robot is trained by reinforcement learning as follows: first, different simulation environments are set up and several obstacles and target points are randomly placed in them; then the mobile robot is encouraged to reach the target points through a designed reward formula.
Further, the reward calculation formula is r(s_t, a_t, s_{t+1}) = α1·dis(p_t, p_{t+1}) + α2·(dis(p_t, p_target) − dis(p_{t+1}, p_target)) + α3 × success + α4 × collision; where dis(p_t, p_{t+1}) is the displacement between the position point p_t at time t and the position point p_{t+1} at time t+1; (dis(p_t, p_target) − dis(p_{t+1}, p_target)) measures how much closer the position p_{t+1} at time t+1 is to the target point p_target than the position p_t at time t; success represents successful arrival at the target point, and collision represents that a collision has occurred. The coefficients α1, α2, α3 are all positive numbers and the coefficient α4 is a negative number; s_t is the state at time t, a_t is the action at time t (control of linear and angular velocity), and s_{t+1} is the state at time t+1.
Further, the auditory orientation information includes a 2-dimensional direction vector; the pose information comprises 2-dimensional position information and 1-dimensional angle information; the target location comprises 2-dimensional target location information; the speed instructions include a linear speed and an angular speed.
Further, whether a collision occurs is determined according to the ranging information of the laser radar: if the minimum value in the ranging information is smaller than a set threshold, the mobile robot is considered to have collided with an obstacle.
Further, randomizing parameters of the obstacle includes: the shape and type of the obstacle, the position of the obstacle, and the size of the obstacle.
Further, auditory directional information is acquired based on an active auditory localization method.
A server, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the above method.
A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program realizes the steps of the above-mentioned method when executed by a processor.
The method establishes a mobile robot navigation model based on reinforcement learning. The inputs of the model are the ranging information of the laser radar carried on the mobile robot, the auditory orientation information derived from a sound source at the target position, and the pose information of the mobile robot odometer; the output is the velocity command the mobile robot must execute. Training of the model comprises training on a simulation platform and training in the actual environment. After training is completed, the mobile robot collects, at a fixed time step, the current ranging information of the laser radar, the auditory orientation information, and the pose information of the odometer, feeds this information to the model to infer the velocity command to execute, and finally navigates to the target position.
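The per-time-step pipeline above can be sketched as a simple control loop. This is an illustrative sketch, not the patent's implementation: `policy`, `read_lidar`, `read_doa`, `read_odometry`, and `send_velocity` are hypothetical placeholder names for the trained model and the robot's sensor and actuator interfaces.

```python
import numpy as np

def navigation_step(policy, read_lidar, read_doa, read_odometry, send_velocity):
    """Collect one observation, infer a velocity command, and execute it."""
    ranges = read_lidar()          # downsampled laser ranging information
    doa = read_doa()               # 2-dim unit vector toward the sound source
    pose = read_odometry()         # (x, y, theta) pose from the odometer
    obs = np.concatenate([ranges, doa, pose])
    v, w = policy(obs)             # linear and angular velocity command
    send_velocity(v, w)
    return v, w
```

In use, this function would be called once per set time step until the target is reached or a collision is detected.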
Further, the robot navigation model is trained through reinforcement learning in simulation as follows. Different simulation environments are built according to various real indoor layouts, including rectangular environments of 10m × 10m, 10m × 5m and 5m × 5m, and circular areas with radii of 5m and 10m. Obstacles with different parameters and shapes are placed at different locations in these environments by a random algorithm, with 20 obstacles per environment. The parameters to be randomized include the shape, position, and size of each obstacle. A target point outside the obstacles is randomly selected in the environment each episode; the robot reaches the target point by exploring the environment, and a reward formula is designed to encourage the mobile robot to reach the target point without collision.
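The obstacle randomization described above might look like the following sketch. The shape names, size range, and uniform sampling are illustrative assumptions; the patent only states that shape, position, and size are randomized.

```python
import random

def randomize_obstacles(n=20, env_w=10.0, env_h=10.0, seed=None):
    """Sample n obstacles with random shape, position, and size
    (parameter ranges here are illustrative assumptions)."""
    rng = random.Random(seed)
    shapes = ["box", "cylinder", "sphere"]   # hypothetical shape set
    obstacles = []
    for _ in range(n):
        obstacles.append({
            "shape": rng.choice(shapes),
            "x": rng.uniform(0.0, env_w),
            "y": rng.uniform(0.0, env_h),
            "size": rng.uniform(0.2, 1.0),   # assumed size range in meters
        })
    return obstacles
```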
The immediate reward function of the present invention is calculated by the formula:
r(s_t, a_t, s_{t+1}) = α1·dis(p_t, p_{t+1}) + α2·(dis(p_t, p_target) − dis(p_{t+1}, p_target)) + α3 × success + α4 × collision;
where s_t and s_{t+1} denote the states at time t and time t+1, and include the robot's pose information, sensor input, and velocity information. The immediate reward contains four terms: the first is the displacement dis(p_t, p_{t+1}) between the position point p_t at time t and the position point p_{t+1} at time t+1; the second, (dis(p_t, p_target) − dis(p_{t+1}, p_target)), measures how much closer the position p_{t+1} at time t+1 is to the target point p_target than the position p_t at time t; the third indicates whether the target point has been successfully reached; and the fourth indicates whether a collision has occurred. The coefficients of the first three terms are positive numbers and the coefficient of the last term is a negative number. Data are collected asynchronously from the five environment shapes and stored in an experience pool, and the control model is trained with this reward signal by reinforcement learning.
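A minimal sketch of this reward computation follows. The coefficient values are illustrative assumptions — the patent only fixes their signs (α1, α2, α3 positive, α4 negative) — and `success`/`collision` are taken as 0/1 indicators.

```python
import math

def dis(p, q):
    """Euclidean distance between two 2-D position points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def reward(p_t, p_t1, p_target, success, collision,
           a1=1.0, a2=1.0, a3=10.0, a4=-10.0):
    """Immediate reward per the formula above; coefficient values
    are assumed, only their signs come from the patent."""
    return (a1 * dis(p_t, p_t1)                                 # displacement term
            + a2 * (dis(p_t, p_target) - dis(p_t1, p_target))   # approach-to-target term
            + a3 * success                                      # arrival bonus
            + a4 * collision)                                   # collision penalty (a4 < 0)
```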
Further, the laser radar information includes 360-dimensional ranging information; the auditory orientation information comprises a 2-dimensional direction vector of modulus 1; the odometer information includes 2-dimensional position information and 1-dimensional angle information; the target location comprises 2-dimensional target position information; the velocity command includes a linear velocity and an angular velocity. We adopt TD3, a state-of-the-art reinforcement learning algorithm and an improved version of DDPG: it adds target policy smoothing, delays policy network updates, and mitigates overestimation of the state-action value function, so its performance is greatly improved over DDPG. A training episode is considered finished when a collision occurs or the target is reached.
To reduce the difficulty of migrating from simulation to reality, the real indoor conditions and robot configuration are reproduced in the simulation environment as faithfully as possible. Because the complexity of a real environment is difficult to express in simulation, the ranging information is reduced in dimension during simulation training: 10 dimensions are uniformly selected from the 360-dimensional laser ranging signal, and likewise only this 10-dimensional lidar information is used in the real environment. Note also that training in simulation differs slightly from training in the real environment. For reward computation, the simulation environment provides the unbiased pose of the mobile robot, whereas on the physical robot the pose must be computed from odometry, which is biased by accumulated error; this is tolerable when operating in small indoor scenes. For collision detection, the simulation environment has complete information and can detect collisions by checking for intersection between the robot and obstacles; in a real scene, a collision distance threshold is used instead: if the minimum value of the 360-dimensional lidar ranging information is smaller than the threshold, a collision is considered to have occurred and the training episode is stopped.
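The uniform downsampling and threshold-based collision check described above can be sketched as follows; the threshold value of 0.25 m is an illustrative assumption, as the patent does not specify it.

```python
import numpy as np

def downsample_scan(scan, k=10):
    """Uniformly select k beams from a full lidar scan (e.g. 360 -> 10)."""
    idx = np.linspace(0, len(scan) - 1, k).astype(int)
    return scan[idx]

def collided(scan, threshold=0.25):
    """Declare a collision when the closest lidar return falls below a
    distance threshold (the 0.25 m value is an assumption)."""
    return float(np.min(scan)) < threshold
```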
Furthermore, an active auditory localization mode is adopted for target localization in the real environment. The method reduces the uncertainty of auditory localization by continuously estimating the direction of arrival (DOA) of the sound source and actively moving, in combination with odometry information. Because navigation is a process of continuously avoiding obstacles while approaching the sound source, and approaching amplifies the direct sound relative to the reflected sound, the DOA estimate becomes more and more accurate, and navigation in a real environment can finally be realized.
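One plausible realization of combining successive DOA estimates with odometry is bearing-only least-squares triangulation: each DOA measurement, rotated into the world frame using the odometry heading, defines a ray from the robot's position, and the source estimate minimizes the squared perpendicular distance to all rays. This is a sketch of the general idea, not the patent's exact algorithm.

```python
import numpy as np

def fuse_doa(poses, doas):
    """Least-squares triangulation of a static sound source from several
    bearing-only DOA measurements taken at known odometry poses.

    poses: list of (x, y, theta) odometry poses (theta in radians)
    doas:  list of DOA angles measured in the robot frame (radians)
    """
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for (x, y, theta), phi in zip(poses, doas):
        a = theta + phi                       # bearing in the world frame
        d = np.array([np.cos(a), np.sin(a)])
        N = np.eye(2) - np.outer(d, d)        # projector orthogonal to the ray
        A += N
        b += N @ np.array([x, y])
    return np.linalg.solve(A, b)              # estimated source position

```

As the robot moves, each new pose/DOA pair adds one more ray, so the estimate converges as the patent's "continuously convergent target relative position" describes.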
Compared with the prior art, the invention has the following positive effects:
the invention adopts a more reliable and effective target positioning mode in the mode of acquiring the environmental target information, and the sound generated by human activities is a clue worthy of utilization in the environment facing the service people. Meanwhile, the target positioning method based on active sound source positioning can generate good fusion effect with other positioning methods such as visual positioning. The method has higher application value for the map-free navigation of the real scene.
Drawings
FIG. 1 is a schematic diagram of active auditory navigation of a mobile robot;
FIG. 2 is a schematic view of a spherical microphone array orientation in different directions;
fig. 3 is a schematic diagram of the error results of active auditory localization.
Detailed Description
In order to realize collision-free navigation of a mobile robot in an actual unknown scene, the invention provides an active sound source localization technique for map-free navigation, and a target-oriented end-to-end navigation model for a robot platform trained through reinforcement learning. The model can learn a complex strategy: the robot selects its motion according to the environment information, including the raw 2D laser ranging result and the target position. Meanwhile, in order to apply a model trained in simulation to a real environment, active auditory localization is used to determine the relative position of the target. Fig. 1 shows how the target position is continuously refined by adjusting the robot's pose during navigation. Fig. 2 shows the measurement error in different directions when determining the target position with a spherical microphone array. To quantitatively evaluate the performance of active auditory localization, the localization accuracies of different modes are compared in fig. 3, which shows that the method based on active auditory localization is more accurate.
(1) Data acquisition: the technical scheme of the invention relies on a navigation data set, and since no open-source navigation data set currently exists, we must construct our own. Under the Gazebo simulation environment, different simulation environments are built according to various real indoor layouts.
(2) Constructing a model: as shown in fig. 3, we adopt the TD3 network structure, comprising an Actor network (policy network) and a Critic network (evaluation network). For this patent, the Actor input is the ranging information of the laser radar, the auditory orientation information, and the pose information of the mobile robot odometer; after processing by a neural network, the output is the linear velocity and angular velocity of the mobile robot (i.e., the action that maximizes cumulative return). The Critic network takes as input the action information output by the Actor network together with the observation of the current state, and outputs an evaluation of the value function (i.e., the cumulative return) of the <state, action> pair. For this patent, the Critic input thus comprises two parts: the linear and angular velocity output by the Actor network, and the observation of the state, including the ranging information of the laser radar, the auditory orientation information, and the pose information of the mobile robot odometer; the output is a scalar score.
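The Actor/Critic forward passes described above can be sketched with plain NumPy. This is a simplified illustration (TD3 actually maintains two critics and target networks); the layer sizes are assumptions, and the 15-dimensional observation follows the dimensions stated earlier (10-dim downsampled lidar + 2-dim DOA vector + 3-dim odometry pose).

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(sizes):
    """Random weights for a small MLP (layer sizes are illustrative)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x, out_act=None):
    """Forward pass with ReLU hidden layers and optional output activation."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return out_act(x) if out_act else x

# Observation: 10-dim lidar + 2-dim DOA + 3-dim odometry pose = 15 dims
actor = mlp_init([15, 64, 64, 2])        # outputs (linear, angular) velocity
critic = mlp_init([15 + 2, 64, 64, 1])   # scores a <state, action> pair

obs = rng.standard_normal(15)
action = mlp(actor, obs, out_act=np.tanh)             # bounded velocity command
q_value = mlp(critic, np.concatenate([obs, action]))  # Critic's scalar score
```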
(3) Migration of the simulation model to the physical environment: the method adopts an HOA coding mode to determine the direction of the sound source target, and then continuously refines the position of the sound source target through active movement of the robot. The navigation strategy learned by reinforcement learning yields continuous action commands from the sound source target position, and the robot finally navigates to the real target position without collision.
The above embodiments are only intended to illustrate the technical solution of the present invention, but not to limit it, and a person skilled in the art can modify the technical solution of the present invention or substitute it with an equivalent, and the protection scope of the present invention is subject to the claims.

Claims (10)

1. An active auditory localization method for map-less navigation, comprising the steps of:
1) training a mobile robot navigation model on a simulation platform by a reinforcement learning method;
2) the mobile robot collects the ranging information of the laser radar at the current moment according to the set time step length, and obtains auditory sense orientation information and pose information of a mobile robot odometer based on a sound source of a target position; the laser radar is carried on the mobile robot;
3) inputting the ranging information, the auditory sense orientation information and the pose information into the mobile robot navigation model trained in the step 1) to deduce a speed instruction of the current moment, and navigating the mobile robot to a target position according to the speed instruction.
2. The method of claim 1, wherein the mobile robot navigation model comprises an Actor network and a Critic network; the Actor network is used for outputting the action that maximizes return according to the observed state, wherein the state comprises the ranging information, the auditory orientation information and the pose information, and the action is the linear velocity and angular velocity of the mobile robot; and the Critic network is used for outputting the value of the <state, action> pair according to the action information output by the Actor network and the observation information of the current state.
3. The method of claim 1 or 2, wherein the method of training the mobile robot navigation model by the reinforcement learning method is: firstly, different simulation environments are set up, a plurality of obstacles and target points are randomly set in the simulation environments, and then the mobile robot is stimulated to arrive at the target points through a set return formula.
4. The method of claim 3, wherein the reward calculation formula is r(s_t, a_t, s_{t+1}) = α1·dis(p_t, p_{t+1}) + α2·(dis(p_t, p_target) − dis(p_{t+1}, p_target)) + α3 × success + α4 × collision; wherein dis(p_t, p_{t+1}) is the displacement between the position point p_t at time t and the position point p_{t+1} at time t+1; (dis(p_t, p_target) − dis(p_{t+1}, p_target)) measures how much closer the position p_{t+1} at time t+1 is to the target point p_target than the position p_t at time t; success represents successful arrival at the target point, and collision represents that a collision has occurred; the coefficients α1, α2, α3 are all positive numbers, and the coefficient α4 is a negative number; s_t is the state at time t, a_t is the action at time t, and s_{t+1} is the state at time t+1.
5. The method of claim 1, wherein the auditory directional information comprises a 2-dimensional directional vector; the pose information comprises 2-dimensional position information and 1-dimensional angle information; the target location comprises 2-dimensional target location information; the speed instructions include a linear speed and an angular speed.
6. The method of claim 1, wherein whether a collision occurs is determined based on ranging information of the laser radar; and if the minimum value in the ranging information is smaller than a set threshold value, determining that the mobile robot collides with the obstacle.
7. The method of claim 1, wherein randomizing parameters of the obstacle comprises: the shape and type of the obstacle, the position of the obstacle, and the size of the obstacle.
8. The method of claim 1, wherein the auditory directional information is obtained based on an active auditory localization method.
9. A server, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method according to any one of claims 1 to 8.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN202210079214.7A 2022-01-24 2022-01-24 Active auditory localization method for map-free navigation Pending CN114563011A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210079214.7A CN114563011A (en) 2022-01-24 2022-01-24 Active auditory localization method for map-free navigation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210079214.7A CN114563011A (en) 2022-01-24 2022-01-24 Active auditory localization method for map-free navigation

Publications (1)

Publication Number Publication Date
CN114563011A true CN114563011A (en) 2022-05-31

Family

ID=81713114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210079214.7A Pending CN114563011A (en) 2022-01-24 2022-01-24 Active auditory localization method for map-free navigation

Country Status (1)

Country Link
CN (1) CN114563011A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114771783A (en) * 2022-06-02 2022-07-22 浙江大学 Control method and system for submarine stratum space robot
CN114771783B (en) * 2022-06-02 2023-08-22 浙江大学 Control method and system for submarine stratum space robot

Similar Documents

Publication Publication Date Title
Zhu et al. Deep reinforcement learning based mobile robot navigation: A review
Padhy et al. Deep neural network for autonomous uav navigation in indoor corridor environments
Glas et al. Laser-based tracking of human position and orientation using parametric shape modeling
Kästner et al. Deep-reinforcement-learning-based semantic navigation of mobile robots in dynamic environments
Li et al. Learning view and target invariant visual servoing for navigation
KR20210063791A (en) System for mapless navigation based on dqn and slam considering characteristic of obstacle and processing method thereof
Kim et al. Motion planning by reinforcement learning for an unmanned aerial vehicle in virtual open space with static obstacles
Gutierrez-Osuna et al. Modeling of ultrasonic range sensors for localization of autonomous mobile robots
CN112857370A (en) Robot map-free navigation method based on time sequence information modeling
Liu et al. Episodic memory-based robotic planning under uncertainty
Alves et al. Localization and navigation of a mobile robot in an office-like environment
Singhal Issues in autonomous mobile robot navigation
Jang et al. Hindsight intermediate targets for mapless navigation with deep reinforcement learning
CN111309035A (en) Multi-robot cooperative movement and dynamic obstacle avoidance method, device, equipment and medium
CN114563011A (en) Active auditory localization method for map-free navigation
Yu et al. A deep-learning-based strategy for kidnapped robot problem in similar indoor environment
de Oliveira et al. A robot architecture for outdoor competitions
Doellinger et al. Environment-aware multi-target tracking of pedestrians
Chikhalikar et al. An object-oriented navigation strategy for service robots leveraging semantic information
Alagić et al. Design of mobile robot motion framework based on modified vector field histogram
Frank et al. Real-world robot navigation amongst deformable obstacles
Dam et al. Person following mobile robot using pedestrian dead-reckoning with inertial data of smartphones
Xu et al. Attention-Based Policy Distillation for UAV Simultaneous Target Tracking and Obstacle Avoidance
kih Muhamad et al. Learning Robust Perception Based Controller for Quadruped Robot
Hori et al. Improvement of position estimation of the ultrasonic 3D tag system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination