WO2022160430A1 - Method for obstacle avoidance of robot in the complex indoor scene based on monocular camera - Google Patents

Method for obstacle avoidance of robot in the complex indoor scene based on monocular camera

Info

Publication number
WO2022160430A1
WO2022160430A1 PCT/CN2021/081649 CN2021081649W
Authority
WO
WIPO (PCT)
Prior art keywords
robot
obstacle avoidance
laser
data
reinforcement learning
Prior art date
Application number
PCT/CN2021/081649
Other languages
French (fr)
Inventor
Xin Yang
Jianchuan DING
Baocai Yin
Zhenjun DU
Haiyin Piao
Yang Sun
Original Assignee
Dalian University Of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University Of Technology filed Critical Dalian University Of Technology
Publication of WO2022160430A1 publication Critical patent/WO2022160430A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0246Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
    • G05D1/0253Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means extracting relative motion information from a plurality of images taken successively, e.g. visual odometry, optical flow
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778Active pattern-learning, e.g. online learning of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Electromagnetism (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a method, based on a monocular camera, for obstacle avoidance by a robot in complex indoor scenes, belonging to the field of robot navigation and obstacle avoidance. The monocular obstacle avoidance navigation network of the present invention is composed of an environment perception stage and a control decision stage, and specifically includes a depth prediction module, a semantic mask module, a depth slicing module, a feature extraction guidance module, a reinforcement learning module, and data enhancement. The network takes monocular RGB images as input and, after obtaining the semantic depth map, performs a dynamic minimum pooling operation to obtain "pseudo-laser" data, which is used as the state input of reinforcement learning to generate the final robot decision-making action. The invention solves the difficulty of fully perceiving complex obstacles in indoor robot obstacle avoidance tasks, a difficulty that otherwise leads to obstacle avoidance failure, and helps the robot use the semantic information of the environment to remove the interference of redundant pixels, thereby enabling efficient reinforcement learning training and decision-making; the method is effective and applicable in different scenarios.

Description

Method for obstacle avoidance of robot in the complex indoor scene based on monocular camera Technical Field
The disclosure belongs to the field of robot navigation and obstacle avoidance, realizes autonomous navigation and obstacle avoidance of robots, and particularly relates to a method for fully and effectively perceiving complex obstacles.
Background
In the robot obstacle avoidance task, the robot navigates autonomously to a target point in a complex scene without colliding with any obstacle, which has great practical application value. With the rapid development of artificial intelligence technology, robot obstacle avoidance-related applications, such as sweeping robots, autonomous driving, smart warehouses, and smart logistics, have achieved significant performance improvements.
However, indoor obstacle avoidance scenes often contain complex obstacles, such as non-convex irregular objects (tables and chairs), black metal objects, and obstacles lying flat on the ground such as clothing. These objects seriously impair the traditional one-dimensional Lidar obstacle avoidance strategy, yet there is no related research that handles them. Such complex obstacles prevent the Lidar system from fully perceiving the environment and thus render the navigation and obstacle avoidance system ineffective. Specifically, for irregular objects such as tables, one-dimensional Lidar can only perceive the table legs, which makes the robot mistakenly believe that it can pass through the gap between the legs, and a taller robot will then collide with the tabletop. Black metal objects seriously interfere with Lidar by absorbing the emitted laser light, making it completely invalid. For complex ground obstacles, traditional methods cannot perceive low-lying obstacles, including ones embedded in the ground, such as swimming pools, that are nevertheless impassable. Therefore, fully and efficiently perceiving complex obstacles is an urgent task in the field of robot obstacle avoidance.
Most existing robot obstacle avoidance navigation methods use deep reinforcement learning, which is popular because it can learn autonomously without manually collected labeled data sets. Reinforcement learning is a "trial and error" process that is usually carried out in a virtual environment and then transferred to a real scene. To narrow the gap between the virtual and the real, Lidar data, which has a simple data format and is easy to learn from, is usually used. However, Lidar data cannot fully perceive complex obstacles and therefore cannot support efficient obstacle avoidance strategies.
Some recent work abandons the Lidar sensor, uses the depth map and color image captured by an RGB-D camera as input, maps them directly to actions, and trains reinforcement learning end to end. Compared with laser data, images have rich semantic information, but they also contain a large amount of redundant information that does not help obstacle avoidance, which makes reinforcement learning algorithms difficult to converge and train, widens the gap between the virtual and the real, and makes the learned strategy hard to transfer. In addition, the depth camera produces a great deal of noise in indoor environments with sunlight and almost fails there. The traditional method of removing ground interference by mapping the depth map to a point cloud cannot perceive low obstacles on the ground such as clothing and swimming pools. Therefore, methods based on RGB-D end-to-end learning also have many problems: they cannot fully perceive the complex indoor environment and may even be unable to navigate and avoid obstacles safely.
Therefore, based on an investigation and analysis of existing obstacle avoidance navigation technology, the present disclosure combines the advantages of Lidar and RGB cameras while discarding the disadvantages of both, and constructs "pseudo-laser" data, thereby realizing autonomous navigation and obstacle avoidance in complex scenes. The input of the method is the image taken by the monocular RGB camera on the robot platform, and the output is the action to be taken by the robot, consisting of linear velocity and angular velocity. The method can effectively perceive different types of complex obstacles in indoor scenes, thereby helping the reinforcement learning module to learn and make decisions efficiently.
Summary of the Disclosure
The purpose of the present disclosure is to realize an efficient obstacle avoidance method for robots by mapping monocular RGB images to "pseudo-laser" data in complex scenes. The method includes an environment perception stage and a control decision stage. The environment perception stage includes a depth prediction module, a semantic segmentation module, and a depth slicing module; the control decision stage includes a feature extraction guidance module and a reinforcement learning module. The method of the present disclosure is suitable for complex obstacles of various shapes and sizes.
The technical solution of the present disclosure is:
A method for obstacle avoidance of a robot in a complex indoor scene based on a monocular camera includes the following steps:
Step 1, Loading robot simulation model and building a training and testing simulation environment
In order to solve the obstacle avoidance problem in complex scenes, using the URDF model of the TurtleBot-ROS robot as the experimental robot; using Block, Crossing, and Passing in ROS-Stage as the training environments, and deploying 24 identical TurtleBot-ROS robots for distributed training of the control decision module; using the cafe environment in ROS-Gazebo as the background of the test scene, and manually adding complex obstacles in Gazebo to test the effectiveness of the entire visual system;
Step 2, Getting semantic depth map
Obtaining the RGB image from the monocular camera carried by the TurtleBot-ROS robot, and inputting the RGB image into the Fastdepth depth prediction network to obtain the depth map under the current field of view; selecting the lower half of the depth map as the intermediate result; because ground pixel information in the intermediate result would interfere with obstacle avoidance and cause it to fail, inputting the RGB image into the CCNet semantic segmentation model to obtain a two-class semantic segmentation mask, where 0 represents a ground pixel and 1 represents the background; multiplying the semantic segmentation mask and the depth map pixel by pixel to obtain the semantic depth map, where the value of each pixel in the semantic depth map is the depth distance under the current viewing angle, thereby removing the disturbing ground depth values;
Step 3, Deep slice and data enhancement module
Performing a dynamic minimum pooling operation on the depth value pixels in the semantic depth map, with a pooling window size of (240, 1) and a stride of 1; selecting the minimum value in the window as the output of each pooling operation, and performing the pooling over every column of the image; the result obtained is the "pseudo-laser" data;
Introducing a data enhancement method by applying noise interference to the observation data of the virtual environment during training; in order to identify noise boundaries in the training laser measurements, assuming that a boundary exists wherever the difference between two adjacent values in the vector is greater than the threshold of 0.5, and replacing the values around the two adjacent endpoints by linear interpolation with a window size of (1, 8); at the same time, for all laser observation data, adaptively adding Gaussian white noise with a variance of 0.08;
Step 4, Control decision stage
After obtaining the "pseudo-laser" data, placing the "pseudo-laser" vectors of three consecutive moments in three channels, and using the formed tensor as the input of the deep reinforcement learning module, so that the experimental robot can effectively perceive the optical flow effect of dynamic obstacles within a short period of time and make correct decisions about them;
The deep reinforcement learning module adopts the PPO algorithm, and its network structure is composed of 3 convolutional layers and 3 fully connected layers; in order for the experimental robot to reach the target position steadily and safely, the state input includes three parts: observation data, target point distance, and speed; the observation data is the "pseudo-laser" data obtained in step 3, and the distance and speed of the target point are obtained from the onboard odometer of the robot; proposing a feature extraction guidance layer, which extracts and fuses the data features of the three modes with three convolution layers, obtains a feature mask through a sigmoid activation, and multiplies it with the "pseudo-laser" observation data, sending the result to the deep reinforcement learning module; information helpful to the obstacle avoidance strategy is thus extracted from the multi-modal data and combined with the "pseudo-laser" observation data, making the subsequent feature extraction process more targeted and speeding up convergence of the network;
Modifying the second fully connected layer of the deep reinforcement learning module to an LSTM layer, increasing the timing correlation of the deep reinforcement learning module, and enabling the experimental robot to make decisions based on all observations along the entire path;
Step 5, Forming a monocular obstacle avoidance navigation network and outputting decision results
Splicing steps 2, 3, and 4: obtaining the input image from the monocular RGB camera; after processing, obtaining the depth map and the semantic segmentation mask, multiplying them pixel by pixel, and cropping; after the dynamic minimum pooling operation, obtaining the "pseudo-laser" observation data; inputting three consecutive frames of "pseudo-laser" observation data into the deep reinforcement learning module together with the distance and speed of the target point; after the feature extraction guidance layer, a different degree of attention is paid to each dimension of the "pseudo-laser" observation data; after multi-layer convolution, pooling, and full connection, the LSTM layer is used to increase the timing correlation over the entire path, and finally the decision-making action of the robot at the current moment is output, achieving autonomous obstacle avoidance and navigation.
The beneficial effects of the present disclosure:
(1) Obstacle avoidance test results and efficiency
The disclosure solves the difficulty of fully perceiving complex obstacles (non-convex irregular obstacles, black metal obstacles, complex ground obstacles) in indoor robot obstacle avoidance tasks, a difficulty that otherwise leads to obstacle avoidance failure, and helps the robot use the semantic information of the environment to remove the interference of redundant pixels, enabling efficient reinforcement learning training and decision-making. The present disclosure proposes a reinforcement learning mapping method directly from a single RGB image to the robot's obstacle avoidance navigation action. The method relies on "pseudo-laser" data and makes efficient decisions by encoding semantic information into the laser data. The accuracy of the method is demonstrated through comparative experiments, in which the method obtained the best performance on both commonly used indicators, average success rate and average time, and shows great advantages in complex scenarios.
(2) Broad applicability
The disclosure is suitable for obstacle avoidance and navigation tasks in different complex indoor scenes: (a) scenes containing non-convex irregular obstacles; (b) scenes containing smooth black metal obstacles; (c) scenes containing obstacles such as messy clothing on the ground, glass, and swimming pools. The effectiveness and applicability of the method in these different scenarios are demonstrated.
Description of the Drawings
Figure 1 is the network structure of the present disclosure.
Figure 2 is the visualization result of the experiment of the embodiment of the present disclosure.
Detailed description
The specific embodiments of the present disclosure will be further described below in conjunction with the drawings and technical solutions.
This method uses PPO as the deep reinforcement learning framework. The state includes the "pseudo-laser" data, the distance to the target point, and the velocity at the previous moment; the action is composed of the linear velocity and angular velocity of the wheeled robot; the reward function depends on the distance to the target at each moment (moving closer to the target yields a positive return, and moving away yields a negative one), with a reward of -15 for a collision and 15 for reaching the target point. The robot is also encouraged not to take overly aggressive actions at each step, that is, the angular velocity cannot exceed 1.7 times the angular velocity at the previous moment.
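As a concrete illustration of this reward shaping, the minimal sketch below uses the stated values; the progress coefficient, the small penalty applied when the angular velocity exceeds 1.7 times the previous one, and the function signature are assumptions not specified in the disclosure.

```python
import numpy as np

def reward(prev_dist, curr_dist, collided, reached,
           omega, prev_omega, progress_coef=2.5):
    """Illustrative reward: positive return for moving closer to the target,
    -15 on collision, +15 on reaching the goal, and a penalty when the angular
    velocity exceeds 1.7x the previous one. `progress_coef` and the penalty
    value are assumptions, not taken from the disclosure."""
    if reached:
        return 15.0
    if collided:
        return -15.0
    r = progress_coef * (prev_dist - curr_dist)   # closer -> positive, farther -> negative
    if abs(omega) > 1.7 * abs(prev_omega):        # discourage abrupt turns
        r -= 0.1                                  # assumed small penalty
    return float(r)
```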
The reinforcement learning algorithm is implemented in PyTorch. Stochastic gradient descent is used to train the reinforcement learning network, with a momentum of 0.9, a weight decay of 1e-4, a learning rate of 5e-5, a decay factor of 0.99, a KL divergence parameter of 15e-4, and a maximum step size of 150. In the embodiment of the present disclosure, the learning process is terminated after 1.5 million training paths, and training the strategy takes about 40 hours on a computer equipped with an i7-7700 CPU and an NVIDIA GTX 1080Ti GPU. To verify the effectiveness of the network, it is compared with the traditional method ORCA and with a recent learning-based multi-robot distributed obstacle avoidance strategy, and ablation experiments are performed on all the modules proposed in the network to prove the effectiveness of each part.
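The PyTorch sketch below shows how these optimizer settings could be configured; the placeholder network is purely illustrative and does not reproduce the architecture described later.

```python
import torch
import torch.nn as nn

policy_net = nn.Sequential(              # placeholder; the actual policy is the
    nn.Linear(3 * 640, 256), nn.ReLU(),  # conv + LSTM network sketched below
    nn.Linear(256, 2),
)

# Optimizer settings quoted from the embodiment: SGD with momentum 0.9,
# weight decay 1e-4, learning rate 5e-5.
optimizer = torch.optim.SGD(policy_net.parameters(),
                            lr=5e-5, momentum=0.9, weight_decay=1e-4)

GAMMA = 0.99          # decay (discount) factor
KL_TARGET = 15e-4     # KL divergence parameter for PPO
MAX_STEPS = 150       # maximum step size, as stated in the embodiment
```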
Figure 1 is the network structure of the monocular obstacle avoidance navigation network. The network is composed of an environment perception stage and a control decision stage, which specifically include a depth prediction module, a semantic mask module, a depth slicing module, a feature extraction guidance module, a reinforcement learning module, and data enhancement. The network takes monocular RGB images as input and, after obtaining the semantic depth map, performs a dynamic minimum pooling operation to obtain "pseudo-laser" data, which is used as the state input of reinforcement learning to generate the final robot decision-making action.
Figure 2 is the process visualization result of the monocular visual obstacle avoidance navigation framework, in which column (A) is a chair obstacle scene, column (B) a table obstacle scene, column (C) a clothing obstacle scene, and column (D) a glass obstacle scene. The monocular camera on the robot platform captures the RGB image, the semantic depth map is predicted, and it is then sliced to generate "pseudo-laser" data. The comparison between the "pseudo-laser" data and the Lidar data in the last two rows shows that the "pseudo-laser" data can capture more complete environmental information, enabling efficient reinforcement learning training and better environmental interaction.
A method for obstacle avoidance of a robot in a complex indoor scene based on a monocular camera includes the following steps:
Step 1 Load the robot simulation model and build a training and testing simulation environment
In order to solve the obstacle avoidance problem in complex scenes, the URDF model of the TurtleBot-ROS robot is used as the experimental robot; Block, Crossing, and Passing in ROS-Stage are used as the training environments, and 24 identical TurtleBot-ROS robots are deployed for distributed training of the control decision module; the cafe environment in ROS-Gazebo is used as the background of the test scene, and complex obstacles (tables, chairs, wardrobes, moving pedestrians, etc.) are manually added in Gazebo to test the effectiveness of the entire vision system;
Step 2 Get semantic depth map
The RGB image is obtained from the monocular camera carried by the TurtleBot-ROS robot and input into the Fastdepth depth prediction network to obtain the depth map under the current field of view; the lower half of the depth map is selected as the intermediate result. Because the ground pixel information in this result would interfere with obstacle avoidance and cause it to fail, the RGB image is also input into the CCNet semantic segmentation model to obtain a two-class semantic segmentation mask, where 0 represents a ground pixel and 1 represents the background. The semantic segmentation mask and the depth map are multiplied pixel by pixel to obtain the semantic depth map, where the value of each pixel is the depth distance under the current viewing angle; the disturbing ground depth values are thereby removed;
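A minimal sketch of this masking step is shown below; `predict_depth` and `predict_ground_mask` are hypothetical wrappers around the Fastdepth and CCNet models, and the 480x640 input resolution is an assumption.

```python
import numpy as np

def semantic_depth_map(rgb, predict_depth, predict_ground_mask):
    """Illustrative sketch of Step 2. `predict_depth` and `predict_ground_mask`
    are assumed wrappers around the Fastdepth and CCNet models; a 480x640
    image size is assumed."""
    depth = predict_depth(rgb)         # (480, 640) depth map
    depth = depth[240:, :]             # keep the lower half -> (240, 640)
    mask = predict_ground_mask(rgb)    # (480, 640), 0 = ground, 1 = background
    mask = mask[240:, :]
    return depth * mask                # pixel-wise product removes ground depth
```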
Step 3 Deep slice and data enhancement module
A dynamic minimum pooling operation is performed on the depth value pixels in the semantic depth map. The pooling window size is (240, 1) and the stride is 1; each pooling operation selects the minimum value in the window as the output. After pooling 640 times, data with a size of (1, 640) is obtained, which is the "pseudo-laser" data. "Pseudo-laser" data not only retains the advantages of Lidar data, which is simple, easy to learn from, and easy to transfer, but also retains the semantic information of the visual image. Since the data is obtained from a two-dimensional image through a minimum pooling operation, it can fully perceive the complex obstacles in the environment, encode semantics into each laser dimension, and support the subsequent efficient reinforcement learning and the implementation of safe obstacle avoidance strategies.
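The column-wise minimum pooling can be sketched as follows; treating masked-out (zero) ground pixels as invalid, rather than letting them dominate the minimum, is an assumption about what makes the pooling "dynamic".

```python
import numpy as np

def pseudo_laser(sem_depth):
    """Column-wise minimum pooling with window (240, 1) and stride 1.
    `sem_depth` is assumed to be the (240, 640) semantic depth map with
    ground pixels already zeroed out by the semantic mask."""
    assert sem_depth.shape[0] == 240
    valid = np.where(sem_depth > 0, sem_depth, np.inf)   # ignore masked ground pixels
    scan = valid.min(axis=0)                              # (640,) minimum depth per column
    scan[~np.isfinite(scan)] = 0.0                        # columns that were fully masked
    return scan.reshape(1, -1)                            # (1, 640) "pseudo-laser" data
```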
The sensor data obtained in a virtual environment is often perfect, but in a real environment, when part of one object occludes another, the observation usually contains errors near the object boundary. Large noise of this kind reduces the accuracy of the algorithm or even makes it fail. Therefore, a data enhancement method is introduced, and noise interference is applied to the observation data of the virtual environment during training. In order to identify noise boundaries in the training laser measurements, it is assumed that a boundary exists wherever the difference between two adjacent values in the vector is greater than the threshold of 0.5, and the values around the two adjacent endpoints are replaced by linear interpolation with a window size of (1, 8). At the same time, for all laser observation data, Gaussian white noise with a variance of 0.08 is adaptively added. This data enhancement method allows the strategy, even though it is trained in a virtual environment, to be directly transferred and adapted to real scenes full of noise.
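A sketch of this augmentation, under the stated threshold, window, and variance, might look as follows; the exact choice of interpolation endpoints and the meaning of "adaptively" are assumptions.

```python
import numpy as np

def augment_scan(scan, boundary_thresh=0.5, window=8, noise_var=0.08, rng=None):
    """Illustrative enhancement of a (640,) pseudo-laser vector: boundaries are
    where adjacent values differ by more than 0.5; values in a window of 8
    around each boundary are replaced by linear interpolation, and Gaussian
    white noise with variance 0.08 is added."""
    rng = rng or np.random.default_rng()
    out = scan.astype(np.float64).copy()
    jumps = np.where(np.abs(np.diff(out)) > boundary_thresh)[0]
    for j in jumps:
        lo = max(j - window // 2, 0)
        hi = min(j + 1 + window // 2, len(out) - 1)
        out[lo:hi + 1] = np.linspace(out[lo], out[hi], hi - lo + 1)
    out += rng.normal(0.0, np.sqrt(noise_var), size=out.shape)
    return out
```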
Step 4 Control decision stage
After the "pseudo-laser" data is obtained, the "pseudo-laser" vectors of three consecutive moments are placed in three channels to form a tensor of size (3, 640), which is used as the input of the deep reinforcement learning module. This allows the robot to effectively perceive the optical flow effect of dynamic obstacles over a short period of time and make correct decisions about them.
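Stacking the three most recent (640,) scans can be sketched as below; padding the first steps of an episode by repeating the current scan is an assumption.

```python
from collections import deque
import numpy as np

history = deque(maxlen=3)          # rolling buffer of the last three scans

def stacked_observation(scan, history=history):
    """Stack the pseudo-laser scans of three consecutive moments into a
    (3, 640) array; early frames of an episode are padded by repetition
    (the padding scheme is an assumption)."""
    history.append(scan)
    while len(history) < 3:
        history.appendleft(scan)
    return np.stack(list(history), axis=0)   # (3, 640)
```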
The deep reinforcement learning module adopts the PPO algorithm, and its network structure is composed of 3 convolutional layers and 3 fully connected layers. In order for the robot to reach the target position steadily and safely, the state input includes three parts: observation data, target distance, and speed. The observation data is the "pseudo-laser" data obtained in step 3, and the distance and speed of the target point are obtained from the onboard odometer of the robot. Currently, there are two commonly used fusion methods: direct fusion and indirect fusion. However, because the information comes from different modes, direct fusion within the channels is not conducive to learning obstacle avoidance strategies; on the other hand, blind indirect extraction ignores useful information in the observation data and captures useless information. To this end, a feature extraction guidance layer is proposed. The data features of the three modalities are extracted and fused by three convolution layers, a feature mask is obtained by a sigmoid activation and multiplied with the "pseudo-laser" observation data, and the result is sent to the deep reinforcement learning module. This combines the advantages of the two previous methods: the information that is helpful for obstacle avoidance strategies is extracted from the multi-modal data and then combined with the observation data, so that the subsequent feature extraction process is more targeted and the convergence of the network is accelerated.
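The guidance layer could be realized as in the sketch below; broadcasting the target distance and velocity along the scan dimension, the channel sizes, and the kernel widths are assumptions, and only the overall three-convolution-plus-sigmoid-mask structure follows the description.

```python
import torch
import torch.nn as nn

class FeatureGuidance(nn.Module):
    """Illustrative feature extraction guidance layer. The three modes
    (stacked pseudo-laser scans, target-point distance, velocity) are fused by
    three 1-D convolution layers, a sigmoid produces a per-dimension mask, and
    the mask multiplies the pseudo-laser observation."""

    def __init__(self, scan_len=640):
        super().__init__()
        self.scan_len = scan_len
        # input channels: 3 scan frames + 2 goal values + 2 velocity values (assumed)
        self.fuse = nn.Sequential(
            nn.Conv1d(7, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 3, kernel_size=5, padding=2),
        )

    def forward(self, scans, goal, vel):
        # scans: (B, 3, 640); goal: (B, 2) distance/angle; vel: (B, 2) linear/angular
        extra = torch.cat([goal, vel], dim=1)                       # (B, 4)
        extra = extra.unsqueeze(-1).expand(-1, -1, self.scan_len)   # (B, 4, 640)
        mask = torch.sigmoid(self.fuse(torch.cat([scans, extra], dim=1)))  # (B, 3, 640)
        return scans * mask   # re-weighted pseudo-laser observation
```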
Because a monocular RGB camera is used as the sensor, the robot has a small forward field of view of 60°. Therefore, the second fully connected layer of the reinforcement learning network is modified to an LSTM layer to increase the timing correlation of the reinforcement learning module, so that the robot can make decisions based on all observations along the entire path.
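A sketch of such an actor network is given below; the layer widths, strides, and output parameterization are assumptions, while the three convolution layers and the LSTM replacing the second fully connected layer follow the description.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Illustrative actor network for the PPO module: three convolution layers
    over the guided pseudo-laser tensor, a fully connected layer, an LSTM in
    place of the second fully connected layer, and a final layer producing the
    linear and angular velocity."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(32, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.fc1 = nn.Linear(32 * 80, 256)
        self.lstm = nn.LSTM(256, 256, batch_first=True)   # replaces the 2nd FC layer
        self.fc_out = nn.Linear(256, 2)                    # linear and angular velocity

    def forward(self, guided_scans, hidden=None):
        # guided_scans: (B, 3, 640), output of the feature extraction guidance layer
        x = self.conv(guided_scans).flatten(1)             # (B, 2560)
        x = torch.relu(self.fc1(x)).unsqueeze(1)           # (B, 1, 256), one time step
        x, hidden = self.lstm(x, hidden)                    # carries memory along the path
        return self.fc_out(x.squeeze(1)), hidden            # (B, 2) action
```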
Step 5 Form a monocular obstacle avoidance navigation network and output decision results
Steps 2, 3, and 4 are stitched together: the input image from the monocular RGB camera is obtained; the depth map and the semantic mask are obtained after processing, multiplied pixel by pixel, and cropped; after the dynamic minimum pooling operation, the "pseudo-laser" data is obtained. Three consecutive frames of "pseudo-laser" data are input into the reinforcement learning network together with the distance and speed of the target point. After the feature extraction guidance layer, a different degree of attention is applied to each dimension of the "pseudo-laser" data. After convolution, pooling, and full connection, the LSTM layer is used to increase the timing correlation over the entire path, and finally the decision-making action of the robot at the current moment is output, achieving autonomous obstacle avoidance and navigation.
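Putting the pieces together, one control step of the pipeline could look like the sketch below, which reuses the hypothetical helpers and modules defined in the earlier sketches; it illustrates the data flow only and is not the literal implementation of the disclosure.

```python
import torch

def control_step(rgb, goal, vel, guidance, policy, hidden,
                 predict_depth, predict_ground_mask):
    """One decision step of the assembled pipeline, reusing the hypothetical
    helpers sketched above (semantic_depth_map, pseudo_laser, augment_scan,
    stacked_observation, FeatureGuidance, PolicyNet)."""
    sem_depth = semantic_depth_map(rgb, predict_depth, predict_ground_mask)
    scan = augment_scan(pseudo_laser(sem_depth).ravel())   # augmentation used during training only
    obs = torch.as_tensor(stacked_observation(scan), dtype=torch.float32).unsqueeze(0)
    goal_t = torch.as_tensor(goal, dtype=torch.float32).unsqueeze(0)
    vel_t = torch.as_tensor(vel, dtype=torch.float32).unsqueeze(0)
    guided = guidance(obs, goal_t, vel_t)                  # per-dimension attention mask
    action, hidden = policy(guided, hidden)                # (1, 2): linear, angular velocity
    return action.squeeze(0), hidden
```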

Claims (1)

  1. A method for obstacle avoidance of a robot in a complex indoor scene based on a monocular camera, wherein the method includes the following steps:
    step 1, loading robot simulation model and building a training and testing simulation environment
    in order to solve the obstacle avoidance problem in complex scenes, using the URDF model of the TurtleBot-ROS robot as the experimental robot; using Block, Crossing, and Passing in ROS-Stage as the training environments, and deploying 24 identical TurtleBot-ROS robots for distributed training of the control decision module; using the cafe environment in ROS-Gazebo as the background of the test scene, and manually adding complex obstacles in Gazebo to test the effectiveness of the entire visual system;
    step 2, getting semantic depth map
    obtaining the RGB image from the monocular camera carried by the TurtleBot-ROS robot, and inputting the RGB image into the Fastdepth depth prediction network to obtain the depth map under the current field of view; selecting the lower half of the depth map as the intermediate result; because ground pixel information in the intermediate result would interfere with obstacle avoidance and cause it to fail, inputting the RGB image into the CCNet semantic segmentation model to obtain a two-class semantic segmentation mask, where 0 represents a ground pixel and 1 represents the background; multiplying the semantic segmentation mask and the depth map pixel by pixel to obtain the semantic depth map, where the value of each pixel in the semantic depth map is the depth distance under the current viewing angle, thereby removing the disturbing ground depth values;
    step 3, deep slice and data enhancement module
    performing a dynamic minimum pooling operation on the depth value pixels in the semantic depth map, with a pooling window size of (240, 1) and a stride of 1; selecting the minimum value in the window as the output of each pooling operation, and performing the pooling over every column of the image; the result obtained is the "pseudo-laser" data; introducing a data enhancement method by applying noise interference to the observation data of the virtual environment during training; in order to identify noise boundaries in the training laser measurements, assuming that a boundary exists wherever the difference between two adjacent values in the vector is greater than the threshold of 0.5, and replacing the values around the two adjacent endpoints by linear interpolation with a window size of (1, 8); at the same time, for all laser observation data, adaptively adding Gaussian white noise with a variance of 0.08;
    step 4, control decision stage
    after obtaining the "pseudo-laser" data, placing the "pseudo-laser" vectors of three consecutive moments in three channels, and using the formed tensor as the input of the deep reinforcement learning module, so that the experimental robot can effectively perceive the optical flow effect of dynamic obstacles within a short period of time and make correct decisions about them;
    the deep reinforcement learning module adopts the PPO algorithm, and its network structure is composed of 3 convolutional layers and 3 fully connected layers; in order for the experimental robot to reach the target position steadily and safely, the state input includes three parts: observation data, target point distance, and speed; the observation data is the "pseudo-laser" data obtained in step 3, and the distance and speed of the target point are obtained from the onboard odometer of the robot; proposing a feature extraction guidance layer, which extracts and fuses the data features of the three modes with three convolution layers, obtains a feature mask through a sigmoid activation, and multiplies it with the "pseudo-laser" observation data, sending the result to the deep reinforcement learning module; information helpful for the obstacle avoidance strategy is thus extracted from the multi-modal data and combined with the "pseudo-laser" observation data, making the subsequent feature extraction process more targeted and speeding up convergence of the network;
    modifying the second fully connected layer of the deep reinforcement learning module to an LSTM layer, increasing the timing correlation of the deep reinforcement learning module, and enabling the experimental robot to make decisions based on all observations along the entire path;
    step 5, forming a monocular obstacle avoidance navigation network and outputting decision results
    splicing steps 2, 3, and 4: obtaining the input image from the monocular RGB camera; after processing, obtaining the depth map and the semantic segmentation mask, multiplying them pixel by pixel, and cropping; after the dynamic minimum pooling operation, obtaining the "pseudo-laser" observation data; inputting three consecutive frames of "pseudo-laser" observation data into the deep reinforcement learning module together with the distance and speed of the target point; after the feature extraction guidance layer, a different degree of attention is paid to each dimension of the "pseudo-laser" observation data; after multi-layer convolution, pooling, and full connection, the LSTM layer is used to increase the timing correlation over the entire path, and finally the decision-making action of the robot at the current moment is output, achieving autonomous obstacle avoidance and navigation.
PCT/CN2021/081649 2021-01-27 2021-03-19 Method for obstacle avoidance of robot in the complex indoor scene based on monocular camera WO2022160430A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110106801.6 2021-01-27
CN202110106801.6A CN112767373B (en) 2021-01-27 2021-01-27 Robot indoor complex scene obstacle avoidance method based on monocular camera

Publications (1)

Publication Number Publication Date
WO2022160430A1 true WO2022160430A1 (en) 2022-08-04

Family

ID=75705880

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/081649 WO2022160430A1 (en) 2021-01-27 2021-03-19 Method for obstacle avoidance of robot in the complex indoor scene based on monocular camera

Country Status (2)

Country Link
CN (1) CN112767373B (en)
WO (1) WO2022160430A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115416047A (en) * 2022-09-02 2022-12-02 北京化工大学 Blind assisting system and method based on multi-sensor quadruped robot
CN116089798A (en) * 2023-02-07 2023-05-09 华东理工大学 Decoding method and device for finger movement
CN117593517A (en) * 2024-01-19 2024-02-23 南京信息工程大学 Camouflage target detection method based on complementary perception cross-view fusion network
CN117670184A (en) * 2024-01-31 2024-03-08 埃罗德智能科技(辽宁)有限公司 Robot scene simulation method and system applied to digital robot industrial chain
CN117707204A (en) * 2024-01-30 2024-03-15 清华大学 Unmanned aerial vehicle high-speed obstacle avoidance system and method based on photoelectric end-to-end network
CN117697769A (en) * 2024-02-06 2024-03-15 成都威世通智能科技有限公司 Robot control system and method based on deep learning
CN117830991A (en) * 2024-03-04 2024-04-05 山东大学 Multimode fusion-based four-foot robot complex scene sensing method and system

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240723A (en) * 2021-05-18 2021-08-10 中德(珠海)人工智能研究院有限公司 Monocular depth estimation method and device and depth evaluation equipment
CN113593035A (en) * 2021-07-09 2021-11-02 清华大学 Motion control decision generation method and device, electronic equipment and storage medium
CN114037050B (en) * 2021-10-21 2022-08-16 大连理工大学 Robot degradation environment obstacle avoidance method based on internal plasticity of pulse neural network
CN114581684B (en) * 2022-01-14 2024-06-18 山东大学 Active target tracking method, system and equipment based on semantic space-time representation learning
CN114526738B (en) * 2022-01-25 2023-06-16 中国科学院深圳先进技术研究院 Mobile robot visual navigation method and device based on deep reinforcement learning
CN115805595B (en) * 2023-02-09 2023-12-26 白杨时代(北京)科技有限公司 Robot navigation method and device and sundry cleaning robot
CN116382267B (en) * 2023-03-09 2023-09-05 大连理工大学 Robot dynamic obstacle avoidance method based on multi-mode pulse neural network

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104299244B (en) * 2014-09-26 2017-07-25 东软集团股份有限公司 Obstacle detection method and device based on monocular camera
CN107478220B (en) * 2017-07-26 2021-01-15 中国科学院深圳先进技术研究院 Unmanned aerial vehicle indoor navigation method and device, unmanned aerial vehicle and storage medium
CN111611869B (en) * 2020-04-25 2021-06-01 哈尔滨理工大学 End-to-end monocular vision obstacle avoidance method based on serial deep neural network
CN111311685B (en) * 2020-05-12 2020-08-07 中国人民解放军国防科技大学 Motion scene reconstruction unsupervised method based on IMU and monocular image
CN111949032A (en) * 2020-08-18 2020-11-17 中国科学技术大学 3D obstacle avoidance navigation system and method based on reinforcement learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107553490A (en) * 2017-09-08 2018-01-09 深圳市唯特视科技有限公司 Monocular vision obstacle avoidance method based on deep learning
WO2020056299A1 (en) * 2018-09-14 2020-03-19 Google Llc Deep reinforcement learning-based techniques for end to end robot navigation
US20200166896A1 (en) * 2018-11-26 2020-05-28 Uber Technologies, Inc. Deep reinforcement learning based models for hard-exploration problems
CN109871011A (en) * 2019-01-15 2019-06-11 哈尔滨工业大学(深圳) Robot navigation method based on a preprocessing layer and deep reinforcement learning
CN111367282A (en) * 2020-03-09 2020-07-03 山东大学 Robot navigation method and system based on multimodal perception and reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LINGPING GAO: "Agent Environment Perception and Control Decision Based on Deep Reinforcement Learning", MASTER THESIS, 13 May 2020 (2020-05-13), pages 1 - 60, XP055957664, DOI: 10.26991/d.cnki.gdllu.2020.002070 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115416047A (en) * 2022-09-02 2022-12-02 北京化工大学 Blind assistance system and method based on a multi-sensor quadruped robot
CN116089798A (en) * 2023-02-07 2023-05-09 华东理工大学 Decoding method and device for finger movement
CN117593517A (en) * 2024-01-19 2024-02-23 南京信息工程大学 Camouflage target detection method based on complementary perception cross-view fusion network
CN117593517B (en) * 2024-01-19 2024-04-16 南京信息工程大学 Camouflage target detection method based on complementary perception cross-view fusion network
CN117707204A (en) * 2024-01-30 2024-03-15 清华大学 Unmanned aerial vehicle high-speed obstacle avoidance system and method based on photoelectric end-to-end network
CN117670184A (en) * 2024-01-31 2024-03-08 埃罗德智能科技(辽宁)有限公司 Robot scene simulation method and system applied to digital robot industrial chain
CN117670184B (en) * 2024-01-31 2024-05-03 埃罗德智能科技(辽宁)有限公司 Robot scene simulation method and system applied to digital robot industrial chain
CN117697769A (en) * 2024-02-06 2024-03-15 成都威世通智能科技有限公司 Robot control system and method based on deep learning
CN117697769B (en) * 2024-02-06 2024-04-30 成都威世通智能科技有限公司 Robot control system and method based on deep learning
CN117830991A (en) * 2024-03-04 2024-04-05 山东大学 Quadruped robot complex scene perception method and system based on multimodal fusion
CN117830991B (en) * 2024-03-04 2024-05-24 山东大学 Quadruped robot complex scene perception method and system based on multimodal fusion

Also Published As

Publication number Publication date
CN112767373B (en) 2022-09-02
CN112767373A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
WO2022160430A1 (en) Method for obstacle avoidance of robot in the complex indoor scene based on monocular camera
Ruan et al. Mobile robot navigation based on deep reinforcement learning
US10824947B2 (en) Learning method for supporting safer autonomous driving without danger of accident by estimating motions of surrounding objects through fusion of information from multiple sources, learning device, testing method and testing device using the same
EP3690727B1 (en) Learning method and learning device for sensor fusion to integrate information acquired by radar capable of distance estimation and information acquired by camera to thereby improve neural network for supporting autonomous driving, and testing method and testing device using the same
Tai et al. Towards cognitive exploration through deep reinforcement learning for mobile robots
JP7112752B2 (en) Method for detecting pseudo 3D bounding box, method for testing the same, device using method for detecting pseudo 3D bounding box, and device for testing the same
Ou et al. Autonomous quadrotor obstacle avoidance based on dueling double deep recurrent Q-learning with monocular vision
CN109964237A (en) Picture depth prediction neural network
CN108230361A (en) Enhance target tracking method and system with unmanned plane detector and tracker fusion
Sales et al. Adaptive finite state machine based visual autonomous navigation system
Han et al. Deep reinforcement learning for robot collision avoidance with self-state-attention and sensor fusion
Asadi et al. Building an integrated mobile robotic system for real-time applications in construction
EP3686776B1 (en) Method for detecting pseudo-3d bounding box to be used for military purpose, smart phone or virtual driving based on cnn capable of converting modes according to conditions of objects
CN105014675A (en) Intelligent mobile robot visual navigation system and method in narrow space
Dionísio et al. Nereon-an underwater dataset for monocular depth estimation
Zhu et al. Autonomous reinforcement control of visual underwater vehicles: Real-time experiments using computer vision
Mishra et al. A review on vision based control of autonomous vehicles using artificial intelligence techniques
Li et al. Driver drowsiness behavior detection and analysis using vision-based multimodal features for driving safety
Khalil et al. Integration of motion prediction with end-to-end latent RL for self-driving vehicles
Souza et al. Template-based autonomous navigation in urban environments
CN114326821A (en) Unmanned aerial vehicle autonomous obstacle avoidance system and method based on deep reinforcement learning
Wang et al. Autonomous docking of the USV using deep reinforcement learning combine with observation enhanced
Adiuku et al. Mobile robot obstacle detection and avoidance with NAV-YOLO
Xie et al. Multiple autonomous robotic fish collaboration
Dong Visual Guidance for Unmanned Aerial Vehicles with Deep Learning

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21922029

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21922029

Country of ref document: EP

Kind code of ref document: A1