CN111367282A - Robot navigation method and system based on multimode perception and reinforcement learning - Google Patents

Robot navigation method and system based on multimode perception and reinforcement learning

Info

Publication number
CN111367282A
Authority
CN
China
Prior art keywords
robot
network
reinforcement learning
perception
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010157337.9A
Other languages
Chinese (zh)
Other versions
CN111367282B (en)
Inventor
邓寒
黄学钦
张伟
宋然
李贻斌
顾建军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202010157337.9A priority Critical patent/CN111367282B/en
Publication of CN111367282A publication Critical patent/CN111367282A/en
Application granted granted Critical
Publication of CN111367282B publication Critical patent/CN111367282B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0257Control of position or course in two dimensions specially adapted to land vehicles using a radar
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Electromagnetism (AREA)
  • Traffic Control Systems (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a robot navigation method and system based on multimode perception and reinforcement learning, comprising the following steps: obtaining an RGB picture of the scene observed by the robot at a set moment, and converting the RGB picture into a binary segmentation map using a trained segmentation network; collecting the laser radar data and the speed measurement data of the robot at the set moment; and inputting the binary segmentation map, the laser radar data and the speed measurement data of the robot into a trained multimode fusion deep network model to obtain an optimal operation strategy of the robot. The invention adopts a multimode mechanism to ensure a more complete perception of the environment, and the RL-based method can directly learn a navigation strategy optimized for the surrounding environment in an infinite search space through online interaction, thereby generating flexible actions and improving the ability to avoid collisions.

Description

Robot navigation method and system based on multimode perception and reinforcement learning
Technical Field
The invention relates to the technical field of robot navigation, in particular to a robot navigation method and system based on multimode perception and reinforcement learning.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Autonomous navigation is a very important function of mobile robots, and some existing methods have demonstrated good performance in structured environments. However, it remains challenging to design a robot navigation system that is reliable for unstructured real-world environments, which often contain dynamic obstacles with unpredictable trajectories. This requires the robot to intelligently handle various interactions with obstacles in real time.
There are some efforts that rely on deep learning (DL) to address the challenges of robot navigation in complex environments. However, DL-based approaches typically focus more on perception of the environment, without explicitly learning navigation strategies. A few DL-based approaches use offline annotations to directly learn strategies in actual structured environments, but such annotations are not only time-consuming and laborious to produce at scale, especially in unstructured environments, but are also constrained to a fixed, finite, and discrete set of action states. Therefore, in the dynamic, complex environments of the real world, the learned strategy may not meet the requirements of navigation.
In contrast, Reinforcement Learning (RL) directly learns the optimal strategy for the current environment through a reward mechanism. In fact, this is more consistent with human decision making, where a policy is formulated by interacting with the surrounding environment and the policy model is directly modified by trial and error based on the immediate response of the environment. Furthermore, RL does not require supervised learning based on policy annotations provided by human subjects, as it finds the best policy by maximizing the expected long-term reward.
The inventors have found that the prior art uses radar data in RL to learn obstacle avoidance strategies. However, the sparse point cloud of the radar can only sense information at a specific height; it cannot handle complex environments containing obstacles of arbitrary height and shape, and is not sufficient for training a robot navigation strategy model in an actual complex environment.
The prior art also studies vision-based RL as an alternative. However, the image obtained from a vision sensor cannot provide depth information unambiguously, and in vision-based navigation work the gap between the simulated environment and the real environment is unavoidable. No matter how powerful the simulation engine is, the rendered image cannot perfectly reproduce the real world; therefore, a navigation system trained in a simulated environment may not perform as well in the real world, especially when it contains dynamic obstacles such as vehicles and pedestrians.
Disclosure of Invention
In view of the above, the invention provides a robot navigation method and system based on multimode perception and reinforcement learning, which can realize reliable navigation and collision avoidance in highly dynamic and crowded real-world environments by fusing knowledge obtained from RGB images and radar data through a deep reinforcement learning framework.
In order to achieve the above purpose, in some embodiments, the following technical solutions are adopted:
a robot navigation method based on multimode perception and reinforcement learning comprises the following steps:
obtaining an RGB picture of the scene observed by the robot at a set moment, and converting the RGB picture into a binary (road and non-road) segmentation map using a trained segmentation network;
respectively collecting the laser radar data at the set moment and the speed measurement data of the robot;
and inputting the binary segmentation map, the laser radar data and the speed measurement data of the robot into a trained multimode fusion deep network model to obtain an optimal operation strategy of the robot, thereby realizing the navigation of the robot.
The invention takes the segmentation map as an intermediate representation of the RGB image; the segmentation map ignores the disturbance of low-level image details and remains highly consistent between simulation and real environments. The segmentation map of the RGB image and the radar data are fused as input feature data, enabling reliable navigation and collision avoidance in highly dynamic and crowded real-world environments.
In other embodiments, the following technical solutions are adopted:
a robot navigation system based on multi-mode perception and reinforcement learning comprises:
the device is used for acquiring RGB pictures of a scene observed by the robot at a set moment and converting the RGB pictures into binary segmentation pictures by adopting a trained segmentation network;
the device is used for respectively acquiring the laser radar data at the set moment and the speed measurement data of the robot;
and the device is used for inputting the binary segmentation map, the laser radar data and the speed measurement data of the robot into the trained multimode fusion deep network model to obtain the optimal operation strategy of the robot.
In other embodiments, the following technical solutions are adopted:
a robot, comprising a robot body and a controller, wherein the controller is configured to execute the above robot autonomous navigation method based on multimode perception and deep reinforcement learning, realizing navigation of the robot's travel path.
A computer-readable storage medium storing a plurality of instructions suitable for being loaded by a processor of a terminal device to execute the above robot autonomous navigation method based on multimode perception and deep reinforcement learning.
Compared with the prior art, the invention has the beneficial effects that:
(1) Completeness of perception. The present invention employs a multimode mechanism to ensure a more complete perception of the environment than a single modality can provide, since image and radar data are complementary across various scenarios. This is crucial for the RL-based policy module to learn a correct navigation policy in a complex environment, since its learning process relies only on online perception.
(2) Model portability. Using RGB images directly may encounter the problem of transferring models learned in a simulation environment composed of non-photorealistic renderings to the real-world environment. The present invention uses the segmentation map as an intermediate representation of the RGB image, which has a consistent appearance in both simulated and real scenes. Thus, the model can easily be transferred from simulation to the real world without additional fine-tuning.
(3) Strategy optimization. The DL-based approach essentially predicts a potentially suitable strategy through offline training, which may not be the best choice for the current environment, since the action-state set of the searched strategy is limited. In contrast, the RL-based approach of the present invention can directly learn a navigation strategy optimized for the surrounding environment in an infinite search space through online interaction, thereby generating flexible actions and improving its ability to avoid collisions.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a schematic view of a navigation model based on multi-modal perception and deep reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic view of a navigation framework based on multi-modal perception and deep reinforcement learning according to an embodiment of the present invention;
FIG. 3 is a flowchart of a robot autonomous navigation method based on multi-mode sensing and deep reinforcement learning according to an embodiment of the present invention;
FIGS. 4(a) - (d) are schematic diagrams of examples of simulation and real scenes, respectively, in an embodiment of the present invention;
FIG. 5 shows semantic segmentation results of a simulation scene and a real scene according to an embodiment of the present invention;
FIGS. 6(a) - (b) are the average rewards in a two-stage training scenario, respectively, in an embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular is intended to include the plural unless the context clearly dictates otherwise, and it should be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Example one
In one or more embodiments, a robot navigation method based on multi-modal perception and reinforcement learning is disclosed, as shown in fig. 2 and 3, and includes the following steps:
(1) obtaining an RGB picture of the scene observed by the robot at a set moment, and converting the RGB picture into a binary segmentation map using a trained segmentation network;
(2) respectively collecting the laser radar data at the set moment and the speed measurement data of the robot;
(3) and inputting the binary segmentation map, the laser radar data and the speed measurement data of the robot into a trained multimode fusion deep network model to obtain an optimal operation strategy of the robot.
Specifically, the embodiment of the invention designs a deep reinforcement learning framework that fuses knowledge obtained from RGB images and radar data; referring to fig. 1, the method of the embodiment is trained in a simulated environment and can reliably navigate and avoid collisions in highly dynamic and crowded real-world environments.
The method of the present embodiment will be described in detail below.
The implementation process of the method of the embodiment is divided into two parts: a perception part and a policy part.
The perception part converts the RGB image into a semantic segmentation map and normalizes the radar data, which, together with the measurements of the robot's speed, form the input of the reinforcement-learning-based policy part. The final outputs are the linear and angular velocities, which are fed to the robot controller to navigate the robot's travel path.
(1) First, reinforcement learning will be explained:
1) Problem formulation: the reinforcement learning problem and the notation used in the rest of this embodiment are first defined. The optimization of the navigation strategy is formulated as a partially observable Markov decision process (POMDP), since the robot observes only a limited field of view. At each discrete time step t, the RL robot observes the current state $s_t \in S$ and takes an action $a_t \in A$; after one time step, the robot receives a reward $r(s_t, a_t)$ and transitions to a new state $s_{t+1} \in S$. The task is defined as an episodic problem with an episode length of T time steps; the process therefore continues until T time steps have elapsed or an early termination signal is encountered, such as a collision or driving onto a sidewalk. The goal is to find the optimal policy $\pi(a_t \mid s_t; \theta_\pi)$ that maximizes the expected discounted return:
$$J(\pi) = \mathbb{E}\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right] \qquad (1)$$

where the discount factor satisfies $0 < \gamma < 1$.
There are several classical deep reinforcement learning algorithms, such as DQN, DDPG, A3C and PPO. As a popular deep reinforcement learning algorithm for handling continuous actions in complex tasks, proximal policy optimization (PPO) searches for the best policy by maximizing a surrogate objective function:
$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(\rho_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(\rho_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t\right)\right] \qquad (2)$$

$$\rho_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)} \qquad (3)$$

where $\hat{A}_t$ is an estimator of the advantage function and $\epsilon$ is a hyperparameter. In this work, we use the parallel PPO algorithm, integrated with the multimode fusion paradigm, to train the robot.
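For illustration, the clipped surrogate of Eqs. (2)-(3) can be sketched in a few lines of PyTorch. This is a minimal sketch rather than the patent's implementation; the per-step log-probabilities and advantage estimates are assumed to be computed elsewhere in the training loop.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, eps=0.1):
    """Clipped PPO surrogate of Eqs. (2)-(3); returns a scalar loss to minimize."""
    ratio = torch.exp(log_probs_new - log_probs_old)        # rho_t(theta), Eq. (3)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # PPO maximizes the surrogate, so its negation is minimized.
    return -torch.min(unclipped, clipped).mean()
```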
2) Reinforcement learning settings:
State space. The state $s_t$ consists of three parts: the segmentation map $s_{t1}$, the radar state $s_{t2}$, and the measurement state $s_{t3}$. The segmentation map is generated by the perception module. The radar state consists of the last three consecutive frames of radar data, while the measurement state contains the current linear velocity v and angular velocity w.
Action space. Using discrete actions could make training easier; however, the resulting non-uniformity in speed and steering makes real-world motion control impractical. Therefore, continuous values are used: $v \in [0, 2]$ m/s and $w \in [-1, 1]$ rad/s.
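As one possible realization of this continuous action space (an assumption for illustration; the patent does not specify how network outputs are bounded), unbounded policy outputs can be squashed into the stated ranges:

```python
import torch

def to_action(raw_v, raw_w):
    """Map unbounded network outputs to v in [0, 2] m/s and w in [-1, 1] rad/s."""
    v = 2.0 * torch.sigmoid(raw_v)   # sigmoid gives (0, 1), scaled to (0, 2)
    w = torch.tanh(raw_w)            # tanh gives (-1, 1)
    return v, w
```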
Reward design. Our goal is to avoid collisions while navigating in crowded environments (containing various static and dynamic objects) and to minimize the number of times the robot moves onto other lanes or sidewalks. The robot is required to follow the rule of driving on the right; other traffic rules are ignored.
To guide the robot to achieve this goal, the following six reward functions are designed.
First, to ensure that the robot drives according to the right-hand driving rule, a negative reward is given when the robot enters the opposite lane:

$r_{lane} = -1$ if on the opposite lane, otherwise $r_{lane} = 0$  (4)
Second, to encourage the robot to travel as quickly as possible, the reward $r_v$ is set proportional to the robot's driving speed, with factor $c_v = 1.8$. To improve driving smoothness, the reward $r_w$ is set proportional to the square of the angular velocity with negative factor $c_w = -0.5$, so that large turns during driving are heavily penalized:

$r_v = c_v \times v, \quad r_w = c_w \times w^2$  (5)
Then, when the robot collides with any static or moving object in the environment (e.g., roadblocks, pedestrians, or vehicles), it receives a penalty $r_c$:

$r_c = -10$ on collision, otherwise $r_c = 0$  (6)

Furthermore, once the robot drives onto a sidewalk, a large negative reward $r_{off}$ is applied:

$r_{off} = -10$ on the sidewalk, otherwise $r_{off} = 0$  (7)
Finally, to prevent the robot from getting stuck, a small constant penalty $r_{time} = -0.1$ is applied at each decision step.
Thus, the total reward $r(s_t, a_t)$ is defined as the sum of the above six terms:

$r(s_t, a_t) = r_{lane} + r_v + r_w + r_c + r_{off} + r_{time}$  (8)
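Putting Eqs. (4)-(8) together, the per-step reward can be sketched as below. This is a minimal illustration using the constants stated in this section; the event flags and velocities are assumed to be reported by the simulator.

```python
def total_reward(v, w, on_opposite_lane, collided, on_sidewalk,
                 c_v=1.8, c_w=-0.5):
    """Total reward of Eq. (8): r_lane + r_v + r_w + r_c + r_off + r_time."""
    r_lane = -1.0 if on_opposite_lane else 0.0   # Eq. (4): right-hand driving
    r_v = c_v * v                                # Eq. (5): speed incentive
    r_w = c_w * (w ** 2)                         # Eq. (5): turning penalty
    r_c = -10.0 if collided else 0.0             # Eq. (6): collision penalty
    r_off = -10.0 if on_sidewalk else 0.0        # Eq. (7): sidewalk penalty
    r_time = -0.1                                # constant per-step penalty
    return r_lane + r_v + r_w + r_c + r_off + r_time
```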
and (4) terminating the conditions. There are three termination conditions: firstly, the robot collides with any obstacle; secondly, the robot runs to a sidewalk; and thirdly, the time step T of the current Episode is accumulated to 2000.
(2) Perception part
The perception module aims to perceive the surrounding environment and alleviate the gap between the simulation environment and the real world. The main reason why transferring learned strategy modules from simulation to reality often fails is that certain real-world factors, such as texture, lighting and sensor noise, are difficult to simulate accurately. These factors result in large differences in image detail between the simulated and real-world environments. To address this issue, the present embodiment takes the segmentation map as a mid-level visual representation and uses it as an input to the policy module, because the segmentation map ignores perturbations of low-level image details and maintains high consistency between the simulation and real environments. Thus, a segmentation network is trained in a supervised manner to convert the original RGB image into a binary segmentation map, i.e., road and non-road. This can be expressed as

$s_{t1} = f_{seg}(O_t; \theta_{seg})$

where $f_{seg}$ denotes the segmentation model, $\theta_{seg}$ its parameters, and $O_t$ the RGB picture of the scene observed by the robot at time t.
In principle, the segmentation model $f_{seg}$ calls for a deep segmentation network such as GSCNN. However, such networks typically cannot run in real time on the robot's onboard computing resources, which real-time navigation requires. An alternative is a lightweight segmentation network such as ERFNet; however, such lightweight networks are difficult to generalize to complex environments due to the lack of sufficient training data. Therefore, a teacher-student model is adopted to embed GSCNN into the training process of ERFNet to solve this problem.
In the implementation, the teacher network GSCNN is first trained on the public Cityscapes dataset. Unlabeled RGB images collected in the real-world environment are then fed into this network to generate segmentation maps. The segmentation maps generated by GSCNN serve as labels for the unlabeled RGB images, which are combined with other labeled data to train the student network ERFNet. Finally, ERFNet performs onboard image segmentation in both the simulated and real-world environments.
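In outline, this distillation step amounts to pseudo-labeling. The sketch below illustrates the idea under assumed model, optimizer, and tensor conventions; it is not the patent's code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def pseudo_label(teacher, unlabeled_images):
    """Use the trained teacher (e.g., GSCNN) to label unlabeled RGB images."""
    teacher.eval()
    logits = teacher(unlabeled_images)   # assumed shape (N, 2, H, W): road / non-road
    return logits.argmax(dim=1)          # hard labels for the student

def train_student_step(student, optimizer, images, labels):
    """One supervised step on mixed labeled and pseudo-labeled data."""
    loss = F.cross_entropy(student(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```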
Of course, in the present embodiment, the teacher network GSCNN and the student network ERFNet are only examples; those skilled in the art may select other segmentation networks as needed.
Semantic segmentation relying only on RGB images cannot provide the accurate depth information that is important for autonomous navigation. Therefore, the laser radar data $s_{t2}$ is introduced into the perception module. Compared with RGB images, the information obtained from radar is relatively robust to the differences between simulated and real environments, since it is not sensitive to texture and illumination. The radar data is normalized, and the three most recent historical frames are used as input to the policy module. To obtain real-time feedback from the robot, measurements comprising the agent's linear velocity v and angular velocity w are introduced as another sensory input $s_{t3}$. Finally, the perception module composes the state $s_t = (s_{t1}, s_{t2}, s_{t3})$, consisting of the segmentation map, radar data and measurements, and outputs it to the policy module.
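One plausible way to assemble the state $s_t = (s_{t1}, s_{t2}, s_{t3})$, with normalized radar scans and a three-frame history, is sketched below; the maximum radar range and array shapes are assumptions, not values disclosed in the patent.

```python
from collections import deque
import numpy as np

class StateBuffer:
    """Builds s_t = (segmentation map, 3 normalized radar frames, (v, w))."""
    def __init__(self, max_range=10.0, history=3):
        self.max_range = max_range
        self.frames = deque(maxlen=history)

    def update(self, seg_map, radar_scan, v, w):
        scan = np.clip(radar_scan, 0.0, self.max_range) / self.max_range
        self.frames.append(scan)
        while len(self.frames) < self.frames.maxlen:   # pad at episode start
            self.frames.append(scan)
        s_t1 = seg_map                              # (H, W) binary road map
        s_t2 = np.stack(self.frames)                # (3, num_beams) radar history
        s_t3 = np.array([v, w], dtype=np.float32)   # measurement state
        return s_t1, s_t2, s_t3
```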
In essence, the multimodal perception module extracts a structured representation of the complex unstructured environment through image segmentation, which helps the reinforcement learning robot better understand the high-level semantics of the environment and accelerates the search for the optimal strategy. In addition, lidar provides accurate depth information about the environment; the combination of segmentation maps and radar data complements the representation of the environment, ensuring a complete perception with richer information and thus significantly benefiting the policy module.
(3) Policy part
The policy module aims to find the optimal navigation strategy through reinforcement learning. In many cases, strategies learned from single-modality data are not robust enough due to the inherent limitations of each sensing technique. Thus, as shown in fig. 2, multimodal data is utilized, and the learned multimodal features are fused in a deep network. First, the policy module takes the segmentation map and radar data provided by the perception module as input and extracts features from each of them. These features are then fused with the measurements, and the policy π is output through a fully connected network. Finally, the policy π is optimized using the parallel PPO algorithm. This can be expressed as

$\pi = f_{fus}(F_{t1}, F_{t2}, s_{t3}; \theta_f)$

where $F_{t1}$ and $F_{t2}$ denote the outputs of the two branches of the policy module that process the segmentation map and the radar data, respectively, and $f_{fus}$ denotes the fully connected and ReLU activation layers, with learnable parameters $\theta_f$, that fuse them to output the policy π.
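A minimal PyTorch sketch of such a fusion head is given below. The feature dimensions and layer sizes are illustrative assumptions; the patent does not disclose the exact architecture.

```python
import torch
import torch.nn as nn

class FusionPolicy(nn.Module):
    """Fuses segmentation features F_t1, radar features F_t2 and measurements s_t3."""
    def __init__(self, seg_dim=256, radar_dim=128, meas_dim=2, hidden=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(seg_dim + radar_dim + meas_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),   # raw linear and angular velocity outputs
        )

    def forward(self, f_seg, f_radar, meas):
        x = torch.cat([f_seg, f_radar, meas], dim=-1)
        return self.fuse(x)   # squashed downstream to v in [0,2], w in [-1,1]
```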
Notably, a policy network based on the multimodal fusion scheme is difficult to learn due to the expansion of the state space. To solve this problem, the present embodiment proposes a modality-separation learning method as an auxiliary training tool. We note that radar feature extraction in a simulated environment is highly consistent with the real world.
Therefore, an obstacle avoidance network is introduced, and a radar-based obstacle avoidance strategy is trained in a simulation environment. In this embodiment, the reinforcement learning robot is trained on the Stage simulator using a radar-based network model disclosed in the prior art. As shown in fig. 4(a), a new training scene is built on the simulator. In this simulation environment, the robot is first trained with the radar-based obstacle avoidance network. Then, the network parameters of the corresponding layers that extract radar data features are migrated to the multimode fusion model and fixed. In this way, the size of the feature space to be learned is greatly reduced.
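In PyTorch terms, migrating and fixing the pretrained radar branch could look like the following sketch; the `radar_encoder` attribute and the checkpoint layout are assumptions for illustration.

```python
import torch

def transfer_radar_branch(fusion_model, ckpt_path="radar_avoidance.pt"):
    """Copy pretrained radar-feature weights into the fusion model and freeze them."""
    state = torch.load(ckpt_path, map_location="cpu")
    fusion_model.radar_encoder.load_state_dict(state["radar_encoder"])
    for p in fusion_model.radar_encoder.parameters():
        p.requires_grad = False   # fixed, so only the remaining layers are learned
```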
In addition, the present embodiment divides the navigation task into two subtasks: obeying traffic rules and avoiding obstacles. Because it is very difficult for the reinforcement learning robot to directly learn a driving strategy in a complex environment containing a large amount of information (e.g., with static and dynamic objects placed at the same time), the present embodiment adopts a simple-to-complex curriculum learning training paradigm to ensure that the multimode fusion strategy can effectively learn the above tasks.
Such a paradigm enables the robot to learn quickly through relatively simple driving tasks first.
In the simulation environment, no obstacles are added on the road at first; applying the multimode perception and deep reinforcement learning navigation model to data collected in real time, such as images and laser radar, the robot quickly learns simple driving tasks through continuous trial and error, for example, how to drive along roads and obey traffic rules. After the robot achieves reliable performance, training on the simple task is stopped and a complex-task phase begins, in which a large number of vehicles, pedestrians and roadblocks are added. The reinforcement learning robot is trained continuously to learn the optimal driving strategy so that it can avoid potential collisions during driving.
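Schematically, the two-stage curriculum can be expressed as follows; `env`, `agent`, and the performance threshold are placeholders, not interfaces disclosed in the patent.

```python
def curriculum_training(env, agent, simple_threshold, complex_episodes):
    """Simple-to-complex curriculum: driving rules first, then obstacle avoidance."""
    env.configure(obstacles=False)              # stage 1: empty roads
    while agent.average_reward() < simple_threshold:
        agent.train_one_episode(env)

    env.configure(obstacles=True)               # stage 2: vehicles, pedestrians,
    for _ in range(complex_episodes):           # and roadblocks are added
        agent.train_one_episode(env)
```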
(4) Results of the experiment
The embodiment performs robot navigation experiments in simulation and real-world environments to prove the superiority of the method of the embodiment over the baseline method.
1) Details of the Experimental setup and implementation
① Simulation environment. The Stage simulator is an open-source mobile robot simulator providing a virtual world of mobile robots and sensors. The radar-based obstacle avoidance strategy was first trained in Stage, using robots 2.3 meters long and 1.4 meters wide. Eight robots were trained simultaneously, as shown in fig. 4(a), where each laser-bearing rectangle represents one robot; training ran for 6000 episodes, taking 12 hours in total. The RL framework with the multimode fusion scheme proposed in this embodiment was then trained in CARLA, an open-source urban scene simulator for autonomous driving based on the Unreal game engine. CARLA currently provides seven highly realistic complex urban scenes, as shown in fig. 4(b). In our experiments, the Town 2 scene under clear daytime conditions was used for strategy training, while the Town 1 scene served as an environment unseen by the robot for evaluation. To train the perception network, RGB images were collected in CARLA by driving the robot under remote control at a frequency of 5 Hz.
② Real environment. Fig. 4(c) and 4(d) show the real outdoor and indoor environments, respectively. For the real-world experiments, the teacher-student model is used to convert images of the real-world scenes into semantic segmentation labels. A Raspberry Pi-based RGB camera, mounted at a height of 1.2 meters, tilted downward by 12° and with a 60° field of view, is installed on the robot; 1.9K raw RGB images were collected via a remote controller at a frequency of 3 Hz.
Table 1: hyper-parameter settings
Parameter(s) Value of
Discount(γ) 0.99
GAE parameter(λ) 0.95
Clipping(ε) 0.1
Horizon(T) 256
Entropy coeff 0.01
Minibatch size 256 (simple phase), 1024 (complex phase)
Num.epochs 4
Learning rate 3e-4
Implementation details of the perception part. The goal of this embodiment is to obtain "road" and "non-road" regions, so the semantic segmentation labels are divided into two classes: road and non-road. The public Cityscapes dataset and the simulation data are then combined with the real-scene data to train ERFNet. All image data are resized to 84 × 84, the batch size is set to 48, and the model is trained for 500 iterations. The Adam optimizer is used with an initial learning rate of 0.001, reduced to 0.0001 after 150 iterations; weight decay is set to 0.0002. Fig. 5 shows the semantic segmentation results for the simulation and real scenes, where the first row of fig. 5 is the simulation scene in CARLA, the second row is a real-world indoor scene, and the third row is a real-world outdoor scene.
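The optimization schedule stated above corresponds to a setup like the following sketch; the model object is a placeholder.

```python
import torch

def make_segmentation_optimizer(model):
    """Adam, lr 0.001 dropping to 0.0001 after 150 iterations, weight decay 0.0002."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=2e-4)
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[150], gamma=0.1)
    return opt, sched
```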
2) Baseline method and assessment index
① Baseline methods. In the experiments, the method of this embodiment is compared with the following four baseline methods:
DroNet: this is an existing DL-based approach, using only RGB image predictive control strategies. Although it has been applied to the task of robot navigation in real-world environments, it suffers from the problem of hesitation since motion is usually terminated when an obstacle is encountered.
DroNet + LiDAR: the LiDAR method is currently the best RL-based obstacle avoidance method relying only on radar data, and ensures that a moving robot can avoid colliding with obstacles. We therefore provide a baseline combining DroNet and LiDAR that can achieve autonomous navigation and obstacle avoidance in real-world environments.
SEG + LiDAR-ini: an ablated version of the method of this embodiment in which the radar feature extraction part is trained from scratch, in contrast to the full model, where it is pre-trained by the obstacle avoidance network and fixed.
SEG-only: another ablated version of the method of this embodiment, which uses only the segmentation map as network input, without radar data. It is therefore essentially a single-modality RL method.
② Evaluation metrics. The robot starts at a random location in each test episode, with three termination conditions: 1) any collision; 2) the maximum time is reached (100 seconds for simulation scenes; 150 seconds for outdoor scenes, 50 seconds for indoor scenes); and 3) the robot reaches a designated area.
The distance metric records the length the robot drives in an episode. The total time records how long the robot drives in the episode. The average speed of the robot is also reported, which reflects whether it can effectively bypass obstacles. Furthermore, the off-lane time is recorded, defined as the percentage of time the robot spends on other lanes or sidewalks during the entire drive.
3) Evaluation in a simulation environment
① Ablation study of the training procedure:

Simple stage without obstacles. Fig. 6(a) shows the learning curves of three versions of the method of the present embodiment in the simple phase of the simple-to-complex learning paradigm. At this stage, since the robot's surroundings contain no obstacles, learning from the segmentation map that divides the RGB image into "road" and "non-road" areas provides the most important cue for navigation. Thus, in fig. 6(a), the SEG-only version is observed to perform well. The full version of the method of this embodiment achieves comparable performance to SEG-only. This is because, in this version, the layers that learn deep features from the LiDAR data have fixed parameters, so the whole network is ultimately trained to learn from the segmentation map as much as possible. In contrast, training SEG + LiDAR-ini yields a network that is not only suboptimal at learning the segmentation map but whose LiDAR feature learning is also impaired. Thus, the control strategy learned by the SEG + LiDAR-ini network cannot achieve a high average return even in an obstacle-free environment.
Complex stages with static and/or dynamic obstacles.
Fig. 6(b) shows the learning curves for the complex phase, where the robot frequently encounters static and/or dynamic obstacles in the scene. We observe that the full version of the method of this embodiment performs significantly better than SEG + LiDAR-ini in terms of training speed and average reward. These results also demonstrate the benefit of the modality-separated learning scheme, in which a collision avoidance network is first pre-trained based only on LiDAR data and its learned parameters are then transferred into the multimodal fusion model. In practice, navigation provided by SEG + LiDAR-ini exhibits a series of problems, including moving to the left side of the road, an uneven travel path, and unexpected turns. The full version of the method of this embodiment is also superior to the SEG-only ablation in terms of training speed. In fact, the robot velocity v output by SEG-only is much worse, as it follows a normal distribution with a mean of 0 and a variance of 9. This means the robot must frequently stop and/or slow down significantly when encountering obstacles. In contrast, the v output distribution of the full multimode-fusion method of this embodiment has a mean of 2 and a variance of 0.1, indicating near-ideal navigation: the robot can quickly bypass obstacles and safely avoid potential collisions.
② Quantitative comparison with baselines:
the suggested method is compared to a baseline method by a city driving simulator. To test whether the autonomous navigation system can handle crowded scenarios involving static and dynamic objects, different tasks were devised to evaluate the robot's reaction patterns in different scenarios.
Tasks: 1) There are no obstacles in the scene; in this task, the robot must drive on the right and follow traffic rules. 2) The scene contains various static obstacles, including stationary vehicles, boxes, bins and trash cans; in this task, the robot must avoid collisions while driving. 3) The scene contains various dynamic obstacles, including vehicles and pedestrians. The starting positions are the same for all three tasks. Each experiment was run 3 times and the average results are reported.
Results: As shown in Table 2, both the method of this embodiment and the baselines can travel long distances in the obstacle-free training environment (i.e., Task 1), although the driving strategies of DroNet and DroNet + LiDAR are relatively conservative, as reflected by their low average speeds. In the unseen test environment, the driving speed of the robot using DroNet is very slow, as it typically takes time to recognize shadows on the road (e.g., shadows of street lights, buildings, and the robot itself). Furthermore, since DroNet tends to overfit the training site, the time the robot spends on other lanes increases greatly when tested in the unseen environment.
TABLE 2 quantitative comparison in a simulation Environment
In Task 2, the method of the present embodiment avoids collisions with static obstacles better in both the training and testing environments, while DroNet is largely unable to avoid collisions when encountering obstacles, typically coming to a standstill or colliding accidentally. While DroNet + LiDAR improves obstacle avoidance performance, it is still not as effective as the method of this embodiment. Accordingly, the method of this embodiment achieves a longer travel distance and a higher average speed.
In Task 3, the robot does not always remain stuck, because the vehicles and pedestrians are dynamic. For the DroNet method, the average distance of robot motion is longer than in Task 2. However, since the dynamic scene is more complicated, the possibility of collision increases, and the average driving time of the robot becomes shorter.
4) Assessment in real environments
Here, the strategy model trained using only the CARLA simulator is deployed directly in the real world to verify its robustness and generalization capability.
① Quantitative comparison in outdoor scenes:
the methods of DroNet and DroNet + LiDAR are compared across multiple campus lanes of a challenging surrounding environment (including tight turns). Some of the test protocols are shown at the bottom of figure 5. As shown in table 3, in the results of table 3, the maximum linear velocity of the actual robot was 1 m/s; let the robot run 150 seconds in outdoor scenes and 50 seconds in indoor scenes, respectively, and report the navigation distance that the robot eventually covered.
The method of this embodiment and both baselines can accomplish the task in simple scenes with few dynamic obstacles, while the method of this embodiment outperforms the baselines in navigation distance, i.e., the distance the robot can travel in a fixed time. In crowded environments, neither DroNet nor DroNet + LiDAR can drive as long a path as the method of this embodiment. This is mainly because the method of this embodiment can find a safe navigation path in scenes with highly dynamic objects, while a robot using either baseline moves slowly due to the high probability of collision.
TABLE 3 quantitative comparison in real world
② Quantitative comparison in indoor scenes:
to demonstrate that the multi-modal approach can separate the perception part from the strategy part, experiments were performed in an indoor environment, and the middle row of fig. 5 shows some test scenarios. This is challenging because the training scenario is based on a simulated outdoor environment cara. In implementation, the ERFNet segmentation model is retrained using the indoor corridor images. When testing is performed in an indoor environment, the semantic segmentation model used in the above experiment is replaced with a retrained model, and the strategy model remains unchanged. The results show that although the indoor environment is very different from the scenario used to train the policy model, the multimodal policy model of the present embodiment is still superior to the baseline in the indoor environment, especially in crowded hallways where space is limited.
In summary, the present embodiment proposes a multimode fusion scheme in the policy part to utilize both image and radar data. However, due to the larger state space, multimodal strategy learning is more difficult than single-modality strategy learning. Moreover, RL has difficulty learning effective strategies in real-world environments with various dynamic obstacles. Therefore, the difficulty of multimodal strategy learning is reduced in three ways.
First, the policy part learns from the image and radar data separately. A radar-based obstacle avoidance strategy is trained first; the component that extracts radar features is then transferred into the multimode policy module, while the semantic segmentation map derived from the RGB image is fed directly into the multimode policy module.
Second, training proceeds from simple to complex. In the simple phase, training takes place in an environment without obstacles, so the RL robot can quickly and efficiently learn various driving tasks and traffic rules. Then, in the complex phase, the robot focuses on learning reliable collision avoidance strategies in crowded scenes (including static and dynamic obstacles).
Thirdly, six reward functions are designed based on traffic rules, collision punishment and speed smoothness.
Considering that the robot is equipped with limited computing resources, a lightweight neural network is employed for the perception part, which can reliably divide the RGB image into road and non-road regions. The generated segmentation map can be viewed as a mid-level visual feature of the scene and shows a consistent appearance in both the simulated and real scenes. However, training such a network requires a sufficiently large dataset; otherwise it cannot generalize well given the diversity of real-world environments. Therefore, a teacher-student model is adopted to distill segmentation knowledge and improve the generalization ability of the network.
Example two
In one or more embodiments, disclosed is a multi-modal perception and reinforcement learning-based robot navigation system, comprising:
the device is used for acquiring RGB pictures of a scene observed by the robot at a set moment and converting the RGB pictures into binary segmentation pictures by adopting a trained segmentation network;
the device is used for respectively acquiring the laser radar data at the set moment and the speed measurement data of the robot;
and the device is used for inputting the binary segmentation map, the laser radar data and the speed measurement data of the robot into the trained multimode fusion deep network model to obtain the optimal operation strategy of the robot.
The specific implementation manner of the device adopts the method disclosed in the first embodiment, and details are not described again.
EXAMPLE III
In one or more embodiments, a terminal device is disclosed, which includes a server comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the robot autonomous navigation method based on multimode perception and deep reinforcement learning disclosed in the first embodiment is implemented, which is not repeated here for brevity.
It should be understood that in this embodiment, the processor may be a central processing unit CPU, and the processor may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate arrays FPGA or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. The memory may also store information of the device type, for example.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.
The method can be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor. The software modules may reside in ram, flash, rom, prom, or eprom, registers, among other storage media that are well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor. To avoid repetition, it is not described in detail here.
Those of ordinary skill in the art will appreciate that the elements of the various examples, i.e., the algorithm steps, described in connection with the embodiments disclosed herein may be implemented as electronic hardware or in combination with computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention, and it should be understood that various modifications and variations can be made by those skilled in the art, based on the technical solution of the present invention, without inventive effort.

Claims (10)

1. A robot navigation method based on multimode perception and reinforcement learning is characterized by comprising the following steps:
the method comprises the steps of obtaining RGB pictures of a scene observed by a robot at a set moment, and converting the RGB pictures into binary segmentation pictures by adopting a trained segmentation network;
respectively collecting the laser radar data at the set moment and the speed measurement data of the robot;
and inputting the binary segmentation map, the laser radar data and the speed measurement data of the robot into a trained multimode fusion deep network model to obtain an optimal operation strategy of the robot, thereby realizing the navigation of the robot.
2. The robot navigation method based on multi-modal perception and reinforcement learning as claimed in claim 1, wherein the training process for the segmentation network specifically comprises:
training a teacher network using the published semantically segmented data set;
feeding unlabeled RGB images collected in a real environment into a trained teacher network to generate a binary segmentation map;
training a student network by using the generated binary segmentation graph as a label of the unmarked RGB image and adding a data set collected by a simulation environment and a public semantic segmentation data set;
and taking the trained student network as a final segmentation network to segment the RGB picture.
3. The robot navigation method based on multi-mode perception and reinforcement learning of claim 2, wherein the teacher network selects a GSCNN network, and the student network selects an ERFNet network.
4. The robot navigation method based on multi-mode perception and reinforcement learning as claimed in claim 1, wherein the training process of the multi-mode fusion deep network model specifically comprises:
introducing a radar obstacle avoidance network, and training an obstacle avoidance strategy based on laser radar data in a simulation environment;
migrating the trained obstacle avoidance network parameters to the multi-mode fusion depth network model and fixing;
in a simulation environment, a multi-mode fusion deep network model is trained by adopting a simple to complex training process.
5. The robot navigation method based on multi-mode perception and reinforcement learning as claimed in claim 4, wherein in a simulation environment, a simple to complex training process is adopted to train the multi-mode fusion deep network model, and the specific process is as follows:
in a simulation environment, no barrier is added on a road, and the robot continuously tries and mistakes by a reinforcement learning method to quickly learn a simple driving task;
after the robot reaches the set performance, dynamic and static interference factors are added in the simulation environment, and the robot is continuously trained to learn the optimal driving strategy to avoid potential collision.
6. The multi-modal awareness and reinforcement learning-based robot navigation method of claim 5, wherein the simple driving task comprises: travel along roads and understand traffic regulations.
7. The robot navigation method based on multimode perception and reinforcement learning as claimed in claim 1, wherein the speed metric data of the robot comprises: linear and angular velocities.
8. A robot navigation system based on multimode perception and reinforcement learning is characterized by comprising:
the device is used for acquiring RGB pictures of a scene observed by the robot at a set moment and converting the RGB pictures into binary segmentation pictures by adopting a trained segmentation network;
the device is used for respectively acquiring the laser radar data at the set moment and the speed measurement data of the robot;
and the device is used for inputting the binary segmentation map, the laser radar data and the speed measurement data of the robot into the trained multimode fusion deep network model to obtain the optimal operation strategy of the robot.
9. A robot, comprising: the robot body and the controller are characterized in that the controller is configured to execute the robot autonomous navigation method based on the multimode perception and the deep reinforcement learning according to any one of claims 1-7, and realize navigation of a robot running path.
10. A computer-readable storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor of a terminal device and to execute the method for robot autonomous navigation based on multi-modal perception and deep reinforcement learning of any one of claims 1-7.
CN202010157337.9A 2020-03-09 2020-03-09 Robot navigation method and system based on multimode perception and reinforcement learning Active CN111367282B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010157337.9A CN111367282B (en) 2020-03-09 2020-03-09 Robot navigation method and system based on multimode perception and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010157337.9A CN111367282B (en) 2020-03-09 2020-03-09 Robot navigation method and system based on multimode perception and reinforcement learning

Publications (2)

Publication Number Publication Date
CN111367282A true CN111367282A (en) 2020-07-03
CN111367282B CN111367282B (en) 2022-06-07

Family

ID=71208662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010157337.9A Active CN111367282B (en) 2020-03-09 2020-03-09 Robot navigation method and system based on multimode perception and reinforcement learning

Country Status (1)

Country Link
CN (1) CN111367282B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111988220A (en) * 2020-08-14 2020-11-24 山东大学 Multi-target disaster backup method and system among data centers based on reinforcement learning
CN111975769A (en) * 2020-07-16 2020-11-24 华南理工大学 Mobile robot obstacle avoidance method based on meta-learning
CN112114592A (en) * 2020-09-10 2020-12-22 南京大学 Method for realizing autonomous crossing of movable frame-shaped barrier by unmanned aerial vehicle
CN112130570A (en) * 2020-09-27 2020-12-25 重庆大学 Blind guiding robot of optimal output feedback controller based on reinforcement learning
CN112270306A (en) * 2020-11-17 2021-01-26 中国人民解放军军事科学院国防科技创新研究院 Unmanned vehicle track prediction and navigation method based on topological road network
CN112304314A (en) * 2020-08-27 2021-02-02 中国科学技术大学 Distributed multi-robot navigation method
CN112965081A (en) * 2021-02-05 2021-06-15 浙江大学 Simulated learning social navigation method based on feature map fused with pedestrian information
CN112966591A (en) * 2021-03-03 2021-06-15 河北工业职业技术学院 Knowledge map deep reinforcement learning migration system for mechanical arm grabbing task
CN113093779A (en) * 2021-03-25 2021-07-09 山东大学 Robot motion control method and system based on deep reinforcement learning
CN113848750A (en) * 2021-09-14 2021-12-28 清华大学 Two-wheeled robot simulation system and robot system
WO2022160430A1 (en) * 2021-01-27 2022-08-04 Dalian University Of Technology Method for obstacle avoidance of robot in the complex indoor scene based on monocular camera
CN114859940A (en) * 2022-07-05 2022-08-05 北京建筑大学 Robot movement control method, device, equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024019107A1 (en) * 2022-07-22 2024-01-25 ソニーグループ株式会社 Multiple-robot control method, multiple-robot control device, and multiple-robot control system

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105157697A (en) * 2015-07-31 2015-12-16 天津大学 Indoor mobile robot pose measurement system and measurement method based on optoelectronic scanning
CN109087303A (en) * 2018-08-15 2018-12-25 中山大学 The frame of semantic segmentation modelling effect is promoted based on transfer learning
CN109506658A (en) * 2018-12-26 2019-03-22 广州市申迪计算机系统有限公司 Robot autonomous localization method and system
CN109764876A (en) * 2019-02-21 2019-05-17 北京大学 The multi-modal fusion localization method of unmanned platform
CN110006435A (en) * 2019-04-23 2019-07-12 西南科技大学 A kind of Intelligent Mobile Robot vision navigation system method based on residual error network
CN110245567A (en) * 2019-05-16 2019-09-17 深圳前海达闼云端智能科技有限公司 Barrier-avoiding method, device, storage medium and electronic equipment
CN110243370A (en) * 2019-05-16 2019-09-17 西安理工大学 A kind of three-dimensional semantic map constructing method of the indoor environment based on deep learning
CN110320883A (en) * 2018-03-28 2019-10-11 上海汽车集团股份有限公司 A kind of Vehicular automatic driving control method and device based on nitrification enhancement
CN110781976A (en) * 2019-10-31 2020-02-11 重庆紫光华山智安科技有限公司 Extension method of training image, training method and related device
CN110795821A (en) * 2019-09-25 2020-02-14 的卢技术有限公司 Deep reinforcement learning training method and system based on scene differentiation

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105157697A (en) * 2015-07-31 2015-12-16 天津大学 Indoor mobile robot pose measurement system and measurement method based on optoelectronic scanning
CN110320883A (en) * 2018-03-28 2019-10-11 上海汽车集团股份有限公司 A kind of Vehicular automatic driving control method and device based on nitrification enhancement
CN109087303A (en) * 2018-08-15 2018-12-25 中山大学 The frame of semantic segmentation modelling effect is promoted based on transfer learning
CN109506658A (en) * 2018-12-26 2019-03-22 广州市申迪计算机系统有限公司 Robot autonomous localization method and system
CN109764876A (en) * 2019-02-21 2019-05-17 北京大学 The multi-modal fusion localization method of unmanned platform
CN110006435A (en) * 2019-04-23 2019-07-12 西南科技大学 A kind of Intelligent Mobile Robot vision navigation system method based on residual error network
CN110245567A (en) * 2019-05-16 2019-09-17 深圳前海达闼云端智能科技有限公司 Barrier-avoiding method, device, storage medium and electronic equipment
CN110243370A (en) * 2019-05-16 2019-09-17 西安理工大学 A kind of three-dimensional semantic map constructing method of the indoor environment based on deep learning
CN110795821A (en) * 2019-09-25 2020-02-14 的卢技术有限公司 Deep reinforcement learning training method and system based on scene differentiation
CN110781976A (en) * 2019-10-31 2020-02-11 重庆紫光华山智安科技有限公司 Extension method of training image, training method and related device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王大方: "基于深度强化学习的机器人导航研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111975769A (en) * 2020-07-16 2020-11-24 华南理工大学 Mobile robot obstacle avoidance method based on meta-learning
CN111988220B (en) * 2020-08-14 2021-05-28 山东大学 Multi-target disaster backup method and system among data centers based on reinforcement learning
CN111988220A (en) * 2020-08-14 2020-11-24 山东大学 Multi-target disaster backup method and system among data centers based on reinforcement learning
CN112304314A (en) * 2020-08-27 2021-02-02 中国科学技术大学 Distributed multi-robot navigation method
CN112114592B (en) * 2020-09-10 2021-12-17 南京大学 Method for realizing autonomous crossing of movable frame-shaped barrier by unmanned aerial vehicle
CN112114592A (en) * 2020-09-10 2020-12-22 南京大学 Method for realizing autonomous crossing of movable frame-shaped barrier by unmanned aerial vehicle
CN112130570A (en) * 2020-09-27 2020-12-25 重庆大学 Blind guiding robot of optimal output feedback controller based on reinforcement learning
CN112130570B (en) * 2020-09-27 2023-03-28 重庆大学 Blind guiding robot of optimal output feedback controller based on reinforcement learning
CN112270306A (en) * 2020-11-17 2021-01-26 中国人民解放军军事科学院国防科技创新研究院 Unmanned vehicle track prediction and navigation method based on topological road network
CN112270306B (en) * 2020-11-17 2022-09-30 中国人民解放军军事科学院国防科技创新研究院 Unmanned vehicle track prediction and navigation method based on topological road network
WO2022160430A1 (en) * 2021-01-27 2022-08-04 Dalian University Of Technology Method for obstacle avoidance of robot in the complex indoor scene based on monocular camera
CN112965081A (en) * 2021-02-05 2021-06-15 浙江大学 Simulated learning social navigation method based on feature map fused with pedestrian information
CN112965081B (en) * 2021-02-05 2023-08-01 浙江大学 Simulated learning social navigation method based on feature map fused with pedestrian information
CN112966591A (en) * 2021-03-03 2021-06-15 河北工业职业技术学院 Knowledge map deep reinforcement learning migration system for mechanical arm grabbing task
CN113093779A (en) * 2021-03-25 2021-07-09 山东大学 Robot motion control method and system based on deep reinforcement learning
CN113848750A (en) * 2021-09-14 2021-12-28 清华大学 Two-wheeled robot simulation system and robot system
CN114859940A (en) * 2022-07-05 2022-08-05 北京建筑大学 Robot movement control method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111367282B (en) 2022-06-07

Similar Documents

Publication Publication Date Title
CN111367282B (en) Robot navigation method and system based on multimode perception and reinforcement learning
US20220212693A1 (en) Method and apparatus for trajectory prediction, device and storage medium
Ma et al. Artificial intelligence applications in the development of autonomous vehicles: A survey
Le Mero et al. A survey on imitation learning techniques for end-to-end autonomous vehicles
Sauer et al. Conditional affordance learning for driving in urban environments
Van Brummelen et al. Autonomous vehicle perception: The technology of today and tomorrow
Hu et al. Safe local motion planning with self-supervised freespace forecasting
KR20210074366A (en) Autonomous vehicle planning and forecasting
Haavaldsen et al. Autonomous vehicle control: End-to-end learning in simulated urban environments
Sharma et al. Pedestrian intention prediction for autonomous vehicles: A comprehensive survey
Zhao et al. Autonomous driving system: A comprehensive survey
Wang et al. Imitation learning of hierarchical driving model: from continuous intention to continuous trajectory
US11556126B2 (en) Online agent predictions using semantic maps
Zhu et al. Learning autonomous control policy for intersection navigation with pedestrian interaction
Youssef et al. Comparative study of end-to-end deep learning methods for self-driving car
CN116448134B (en) Vehicle path planning method and device based on risk field and uncertain analysis
Chen Extracting cognition out of images for the purpose of autonomous driving
Tippannavar et al. SDR–Self Driving Car Implemented using Reinforcement Learning & Behavioural Cloning
Souza et al. Template-based autonomous navigation and obstacle avoidance in urban environments
Souza et al. Vision-based autonomous navigation using neural networks and templates in urban environments
EP4124995A1 (en) Training method for training an agent for controlling a controlled device, control method for controlling the controlled device, computer program(s), computer readable medium, training system and control system
US20210383213A1 (en) Prediction device, prediction method, computer program product, and vehicle control system
CN111975775B (en) Autonomous robot navigation method and system based on multi-angle visual perception
Li et al. RDDRL: a recurrent deduction deep reinforcement learning model for multimodal vision-robot navigation
Schörner et al. Towards Multi-Modal Risk Assessment

Legal Events

Code  Description
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant