CN111857107A - Auxiliary mobile robot navigation control system and method based on learning component library

Publication number: CN111857107A; granted publication: CN111857107B
Application number: CN202010522452.1A
Authority: CN (China); original language: Chinese (zh)
Inventors: 孙长银, 何子辰, 董璐, 陈启军, 王嘉伟
Applicant and current assignee: Tongji University
Legal status: Active (granted)

Classifications

    • G05D1/0088 - Control of position, course, altitude or attitude of land, water, air or space vehicles characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • G05B17/02 - Systems involving the use of models or simulators of said systems, electric
    • G05B23/02 - Electric testing or monitoring of control systems or parts thereof
    • G05D1/021 - Control of position or course in two dimensions specially adapted to land vehicles


Abstract

The invention discloses a navigation control system and method for auxiliary mobile robots based on a learning component library. The system comprises a learning component library containing an initialization component, an environment modeling component, a path planning component, a core algorithm component, a testing component, an optimization component, and a visualization component. The components interact with and call one another flexibly, so that, according to the type of mobile robot in use, closed-loop reinforcement learning training and visualization systems can be constructed quickly, whether simulated or deployed, with algorithm strategies suited to different navigation task scenarios.

Description

Auxiliary mobile robot navigation control system and method based on learning component library
Technical Field
The invention relates to a navigation control system and method for auxiliary mobile robots based on a learning component library, and belongs to the technical field of robot control.
Background
In recent years, with the development of robotics, function-assisting mobile robots have been widely used in fields such as agriculture, commerce, logistics, medical assistance, and the military industry. For example, during the COVID-19 epidemic in China, auxiliary mobile robots, by virtue of their autonomy, played an important role in hospital and community disinfection, express logistics delivery, body temperature screening, and intelligent inquiry in isolation areas, advancing the country's epidemic prevention and control.
An auxiliary mobile robot is an integrated system combining environment perception, autonomous localization, path planning, bottom-layer navigation control, and execution of specific auxiliary functions. Take a mobile robot performing a public-area disinfection task during an epidemic as an example. While carrying out disinfection work, the robot first acquires environmental information about the area to be disinfected through its onboard external sensors, such as monocular and binocular vision cameras, lidar, millimeter-wave radar, and ultrasonic sensors. It then estimates its global position and attitude in the current area by combining its internal sensors, such as inertial sensors and GPS. On the basis of these two steps, it applies a path planning algorithm, such as the artificial potential field method or a heuristic rapidly-exploring random tree, in combination with the specific task requirements, to plan an optimal path from the initial position to the target position. Finally, taking into account its own dynamics and kinematics, actuator characteristics, and chassis drive configuration, it performs accurate navigation tracking control of the planned trajectory through a bottom-layer navigation controller, so that the mobile robot travels along the pre-planned path.
However, current conventional navigation control methods lack a simulation platform specific to auxiliary mobile robots, and their configuration and training processes are complex, tedious, and unsystematic. Moreover, at the present stage each reinforcement learning navigation control algorithm is built for a specific robot and a specific scene, and the way the reinforcement learning environment is constructed differs between simulated and real scenes, so flexibility is lacking.
Disclosure of Invention
Aiming at these technical problems in conventional navigation control of auxiliary mobile robots, the invention provides a navigation control method based on a learning component library, which makes it convenient for users to rapidly build a closed-loop reinforcement learning control system according to their own requirements, and facilitates parameter tuning and performance optimization.
The invention adopts the following technical scheme.
In one aspect, the invention provides a navigation control system for auxiliary mobile robots based on a learning component library. The system includes a learning component library comprising an initialization component, an environment modeling component, a path planning component, a core algorithm component, a testing component, an optimization component, and a visualization component. The initialization component initializes the state space and action space corresponding to a specific mobile robot type and sets up a reward function. The environment modeling component reads and processes the sensor data carried by the mobile robot, determines the robot's global position, and, for simulation tasks, builds a virtual environment that interacts with the robot. The path planning component provides selectable path planning algorithms to realize optimal navigation path planning. The core algorithm component provides multiple reinforcement learning algorithms to choose from, and outputs controller commands either in cooperation with the bottom-layer control algorithm component or directly; after an action is executed, current information is obtained again through the environment modeling component, closing the reinforcement learning control loop. The testing component provides selectable perturbation methods to test the performance of the reinforcement learning algorithm determined in the core algorithm component. The optimization component provides selectable optimization algorithms to adjust chosen parameters of that reinforcement learning algorithm and thereby improve navigation control performance. The visualization component visualizes the output values of the core algorithm component and the testing component.
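As a non-limiting structural sketch of how the components above might be composed, consider the following; all class and method names are illustrative assumptions, not part of the disclosed system:

```python
# Minimal structural sketch of the seven-component library; every name
# here is an illustrative assumption, not the patent's actual API.
from dataclasses import dataclass
from typing import Any

@dataclass
class LearningComponentLibrary:
    init_comp: Any    # state/action spaces, reward function design
    env_comp: Any     # sensor processing, localization, simulated env
    planner: Any      # selectable path-planning algorithms
    core_algo: Any    # on-policy / off-policy / integrated-policy RL
    tester: Any       # perturbation-based performance tests
    optimizer: Any    # hyperparameter tuning, regularization
    visualizer: Any   # learning curves, tracking errors, actuator values

    def build_closed_loop(self, robot_type: str):
        """Wire the components into one RL closed loop for a robot type."""
        spaces, reward_fn = self.init_comp.initialize(robot_type)
        env = self.env_comp.build(robot_type, spaces, reward_fn)
        path = self.planner.plan(env.obstacles(), env.robot_pose(), env.goal())
        agent = self.core_algo.select_and_train(env, spaces, reward_fn, path)
        self.tester.perturb_and_evaluate(agent, env)
        self.visualizer.show(agent.metrics())
        return agent
```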
Furthermore, the core algorithm component comprises an on-policy module, an off-policy module, and an integrated-policy module. The on-policy module encapsulates on-policy reinforcement learning algorithms, and the off-policy module encapsulates off-policy reinforcement learning algorithms. The integrated-policy module encapsulates an integrated-policy algorithm: a data-driven reinforcement learning algorithm that combines on-policy and off-policy learning. The integrated-policy algorithm works as follows: the newly learned policy is fed back to the mobile robot system in time, and specific system data are collected, to improve the adaptability of the reinforcement learning algorithm; meanwhile, to respect the original characteristics of the system, the newly collected data are combined with previously replayed experience data, and the reinforcement learning algorithm is finally determined by learning again.
Further, the system also comprises a bottom-layer control algorithm component, which can be used directly as a baseline for comparison with the reinforcement learning algorithm, or combined with the upper-layer reinforcement learning algorithm to build a closed-loop reinforcement learning control system that maps states directly to actuator commands. Further, the environment modeling component comprises a sensor data processing module, a mobile robot positioning module, and a reinforcement learning environment modeling module. The sensor data processing module reads and processes the sensor data carried by the mobile robot; the mobile robot positioning module localizes the robot's global position in real time; the reinforcement learning environment modeling module builds the virtual environment that interacts with the mobile robot during simulation tasks.
Further, the optimization component provides selectable optimization algorithms, including regularization algorithms (L1 and L2 regularization and entropy regularization) and/or early stopping.
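For illustration only, the regularizers named above can be folded into a learning objective as sketched below; the coefficients and the shape of the loss are placeholder assumptions, not values from the patent:

```python
import numpy as np

def regularized_loss(policy_loss, weights, action_probs,
                     l1=0.0, l2=1e-4, entropy_coef=0.01):
    """Augment a scalar policy loss with L1/L2 weight penalties and an
    entropy bonus; early stopping would act outside this function, by
    halting training when a validation metric stops improving."""
    l1_term = l1 * sum(np.abs(w).sum() for w in weights)    # sparsity
    l2_term = l2 * sum((w ** 2).sum() for w in weights)     # weight decay
    # Entropy bonus: keeps the policy stochastic and discourages premature
    # convergence to an overfit deterministic policy.
    entropy = -np.sum(action_probs * np.log(action_probs + 1e-8))
    return policy_loss + l1_term + l2_term - entropy_coef * entropy

# Usage sketch with one random weight matrix and a 3-action policy.
W = [np.random.randn(8, 8)]
print(regularized_loss(1.23, W, np.array([0.7, 0.2, 0.1])))
```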
Furthermore, the path planning component and the core algorithm component each contain an evaluation function module, which provides performance evaluation functions for assessing the parameter tuning and algorithm selection of those two components.
In a second aspect, the invention provides a navigation control method for auxiliary mobile robots based on a learning component library, built on the system of the above technical solution. The method comprises the following steps: selecting, from a pre-built initialization component, the state space and action space corresponding to the specific mobile robot type, and setting a reinforcement learning reward function to complete initialization;
building a reinforcement learning simulation environment with a pre-built environment modeling component; obtaining the relative positions of obstacles and the mobile robot's own position through the environment modeling component, and selecting a required path planning algorithm from a pre-built path planning component to plan an optimal navigation path; adjusting the reward function of the navigation control algorithm according to the path planning result;
selecting and determining a reinforcement learning algorithm from a pre-built core algorithm component, combining the defined action space, state space, reward function, and reinforcement learning environment, and training with the selected core algorithm module; acting through the bottom-layer control module or through direct controller command output, then obtaining the relative positions of obstacles and the robot's own position again through the environment modeling component, and repeating these steps to complete controller command output and close the reinforcement learning control loop;
selecting a perturbation method from the testing component and testing the performance of the reinforcement learning algorithm chosen from the core algorithm component;
selecting and determining an optimization algorithm from the optimization component to adjust chosen parameters of the reinforcement learning algorithm determined by the core algorithm component, so as to improve navigation control performance;
and visualizing the output values of the core algorithm component and the testing component with the visualization component.
In a third aspect, the invention provides a navigation control method for auxiliary mobile robots based on a learning component library. The method is built on a system whose learning component library comprises an initialization component, an environment modeling component, a path planning component, a core algorithm component, a testing component, an optimization component, a visualization component, and a bottom-layer control algorithm component. The initialization component initializes the state space and action space corresponding to the specific mobile robot type and sets up a reward function; the path planning component provides selectable path planning algorithms to realize optimal navigation path planning; the core algorithm component provides multiple reinforcement learning algorithms to choose from, so that controller commands are output to close the reinforcement learning control loop; the testing component provides selectable perturbation methods to test the performance of the reinforcement learning algorithm determined by the core algorithm component; the optimization component provides selectable optimization algorithms to adjust chosen parameters of that algorithm and improve navigation control performance; the visualization component visualizes the output values of the core algorithm component and the testing component; and the bottom-layer control algorithm component provides a baseline for comparison with the reinforcement learning algorithm.
The method comprises the following steps:
selecting, from a pre-built initialization component, the state space and action space corresponding to the specific mobile robot type, and setting a reinforcement learning reward function to complete initialization;
calling the environment modeling component to obtain the sensor data carried by the mobile robot and the robot's global position data;
combining the defined action space, state space, and reward function with the onboard sensor data and the robot's global position data; selecting and determining a reinforcement learning algorithm from the pre-built core algorithm component; outputting controller commands either in cooperation with the bottom-layer control component or directly; and, after the action is executed, repeating the process using the sensor values from the environment modeling component to close the reinforcement learning control loop;
selecting a perturbation method from the testing component, performing algorithm evaluation and testing with it, feeding back the state observations output by the sensor processing module in real time, and judging whether the control requirements are met;
selecting and determining an optimization algorithm from the optimization component, and adjusting the chosen parameters of the reinforcement learning algorithm determined by the core algorithm component, until the mobile robot achieves the expected performance in the navigation control task; and visualizing the output values of the core algorithm component and the testing component with the visualization component.
Furthermore, the path planning component and the core algorithm component each contain a performance evaluation function module. The method further comprises determining a performance evaluation function with these modules, evaluating the parameter tuning and algorithm selection of the two components, and visualizing the evaluation results with the visualization component.
The invention has the following beneficial technical effects. All components interact with and call one another flexibly; during use, closed-loop reinforcement learning training and visualization systems suited to mobile robot navigation task scenarios, whether simulated or deployed, can be constructed quickly according to the robot type. The stability, robustness, and generalization of the configured algorithm can be tested through the testing component; if the algorithm needs optimization, the optimization component of the learning component library allows convenient and rapid parameter optimization and regularization, avoiding overfitting and improving algorithm performance. Meanwhile, if the robot's sensor configuration or drive configuration changes, the whole navigation control workflow need not be rebuilt: the corresponding component module is simply replaced, giving the approach good flexibility and universality.
Through the complete workflow among the components, the learning-component-based navigation and control method for auxiliary mobile robots can be applied to real mobile robot control, or used to run simulation tests of a robot's navigation control algorithm in a simulated environment. The method is flexible and universal, and can readily be applied to navigation control tasks for auxiliary mobile robots carrying various sensor schemes and drive configurations. In practical use, the robot can be driven through a navigation task by a conventional control algorithm, or a reinforcement learning environment can be set up rapidly for learning-based navigation control. The path planning algorithm, the reward function construction method, the core learning algorithm module, and so on can all be swapped modularly; the evaluation indices of each algorithm can be monitored conveniently; and a baseline algorithm can be established for comparison.
In addition, the navigation control learning component library provides a bottom-layer control algorithm component containing mainstream control algorithms, which makes performance comparison and verification of the learning algorithms convenient.
Drawings
FIG. 1 shows the overall architecture of the learning-component-library-based navigation control system for auxiliary mobile robots according to an embodiment of the invention;
FIG. 2 shows a first construction method of the learning-component-based navigation control method according to an embodiment of the invention;
FIG. 3 shows a second construction method of the learning-component-based navigation control method according to an embodiment of the invention;
FIG. 4 shows the architecture of the integrated-policy algorithm in an embodiment of the invention.
Detailed Description
The invention is further described with reference to the following figures and specific examples.
In a first embodiment, a navigation control system for auxiliary mobile robots based on a learning component library includes a pre-built learning component library for auxiliary mobile robot navigation control, comprising: an initialization component, an environment modeling component, a path planning component, a core algorithm component, a testing component, an optimization component, and a visualization component;
the initialization component initializes the state space and action space corresponding to a specific mobile robot type and sets up a reward function;
the environment modeling component reads and processes the sensor data carried by the mobile robot, determines the robot's global position, and builds the virtual environment that interacts with the robot during simulation tasks;
the path planning component provides selectable path planning algorithms to realize optimal navigation path planning; the core algorithm component provides multiple reinforcement learning algorithms to choose from, so that controller commands are output to close the reinforcement learning control loop; the testing component provides selectable perturbation methods to test the performance of the reinforcement learning algorithm determined by the core algorithm component; the optimization component provides selectable regularization and other optimization algorithms to adjust chosen parameters of that algorithm and improve navigation control performance; and the visualization component visualizes the output values of the core algorithm component and the testing component.
The learning component library for navigation control is a computer database: a set of standardized computer software modules for mobile robot navigation control. The library calls pre-packaged algorithms and modules according to the input information and finally returns the result of each component. The library provided by the invention can be applied directly to the navigation control of a real mobile robot, or its core reinforcement learning algorithm can serve as the upper-layer control link in a closed control loop, learning complex mobile robot behaviors and outputting reference quantities for the bottom-layer controller.
For the navigation control problem of a mobile robot tracking a planned path, the components of this embodiment are mainly of the following types.
The initialization component contains action space, state space, and reward function designs for mobile robots with different drive configurations. The environment modeling component contains the modules required to build an environment model. The path planning component contains different path planning algorithms. The core algorithm component contains on-policy, off-policy, and integrated-policy algorithms. The optimization component contains a regularization module that improves the robustness and generalization of the control algorithm and avoids overfitting. The testing component tests algorithm performance. The visualization component visualizes the various performance parameters.
For example, the path planning component takes as input the environment information output by the environment modeling component, the robot's position, and the target position; after a required path planning algorithm is selected, the planned path is obtained (a minimal sketch of one such planner follows).
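As a non-limiting illustration of one planner the component could provide, the artificial potential field method mentioned elsewhere in this disclosure can be sketched as follows; gains, distances, and the obstacle layout are made-up example values:

```python
import numpy as np

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=2.0, step=0.05):
    """One step of an artificial potential field planner: an attractive
    force pulls toward the goal; obstacles closer than d0 push back."""
    force = k_att * (goal - pos)                          # attractive term
    for obs in obstacles:
        d = np.linalg.norm(pos - obs)
        if 0.0 < d < d0:                                  # repulsion only nearby
            force += k_rep * (1.0 / d - 1.0 / d0) * (pos - obs) / d**3
    return pos + step * force

# Usage: iterate from the robot pose toward the goal around one obstacle.
pos, goal = np.array([0.0, 0.0]), np.array([10.0, 10.0])
obstacles = [np.array([5.0, 5.2])]
for _ in range(400):
    pos = apf_step(pos, goal, obstacles)
```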
For another example, the core algorithm component takes the selected algorithm type as input, calls the corresponding algorithm module for training, and returns the performance evaluation parameters in real time so that algorithm performance can be monitored during training.
In a specific embodiment, optionally, the initialization component mainly comprises a state space design module, an action space design module, and a reward function design module. The environment modeling component mainly comprises a vision sensor processing module, a lidar sensor processing module, a robot positioning sensor processing module, and a reinforcement learning environment modeling module. The path planning component comprises a heuristic path planning module, an artificial potential field path planning module, a machine learning path planning module, and the like. The core algorithm component comprises an on-policy algorithm module, an off-policy algorithm module, and an integrated-policy algorithm module. The optimization component comprises a hyperparameter optimization module, a regularization module, and the like; the regularization module packages commonly used regularization algorithms, such as L1/L2 regularization, entropy regularization, and early stopping, which can be added as needed to improve the generalization of the reinforcement learning algorithm. The perturbation (testing) component comprises a dynamic obstacle perturbation module, a wind perturbation module, a water-flow perturbation module, and the like. The visualization component comprises a learning curve visualization module, a navigation control error visualization module, an actuator value visualization module, and the like.
The components the invention provides in the learning component library are standardized computer software modules of learning algorithms for auxiliary mobile robot navigation control. Each component calls its pre-packaged algorithms and modules according to the input information and finally returns its result. Based on the system architecture provided by the invention, those skilled in the art can implement the construction of each component and the calls between components with the prior art according to practical requirements, that is, package an intelligent algorithm themselves, convert it into a standard module, and add it to the component with the corresponding function. The components in the learning component library can interact with and call one another.
Through the complete workflow between components, the navigation control learning component library of this embodiment can be applied to real mobile robot control, or used to run simulation tests of the robot's navigation control algorithm in a simulated environment. For simulated control of a mobile robot, a virtual environment can be built through the reinforcement learning environment modeling module of the environment modeling component; a user can directly test the overall performance of the core algorithm library, or plug in a self-designed reinforcement learning algorithm, quickly completing the construction of an algorithm training environment while avoiding the time and hardware cost of training directly on the real robot.
If the usage scenario is relatively simple, the library can be applied directly to real mobile robot control. On the one hand, the policy network outputs actuator commands from the observations of each sensor; to prevent actuator damage, each actuator command is threshold-limited, ensuring safety in real operation. On the other hand, based on the navigation control learning component library, the core reinforcement learning algorithm can serve as the upper-layer control link in the closed control loop, learning complex mobile robot behaviors and outputting reference quantities for the bottom-layer controller, so that the advantages of reinforcement learning are combined with a conventional closed-loop controller to guarantee the performance of the final algorithm.
Preferably, in the navigation control learning component library, the parameter tuning and algorithm selection of each component can be decided by that component's performance evaluation function, and each evaluation function can be visualized through the visualization module, making it convenient to monitor and evaluate algorithm performance.
In the path planning component, the final path planning result is evaluated by time and energy-consumption indices.
Optionally, in the core algorithm component, the final navigation control effect is evaluated by the final learning curve, tracking accuracy, and the actuator value variation curve.
Preferably, within each component, whether a module is used is decided by the task requirements, and modules can be added, deleted, and replaced flexibly for different tasks. For example, to compare the navigation control effects of the deep deterministic policy gradient algorithm and the integrated-policy algorithm, only the module in the core algorithm component needs to be swapped, and the comparison is made on the final evaluation indices.
In a second embodiment, building on the first, the navigation control system further comprises a bottom-layer control algorithm component providing a baseline for comparison with the reinforcement learning algorithm. This optional component contains several common control algorithm modules and is mainly used as a comparison baseline or combined with the core algorithm component to construct a closed-loop navigation control system. On the one hand, it directly provides a baseline against the reinforcement learning algorithm; on the other hand, it can be combined with the upper-layer reinforcement learning algorithm to build a closed-loop reinforcement learning control system from states directly to actuator commands: the upper-layer command (x, y, ψ, ...) output by reinforcement learning serves as the controller's input, and the controller, by calling the bottom-layer control algorithm component, outputs the actuator commands for tracking. This hierarchical architecture effectively reduces the data dimensionality of reinforcement learning and improves efficiency.
A third embodiment provides a navigation control system based on a learning component library (as shown in FIG. 3) in which the core algorithm component comprises an on-policy module, an off-policy module, and an integrated-policy module. The on-policy module encapsulates on-policy reinforcement learning algorithms, the off-policy module encapsulates off-policy reinforcement learning algorithms, and the integrated-policy module encapsulates the integrated-policy algorithm, a data-driven reinforcement learning algorithm that combines on-policy and off-policy learning.
In conventional navigation control methods, the data read from sensors undergo feature extraction, fusion, and state estimation, followed by bottom-layer actuator control or upper-layer task control. Such methods, including feedback linearization control, linear quadratic control, model predictive control, and backstepping control, have limitations in complex scenarios. For example, linearizing the motion model makes it hard to accurately describe the dynamics of a complex system; moreover, some nonlinear control methods rely on an accurate mathematical-physical model of the controlled object, which often requires extensive prior knowledge and expert experience, making controller design tedious and time-consuming. With the rapid development of artificial intelligence, deep reinforcement learning is now widely applied to agent control: control based on deep reinforcement learning avoids complicated data processing and directly outputs the action to execute through a policy network given the sensor observations. However, conventional deep reinforcement learning algorithms still have the following problems in robot control: 1. mainstream on-policy reinforcement learning algorithms adapt well to environmental change but depend on large amounts of real-time data, consuming enormous computing resources and converging slowly; 2. off-policy reinforcement learning algorithms are computationally efficient but, because they repeatedly sample old state-sequence data, adapt poorly to environmental change; 3. the algorithms tend to overfit a specific task and generalize weakly.
By providing the integrated-policy algorithm, this embodiment combines the advantages of on-policy and off-policy algorithms and improves the generalization capability of the reinforcement learning control algorithm. The integrated-policy algorithm works as follows: the newly learned policy is fed back to the mobile robot system in time, and specific system data are collected, to improve the adaptability of the reinforcement learning algorithm; meanwhile, to respect the original characteristics of the system, the newly collected data are combined with previously replayed experience data, and the reinforcement learning algorithm is finally determined by learning again. This is described in detail below.
The architecture of the integrated-policy algorithm in this embodiment is depicted in FIG. 4. Mainstream reinforcement learning algorithms are either on-policy or off-policy, and both suffer from the problems listed above; the integrated-policy algorithm provided by this embodiment combines the strengths of the two, as follows.
for example, the most typical abnormal strategy reinforcement Learning algorithm, Q-Learning, the update process of the Q value of the action state is as follows:
$$y = R(s) + \gamma \max_{a'} Q(s', a')$$

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ y - Q(s, a) \right]$$
where R(s) is the reward function, γ the discount factor, α the learning rate, and (s', a') the optimal state-action pair. When computing the expected return of the next state, the algorithm always uses the maximal Q value to select the best action, but the current behavior policy does not necessarily choose that action, so the algorithm is indifferent to what the behavior policy is. Because the policy that generates the samples differs from the policy being learned, this is called the off-policy mechanism. Its advantages are that the global optimum can be reached and generality is strong, but the training process is tortuous and convergence is slow.
Consider next SARSA, the most typical on-policy reinforcement learning algorithm, whose Q value is updated as follows:
$$y = R(s) + \gamma\, Q(s', a')$$

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ y - Q(s, a) \right]$$
Here a' is the action actually chosen by the current policy in s'. In this typical on-policy reinforcement learning algorithm, the policy used to update the network parameters is the same as the policy used to generate the samples. This is relatively direct and fast to compute, but because the policy exploits only the currently known best choice, the optimal solution may never be learned and the algorithm can fall into a local optimum.
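For illustration only, the two update rules above differ in a single line when written as tabular update functions; the state and action indices in the usage sketch are made-up example values:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy: bootstraps with the greedy max over next actions,
    regardless of which action the behavior policy will actually take."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy: bootstraps with the value of a_next, the action the
    current policy really chose in s_next."""
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

# Both updates share one tabular Q; only the bootstrap target differs.
Q = np.zeros((4, 2))                       # 4 states, 2 actions
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
sarsa_update(Q, s=0, a=1, r=1.0, s_next=2, a_next=0)
```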
The integrated-policy algorithm encapsulated by the integrated-policy module of this embodiment is built on these two classes of algorithms. Its main flow is as follows:
S41, initialize the states, actions, and related quantities; S42, execute initial actions and fill an experience pool, as in off-policy algorithms such as DQN;
S43, the main body of the reinforcement learning algorithm performs policy evaluation and policy improvement, and S44 judges whether convergence has been reached; if not, S45 is executed. What distinguishes the integrated-policy algorithm is that, besides the normal filling of the experience pool after an action is executed and a reward received, specific data are extracted from the most recent state sequence and added to the pool to form new sample data; the steps above are then repeated until convergence. This both reduces the correlation between data and reuses previously useful data, combining the algorithmic advantages of off-policy and on-policy methods and thereby improving convergence performance.
Through the integrated-policy module provided by the core algorithm component of this embodiment, the integrated-policy mechanism adapts more strongly to environmental change than an off-policy algorithm and, compared with an on-policy algorithm, offers better computational efficiency and convergence performance.
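As a non-limiting sketch of how the S41-S45 flow above might be realized, the training loop below mixes the freshest on-policy trajectory into each off-policy replay batch; the `env` and `agent` interfaces, the pool size, and the mixing ratio are all assumptions, not details disclosed by the patent:

```python
import random
from collections import deque

def integrated_policy_training(env, agent, episodes=500,
                               batch_size=64, recent_ratio=0.5):
    """Hypothetical integrated-policy loop: an experience pool supports
    off-policy reuse, while part of every batch is drawn from the most
    recent state sequence for on-policy freshness."""
    replay_pool = deque(maxlen=100_000)          # S42: experience pool
    for _ in range(episodes):                    # S41 handled by env.reset()
        s, done, trajectory = env.reset(), False, []
        while not done:
            a = agent.act(s)                     # newest policy acts at once
            s_next, r, done = env.step(a)
            transition = (s, a, r, s_next, done)
            replay_pool.append(transition)       # normal pool filling
            trajectory.append(transition)        # keep latest state sequence
            s = s_next
        # S45: mix the freshest on-policy data into each sampled batch.
        n_recent = min(int(batch_size * recent_ratio), len(trajectory))
        batch = random.sample(trajectory, n_recent)
        batch += random.sample(list(replay_pool),
                               min(batch_size - n_recent, len(replay_pool)))
        agent.update(batch)                      # S43: evaluate + improve
        if agent.converged():                    # S44: stop on convergence
            return agent
    return agent
```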
Because the integrated-policy module packages a data-driven integrated-policy reinforcement learning algorithm, it can be applied not only to general auxiliary mobile robot scenarios but also to complex scenarios with strong nonlinearity and changing environments. Meanwhile, the stability, robustness, and generalization of the algorithm can be tested conveniently through the testing component, and the algorithm can be tuned toward optimality through linkage with the optimization component.
Other features and advantages of the present invention will become more apparent from the detailed description of the embodiments of the present invention when taken in conjunction with the accompanying drawings.
As can be seen from FIG. 1, the navigation control system based on the learning component library has eight components in total. S11 is the initialization component, which includes the S111 state space design module, the S112 action space design module, and the S113 reward function design module. State space and action space design is the first step of the reinforcement learning workflow; the spaces can be designed as discrete or continuous according to the task requirements. Reward function design must be combined with the specific planned path and mainly takes the forms of final-state reward, single-step reward, continuous reward, nonlinear reward, and the like.
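For illustration only, the four reward forms just named could look as follows for a path-tracking task; the gains and thresholds are made-up example values, not ones from the patent:

```python
import numpy as np

def final_state_reward(reached_goal, collided):
    """Final-state reward: only the episode outcome is scored."""
    return 100.0 if reached_goal else (-100.0 if collided else 0.0)

def single_step_reward(dist_prev, dist_now):
    """Single-step reward: did this step move closer to the next waypoint?"""
    return 1.0 if dist_now < dist_prev else -1.0

def continuous_reward(cross_track_error):
    """Continuous reward: dense penalty proportional to tracking error."""
    return -abs(cross_track_error)

def nonlinear_reward(cross_track_error, sigma=0.5):
    """Nonlinear reward: Gaussian shaping that flattens far from the path."""
    return float(np.exp(-cross_track_error**2 / (2.0 * sigma**2)))
```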
S12 is the environment modeling component, comprising the S121 sensor data processing module, the S122 mobile robot positioning module, and the S123 reinforcement learning environment modeling module. The sensor data processing module reads and processes the sensor data carried by the mobile robot; for a vision sensor, for example, observations can be enhanced by processing algorithms such as denoising and defogging. The mobile robot positioning module localizes the robot's global position in real time. The reinforcement learning environment modeling module builds the virtual environment that interacts with the agent during simulation tasks.
S13 is the path planning component, which aims to plan the robot's optimal movement path in real time from the environment information according to the target task requirements. It mainly comprises several commonly used path planning algorithm modules, such as the S131 heuristic path planning module, the S132 artificial potential field path planning module, and the S133 machine learning path planning module.
S14 is the core algorithm component, mainly comprising three modules: the S141 on-policy algorithm module, the S142 off-policy algorithm module, and the S143 integrated-policy algorithm module. The component packages various reinforcement learning algorithms and integrates the data-driven integrated-policy reinforcement learning algorithm, which combines the advantages of on-policy and off-policy learning to handle strongly nonlinear task scenarios while adapting well to environmental change.
S15 is a visualization component, which includes an S151 path planning visualization module, an S152 learning curve visualization module, an S153 navigation control error visualization module, and an S154 actuator numerical visualization module.
S16 is the optimization component, mainly used to optimize the stability and robustness of the algorithm and improve its generalization capability; it includes, for example, the S161 parameter optimization module and the S162 regularization module.
S17 is the testing component, which tests algorithm performance by adding perturbations to the environment, e.g., via the S171 dynamic obstacle module, the S172 wind perturbation module, and the S173 water-flow perturbation module.
S18 is the bottom-layer control algorithm component. It can serve as a baseline for comparison with the reinforcement learning algorithm, be used directly for real mobile robot control, or be combined with the reinforcement learning algorithm component to improve algorithm performance on a real mobile robot. It mainly comprises the S181 linear quadratic optimal control module, the S182 model predictive control module, and the S183 feedback linearization control module.
In a fourth embodiment, a navigation control method for auxiliary mobile robots based on a learning component library builds on the navigation control system described above. The method comprises: establishing the navigation control learning component library for the auxiliary mobile robot; selecting, from the initialization component, the state space and action space corresponding to the robot type according to the characteristics of its drive configuration, sensor scheme, and so on; constructing a simulation environment according to the real usage scenario, selecting the required path planning algorithm, and planning the optimal navigation path; setting up the reward function according to the characteristics of the actual task; selecting, from the core algorithm component, an on-policy or off-policy algorithm, or the integrated-policy algorithm combining the advantages of both, as the learning algorithm, and configuring its hyperparameters; selecting the required regularization method from the optimization component according to the perturbations of the usage scenario; checking the training effect, and adding perturbation modules as required by the scenario to test the stability, robustness, and generalization of the selected algorithm; and, according to the control requirements, using the optimization component to adjust the main component parameters and improve navigation control performance.
The method of this embodiment is based directly on the core algorithm component of the learning component library and outputs controller commands directly. It is described below with reference to FIG. 2, taking as an example a mobile disinfection robot equipped with a vision sensor, a lidar sensor, and a positioning sensor, which autonomously moves to a target position in an indoor public area and disinfects it.
As shown in FIG. 2, the construction steps for completing closed-loop reinforcement learning control by outputting controller commands directly through the core algorithm component of the learning component library are as follows:
Step S21: according to the chassis configuration and drive mode of the mobile disinfection robot, and combining the motion model, initialize the robot's state space and action space using the pre-built initialization component;
Step S22 splits into two cases. The first is simulation research on the disinfection robot's navigation control algorithm; in this case it suffices to build the reinforcement learning environment from the motion model before moving on.
If instead the method is applied to a real control scenario, the sensor data processing module and the positioning module must be called to obtain the environment information and state observations of the disinfection mobile robot, while acquiring the positioning signal and updating the state information;
Step S23: from the environment information, obtain the relative positions of obstacles and the disinfection robot's own position through the environment modeling component, and call the pre-built path planning component to obtain the optimal path;
Step S24-A: adjust the reward function of the navigation control algorithm, set up via the environment modeling component, according to the path planning result;
Step S24-B: combining the defined action space, state space, reward function, and reinforcement learning environment, select and determine a reinforcement learning algorithm from the pre-built core algorithm component, train it, and complete closed-loop reinforcement learning control through the bottom-layer control module or by outputting controller commands directly;
Step S25 (optional): select a mainstream control algorithm from the bottom-layer control algorithm component as a baseline for final comparison;
Step S26: test and evaluate the result of S24; for example, in a simulation environment, the behavior of the disinfection robot when encountering a pedestrian can be tested with the dynamic obstacle module of the testing component;
Step S27: verify whether the navigation control effect meets the requirements; if not, the reinforcement learning algorithm determined by the core algorithm component can be optimized through the optimization component to improve task execution.
Within the same task, S24-S28 are performed repeatedly until the disinfection mobile robot achieves the expected performance in the navigation control task. The visualization component is used to visualize the output values of the core algorithm component and the testing component, so that the learning and training process can be monitored in real time.
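As a non-limiting sketch of how the FIG. 2 workflow might be driven end to end, the script below strings the steps together; every component interface, method name, and argument here is an assumption made for illustration:

```python
# Hypothetical driver for the FIG. 2 workflow (direct controller output).
def build_direct_rl_control(lib, robot_cfg, simulate=True):
    spaces, reward_fn = lib.init_comp.initialize(robot_cfg)       # S21
    env = (lib.env_comp.build_sim(robot_cfg.motion_model)         # S22: sim,
           if simulate else lib.env_comp.build_real(robot_cfg))   # or real robot
    path = lib.planner.plan(env.obstacles(), env.robot_pose(),    # S23
                            env.goal())
    reward_fn = lib.init_comp.shape_reward(reward_fn, path)       # S24-A
    agent = lib.core_algo.train(env, spaces, reward_fn)           # S24-B
    baseline = lib.bottom_ctrl.baseline("MPC")                    # S25 (optional)
    lib.tester.perturb(env, "dynamic_obstacle")                   # S26
    while not lib.tester.meets_requirement(agent, env):           # S27: iterate
        agent = lib.optimizer.tune(agent, env)                    # repeat loop
        lib.visualizer.show(agent.metrics(), baseline)            # monitoring
    return agent
```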
Embodiment six provides a navigation control method based on the learning component library, building on the system of embodiment three. As shown in FIG. 3, the construction steps for combining the core algorithm of the learning component library with a conventional control method into a closed-loop control learning component system are as follows:
Step S31: initialize the robot's state space and action space using the pre-built initialization component;
Step S32: call the environment modeling component to obtain the sensor data carried by the mobile robot and the robot's global position data (optionally, the environment modeling component includes a sensor data processing module and a mobile robot positioning module; the onboard sensor data are obtained through the former and the global position data through the latter);
Step S33: combining the task requirements, design the reward function using the environment modeling component, and determine the core algorithm through the core algorithm component; the core algorithm takes the observations as input, concentrates on learning the robot's motion strategy, and its output serves as the reference input of the conventional controller;
Steps S34 and S35: add the conventional controller and construct the navigation control closed loop; output controller commands in combination with the bottom-layer control algorithm component or directly; after executing a command, obtain the current mobile robot information again through the environment modeling module; repeat these steps to complete closed-loop reinforcement learning control;
meanwhile, perform algorithm evaluation and testing with the testing component, feeding back the state observations output by the sensor processing module in real time; Step S36: judge whether the control requirements are met;
Step S37: building on the previous step, if optimization must continue, call the optimization component for parameter optimization and regularization to refine the algorithm and improve task execution.
Likewise, S33-S38 are performed repeatedly until the disinfection mobile robot achieves the expected performance in the navigation control task, and the visualization component is used to visualize the output values of the core algorithm component and the testing component.
The method of combining the bottom-layer control algorithm component is as follows: the upper-layer command (x, y, ψ, ...) output by reinforcement learning serves as the controller's input, and the controller, by calling the bottom-layer control algorithm component, outputs the actuator commands for tracking. This hierarchical architecture effectively reduces the data dimensionality of reinforcement learning and improves efficiency. The algorithms packaged in the bottom-layer control algorithm component may include LQR, PID, MPC, or backstepping.
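For illustration only, the hierarchical arrangement above can be sketched with a PID standing in for the bottom-layer controller; the policy interface, the state layout, and the command threshold are assumptions, and the PID here tracks only the heading reference for brevity:

```python
import numpy as np

class PID:
    """Minimal PID loop standing in for the bottom-layer controller."""
    def __init__(self, kp=1.5, ki=0.1, kd=0.05, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def control(self, err):
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

def hierarchical_step(policy, low_level, state):
    """The RL policy emits the upper-layer (x, y, psi) reference; the
    bottom-layer controller turns the heading error into an actuator
    command, threshold-limited for safety as described earlier."""
    x_ref, y_ref, psi_ref = policy(state)       # upper-layer RL command
    err = psi_ref - state["psi"]                # heading tracking error
    cmd = low_level.control(err)                # actuator command
    return float(np.clip(cmd, -1.0, 1.0))      # actuator safety threshold
```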
In addition, it should be noted that, compared with the first construction method, the bottom-layer control component in FIG. 2 serves only as a link for performance comparison, whereas in FIG. 3 the bottom-layer control algorithm component is a link in the whole learning closed loop, outputting the bottom-layer control commands. The second method can thus combine the advantages of conventional control and reinforcement learning: it reduces the dimensionality of the state and action spaces, lets the reinforcement learning concentrate on learning complex behavior strategies, and draws on the strengths of mainstream conventional control algorithms, improving stability and algorithm performance; on the other hand, introducing the conventional control algorithm increases the overall algorithm complexity. In particular, with the components of the control learning component library, a variety of closed-loop learning control systems beyond these two can be constructed as the task requires.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. An auxiliary mobile robot navigation control system based on a learning component library, characterized by comprising the learning component library, wherein the learning component library comprises: an initialization component, an environment modeling component, a path planning component, a core algorithm component, a testing component, an optimization component and a visualization component; the initialization component is used for completing initialization of the state space and action space corresponding to a specific mobile robot type and setting up a reward function; the environment modeling component is used for reading and processing the sensor data carried by the mobile robot, determining the global position data for positioning the robot, and establishing the virtual environment that interacts with the mobile robot when a simulation task is performed; the path planning component is used for providing selectable path planning algorithms to realize an optimal navigation path; the core algorithm component is used for providing reinforcement learning algorithms with a plurality of strategies, so as to output controller instructions and complete reinforcement learning closed-loop control; the testing component is used for providing selectable disturbance methods in the simulation environment, so as to test the performance of the reinforcement learning algorithm determined by the core algorithm component; the optimization component is used for providing selectable optimization algorithms to adjust selected parameters of the reinforcement learning algorithm determined by the core algorithm component, so as to improve the performance of the navigation control algorithm; the visualization component is used for visualizing the output values of the core algorithm component and the testing component during simulation or actual learning tasks.
2. The system of claim 1, wherein the core algorithm component comprises an on-policy module, an off-policy module and an integrated strategy module; the on-policy module is used for encapsulating on-policy reinforcement learning algorithms, and the off-policy module is used for encapsulating off-policy reinforcement learning algorithms; the integrated strategy module is used for encapsulating an integrated strategy algorithm, which is a data-driven reinforcement learning algorithm combining on-policy and off-policy learning.
3. The system of claim 2, wherein the integrated strategy algorithm comprises: feeding the newly learned strategy back to the mobile robot system and collecting system-specific data to optimize the adaptability of the reinforcement learning algorithm; and combining the newly collected data with past replayed experience data and learning again, so as to finally determine the reinforcement learning algorithm.
4. The system of claim 1, wherein the system further comprises a bottom-layer control algorithm component, which can be used directly as a benchmark for comparison with the reinforcement learning algorithm, or combined with the upper-layer reinforcement learning algorithm to build a closed-loop reinforcement learning control system that maps states directly to actuator commands.
5. The system of claim 1, wherein the environment modeling component comprises a sensor data processing module, a mobile robot positioning module and a reinforcement learning environment modeling module; the sensor data processing module is used for reading and processing the sensor data carried by the mobile robot, the mobile robot positioning module is used for positioning the robot's global position in real time, and the reinforcement learning environment modeling module is used for establishing the virtual environment that interacts with the mobile robot when a simulation task is performed.
6. The system of claim 1, wherein the selectable optimization algorithms provided by the optimization component include L1 and L2 regularization, entropy regularization and/or early stopping.
7. The system of claim 1, wherein the path planning component and the core algorithm component are each provided with a performance evaluation function module for providing a performance evaluation function, so as to evaluate the parameter adjustment and algorithm selection of the path planning component and the core algorithm component.
8. An auxiliary mobile robot navigation control method based on a learning component library, characterized in that the method is based on the auxiliary mobile robot navigation control system of the learning component library; the system includes the learning component library, which comprises: an initialization component, an environment modeling component, a path planning component, a core algorithm component, a testing component, an optimization component, a visualization component and a bottom-layer control algorithm component; the initialization component is used for completing initialization of the state space and action space corresponding to a specific mobile robot type and setting up a reward function; the environment modeling component is used for reading and processing the sensor data carried by the mobile robot, determining the global position data for positioning the robot, and establishing the virtual environment that interacts with the mobile robot when a simulation task is performed; the path planning component is used for providing selectable path planning algorithms to realize optimal navigation path planning; the core algorithm component is used for providing a plurality of reinforcement learning algorithms for selection, so as to output controller instructions and complete reinforcement learning closed-loop control; the testing component is used for providing selectable perturbation methods for performance testing under different working conditions, so as to test the performance of the reinforcement learning algorithm determined by the core algorithm component; the optimization component is used for providing selectable optimization algorithms to adjust selected parameters of the reinforcement learning algorithm determined by the core algorithm component, so as to improve the performance of the navigation control algorithm; the visualization component is used for visualizing the output values of the core algorithm component and the testing component; the bottom-layer control algorithm component is used for providing a benchmark for comparison with the reinforcement learning algorithm;
The method comprises the following steps:
selecting a state space and an action space corresponding to a specific mobile robot type from a pre-constructed initialization component, and setting a reinforcement learning reward function to complete initialization;
constructing a reinforcement learning simulation environment using the pre-constructed environment modeling component; acquiring the relative positions of obstacles and the position of the mobile robot through the environment modeling component, and selecting a required path planning algorithm from the pre-constructed path planning component to plan an optimal navigation path; adjusting the reward function of the navigation control algorithm according to the path planning result;
selecting and determining a reinforcement learning algorithm from the pre-constructed core algorithm component, combining the defined action space, state space, reward function and reinforcement learning simulation environment, and training with the selected core algorithm module; outputting actions either through the bottom-layer control module or directly as controller instructions, then obtaining the relative positions of obstacles and the position of the mobile robot again through the environment modeling component, and repeating the above steps to complete the reinforcement learning closed-loop control;
selecting a perturbation method from the testing component, and testing the performance of the reinforcement learning algorithm determined from the core algorithm component;
selecting and determining an optimization algorithm from the optimization component to adjust selected parameters of the reinforcement learning algorithm determined by the core algorithm component, so as to improve the performance of the navigation control algorithm;
and using the visualization component to visualize the output values of the core algorithm component and the testing component, so as to monitor the learning and training process in real time.
9. An auxiliary mobile robot navigation control method based on a learning component library, characterized in that the method is based on an auxiliary mobile robot navigation control system of the learning component library; the system comprises the learning component library, which comprises: an initialization component, an environment modeling component, a path planning component, a core algorithm component, a testing component, an optimization component, a visualization component and a bottom-layer control algorithm component; the initialization component is used for completing initialization of the state space and action space corresponding to a specific mobile robot type and setting up a reward function; the environment modeling component is used for reading and processing the sensor data carried by the mobile robot, determining the global position data for positioning the robot, and establishing the virtual environment that interacts with the mobile robot when a simulation task is performed; the path planning component is used for providing selectable path planning algorithms to realize optimal navigation path planning; the core algorithm component is used for providing a plurality of reinforcement learning algorithms for selection, so as to output controller instructions and complete reinforcement learning closed-loop control; the testing component is used for providing selectable perturbation methods for performance testing under different working conditions, so as to test the performance of the reinforcement learning algorithm determined by the core algorithm component; the optimization component is used for providing selectable optimization algorithms to adjust selected parameters of the reinforcement learning algorithm determined by the core algorithm component, so as to improve the performance of the navigation control algorithm; the visualization component is used for visualizing the output values of the core algorithm component and the testing component; the bottom-layer control algorithm component is used for providing a benchmark for comparison with the reinforcement learning algorithm;
The method comprises the following steps:
selecting a state space and an action space corresponding to a specific mobile robot type from a pre-constructed initialization component, and setting a reinforcement learning reward function to complete initialization;
calling the environment modeling component to obtain the sensor data carried by the mobile robot and the global position data of the mobile robot;
combining the defined action space, state space and reward function with the sensor data carried by the mobile robot and the global position data of the mobile robot, selecting and determining a reinforcement learning algorithm from the pre-constructed core algorithm component, acting either through the bottom-layer control algorithm component or by directly outputting controller instructions, obtaining the current mobile robot information again through the environment modeling module after the instructions are executed, and repeating the above steps to complete the reinforcement learning closed-loop control;
selecting a perturbation method from the testing component, performing algorithm evaluation and testing with the testing component, feeding back the output state observations of the sensor processing module in real time, and judging whether the control requirements are met;
selecting and determining an optimization algorithm from the optimization component, and adjusting selected parameters of the reinforcement learning algorithm determined by the core algorithm component until the mobile robot achieves the preset performance in the navigation control task;
and visualizing the output values of the core algorithm component and the testing component using the visualization component.
10. The method of claim 9, wherein performance evaluation function modules are respectively provided in the path planning component and the core algorithm component; the method further comprises: determining a performance evaluation function using the performance evaluation function modules, performing performance evaluation on the parameter adjustment and algorithm selection of the path planning component and the core algorithm component, and visualizing the evaluation results of the performance evaluation function using the visualization component.
CN202010522452.1A 2020-06-10 2020-06-10 Auxiliary mobile robot navigation control system and method based on learning component library Active CN111857107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010522452.1A CN111857107B (en) 2020-06-10 2020-06-10 Auxiliary mobile robot navigation control system and method based on learning component library


Publications (2)

Publication Number Publication Date
CN111857107A true CN111857107A (en) 2020-10-30
CN111857107B CN111857107B (en) 2021-08-31

Family

ID=72987572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010522452.1A Active CN111857107B (en) 2020-06-10 2020-06-10 Auxiliary mobile robot navigation control system and method based on learning component library

Country Status (1)

Country Link
CN (1) CN111857107B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799179A (en) * 2012-07-06 2012-11-28 山东大学 Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning
US20180012137A1 (en) * 2015-11-24 2018-01-11 The Research Foundation for the State University New York Approximate value iteration with complex returns by bounding
KR20190041840A (en) * 2017-10-13 2019-04-23 네이버랩스 주식회사 Controlling mobile robot based on asynchronous target classification
CN109255443A (en) * 2018-08-07 2019-01-22 阿里巴巴集团控股有限公司 The method and device of training deeply learning model
CN109407676A (en) * 2018-12-20 2019-03-01 哈尔滨工业大学 The moving robot obstacle avoiding method learnt based on DoubleDQN network and deeply
CN109782600A (en) * 2019-01-25 2019-05-21 东华大学 A method of autonomous mobile robot navigation system is established by virtual environment
CN110658816A (en) * 2019-09-27 2020-01-07 东南大学 Mobile robot navigation and control method based on intelligent assembly
CN110764415A (en) * 2019-10-31 2020-02-07 清华大学深圳国际研究生院 Gait planning method for leg movement of quadruped robot

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YIN-HAO WANG et al.: "Backward Q-learning: The combination of Sarsa algorithm and Q-learning", Engineering Applications of Artificial Intelligence *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948100A (en) * 2021-05-13 2021-06-11 南京宇天智云仿真技术有限公司 Multi-moving-body simulation system
CN113365370A (en) * 2021-05-24 2021-09-07 内蒙古工业大学 Intelligent mobile system based on LoRa technique
CN114858158A (en) * 2022-04-26 2022-08-05 河南省吉立达机器人有限公司 Mobile robot repositioning method based on deep learning
CN115016503A (en) * 2022-07-12 2022-09-06 阜阳莱陆智能科技有限公司 Scene simulation test system and device for disinfection robot
CN116540701A (en) * 2023-04-19 2023-08-04 广州里工实业有限公司 Path planning method, system, device and storage medium
CN116540701B (en) * 2023-04-19 2024-03-05 广州里工实业有限公司 Path planning method, system, device and storage medium
CN117114088A (en) * 2023-10-17 2023-11-24 安徽大学 Deep reinforcement learning intelligent decision platform based on unified AI framework
CN117114088B (en) * 2023-10-17 2024-01-19 安徽大学 Deep reinforcement learning intelligent decision platform based on unified AI framework

Also Published As

Publication number Publication date
CN111857107B (en) 2021-08-31

Similar Documents

Publication Publication Date Title
CN111857107B (en) Auxiliary mobile robot navigation control system and method based on learning component library
Marchesini et al. Discrete deep reinforcement learning for mapless navigation
Wen et al. Path planning for active SLAM based on deep reinforcement learning under unknown environments
US20220326664A1 (en) Improved machine learning for technical systems
Di Mario et al. A comparison of PSO and reinforcement learning for multi-robot obstacle avoidance
Zhao et al. TaskNet: A neural task planner for autonomous excavator
Liu et al. Robotic navigation based on experiences and predictive map inspired by spatial cognition
Marah et al. An architecture for intelligent agent-based digital twin for cyber-physical systems
Gray et al. Advanced robotics & intelligent machines
Seward et al. Safe and effective navigation of autonomous robots in hazardous environments
Chen et al. Learning trajectories for visual-inertial system calibration via model-based heuristic deep reinforcement learning
Zou et al. A neurobiologically inspired mapping and navigating framework for mobile robots
Lidard et al. Risk-Calibrated Human-Robot Interaction via Set-Valued Intent Prediction
Cubuktepe et al. Shared control with human trust and workload models
Nasrallah et al. Elevating Mobile Robotics: Pioneering Applications of Artificial Intelligence and Machine Learning.
Patel et al. Dream: Decentralized reinforcement learning for exploration and efficient energy management in multi-robot systems
WO2017195257A1 (en) Electronic control device and method for building numerical model
RU2816639C1 (en) Method for creating controllers for controlling walking robots based on reinforcement learning
Yousefi et al. Hierarchical Planning and Policy Shaping Shared Autonomy for Articulated Robots
Vougioukas et al. Combining reactive and deterministic behaviours for mobile agricultural robots
CN116619349B (en) Strategy network training method, robot control method, device and equipment
Dettmann et al. Towards lifelong learning of optimal control for kinematically complex robots
Marco-Valle Bayesian Optimization in Robot Learning-Automatic Controller Tuning and Sample-Efficient Methods
Parasuraman et al. Behaviour based mobile robot navigation technique using AI system: Experimental investigation on active media pioneer robot
Christiansen Co-modelling of Agricultural Robotic Systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant