CN114905505B - Navigation control method, system and storage medium of mobile robot


Info

Publication number
CN114905505B
CN114905505B (application CN202210383369.XA)
Authority
CN
China
Prior art keywords
navigation control
control model
training
mobile robot
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210383369.XA
Other languages
Chinese (zh)
Other versions
CN114905505A (en)
Inventor
余淼盈
杨尚东
陈蕾
王昱川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202210383369.XA
Publication of CN114905505A
Application granted
Publication of CN114905505B
Legal status: Active

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1628 Programme controls characterised by the control loop
    • B25J9/163 Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a navigation control method, system and storage medium for a mobile robot in the field of robot navigation. The method comprises: adjusting the use sequence of the sub-strategies in a navigation control model according to target task data in the real environment, and navigating the mobile robot by using the navigation control model. The training process of the navigation control model comprises: constructing the navigation control model by using a hierarchical reinforcement learning algorithm, and introducing an LSTM network into the model as a track coding network; training the navigation control model through a training data set, and performing meta-learning training on the LSTM track coding network through a meta-training data set; and iterating the updates repeatedly until the loss function converges to obtain the final navigation control model. Because the sub-strategies are used in a task-specific order, the migration of the learned navigation control model to the actual environment is simplified and the real-time performance of the navigation control model is improved.

Description

Navigation control method, system and storage medium of mobile robot
Technical Field
The invention belongs to the field of robot navigation, and particularly relates to a navigation control method, a navigation control system and a storage medium of a mobile robot.
Background
In recent years, reinforcement learning has attracted wide attention with the rise of artificial intelligence. In particular, deep reinforcement learning algorithms, which combine reinforcement learning with deep learning, have achieved great breakthroughs in many fields. Reinforcement learning aims to let an agent sample in its environment and learn autonomously to make correct behavior decisions, and the learned policies can be migrated to reality to help people solve practical problems.
Navigation control is a technology that guides a mobile robot to a target position without collision by planning its movement direction and displacement. It is one of the basic functions of a mobile robot and one of the core research topics in the field of robot control. Traditional path planning algorithms depend on a global high-precision map and require complex modeling and accurate positioning; their computational efficiency decreases as the complexity of the environment increases, so their real-time performance is weak. Meanwhile, a reinforcement learning agent requires large-scale exploration and sampling in the training environment, and when the learned model is applied to an actual environment, all parameters of the model must be readjusted on data from that environment, so the learning efficiency is low.
Disclosure of Invention
The invention aims to provide a navigation control method, system and storage medium for a mobile robot, in which a hierarchical meta-reinforcement learning algorithm is used to construct a navigation control model and the sub-strategies are used in a task-specific order, so that the migration of the learned navigation control model to the actual environment is simplified and the real-time performance of the navigation control model is improved.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
The first aspect of the present invention provides a navigation control method for a mobile robot, comprising:
controlling the mobile robot to acquire target task data in a real environment by using the trained navigation control model;
Adjusting the use sequence of sub-strategies in the navigation control model according to the target task data, and navigating the mobile robot by using the navigation control model;
the training process of the navigation control model comprises the following steps:
constructing a navigation control model by using a hierarchical reinforcement learning algorithm, and introducing an LSTM network into the navigation control model to serve as a track coding network;
Building a training environment of a navigation control model and a mobile robot model, and controlling the mobile robot to interact with the training environment through the navigation control model to obtain a plurality of groups of training data sets;
Training the navigation control model through the training data set to obtain an updated navigation control model, controlling the mobile robot model to interact with the training environment again by using the updated navigation control model to obtain multiple groups of meta-training data, and performing meta-learning training on the LSTM track coding network of the navigation control model through the meta-training data set; the updates are iterated repeatedly to obtain a final navigation control model whose loss function converges.
Preferably, the method for training the navigation control model through the training data set to obtain the updated navigation control model comprises the following steps:
Training the navigation control model through a training data set containing multiple tasks, constructing the loss function of the navigation control model, calculating the training loss value of the navigation control model according to the loss function, iteratively updating the navigation control model parameters by gradient descent on the training loss value, and saving the navigation control model parameters for each task.
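As an illustration only, this per-task training loop can be sketched in Python (PyTorch) as follows. This is a minimal sketch, not the patent's implementation: the model object, its compute_loss method (assumed to return the total training loss for a batch of interaction data) and task.sample_trajectories are hypothetical interfaces introduced for readability.

    import copy
    import torch

    def train_per_task(model, tasks, lr=1e-3, iters=100):
        """Train the navigation control model on each task and save per-task parameters."""
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        per_task_params = []
        for task in tasks:
            for _ in range(iters):
                batch = task.sample_trajectories(model)  # interact with the training environment
                loss = model.compute_loss(batch)         # training loss value
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()                         # gradient-descent update of the parameters
            per_task_params.append(copy.deepcopy(model.state_dict()))  # save parameters per task
        return per_task_params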
Preferably, the method for performing meta-learning training on the LSTM track coding network of the navigation control model through the meta-training data set comprises the following steps:
Constructing a meta-training loss function according to the loss function of the navigation control model, calculating the meta-training loss value of the navigation control model, iteratively updating the parameters of the LSTM track coding network in the navigation control model by gradient descent on the meta-training loss value to obtain a final navigation control model whose loss function converges, and saving the parameters of the final navigation control model.
Preferably, the method for constructing a training environment of a navigation control model and a mobile robot model and controlling the mobile robot to interact with the training environment through the navigation control model to obtain multiple sets of training data sets comprises the following steps:
Constructing a mobile robot model by adopting a robot physical simulation engine MuJoCo platform, and initializing and setting sensor parameters of the mobile robot model;
Designing a training environment comprising a plurality of obstacle areas and a plurality of target point areas, randomly generating obstacles and target points in the obstacle areas and the target point areas respectively to obtain training tasks, resetting positions of the obstacles and the target points to collect a plurality of groups of training tasks, and controlling the mobile robot to interact with each group of training tasks in the training environment by using a navigation control model to obtain a training data set.
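A minimal Python sketch of the random task generation described above, assuming axis-aligned rectangular regions given as ((x_min, y_min), (x_max, y_max)) pairs; sample_training_task and the obstacle and goal counts are illustrative assumptions, and the MuJoCo scene construction itself is omitted.

    import numpy as np

    def sample_training_task(obstacle_regions, goal_regions, n_obstacles=4, n_goals=2, rng=None):
        """Randomly place obstacles and target points inside their designated regions."""
        rng = rng or np.random.default_rng()

        def sample_in(regions, n):
            pts = []
            for _ in range(n):
                lo, hi = regions[rng.integers(len(regions))]  # pick a region, then a point in it
                pts.append(rng.uniform(lo, hi))
            return np.array(pts)

        return {"obstacles": sample_in(obstacle_regions, n_obstacles),
                "goals": sample_in(goal_regions, n_goals)}

    # A group of training tasks is collected by repeatedly resetting the positions:
    tasks = [sample_training_task([((0.0, 0.0), (2.0, 2.0))], [((4.0, 4.0), (6.0, 6.0))])
             for _ in range(16)]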
Preferably, controlling the mobile robot to interact with each set of training tasks in the training environment through the navigation control model to obtain the training data set includes:
Generating a group of training tasks in the training environment, placing the mobile robot model into the training environment, and acquiring sensor information through the sensors of the mobile robot model; encoding the track information according to the sensor information, and outputting the track state z_t and the memory hidden variable (h_t, c_t);
The top-level strategy network π_Ω in the navigation control model selects a strategy sequence number ω_t according to the obtained track state z_t, and the sub-strategy network π_θ^{ω_t} corresponding to the strategy sequence number ω_t in the navigation control model is started; the sub-strategy network π_θ^{ω_t} outputs an action a_t according to the track state z_t;
After the mobile robot model executes the action a_t, it interacts with the training environment to obtain the reward r_t: r_t = -1 if an obstacle is encountered, r_t = 1 if a target point is reached, and r_t = 0 otherwise;
The mobile robot model acquires the next group of sensor information through its sensors and encodes the track information to obtain a new track state z_{t+1};
The termination network β_ϑ^{ω_t} of the navigation control model decides, according to the track state z_{t+1}, whether to terminate execution of the sub-strategy network π_θ^{ω_t}; if the sub-strategy network π_θ^{ω_t} is terminated, a new sub-strategy network is selected and started through the value function network Q_U of the navigation control model;
Controlling the mobile robot model to interact with the set of training tasks by using the navigation control model obtains one set of training data D_i = {(z_t, ω_t, a_t, r_t, z_{t+1})}; the training task in the training environment is reset and the iterative process is repeated to obtain multiple sets of training data {D_1, ..., D_n}.
Preferably, the method for encoding the track information according to the sensor information and outputting the track state z_t and the memory hidden variable (h_t, c_t) comprises the following steps:
Acquiring the state s_t of the mobile robot model at the current moment from the sensor information, and reading the memory hidden variable (h_{t-1}, c_{t-1}) stored at the previous moment, wherein the memory hidden variable (h_{t-1}, c_{t-1}) of the initial state of the mobile robot model is a zero vector; inputting the memory hidden variable (h_{t-1}, c_{t-1}) and the state s_t into the long short-term memory network f_λ of the navigation control model to encode the track information, and outputting the track state z_t and the memory hidden variable (h_t, c_t).
Preferably, the loss functions of the navigation control model include a loss function Loss_c, a loss function Loss_a and a loss function Loss_l;
The loss function Loss_c is the temporal-difference loss of the value function network Q_U, with the expression formula:
Loss_c = E[(r_i + γ·max_{ω'} Q_U(z_{i+1}, ω') - Q_U(z_i, ω_i))²]
The loss function Loss_a is the actor loss of the sub-strategy networks π_θ^ω and the termination networks β_ϑ^ω, constructed from the value function network Q_U;
The expression formula of the loss function Loss_l is:
Loss_l = Loss_a + Loss_c
In the formulas, Q_U(z_i, ω_i) denotes the expected cumulative reward of the sub-strategy ω_i selected in the track state z_i; max_{ω'} Q_U(z_{i+1}, ω') denotes the maximum expected cumulative reward obtainable by selecting a sub-strategy ω' in the track state z_{i+1}; γ denotes the discount rate, in the range [0, 1].
Preferably, the method for controlling the mobile robot model to interact with the training environment again by using the updated navigation control model to obtain multiple groups of meta-training data comprises the following steps:
Using a set of training data D_i, according to the loss functions of the navigation control model, respectively calculating the gradient ∇_U Loss_c of the loss function Loss_c with the parameter U of the value function network as independent variable, the gradient ∇_θ Loss_a of the loss function Loss_a with the parameter θ of the sub-strategy networks as independent variable, the gradient ∇_ϑ Loss_a of the loss function Loss_a with the parameter ϑ of the termination networks as independent variable, and the gradient ∇_λ Loss_l of the loss function Loss_l with the parameter λ of the long short-term memory network as independent variable; using the gradient descent method, the parameters of the value function network Q_U, the sub-strategy networks π_θ^ω, the termination networks β_ϑ^ω and the long short-term memory network f_λ are updated to U', θ', ϑ' and λ', and the training environment parameters and the navigation control model parameters are saved;
Controlling the mobile robot model to interact with the set of training tasks again by using the updated navigation control model to obtain a group of meta-training data D'_i; the iterative process is repeated to obtain T groups of meta-training data and the meta-training data set D_meta = {D'_1, ..., D'_T} is constructed.
Preferably, the expression formula of the meta-training loss function Loss_meta_l is:
Loss_meta_l = Σ_{i=1}^{T} Loss_l^{(i)}
In the formula, Loss_l^{(i)} denotes the loss value computed with the i-th group of navigation control model parameters on the i-th group of meta-training data, and T denotes the number of groups of navigation control model parameters.
A second aspect of the present invention provides a navigation control system of a mobile robot, comprising:
The target task acquisition module is used for controlling the mobile robot to acquire target task data in a real environment by using the trained navigation control model;
the strategy migration module is used for adjusting the use sequence of sub-strategies in the navigation control model according to the target task data;
The navigation module is used for controlling the mobile robot to navigate by using the navigation control model;
The navigation control model construction module is used for constructing a navigation control model by using a hierarchical reinforcement learning algorithm, and introducing the LSTM network into the navigation control model to serve as a track coding network;
The training data set acquisition module is used for building a training environment of the navigation control model and a mobile robot model, and controlling the mobile robot to interact with the training environment through the navigation control model to acquire a plurality of groups of training data sets;
The pre-training module is used for training the navigation control model through the training data set to obtain an updated navigation control model, controlling the mobile robot model to interact with the training environment again by using the updated navigation control model to obtain multiple groups of meta-training data, and performing meta-learning training on the LSTM track coding network of the navigation control model through the meta-training data set; the updates are iterated repeatedly to obtain a final navigation control model whose loss function converges.
Preferably, the mobile robot is provided with a sensor for interacting with a training environment and a real environment; the sensor information detected by the sensor comprises coordinates of a mobile robot, coordinates of a plurality of obstacles, coordinates of a plurality of target points, obstacle areas and target point areas.
A third aspect of the present invention provides a computer-readable storage medium, characterized in that a computer program is stored thereon, which program, when being executed by a processor, implements the steps of the navigation control method.
Compared with the prior art, the invention has the beneficial effects that:
(1) The method comprises the steps of controlling a mobile robot to acquire target task data in a real environment by using a trained navigation control model; adjusting the use sequence of sub-strategies in the navigation control model according to the target task data, and navigating the mobile robot by using the navigation control model; the sub-strategies are used in a specific order according to specific tasks, so that the migration process of the learned navigation control model applied to the actual environment is simplified, and the instantaneity of the navigation control model is improved.
(2) The invention utilizes the LSTM network to abstract the characteristics of the state track, so that the mobile robot can distinguish different navigation tasks, thereby better learning the sub-strategies shared by the tasks and memorizing the sub-strategy combination sequence required by different tasks.
Drawings
FIG. 1 is a block diagram of a navigation control model provided by an embodiment of the present invention;
fig. 2 is a flowchart of encoding track information according to sensor information through an LSTM network according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Example 1
As shown in fig. 1 to 2, the present embodiment provides a navigation control method of a mobile robot, including:
controlling the mobile robot to acquire target task data in a real environment by using the trained navigation control model;
Adjusting the use sequence of sub-strategies in the navigation control model according to the target task data, and navigating the mobile robot by using the navigation control model;
the training process of the navigation control model comprises the following steps:
Constructing a navigation control model by using a hierarchical reinforcement learning algorithm, and introducing an LSTM network into the navigation control model to serve as a track coding network; the navigation control model internally comprises a value function network Q_U, sub-strategy networks π_θ^ω, a top-level strategy network π_Ω and termination networks β_ϑ^ω.
The method for constructing the training environment of the navigation control model and the mobile robot model and controlling the mobile robot to interact with the training environment through the navigation control model to obtain a plurality of groups of training data sets comprises the following steps:
Constructing a mobile robot model by adopting a robot physical simulation engine MuJoCo platform, and initializing and setting sensor parameters of the mobile robot model;
Designing a training environment comprising a plurality of obstacle areas and a plurality of target point areas, randomly generating obstacles and target points in the obstacle areas and the target point areas respectively to obtain training tasks, resetting positions of the obstacles and the target points to collect a plurality of groups of training tasks, and controlling the mobile robot to interact with each group of training tasks in the training environment by using a navigation control model to obtain a training data set.
The method for controlling the mobile robot to interact with each group of training tasks in the training environment through the navigation control model to obtain the training data set comprises the following steps:
Generating a group of training tasks in the training environment, placing the mobile robot model into the training environment, and acquiring sensor information through the sensors of the mobile robot model; acquiring the state s_t of the mobile robot model at the current moment from the sensor information, and reading the memory hidden variable (h_{t-1}, c_{t-1}) stored at the previous moment, wherein the memory hidden variable of the initial state of the mobile robot model is a zero vector; inputting the memory hidden variable (h_{t-1}, c_{t-1}) and the state s_t into the long short-term memory network f_λ of the navigation control model to encode the track information, and outputting the track state z_t and the memory hidden variable (h_t, c_t). The expression formulas are:
i_t = σ(W_i s_t + U_i h_{t-1} + b_i)
f_t = σ(W_f s_t + U_f h_{t-1} + b_f)
o_t = σ(W_o s_t + U_o h_{t-1} + b_o)
c̃_t = tanh(W_c s_t + U_c h_{t-1} + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
h_t = o_t ⊙ tanh(c_t)
z_t = h_t
Wherein s_t denotes the environment state of the mobile robot at time t, including the coordinates of the mobile robot, the coordinates of the obstacles, the coordinates of the target points, and the codes of the obstacle regions and target point regions at time t; h_{t-1} and c_{t-1} denote the memory information at the previous time t-1; W_i, U_i, b_i, W_f, U_f, b_f, W_o, U_o, b_o, W_c, U_c, b_c are network parameters; σ(·) denotes the Logistic function; ⊙ denotes the element-wise product of vectors.
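The encoding step follows directly from the equations above; the following NumPy sketch is a plain transcription, with the dictionary p holding the weight matrices and biases (a hypothetical packaging of the network parameters).

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_encode_step(s_t, h_prev, c_prev, p):
        """One step of the track-coding LSTM; p holds the W, U, b parameters above."""
        i_t = sigmoid(p["Wi"] @ s_t + p["Ui"] @ h_prev + p["bi"])     # input gate
        f_t = sigmoid(p["Wf"] @ s_t + p["Uf"] @ h_prev + p["bf"])     # forget gate
        o_t = sigmoid(p["Wo"] @ s_t + p["Uo"] @ h_prev + p["bo"])     # output gate
        c_cand = np.tanh(p["Wc"] @ s_t + p["Uc"] @ h_prev + p["bc"])  # candidate memory
        c_t = f_t * c_prev + i_t * c_cand                             # memory hidden variable c_t
        h_t = o_t * np.tanh(c_t)                                      # memory hidden variable h_t
        z_t = h_t                                                     # track state z_t
        return z_t, (h_t, c_t)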
The top-level strategy network π_Ω in the navigation control model selects a strategy sequence number ω_t according to the obtained track state z_t, and the sub-strategy network π_θ^{ω_t} corresponding to the strategy sequence number ω_t in the navigation control model is started; the sub-strategy network π_θ^{ω_t} outputs an action a_t according to the track state z_t;
The mobile robot model executes the action a_t and interacts with the training environment to obtain the reward r_t: r_t = -1 if an obstacle is encountered, r_t = 1 if a target point is reached, and r_t = 0 otherwise;
The mobile robot model acquires the next group of sensor information through its sensors and encodes the track information to obtain a new track state z_{t+1};
The termination network β_ϑ^{ω_t} of the navigation control model decides, according to the track state z_{t+1} and the reward r_t, whether to terminate execution of the sub-strategy network π_θ^{ω_t}; if the sub-strategy network π_θ^{ω_t} is terminated, a new sub-strategy network is selected and started through the value function network Q_U of the navigation control model;
Controlling the mobile robot model to interact with the set of training tasks by using the navigation control model obtains one set of training data D_i = {(z_t, ω_t, a_t, r_t, z_{t+1})}; the training task in the training environment is reset and the iterative process is repeated to obtain multiple sets of training data {D_1, ..., D_n}.
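The interaction loop can be summarized by the following Python sketch; env and model are hypothetical interfaces standing in for the training environment and for the networks f_λ, π_Ω, π_θ^ω, β_ϑ^ω and Q_U, so the sketch shows the control flow only, not the patent's code.

    def collect_training_data(env, model, max_steps=200):
        """One episode of interaction; returns (z_t, omega_t, a_t, r_t, z_next) tuples."""
        data = []
        s = env.reset()                               # first group of sensor information
        z, hc = model.encode(s, model.zero_state())   # track state z_t and hidden (h_t, c_t)
        omega = model.top_policy(z)                   # pi_Omega selects strategy number omega_t
        for _ in range(max_steps):
            a = model.sub_policy(omega, z)            # sub-strategy outputs action a_t
            s_next, r, done = env.step(a)             # reward: -1 obstacle, +1 target point, else 0
            z_next, hc = model.encode(s_next, hc)     # new track state z_{t+1}
            data.append((z, omega, a, r, z_next))
            if model.terminate(omega, z_next, r):     # termination network decides
                omega = model.best_option(z_next)     # re-select via value function network Q_U
            z = z_next
            if done:
                break
        return data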
Constructing the loss functions of the navigation control model, wherein the loss functions include a loss function Loss_c, a loss function Loss_a and a loss function Loss_l;
The loss function Loss_c is the temporal-difference loss of the value function network Q_U, with the expression formula:
Loss_c = E[(r_i + γ·max_{ω'} Q_U(z_{i+1}, ω') - Q_U(z_i, ω_i))²]
The loss function Loss_a is the actor loss of the sub-strategy networks π_θ^ω and the termination networks β_ϑ^ω, constructed from the value function network Q_U;
The expression formula of the loss function Loss_l is:
Loss_l = Loss_a + Loss_c
In the formulas, Q_U(z_i, ω_i) denotes the expected cumulative reward of the sub-strategy ω_i selected in the track state z_i; max_{ω'} Q_U(z_{i+1}, ω') denotes the maximum expected cumulative reward obtainable by selecting a sub-strategy ω' in the track state z_{i+1}; γ denotes the discount rate, in the range [0, 1].
A training loss value of the navigation control model is calculated according to the loss functions, and the parameters of the navigation control model are updated for the first time by gradient descent on the training loss value.
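For illustration, the loss computation can be sketched in PyTorch. The critic term follows the Loss_c expression above; for Loss_a the sketch substitutes a standard option-critic style advantage-weighted log-likelihood actor loss, which is an assumption made for readability, and model.Q and model.sub_policy_log_prob are hypothetical interfaces.

    import torch

    def compute_losses(model, batch, gamma=0.99):
        """Sketch of Loss_c, Loss_a and Loss_l = Loss_a + Loss_c for one batch."""
        z, omega, a, r, z_next = batch                            # tensors from a data set D_i
        q = model.Q(z).gather(1, omega.unsqueeze(1)).squeeze(1)   # Q_U(z_i, omega_i)
        with torch.no_grad():
            q_next = model.Q(z_next).max(dim=1).values            # max_w' Q_U(z_{i+1}, w')
            td_target = r + gamma * q_next
        loss_c = ((td_target - q) ** 2).mean()                    # temporal-difference loss
        # assumed option-critic actor loss: advantage-weighted log-likelihood
        log_pi = model.sub_policy_log_prob(omega, z, a)           # log pi_theta^omega(a_i | z_i)
        advantage = (td_target - q).detach()
        loss_a = -(log_pi * advantage).mean()
        loss_l = loss_a + loss_c
        return loss_c, loss_a, loss_l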
Using a set of training data D_i, according to the loss functions of the navigation control model, respectively calculating the gradient ∇_U Loss_c of the loss function Loss_c with the parameter U as independent variable, the gradient ∇_θ Loss_a of the loss function Loss_a with the parameter θ as independent variable, the gradient ∇_ϑ Loss_a of the loss function Loss_a with the parameter ϑ as independent variable, and the gradient ∇_λ Loss_l of the loss function Loss_l with the parameter λ as independent variable; the parameters of the navigation control model are updated for the second time through the training data set: using the gradient descent method, the network parameters of the value function network Q_U, the sub-strategy networks π_θ^ω, the termination networks β_ϑ^ω and the long short-term memory network f_λ are updated to U', θ', ϑ' and λ', and the navigation control model parameters are saved;
Controlling the mobile robot model to interact with the set of training tasks again by using the updated navigation control model to obtain a group of meta-training data D'_i;
Repeating the iterative process, the training of T groups of navigation control models is completed with the T groups of training data, obtaining T groups of navigation control model parameters {(U'_i, θ'_i, ϑ'_i, λ'_i)}, i = 1, ..., T; the updated T groups of navigation control models are respectively used to control the mobile robot model to interact with the corresponding T groups of training tasks to obtain the meta-training data set D_meta = {D'_1, ..., D'_T}.
Performing meta-training on the navigation control model through the meta-training data set, wherein the process comprises the following steps:
Constructing the meta-training loss function according to the loss function of the navigation control model, wherein the expression formula of the meta-training loss function Loss_meta_l is:
Loss_meta_l = Σ_{i=1}^{T} Loss_l^{(i)}
In the formula, Loss_l^{(i)} denotes the loss value on the i-th group of meta-training data, and T denotes the number of groups of meta-training data;
Calculating the gradient ∇_λ Loss_meta_l of the meta-training loss function with the parameter λ as independent variable, performing meta-training on the navigation control model through the meta-training data set, performing one meta-update of the long short-term memory network f_λ by the gradient descent method, and saving the updated parameter λ* of the long short-term memory network f_λ; the N rounds of the training process are repeated until the navigation control model converges, obtaining the final navigation control model; the final parameters of the navigation control model are saved as (U*, θ*, ϑ*, λ*).
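One meta-update of the LSTM track coding network can be sketched as follows, reusing compute_losses from the sketch above; load_task_heads (restoring the i-th group of saved parameters of the non-LSTM networks) is a hypothetical helper, and the first-order form in which only λ receives the meta-gradient is an assumption.

    import torch

    def meta_update(model, meta_dataset, saved_task_params, meta_lr=1e-3):
        """One meta-update of the LSTM parameters lambda from T groups of meta-training data."""
        meta_optimizer = torch.optim.SGD(model.lstm.parameters(), lr=meta_lr)
        meta_loss = 0.0
        for task_params, meta_data in zip(saved_task_params, meta_dataset):
            model.load_task_heads(task_params)   # i-th group of navigation control model parameters
            _, _, loss_l = compute_losses(model, meta_data)
            meta_loss = meta_loss + loss_l       # Loss_meta_l = sum_i Loss_l^(i)
        meta_optimizer.zero_grad()
        meta_loss.backward()                     # gradient with respect to lambda only
        meta_optimizer.step()
        return float(meta_loss)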
When the trained final navigation control model is used, a set of real-environment data D_real = {(z_t, ω_t, a_t, r_t, z_{t+1})} is obtained through interaction between the sensors and the real environment and sampling; the loss function Loss_l of the final navigation control model is calculated on D_real; the gradient ∇_λ Loss_l is calculated, and the long short-term memory network f_λ is updated using the gradient descent method. Sampling on the real environment is repeated several times and the update is iterated to complete the policy migration; the use sequence of the sub-strategies in the navigation control model is thereby adjusted according to the target task data to obtain the navigation control model for actual use, and the mobile robot is controlled to navigate by the actually used navigation control model.
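The policy migration can then be sketched by combining the rollout and loss sketches above and fine-tuning only the LSTM track encoder on real-environment samples; to_tensors (batching the sampled tuples into tensors) is a hypothetical helper, and the loop structure is an illustrative assumption.

    import torch

    def migrate_to_real(model, real_env, n_rounds=10, lr=1e-3):
        """Adapt only the LSTM track encoder on real-environment samples (sketch)."""
        optimizer = torch.optim.SGD(model.lstm.parameters(), lr=lr)
        for _ in range(n_rounds):
            batch = to_tensors(collect_training_data(real_env, model))  # sample via the sensors
            _, _, loss_l = compute_losses(model, batch)
            optimizer.zero_grad()
            loss_l.backward()
            optimizer.step()  # updating f_lambda re-orders the use of the shared sub-strategies
        return model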
Example two
The present embodiment provides a navigation control system for a mobile robot, where the navigation control system may be applied to the navigation control method of the first embodiment, and the navigation control system includes:
The target task acquisition module is used for controlling the mobile robot to acquire target task data in a real environment by using the trained navigation control model;
the strategy migration module is used for adjusting the use sequence of sub-strategies in the navigation control model according to the target task data;
The navigation module is used for controlling the mobile robot to navigate by using the navigation control model;
The navigation control model construction module is used for constructing a navigation control model by using a hierarchical reinforcement learning algorithm, and introducing the LSTM network into the navigation control model to serve as a track coding network;
The training data set acquisition module is used for building a training environment of the navigation control model and a mobile robot model, and controlling the mobile robot to interact with the training environment through the navigation control model to acquire a plurality of groups of training data sets;
The pre-training module is used for training the navigation control model through the training data set to obtain an updated navigation control model, controlling the mobile robot model to interact with the training environment again by using the updated navigation control model to obtain multiple groups of meta-training data, and performing meta-learning training on the LSTM track coding network of the navigation control model through the meta-training data set; the updates are iterated repeatedly to obtain a final navigation control model whose loss function converges.
The mobile robot is provided with a sensor for interacting with a training environment and a real environment; the sensor information detected by the sensor comprises coordinates of a mobile robot, coordinates of a plurality of obstacles, coordinates of a plurality of target points, obstacle areas and target point areas.
Example III
The present embodiment provides a computer-readable storage medium, characterized in that a computer program is stored thereon, which when executed by a processor, implements the steps of the navigation control method of the embodiment.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims (10)

1. A navigation control method of a mobile robot, comprising:
controlling the mobile robot to acquire target task data in a real environment by using the trained navigation control model;
Adjusting the use sequence of sub-strategies in the navigation control model according to the target task data, and navigating the mobile robot by using the navigation control model;
the training process of the navigation control model comprises the following steps:
constructing a navigation control model by using a hierarchical reinforcement learning algorithm, and introducing an LSTM network into the navigation control model to serve as a track coding network;
Building a training environment of a navigation control model and a mobile robot model, and controlling the mobile robot to interact with the training environment through the navigation control model to obtain a plurality of groups of training data sets;
Training the navigation control model through the training data set to obtain an updated navigation control model, controlling the mobile robot model to interact with the training environment again by using the updated navigation control model to obtain multiple groups of meta-training data, and performing meta-learning training on the LSTM track coding network of the navigation control model through the meta-training data set; the updates are iterated repeatedly to obtain a final navigation control model whose loss function converges.
2. The method for controlling navigation of a mobile robot according to claim 1, wherein the method for obtaining an updated navigation control model by training the navigation control model with the training data set comprises:
Training the navigation control model through a training data set containing multiple tasks, constructing the loss function of the navigation control model, calculating the training loss value of the navigation control model according to the loss function, iteratively updating the navigation control model parameters by gradient descent on the training loss value, and saving the navigation control model parameters for each task.
3. The method for controlling navigation of a mobile robot according to claim 2, wherein the method for performing meta-learning training on the LSTM track coding network of the navigation control model by using the meta-training data set comprises:
Constructing a meta-training loss function according to the loss function of the navigation control model, calculating the meta-training loss value of the navigation control model, iteratively updating the parameters of the LSTM track coding network in the navigation control model by gradient descent on the meta-training loss value to obtain a final navigation control model whose loss function converges, and saving the parameters of the final navigation control model.
4. A method for controlling navigation of a mobile robot according to claim 3, wherein the method for constructing a training environment of a navigation control model and a mobile robot model, and controlling the mobile robot to interact with the training environment by the navigation control model to obtain a plurality of sets of training data sets comprises:
Constructing a mobile robot model by adopting a robot physical simulation engine MuJoCo platform, and initializing and setting sensor parameters of the mobile robot model;
Designing a training environment comprising a plurality of obstacle areas and a plurality of target point areas, randomly generating obstacles and target points in the obstacle areas and the target point areas respectively to obtain training tasks, resetting positions of the obstacles and the target points to collect a plurality of groups of training tasks, and controlling the mobile robot to interact with each group of training tasks in the training environment by using a navigation control model to obtain a training data set.
5. The method of claim 4, wherein controlling the mobile robot to interact with each set of training tasks in the training environment via the navigation control model to obtain the training data set comprises:
Generating a group of training tasks in the training environment, placing the mobile robot model into the training environment, and acquiring sensor information through the sensors of the mobile robot model; encoding the track information according to the sensor information, and outputting the track state z_t and the memory hidden variable (h_t, c_t);
The top-level strategy network π_Ω in the navigation control model selects a strategy sequence number ω_t according to the obtained track state z_t, and the sub-strategy network π_θ^{ω_t} corresponding to the strategy sequence number ω_t in the navigation control model is started; the sub-strategy network π_θ^{ω_t} outputs an action a_t according to the track state z_t;
After the mobile robot model executes the action a_t, it interacts with the training environment to obtain the reward r_t: r_t = -1 if an obstacle is encountered, r_t = 1 if a target point is reached, and r_t = 0 otherwise;
The mobile robot model acquires the next group of sensor information through its sensors and encodes the track information to obtain a new track state z_{t+1};
The termination network β_ϑ^{ω_t} of the navigation control model decides, according to the track state z_{t+1}, whether to terminate execution of the sub-strategy network π_θ^{ω_t}; if the sub-strategy network π_θ^{ω_t} is terminated, a new sub-strategy network is selected and started through the value function network Q_U of the navigation control model;
Controlling the mobile robot model to interact with the set of training tasks by using the navigation control model obtains one set of training data D_i = {(z_t, ω_t, a_t, r_t, z_{t+1})}; the training task in the training environment is reset and the iterative process is repeated to obtain multiple sets of training data {D_1, ..., D_n}.
6. The method of claim 5, wherein the method of encoding the track information based on the sensor information and outputting the track state z_t and the memory hidden variable (h_t, c_t) comprises:
Acquiring the state s_t of the mobile robot model at the current moment from the sensor information, and reading the memory hidden variable (h_{t-1}, c_{t-1}) stored at the previous moment, wherein the memory hidden variable (h_{t-1}, c_{t-1}) of the initial state of the mobile robot model is a zero vector; inputting the memory hidden variable (h_{t-1}, c_{t-1}) and the state s_t into the long short-term memory network f_λ of the navigation control model to encode the track information, and outputting the track state z_t and the memory hidden variable (h_t, c_t).
7. The method according to claim 6, wherein the loss functions of the navigation control model include a loss function Loss_c, a loss function Loss_a and a loss function Loss_l;
the loss function Loss_c is the temporal-difference loss of the value function network Q_U, with the expression formula:
Loss_c = E[(r_i + γ·max_{ω'} Q_U(z_{i+1}, ω') - Q_U(z_i, ω_i))²]
the loss function Loss_a is the actor loss of the sub-strategy networks π_θ^ω and the termination networks β_ϑ^ω, constructed from the value function network Q_U;
the expression formula of the loss function Loss_l is:
Loss_l = Loss_a + Loss_c
in the formulas, Q_U(z_i, ω_i) denotes the expected cumulative reward of the sub-strategy ω_i selected in the track state z_i; max_{ω'} Q_U(z_{i+1}, ω') denotes the maximum expected cumulative reward obtainable by selecting a sub-strategy ω' in the track state z_{i+1}; γ denotes the discount rate, in the range [0, 1].
8. The method of claim 7, wherein the method of using the updated navigation control model to control the mobile robot model to interact again with the training environment to obtain multiple groups of meta-training data comprises:
Using a set of training data D_i, according to the loss functions of the navigation control model, respectively calculating the gradient ∇_U Loss_c of the loss function Loss_c with the parameter U as independent variable, the gradient ∇_θ Loss_a of the loss function Loss_a with the parameter θ as independent variable, the gradient ∇_ϑ Loss_a of the loss function Loss_a with the parameter ϑ as independent variable, and the gradient ∇_λ Loss_l of the loss function Loss_l with the parameter λ as independent variable; using the gradient descent method, the parameters of the value function network Q_U, the sub-strategy networks π_θ^ω, the termination networks β_ϑ^ω and the long short-term memory network f_λ are updated to U', θ', ϑ' and λ', and the navigation control model parameters are saved;
Controlling the mobile robot model to interact with the set of training tasks again by using the updated navigation control model to obtain a group of meta-training data D'_i; the iterative process is repeated to obtain T groups of meta-training data and the meta-training data set D_meta = {D'_1, ..., D'_T} is constructed.
9. A navigation control system for a mobile robot, comprising:
The target task acquisition module is used for controlling the mobile robot to acquire target task data in a real environment by using the trained navigation control model;
the strategy migration module is used for adjusting the use sequence of sub-strategies in the navigation control model according to the target task data;
The navigation module is used for controlling the mobile robot to navigate by using the navigation control model;
The navigation control model construction module is used for constructing a navigation control model by using a hierarchical reinforcement learning algorithm, and introducing the LSTM network into the navigation control model to serve as a track coding network;
The training data set acquisition module is used for building a training environment of the navigation control model and a mobile robot model, and controlling the mobile robot to interact with the training environment through the navigation control model to acquire a plurality of groups of training data sets;
The pre-training module is used for training the navigation control model through the training data set to obtain an updated navigation control model, controlling the mobile robot model to interact with the training environment again by using the updated navigation control model to obtain multiple groups of meta-training data, and performing meta-learning training on the LSTM track coding network of the navigation control model through the meta-training data set; the updates are iterated repeatedly to obtain a final navigation control model whose loss function converges.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which program, when being executed by a processor, realizes the steps of the navigation control method according to any one of claims 1 to 8.
CN202210383369.XA 2022-04-13 2022-04-13 Navigation control method, system and storage medium of mobile robot Active CN114905505B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210383369.XA CN114905505B (en) 2022-04-13 2022-04-13 Navigation control method, system and storage medium of mobile robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210383369.XA CN114905505B (en) 2022-04-13 2022-04-13 Navigation control method, system and storage medium of mobile robot

Publications (2)

Publication Number Publication Date
CN114905505A CN114905505A (en) 2022-08-16
CN114905505B true CN114905505B (en) 2024-04-19

Family

ID=82765617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210383369.XA Active CN114905505B (en) 2022-04-13 2022-04-13 Navigation control method, system and storage medium of mobile robot

Country Status (1)

Country Link
CN (1) CN114905505B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020154542A1 (en) * 2019-01-23 2020-07-30 Google Llc Efficient adaption of robot control policy for new task using meta-learning based on meta-imitation learning and meta-reinforcement learning
CN111260026A (en) * 2020-01-10 2020-06-09 电子科技大学 Navigation migration method based on meta reinforcement learning
CN111506063A (en) * 2020-04-13 2020-08-07 中国科学技术大学 Mobile robot map-free navigation method based on layered reinforcement learning framework
CN111783994A (en) * 2020-05-29 2020-10-16 华为技术有限公司 Training method and device for reinforcement learning
WO2022012265A1 (en) * 2020-07-13 2022-01-20 Guangzhou Institute Of Advanced Technology, Chinese Academy Of Sciences Robot learning from demonstration via meta-imitation learning
CN111942621A (en) * 2020-07-17 2020-11-17 北京控制工程研究所 On-orbit autonomous filling control method and system based on multitask learning
CN112433525A (en) * 2020-11-16 2021-03-02 南京理工大学 Mobile robot navigation method based on simulation learning and deep reinforcement learning
CN112947081A (en) * 2021-02-05 2021-06-11 浙江大学 Distributed reinforcement learning social navigation method based on image hidden variable probability model
CN112809689A (en) * 2021-02-26 2021-05-18 同济大学 Language-guidance-based mechanical arm action element simulation learning method and storage medium
CN113609786A (en) * 2021-08-27 2021-11-05 中国人民解放军国防科技大学 Mobile robot navigation method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Mobile robot navigation based on reinforcement learning and fuzzy logic; 卓睿 (Zhuo Rui), 陈宗海 (Chen Zonghai), 陈春林 (Chen Chunlin); Computer Simulation (计算机仿真), No. 08; full text *

Also Published As

Publication number Publication date
CN114905505A (en) 2022-08-16

Similar Documents

Publication Publication Date Title
CN109655066A (en) UAV path planning method based on the Q(λ) algorithm
CN113805572A (en) Method and device for planning movement
CN109726676B (en) Planning method for automatic driving system
CN101871782B (en) Position error forecasting method for GPS (Global Position System)/MEMS-INS (Micro-Electricomechanical Systems-Inertial Navigation System) integrated navigation system based on SET2FNN
CN112119409A (en) Neural network with relational memory
CN106338919A (en) USV (Unmanned Surface Vehicle) track tracking control method based on enhanced learning type intelligent algorithm
CN111783994A (en) Training method and device for reinforcement learning
CN109858137A (en) It is a kind of based on the complicated maneuvering-vehicle track estimation method that can learn Extended Kalman filter
CN115280322A (en) Hidden state planning actor control using learning
CN110268338A (en) It is inputted using vision and carries out Agent navigation
CN114521262A (en) Controlling an agent using a causal correct environment model
CN116848532A (en) Attention neural network with short term memory cells
CN116147627A (en) Mobile robot autonomous navigation method combining deep reinforcement learning and internal motivation
CN115206099A (en) Self-adaptive path inference method for vehicle GPS track
CN114626505A (en) Mobile robot deep reinforcement learning control method
CN116300977B (en) Articulated vehicle track tracking control method and device based on reinforcement learning
CN114905505B (en) Navigation control method, system and storage medium of mobile robot
CN118043824A (en) Retrieval enhanced reinforcement learning
CN115009291A (en) Automatic driving aid decision-making method and system based on network evolution replay buffer area
CN115016499A (en) Path planning method based on SCA-QL
CN115576317A (en) Multi-preview-point path tracking control method and system based on neural network
CN114967472A (en) Unmanned aerial vehicle trajectory tracking state compensation depth certainty strategy gradient control method
Ge et al. Deep reinforcement learning navigation via decision transformer in autonomous driving
CN114721397A (en) Maze robot path planning method based on reinforcement learning and curiosity
CN116295449B (en) Method and device for indicating path of autonomous underwater vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant