CN113485323B - Flexible formation method for cascading multiple mobile robots


Info

Publication number: CN113485323B (grant of application CN202110655081.9A)
Other versions: CN113485323A (Chinese)
Authority: CN (China)
Prior art keywords: robot, mobile robot, formation, mobile, virtual
Legal status: Active
Inventors: 董璐, 何子辰, 孙长银, 王嘉伟, 薛磊, 潘晶
Original and current assignee: Tongji University
Application filed by Tongji University; priority to CN202110655081.9A; publication of CN113485323A; application granted; publication of CN113485323B.

Classifications

    • G05D1/0287 Control of position or course in two dimensions, specially adapted to land vehicles, involving a plurality of land vehicles, e.g. fleet or convoy travelling
    • G05D1/0291 Fleet control
    • G05D1/0212 Control of position or course in two dimensions, specially adapted to land vehicles, with means for defining a desired trajectory
    • G05D1/0219 Ensuring the processing of the whole working surface
    • G05D1/0276 Control using signals provided by a source external to the vehicle

Abstract

The invention provides a flexible formation method for cascaded multiple mobile robots. Based on a policy-gradient algorithm for continuous control combined with prior nonlinear distance-angle-heading formation control knowledge, the method avoids blind exploration by the mobile robots, speeds up training convergence, and avoids a tedious coefficient-tuning process; it also introduces proximal policy optimization to independently train the flexible obstacle avoidance capability of a single mobile robot against local static and dynamic obstacles. The method is divided into a training stage and an inference stage, moving the complex online solving process offline: the formation strategy and the flexible obstacle avoidance strategy are trained independently based on the idea of curriculum learning, and the pre-trained strategies are flexibly invoked during inference, so that the whole formation has greater autonomy and flexibility.

Description

Flexible formation method for cascading multiple mobile robots
Technical Field
The invention belongs to the field of multiple mobile robots, in particular relates to a flexible formation method for multiple mobile robots based on a cascade architecture, and more particularly to a cascaded multi-mobile-robot formation method based on reinforcement learning and prior nonlinear distance-angle-heading formation control.
Background
With the development of robot technology, formation operation of multiple mobile robots effectively improves operational efficiency through cooperation and is gradually replacing traditional single-robot operation. For example, multiple underwater robots search in a cooperative formation. In the military domain, UAV swarms and teams of ground mobile robots performing mine clearance, search and rescue, reconnaissance and the like all demonstrate the advantages of multi-robot formations. Recently, to replace the traditional manual mode, disinfection mobile robots have been deployed in Chinese hospitals; multiple disinfection robots cooperating in formation effectively improve on the efficiency of single-robot operation.
The distance-angle-heading leader-follower formation strategy is one of the common techniques for formation tracking of multiple mobile robots, and offers better flexibility and extensibility than the traditional leader-follower strategy. Its basic idea is to designate one robot as the leader and the other robots as followers, and then determine the relative distance, relative angle and heading between the leader robot and each follower robot from a preset formation, from which the formation control strategy is designed.
Currently, the mainstream methods for realizing distance-angle-heading leader-follower formation include nonlinear control, nonlinear model predictive control, and the like. The former includes input-output feedback linearization control, feedback control, etc.; because many performance gain parameters are introduced, a tedious parameter-tuning process cannot be avoided. The latter depends heavily on an accurate model and places high demands on online computation speed. Moreover, the robustness of the traditional leader-follower formation model needs improvement, and it lacks flexible obstacle avoidance and formation recovery capability.
With the development of artificial intelligence, deep reinforcement learning has been widely applied to end-to-end mobile robot tasks thanks to its model-free, offline-training advantages, but mostly in the single-robot domain. End-to-end implementations in the multi-robot domain place harsher requirements on sensor and actuator performance, involve higher-dimensional state and action spaces, incur higher training costs, and are harder to reproduce at inference time on real mobile robots.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a flexible formation method for multiple mobile robots with a degree of flexible obstacle avoidance and formation recovery capability.
The invention adopts the following technical scheme. A flexible formation method for cascaded multiple mobile robots comprises: determining a dynamics model according to the distance, angle and heading among robots based on the selected formation; determining the prior controller of the reinforcement learning architecture in the nonlinear mobile robot flexible formation method according to the dynamics model and its constraints; determining an action space based on hyper-parameters acting on the pose vectors of the mobile robots, where the action space comprises the formation tracking action space of two adjacent mobile robots and the action space required by each mobile robot for independent flexible obstacle avoidance; determining a state space according to the pose and velocity tracking errors of the mobile robots, where the state space comprises: the state space of each mobile robot's tracking error with respect to its corresponding virtual mobile robot at the current time step, the state space between adjacent mobile robots, and the state space each mobile robot requires to describe its surrounding environment; and setting reinforcement learning reward functions, including a formation reward function and an obstacle avoidance reward function;
based on the prior controller, reinforcement learning training is performed through interaction with the environment according to the action space, the state space and the reward functions; on completion of training, the cascaded multi-mobile-robot flexible formation method comprising the formation strategy and the flexible obstacle avoidance strategy is obtained.
Further, the kinematic equation of the dynamics model is described as follows:

$$\dot{\eta} = \begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{\theta} \end{bmatrix} = \begin{bmatrix} \cos\theta & 0 \\ \sin\theta & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} v \\ \omega \end{bmatrix} \quad (1)$$

where $\eta=[x,y,\theta]^T$ represents the pose vector of each mobile robot, $(x,y)$ is the position of each mobile robot and $\theta$ its angle; $v=\frac{v_r+v_l}{2}$ is the velocity of the mobile robot, $\omega$ its current angular velocity, and $v_r$ and $v_l$ respectively represent the velocities of the right and left wheels of the mobile robot;

the dynamics model constraint (the nonholonomic constraint of the differential drive) takes the form:

$$\dot{x}\sin\theta - \dot{y}\cos\theta = 0 \quad (2)$$
still further, the method for determining the prior controller of the reinforcement learning architecture in the flexible formation method of the nonlinear mobile robot specifically includes: s31, determining that the expected track of the virtual expected mobile robot is defined as eta r =[x r ,y rr ] T ,(x r ,y r ) θ is the position of the virtually expected mobile robot r For the angle of the virtual expected mobile robot, the tracking error of the pose of the mobile robot and the tracking error of the speed, which are determined by the mobile robot according to the virtual expected trajectory, are expressed as:
e x Position tracking error in the x direction; e, e y Is the position tracking error in the y direction; e, e θ Is the tracking error of azimuth;the speed tracking errors in the x direction and the y direction are respectively; />Is an angular velocity tracking error; />Is the desired angular velocity of the virtual robot;
s32, determining an expected formation model between the distance, the angle and the heading of adjacent mobile robots, wherein the expected formation model is specifically described as follows:
wherein v is 1 ,v 2 The virtual robot objects representing the adjacent mobile robots to be tracked are respectively marked as a virtual robot 1 and a virtual robot 2, (x) v1 ,y v1 ) Is the position of the virtual robot 1, (x) v2 ,y v2 ) θ is the position of the virtual robot 2 v1 Angle θ of virtual robot 1 v2 Is the angle of the virtual robot 2; d, d v2v1 The relative distance between adjacent mobile robots v1, v 2; phi (phi) v2v1 The relative angles of adjacent mobile robots v1, v 2; beta v2v1 An angle correction amount for the mobile robot to maintain the same azimuth angle;
s33, combining (1) - (4) with a feedback linearization nonlinear control theory, and describing the prior formation control of the adjacent mobile robots in the following form:
wherein v is 1 Speed v for the virtual robot 1 to meet preset formation requirements 2 Speed w meeting preset formation requirements for virtual robot 2 1 Angular velocity, w, meeting preset formation requirements for virtual robot 1 2 Angular velocity satisfying preset formation requirements for the virtual robot 2,nonlinear braiding for virtual robot 1The performance of the team a priori controller is over-parametered,the performance super-parameters of the prior controller are formed for the nonlinearity of the virtual robot 2, and the control performance of the prior controller is directly determined by the performance super-parameters.
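For illustration only, the sketch below shows a gain-parameterized tracking law of this kind in Python; the Kanayama-style expressions stand in for the patent's exact equation (5) as an assumption, and only the role of the hyper-parameters $[K_x, K_y, K_\theta]$ is taken from the text.

```python
import numpy as np

def prior_tracking_control(e, v_ref, w_ref, K):
    """Gain-parameterized prior tracking law in the spirit of equation (5).

    e = (e_x, e_y, e_theta): pose tracking error w.r.t. the virtual robot;
    v_ref, w_ref: reference (virtual-robot) velocities;
    K = (K_x, K_y, K_theta): performance hyper-parameters tuned by the RL
    agent.  The exact expressions below are an illustrative assumption.
    """
    e_x, e_y, e_th = e
    K_x, K_y, K_th = K
    v = v_ref * np.cos(e_th) + K_x * e_x
    w = w_ref + v_ref * (K_y * e_y + K_th * np.sin(e_th))
    return v, w
```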
Further, the formation tracking action space of the two adjacent mobile robots is expressed as:

$$a^1_{space} = \left[K_x^{v_1},\, K_y^{v_1},\, K_\theta^{v_1},\, K_x^{v_2},\, K_y^{v_2},\, K_\theta^{v_2}\right] \quad (6)$$

where $[K_x^{v_1},K_y^{v_1},K_\theta^{v_1}]$ are the performance hyper-parameters of the nonlinear formation prior controller with which the mobile robot tracks virtual robot 1, and $[K_x^{v_2},K_y^{v_2},K_\theta^{v_2}]$ those with which the adjacent mobile robot tracks virtual robot 2;

the action space required by each mobile robot for independent flexible obstacle avoidance is expressed as:

$$a^2_{space} = \left[v_{discrete},\, \omega_{discrete}\right] \quad (7)$$

where $v_{discrete}$ and $\omega_{discrete}$ are the discretized velocity command and angular velocity command of the mobile robot, respectively.
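A minimal sketch of how the two action spaces (6) and (7) might be materialized follows; the gain bounds and the discretization grids are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Formation-tracking action (6): the six controller gains of the two
# adjacent robots, drawn from a continuous, bounded space.
GAIN_LOW, GAIN_HIGH = 0.1, 5.0                 # illustrative bounds

def sample_formation_action(rng):
    # [Kx_v1, Ky_v1, Kt_v1, Kx_v2, Ky_v2, Kt_v2]
    return rng.uniform(GAIN_LOW, GAIN_HIGH, size=6)

# Obstacle-avoidance action (7): discretized (v, omega) command pairs.
V_CHOICES = np.linspace(0.0, 0.5, 5)           # m/s, illustrative grid
W_CHOICES = np.linspace(-1.0, 1.0, 7)          # rad/s, illustrative grid
DISCRETE_ACTIONS = [(v, w) for v in V_CHOICES for w in W_CHOICES]
```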
Further, the state space of each mobile robot's tracking error with respect to its corresponding virtual mobile robot at the current time step is represented as:

$$s^1_t = \left[e_1,\, e_2\right] = \left[e_x^{v_1}, e_y^{v_1}, e_\theta^{v_1},\, e_x^{v_2}, e_y^{v_2}, e_\theta^{v_2}\right] \quad (8)$$

the state space between adjacent mobile robots is represented as:

$$s^2_t = \left[d_t,\, \phi_t,\, \beta_t,\, \|u_1\|_2,\, \|u_2\|_2,\, \|\dot{u}_1\|_2,\, \|\dot{u}_2\|_2\right] \quad (9)$$

where $e_x^{v_1}$ is the position tracking error in the x direction of the mobile robot tracking virtual robot 1; $e_y^{v_1}$ its position tracking error in the y direction; $e_\theta^{v_1}$ its heading tracking error; $e_x^{v_2}$ the position tracking error in the x direction of the adjacent mobile robot tracking virtual robot 2; $e_y^{v_2}$ its position tracking error in the y direction; $e_\theta^{v_2}$ its heading tracking error; $e_1$ is the tracking error of the mobile robot with respect to virtual robot 1 and $e_2$ that of the adjacent mobile robot with respect to virtual robot 2;

$d_t$, $\phi_t$ and $\beta_t$ represent the distance, angle and heading formation state quantities between adjacent mobile robots at each time step $t$; $\|u_1\|_2, \|u_2\|_2, \|\dot{u}_1\|_2, \|\dot{u}_2\|_2$ represent the relative values of velocity, angular velocity and acceleration of robot 1 and robot 2 with respect to their virtual robots; the purpose of this term is to encourage the mobile robots to run with continuous, smooth velocity and acceleration, where $\|u_1\|_2$ is the velocity value of mobile robot 1 relative to its virtual robot, comprising velocity and angular velocity; $\|\dot{u}_1\|_2$ the acceleration value of robot 1 relative to its virtual robot, comprising acceleration and angular acceleration; $\|u_2\|_2$ the velocity value between mobile robot 2 and its virtual robot, comprising velocity and angular velocity; and $\|\dot{u}_2\|_2$ the acceleration value of mobile robot 2 relative to its virtual robot, comprising acceleration and angular acceleration.

The state space each mobile robot requires to describe its surrounding environment information is represented as:

$$s^3_t = \left[\eta_t,\, d_r,\, d_{ob},\, \Delta v,\, \Delta\omega\right] \quad (10)$$

where $\eta_t$ is the pose vector of the mobile robot at the current moment, $d_r$ the distance between the mobile robot and its desired virtual mobile robot position at the current moment, and $d_{ob}$ the distance vector to obstacles within the safety-distance threshold of the mobile robot; $[\Delta v, \Delta\omega]$ is a vector of two elements: the difference between the mobile robot's velocity at the current and previous moments, and the difference between its angular velocity at the current and previous moments.
Still further, the formation reward function between two adjacent mobile robots is specifically described as:

$$R^1 = R_{error\_1} + R_{formation} + R_{velocity},\qquad R_{formation} = \begin{cases} r, & \text{formation deviation} < \varepsilon_{thresh} \\ -r, & \text{otherwise} \end{cases} \quad (11)$$

where $\varepsilon_{thresh}$ is a set threshold; in the reward function, $R_{error\_1}$ is the sum of the penalty terms on the tracking errors of the two mobile robots with respect to their desired virtual mobile robots, used to drive the robots to reduce the tracking error with respect to the desired positions as much as possible; $r$ is the reward or penalty value and $R_{formation}$ the reward-or-penalty function that guides the robots to keep the formation consistent: if the dynamic variation of the formation stays within the set threshold, a positive reward is fed back, otherwise a negative penalty; $R_{velocity}$ guides the mobile robots to keep velocity and acceleration consistent and maintain a continuous, smooth motion pattern.

Still further, the specific form of the obstacle avoidance reward function is:

$$R^2 = R_{error\_2} + R_{avoid} + R_{delta\_yaw},\qquad R_{avoid} = \begin{cases} r_1, & d_{ob} < \varepsilon_{safe} \ \text{without collision} \\ r_2, & \text{collision} \end{cases} \quad (12)$$

where the term $R_{error\_2}$ is the penalty on the tracking error of mobile robot $i$ with respect to its desired virtual mobile robot, guiding formation recovery; $R_{avoid}$ guides the mobile robot to avoid obstacles autonomously, $\varepsilon_{safe}$ is the safety threshold, $r_1$ the penalty value when the robot's distance to the nearest obstacle is within the safety threshold but no collision has yet occurred, and $r_2$ the penalty value when the robot collides with an obstacle; $R_{delta\_yaw}$ is the penalty on the change of the mobile robot's heading angle between adjacent time steps, making the heading changes of mobile robot $i$, and hence the whole motion trajectory, smoother.
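The following sketch assembles reward signals with the structure of (11) and (12); the term weighting and the exact penalty magnitudes are illustrative assumptions.

```python
import numpy as np

def formation_reward(err1, err2, d_dev, eps_thresh, r, vel_smooth):
    """Formation reward (11): tracking-error penalty + formation-keeping
    bonus/penalty + velocity-smoothness penalty (all weights illustrative)."""
    R_error = -(np.linalg.norm(err1) + np.linalg.norm(err2))
    R_formation = r if abs(d_dev) < eps_thresh else -r
    R_velocity = -vel_smooth          # penalty on velocity/acceleration jumps
    return R_error + R_formation + R_velocity

def avoidance_reward(err_i, d_min, eps_safe, r1, r2, dyaw, collided):
    """Obstacle-avoidance reward (12): recovery penalty + proximity/collision
    penalty + heading-change penalty that keeps the trajectory smooth."""
    R_error = -np.linalg.norm(err_i)
    if collided:
        R_avoid = r2                  # large negative value on collision
    elif d_min < eps_safe:
        R_avoid = r1                  # milder penalty inside the safety band
    else:
        R_avoid = 0.0
    R_delta_yaw = -abs(dyaw)
    return R_error + R_avoid + R_delta_yaw
```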
Further, in the training process the two subtasks of formation tracking and flexible obstacle avoidance are trained independently; the specific method is as follows:

For the formation tracking task, the action space is selected as the formation tracking action space $a^1_{space}$ of the two adjacent mobile robots, and the state space is based on the state space $s^1_t$ of each mobile robot's tracking error with respect to its corresponding virtual mobile robot at the current time step together with the state space $s^2_t$ between adjacent mobile robots.

The action-value network outputs an evaluation of the current action; taking the Q value output by the current action-value network as the weight, the action network is updated based on the policy gradient.

The update of the action-value network is specifically described as:

$$L(\theta) = \frac{1}{N}\sum_{i=1}^{N} w_i\left(r_i + \gamma\, Q_{\theta'}\big(s_{i+1},\, \mu'(s_{i+1})\big) - Q_\theta(s_i, a_i)\right)^2 \quad (13)$$

where $w_i$ is the priority sampling weight at the current moment $i$ computed by the prioritized experience replay algorithm; $r_i$ the reward signal at the current moment $i$; $\gamma$ the discount factor; $Q_{\theta'}(s_{i+1}, \mu'(s_{i+1}))$ the target action-value network's evaluation of the target action $\mu'(s_{i+1})$ at the next moment $i+1$; $s_i$ the state of the robot at the current moment $i$, $s_{i+1}$ its state at the next moment $i+1$, $a_i$ its action at the current moment $i$, and $N$ the number of samples in a mini-batch; $Q_\theta(s_i,a_i)$ is the current action-value network's evaluation of the robot's state and action command at the current moment $i$.
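A minimal PyTorch sketch of one action-value update per equation (13) follows; the network interfaces and the batch layout are assumptions made for the example.

```python
import torch

def critic_update(critic, target_critic, target_actor, batch, opt, gamma):
    """One update of the action-value network per equation (13), with
    prioritized-experience-replay importance weights w.  The network
    objects and the batch layout are illustrative assumptions."""
    s, a, r, s_next, w = batch               # tensors sampled by priority
    with torch.no_grad():
        q_target = r + gamma * target_critic(s_next, target_actor(s_next))
    td_error = q_target - critic(s, a)
    loss = (w * td_error.pow(2)).mean()      # weighted TD loss over N samples
    opt.zero_grad()
    loss.backward()
    opt.step()
    return td_error.detach().abs()           # new priorities for the buffer
```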
For the flexible obstacle avoidance task, a proximal policy optimization algorithm architecture over a discrete action space is adopted; the action space is selected as the action space $a^2_{space}$ required by each mobile robot for independent flexible obstacle avoidance, and the state space as the state space $s^1_t$ of each mobile robot's tracking error with respect to its corresponding virtual mobile robot at the current time step together with the state space $s^3_t$ each mobile robot requires to describe its surrounding environment information.
Further, the target action network and target action-value network are updated as follows: after each mini-batch is trained, they are updated using the parameters of the updated online action network and online action-value network, specifically:

$$\eta' \leftarrow \tau\eta'' + (1-\tau)\eta' \quad (14)$$

where $\eta'$ and $\eta''$ respectively represent the target network parameters and the current network parameters, and $\tau$ controls the update proportion.
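A one-function sketch of the soft update (14) in PyTorch:

```python
def soft_update(target_net, online_net, tau):
    """Polyak soft update (14): target <- tau * online + (1 - tau) * target."""
    for p_t, p_o in zip(target_net.parameters(), online_net.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p_o.data)
```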
Further, the method also comprises a local collision detection step for detecting the safety distance between local obstacles and the robot; if the returned safety distance satisfies the safe-state requirement, the individual mobile robot exits the flexible obstacle avoidance strategy and resumes the formation strategy.
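A minimal sketch of such a collision check over range measurements, assuming a lidar-like distance scan (the interface is illustrative):

```python
import numpy as np

def collision_check(scan_ranges, eps_safe):
    """Boolean collision-warning flag from range readings: True while any
    obstacle lies inside the safety threshold."""
    d_min = float(np.min(scan_ranges))
    return d_min < eps_safe, d_min   # the flag drives the strategy switch
```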
The beneficial technical effects obtained by the invention are as follows:
the cascade multi-mobile robot flexible formation method provided by the invention is a cascade multi-mobile robot flexible formation method based on reinforcement learning and priori nonlinear distance-angle-heading formation control, so that a plurality of mobile robots can adaptively adjust key parameters in a formation control algorithm, and the stability of formation and the tracking precision are improved; meanwhile, the flexible obstacle avoidance strategy is independently trained, so that each robot in the formation has the capability of flexible obstacle avoidance to a certain extent, and the flexibility and autonomy of each mobile robot in the formation are improved.
The formation tracking architecture of the designed algorithm is based on the deep deterministic policy gradient algorithm; by simplifying the random exploration process and introducing a prioritized experience replay mechanism, the performance and efficiency of the algorithm are further improved. Introducing the prior nonlinear distance-angle-heading formation controller information avoids blind exploration, makes the training process more targeted, and improves the convergence speed of the algorithm; in inference applications, the prior formation controller also prevents the end-to-end abnormal behaviors that damage actuators, improving the robustness of the whole formation.
The obstacle avoidance architecture of the designed algorithm is based on the proximal policy optimization algorithm; for flexible obstacle avoidance, the action space of the mobile robot is discretized, reducing the search space and the training complexity. By introducing a collision detection function module, the obstacle distance is monitored in real time to decide whether the robot can return to formation tracking mode.
Preferably, the training of the two architectures is mutually independent; during inference they complement each other to jointly accomplish flexible formation of the multiple mobile robots.
Drawings
FIG. 1 is a schematic illustration of an overall framework of an embodiment of the present invention;
FIG. 2 is a schematic diagram of a training phase of an embodiment of the present invention;
fig. 3 is a schematic diagram of a flexible inference-based formation in accordance with a specific embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and specific examples of the specification.
Example: a flexible formation method for cascaded multiple mobile robots mainly comprises the following steps: S1, selecting a formation from a formation library, and confirming the priority of each robot and its specific position in the team according to the distance-angle-heading formation pattern;

S2, determining a dynamics model according to the type of the robot;

S3, according to the dynamics model constraint, combined with the relative distance, relative angle and heading constraints between robots, designing the desired prior trajectories of a virtual leader and virtual followers, converting the formation problem of the actual robots into a set of tracking problems against the trajectories of the virtual mobile robots, and designing the nonlinear formation-tracking prior controller that outputs the corresponding velocity and angular velocity, as the knowledge prior of the whole reinforcement learning framework;

S4, designing the collision detection module of the whole formation algorithm, which detects the safety distance between local obstacles and the robot;

S5, designing the action space part of the whole formation algorithm framework, mainly divided into two parts: a velocity space containing the velocity and angular velocity of the mobile robot, and a parameter space containing all the performance parameters of the prior nonlinear tracking control knowledge;

S6, designing the state space part of the whole formation algorithm framework, mainly comprising the position and pose of each robot and the obstacle information in the environment;

S7, designing the reward functions that guide the robots to learn formation and flexible obstacle avoidance, mainly comprising a formation reward function, a tracking reward function and an obstacle avoidance reward function;

S8, constructing a simulation environment for training, so that, under the prior nonlinear formation control knowledge, the agent performs trial and error with the environment and learns the strategies that let multiple mobile robots complete a flexible, stable distance-angle-heading formation process and flexible obstacle avoidance.

Further, in step S1, all robots are of the same (homogeneous) type, and the number of robots N ≥ 2.

Further, in step S2, taking a two-wheeled differential-drive mobile robot as an example, the kinematic equation is as in equation (1) above, where $\eta=[x,y,\theta]^T$ represents the pose vector of each mobile robot, $v=\frac{v_r+v_l}{2}$ is the velocity of the mobile robot, $\omega$ its angular velocity, and $v_r$ and $v_l$ respectively the velocities of the right and left wheels. It should be noted that the two-wheel-drive mobile robot is subject to a nonholonomic constraint, so that it can only move forward and backward, not sideways; the constraint is given by equation (2) above.

Further, taking the distance-angle-heading formation of multiple mobile robots as an example, the specific design steps of the prior formation control knowledge in S3 are as follows:

S31, designing the tracking controller between the mobile robot and the virtual desired mobile robot. The desired trajectory of the virtual mobile robot is defined as $\eta_r=[x_r,y_r,\theta_r]^T$, and the pose and velocity tracking errors are given by equation (3) above.

S32, designing the desired distance-angle-heading formation model between adjacent mobile robots, as in equation (4) above, where $v_1$ and $v_2$ respectively denote the virtual robot objects to be tracked by the adjacent mobile robots, recorded as virtual robot 1 and virtual robot 2, and $d_{v_2v_1}$, $\phi_{v_2v_1}$, $\beta_{v_2v_1}$ are the distance, angle and heading between $v_1$ and $v_2$, expressed as the state quantities of the distance-angle-heading formation architecture.

S33, combining (1)-(4) with feedback-linearization nonlinear control theory, the prior formation control of the adjacent mobile robots takes the form of equation (5) above, where $v_1$ and $v_2$ are the velocities with which virtual robots 1 and 2 meet the preset formation requirement, $\omega_1$ and $\omega_2$ the corresponding angular velocities, and $[K_x,K_y,K_\theta]$ the prior performance hyper-parameters of the nonlinear formation control of the mobile robot, whose values directly determine the quality of formation tracking.
further, in S4, the collision detection function module determines a distance to the obstacle through the mobile robot sensor, and outputs a boolean collision warning flag bit;
further, in S5, the action space mainly comprises two parts, one part is the action space required by formation tracking, and the other part is the action space required by flexible obstacle avoidance when detecting a local obstacle, and the specific design is described as follows:
s51, designing formation tracking action spaces of two adjacent mobile robots. The specific method is based on the nonlinear formation knowledge prior involved in the step S33, and the action space is as follows:
wherein [ K ] x ,K y ,K θ ]Controlling a priori performance superparameter for nonlinear formation of the mobile robot;
s52, designing an action space required by each mobile robot to independently and flexibly avoid the obstacle:
Wherein v is discrete And omega discrete Velocity instructions separately discretized for mobile robotsAnd an angular velocity command;
further, in S6, the state space mainly consists of three parts, one part is a state space describing tracking errors of each mobile robot for tracking the corresponding virtual robot, one part is a state space describing meeting distance-angle-heading formation between adjacent mobile robots, and the other part is a state space required for describing surrounding environment information, and the specific design is as follows:
s61, taking two adjacent mobile robots as an example, designing and describing a state space of tracking errors of each mobile robot tracking the corresponding virtual mobile robot in the current time step as follows:
s62, taking two adjacent mobile robots as an example, designing a state space between the two adjacent mobile robots, which meets a distance-angle-heading formation framework, as follows:
d, phi and beta respectively represent the distance, angle and heading formation state quantity between adjacent mobile robots in each time step; i U 1 || 2 ,||u 2 || 2 ,Representing the relative values of speed, angular speed and acceleration between robot 1 and robot 2, respectively, with respect to the virtual robot, the purpose of this term being to hope that the mobile robot operates at a continuous and smooth speed and acceleration;
S63, designing a state space required by each mobile robot to describe surrounding environment information as follows:
wherein eta t Is the pose vector of the mobile robot at the current moment, d r D is the distance between the mobile robot and its desired virtual mobile robot position at the present time ob The distance vector of the obstacle within the distance safety threshold of the mobile robot is a vector, and the distance vector comprises two elements, namely the speed difference between the current moment and the speed difference between the previous moment of the mobile robot and the angular speed difference between the current angular speed and the angular speed difference between the previous moment of the mobile robot.
Further, the reward function design of S7 can be subdivided into two sub-reward functions, one for the formation tracking subtask and the other for the flexible obstacle avoidance and formation recovery subtask, namely:

S71, designing the reward function of the formation tracking subtask.

The formation reward function between two adjacent mobile robots takes the form of equation (11) above, where $R_{error}$ is the sum of the penalty terms on the tracking errors of the two mobile robots with respect to their desired virtual mobile robots, used to drive the robots to reduce the tracking error as much as possible; $R_{formation}$ guides the robots to keep the formation consistent: if the dynamic variation of the formation stays within the threshold, a positive reward is fed back, otherwise a negative penalty; $R_{velocity}$ guides the mobile robots to keep velocity and acceleration consistent and maintain a continuous, smooth motion pattern.

S72, designing the flexible obstacle avoidance reward function of mobile robot $i$, in the form of equation (12) above, where $R_{error}$ is the penalty on the tracking error of mobile robot $i$ with respect to its desired virtual mobile robot, guiding formation recovery; $R_{avoid}$ guides the mobile robot to avoid obstacles autonomously; $R_{delta\_yaw}$ limits the change of heading angle of mobile robot $i$ to save energy.

Optionally, the reward function designed in S72 applies only during the obstacle avoidance task stage and drives the mobile robot to evade the local obstacle quickly; when S4 judges that the obstacle has been left behind, the obstacle avoidance stage is exited, the task switches back to the formation tracking subtask, and the formation is recovered and maintained under the guidance of the reward function of S71.
further, in S8, in the training process, independent training is performed for two subtasks of formation tracking and flexible obstacle avoidance, which is specifically described as follows:
s81, aiming at formation tracking tasks, adopting a deterministic strategy gradient algorithm architecture based on continuous action space, wherein the action space is selected as a 1 space The state space is based onAnd->The algorithm generally follows an "actor-commentator" pattern, but unlike other reinforcement learning algorithms, the biggest advantage of this algorithm is that the output of the action network is a deterministic action rather than a strategic distribution.
On the other hand, the action-value network outputs an evaluation of the current action, and the action network is then updated based on the policy gradient with the Q value output by the current action-value network as the weight. The update of the action-value network relies on the offline target action network and target action-value network; the advantage of this approach is that the parameters of the target networks change little, making the training process more stable.
The update of the action-value network follows equation (13) above, where $w_i$ is the priority sampling weight computed by the prioritized experience replay algorithm, $r_i$ the current reward signal, $\gamma$ the discount factor, and $Q_{\theta'}(s_{i+1}, \mu'(s_{i+1}))$ the target action-value network's evaluation of the target action $\mu'(s_{i+1})$ at the next moment.
Preferably, the update of the target networks is based on a soft update strategy: after each mini-batch is trained, they are updated using the parameters of the updated online action network and online action-value network, per equation (14) above, where $\eta'$ and $\eta''$ respectively represent the target network parameters and the current network parameters, and $\tau$ controls the update proportion. This soft update reduces the influence of abnormal parameters and avoids abnormal parameter jumps during the update.

S82, for the flexible obstacle avoidance task, a proximal policy optimization algorithm architecture over a discrete action space is adopted; the action space is selected as $a^2_{space}$, and the state space as $s^1_t$ and $s^3_t$.
The proximal policy optimization algorithm remedies problems of the traditional policy gradient algorithm's on-policy scheme such as slow parameter updates and low data utilization: on the basis of generalized advantage estimation it introduces a resampling mechanism that converts the on-policy update into an off-policy one to improve data utilization, and it constrains the magnitude of parameter updates based on KL divergence or a clipping operation to obtain a more stable training process.
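As an illustration of the clipping operation mentioned above, the following sketch computes the standard clipped surrogate loss of proximal policy optimization; the clip threshold is the usual illustrative default, not a value taken from the patent.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective of proximal policy optimization: the
    probability ratio between current and past policies is clipped so the
    update magnitude stays bounded."""
    ratio = torch.exp(logp_new - logp_old)        # resampling ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()  # negate: optimizers minimize
```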
Further, in S9, the whole inference-based cascade formation control algorithm is built from the strategies learned offline in S8.

In S9, the inference-based mobile robot flexible formation algorithm framework is constructed using the formation and flexible obstacle avoidance strategies trained in S8; the specific process is as follows:

S91, determining the formation requirements and the task environment;

S92, the mobile robot formation loads the prior-based formation strategy and the flexible obstacle avoidance strategy pre-trained in S8;

S93, the mobile robot formation performs formation tracking with the formation tracking strategy according to its interaction information with the environment, and each individual robot performs local collision detection;

S94, if flexible obstacle avoidance is needed, the mobile robot formation switches the affected individual to the flexible obstacle avoidance strategy, which avoids obstacles in real time according to the interaction information with the environment;

S95, if the local collision detection function module returns a safe state, the individual mobile robot exits the flexible obstacle avoidance strategy and rapidly recovers the formation state;

S96, repeat S93 to S95 until the target point is reached.
The invention provides a cascaded multi-mobile-robot flexible formation method based on reinforcement learning and prior nonlinear distance-angle-heading formation control; through reinforcement learning, the computational cost of online solving is moved offline, realizing inference-based flexible formation of multiple mobile robots.

In the training stage, the formation tracking strategy and the flexible obstacle avoidance strategy are trained independently, reducing training difficulty; at the same time, introducing the nonlinear distance-angle-heading formation control prior increases training speed and avoids a tedious parameter-tuning process. In the inference stage, the independently trained offline strategies are combined to meet the task requirements of autonomous stable formation and flexible obstacle avoidance. Compared with existing leader-follower formation tracking control algorithms, the method gives the robot formation autonomous tracking capability while giving each mobile robot independent obstacle avoidance capability against local static and dynamic obstacles, so it is autonomous, stable, efficient and flexible.
The overall framework of this embodiment is shown in fig. 1, where 1 is the offline independent training framework, 2 the inference-stage flexible formation framework, 21 the flexible formation and obstacle avoidance strategies, and 3 the simulation interaction environment.

First, in training stage 1, the formation strategy and the flexible obstacle avoidance strategy are trained separately. The training of the formation strategy is based on prior formation experience, which accelerates the training process and its convergence, prevents blind exploration by the multiple mobile robots, and improves formation stability; after training, the two sets of strategy parameters are stored.

Then, in inference stage 2, the multiple mobile robots flexibly invoke the experience-based flexible formation obstacle avoidance strategies 21 for autonomous formation tracking and flexible obstacle avoidance; in this way the online computation process is migrated offline, which is more efficient and stable.

The framework of the training stage is shown in fig. 2; the whole training strategy follows the idea of curriculum learning, i.e. the training environment progresses from simple to complex and the strategy performance improves step by step. In fig. 2, 1 is the formation training environment based on the prior distance-angle-heading formation control; 2 the flexible obstacle avoidance environment of each mobile robot; 3 the continuous deterministic policy gradient agent; 4 the discrete proximal policy optimization agent; and 5 the formation strategy parameters and flexible obstacle avoidance strategy parameters stored offline after training. The specific process is as follows:
The formation strategy training process is described as follows, and a code skeleton of the loop is given after this list:

First, several simple-to-complex training environments are configured according to the idea of curriculum learning, for example first training the formation environment of two mobile robots and then gradually increasing the number of robots in the formation.

Then, for each preset simulation environment, the parameters of the action network, target action network, action-value network and target action-value network of the continuous deterministic policy gradient agent 3 in fig. 2 are initialized; for each iteration cycle, the formation training environment 1 based on the distance-angle-heading formation control prior is initialized, and then at each time step:

step 1: an action is selected according to the policy within the threshold range of the action space, and random Gaussian noise is added to improve random exploration;

step 2: interaction with the formation training environment: the selected deterministic action is input to the cascaded prior formation controllers of the multiple mobile robots, designed by combining the preset formation pattern with the distance-angle-heading geometric relations as described in equation (5); the prior formation upper-layer velocity and angular velocity commands are input to each mobile robot; while the robots execute the commands, the environment updates its current state and feeds back the state value, the reward value and a Boolean flag indicating whether the task has ended or been interrupted;

step 3: the information fed back by the environment and the computed priority are stored in the experience pool as training data;

step 4: once the experience pool is full, samples are drawn according to priority, and the action-value network and action network are trained and updated, specifically:

step 4-1: the sampled state values are input to the action network to obtain actions;

step 4-2: the actions and the sampled state values are input to the action-value network to obtain Q values;

step 4-3: according to the Q values, the action network is updated by back-propagation along the policy gradient;

step 4-4: steps 4-2 and 4-3 are repeated to obtain the Q values computed for the actions by the updated action-value network;

step 4-5: the next-state values in the sampled experience are input to the target action network to obtain the target actions;

step 4-6: these actions and states are input to the target action-value network to obtain the target Q values;

step 4-7: the action-value network is updated according to equation (13), using the target Q values and the Q values computed in step 4-4 combined with the priority weight coefficients;

step 5: steps 1-4 are repeated;

step 6: after a certain number of time steps, the target action network and target action-value network are soft-updated according to equation (14);

step 7: all network parameters of the final formation strategy are stored to be invoked in subsequent training or inference.
The training process of each individual's autonomous flexible obstacle avoidance strategy is as follows (a sketch of the advantage computation used in steps 4-5 follows this list):

First, several simple-to-complex training environments are configured according to the idea of curriculum learning, for example first training the obstacle avoidance strategy of the mobile robot in a static-obstacle environment and then in a dynamic-obstacle environment.

Then, for each preset simulation environment, the policy network and value network of the proximal policy optimization agent 4 in fig. 2 are initialized; in each iteration cycle the corresponding environment is initialized, and then at each time step:

step 1: the environment state information is input to the policy network to obtain a policy distribution, and an action is sampled from the discrete action space according to this distribution;

step 2: the action is input to the environment, interacting with the flexible obstacle avoidance environment; the environment updates and feeds back the state value, the reward value and a Boolean flag indicating whether the task has ended or been interrupted;

step 3: step 1 is repeated to sample and store a certain amount of experience;

step 4: the state of the last step of step 3 is input to the value network to obtain its state value, and the discounted reward values over the collected time steps are then computed backward;

step 5: all stored experience is input to the value network, and the advantage values are computed using generalized advantage estimation;

step 6: the value network is updated by back-propagation according to the computed advantage values;

step 7: all state values in the stored experience are input to the policy network and the past policy network to obtain their respective policy distributions; resampling between the two distributions converts the on-policy data into off-policy updates, and the policy network is updated by back-propagation;

step 8: steps 5-7 are repeated, after which the past policy network parameters are replaced with the current policy network parameters;

step 9: steps 1-8 are repeated, and all network parameters of the final flexible obstacle avoidance strategy are stored to be invoked in subsequent training or inference.
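The sketch announced above: a minimal implementation of the backward advantage computation of steps 4-5 (generalized advantage estimation); the discount coefficients are the usual illustrative defaults, not values from the patent.

```python
import numpy as np

def generalized_advantage(rewards, values, gamma=0.99, lam=0.95):
    """Backward pass over a stored roll-out computing GAE advantages.

    rewards: length-T array; values: length T+1 array of value-network
    outputs (the extra entry is the bootstrap value of the last state)."""
    adv = np.zeros(len(rewards))
    last = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv
```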
When actually deployed, the cascaded multi-mobile-robot flexible formation method based on reinforcement learning and prior nonlinear distance-angle-heading formation control provided by this embodiment proceeds according to the steps shown in fig. 3 (a code sketch of this loop follows the list):

step 1: the desired trajectory of the upper-layer motion plan is acquired for formation tracking;

step 2: the specific formation form requirements of the formation task are defined, the prior formation control information is obtained, and the task environment is defined;

step 3: the formation tracking strategy and flexible obstacle avoidance strategy pre-trained offline in the training stage are loaded;

step 4: according to the pre-trained formation strategy, once the state of the mobile robot is obtained, the action network feeds back an action and the mobile robot performs the formation tracking task accordingly;

step 5: local collision detection is performed to ensure the safety of formation tracking; if the distance between an obstacle and some mobile robot is within the safety threshold, jump to step 6, otherwise go to step 7;

step 6: the mobile robot samples discrete actions from the distribution output by the policy network, performs local obstacle avoidance, quickly returns to the position in the formation with the smallest error, and continues tracking the virtual mobile robot of the corresponding formation pattern;

step 7: check whether the target point of the desired trajectory has been reached; if not, return to step 4 and continue formation tracking.
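The sketch announced above: a skeleton of the inference loop, reusing the `collision_check` sketch given earlier; all interfaces are illustrative assumptions.

```python
def run_formation(env, formation_policy, avoid_policy, goal_reached, eps_safe):
    """Skeleton of the inference loop of fig. 3: formation tracking with a
    per-robot switch to the avoidance policy on a collision warning."""
    s = env.reset()
    while not goal_reached(s):                    # step 7: stop at the target
        warning, _ = collision_check(env.scan(), eps_safe)   # step 5
        if warning:
            a = avoid_policy.sample_discrete(s)   # step 6: local avoidance
        else:
            a = formation_policy(s)               # step 4: gain-tuning action
        s, _, _ = env.step(a)
```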
The method provided by the invention is based on a policy-gradient algorithm for continuous control combined with prior nonlinear distance-angle-heading formation control knowledge; it avoids blind exploration by the mobile robots, speeds up training convergence, and avoids a tedious coefficient-tuning process, while introducing proximal policy optimization to independently train the flexible obstacle avoidance capability of a single mobile robot against local static and dynamic obstacles. The method is divided into a training stage and an inference stage, moving the complex online solving process offline: the formation strategy and the flexible obstacle avoidance strategy are trained independently based on the idea of curriculum learning, and the pre-trained strategies are flexibly invoked during inference, so that the whole formation has greater autonomy and flexibility.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive; those of ordinary skill in the art, enlightened by the present invention, may derive many further forms without departing from its spirit and the scope of the claims, all of which fall within the protection of the present invention.

Claims (7)

1. A flexible formation method for cascaded multiple mobile robots, characterized by comprising: determining a dynamics model according to the distance, angle and heading among robots based on the selected formation; determining the prior controller of the reinforcement learning architecture in the nonlinear mobile robot flexible formation method according to the dynamics model and its constraints; determining an action space based on hyper-parameters acting on the pose vectors of the mobile robots, the action space comprising the formation tracking action space of two adjacent mobile robots and the action space required by each mobile robot for independent flexible obstacle avoidance; determining a state space according to the pose and velocity tracking errors of the mobile robots, the state space comprising: the state space of each mobile robot's tracking error with respect to its corresponding virtual mobile robot at the current time step, the state space between adjacent mobile robots, and the state space each mobile robot requires to describe its surrounding environment information; and setting reinforcement learning reward functions, including a formation reward function and an obstacle avoidance reward function;
based on the prior controller, performing reinforcement learning training through interaction with the environment according to the action space, the state space and the reward functions, and on completion of training obtaining the cascaded multi-mobile-robot flexible formation method comprising the formation strategy and the flexible obstacle avoidance strategy;
the kinematic equation of the dynamics model is described as follows:

$$\dot{\eta} = \begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{\theta} \end{bmatrix} = \begin{bmatrix} \cos\theta & 0 \\ \sin\theta & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} v \\ \omega \end{bmatrix} \quad (1)$$

where $\eta=[x,y,\theta]^T$ represents the pose vector of each mobile robot, $(x,y)$ is the position of each mobile robot and $\theta$ its angle; $v=\frac{v_r+v_l}{2}$ is the velocity of the mobile robot, $\omega$ its current angular velocity, and $v_r$ and $v_l$ respectively represent the velocities of the right and left wheels of the mobile robot;

the dynamics model constraint takes the form:

$$\dot{x}\sin\theta - \dot{y}\cos\theta = 0 \quad (2)$$
the method for determining the prior controller of the reinforcement learning architecture in the flexible formation method of the nonlinear mobile robot specifically comprises the following steps: s31, determining that the expected track of the virtual expected mobile robot is defined as eta r =[x r ,y rr ] T ,(x r ,y r ) θ is the position of the virtually expected mobile robot r For the angle of the virtual expected mobile robot, the tracking error of the gesture and the tracking error of the speed of the mobile robot according to the virtual expected trajectory are expressed as:
e x position tracking error in the x direction; e, e y Is the position tracking error in the y direction; e, e θ Is azimuth angleTracking errors;the speed tracking error in the x direction and the speed tracking error in the y direction are respectively; />Is an angular velocity tracking error; />Is the desired angular velocity of the virtual robot;
s32, determining an expected formation model between the distance, the angle and the heading of adjacent mobile robots, wherein the expected formation model is specifically described as follows:
Wherein v is 1 ,v 2 The virtual robot objects representing the adjacent mobile robots to be tracked are respectively marked as a virtual robot 1 and a virtual robot 2, (x) v1 ,y v1 ) Is the position of the virtual robot 1, (x) v2 ,y v2 ) θ is the position of the virtual robot 2 v1 Angle θ of virtual robot 1 v2 Is the angle of the virtual robot 2; d, d v2v1 The relative distance between adjacent mobile robots v1, v 2; phi (phi) v2v1 The relative angles of adjacent mobile robots v1, v 2; beta v2v1 An angle correction amount for the mobile robot to maintain the same azimuth angle;
s33, combining (1) - (4) with a feedback linearization nonlinear control theory, and describing the prior formation control of the adjacent mobile robots in the following form:
wherein v is 1 Speed v for the virtual robot 1 to meet preset formation requirements 2 Speed w meeting preset formation requirements for virtual robot 2 1 Angular velocity, w, meeting preset formation requirements for virtual robot 1 2 Angular velocity satisfying preset formation requirements for the virtual robot 2,performance superparameter of a priori controller for nonlinear formation of virtual robot 1, +.>The method comprises the steps that the performance super-parameters of the prior controller are formed for the nonlinearity of the virtual robot 2, and the control performance of the prior controller is directly determined by the performance super-parameters;
in the training process, independent training is respectively carried out for two subtasks of formation tracking and flexible obstacle avoidance, and the specific method comprises the following steps:
For formation tracking tasks, the action space is selected as a formation tracking action space a of two adjacent mobile robots 1 space The state space is based on the state space in which each mobile robot tracks the tracking error of the corresponding virtual mobile robot at the current time stepState space between adjacent mobile robots +.>
The action-value network outputs an evaluation of the current action; the action network is then updated by the policy gradient, using the Q value output by the current action-value network as the weight.

The update of the action-value network is specifically described as follows:

$$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} w_i \Big( r_i + \gamma\, Q_{\theta'}\big(s_{i+1}, \mu'(s_{i+1})\big) - Q_\theta(s_i, a_i) \Big)^2$$

where $w_i$ is the priority sampling weight computed by the prioritized-experience-replay method at the current instant $i$; $r_i$ is the reward signal at the current instant $i$; $\gamma$ is the discount factor; $Q_{\theta'}(s_{i+1}, \mu'(s_{i+1}))$ is the target action-value network's evaluation of the target action $\mu'(s_{i+1})$ at the next instant $i+1$; $s_i$ is the robot's state at the current instant $i$, $s_{i+1}$ is its state at the next instant $i+1$ and $a_i$ is its action at the current instant $i$; $N$ is the number of samples in a mini-batch; and $Q_\theta(s_i, a_i)$ is the current action-value network's evaluation of the robot's state and action command at the current instant $i$.
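A minimal PyTorch sketch of this importance-weighted critic step follows; the names `critic`, `critic_target`, `actor_target` and the buffer layout are assumptions, not the patent's code:

```python
import torch

def update_critic(critic, critic_target, actor_target, optimizer, batch, gamma=0.99):
    """One prioritized-experience-replay critic step: the TD error of each
    transition is weighted by its importance-sampling weight w_i.
    All batch tensors are assumed shaped (N, ...) with rewards/weights (N, 1)."""
    s, a, r, s_next, w = batch  # transitions sampled by priority from the buffer
    with torch.no_grad():
        # Target: r_i + gamma * Q_theta'(s_{i+1}, mu'(s_{i+1}))
        q_next = critic_target(s_next, actor_target(s_next))
        target = r + gamma * q_next
    q = critic(s, a)
    td_error = target - q
    loss = (w * td_error.pow(2)).mean()  # importance-weighted MSE over the mini-batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return td_error.detach().abs()  # new priorities for the sampled transitions
```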
For the flexible obstacle avoidance task, a proximal policy optimization (PPO) algorithm architecture over a discrete action space is adopted: the action space is selected as the action space each mobile robot needs for independent flexible obstacle avoidance, and the state space is selected as the state space in which each mobile robot tracks the tracking error of its corresponding virtual mobile robot at the current time step together with the state space each mobile robot needs to describe its surrounding environment information.
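A minimal sketch of the clipped PPO surrogate loss for such a discrete policy; that `policy(states)` returns a `Categorical` distribution over the discretized commands is an assumption:

```python
import torch

def ppo_discrete_loss(policy, states, actions, old_log_probs, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss for a discrete action space."""
    dist = policy(states)                         # torch.distributions.Categorical
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)  # pi_new / pi_old per sample
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()  # negate to maximize the surrogate
```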
2. The flexible formation method of cascaded multiple mobile robots according to claim 1, wherein the formation tracking action space of the two adjacent mobile robots is represented as follows:

$$a^1_{space} = \left[\, k^{v1},\ k^{v2} \,\right]$$

where $k^{v1}$ denotes the performance hyperparameters of the nonlinear formation prior controller with which the mobile robot tracks virtual robot 1, and $k^{v2}$ denotes the performance hyperparameters of the nonlinear formation prior controller with which its adjacent mobile robot tracks virtual robot 2.

The action space each mobile robot needs for independent flexible obstacle avoidance is represented as follows:

$$a^2_{space} = \left[\, v_{discrete},\ \omega_{discrete} \,\right]$$

where $v_{discrete}$ and $\omega_{discrete}$ are the discretized velocity command and the discretized angular velocity command of the mobile robot, respectively.
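For illustration, one way such a discretized command set could be enumerated; the grid values below are assumptions, not taken from the patent:

```python
from itertools import product

# Hypothetical discretization grids for the velocity and angular-velocity commands.
V_CHOICES = [0.0, 0.2, 0.4, 0.6]          # m/s
W_CHOICES = [-0.8, -0.4, 0.0, 0.4, 0.8]   # rad/s

# Each discrete PPO action indexes one (v, omega) pair.
ACTION_TABLE = list(product(V_CHOICES, W_CHOICES))

def action_to_command(action_index):
    """Map a discrete action index to a (v, omega) command."""
    return ACTION_TABLE[action_index]
```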
3. The flexible formation method of cascaded multiple mobile robots according to claim 1, wherein the state space in which each mobile robot tracks the tracking error of the corresponding virtual mobile robot at the current time step is represented as follows:

$$s^1_{space} = \left[\, e^1_x,\ e^1_y,\ e^1_\theta,\ e^2_x,\ e^2_y,\ e^2_\theta \,\right]$$

The state space between adjacent mobile robots is represented as follows:

$$s^2_{space} = \left[\, d_t,\ \varphi_t,\ \beta^1_t,\ \beta^2_t,\ \|U_1\|_2,\ \|\dot{U}_1\|_2,\ \|U_2\|_2,\ \|\dot{U}_2\|_2 \,\right]$$

where $e^1_x$ is the position tracking error in the x direction of the mobile robot tracking virtual robot 1; $e^1_y$ is its position tracking error in the y direction; $e^1_\theta$ is its azimuth tracking error; $e^2_x$ is the position tracking error in the x direction of the adjacent mobile robot tracking virtual robot 2; $e^2_y$ is its position tracking error in the y direction; $e^2_\theta$ is its azimuth tracking error; $e_1$ is the tracking error of the mobile robot with respect to virtual robot 1 and $e_2$ is the tracking error of its adjacent robot with respect to virtual robot 2;

$d_t$ denotes the formation state quantity of the distance between adjacent mobile robots at each time step $t$; $\varphi_t$ denotes the formation state quantity of the angle between adjacent mobile robots at each time step $t$; $\beta^1_t$ and $\beta^2_t$ denote the formation state quantities of the heading between adjacent mobile robots at each time step $t$; $\|U_1\|_2$ is the velocity value of mobile robot 1 relative to its virtual robot, comprising velocity and angular velocity; $\|\dot{U}_1\|_2$ is the acceleration value of robot 1 relative to its virtual robot, comprising acceleration and angular acceleration; $\|U_2\|_2$ is the velocity value between mobile robot 2 and its virtual robot, comprising velocity and angular velocity; and $\|\dot{U}_2\|_2$ is the acceleration value of mobile robot 2 relative to its virtual robot, comprising acceleration and angular acceleration.
The state space each mobile robot needs to describe its surrounding environment information is represented as follows:

$$s^3_{space} = \left[\, \eta_t,\ d_r,\ d_{ob},\ \Delta v,\ \Delta \omega \,\right]$$

where $\eta_t$ is the pose vector of the mobile robot at the current instant; $d_r$ is the distance between the mobile robot and its desired virtual mobile robot position at the current instant; $d_{ob}$ is the distance vector to the obstacles within the mobile robot's safety distance threshold; and the final two elements, $\Delta v$ and $\Delta \omega$, are the difference between the mobile robot's velocity at the current instant and at the previous instant, and the difference between its angular velocity at the current instant and at the previous instant.
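A sketch of assembling this surrounding-environment observation for one robot; the argument names are illustrative:

```python
import numpy as np

def build_env_observation(pose, d_ref, obstacle_dists, v, v_prev, w, w_prev):
    """Concatenate the surrounding-environment state for one robot:
    pose eta_t, distance d_r to the desired virtual position, obstacle
    distance vector d_ob, and the one-step velocity / angular-velocity changes."""
    return np.concatenate([
        np.asarray(pose),            # eta_t = [x, y, theta]
        [d_ref],                     # d_r
        np.asarray(obstacle_dists),  # d_ob within the safety threshold
        [v - v_prev, w - w_prev],    # delta-v, delta-omega
    ])
```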
4. The flexible formation method of cascaded multiple mobile robots according to claim 3, wherein the formation reward function between two adjacent mobile robots comprises the terms $R_{error\_1}$, $R_{formation}$ and $R_{velocity}$, where $\varepsilon_{thresh}$ is a set threshold; $R_{error\_1}$ is the sum of the penalty terms on the two mobile robots' tracking errors with respect to their desired virtual mobile robots, used to encourage each robot to reduce its tracking error with respect to the desired position as far as possible; $r$ is a reward or penalty value; $R_{formation}$ is a reward-or-penalty function used to guide the robots to keep the formation consistent: if the dynamic variation range of the formation stays within the set threshold, a positive reward value is fed back, otherwise a negative penalty value is fed back; and $R_{velocity}$ is used to guide the mobile robots to keep their velocities and accelerations consistent and to maintain a continuous, smooth motion pattern.
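A sketch of a reward with this three-term structure; only the roles of the terms come from the claim, while the weights and penalty shapes are assumptions:

```python
import numpy as np

def formation_reward(e1, e2, d_t, d_desired, du, eps_thresh=0.1, r=1.0):
    """Three-term formation reward: tracking-error penalty R_error_1,
    formation-consistency term R_formation, and smoothness term R_velocity.

    e1, e2: tracking-error vectors of the two robots w.r.t. their virtual targets.
    d_t, d_desired: current and desired inter-robot distance.
    du: velocity/acceleration mismatch vector between the two robots.
    """
    # R_error_1: penalize both robots' tracking errors.
    r_error = -(np.linalg.norm(e1) + np.linalg.norm(e2))
    # R_formation: +r if the formation deviation stays within the threshold, -r otherwise.
    r_formation = r if abs(d_t - d_desired) < eps_thresh else -r
    # R_velocity: penalize velocity/acceleration mismatch to keep motion smooth.
    r_velocity = -np.linalg.norm(du)
    return r_error + r_formation + r_velocity
```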
5. The flexible formation method of cascaded multiple mobile robots according to claim 3, wherein the obstacle avoidance reward function comprises the terms $R_{error\_2}$, $R_{avoid}$ and $R_{delta\_yaw}$, where $R_{error\_2}$ is the penalty term on mobile robot $i$'s tracking error with respect to its desired virtual mobile robot, guiding the robot to restore the formation; $R_{avoid}$ guides the mobile robot to avoid obstacles autonomously; $\varepsilon_{safe}$ is the safety threshold; $r_1$ is the penalty value applied when the robot's distance to the nearest obstacle is within the safety threshold but no collision has yet occurred; $r_2$ is the penalty value applied when the robot collides with an obstacle; and $R_{delta\_yaw}$ is the penalty on the change in mobile robot $i$'s heading angle between adjacent time steps, which makes the overall motion trajectory smoother.
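A matching sketch for the obstacle-avoidance reward; the thresholds and penalty magnitudes are again assumptions:

```python
def avoidance_reward(e_i, d_nearest, yaw_change,
                     eps_safe=0.5, r1=-0.5, r2=-10.0, k_yaw=0.1):
    """Obstacle-avoidance reward: R_error_2 + R_avoid + R_delta_yaw.

    e_i: scalar tracking-error magnitude of robot i w.r.t. its virtual target.
    d_nearest: distance to the nearest obstacle (<= 0 means collision).
    yaw_change: heading-angle change between adjacent time steps.
    """
    r_error = -abs(e_i)              # pull the robot back toward the formation
    if d_nearest <= 0.0:             # collision with an obstacle
        r_avoid = r2
    elif d_nearest < eps_safe:       # inside the safety threshold, no impact yet
        r_avoid = r1
    else:
        r_avoid = 0.0
    r_yaw = -k_yaw * abs(yaw_change)  # penalize heading change between steps
    return r_error + r_avoid + r_yaw
```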
6. The flexible formation method of cascaded multiple mobile robots according to claim 5, wherein the target action-value network is updated as follows: after each mini-batch training step, it is updated with the parameters of the online action network and the online action-value network, in the specific form:

$$\eta' \leftarrow \tau \eta + (1 - \tau)\, \eta' \tag{14}$$

where $\eta'$ and $\eta$ denote the target network parameters and the current network parameters respectively, and $\tau$ controls the update ratio.
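A one-function PyTorch sketch of this soft update; the network names are illustrative:

```python
import torch

@torch.no_grad()
def soft_update(target_net, online_net, tau=0.005):
    """Blend the online parameters into the target: eta' <- tau*eta + (1-tau)*eta'."""
    for p_t, p in zip(target_net.parameters(), online_net.parameters()):
        p_t.mul_(1.0 - tau).add_(tau * p)
```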
7. The flexible formation method of cascaded multiple mobile robots according to claim 1, further comprising a local collision detection step for detecting the safety distance between local obstacles and the robot: if the returned safety distance meets the safety-state requirement, the individual mobile robot exits the flexible obstacle avoidance strategy and resumes the formation strategy.
CN202110655081.9A 2021-06-11 2021-06-11 Flexible formation method for cascading multiple mobile robots Active CN113485323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110655081.9A CN113485323B (en) 2021-06-11 2021-06-11 Flexible formation method for cascading multiple mobile robots

Publications (2)

Publication Number Publication Date
CN113485323A CN113485323A (en) 2021-10-08
CN113485323B (en) 2024-04-12

Family

ID=77935320

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant