CN113485323A - Flexible formation method for cascaded multiple mobile robots - Google Patents
Flexible formation method for cascaded multiple mobile robots
- Publication number: CN113485323A
- Application number: CN202110655081.9A
- Authority
- CN
- China
- Prior art keywords: robot, mobile robot, formation, mobile, virtual
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0287—Control of position or course in two dimensions specially adapted to land vehicles involving a plurality of land vehicles, e.g. fleet or convoy travelling
- G05D1/0291—Fleet control
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0219—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory ensuring the processing of the whole working surface
- G05D1/0276—Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
Abstract
The invention provides a flexible formation method for cascaded multiple mobile robots. The method is based on a policy gradient algorithm that combines prior nonlinear distance-angle-course formation-control knowledge with continuous control, which avoids blind exploration by the mobile robots, speeds up training convergence, and avoids a tedious coefficient-tuning process; in addition, proximal policy optimization is introduced to independently train each mobile robot's flexible obstacle-avoidance capability for coping with local static and dynamic obstacles. The method comprises a training stage and an inference stage: complex online solving processes are moved offline, the formation and flexible obstacle-avoidance strategies are trained independently following the idea of curriculum learning, and the pre-trained strategies are invoked flexibly during the inference stage, so that the whole formation has greater autonomy and flexibility.
Description
Technical Field
The invention belongs to the field of multiple mobile robots, and particularly relates to a flexible formation method for multiple mobile robots based on a cascade architecture, in particular a cascaded multiple-mobile-robot formation method based on reinforcement learning and prior nonlinear distance-angle-course formation control.
Background
With the development of robot technology, formation operation by multiple mobile robots effectively improves operating efficiency through cooperation and is gradually replacing traditional single-machine operation. For example, multiple underwater robots search in collaborative formation. In the military field, unmanned-aerial-vehicle swarms and multiple ground mobile robots demonstrate the distinctive advantages of multi-machine formation in tasks such as mine clearance, search and rescue, and reconnaissance. Recently, with the COVID-19 epidemic raging worldwide, many hospitals in China have adopted disinfection mobile robots in place of traditional manual disinfection; multiple disinfection robots cooperating in formation have effectively improved efficiency over single-unit operation.
The leader-following (pilot-following) formation strategy based on distance-angle-course is one of the common techniques for achieving formation tracking of multiple mobile robots, and compared with the traditional leader-following formation strategy it has better flexibility and scalability. The basic idea of the strategy is to designate one robot in advance as the leader and the other robots as followers, and then determine the relative distance, relative angle and course between the leader robot and the following robots from a preset formation, on which the formation control strategy is designed.
At present, the mainstream methods for realizing distance-angle-course leader following include nonlinear control and nonlinear model predictive control. The former includes input-output feedback linearization control, feedback control, and the like; as more performance gain parameters are introduced, a tedious parameter-tuning process cannot be avoided. The latter depends heavily on an accurate model and places high demands on online solving speed. Moreover, the robustness of the traditional leader-following formation model needs improvement, as it lacks certain flexible obstacle-avoidance and formation-recovery capabilities.
With the development of artificial intelligence, deep reinforcement learning has been widely applied to end-to-end mobile-robot tasks thanks to advantages such as being model-free and trainable offline, but mostly in the single-robot domain. End-to-end implementations in the multi-robot domain impose strict requirements on sensor and actuator performance, the state and action space dimensions are high, the training cost is excessive when deployed on real mobile robots, and inference and reproduction are difficult.
Disclosure of Invention
The invention aims to provide, in view of the defects of the prior art, a flexible formation method for multiple mobile robots that has certain flexible obstacle-avoidance and formation-recovery capabilities.
The invention adopts the following technical scheme. The method comprises: determining a dynamic model according to the distance, angle and course among the robots, based on the selected formation form; determining, from the dynamic model and its constraints, the prior controller of the reinforcement learning architecture in the flexible formation method of the nonlinear mobile robot; determining an action space based on the hyper-parameters associated with the mobile robots' pose vectors, the action space comprising the formation tracking action space of two adjacent mobile robots and the action space each mobile robot needs for independent flexible obstacle avoidance; determining a state space from the tracking errors of mobile-robot pose and velocity, the state space comprising, at the current time step, the state space of each mobile robot's tracking error with respect to its corresponding virtual mobile robot, the state space between adjacent mobile robots, and the state space each mobile robot needs to describe its surrounding environment; and setting a reward function for reinforcement learning, comprising a formation reward function and an obstacle-avoidance reward function.
Based on the prior controller, reinforcement learning training is then performed by interacting with the environment according to the action space, the state space and the reward function; on completion of training, a flexible formation method for the cascaded multiple mobile robots is obtained, comprising a formation strategy and a flexible obstacle-avoidance strategy.
Further, the kinematic equation is described as follows:

  ẋ = v·cosθ, ẏ = v·sinθ, θ̇ = ω, with v = (v_r + v_l)/2 and ω = (v_r − v_l)/L   (1)

where η = [x, y, θ]^T represents the pose vector of each mobile robot, (x, y) being the position and θ the angle of each mobile robot; v is the speed of the mobile robot, ω the current angular velocity of the mobile robot, and v_r and v_l respectively represent the speeds of the right and left wheels of the mobile robot (L denotes the wheel separation).

The dynamic-model constraint takes the nonholonomic form:

  ẋ·sinθ − ẏ·cosθ = 0   (2)
still further, the method for determining the prior controller of the reinforcement learning architecture in the flexible formation method of the nonlinear mobile robot specifically includes: s31, determining the expected track of the virtual expected mobile robot as etar=[xr,yr,θr]T,(xr,yr) To virtually expect the position of the mobile robot, θrTo virtually expect the angle of the mobile robot, the tracking error of the mobile robot for determining the attitude and the tracking error of the velocity of the mobile robot according to the virtual expected trajectory are expressed as:
exposition tracking error for the x direction; e.g. of the typeyPosition tracking error for the y direction; e.g. of the typeθIs the tracking error of the azimuth;the speed tracking errors in the x direction and the y direction respectively;
s32, determining an expected formation model among the distance, the angle and the heading among adjacent mobile robots, wherein the expected formation model is specifically described as follows:
wherein v is1,v2Respectively representing virtual robot objects to be tracked by adjacent mobile robots, namely a virtual robot 1 and a virtual robot 2, (x)v1,yv1) Is the position of the virtual robot 1, (x)v2,yv2) Is the position of the virtual robot 2, thetav1Angle of the virtual robot 1, thetav2Is the angle of the virtual robot 2; dv2v1Relative distance of adjacent mobile robots v1, v 2; phi is av2v1Relative angles of adjacent mobile robots v1, v 2; beta is av2v1An angle correction amount for the mobile robot maintaining the same azimuth angle;
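A minimal sketch of computing these formation quantities is given below. The sign conventions and the function name are assumptions (a standard leader-follower distance-angle form), since the patent's own equations are reproduced only as images:

```python
import math

def formation_state(x1, y1, th1, x2, y2, th2):
    """Distance-angle-heading quantities between virtual robots v1 and v2.

    Sign conventions are assumed (standard leader-follower l-phi form);
    the patent gives its exact equations only as images.
    """
    dx, dy = x1 - x2, y1 - y2
    d = math.hypot(dx, dy)              # relative distance d_v2v1
    phi = math.atan2(dy, dx) - th2      # relative angle phi_v2v1
    beta = th1 - th2                    # heading correction beta_v2v1
    return d, phi, beta

# v1 at (1, 1), v2 at the origin, both heading along +x
d, phi, beta = formation_state(1.0, 1.0, 0.0, 0.0, 0.0, 0.0)
```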
s33, combining the theories (1) to (4) and the feedback linearization nonlinear control theory, the formation control prior description form of the adjacent mobile robots is as follows:
wherein v and w are the speed and angular velocity of the mobile robot meeting the preset formation requirement,
the performance of the prior controller for the non-linear formation of the virtual robot 1 is over-parametric,performance superparameters of a priori controllers for nonlinear formation of virtual robots 2, and performance superparameters directly determine the priori controllersAnd controlling the performance.
Further, the formation tracking action space of two adjacent mobile robots is expressed as follows:

where [K_x^1, K_y^1, K_θ^1] are the performance hyper-parameters of the nonlinear formation prior controller with which the mobile robot tracks virtual robot 1, and [K_x^2, K_y^2, K_θ^2] those with which its neighboring mobile robot tracks virtual robot 2.
the action space required by each mobile robot for independently and flexibly avoiding the obstacle is expressed as follows;
wherein v isdiscreteAnd omegadiscreteRespectively, a discretized speed command and an angular speed command of the mobile robot.
Further, the state space in which each mobile robot tracks the tracking error of its corresponding virtual mobile robot at the current time step is represented as follows:

The state space between adjacent mobile robots is represented as follows:

where e_x^1, e_y^1 and e_θ^1 are the tracking errors of the mobile robot with respect to virtual robot 1 in the x direction, the y direction and the azimuth respectively; e_x^2, e_y^2 and e_θ^2 are the corresponding tracking errors of the mobile robot's neighboring robot with respect to virtual robot 2; e_1 is the tracking error of the mobile robot with respect to virtual robot 1, and e_2 that of the neighboring robot with respect to virtual robot 2; d_t, φ_t and β_t respectively represent the formation state quantities of distance, angle and course between adjacent mobile robots at each time step t; ||u_1||_2, ||u̇_1||_2, ||u_2||_2 and ||u̇_2||_2 respectively represent the relative values of speed, angular velocity and acceleration of robot 1 and robot 2 with respect to their virtual robots, the purpose being that the mobile robots are expected to operate with continuous and smooth speed and acceleration. Here ||u_1||_2 is the velocity value of mobile robot 1 relative to its virtual robot, comprising speed and angular velocity; ||u̇_1||_2 is the acceleration value of robot 1 relative to its virtual robot, comprising acceleration and angular acceleration; ||u_2||_2 and ||u̇_2||_2 are the corresponding velocity and acceleration values of mobile robot 2 relative to its virtual robot.
The state space each mobile robot needs to describe its surrounding environment is represented as follows:

where η_t is the pose vector of the mobile robot at the current moment, d_r the distance between the mobile robot and its desired virtual-mobile-robot position at the current moment, d_ob the distance vector to obstacles within the safety threshold at the current moment, and |Δu| the change in the speed and angular velocity of the mobile robot between adjacent moments.
Still further, the formation reward function between two adjacent mobile robots is described in the following form:

where ε_thresh is the set threshold; R_error_1 in the reward function is the sum of the penalty terms for the two mobile robots' tracking errors with respect to their desired virtual mobile robots, used to drive the robots to reduce the tracking error to the expected position as far as possible; r is a reward or penalty value; R_formation is a reward-or-penalty function guiding the robots to maintain the consistency of the formation: if the dynamic change of the formation stays within the set threshold, a positive reward value is fed back, otherwise a negative penalty value is returned; R_velocity guides the mobile robot to keep its speed and acceleration consistent and maintain a continuous, smooth motion pattern.
Still further, the obstacle-avoidance reward function takes the following specific form:

where R_error_2 is the penalty term for mobile robot i's tracking error with respect to its desired virtual mobile robot, used to guide the robot's formation recovery; R_avoid guides the mobile robot to avoid obstacles autonomously, with ε_safe the safety threshold, r_1 the penalty value when the robot's distance to the nearest obstacle is within the safety threshold but no collision has yet occurred, and r_2 the penalty value when the robot collides with the obstacle; R_delta_yaw is the penalty value on the change of the mobile robot's direction angle between adjacent time steps, used to restrain the direction-angle change of mobile robot i and make the overall motion trajectory smoother.
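The two reward functions above can be sketched as follows; all thresholds and reward or penalty constants (eps_thresh, eps_safe, r, r1, r2) are assumed placeholder values, and the exact term weighting in the patent may differ:

```python
def formation_reward(err1, err2, form_dev, eps_thresh=0.1, r=1.0):
    """Sketch of the formation reward: tracking-error penalty R_error_1
    plus a formation-consistency bonus/penalty R_formation.
    All constants are assumed placeholders."""
    R_error = -(err1 + err2)                       # penalize tracking errors
    R_formation = r if form_dev < eps_thresh else -r
    return R_error + R_formation

def avoid_reward(d_nearest, eps_safe=0.5, r1=-1.0, r2=-10.0):
    """Sketch of the obstacle-avoidance term R_avoid: mild penalty inside
    the safety threshold, large penalty on collision."""
    if d_nearest <= 0.0:        # collision
        return r2
    if d_nearest < eps_safe:    # inside safety threshold, not yet collided
        return r1
    return 0.0

R = formation_reward(0.02, 0.03, 0.05)   # small errors, formation kept
```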
Further, in the training process, the two subtasks of formation tracking and flexible obstacle avoidance are trained independently. Specifically:

For the formation tracking task, the action space is chosen as the formation tracking action space of two adjacent mobile robots, and the state space is composed of the state space of each mobile robot's tracking error with respect to its corresponding virtual mobile robot at the current time step, together with the state space between adjacent mobile robots.
The action value network outputs the evaluation of the current action; with the Q value output by the current action value network as the weight, the action network is updated based on the policy gradient.

The update of the action value network is specifically described as:

  L = (1/N) Σ_i w_i ( r_i + γ·Q_θ′(s_{i+1}, μ′(s_{i+1})) − Q_θ(s_i, a_i) )²   (13)

where w_i is the priority sampling weight at the current time i computed by the prioritized experience replay algorithm; r_i is the reward signal at the current time i; γ is the discount factor; Q_θ′(s_{i+1}, μ′(s_{i+1})) is the target action value at the next time i+1, i.e. the evaluation of the target action μ′(s_{i+1}); s_i is the state of the robot at the current time i, s_{i+1} the state at the next time i+1, and a_i the action of the robot at the current time i; N is the number of samples in the mini-batch; and Q_θ(s_i, a_i) is the current action value network's evaluation of the robot's state and action command at time i.
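The weighted critic update can be sketched as below; the stand-in Q and policy functions, the batch layout and all numeric values are assumptions for illustration only:

```python
def critic_loss(Q, Q_target, mu_target, batch, gamma=0.99):
    """Priority-weighted TD loss of the DDPG-style critic update.

    batch is a list of (s, a, r, s_next, w) tuples, where w is the
    prioritized-replay importance weight; Q, Q_target and mu_target are
    stand-ins for the online critic, target critic and target actor.
    """
    total = 0.0
    for s, a, r, s_next, w in batch:
        y = r + gamma * Q_target(s_next, mu_target(s_next))  # TD target
        total += w * (y - Q(s, a)) ** 2                      # weighted error
    return total / len(batch)

# Toy scalar critic/actor just to exercise the function
Q = lambda s, a: s + a
mu = lambda s: 0.0
batch = [(0.0, 0.0, 1.0, 0.0, 1.0)] * 4
loss = critic_loss(Q, Q, mu, batch)
```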
For the flexible obstacle-avoidance task, a proximal policy optimization algorithm framework over a discrete action space is adopted. The action space is chosen as the action space each mobile robot needs for independent flexible obstacle avoidance, and the state space is composed of the state space of each mobile robot's tracking error with respect to its corresponding virtual mobile robot at the current time step, together with the state space each mobile robot needs to describe its surrounding environment.
Further, the target action value network is updated as follows: after each mini-batch of training, the parameters of the online action network and the online action value network are blended into their targets, specifically:
η′←τη+(1-τ)η′ (14)
where η′ and η respectively denote the target network parameters and the current (online) network parameters, and τ controls the proportion of the update.
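Equation (14) is the standard soft (Polyak) target-network update; a minimal sketch, assuming the parameters are plain lists of floats:

```python
def soft_update(eta_target, eta_online, tau=0.005):
    """eta' <- tau * eta + (1 - tau) * eta'  (eq. (14)).

    tau = 0.005 is an assumed typical value; small tau makes the target
    network track the online network slowly, stabilizing training.
    """
    return [tau * p + (1.0 - tau) * p_t
            for p, p_t in zip(eta_online, eta_target)]

target = soft_update([0.0, 0.0], [1.0, 1.0], tau=0.1)
```

Repeated application drives the target parameters toward the online ones at a rate set by τ.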
Further, the method further comprises a local collision detection step for detecting the safe distance between local obstacles and the robot: if the returned safe distance satisfies the safe-state requirement, the mobile robot individually exits the flexible obstacle-avoidance strategy and resumes the formation strategy.
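A minimal sketch of this mode-switching check, with an assumed safety threshold:

```python
def in_formation_mode(obstacle_distances, eps_safe=0.5):
    """Local collision check: return True (stay in / resume the formation
    strategy) when every measured obstacle distance lies outside the
    safety threshold, else False (switch to flexible obstacle avoidance).
    eps_safe is an assumed value."""
    return all(d > eps_safe for d in obstacle_distances)

mode = in_formation_mode([1.2, 0.8])   # all obstacles far enough away
```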
The invention has the following beneficial technical effects:
the flexible formation method of the cascaded multi-mobile robots is based on reinforcement learning and priori nonlinear distance-angle-course formation control, so that the plurality of mobile robots can adaptively adjust key parameters in a formation control algorithm, and the stability and tracking accuracy of formation are improved; meanwhile, a flexible obstacle avoidance strategy is trained independently, so that each robot in the formation has certain flexible obstacle avoidance capability, and the flexibility and the autonomy of each mobile robot in the formation are improved.
The formation tracking framework of the method is based on the deep deterministic policy gradient algorithm; by simplifying the random exploration process and introducing a prioritized experience replay mechanism, the performance and efficiency of the algorithm are further improved. Introducing the prior nonlinear distance-angle-course formation-controller information avoids blind exploration, making the training process more targeted and accelerating algorithm convergence; during inference and application, the prior formation-controller information also prevents the end-to-end abnormal behaviours that would damage the actuators, improving the robustness of the whole formation.
The obstacle-avoidance framework of the method is based on the proximal policy optimization algorithm; for flexible obstacle avoidance, the motion space of the mobile robot is discretized, which reduces the search space and the training complexity. By introducing a collision detection function module, the obstacle distance is monitored in real time to determine whether the robot can return to the formation tracking mode.
Preferably, the training of the two frameworks is mutually independent; they complement each other during inference-time formation and jointly accomplish the flexible formation of the multiple mobile robots.
Drawings
FIG. 1 is a schematic diagram of an overall framework for a particular embodiment of the invention;
FIG. 2 is a schematic diagram of a training phase of an embodiment of the present invention;
FIG. 3 is a schematic diagram of inference based flexible formation according to a specific embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples of the specification.
Example (b): a flexible formation method of cascaded multiple mobile robots mainly comprises the following steps: s1, selecting a formation from a formation library, and confirming the priority of each robot and the specific position in the formation according to a distance-angle-course formation mode;
s2, determining a dynamic model according to the type of the robot;
s3, designing expected prior tracks of a virtual navigator and a virtual follower according to the constraint of a dynamic model and by combining the constraint of relative distance, the constraint of relative angle and the constraint of course between robots, converting the formation problem of the actual robots into a plurality of tracking method problems for tracking the tracks of the virtual mobile robots, and designing a nonlinear formation tracking prior controller of angular velocity of corresponding velocity as a knowledge prior in the whole reinforcement learning framework;
s4, designing a collision detection module of the whole formation algorithm, and detecting the safe distance between a local obstacle and the robot;
s5, designing an action space part of the whole formation algorithm framework, wherein the action space part is mainly divided into two parts, one part is a speed space containing the speed and the angular velocity of the mobile robot, and the other part is a parameter space containing all performance parameters of the priori nonlinear tracking control knowledge;
s6, designing a state space part in the whole formation algorithm framework, wherein the state space part mainly comprises the position and the posture of each robot and barrier information in the environment;
s7, designing a reward function for guiding the robots to form a team for learning and flexibly avoiding the obstacles, wherein the reward function mainly comprises a team reward function, a tracking reward function and an obstacle avoidance reward function;
and S8, building a simulation environment for training, so that the agents interact with the environment by trial and error under the guidance of the nonlinear formation-control knowledge, and the multiple mobile robots learn, based on distance, angle and course, a flexible and stable formation strategy and a flexible obstacle-avoidance strategy.
Further, in step S1, the type of each robot is isomorphic, and the number N of robots is greater than or equal to 2;
further, in step S2, taking the two-wheel differential mobile robot as an example, the kinetic equation is described as follows:
where eta is [ x, y, theta ]]TA pose vector representing each mobile robot;in order to move the speed of the robot,angular velocity, v, of the mobile robotrAnd vlRespectively representing the speeds of the left and right wheels; it should be noted that the two-wheel drive mobile robot has incomplete constraint, so that the mobile robot can only move forward and backward, but not left and right, and the constraint form is as follows:
further, taking distance-angle-heading formation of multiple mobile robots as an example, the specific design steps of the a priori formation control knowledge of the mobile robots in S3 are as follows:
s31, designing a tracking controller between the mobile robot and the virtual expected mobile robot. The expected trajectory of the virtual mobile robot is defined as etar=[xr,yr,θr]TThe tracking error of the attitude and the velocity is:
s32, designing an expected formation model between the distance, the angle and the course between adjacent mobile robots, wherein the expected formation model is specifically described as follows:
where v1 and v2 represent the virtual robot objects to be tracked by adjacent mobile robots, denoted virtual robot 1 and virtual robot 2 respectively; d_{v2v1}, φ_{v2v1} and β_{v2v1} represent the distance, angle and heading between v1 and v2, serving as the state quantities under the distance-angle-course formation framework.
S33, combining equations (1) to (4) with the feedback linearization nonlinear control theory, the formation control prior of the adjacent mobile robots is described as follows:

where v and w are the speed and angular velocity of the mobile robot meeting the preset formation requirement, [K_x, K_y, K_θ] are the prior performance hyper-parameters of the mobile robot's nonlinear formation control, and the values of these performance hyper-parameters directly determine the quality of formation tracking;
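The patent presents the feedback-linearization control law itself only as an image; the sketch below therefore uses a well-known kinematic tracking law of the same structure (a Kanayama-style controller) with [K_x, K_y, K_θ] as the tunable performance hyper-parameters. The patented controller's exact form may differ:

```python
import math

def prior_controller(e_x, e_y, e_theta, v_ref, w_ref, Kx, Ky, Kth):
    """Kanayama-style kinematic tracking law, used here as a stand-in
    for the patent's prior formation controller (image-only in source).

    (e_x, e_y, e_theta) are pose tracking errors in the robot frame,
    (v_ref, w_ref) the virtual robot's velocities, and [Kx, Ky, Kth]
    the performance hyper-parameters that the RL policy would tune.
    """
    v = v_ref * math.cos(e_theta) + Kx * e_x
    w = w_ref + v_ref * (Ky * e_y + Kth * math.sin(e_theta))
    return v, w

# Zero error: the commands reduce to the reference velocities
v, w = prior_controller(0.0, 0.0, 0.0, 0.5, 0.1, 1.0, 4.0, 2.0)
```

A larger forward error e_x increases the commanded speed v in proportion to K_x, which is how the hyper-parameters shape tracking performance.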
further, in S4, the collision detection function module determines the distance to the obstacle by the mobile robot sensor itself, and outputs a boolean collision warning flag;
further, in S5, the motion space mainly consists of two parts, one part is the motion space required for formation tracking, and the other part is the motion space required for flexible obstacle avoidance when detecting a local obstacle, and the specific design description is as follows:
S51, designing the formation tracking action space of two adjacent mobile robots. Based on the nonlinear formation knowledge prior involved in S33, the action space is as follows:

where [K_x, K_y, K_θ] are the prior performance hyper-parameters of the mobile robot's nonlinear formation control;

S52, designing the action space each mobile robot needs for independent flexible obstacle avoidance:

where v_discrete and ω_discrete respectively represent the discretized speed command and angular speed command of the mobile robot;
further, in S6, the state space mainly includes three parts, one part is a state space describing a tracking error of each mobile robot tracking a corresponding virtual robot, one part is a state space describing a formation satisfying distance-angle-heading between adjacent mobile robots, and one part is a state space required for describing surrounding environment information, and the specific design description is as follows:
s61, taking two adjacent mobile robots as an example, designing and describing a state space of tracking errors of each mobile robot tracking the corresponding virtual mobile robot at the current time step as follows:
s62, taking two adjacent mobile robots as an example, designing a state space between the two adjacent mobile robots, which meets a distance-angle-course formation framework, as follows:
where d, φ and β respectively represent the formation state quantities of distance, angle and course between adjacent mobile robots at each time step; ||u_1||_2, ||u̇_1||_2, ||u_2||_2 and ||u̇_2||_2 respectively represent the relative values of speed, angular velocity and acceleration of robot 1 and robot 2 with respect to their virtual robots, the purpose of this term being that the mobile robots are expected to operate with continuous and smooth speed and acceleration;
s63, designing a state space required by each mobile robot to describe the surrounding environment information as follows:
where η_t is the pose vector of the mobile robot at the current time, d_r is the distance between the mobile robot and its desired virtual-robot position at the current time, d_ob is the vector of distances from the mobile robot to the obstacles within the safety threshold at the current time, and |Δu| is the difference between the velocity and angular velocity of the mobile robot at adjacent times;
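The per-robot environment state of S63 can be assembled into a flat vector as follows. The concatenation order and the fixed obstacle-vector length are assumptions made for illustration.

```python
import numpy as np

def environment_state(eta_t, d_r, d_ob, delta_u):
    """Assemble the per-robot environment state described in S63: pose eta_t,
    distance d_r to the desired virtual-robot position, obstacle distance
    vector d_ob within the safety threshold, and |delta u|, the change in
    (v, w) between adjacent time steps. Concatenation order is an assumption."""
    return np.concatenate([
        np.asarray(eta_t, dtype=float),        # [x, y, theta]
        [float(d_r)],                          # distance to desired position
        np.asarray(d_ob, dtype=float),         # obstacle distances in threshold
        [float(np.linalg.norm(delta_u))],      # |delta u|
    ])
```

In practice d_ob would be padded or truncated to a fixed length so the policy network input dimension stays constant.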
further, the reward function design in S7 can be subdivided into two sub-reward-function designs: one for the formation tracking subtask, and the other for the flexible obstacle avoidance and formation recovery subtask, that is:
s71, designing a reward function of the formation tracking subtask,
the formation reward function between two adjacent mobile robots is described in the following form:
where R_error in the reward function is the sum of penalty terms on the tracking errors of the two mobile robots with respect to the desired virtual mobile robots, used to encourage the robots to reduce their tracking errors to the desired positions as much as possible; R_formation is used to guide the robots to maintain the consistency of the formation: if the dynamic change range of the formation is within a threshold, a positive reward is fed back, otherwise a negative penalty is fed back; R_velocity is used to guide the mobile robots to keep consistent velocities and accelerations and maintain a continuous and smooth motion pattern;
s72, designing a flexible obstacle avoidance reward function of the mobile robot i, wherein the specific form is as follows:
where the reward function R_error is the penalty term of mobile robot i on the tracking error with respect to the desired virtual mobile robot, used to guide the formation recovery of the robot; R_avoid guides the mobile robot to perform autonomous obstacle avoidance; R_delta_yaw restrains the change of the heading angle of mobile robot i in order to save energy;
optionally, the reward function designed in S72 takes effect only in the obstacle avoidance task stage and is used to encourage the mobile robot to avoid a local obstacle quickly; when it is determined through S4 that the mobile robot is far away from the obstacle, the obstacle avoidance stage is exited and the robot switches back to the formation tracking subtask, resuming and maintaining the formation under the guidance of the S71 reward function;
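The two sub-reward designs of S71 and S72 can be sketched as simple Python functions. All gains, thresholds and penalty magnitudes below are illustrative assumptions, not values given in the patent.

```python
def formation_reward(err1, err2, formation_dev, dvel, eps_thresh=0.1,
                     k_err=1.0, r_form=1.0, k_vel=0.1):
    """Sketch of the S71 formation-tracking reward. err1/err2 are tracking-error
    norms of the two robots, formation_dev the deviation of (d, phi, beta) from
    the desired formation, dvel the velocity/acceleration inconsistency.
    Gains and eps_thresh are illustrative assumptions."""
    r_error = -k_err * (err1 + err2)                                  # R_error
    r_formation = r_form if formation_dev <= eps_thresh else -r_form  # R_formation
    r_velocity = -k_vel * dvel                                        # R_velocity
    return r_error + r_formation + r_velocity

def avoidance_reward(err_i, d_min, delta_yaw, eps_safe=0.5,
                     k_err=1.0, r1=-1.0, r2=-10.0, k_yaw=0.1, collided=False):
    """Sketch of the S72 obstacle-avoidance reward: R_error guides formation
    recovery, R_avoid penalises proximity/collision, R_delta_yaw penalises
    heading-angle changes to save energy. Numeric values are assumptions."""
    r_avoid = r2 if collided else (r1 if d_min < eps_safe else 0.0)
    return -k_err * err_i + r_avoid - k_yaw * abs(delta_yaw)
```

During the obstacle-avoidance stage only `avoidance_reward` is active; once S4 reports the robot is clear, training/inference switches back to `formation_reward`.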
further, in S8, in the training process, independent training is performed for two subtasks of formation tracking and flexible obstacle avoidance, which are specifically described as follows:
S81, for the formation tracking task, a deterministic policy gradient algorithm framework based on a continuous action space is adopted; the action space is selected as the formation tracking action space designed in S51, and the state space is based on the tracking-error state space and the formation state space designed in S61 and S62. The algorithm generally follows the "actor-critic" pattern, but unlike other reinforcement learning algorithms, its greatest advantage is that the output of the action network is a deterministic action rather than a policy distribution.
On the other hand, the action value network outputs an evaluation of the current action, and the action network is then updated along the policy gradient, with the evaluated Q value output by the current action value network as a weight. The action value network is updated based on the delayed target action network and target action value network; the advantage is that the target network parameters change slowly, which makes the training process more stable.
The specific update of the action value network is described as follows:
where w_i is the priority sampling weight calculated by the priority-based experience replay algorithm; r_i is the current reward signal; γ is the discount factor; Q_θ′(s_{i+1}, μ′(s_{i+1})) is the target action-value evaluation of the target action μ′(s_{i+1}) at the next time step.
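The weighted critic update of equation (13) can be sketched numerically: form the TD target y_i = r_i + γ·Q_θ′(s_{i+1}, μ′(s_{i+1})), then take the priority-weighted mean squared TD error as the critic loss. This is a numpy sketch of the loss computation only; a real implementation would back-propagate through the Q network.

```python
import numpy as np

def critic_targets_and_loss(r, q_next_target, q_current, w, gamma=0.99, done=None):
    """Weighted action-value update of equation (13): TD targets
    y_i = r_i + gamma * Q_theta'(s_{i+1}, mu'(s_{i+1})), and the critic loss
    (1/N) * sum_i w_i * (y_i - Q_theta(s_i, a_i))^2 with priority weights w_i."""
    r = np.asarray(r, float)
    if done is None:
        done = np.zeros_like(r)
    y = r + gamma * (1.0 - done) * np.asarray(q_next_target, float)  # TD targets
    td = y - np.asarray(q_current, float)                            # TD errors
    loss = float(np.mean(np.asarray(w, float) * td ** 2))            # weighted MSE
    return y, loss
```

The absolute TD errors |y_i − Q_θ(s_i, a_i)| are also what prioritized replay uses to refresh the stored priorities.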
Preferably, the updating of the target networks is based on a soft update strategy: after each mini-batch is trained, the target networks are updated toward the parameters of the online action network and the online action value network, in the following specific form:
η′←τη+(1-τ)η′ (14)
where η' and η are partial tables representing the target network parameter and the current network parameter, and τ is used to control the ratio of updates.
Tau is used for controlling the updating proportion, and the soft updating method reduces the influence of abnormal parameters and avoids abnormal jump of parameters in the parameter updating process.
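Equation (14), η′ ← τη + (1 − τ)η′, is a Polyak (soft) update. A minimal sketch over plain parameter lists; τ = 0.005 is a typical value, not one taken from the patent:

```python
def soft_update(target_params, online_params, tau=0.005):
    """Polyak soft update of equation (14): eta' <- tau * eta + (1 - tau) * eta'.
    Parameters are represented as plain lists of floats here."""
    return [tau * eta + (1.0 - tau) * eta_t
            for eta, eta_t in zip(online_params, target_params)]
```

With small τ the target network drifts slowly toward the online network, which is what keeps the TD targets in equation (13) stable.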
S82, for the flexible obstacle avoidance task, a proximal policy optimization algorithm framework based on a discrete action space is adopted; the action space is selected as the obstacle avoidance action space designed in S52, and the state space as the tracking-error state space and the environment-information state space designed in S61 and S63.
the proximal policy optimization algorithm addresses the slow parameter updates and low data utilization of the on-policy scheme in the traditional policy gradient algorithm. On the basis of a generalized advantage evaluation algorithm, a resampling mechanism is introduced to convert the on-policy update into an off-policy one and improve data utilization; at the same time, the update magnitude of the parameters is constrained by a KL divergence penalty or a clipping operation to obtain a more stable training process.
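The clipping operation mentioned above can be sketched as the standard clipped surrogate objective: the importance ratio ρ = exp(log π_new − log π_old) obtained from resampling is clipped to [1 − ε, 1 + ε] so that off-policy updates stay close to the old policy. ε = 0.2 is the common default, assumed here.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective of proximal policy optimization.
    Returns the mean of min(rho * A, clip(rho, 1-eps, 1+eps) * A), which
    the policy update maximises."""
    rho = np.exp(np.asarray(logp_new, float) - np.asarray(logp_old, float))
    adv = np.asarray(advantages, float)
    unclipped = rho * adv
    clipped = np.clip(rho, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return float(np.mean(np.minimum(unclipped, clipped)))
```

When the ratio drifts outside the clip range, the gradient of the clipped term vanishes, which is what bounds the update magnitude.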
Further, in S9, the construction of the whole inference-based cascaded formation control algorithm is completed according to the formation strategies learned offline in S8.
In S9, a flexible formation algorithm framework of the mobile robot based on reasoning is constructed by using the formation and flexible obstacle avoidance strategy trained in S8, and the specific process is described as follows:
s91, determining a formation requirement and a task environment;
s92, the mobile robot formation loads the prior-based formation strategy and the flexible obstacle avoidance strategy pre-trained in S8;
s93, the mobile robot formation adopts a formation tracking strategy to perform formation tracking according to the interaction information with the environment, and each robot individual performs local collision detection;
s94, if flexible obstacle avoidance is needed, the mobile robots form a team to switch individual flexible obstacle avoidance strategies, and real-time obstacle avoidance is conducted according to the interaction information with the environment;
s95, if the local collision detection function module returns a safe state, the individual mobile robot exits the flexible obstacle avoidance strategy and rapidly restores the formation state;
s96, repeating S93 to S95 until the target point is reached.
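The per-robot dispatch in S93-S95 can be sketched as a single control step that runs local collision detection and selects between the two pre-trained strategies. The policies are passed in as callables, and the safety threshold value is an assumption.

```python
def flexible_formation_step(robot_state, obstacle_dist, formation_policy,
                            avoidance_policy, eps_safe=0.5):
    """One inference-stage control step per robot (S93-S95): run local
    collision detection and dispatch to the pre-trained formation-tracking
    policy or the flexible obstacle-avoidance policy."""
    if obstacle_dist < eps_safe:                    # S94: obstacle within threshold
        return avoidance_policy(robot_state), "avoid"
    return formation_policy(robot_state), "track"   # S93/S95: formation tracking
```

Repeating this step until the target point is reached (S96) yields the full inference loop; no online optimization is solved, since both policies were trained offline.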
The invention provides a cascaded multi-mobile-robot flexible formation method based on reinforcement learning and a prior nonlinear distance-angle-heading formation control, which moves the computational cost of online solving offline through reinforcement learning, realizing inference-based flexible formation of multiple mobile robots;
in the training stage, the formation tracking strategy and the flexible obstacle avoidance strategy are trained independently, which reduces the training difficulty; at the same time, a nonlinear distance-angle-heading formation control prior is introduced, which improves the training speed and avoids a tedious parameter tuning process. In the inference stage, the independent strategies trained offline are combined to meet the task requirements of autonomous stable formation and flexible obstacle avoidance. Compared with existing navigation-tracking-based formation tracking control algorithms, the method endows the robot formation with independent tracking capability, and also endows each mobile robot with independent obstacle avoidance capability against local static and dynamic obstacles; it is characterized by autonomy, stability, efficiency and flexibility.
The overall framework of the embodiment is shown in fig. 1, where 1 is an offline independent training framework, 2 is an inference flexible formation framework, 21 is a flexible formation obstacle avoidance strategy, and 3 is a simulation interactive environment.
Firstly, in training stage 1, the formation strategy and the flexible obstacle avoidance strategy are trained respectively; the training of the formation strategy is based on prior formation experience, which accelerates the training process and convergence speed, prevents blind exploration by the multiple mobile robots, and improves the stability of the formation; after training, the two sets of strategy parameters are stored;
then, in an inference stage 2, the multiple mobile robots flexibly call a flexible formation obstacle avoidance strategy 21 based on experience to perform autonomous formation tracking and flexible obstacle avoidance, and the on-line calculation process is migrated to the off-line mode, so that the method is more efficient and stable.
The framework of the training stage is shown in fig. 2. The whole training strategy is based on the idea of curriculum learning, i.e., the training environment progresses from simple to complex, gradually improving the strategy performance. In fig. 2, 1 is the formation training environment based on the distance-angle-heading formation control prior; 2 is the flexible obstacle avoidance environment for each mobile robot; 3 is the continuous deterministic policy gradient algorithm agent; 4 is the discrete proximal policy optimization algorithm agent; and 5 denotes the formation strategy parameters and flexible obstacle avoidance strategy parameters stored offline after training. The specific process is described as follows:
the formation strategy training process is described as follows:
firstly, various training environments from simple to complex are configured following the idea of curriculum learning, for example first training the formation environment of two mobile robots and then gradually increasing the number of robots in the formation;
next, for each preset simulation environment, initializing an action network, a target action network, an action value network, and target action value network parameters in the continuous deterministic policy gradient algorithm agent 3 in fig. 2; for each iteration cycle, a formation training environment 1 based on a distance-angle-heading formation control prior is initialized, followed by, for each time step:
step 1: selecting an action from the threshold range of the action space according to a strategy, and adding random Gaussian noise to improve the random exploration performance;
step 2: interacting with the formation training environment. Specifically, the selected deterministic action is input into the cascaded prior formation controller of the multiple mobile robots, designed by combining the preset formation pattern with the geometric relations of distance, angle and heading and described by equation (5); the prior upper-layer formation control velocity and angular velocity commands are then input to each mobile robot; while the robots execute the commands, the environment updates its current state and feeds back a state value, a reward value and a Boolean flag indicating whether the task is finished or interrupted;
and step 3: storing the information fed back by the environment and the calculated priority into an experience pool as data for training;
step 4: once the experience pool reaches capacity, sample according to priority, and train and update the action value network and the action network; the method comprises the following steps:
step 4-1: inputting the sampled state value into an action network to obtain an action;
step 4-2: inputting the action and the sampled state value into an action value network to obtain a Q value;
step 4-3: according to the Q value, the back propagation updates the action network according to the strategy gradient;
step 4-4: repeating the steps 4-2 and 4-3 to obtain a Q value calculated by the updated action through the action value network;
and 4-5: inputting the next state value in the sampled experience into a target action network to obtain a target action;
step 4-6: inputting the target action and the next state value into the target action value network to obtain a target Q value;
and 4-7: updating the action value network by combining the target Q value and the Q value calculated in the step 4-4 according to the formula (13) and the priority weight coefficient;
and 5: repeating the steps 1-4;
step 6: after a certain time step is met, respectively carrying out soft updating on the target action network and the target action value network according to the formula (14);
and 7: and storing each network parameter of the final formation strategy for calling in next training or reasoning.
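The priority-based sampling used in step 4 above can be sketched as follows: transitions are drawn with probability proportional to p_i^α, and importance-sampling weights w_i = (N·P(i))^(−β), normalized by their maximum, correct the induced bias. The α and β values are common defaults, not values from the patent.

```python
import numpy as np

def sample_prioritized(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Sketch of priority-based experience replay sampling: draw indices with
    probability p_i^alpha / sum_j p_j^alpha and return normalized
    importance-sampling weights (N * P(i))^(-beta) / max."""
    if rng is None:
        rng = np.random.default_rng(0)
    p = np.asarray(priorities, float) ** alpha
    probs = p / p.sum()
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    w = (len(probs) * probs[idx]) ** (-beta)
    return idx, w / w.max()
```

These weights w_i are exactly the ones multiplying the squared TD errors in the critic update of equation (13).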
The individual autonomous flexible obstacle avoidance strategy training process is as follows:
firstly, various training environments from simple to complex are configured following the idea of curriculum learning, for example first training the obstacle avoidance strategy of the mobile robot in a static obstacle environment and then training it in a dynamic obstacle environment;
secondly, for each preset simulation environment, the policy network and value network of the discrete proximal policy optimization agent (4 in fig. 2) are initialized; during each iteration cycle, the corresponding environment is initialized, and then, during each time step:
step 1: in a strategy network, inputting environment state information to obtain strategy distribution, and sampling an action in a discrete action space according to the distribution;
step 2: inputting the action into an environment, interacting with a flexible obstacle avoidance environment, updating the environment state, feeding back a state value, a reward value and a Boolean type flag bit indicating whether the task is finished or interrupted;
and step 3: repeating the step 1, sampling certain experiences, and storing;
step 4: inputting the state of the last step in step 3 into the value network to obtain its state value, and then calculating the discounted return by backtracking over these time steps;
and 5: inputting all stored experiences into a value network, and calculating an advantage value by utilizing generalized advantage evaluation;
step 6: according to the calculated advantage value, the updated value network is propagated reversely;
step 7: inputting all state values in the stored experience into the policy network and the past policy network to obtain their respective policy distributions, using resampling to convert the on-policy update into an off-policy one, and back-propagating to update the policy network;
and 8: repeating the step 5-6, and then updating the past strategy network parameters by using the strategy network parameters;
and step 9: and (5) repeating the steps 1-8, and storing each network parameter of the final flexible obstacle avoidance strategy for calling in the next training or reasoning.
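The generalized advantage evaluation invoked in step 5 can be sketched as the standard backward recursion: δ_t = r_t + γ·V(s_{t+1}) − V(s_t), and A_t = Σ_l (γλ)^l · δ_{t+l}. The γ and λ defaults below are common choices, assumed here.

```python
import numpy as np

def generalized_advantage_estimate(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage evaluation: delta_t = r_t + gamma*V(s_{t+1}) - V(s_t),
    A_t = sum_l (gamma*lam)^l * delta_{t+l}, computed by a backward recursion.
    `values` must contain one extra bootstrap entry V(s_T)."""
    rewards = np.asarray(rewards, float)
    values = np.asarray(values, float)
    adv = np.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        adv[t] = gae
    return adv
```

The resulting advantages feed both the value-network regression target (step 6) and the clipped policy update (step 7).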
The method for flexibly forming a cascade multi-mobile robot based on reinforcement learning and priori nonlinear distance-angle-course formation control provided by the embodiment is carried out according to the steps shown in fig. 3 when actual deployment and application are carried out:
step 1: acquiring an expected track of an upper-layer motion plan for formation tracking;
step 2: the specific formation form requirements of the formation tasks are determined, prior formation control information is obtained, and the task environment is determined;
and step 3: loading off-line formation tracking strategy and flexible obstacle avoidance strategy pre-trained in training phase;
and 4, step 4: after the state of the mobile robot is obtained according to a pre-trained formation strategy, the action network feeds back actions, and the mobile robot performs a formation tracking task according to the actions;
and 5: performing local collision detection to ensure the safety of formation tracking, if an obstacle is within a safety threshold from a certain mobile robot, skipping to the step 6, otherwise, performing the step 7;
step 6: calling the offline flexible obstacle avoidance strategy pre-trained in the training stage for the corresponding mobile robot; the mobile robot samples discrete actions from the distribution output by the policy network, avoids the local obstacle, returns as quickly as possible to its position in the formation with as small an error as possible, and continues to track the virtual mobile robot of the corresponding formation pattern;
step 7: if the target point has not been reached, return to step 4 and continue the formation tracking.
The method provided by the invention is based on a policy gradient algorithm that combines prior nonlinear distance-angle-heading formation control knowledge with continuous control; it avoids blind exploration by the mobile robots, improves the speed of training convergence, and avoids a tedious coefficient tuning process, while proximal policy optimization is introduced to independently train the flexible obstacle avoidance capability of a single mobile robot against local static and dynamic obstacles. The method comprises a training stage and an inference stage: complex online solving processes are moved offline, the formation and flexible obstacle avoidance strategies are trained independently based on the idea of curriculum learning, and the pre-trained strategies are called flexibly in the inference stage, so that the whole formation has higher autonomy and flexibility.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (10)
1. A flexible formation method of a cascade multi-mobile robot is characterized by comprising the steps of determining a dynamic model according to the distance, the angle and the course among robots based on a selected formation form; determining a priori controller of a reinforcement learning framework in a flexible formation method of the nonlinear mobile robot according to the dynamic model and the dynamic model constraint; determining an action space based on the hyper-parameters of the pose vectors of the mobile robots, wherein the action space comprises a formation tracking action space of two adjacent mobile robots and an action space required by each mobile robot for independently and flexibly avoiding obstacles; determining a state space according to a tracking error of a mobile robot attitude and a speed, wherein the state space comprises: at the current time step, each mobile robot tracks the state space of the tracking error of the corresponding virtual mobile robot, the state space between adjacent mobile robots and the state space required by each mobile robot to describe the surrounding environment information; setting a reward function for reinforcement learning, wherein the reward function comprises a formation reward function and an obstacle avoidance reward function;
and based on the prior controller, performing reinforcement learning training according to the action space, the state space and the reward function by interacting with the environment, and finishing the training to obtain a flexible formation method of the cascade multi-mobile robot, wherein the flexible formation method comprises a formation strategy and a flexible obstacle avoidance strategy.
2. The method of claim 1, wherein the dynamic equations are described as follows:
where η = [x, y, θ]^T represents the pose vector of each mobile robot, (x, y) is the position of each mobile robot and θ is the heading angle of each mobile robot; v is the velocity of the mobile robot, ω is the current angular velocity of the mobile robot, and v_r and v_l respectively represent the speeds of the right and left wheels of the mobile robot;
the kinetic model constraint form is as follows:
3. the method of claim 2, wherein the method for determining the prior controller of the reinforcement learning architecture in the flexible formation method of the nonlinear mobile robot specifically comprises: s31, determining the expected track of the virtual expected mobile robot as etar=[xr,yr,θr]T,(xr,yr) To virtually expect the position of the mobile robot, θrFor virtually expecting the angle of the mobile robot, the tracking error of the attitude and the tracking error of the speed determined by the mobile robot according to the virtual expected track are expressed as follows:
exposition tracking error for the x direction; e.g. of the typeyPosition tracking error for the y direction; e.g. of the typeθIs the tracking error of the azimuth;respectively, a velocity tracking error in the x direction and a velocity tracking error in the y direction;is the angular velocity tracking error;is the desired angular velocity of the virtual robot;
s32, determining an expected formation model among the distance, the angle and the heading among adjacent mobile robots, wherein the expected formation model is specifically described as follows:
wherein v1 and v2 respectively represent the virtual robot objects to be tracked by the adjacent mobile robots, i.e., virtual robot 1 and virtual robot 2; (x_v1, y_v1) is the position of virtual robot 1 and (x_v2, y_v2) is the position of virtual robot 2; θ_v1 is the heading angle of virtual robot 1 and θ_v2 is the heading angle of virtual robot 2; d_v2v1 is the relative distance between the adjacent virtual robots v1 and v2; φ_v2v1 is the relative angle between the adjacent virtual robots v1 and v2; β_v2v1 is the angle correction amount with which the mobile robots maintain the same heading angle;
S33, combining equations (1) to (4) with the feedback linearization nonlinear control theory, the formation control prior of adjacent mobile robots is described in the following form:
where v and ω are the velocity and angular velocity of the mobile robot meeting the preset formation requirement; the performance hyper-parameters of the nonlinear formation prior controller of virtual robot 1 and of the nonlinear formation prior controller of virtual robot 2 directly determine the control performance of the prior controller.
4. The flexible formation method of cascaded multiple mobile robots according to claim 1, wherein the formation tracking motion space of two adjacent mobile robots is represented as follows;
wherein one set of hyper-parameters are the performance hyper-parameters of the nonlinear formation prior controller with which the mobile robot tracks virtual robot 1, and the other set are the performance hyper-parameters of the nonlinear formation prior controller with which the neighboring mobile robot tracks virtual robot 2;
the action space required by each mobile robot for independently and flexibly avoiding the obstacle is expressed as follows;
where v_discrete and ω_discrete respectively represent the discretized velocity command and angular velocity command of the mobile robot.
5. The method for flexible formation of cascaded multi-mobile robots according to claim 1, wherein the state space of tracking error of each mobile robot at the current time step for tracking the corresponding virtual mobile robot is represented as follows:
the state space between adjacent mobile robots is represented as follows:
wherein e1_x is the position tracking error of the mobile robot tracking virtual robot 1 in the x direction; e1_y is the position tracking error of the mobile robot tracking virtual robot 1 in the y direction; e1_θ is the tracking error of the mobile robot tracking virtual robot 1 in the heading angle; e2_x is the position tracking error of the neighboring mobile robot tracking virtual robot 2 in the x direction; e2_y is the position tracking error of the neighboring mobile robot tracking virtual robot 2 in the y direction; e2_θ is the tracking error of the neighboring mobile robot tracking virtual robot 2 in the heading angle; e_1 is the tracking error of the mobile robot with respect to virtual robot 1, and e_2 is the tracking error of the neighboring robot with respect to virtual robot 2;
d_t represents the distance formation state quantity between adjacent mobile robots at each time step t, φ_t represents the angle formation state quantity between adjacent mobile robots at each time step t, and β_t represents the heading formation state quantity between adjacent mobile robots at each time step t; ||u_1||_2 is the velocity value of mobile robot 1 relative to the virtual robot, including velocity and angular velocity; ||u̇_1||_2 is the acceleration value of robot 1 relative to the virtual robot, including acceleration and angular acceleration; ||u_2||_2 is the velocity value of mobile robot 2 relative to the virtual robot, including velocity and angular velocity; ||u̇_2||_2 is the acceleration value of mobile robot 2 relative to the virtual robot, including acceleration and angular acceleration;
the state space required for each mobile robot to describe the surrounding environment information is represented as follows:
wherein η_t is the pose vector of the mobile robot at the current time, d_r is the distance between the mobile robot and its desired virtual-robot position at the current time, d_ob is the vector of distances from the mobile robot to the obstacles within the safety threshold at the current time, and |Δu| is the difference between the velocity and angular velocity of the mobile robot at adjacent times.
6. The flexible formation method of the cascaded multiple mobile robots according to claim 5, wherein a formation reward function between two adjacent mobile robots is described in a specific form as follows:
where ε_thresh is the set threshold; R_error_1 in the reward function is the sum of penalty terms on the tracking errors of the two mobile robots with respect to the desired virtual mobile robots, used to encourage the robots to reduce their tracking errors to the desired positions as much as possible; r is a reward or penalty value; R_formation is a reward or penalty function used to guide the robots to maintain the consistency of the formation: if the dynamic change range of the formation is within the set threshold, a positive reward value is fed back, otherwise a negative penalty value is returned; R_velocity is used to guide the mobile robots to keep consistent velocities and accelerations and maintain a continuous and smooth motion pattern.
7. The flexible formation method of cascaded multiple mobile robots according to claim 5, wherein the obstacle avoidance reward function is in the following specific form:
where R_error_2 is the penalty term of mobile robot i on the tracking error with respect to the desired virtual mobile robot, used to guide the formation recovery of the robot; R_avoid guides the mobile robot to perform autonomous obstacle avoidance; ε_safe is the safety threshold; r_1 is the penalty value when the distance between the robot and the nearest obstacle is within the safety threshold but no collision has yet occurred; r_2 is the penalty value when the robot collides with an obstacle; R_delta_yaw is the penalty value on the change of the heading angle of the mobile robot at adjacent time steps, used to restrain the heading-angle change of mobile robot i so that the overall motion trajectory is smoother.
8. The flexible formation method of the cascaded multi-mobile robot as claimed in claim 1, wherein in the training process, independent training is performed respectively for two subtasks of formation tracking and flexible obstacle avoidance, and the specific method comprises the following steps:
for the formation tracking task, the action space is selected as the formation tracking action space A1_space of two adjacent mobile robots, and the state space is based on the state space of the tracking error of each mobile robot at the current time step and the state space between adjacent mobile robots;
The action value network outputs the evaluation of the current action, the Q value of the evaluation output by the current action value network is used as the weight, and the action network is updated based on the strategy gradient;
the specific updating of the action value network is described as follows
where w_i is the priority sampling weight at the current time i calculated by the priority-based experience replay algorithm; r_i is the reward signal at the current time i; γ is the discount factor; Q_θ′(s_{i+1}, μ′(s_{i+1})) is the target action-value evaluation of the target action μ′(s_{i+1}) at the next time i+1; s_i is the state value of the robot at the current time i and s_{i+1} is the state value of the robot at the next time i+1; a_i is the action of the robot at the current time i; N is the number of samples in the mini-batch; Q_θ(s_i, a_i) is the evaluation by the current action value network of the state and action command of the robot at the current time i;
for the flexible obstacle avoidance task, a proximal policy optimization algorithm framework based on a discrete action space is adopted; the action space is selected as the action space required by each mobile robot for independent and flexible obstacle avoidance, and the state space is selected as the state space of the tracking error of each mobile robot tracking the corresponding virtual mobile robot at the current time step together with the state space required by each mobile robot to describe the surrounding environment information.
9. The method as claimed in claim 8, wherein the target action value network is updated as follows: after each mini-batch of training is finished, the target network parameters are updated toward the parameters of the online action network and the online action value network, in the specific form:
η′←τη+(1-τ)η′ (14)
where η′ and η respectively denote the target network parameter and the current network parameter, and τ controls the update ratio.
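Equation (14) is an element-wise soft (Polyak) update; a minimal sketch, assuming parameters are given as flat lists of floats (the function name is illustrative):

```python
def soft_update(target_params, online_params, tau):
    """η' ← τ·η + (1 − τ)·η', applied element-wise after each mini-batch.

    target_params -- current target network parameters η'
    online_params -- current online network parameters η
    tau           -- update ratio τ in [0, 1]
    """
    return [tau * eta + (1.0 - tau) * eta_prime
            for eta, eta_prime in zip(online_params, target_params)]
```

A small τ makes the target networks track the online networks slowly, which stabilizes the TD targets used in the claim 8 update.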
10. The flexible formation method of cascaded multiple mobile robots according to claim 1, further comprising a local collision detection step for detecting the safe distance from local obstacles to the robot; if the returned safe distance meets the safe-state requirement, the mobile robot exits the flexible obstacle avoidance strategy and resumes the formation strategy.
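The strategy-switching logic of claim 10 can be sketched as a simple threshold test on the minimum detected obstacle distance; the function name and threshold semantics are assumptions for illustration, not taken from the patent:

```python
def select_strategy(obstacle_distances, safe_distance):
    """Return the active policy given local range measurements.

    If every detected obstacle is at least the safe distance away, the
    robot exits the flexible obstacle avoidance strategy and resumes the
    formation strategy; otherwise it keeps avoiding.
    """
    d_min = min(obstacle_distances, default=float("inf"))
    return "formation" if d_min >= safe_distance else "avoid"
```

With no obstacles in range (`d_min` is infinite) the formation strategy is always selected.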
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110655081.9A CN113485323B (en) | 2021-06-11 | 2021-06-11 | Flexible formation method for cascading multiple mobile robots |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113485323A true CN113485323A (en) | 2021-10-08 |
CN113485323B CN113485323B (en) | 2024-04-12 |
Family
ID=77935320
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110655081.9A Active CN113485323B (en) | 2021-06-11 | 2021-06-11 | Flexible formation method for cascading multiple mobile robots |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113485323B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013119942A1 (en) * | 2012-02-08 | 2013-08-15 | Adept Technology, Inc. | Job management sytem for a fleet of autonomous mobile robots |
CN110007688A (en) * | 2019-04-25 | 2019-07-12 | 西安电子科技大学 | A kind of cluster distributed formation method of unmanned plane based on intensified learning |
CN110147101A (en) * | 2019-05-13 | 2019-08-20 | 中山大学 | A kind of end-to-end distributed robots formation air navigation aid based on deeply study |
CN111857184A (en) * | 2020-07-31 | 2020-10-30 | 中国人民解放军国防科技大学 | Fixed-wing unmanned aerial vehicle cluster control collision avoidance method and device based on deep reinforcement learning |
CN111880567A (en) * | 2020-07-31 | 2020-11-03 | 中国人民解放军国防科技大学 | Fixed-wing unmanned aerial vehicle formation coordination control method and device based on deep reinforcement learning |
WO2020253316A1 (en) * | 2019-06-18 | 2020-12-24 | 中国科学院上海微系统与信息技术研究所 | Navigation and following system for mobile robot, and navigation and following control method |
CN112711261A (en) * | 2020-12-30 | 2021-04-27 | 浙江大学 | Multi-agent formation planning method based on local visual field |
Non-Patent Citations (3)
Title |
---|
Wu Jianfa; Wang Honglun; Liu Yiheng; Yao Peng: "A Survey of UAV Obstacle Avoidance Route Planning Methods", Unmanned Systems Technology, no. 01 *
Zhang Guoliang: "A Survey of Mobile Robot Path Planning in Dynamic Environments", Machine Tool & Hydraulics, no. 01 *
Li Qiang; Liu Guodong: "Formation Control of Multiple Mobile Robots", Computer Systems & Applications, no. 04 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114020013A (en) * | 2021-10-26 | 2022-02-08 | 北航(四川)西部国际创新港科技有限公司 | Unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning |
CN114020013B (en) * | 2021-10-26 | 2024-03-15 | 北航(四川)西部国际创新港科技有限公司 | Unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning |
CN115542901A (en) * | 2022-09-21 | 2022-12-30 | 北京航空航天大学 | Deformable robot obstacle avoidance method based on near-end strategy training |
Also Published As
Publication number | Publication date |
---|---|
CN113485323B (en) | 2024-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Juang et al. | Wall-following control of a hexapod robot using a data-driven fuzzy controller learned through differential evolution | |
Patle et al. | Application of probability to enhance the performance of fuzzy based mobile robot navigation | |
Precup et al. | Grey wolf optimizer-based approaches to path planning and fuzzy logic-based tracking control for mobile robots | |
Kamel et al. | Real-time fault-tolerant formation control of multiple WMRs based on hybrid GA–PSO algorithm | |
CN113848974B (en) | Aircraft trajectory planning method and system based on deep reinforcement learning | |
Wang et al. | A survey of underwater search for multi-target using Multi-AUV: Task allocation, path planning, and formation control | |
CN113485323A (en) | Flexible formation method for cascaded multiple mobile robots | |
Rubí et al. | A deep reinforcement learning approach for path following on a quadrotor | |
Al Dabooni et al. | Heuristic dynamic programming for mobile robot path planning based on Dyna approach | |
Al-Sagban et al. | Neural-based navigation of a differential-drive mobile robot | |
Lei et al. | A fuzzy behaviours fusion algorithm for mobile robot real-time path planning in unknown environment | |
Sun et al. | A Fuzzy-Based Bio-Inspired Neural Network Approach for Target Search by Multiple Autonomous Underwater Vehicles in Underwater Environments. | |
Atiyah et al. | An overview: On path planning optimization criteria and mobile robot navigation | |
Velagic et al. | Efficient path planning algorithm for mobile robot navigation with a local minima problem solving | |
Lakhal et al. | Safe and adaptive autonomous navigation under uncertainty based on sequential waypoints and reachability analysis | |
Guo et al. | Optimal navigation for AGVs: A soft actor–critic-based reinforcement learning approach with composite auxiliary rewards | |
Pshikhopov et al. | Trajectory planning algorithms in two-dimensional environment with obstacles | |
Zhu et al. | A fuzzy logic-based cascade control without actuator saturation for the unmanned underwater vehicle trajectory tracking | |
Mohanty et al. | A new intelligent approach for mobile robot navigation | |
Boufera et al. | Fuzzy inference system optimization by evolutionary approach for mobile robot navigation | |
Rubagotti et al. | Shared control of robot manipulators with obstacle avoidance: A deep reinforcement learning approach | |
Ratnayake et al. | A comparison of fuzzy logic controller and pid controller for differential drive wall-following mobile robot | |
CN113959446B (en) | Autonomous logistics transportation navigation method for robot based on neural network | |
Zhang et al. | AUV 3D docking control using deep reinforcement learning | |
Amin et al. | Particle swarm fuzzy controller for behavior-based mobile robot |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |