WO2022241806A1 - Dual-robot force/position multivariate data-driven method based on reinforcement learning - Google Patents

Dual-robot force/position multivariate data-driven method based on reinforcement learning

Info

Publication number
WO2022241806A1
WO2022241806A1 PCT/CN2021/095966 CN2021095966W WO2022241806A1 WO 2022241806 A1 WO2022241806 A1 WO 2022241806A1 CN 2021095966 W CN2021095966 W CN 2021095966W WO 2022241806 A1 WO2022241806 A1 WO 2022241806A1
Authority
WO
WIPO (PCT)
Prior art keywords
robot
force
actual
slave
master
Prior art date
Application number
PCT/CN2021/095966
Other languages
English (en)
French (fr)
Inventor
张弓
侯至丞
杨文林
吕浩亮
徐征
吴月玉
李亚锋
杨根
Original Assignee
广州先进技术研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州先进技术研究所 filed Critical 广州先进技术研究所
Priority to US17/751,024 priority Critical patent/US20220371186A1/en
Publication of WO2022241806A1 publication Critical patent/WO2022241806A1/zh

Links

Images

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/1633Programme controls characterised by the control loop compliant, force, torque control, e.g. combined with position control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1661Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1679Programme controls characterised by the tasks executed
    • B25J9/1682Dual arm manipulator; Coordination of several manipulators

Definitions

  • the invention relates to the technical field of multi-robot collaborative control, in particular to a dual-robot force/position multivariate data driving method based on reinforcement learning.
  • Multi-machine collaborative operation has replaced a single machine and has become a research hotspot in building an intelligent production line.
  • the multi-robot system has the characteristics of strong adaptability to the environment, high self-regulation ability, wide spatial distribution of the system, better data redundancy, and robustness.
  • it can reliably complete complex tasks such as high-precision operations and efficient processing that cannot be completed by a single robot.
  • the present invention proposes a dual-robot force/position multivariate data driving method based on reinforcement learning.
  • the master robot adopts an ideal position-element control strategy and learns the desired position through a reinforcement learning algorithm;
  • the slave robot adopts a force-element control strategy based on the master robot's position deviation, uses a damping PD control strategy suited to unknown environments, and learns the desired force through a reinforcement learning algorithm.
  • the present invention solves the above problems by the following technical means:
  • a dual-robot force/position multivariate data-driven method based on reinforcement learning, comprising the following steps:
  • the suction cup forces of the master robot and the slave robot are obtained;
  • the suction cup force of the master robot is the actual force applied by the master robot,
  • and the suction cup force of the slave robot is the actual force applied by the slave robot;
  • the master robot adopts an ideal position-element control strategy, learns the desired position through a reinforcement learning algorithm, applies a proportional-derivative control law based on the actual force applied by the master robot, tunes the derivative and proportional coefficients, and feeds the actual position back to the desired position; when the master robot is not in contact with the environment, the actual position of the master robot follows the desired position; when the master robot contacts the environment, the desired position of the master robot is modified and updated by the position PD control, and the actual position of the master robot follows the new desired position;
  • the slave robot adopts a force-element control strategy based on the master robot's position deviation, uses a damping PD control strategy suited to unknown environments, and learns the desired force through a reinforcement learning algorithm;
  • the force error feedback signal is converted into a velocity correction at the slave robot's end-effector; admittance control is then used to generate the desired reference position and to maintain the relationship between the slave robot's desired force and the desired reference position.
  • q, q̇ and q̈ are the joint position, velocity and acceleration, respectively;
  • M(q) is the symmetric positive-definite inertia matrix; C(q, q̇)q̇ denotes the centripetal and Coriolis torques;
  • G(q) is the gravitational torque vector;
  • τ is the driving torque vector;
  • f_e is the external force measured by the force sensor;
  • J(q) is the Jacobian matrix that maps the external force vector f_e to generalized coordinates, satisfying ẋ = J(q)q̇ and ẍ = J(q)q̈ + J̇(q)q̇;
  • C_e and K_e are the environmental damping and stiffness constant matrices, respectively;
  • x_e is the position of the environment; when x ≥ x_e there is an interaction force between the robot end-effector and the environment; otherwise, when x < x_e, there is no interaction force;
  • the suction cup force of the master robot is obtained as follows:
  • the suction cup force f_1 is:
  • f_1 is the actual force applied by the master robot;
  • k_s is the environmental stiffness coefficient;
  • b_s is the environmental damping coefficient;
  • x_1 is the actual position of the master robot;
  • x_2 is the actual position of the slave robot;
  • m_1 is the sum of the masses of the master robot's suction cup and the workpiece.
  • the suction cup force of the slave robot is obtained as follows:
  • the suction cup force f_2 is:
  • f_2 can be regarded as equivalent to the external force f_e measured by the force sensor mounted on the slave robot's wrist; k_s is the environmental stiffness coefficient; b_s is the environmental damping coefficient; x_1 is the actual position of the master robot; x_2 is the actual position of the slave robot; ẋ_1 is the actual velocity of the master robot, ẋ_2 is the actual velocity of the slave robot, and ẍ_2 is the actual acceleration of the slave robot; m_2 is the mass of the slave robot's suction cup.
  • the master robot feeds the actual position back to the desired position as follows:
  • f_1 is the actual force applied by the master robot;
  • f_d is the desired force of the slave robot;
  • x_d is the desired position of the master robot;
  • x_1 is the actual position of the master robot.
  • the slave robot converts the force error feedback signal into a velocity correction at the slave robot's end-effector as follows:
  • the damping control law of the slave robot is expressed as:
  • e_f is the force error value of the slave robot; ė_f is the error value of the rate of change of force of the slave robot; k_pf is the force-control proportional coefficient; k_df is the force-control derivative coefficient; f_d is the desired force of the slave robot; f_2 is the actual force applied by the slave robot.
  • the beneficial effects of the present invention at least include:
  • in the present invention, the master robot adopts an ideal position-element control strategy and learns the desired position through a reinforcement learning algorithm; the slave robot adopts a force-element control strategy based on the master robot's position deviation, uses a damping PD control strategy suited to unknown environments, and learns the desired force through a reinforcement learning algorithm.
  • the force/position multivariate data-driven method under reinforcement learning can improve the dexterity of dual-robot cooperation, solve the parameter-optimization problem in force/position control, and avoid large transient errors.
  • Fig. 1 is a schematic diagram of dual robot cooperative clamping, handling and flipping of the present invention
  • Fig. 2 is the dual robot mechanical damping system model of the present invention
  • Fig. 3 is a block diagram of a multivariate data-driven mode for dual-robot reinforcement learning in the present invention.
  • the collaborative clamping, handling and turning of the workpiece by two robots in the same station area requires the study of the interaction between the robot and the environment.
  • the most commonly used interactive control method is force-position control.
  • when the environment is unknown, force/position control alone cannot generate the desired force magnitude under environmental uncertainty; to obtain force/position control, its expected value must be estimated.
  • Machine Learning (ML) is a technology that uses computers to realize functions such as human learning capabilities.
  • Reinforcement Learning (RL) trains machine learning models so that a robot can learn in an uncertain and potentially complex environment: in the absence of an accurate system model, actions are selected according to the environment, the goal is encoded through rewards or punishments, and learning proceeds until the goal is achieved.
  • Reinforcement learning estimates its function by analyzing and measuring system trajectory data, thereby improving its control behavior in real time, and can be widely used in robot control, scheduling and other fields.
  • Q-learning is an iterative algorithm whose goal is to maximize the expected total reward; it is also an optimal action-selection strategy for Markov decision processes and does not require a model of the environment.
  • Real-time tracking is realized when two robots cooperate to carry the same rigid body, and the robustness of the robot is maintained when the dynamics are uncertain.
  • the schematic diagram of the coordinate calibration for dual-robot collaborative handling is shown in FIG. 1.
  • the master-slave cooperative control mode is adopted.
  • the ends of the master-slave robots are respectively equipped with pneumatic suction cups.
  • the main suction cup and the auxiliary suction cup hold the same workpiece to perform complex handling trajectories.
  • Point O in the figure is the origin of the world coordinate system, and (x i , y i , z i ) represent the current axial joint coordinate system.
  • the base coordinates of the robot are symmetrical to the center of point O, and the z-axis of the end joint coordinate system is symmetrical to the center of rotation.
  • q, q̇ and q̈ are the joint position, velocity and acceleration, respectively;
  • M(q) is the symmetric positive-definite inertia matrix; C(q, q̇)q̇ denotes the centripetal and Coriolis torques;
  • G(q) is the gravitational torque vector;
  • τ is the driving torque vector;
  • f_e is the external force measured by the force sensor;
  • J(q) is the Jacobian matrix that maps the external force vector f_e to generalized coordinates, satisfying ẋ = J(q)q̇ and ẍ = J(q)q̈ + J̇(q)q̇;
  • C_e and K_e are the environmental damping and stiffness constant matrices, respectively;
  • x_e is the position of the environment; when x ≥ x_e there is an interaction force between the robot end-effector and the environment; otherwise, when x < x_e, there is no interaction force.
  • f_1 is the actual force applied by the master robot;
  • k_s is the environmental stiffness coefficient;
  • b_s is the environmental damping coefficient;
  • x_1 is the actual position of the master robot;
  • x_2 is the actual position of the slave robot;
  • m_1 is the sum of the masses of the master robot's suction cup and the workpiece.
  • the suction cup force f_2 is:
  • f_2 can be regarded as equivalent to the external force f_e measured by the force sensor mounted on the slave robot's wrist; m_2 is the mass of the slave robot's suction cup.
  • the main robot adopts the ideal position element control strategy, learns the expected position through the reinforcement learning algorithm, and feeds back the actual position to the expected position.
  • the goal is to generate an optimal force when the robot interacts with the environment to minimize the position error.
  • a proportional-derivative (PD) control law based on the position error value is applied, and its output is the force correction amount.
  • the position control law for the master robot is expressed as:
  • f_d is the desired force of the slave robot;
  • x_d is the desired position of the master robot;
  • e_x and ė_x are the position offset error and velocity error of the master robot, respectively;
  • k_px is the position-control proportional coefficient and k_dx is the position-control derivative coefficient.
  • when there is no contact force, the master robot's actual position x_1 follows the desired position x_d.
  • when the robot contacts the environment, the desired position x_d of the master robot is modified and updated by the position PD control, and the actual position of the master robot follows the new desired position.
  • the slave robot must track the real-time motion state of the master robot; it therefore adopts a damping PD control strategy suited to unknown environments and learns the desired force through a reinforcement learning algorithm, i.e. the force that drives the slave robot toward the desired reference point,
  • which is the minimum force required for the robot to approach its reference point and can be obtained by reinforcement learning methods.
  • Admittance control is then used to generate the desired reference position and to maintain the relationship between the slave robot's desired force and the desired reference position.
  • damping PD control is adopted, and by comparing the error between the desired force and the actual force of the slave robot, the force error feedback signal is converted into a velocity correction at the slave robot's end-effector.
  • the damping control law of the slave robot is expressed as:
  • e_f is the force error value of the slave robot; ė_f is the error value of the rate of change of force of the slave robot; k_pf is the force-control proportional coefficient; k_df is the force-control derivative coefficient.
  • the Q-learning algorithm is modified using eligibility traces, which provide a better way to assign credit to visited states; the traces decay over time, so recently visited states are more eligible for credit, which speeds up the convergence of reinforcement learning.
  • the block diagram of the multivariate data-driven mode for dual-robot reinforcement learning can be obtained, as shown in Figure 3, which is a dual-input and dual-output system.
  • the inputs are the desired position x_d of the master robot and the desired force f_d of the slave robot; the outputs are the actual position x_1 of the master robot and the actual force f_2 applied by the slave robot.
  • the master robot adopts the ideal position element control strategy, learns the expected position through the reinforcement learning algorithm, and feeds back the actual position to the expected position.
  • the goal is to generate an optimal force when the robot interacts with the environment to minimize the position error;
  • the slave robot adopts a force-element control strategy based on the master robot's position deviation, uses a damping PD control strategy suited to unknown environments, learns the desired force through a reinforcement learning algorithm, and drives the slave robot toward the desired reference point with the minimum force.
  • Admittance control is then used to generate the desired reference position and to maintain the relationship between the slave robot's desired force and the desired reference position. That is, the master and slave robots use reinforcement learning algorithms to learn the desired position and the desired force, respectively; both use proportional-derivative control laws to tune their respective proportional coefficients (k_p) and derivative coefficients (k_d).

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a dual-robot force/position multivariate data-driven method based on reinforcement learning. The master robot adopts an ideal position-element control strategy, learns the desired position through a reinforcement learning algorithm, and feeds the actual position back to the desired position, the goal being to generate an optimal force when the robot interacts with the environment so as to minimize the position error. The slave robot adopts a force-element control strategy based on the master robot's position deviation, uses a damping PD control strategy suited to unknown environments, and learns the desired force through a reinforcement learning algorithm, i.e. the minimum force that drives the slave robot toward the desired reference point. The master and slave robots use reinforcement learning algorithms to learn the desired position and the desired force, respectively; both use proportional-derivative control laws to tune their respective proportional coefficients (k_p) and derivative coefficients (k_d). The invention can improve the dexterity of dual-robot cooperation, solve the parameter-optimization problem in force/position control, and avoid large transient errors.

Description

Dual-robot force/position multivariate data-driven method based on reinforcement learning

Technical Field

The invention relates to the technical field of multi-robot collaborative control, and in particular to a dual-robot force/position multivariate data-driven method based on reinforcement learning.

Background Art

With the continuously changing processing volumes and working environments in industries handling complex steel/aluminium components, some tasks can no longer be carried out by a single robot and can only be completed through the cooperation of multiple robots. Multi-robot collaborative operation has replaced single-robot operation and has become a research hotspot in building intelligent production lines. Compared with a single-robot system, a multi-robot system has strong adaptability to the environment, high self-regulation ability, wide spatial distribution, better data redundancy and robustness. Through the cooperation of multiple robots, complex tasks such as high-precision operations and efficient machining that a single robot cannot accomplish can be completed reliably.

When multiple robots cooperatively carry the same object, there are physical links and internal-force constraints between the robots. To achieve tight coupling, an effective force-position coordination control strategy must be implemented to improve the compliance and stability of multi-robot collaborative operation.

Existing research on dual-robot coordinated control mostly applies control strategies to the slave robot, without fully considering optimal control of the master robot and without involving the concept of the slave robot tracking the master robot. Most robot force-position control schemes assume accurate knowledge of the dynamic model, but the cooperative dynamic model of multiple robots is highly uncertain and is subject to disturbances from an uncertain external environment, so model-based control methods are insufficient for such uncertain systems.

Multi-robot collaborative operations for complex tasks require studying the interaction between the robots and the environment. When the environment is unknown, force control alone cannot generate the desired force magnitude under environmental uncertainty. How to implement an effective force-position cooperative control strategy, solve the parameter-optimization problem in force/position control, avoid large transient errors, and achieve compliant and smooth dual-robot cooperative handling and flipping is the key problem to be solved at present.
Summary of the Invention

In view of this, to solve the above problems in the prior art, the present invention proposes a dual-robot force/position multivariate data-driven method based on reinforcement learning, in which the master robot adopts an ideal position-element control strategy and learns the desired position through a reinforcement learning algorithm, while the slave robot adopts a force-element control strategy based on the master robot's position deviation, uses a damping PD control strategy suited to unknown environments, and learns the desired force through a reinforcement learning algorithm.

The present invention solves the above problems by the following technical means:

A dual-robot force/position multivariate data-driven method based on reinforcement learning comprises the following steps:

obtaining the actual positions, actual velocities and actual accelerations of the end-effectors of the master robot and the slave robot in task space;

establishing a dual-robot mechanical damping system model using the actual positions, actual velocities and actual accelerations of the master and slave robot end-effectors in task space;

obtaining the suction cup forces of the master robot and the slave robot from the dynamic force-balance equations of the dual-robot mechanical damping system model, where the suction cup force of the master robot is the actual force applied by the master robot and the suction cup force of the slave robot is the actual force applied by the slave robot;

the master robot adopts an ideal position-element control strategy, learns the desired position through a reinforcement learning algorithm, applies a proportional-derivative control law based on the actual force applied by the master robot, tunes the derivative and proportional coefficients, and feeds the actual position back to the desired position; when the master robot is not in contact with the environment, the actual position of the master robot follows the desired position; when the master robot contacts the environment, the desired position of the master robot is modified and updated by the position PD control, and the actual position of the master robot follows the new desired position;

the slave robot adopts a force-element control strategy based on the master robot's position deviation, uses a damping PD control strategy suited to unknown environments, and learns the desired force through a reinforcement learning algorithm; by comparing the error between the desired force and the actual force applied by the slave robot, the force error feedback signal is converted into a velocity correction at the slave robot's end-effector; admittance control is then used to generate the desired reference position and to maintain the relationship between the slave robot's desired force and the desired reference position.
Further, the actual positions, actual velocities and actual accelerations of the master and slave robot end-effectors in task space are obtained as follows:

With a force sensor at the robot end-effector, the joint-space dynamics of an n-link robot can be written as

M(q)q̈ + C(q, q̇)q̇ + G(q) = τ + Jᵀ(q)f_e

where q, q̇ and q̈ are the joint position, velocity and acceleration, respectively; M(q) is the symmetric positive-definite inertia matrix; C(q, q̇)q̇ denotes the centripetal and Coriolis torques; G(q) is the gravitational torque vector; τ is the driving torque vector; f_e is the external force measured by the force sensor; and J(q) is the Jacobian matrix that maps the external force vector f_e to generalized coordinates, satisfying

ẋ = J(q)q̇,  ẍ = J(q)q̈ + J̇(q)q̇

where ẋ and ẍ are the actual velocity and actual acceleration of the robot end-effector in task space, and ẋ is the first derivative of the actual end-effector position x in task space.
Further, the dual-robot mechanical damping system model is established as follows:

When the robot end-effector contacts the environment, the contact can be modelled by a spring-damper model:

f_e = C_e·ẋ + K_e·(x − x_e) for x ≥ x_e;  f_e = 0 for x < x_e

where C_e and K_e are the environmental damping and stiffness constant matrices, respectively, and x_e is the position of the environment; when x ≥ x_e there is an interaction force between the robot end-effector and the environment; otherwise, when x < x_e, there is no interaction force;

Under ideal working conditions, when the suction cups at the ends of the two robots grip the workpiece, there is no relative motion between the mechanisms; the slave-robot rigid body and the master-robot rigid body holding the workpiece can be regarded as coupled through the mechanical damping of the sensor, which yields the dual-robot mechanical damping system model.
Further, the suction cup force of the master robot is obtained from the dynamic force-balance equations of the dual-robot mechanical damping system model as follows:

According to the dynamic force-balance equations of the dual-robot mechanical damping system model, on the master robot side the suction cup force f_1 is

f_1 = m_1·ẍ_1 + b_s·(ẋ_1 − ẋ_2) + k_s·(x_1 − x_2)

where f_1 is the actual force applied by the master robot; k_s is the environmental stiffness coefficient; b_s is the environmental damping coefficient; x_1 is the actual position of the master robot; x_2 is the actual position of the slave robot; ẋ_1 is the actual velocity of the master robot, ẋ_2 is the actual velocity of the slave robot, and ẍ_1 is the actual acceleration of the master robot; m_1 is the sum of the masses of the master robot's suction cup and the workpiece.
Further, the suction cup force of the slave robot is obtained from the dynamic force-balance equations of the dual-robot mechanical damping system model as follows:

On the slave robot side the suction cup force f_2 is

f_2 = m_2·ẍ_2 + b_s·(ẋ_2 − ẋ_1) + k_s·(x_2 − x_1)

where f_2 can be regarded as equivalent to the external force f_e measured by the force sensor mounted on the slave robot's wrist; k_s is the environmental stiffness coefficient; b_s is the environmental damping coefficient; x_1 is the actual position of the master robot; x_2 is the actual position of the slave robot; ẋ_1 is the actual velocity of the master robot, ẋ_2 is the actual velocity of the slave robot, and ẍ_2 is the actual acceleration of the slave robot; m_2 is the mass of the slave robot's suction cup.
Further, the master robot feeds the actual position back to the desired position as follows:

A proportional-derivative control law based on the position error value is applied, and its output is the force correction amount; the position control law of the master robot is expressed as

f_1 − f_d = k_px·e_x + k_dx·ė_x,  with e_x = x_d − x_1

where f_1 is the actual force applied by the master robot; f_d is the desired force of the slave robot; x_d is the desired position of the master robot; e_x and ė_x are the position offset error and velocity error of the master robot, respectively; k_px is the position-control proportional coefficient; k_dx is the position-control derivative coefficient; and x_1 is the actual position of the master robot.
Further, the slave robot converts the force error feedback signal into a velocity correction at the slave robot's end-effector as follows:

The damping control law of the slave robot is expressed as

ẋ_2 = k_pf·e_f + k_df·ė_f,  with e_f = f_d − f_2

where ẋ_2 is the velocity correction of the slave robot, i.e. the actual velocity of the slave robot; e_f is the force error value of the slave robot; ė_f is the error value of the rate of change of force of the slave robot; k_pf is the force-control proportional coefficient; k_df is the force-control derivative coefficient; f_d is the desired force of the slave robot; and f_2 is the actual force applied by the slave robot.
Compared with the prior art, the beneficial effects of the present invention at least include the following:

In the present invention, the master robot adopts an ideal position-element control strategy and learns the desired position through a reinforcement learning algorithm; the slave robot adopts a force-element control strategy based on the master robot's position deviation, uses a damping PD control strategy suited to unknown environments, and learns the desired force through a reinforcement learning algorithm. The force/position multivariate data-driven method under reinforcement learning can improve the dexterity of dual-robot cooperation, solve the parameter-optimization problem in force/position control, and avoid large transient errors.
Brief Description of the Drawings

To explain the technical solutions in the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.

Fig. 1 is a schematic diagram of dual-robot cooperative clamping, handling and flipping according to the present invention;

Fig. 2 is the dual-robot mechanical damping system model of the present invention;

Fig. 3 is a block diagram of the multivariate data-driven mode for dual-robot reinforcement learning of the present invention.
Detailed Description of the Embodiments

To make the above objects, features and advantages of the present invention more apparent and easier to understand, the technical solution of the present invention is described in detail below with reference to the drawings and specific embodiments. It should be noted that the described embodiments are only some, not all, of the embodiments of the present invention; all other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work fall within the protection scope of the present invention.

Cooperative clamping, handling and flipping of a workpiece by two robots in the same station area requires studying the interaction between the robots and the environment; the most commonly used interaction control method is force-position control. When the environment is unknown, force/position control alone cannot generate the desired force magnitude under environmental uncertainty; to obtain force/position control, its expected value must be estimated.

Machine Learning (ML) is a technology that uses computers to realize functions such as human learning capabilities. Reinforcement Learning (RL) trains machine learning models so that a robot can learn in an uncertain and potentially complex environment: in the absence of an accurate system model, actions are selected according to the environment, the goal is encoded through rewards or punishments, and learning proceeds until the goal is achieved. Reinforcement learning estimates its function by analyzing and measuring system trajectory data, thereby improving its control behavior in real time, and can be widely used in robot control, scheduling and other fields.

The most widely used reinforcement learning algorithm is Q-learning, an iterative algorithm whose goal is to maximize the expected total reward; it is also an optimal action-selection strategy for Markov decision processes and does not require a model of the environment. This improves the performance of dual-robot cooperation, solves the parameter-optimization problem in force/position control, and avoids large transient errors. Real-time tracking is achieved when the two robots cooperatively carry the same rigid body, and robustness is maintained when the robot dynamics are uncertain.

The schematic diagram of the coordinate calibration for dual-robot cooperative handling is shown in Fig. 1. A master-slave cooperative control mode is adopted; pneumatic suction cups are mounted at the ends of the master and slave robots, and the master and auxiliary suction cups grip the same workpiece to execute complex handling trajectories. Point O in the figure is the origin of the world coordinate system, and (x_i, y_i, z_i) denotes the current axial joint coordinate system. The robot base coordinates are symmetric about point O, and the z-axes of the end joint coordinate systems are symmetric about the rotation centre.
With a force sensor at the robot end-effector, the joint-space dynamics of an n-link robot can be written as

M(q)q̈ + C(q, q̇)q̇ + G(q) = τ + Jᵀ(q)f_e

where q, q̇ and q̈ are the joint position, velocity and acceleration, respectively; M(q) is the symmetric positive-definite inertia matrix; C(q, q̇)q̇ denotes the centripetal and Coriolis torques; G(q) is the gravitational torque vector; τ is the driving torque vector; f_e is the external force measured by the force sensor; and J(q) is the Jacobian matrix that maps the external force vector f_e to generalized coordinates, satisfying

ẋ = J(q)q̇,  ẍ = J(q)q̈ + J̇(q)q̇

where ẋ and ẍ are the actual velocity and actual acceleration of the robot end-effector in task space, and ẋ is the first derivative of the actual end-effector position x in task space.
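For illustration only, the following Python sketch shows the two Jacobian mappings above — joint velocities to task-space velocity, and an external force f_e to generalized joint torques — for an assumed planar two-link arm. The link lengths, joint states and force values are illustrative assumptions, not parameters from this disclosure.

```python
# Illustrative sketch: Jacobian mappings x_dot = J(q) q_dot and tau_ext = J^T(q) f_e
# for an assumed planar 2-link arm (link lengths l1, l2 are example values).
import numpy as np

def jacobian_2link(q, l1=0.4, l2=0.3):
    """Geometric Jacobian (2x2) of a planar 2-link arm."""
    q1, q2 = q
    return np.array([
        [-l1 * np.sin(q1) - l2 * np.sin(q1 + q2), -l2 * np.sin(q1 + q2)],
        [ l1 * np.cos(q1) + l2 * np.cos(q1 + q2),  l2 * np.cos(q1 + q2)],
    ])

q = np.array([0.3, 0.6])        # joint positions [rad]
dq = np.array([0.1, -0.2])      # joint velocities [rad/s]
f_e = np.array([5.0, -2.0])     # external force at the end-effector [N]

J = jacobian_2link(q)
x_dot = J @ dq                  # task-space velocity of the end-effector
tau_ext = J.T @ f_e             # joint torques produced by the external force

print("task-space velocity:", x_dot)
print("joint torques due to f_e:", tau_ext)
```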
When the robot end-effector contacts the environment, the contact can be modelled by a spring-damper (Kelvin-Voigt) model:

f_e = C_e·ẋ + K_e·(x − x_e) for x ≥ x_e;  f_e = 0 for x < x_e

where C_e and K_e are the environmental damping and stiffness constant matrices, respectively, and x_e is the position of the environment; when x ≥ x_e there is an interaction force between the robot end-effector and the environment; otherwise, when x < x_e, there is no interaction force.
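A minimal sketch of this spring-damper contact model follows; the scalar gains c_e and k_e and the environment position x_e are assumed example values, not values from this disclosure.

```python
# Illustrative sketch of the Kelvin-Voigt contact model (scalar case).
def contact_force(x, x_dot, x_e=0.0, c_e=50.0, k_e=2000.0):
    """Interaction force between the end-effector and the environment."""
    if x >= x_e:                        # in contact: f_e = C_e*x_dot + K_e*(x - x_e)
        return c_e * x_dot + k_e * (x - x_e)
    return 0.0                          # x < x_e: free motion, no interaction force

print(contact_force(x=0.002, x_dot=0.01))    # in contact
print(contact_force(x=-0.001, x_dot=0.01))   # not in contact
```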
Under ideal working conditions, when the suction cups at the ends of the two robots grip the workpiece, there is no relative motion between the mechanisms; the slave-robot rigid body and the master-robot rigid body holding the workpiece can be regarded as coupled through the mechanical damping of the sensor, which yields the mechanical damping system model shown in Fig. 2. According to the dynamic force-balance equations of this model, on the master robot side the suction cup force f_1 is

f_1 = m_1·ẍ_1 + b_s·(ẋ_1 − ẋ_2) + k_s·(x_1 − x_2)

where f_1 is the actual force applied by the master robot; k_s is the environmental stiffness coefficient; b_s is the environmental damping coefficient; x_1 is the actual position of the master robot; x_2 is the actual position of the slave robot; m_1 is the sum of the masses of the master robot's suction cup and the workpiece.
On the slave robot side the suction cup force f_2 is

f_2 = m_2·ẍ_2 + b_s·(ẋ_2 − ẋ_1) + k_s·(x_2 − x_1)

where f_2 can be regarded as equivalent to the external force f_e measured by the force sensor mounted on the slave robot's wrist, and m_2 is the mass of the slave robot's suction cup.
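The dynamic force balance of the damping model can be illustrated with the short sketch below. The sign convention follows the equations as reconstructed above, and the masses, stiffness, damping and motion states are assumed example values, not data from this disclosure.

```python
# Illustrative sketch of the suction-cup forces of the dual-robot damping model (Fig. 2).
def suction_forces(x1, x2, dx1, dx2, ddx1, ddx2,
                   m1=3.0, m2=1.0, k_s=2000.0, b_s=50.0):
    """Master (f1) and slave (f2) suction-cup forces from the force balance."""
    f1 = m1 * ddx1 + b_s * (dx1 - dx2) + k_s * (x1 - x2)   # master side
    f2 = m2 * ddx2 + b_s * (dx2 - dx1) + k_s * (x2 - x1)   # slave side
    return f1, f2

f1, f2 = suction_forces(x1=0.101, x2=0.100, dx1=0.02, dx2=0.018,
                        ddx1=0.1, ddx2=0.05)
print(f"f1 = {f1:.2f} N, f2 = {f2:.2f} N")
```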
The master robot adopts an ideal position-element control strategy, learns the desired position through a reinforcement learning algorithm, and feeds the actual position back to the desired position; the goal is to generate an optimal force when the robot interacts with the environment so as to minimize the position error. A proportional-derivative (PD) control law based on the position error value is applied, and its output is the force correction amount. The position control law of the master robot is expressed as

f_1 − f_d = k_px·e_x + k_dx·ė_x,  with e_x = x_d − x_1

where f_d is the desired force of the slave robot; x_d is the desired position of the master robot; e_x and ė_x are the position offset error and velocity error of the master robot, respectively; k_px is the position-control proportional coefficient; and k_dx is the position-control derivative coefficient.
When there is no contact force, the master robot's actual position x_1 follows the desired position x_d. When the robot contacts the environment, the desired position x_d of the master robot is modified and updated by the position PD control, and the actual position of the master robot follows the new desired position.
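A minimal numerical sketch of such a position PD update follows. The discrete form, the gain values k_px and k_dx, and the way the force correction is added to the desired force are illustrative assumptions rather than the exact law of this disclosure.

```python
# Illustrative sketch: master robot position PD producing a force correction.
def master_pd_force(x_d, x1, dx_d, dx1, f_d, k_px=800.0, k_dx=40.0):
    """Commanded master force from the position offset and velocity errors."""
    e_x = x_d - x1          # position offset error
    de_x = dx_d - dx1       # velocity error
    return f_d + k_px * e_x + k_dx * de_x

f1_cmd = master_pd_force(x_d=0.105, x1=0.100, dx_d=0.0, dx1=0.01, f_d=10.0)
print(f"commanded master force: {f1_cmd:.2f} N")
```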
On the other hand, based on the environmental stiffness and damping model, the slave robot must track the real-time motion state of the master robot in real time. It therefore adopts a damping PD control strategy suited to unknown environments and learns the desired force through a reinforcement learning algorithm, i.e. the minimum force that drives the slave robot toward the desired reference point; this desired force, the minimum force required for the robot to approach its reference point, can be obtained by the reinforcement learning method. Admittance control is then used to generate the desired reference position and to maintain the relationship between the slave robot's desired force and the desired reference position.
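One possible form of the admittance step that generates a reference position from the force error is sketched below; the second-order admittance form and its parameters M_a, B_a, K_a, as well as the time step dt, are illustrative assumptions, not values from this disclosure.

```python
# Illustrative sketch: admittance dynamics M_a*ddx + B_a*dx + K_a*x = f_meas - f_d
# integrated with explicit Euler to produce a reference position offset.
def admittance_step(x_r, dx_r, f_meas, f_d, dt, M_a=1.0, B_a=20.0, K_a=500.0):
    """One integration step of the assumed admittance filter."""
    ddx_r = (f_meas - f_d - B_a * dx_r - K_a * x_r) / M_a
    dx_r = dx_r + ddx_r * dt
    x_r = x_r + dx_r * dt
    return x_r, dx_r

x_r, dx_r = 0.0, 0.0
for _ in range(500):                     # 1 s at dt = 2 ms, constant force error
    x_r, dx_r = admittance_step(x_r, dx_r, f_meas=12.0, f_d=15.0, dt=0.002)
print(f"reference position offset: {x_r * 1000:.2f} mm")
```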
Taking the velocity and position of the robot end-effector into account, damping PD control is adopted: by comparing the error between the desired force and the actual force applied by the slave robot, the force error feedback signal is converted into a velocity correction at the slave robot's end-effector. The damping control law of the slave robot is expressed as

ẋ_2 = k_pf·e_f + k_df·ė_f,  with e_f = f_d − f_2

where ẋ_2 is the velocity correction of the slave robot; e_f is the force error value of the slave robot; ė_f is the error value of the rate of change of force of the slave robot; k_pf is the force-control proportional coefficient; and k_df is the force-control derivative coefficient.
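The conversion of the force error into a velocity correction can be sketched as follows; the gains k_pf and k_df and the finite-difference estimate of the force rate are assumed example values.

```python
# Illustrative sketch: slave robot damping PD law converting a force error into
# a velocity correction at the slave end-effector.
def slave_damping_pd(f_d, f2, f2_prev, dt, k_pf=0.002, k_df=0.0005):
    """Velocity correction from the force error and its rate of change."""
    e_f = f_d - f2                      # force error
    de_f = -(f2 - f2_prev) / dt         # rate of the force error (f_d held constant)
    return k_pf * e_f + k_df * de_f

dx2_corr = slave_damping_pd(f_d=15.0, f2=12.0, f2_prev=11.5, dt=0.002)
print(f"slave velocity correction: {dx2_corr * 1000:.2f} mm/s")
```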
To accelerate the convergence of learning, the Q-learning algorithm is modified with eligibility traces, which provide a better way to assign credit to visited states. The traces decay over time, so recently visited states are more eligible for credit, which speeds up the convergence of reinforcement learning.
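As an illustration of Q-learning with eligibility traces (Watkins's Q(λ)), the following sketch applies the trace-decay rule to a small toy task; the task, the state and action sets, and all hyper-parameters are assumptions made only for this example and are unrelated to the dual-robot controller itself.

```python
# Illustrative sketch: tabular Q-learning with eligibility traces (Watkins's Q(lambda)).
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
E = np.zeros_like(Q)                        # eligibility traces
alpha, gamma, lam, eps = 0.1, 0.95, 0.9, 0.1
rng = np.random.default_rng(0)

def choose(s):
    """epsilon-greedy action selection."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(Q[s].argmax())

def step(s, a):
    """Toy chain task: action 1 moves one state to the right, action 0 stays."""
    s_next = min(s + a, n_states - 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward, s_next == n_states - 1

for episode in range(200):
    s, done = 0, False
    E[:] = 0.0
    a = choose(s)
    while not done:
        s_next, reward, done = step(s, a)
        a_next = choose(s_next)
        a_star = int(Q[s_next].argmax())
        delta = reward + (0.0 if done else gamma * Q[s_next, a_star]) - Q[s, a]
        E[s, a] += 1.0                      # mark the visited state-action pair
        Q += alpha * delta * E              # credit all recently visited pairs
        if a_next == a_star:                # Watkins's rule: decay the traces,
            E *= gamma * lam                # or cut them after an exploratory action
        else:
            E[:] = 0.0
        s, a = s_next, a_next

print(np.round(Q, 2))
```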
From the above analysis, the block diagram of the multivariate data-driven mode for dual-robot reinforcement learning is obtained, as shown in Fig. 3; it is a dual-input, dual-output system. The inputs are the desired position x_d of the master robot and the desired force f_d of the slave robot; the outputs are the actual position x_1 of the master robot and the actual force f_2 applied by the slave robot.
The master robot adopts an ideal position-element control strategy, learns the desired position through a reinforcement learning algorithm, and feeds the actual position back to the desired position; the goal is to generate an optimal force when the robot interacts with the environment so as to minimize the position error. The slave robot adopts a force-element control strategy based on the master robot's position deviation, uses a damping PD control strategy suited to unknown environments, learns the desired force through a reinforcement learning algorithm, and drives the slave robot toward the desired reference point with the minimum force. Admittance control is then used to generate the desired reference position and to maintain the relationship between the slave robot's desired force and the desired reference position. That is, the master and slave robots use reinforcement learning algorithms to learn the desired position and the desired force, respectively; both use proportional-derivative control laws to tune their respective proportional coefficients (k_p) and derivative coefficients (k_d).
The above embodiments only express several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that, for those of ordinary skill in the art, several variations and improvements can be made without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (7)

  1. A dual-robot force/position multivariate data-driven method based on reinforcement learning, characterized by comprising the following steps:
    obtaining the actual positions, actual velocities and actual accelerations of the end-effectors of the master robot and the slave robot in task space;
    establishing a dual-robot mechanical damping system model using the actual positions, actual velocities and actual accelerations of the master and slave robot end-effectors in task space;
    obtaining the suction cup forces of the master robot and the slave robot from the dynamic force-balance equations of the dual-robot mechanical damping system model, the suction cup force of the master robot being the actual force applied by the master robot and the suction cup force of the slave robot being the actual force applied by the slave robot;
    the master robot adopting an ideal position-element control strategy, learning the desired position through a reinforcement learning algorithm, applying a proportional-derivative control law based on the actual force applied by the master robot, tuning the derivative and proportional coefficients, and feeding the actual position back to the desired position; when the master robot is not in contact with the environment, the actual position of the master robot following the desired position; when the master robot contacts the environment, the desired position of the master robot being modified and updated by the position PD control, and the actual position of the master robot following the new desired position;
    the slave robot adopting a force-element control strategy based on the master robot's position deviation, using a damping PD control strategy suited to unknown environments, learning the desired force through a reinforcement learning algorithm, and converting the force error feedback signal into a velocity correction at the slave robot's end-effector by comparing the error between the desired force and the actual force applied by the slave robot; then using admittance control to generate the desired reference position and to maintain the relationship between the slave robot's desired force and the desired reference position.
  2. The dual-robot force/position multivariate data-driven method based on reinforcement learning according to claim 1, characterized in that the actual positions, actual velocities and actual accelerations of the master and slave robot end-effectors in task space are obtained as follows:
    with a force sensor at the robot end-effector, the joint-space dynamics of an n-link robot can be written as
    M(q)q̈ + C(q, q̇)q̇ + G(q) = τ + Jᵀ(q)f_e
    where q, q̇ and q̈ are the joint position, velocity and acceleration, respectively; M(q) is the symmetric positive-definite inertia matrix; C(q, q̇)q̇ denotes the centripetal and Coriolis torques; G(q) is the gravitational torque vector; τ is the driving torque vector; f_e is the external force measured by the force sensor; and J(q) is the Jacobian matrix that maps the external force vector f_e to generalized coordinates, satisfying
    ẋ = J(q)q̇,  ẍ = J(q)q̈ + J̇(q)q̇
    where ẋ and ẍ are the actual velocity and actual acceleration of the robot end-effector in task space, and ẋ is the first derivative of the actual end-effector position x in task space.
  3. The dual-robot force/position multivariate data-driven method based on reinforcement learning according to claim 2, characterized in that the dual-robot mechanical damping system model is established as follows:
    when the robot end-effector contacts the environment, the contact can be modelled by a spring-damper model:
    f_e = C_e·ẋ + K_e·(x − x_e) for x ≥ x_e;  f_e = 0 for x < x_e
    where C_e and K_e are the environmental damping and stiffness constant matrices, respectively, and x_e is the position of the environment; when x ≥ x_e there is an interaction force between the robot end-effector and the environment; otherwise, when x < x_e, there is no interaction force;
    under ideal working conditions, when the suction cups at the ends of the two robots grip the workpiece, there is no relative motion between the mechanisms; the slave-robot rigid body and the master-robot rigid body holding the workpiece can be regarded as coupled through the mechanical damping of the sensor, which yields the dual-robot mechanical damping system model.
  4. The dual-robot force/position multivariate data-driven method based on reinforcement learning according to claim 1, characterized in that the suction cup force of the master robot is obtained from the dynamic force-balance equations of the dual-robot mechanical damping system model as follows:
    according to the dynamic force-balance equations of the dual-robot mechanical damping system model, on the master robot side the suction cup force f_1 is
    f_1 = m_1·ẍ_1 + b_s·(ẋ_1 − ẋ_2) + k_s·(x_1 − x_2)
    where f_1 is the actual force applied by the master robot; k_s is the environmental stiffness coefficient; b_s is the environmental damping coefficient; x_1 is the actual position of the master robot; x_2 is the actual position of the slave robot; ẋ_1 is the actual velocity of the master robot, ẋ_2 is the actual velocity of the slave robot, and ẍ_1 is the actual acceleration of the master robot; m_1 is the sum of the masses of the master robot's suction cup and the workpiece.
  5. The dual-robot force/position multivariate data-driven method based on reinforcement learning according to claim 1, characterized in that the suction cup force of the slave robot is obtained from the dynamic force-balance equations of the dual-robot mechanical damping system model as follows:
    on the slave robot side the suction cup force f_2 is
    f_2 = m_2·ẍ_2 + b_s·(ẋ_2 − ẋ_1) + k_s·(x_2 − x_1)
    where f_2 can be regarded as equivalent to the external force f_e measured by the force sensor mounted on the slave robot's wrist; k_s is the environmental stiffness coefficient; b_s is the environmental damping coefficient; x_1 is the actual position of the master robot; x_2 is the actual position of the slave robot; ẋ_1 is the actual velocity of the master robot, ẋ_2 is the actual velocity of the slave robot, and ẍ_2 is the actual acceleration of the slave robot; m_2 is the mass of the slave robot's suction cup.
  6. The dual-robot force/position multivariate data-driven method based on reinforcement learning according to claim 1, characterized in that the master robot feeds the actual position back to the desired position as follows:
    a proportional-derivative control law based on the position error value is applied, and its output is the force correction amount; the position control law of the master robot is expressed as
    f_1 − f_d = k_px·e_x + k_dx·ė_x,  with e_x = x_d − x_1
    where f_1 is the actual force applied by the master robot; f_d is the desired force of the slave robot; x_d is the desired position of the master robot; e_x and ė_x are the position offset error and velocity error of the master robot, respectively; k_px is the position-control proportional coefficient; k_dx is the position-control derivative coefficient; and x_1 is the actual position of the master robot.
  7. The dual-robot force/position multivariate data-driven method based on reinforcement learning according to claim 1, characterized in that the slave robot converts the force error feedback signal into a velocity correction at the slave robot's end-effector as follows:
    the damping control law of the slave robot is expressed as
    ẋ_2 = k_pf·e_f + k_df·ė_f,  with e_f = f_d − f_2
    where ẋ_2 is the velocity correction of the slave robot, i.e. the actual velocity of the slave robot; e_f is the force error value of the slave robot; ė_f is the error value of the rate of change of force of the slave robot; k_pf is the force-control proportional coefficient; k_df is the force-control derivative coefficient; f_d is the desired force of the slave robot; and f_2 is the actual force applied by the slave robot.
PCT/CN2021/095966 2021-05-19 2021-05-26 Dual-robot force/position multivariate data-driven method based on reinforcement learning WO2022241806A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/751,024 US20220371186A1 (en) 2021-05-19 2022-05-23 Dual-robot position/force multivariate-data-driven method using reinforcement learning

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110547805.8 2021-05-19
CN202110547805.8A CN113427483A (zh) Dual-robot force/position multivariate data-driven method based on reinforcement learning

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/751,024 Continuation US20220371186A1 (en) 2021-05-19 2022-05-23 Dual-robot position/force multivariate-data-driven method using reinforcement learning

Publications (1)

Publication Number Publication Date
WO2022241806A1 true WO2022241806A1 (zh) 2022-11-24

Family

ID=77802471

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/095966 WO2022241806A1 (zh) Dual-robot force/position multivariate data-driven method based on reinforcement learning

Country Status (2)

Country Link
CN (1) CN113427483A (zh)
WO (1) WO2022241806A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW202321847A 2021-11-01 2023-06-01 美商靈巧公司 Robotic system for controlling multiple robots to cooperatively perform tasks
CN114161402B * 2021-12-17 2023-11-10 深圳市优必选科技股份有限公司 Robot stability control method, model construction method, device, and robot
CN114789444B * 2022-05-05 2022-12-16 山东省人工智能研究院 Compliant human-robot contact method based on deep reinforcement learning and impedance control
CN115257995A (zh) * 2022-05-19 2022-11-01 伍福人工智能(河南)有限公司 Robot control method and device, terminal equipment, and storage medium
WO2024050729A1 (en) * 2022-09-07 2024-03-14 Shanghai Flexiv Robotics Technology Co., Ltd. Robot teleoperation system and method
CN116069044B (zh) * 2023-03-29 2023-06-16 湖南大学 Hybrid force/position control method for multi-robot cooperative handling

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5023808A (en) * 1987-04-06 1991-06-11 California Institute Of Technology Dual-arm manipulators with adaptive control
CN109358506A * 2018-10-26 2019-02-19 南京理工大学 Adaptive fuzzy teleoperation control method based on a disturbance observer
CN110757454A * 2019-10-12 2020-02-07 广州中国科学院先进技术研究所 Path planning method and device for dual-robot cooperative rotation
CN111890348A * 2019-05-06 2020-11-06 广州中国科学院先进技术研究所 Control method and device for dual-robot cooperative handling

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014152418A1 (en) * 2013-03-14 2014-09-25 Board Of Regents Of The University Of Nebraska Methods, systems, and devices relating to force control surgical systems
CN108153153B * 2017-12-19 2020-09-11 哈尔滨工程大学 Learning variable impedance control system and control method
KR102067123B1 * 2018-02-13 2020-02-11 경북대학교 산학협력단 Suit-type multi-degree-of-freedom master device for controlling an industrial robot arm
CN110421547B * 2019-07-12 2022-10-28 中南大学 Cooperative impedance control method for a dual-arm robot based on an estimated dynamics model
CN112296995B * 2019-07-26 2023-08-08 广州中国科学院先进技术研究所 Robot cooperative handling system
CN111941421B * 2020-06-22 2022-02-18 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Adaptive fuzzy force tracking control method based on multi-robot cooperative operation

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5023808A (en) * 1987-04-06 1991-06-11 California Institute Of Technology Dual-arm manipulators with adaptive control
CN109358506A * 2018-10-26 2019-02-19 南京理工大学 Adaptive fuzzy teleoperation control method based on a disturbance observer
CN111890348A * 2019-05-06 2020-11-06 广州中国科学院先进技术研究所 Control method and device for dual-robot cooperative handling
CN110757454A * 2019-10-12 2020-02-07 广州中国科学院先进技术研究所 Path planning method and device for dual-robot cooperative rotation

Also Published As

Publication number Publication date
CN113427483A (zh) 2021-09-24

Similar Documents

Publication Publication Date Title
WO2022241806A1 (zh) Dual-robot force/position multivariate data-driven method based on reinforcement learning
Yu et al. Bayesian estimation of human impedance and motion intention for human–robot collaboration
CN109848983B (zh) Method for highly compliant human-guided robot collaborative operation
Chen et al. Tracking control of robot manipulators with unknown models: A jacobian-matrix-adaption method
Xu et al. Dynamic neural networks based kinematic control for redundant manipulators with model uncertainties
WO2021135405A1 (zh) Kinematic-model-free trajectory tracking method for a robotic arm and a robotic arm system
Li et al. Learning object-level impedance control for robust grasping and dexterous manipulation
CN111941421B (zh) Adaptive fuzzy force tracking control method based on multi-robot cooperative operation
Lee et al. Relative impedance control for dual-arm robots performing asymmetric bimanual tasks
US20220371186A1 (en) Dual-robot position/force multivariate-data-driven method using reinforcement learning
CN111319036A (zh) Active-disturbance-rejection position/force control method for a mobile manipulator based on an adaptive algorithm
CN111890348B (zh) Control method and device for dual-robot cooperative handling
Jiao et al. Adaptive hybrid impedance control for dual-arm cooperative manipulation with object uncertainties
Fan et al. Data-driven motion-force control scheme for redundant manipulators: A kinematic perspective
CN108555914B (zh) DNN neural-network adaptive control method based on a tendon-driven dexterous hand
CN111515928B (zh) Robotic arm motion control system
CN113927599A (zh) Absolute accuracy compensation method and system, device, and computer-readable storage medium
CN115139301A (zh) Robotic arm motion planning method based on a topology-adaptive neural network
Sun Kinematics model identification and motion control of robot based on fast learning neural network
CN114131617B (zh) Intelligent compliance control method and device for an industrial robot
Mezouar et al. External hybrid vision/force control
Maghami et al. Calibration of multi-robot cooperative systems using deep neural networks
CN116069044B (zh) Hybrid force/position control method for multi-robot cooperative handling
Xia et al. Hybrid force/position control of industrial robotic manipulator based on Kalman filter
CN116810792A (zh) Flexible control method for dual-robot assembly of fuze and detonator based on a neural network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21940259

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21940259

Country of ref document: EP

Kind code of ref document: A1