CN114851201B - A six-degree-of-freedom visual closed-loop grasping method for robotic arm based on TSDF 3D reconstruction - Google Patents
Info
- Publication number
- CN114851201B (application CN202210551360.5A)
- Authority
- CN
- China
- Prior art keywords
- grasping
- camera
- voxel
- coordinate system
- depth
- Prior art date: 2022-05-18
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- B25J9/16: Programme controls (B25J9/00 Programme-controlled manipulators)
- B25J9/1679: Programme controls characterised by the tasks executed
- B25J9/1692: Calibration of manipulator
- B25J9/1694: Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
- B25J9/1697: Vision controlled systems
- Y02T90/00: Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation
Abstract
The invention relates to a six-degree-of-freedom visual closed-loop grasping method for a robotic arm based on TSDF three-dimensional reconstruction, comprising the following steps. Step 1: calibrate the robotic-arm base coordinate system and the camera coordinate system with Zhang Zhengyou's camera calibration method and the ArUco Markers calibration method. Step 2: reconstruct the three-dimensional scene from the acquired image information with the TSDF function to reduce environmental noise points between objects. Step 3: build a reinforcement-learning network model. Step 4: back-project the predicted end-effector grasping pose into the reconstructed three-dimensional scene and judge the quality of the predicted grasp. Step 5: complete the grasping motion of the robotic arm through robot forward and inverse kinematics. Step 6: train the reinforcement-learning model so that the robotic arm completes the grasping action. The invention overcomes the shortcomings of the prior art and proposes an easy-to-implement, highly applicable six-degree-of-freedom visual closed-loop grasping system for a robotic arm based on TSDF three-dimensional reconstruction. The system reduces the environmental noise caused by occlusion and stacking between objects and lowers the depth error caused by interference of a single visual sensor. In addition, while maintaining high accuracy, the system achieves fast, real-time target detection and completes the grasping action.
Description
Technical Field
The invention belongs to the technology of six-degree-of-freedom object grasping by a robotic arm in unstructured environments, and specifically to the field of robot grasping control.
Background Art
Object grasping has long been an important problem in robotics, yet no fully satisfactory solution exists. The versatile manipulation system of a robotic arm allows an end effector with many degrees of freedom to be controlled flexibly in three-dimensional space, enabling dexterous grasping of objects and a dynamic response to environmental changes. Recently, the rapid development of deep learning and reinforcement learning and the construction of the corresponding frameworks have provided many feasible approaches to intelligent grasping with robotic arms.
Although 6-DOF grasp control of a robotic arm has high practical value and suits more complex operating environments, most current data-driven grasping algorithms only perform top-down grasping in simple tabletop settings (4-DOF: x, y, z, yaw) or rely on physical analysis to determine a suitable grasping pose. Because motion is restricted to three-dimensional translation, such algorithms are strongly limited in their application scenarios: the end effector can only approach an object from the vertical downward direction, and in some cases it cannot grasp the object along that direction. For example, the gripper has difficulty grasping a plate lying flat on the table. Moreover, methods based on physical analysis must not only process a large amount of experimental environment data but also estimate an accurate target model, which inevitably incurs substantial time and computation costs, and the physical model of an unstructured target object is rarely general. Most such algorithms are therefore difficult to apply directly to grasping novel objects, and the robustness and generalization of the system are poor.
Consequently, 6-DOF grasping with robotic arms has been proposed. PointnetGPD, proposed by Hongzhuo Liang, uses a two-step sampling-and-evaluation approach that evaluates a large number of samples to determine a reliable grasping pose, but this method is quite time-consuming (Hongzhuo Liang et al., "PointNetGPD: Detecting grasp configurations from point sets", 2019 International Conference on Robotics and Automation (ICRA), IEEE, 2019, pp. 3629-3635). Florence et al. transfer poses from existing grasping poses, but the success rate of that algorithm is relatively low for target objects and geometries not recorded in the data set, so it does not extend to new application scenarios (Peter Florence, Lucas Manuelli, and Russ Tedrake, "Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation", Conference on Robot Learning (CoRL), 2018). More recently, a point-cloud scene has been reconstructed from partial views acquired by an RGB-D camera; a designed PointNet++ network model then extracts object features, regresses the 6-DOF grasping pose directly, and performs grasp planning on the reconstructed point-cloud scene (P. Ni, W. Zhang, X. Zhu, and Q. Cao, "Learning an end-to-end spatial grasp generation and refinement algorithm from simulation", Machine Vision and Applications, vol. 32, no. 1, pp. 1-12, 2021).
Summary of the Invention
The invention overcomes the disadvantages of the prior art and proposes an easy-to-implement, highly applicable six-degree-of-freedom grasping method for a robotic arm. The invention designs a feature-extraction network and a policy-gradient reinforcement-learning framework, enabling the localization and planned six-degree-of-freedom grasping of objects not recorded in the training data set.
The invention takes an image sequence as input. First, the robotic-arm coordinate system and the camera coordinate system are established with Zhang Zhengyou's camera calibration method and the ArUco Markers calibration method. An Intel RealSense D435 depth camera then acquires color and depth information of the grasping workbench. The acquired image information is used for three-dimensional scene reconstruction through the Truncated Signed Distance Function (TSDF) to reduce environmental noise points between objects. A two-dimensional top-down image of the reconstructed scene is rendered and used as the input of the grasping neural network model designed here; the model encodes the input image, extracts features, and outputs the predicted grasping region of the current environment and the grasping pose of the robotic arm. The structure of the network model is shown in Fig. 1. The model adopts a ResNet fully convolutional feature-extraction architecture and learns to extract unstructured features from partial RGB-D observations. Color features and depth features are concatenated channel-wise, followed by two additional interleaved Batch Normalization (BN) layers, nonlinear activation functions (ReLU), and 1x1 convolutional layers. Average pooling then aggregates the feature embeddings of the same pixel cell, and linear upsampling lets convolutional features be shared across positions and orientations so that the output resolution matches the input. The coordinates of the heatmap cell with the maximum grasping probability are selected and mapped to the hemispherical three-dimensional grasping pose points defined before the experiment (see Fig. 2), and the corresponding 6-dimensional pose is taken as the predicted 6-DOF grasping pose. Next, the predicted grasping pose output by the model is back-projected into the reconstructed three-dimensional scene, the depth image under the current pose is rendered from the scene with the TSDF function, and the grasp quality is judged from the depth information at the two ends of the gripper. Finally, when the depth of the target coordinate point in the rendered image is less than or equal to the fingertip depth of the gripper (15 cm), the predicted pose is fed into the robot forward kinematics to obtain the corresponding motion trajectory of the robotic arm, which completes the grasping action. If the grasp succeeds, the agent receives a reward of 1; otherwise, the reward is 0. The grasping flow of the system is shown in Fig. 3.
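The closed-loop flow just described can be summarized as pseudocode. The sketch below only illustrates the sequence of operations; the robot, camera, volume and planner interfaces are hypothetical placeholders, not the actual implementation of the invention.

```python
def grasp_episode(robot, camera, model, volume, planner):
    """One closed-loop grasp attempt following the flow described above."""
    # 1. Multi-view capture and TSDF fusion to suppress single-view depth noise
    for pose in planner.hemisphere_poses():
        color, depth = camera.capture(pose)
        volume.integrate(color, depth, camera.intrinsics, pose)

    # 2. Render a top-down view of the workspace as the network input
    rgb_map, depth_map = volume.render_top_down()

    # 3. Dense heatmap -> argmax -> 6-DoF grasp pose (x, y, z, roll, pitch, yaw)
    heatmap = model(rgb_map, depth_map)
    grasp_pose = planner.heatmap_to_pose(heatmap)

    # 4. Back-project the pose into the TSDF volume and check fingertip clearance
    rendered_depth = volume.render_from(grasp_pose)
    if not planner.fingertips_clear(rendered_depth):
        return 0.0                          # rejected grasp, reward 0

    # 5. Execute via inverse/forward kinematics, lift 15 cm, verify the grasp
    robot.move_to(grasp_pose)
    robot.close_gripper()
    robot.lift(0.15)
    return 1.0 if robot.grasp_succeeded() else 0.0   # reinforcement reward
```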
The invention is a six-degree-of-freedom visual closed-loop grasping method for a robotic arm based on TSDF three-dimensional reconstruction; the specific steps are as follows:
Step 1: Calibrate the robotic-arm base coordinate system and the camera coordinate system with Zhang Zhengyou's camera calibration method and the ArUco Markers calibration method:
First, an Intel D415 depth camera is fixed vertically at the end of the robotic arm so that the camera and the end-effector gripper jaws keep a fixed pose transformation, allowing the stereo camera to collect image information of the objects on the grasping workbench. Then Zhang Zhengyou's camera calibration method and the ArUco Markers calibration method calibrate the intrinsic and extrinsic parameters of the camera, and a three-dimensional coordinate system with the robotic-arm base as origin is constructed. The camera intrinsic and extrinsic calibration formulas are as follows:
Here x_w, y_w, z_w are the three-dimensional coordinates in the world coordinate system; x_c, y_c, z_c are the coordinates in the camera coordinate system; u, v are pixel coordinates; z_m is a point in the assumed camera coordinate system; p is the imaging point in the image coordinate system; f is the imaging distance (focal length); d_x, d_y are the physical size of a pixel along the image coordinate axes; R and T are the rotation matrix and the translation matrix. Formula (1) converts the world coordinate system to the camera coordinate system; formula (2) converts the camera coordinate system to the ideal image coordinate system; formula (3) is the camera-to-pixel coordinate-system matrix, i.e. the intrinsic matrix.
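As a reading aid for formulas (1)-(3), the following minimal sketch shows how a calibrated camera model maps a world point into pixel coordinates; the numeric intrinsics and extrinsics are illustrative values only, not the parameters obtained by the calibration described here.

```python
import numpy as np

def world_to_pixel(p_w, R, T, K):
    """Project a 3D world point to pixel coordinates (u, v).

    R, T : extrinsics from the hand-eye / ArUco calibration (formula (1))
    K    : 3x3 intrinsic matrix from Zhang's calibration (formulas (2)-(3))
    """
    p_c = R @ p_w + T            # world coordinates -> camera coordinates
    uvw = K @ p_c                # camera coordinates -> homogeneous pixel coords
    return uvw[:2] / uvw[2]      # perspective division by the depth z_c

# Illustrative values only
K = np.array([[615.0,   0.0, 320.0],
              [  0.0, 615.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, T = np.eye(3), np.array([0.0, 0.0, 0.5])
print(world_to_pixel(np.array([0.1, 0.0, 0.0]), R, T, K))   # -> [443. 240.]
```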
Step 2: Reconstruct the three-dimensional scene from the acquired image information with the TSDF function to reduce environmental noise points between objects:
Step 2.1: In the actual experiments the invention uses a single sensor to acquire visual information; the information captured by the stereo camera of the grasping workbench (color information and depth information) is obtained through OpenCV.
Step 2.2: Environmental factors such as lighting and mutual occlusion between objects produce large noise points in the depth image acquired by the sensor, which degrades the grasp prediction of the network model. The invention therefore first samples spherical viewpoints around the grasping workbench before each grasp and reduces the depth error of a single image by sampling local images several times. The TSDF function then fuses the sampling-point poses and the sampled image information into a three-dimensionally reconstructed voxel grid. A single voxel stores not only its intuitive x, y, z coordinates but also two further values (SDF and CAM) that represent the distance from the voxel to the nearest surface. The SDF is the signed distance function used for depth-map fusion: 0 denotes the surface, positive values denote points outside the model, negative values denote points inside the model, and the magnitude is proportional to the distance from the point to the nearest surface. The surface is easily extracted as the zero crossing of this function.
Step 2.3: Define the coordinate system of the voxel grid as the world coordinate system; denote the voxel coordinates by V_j^w, the camera pose by T_i^w, and the camera intrinsic matrix by K.
The voxel grid in the world coordinate system is projected into the camera coordinate system with the transformation matrix obtained by ICP registration, and then converted to the image coordinate system I_ij with the camera intrinsic matrix K, where i∈(0,1,...,n), j∈(0,1,...,m).
After the voxel grid has been projected into the image coordinate system, the initial SDF value of each voxel is computed as in formula (18):

sdf_j = dep(I_ij) - ||V_j^w - T_i^w||   (18)

where V_j^w denotes the position of the j-th voxel in the world coordinate system, T_i^w denotes the position of the i-th camera pose in the world coordinate system, and dep(I_ij) denotes the depth value of the j-th voxel in the depth-image coordinate system of the i-th camera.
Next, the SDF value of each voxel is truncated according to formula (19):

tsdf_j = max(-1, min(1, sdf_j / trunc))   (19)

where tsdf_j is the truncated distance value of the voxel and trunc is the manually set truncation distance, which can be understood as the error bound of the camera depth information: the larger the error, the larger this value should be set, otherwise much of the information acquired by the depth camera may be discarded. In this project trunc is set to 1.
After the SDF value of each voxel is truncated, the TSDF value of each voxel is recomputed according to formula (20):

TSDF_j(x,y,z) = (ω_{j-1}(x,y,z)·tsdf_{j-1} + ω_j(x,y,z)·tsdf_j) / (ω_{j-1}(x,y,z) + ω_j(x,y,z))   (20)

where ω_j(x,y,z) is the weight of the voxel in the global voxel grid of the current frame, ω_{j-1}(x,y,z) is the weight of the voxel in the voxel grid of the previous frame, tsdf_j is the distance from the voxel to the object surface in the global data cube of the current frame, tsdf_{j-1} is the distance from the voxel to the object surface in the global data cube of the previous frame, V.z denotes the z-axis coordinate of the voxel in the camera coordinate system, and D(u,v) denotes the depth value at pixel (u,v) of the current depth image.
Formula (9) computes the weight value of each voxel:

w_j = w_{j-1} + w_j   (9)
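A compact sketch of the per-voxel fusion in formulas (18)-(20) and (9) is given below. The distance-based SDF expression follows the variable definitions above and is a hedged reconstruction, not necessarily the exact expression used in the patent.

```python
import numpy as np

def integrate_voxel(tsdf_prev, w_prev, voxel_w, cam_pos_w, depth_at_proj,
                    trunc=1.0, w_obs=1.0):
    """Fuse one depth observation into a single voxel of the TSDF grid."""
    if depth_at_proj <= 0:                                      # invalid depth, skip
        return tsdf_prev, w_prev
    sdf = depth_at_proj - np.linalg.norm(voxel_w - cam_pos_w)       # formula (18)
    tsdf = np.clip(sdf / trunc, -1.0, 1.0)                          # formula (19)
    fused = (w_prev * tsdf_prev + w_obs * tsdf) / (w_prev + w_obs)  # formula (20)
    return fused, w_prev + w_obs                                    # formula (9)
```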
The environment information of the grasping workbench is rendered from a top-down view and used as the input of the network model.
Step 3: Build the reinforcement-learning network model:
Step 3.1: Design the grasping strategy of the robotic arm and define the action space of the reinforcement learning. The invention uses a single two-dimensional convolutional network that regresses the 6-DoF grasping pose directly by raising the network dimension. The defined grasping action of the gripper is shown in Fig. 2.
Step 3.2: Convert each element of the array into the rotation angles of the robotic-arm end about the x, y and z coordinate axes; the conversion formula is as follows:

r_z = best_pix_ind[2] * θ   (12)

where a_x is the rotation angle of the arm end about the x axis, i.e. the roll angle of the end effector; b_y is the rotation angle of the arm end about the y axis, i.e. the pitch angle of the end effector; r_z is the rotation angle of the arm end about the z axis, i.e. the yaw angle of the end effector; θ is the offset value set for the experiment.
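The mapping from the argmax index of the output tensor to the three Euler angles can be sketched as follows; only the yaw mapping (formula (12)) is given explicitly in the text, so the roll and pitch mappings below are assumptions for illustration.

```python
import numpy as np

N_Z_BINS = 16
THETA = 180.0 / N_Z_BINS          # angular bin size used in the experiments

def index_to_angles(best_pix_ind):
    """best_pix_ind = (i_x, i_y, i_z), the argmax index of the output tensor."""
    a_x = best_pix_ind[0] * THETA     # roll about x (assumed mapping)
    b_y = best_pix_ind[1] * THETA     # pitch about y (assumed mapping)
    r_z = best_pix_ind[2] * THETA     # yaw about z, formula (12)
    return np.deg2rad([a_x, b_y, r_z])

print(index_to_angles((3, 7, 5)))
```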
Step 3.3: Design the feature-extraction network model. The model adopts a ResNet fully convolutional feature-extraction architecture and learns to extract unstructured features from partial RGB-D observations. After multilayer convolutional feature extraction of the color image and the depth image, the color features and depth features are concatenated channel-wise with the Concat function, followed by two additional Batch Normalization (BN) layers, nonlinear activation functions (ReLU), 1x1 convolutional layers, and upsampling that keeps the resolution of the output heatmap identical to that of the input image. The pose in the action space corresponding to the coordinate of the maximum value in the output tensor is selected as the predicted grasping pose.
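A minimal PyTorch sketch of such a dense prediction head is shown below: two ResNet trunks for color and depth, channel-wise concatenation, interleaved BN/ReLU/1x1 convolutions, and bilinear upsampling back to the input resolution. The layer sizes and the choice of ResNet-18 are illustrative assumptions, not the exact patented configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class GraspHeatmapNet(nn.Module):
    def __init__(self, n_rotations=16):
        super().__init__()
        # Two trunks share the same architecture; avgpool and fc layers are dropped
        self.color_trunk = nn.Sequential(*list(models.resnet18(weights=None).children())[:-2])
        self.depth_trunk = nn.Sequential(*list(models.resnet18(weights=None).children())[:-2])
        self.head = nn.Sequential(              # BN / ReLU / 1x1 conv interleaved
            nn.BatchNorm2d(1024), nn.ReLU(inplace=True),
            nn.Conv2d(1024, 256, kernel_size=1),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, n_rotations, kernel_size=1),
        )

    def forward(self, rgb, depth):
        if depth.shape[1] == 1:                  # replicate depth to 3 channels
            depth = depth.repeat(1, 3, 1, 1)
        feat = torch.cat([self.color_trunk(rgb), self.depth_trunk(depth)], dim=1)
        logits = self.head(feat)                 # B x n_rot x H/32 x W/32
        # Upsample so the output heatmap matches the input resolution
        return F.interpolate(logits, size=rgb.shape[-2:],
                             mode="bilinear", align_corners=False)
```

The argmax over the upsampled heatmap then gives the pixel position and rotation bin that are converted to Euler angles in step 3.2.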
Step 3.4: Perform forward inference of the network with the following formulas:
Formula (13) expresses the expected return in state s under action a, where g_t is the grasping action taken at time t, s_t is the state at time t, and r_t is the reward at time t; formula (14) is the total return function of the network; formula (15) is the state distribution function; formula (16) is the state-action function.
Step 3.5: Design the loss function of the reinforcement-learning network using a cross-entropy formulation, which further improves the detection accuracy of the algorithm:
where τ = s_0 a_0 s_1 a_1 ... s_n a_n ... denotes a Markov process (trajectory).
Since P_r{a|s} = π(s,a), formula (18) can be obtained.
The weight-update function is as follows:
where f_ω: S×A→R is an approximating function of the state-action value; when f_ω attains its minimum and Δω = 0, formula (21) can be derived.
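Because the reward is the binary grasp outcome and the policy is the softmax over the dense heatmap, one gradient step reduces to reward-weighted cross-entropy on the action that was taken. The sketch below shows such a REINFORCE-style update under those assumptions; it is not the full training code.

```python
import torch

def policy_gradient_step(logits, action_index, reward, optimizer):
    """One reward-weighted cross-entropy update, i.e. -R * log pi(a|s)."""
    log_probs = torch.log_softmax(logits.flatten(), dim=0)
    loss = -reward * log_probs[action_index]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```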
Step 3.6: Set the reinforcement-learning reward: if the grasp succeeds, the agent is given a reward of 1; otherwise the reward is 0.
Step 4: Back-project the predicted end-effector grasping pose into the three-dimensionally reconstructed scene and judge the quality of the predicted grasp. The 6-DOF grasping pose output by the network is back-projected into the three-dimensional voxel scene built before grasping, and the depth image under the predicted pose is rendered through view-frustum projection; the view-frustum projection is shown in Fig. 4. The view frustum is a three-dimensional volume whose position is related to the extrinsic matrix of the camera, and its shape determines how the model is projected from pixel space onto a plane. Perspective projection uses a pyramid as the view volume, with the camera located at its apex. The pyramid is truncated by a front plane and a back plane, forming a frustum called the View Frustum; its model is shown in Fig. 5, and only geometry inside the frustum is visible. Two planes of this clipping space are special and are called the near clip plane and the far clip plane.
By comparing the depth information at the two ends of the gripper in the rendered image, the distance between the parallel jaws and the object is inferred, the grasp quality of the predicted grasping pose is judged, and closed-loop feedback of the 6-DOF grasping system of the robotic arm is formed.
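A minimal sketch of this depth comparison is given below; the 15 cm fingertip depth is taken from the description, while the fingertip pixel locations and indexing convention are assumptions for illustration.

```python
FINGER_DEPTH = 0.15   # gripper fingertip depth, 15 cm

def grasp_is_feasible(rendered_depth, left_tip_px, right_tip_px):
    """rendered_depth: depth image rendered from the TSDF volume at the
    predicted grasp pose; *_tip_px: (u, v) pixels of the two fingertips."""
    d_left = rendered_depth[left_tip_px[1], left_tip_px[0]]
    d_right = rendered_depth[right_tip_px[1], right_tip_px[0]]
    # The pose is accepted when both fingertip depths are within the fingertip
    # length, i.e. the jaws straddle the object instead of colliding with it.
    return d_left <= FINGER_DEPTH and d_right <= FINGER_DEPTH
```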
Step 5: Complete the grasping motion of the robotic arm through robot forward and inverse kinematics:
First, when the depth at the two ends of the gripper in the depth image rendered in step 4 is less than or equal to the fingertip depth of the gripper, i.e. the predicted gripper pose straddles the object, the predicted pose is judged to be graspable. The robotic arm is then controlled to execute the grasp plan according to the predicted grasping pose: the six joint angles of the arm in the current state are obtained through robot inverse kinematics. The 6-dimensional grasping pose of the arm end predicted by the reinforcement-learning network in step 3 is then fed into the robot forward kinematics to obtain the end-effector trajectory that moves the arm from the current pose to the predicted pose point. When the end effector reaches the predicted grasping pose, the robot issues a gripper-close signal and attempts the grasp. After the gripper closes, the end effector moves up vertically by 15 cm, a depth image after the lift is acquired from the Intel RealSense D435 camera, and whether the grasp actually succeeded is judged by computing the depth at the two ends of the gripper. If the grasp succeeds, the reinforcement-learning model reward is set to 1; if it fails, the reward is set to 0.
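The execution and verification step can be sketched as follows; the robot and camera method names are hypothetical placeholders for a generic arm interface, not a specific robot API.

```python
def execute_and_verify(robot, camera, predicted_pose):
    """Move to the predicted 6-DoF pose, close the gripper, lift, and verify."""
    joints = robot.inverse_kinematics(robot.current_pose())     # current joint angles
    trajectory = robot.plan_trajectory(joints, predicted_pose)  # via forward kinematics
    robot.follow(trajectory)
    robot.close_gripper()
    robot.lift(0.15)                                            # vertical lift, 15 cm
    depth_after_lift = camera.capture_depth()
    # Success is judged from the depth measured between the closed fingertips
    success = robot.object_between_fingers(depth_after_lift)
    return 1.0 if success else 0.0                              # reinforcement reward
```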
Step 6: Train the reinforcement-learning model so that the robotic arm completes the grasping action:
Steps 2 to 5 above are repeated continuously in the CoppeliaSim Edu simulation environment; the reinforcement-learning reward shrinks the loss function of the model and the model weight parameters are updated. Finally, the trained weight parameters are loaded onto the real UR5 robotic arm for experimental debugging, and steps 1 to 5 are repeated to complete the six-degree-of-freedom closed-loop grasping task.
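The training loop in simulation can be sketched as below; the agent and environment interfaces are placeholders that stand for the components of steps 2 to 5, not the authors' code.

```python
def train(agent, env, episodes=10000):
    for _ in range(episodes):
        obs = env.reset_and_reconstruct()          # step 2: multi-view TSDF fusion
        logits = agent.model(*obs)                 # step 3: dense grasp prediction
        action = agent.select_action(logits)
        if env.backproject_check(action):          # step 4: closed-loop quality check
            reward = env.execute(action)           # step 5: grasp, lift, verify
        else:
            reward = 0.0
        agent.update(logits, action, reward)       # policy-gradient step
    agent.save_weights("grasp_policy.pt")          # later loaded onto the real UR5
```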
In summary, the advantage of the invention is that the 6-dimensional grasping pose of the robotic arm is predicted directly by raising the dimension of a two-dimensional neural network. At the same time, the invention reconstructs the complex grasping environment in three dimensions with the TSDF function, reducing the environmental noise caused by occlusion and stacking between objects and lowering the depth error caused by interference of a single visual sensor. A reinforcement-learning network is designed for the method, overcoming the cumbersome computation and high time cost of deriving the grasping pose through traditional physics and solving the problem that a 6-DOF grasping pose cannot be applied to target objects not recorded in the data set. This not only guarantees a high grasp success rate of the robotic-arm model but also benefits from the generalization of reinforcement learning: the method can be applied to new grasping objects, avoids the time-consuming computation of traditional methods, and reduces the instability of partial point-cloud models used as input. The invention realizes real-time detection of grasping objects and 6-DOF grasping.
Brief Description of the Drawings
Fig. 1 is the structure diagram of the network model of the invention;
Fig. 2 is the hemispherical three-dimensional grasping pose-angle diagram defined by the invention;
Fig. 3 is the grasping flowchart of the system of the invention;
Fig. 4 is the view-frustum projection diagram;
Fig. 5 is the View Frustum model diagram;
Fig. 6 is an example diagram of experimental grasping.
Detailed Description of the Embodiments
The invention is further described below with reference to the accompanying drawings.
An example of physical grasping with the TSDF-based six-degree-of-freedom visual closed-loop grasping method of the invention is shown in Fig. 6; the specific process is as follows:
Step 1: Calibrate the robotic-arm base coordinate system and the camera coordinate system with Zhang Zhengyou's camera calibration method and the ArUco Markers calibration method:
First, an Intel D415 depth camera is fixed vertically at the end of the robotic arm so that the camera and the end-effector gripper jaws keep a fixed pose transformation, allowing the stereo camera to collect image information of the objects on the grasping workbench. Then Zhang Zhengyou's camera calibration method and the ArUco Markers calibration method calibrate the intrinsic and extrinsic parameters of the camera, and a three-dimensional coordinate system with the robotic-arm base as origin is constructed. The camera intrinsic and extrinsic calibration formulas are as follows:
Here x_w, y_w, z_w are the three-dimensional coordinates in the world coordinate system; x_c, y_c, z_c are the coordinates in the camera coordinate system; u, v are pixel coordinates; z_m is a point in the assumed camera coordinate system; p is the imaging point in the image coordinate system; f is the imaging distance (focal length); d_x, d_y are the physical size of a pixel along the image coordinate axes; R and T are the rotation matrix and the translation matrix. Formula (1) converts the world coordinate system to the camera coordinate system; formula (2) converts the camera coordinate system to the ideal image coordinate system; formula (3) is the camera-to-pixel coordinate-system matrix, i.e. the intrinsic matrix.
Step 2: Reconstruct the three-dimensional scene from the acquired image information with the TSDF function to reduce environmental noise points between objects:
Step 2.1: In the actual experiments the invention uses a single sensor to acquire visual information; the information captured by the stereo camera of the grasping workbench (color information and depth information) is obtained through OpenCV.
Step 2.2: Environmental factors such as lighting and mutual occlusion between objects produce large noise points in the depth image acquired by the sensor, which degrades the grasp prediction of the network model. The invention therefore first samples spherical viewpoints around the grasping workbench before each grasp and reduces the depth error of a single image by sampling local images several times. The TSDF function then fuses the sampling-point poses and the sampled image information into a three-dimensionally reconstructed voxel grid. A single voxel stores not only its intuitive x, y, z coordinates but also two further values (SDF and CAM) that represent the distance from the voxel to the nearest surface. The SDF is the signed distance function used for depth-map fusion: 0 denotes the surface, positive values denote points outside the model, negative values denote points inside the model, and the magnitude is proportional to the distance from the point to the nearest surface. The surface is easily extracted as the zero crossing of this function. If the depth value D(u,v) at pixel (u,v) of the i-th depth image is not 0, D(u,v) is compared with the z component of the voxel camera coordinates V(x,y,z): if D(u,v) is greater than z, the voxel is closer to the camera and is used to reconstruct the outer surface of the scene; if D(u,v) is less than z, the voxel is farther from the camera and reconstructs the inner surface of the scene. In other words, a voxel outside the surface (on the camera side) has a positive SDF value, and a voxel inside the surface has a negative SDF value.
Step 2.3: Define the coordinate system of the voxel grid as the world coordinate system; denote the voxel coordinates by V_j^w, the camera pose by T_i^w, and the camera intrinsic matrix by K.
The voxel grid in the world coordinate system is projected into the camera coordinate system with the transformation matrix obtained by ICP registration, and then converted to the image coordinate system I_ij with the camera intrinsic matrix K, where i∈(0,1,...,n), j∈(0,1,...,m).
After the voxel grid has been projected into the image coordinate system, the initial SDF value of each voxel is computed as in formula (18):

sdf_j = dep(I_ij) - ||V_j^w - T_i^w||   (18)

where V_j^w denotes the position of the j-th voxel in the world coordinate system, T_i^w denotes the position of the i-th camera pose in the world coordinate system, and dep(I_ij) denotes the depth value of the j-th voxel in the depth-image coordinate system of the i-th camera.
Next, the SDF value of each voxel is truncated according to formula (19):

tsdf_j = max(-1, min(1, sdf_j / trunc))   (19)

where trunc is the manually set truncation distance, which can be understood as the error bound of the camera depth information: the larger the error, the larger this value should be set, otherwise much of the information acquired by the depth camera may be discarded. In this project trunc is set to 1.
After the SDF value of each voxel is truncated, the TSDF value of each voxel is recomputed according to formula (20):

TSDF_j(x,y,z) = (ω_{j-1}(x,y,z)·tsdf_{j-1} + ω_j(x,y,z)·tsdf_j) / (ω_{j-1}(x,y,z) + ω_j(x,y,z))   (20)

where ω_j(x,y,z) is the weight of the voxel in the global voxel grid of the current frame, ω_{j-1}(x,y,z) is the weight of the voxel in the voxel grid of the previous frame, tsdf_j is the distance from the voxel to the object surface in the global data cube of the current frame, tsdf_{j-1} is the distance from the voxel to the object surface in the global data cube of the previous frame, V.z denotes the z-axis coordinate of the voxel in the camera coordinate system, and D(u,v) denotes the depth value at pixel (u,v) of the current depth image.
Formula (9) computes the weight value of each voxel:

w_j = w_{j-1} + w_j   (9)

The environment information of the grasping workbench is rendered from a top-down view and used as the input of the network model.
Step 3: Build the reinforcement-learning network model:
Step 3.1: Design the grasping strategy of the robotic arm and define the action space of the reinforcement learning. Target grasping is usually split into a grasping-position region and a three-dimensional rotation, with two network models predicting the grasping region and the rotation separately. The invention instead uses a single two-dimensional convolutional network that regresses the 6-DoF grasping pose directly by raising the network dimension. The grasping action space for a single object is defined as 16*28*28 actions, where 16 is the number of end-effector rotation actions about the z axis (22.5° per step) and 28*28 is the number of end-effector rotation actions about the x and y axes. The defined grasping action of the gripper is shown in Fig. 2, and a minimal sketch of this discretization is given below.
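The following sketch enumerates the 16 x 28 x 28 discrete action space described above; the exact hemisphere parameterization over the x and y rotations is an assumption for illustration.

```python
import itertools

YAW_BINS = 16          # 22.5 degree steps about the z axis
XY_BINS = 28           # rotation bins about the x and y axes (hemisphere poses)

ACTIONS = list(itertools.product(range(YAW_BINS), range(XY_BINS), range(XY_BINS)))
assert len(ACTIONS) == 16 * 28 * 28    # 12544 candidate 6-DoF grasp actions
```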
Step 3.2: Convert each element of the array into the rotation angles of the robotic-arm end about the x, y and z coordinate axes; the conversion formula is as follows:

r_z = best_pix_ind[2] * θ   (12)

where a_x is the rotation angle of the arm end about the x axis, i.e. the roll angle of the end effector; b_y is the rotation angle of the arm end about the y axis, i.e. the pitch angle of the end effector; r_z is the rotation angle of the arm end about the z axis, i.e. the yaw angle of the end effector.
In the actual grasping example, the offset value θ = 180/16 is substituted to compute the predicted grasping pose.
Step 3.3: Design the feature-extraction network model. The model adopts a ResNet fully convolutional feature-extraction architecture and learns to extract unstructured features from partial RGB-D observations. After multilayer convolutional feature extraction of the color image and the depth image, the color features and depth features are concatenated channel-wise with the Concat function, followed by two additional Batch Normalization (BN) layers, nonlinear activation functions (ReLU), 1x1 convolutional layers, and upsampling that keeps the resolution of the output heatmap identical to that of the input image. The pose in the action space corresponding to the coordinate of the maximum value in the output tensor is selected as the predicted grasping pose.
Step 3.4: Perform forward inference of the network with the following formulas:
Formula (13) expresses the expected return in state s under action a, where g_t is the grasping action taken at time t, s_t is the state at time t, and r_t is the reward at time t; formula (14) is the total return function of the network; formula (15) is the state distribution function; formula (16) is the state-action function.
Step 3.5: Design the loss function of the reinforcement-learning network using a cross-entropy formulation, which further improves the detection accuracy of the algorithm:
where τ = s_0 a_0 s_1 a_1 ... s_n a_n ... denotes a Markov process (trajectory).
Since P_r{a|s} = π(s,a), formula (18) can be obtained.
The weight-update function is as follows:
where f_ω: S×A→R is an approximating function of the state-action value; when f_ω attains its minimum and Δω = 0, formula (21) can be derived.
Step 3.6: Set the reinforcement-learning reward: if the grasp succeeds, the agent is given a reward of 1; otherwise the reward is 0.
Step 4: Back-project the predicted end-effector grasping pose into the three-dimensionally reconstructed scene and judge the quality of the predicted grasp. The 6-DOF grasping pose output by the network is back-projected into the three-dimensional voxel scene built before grasping, and the depth image under the predicted pose is rendered through view-frustum projection; the view-frustum projection is shown in Fig. 4. The view frustum is a three-dimensional volume whose position is related to the extrinsic matrix of the camera, and its shape determines how the model is projected from pixel space onto a plane. Perspective projection uses a pyramid as the view volume, with the camera located at its apex. The pyramid is truncated by a front plane and a back plane, forming a frustum called the View Frustum; its model is shown in Fig. 5, and only geometry inside the frustum is visible. Two planes of this clipping space are special and are called the near clip plane and the far clip plane.
By comparing the depth information at the two ends of the gripper in the rendered image, the distance between the parallel jaws and the object is inferred, the grasp quality of the predicted grasping pose is judged, and closed-loop feedback of the 6-DOF grasping system of the robotic arm is formed.
Step 5: Complete the grasping motion of the robotic arm through robot forward and inverse kinematics:
First, when the depth at the two ends of the gripper in the depth image rendered in step 4 is less than or equal to the fingertip depth of the gripper, i.e. the predicted gripper pose straddles the object, the predicted pose is judged to be graspable. The robotic arm is then controlled to execute the grasp plan according to the predicted grasping pose: the six joint angles of the arm in the current state are obtained through robot inverse kinematics. The 6-dimensional grasping pose of the arm end predicted by the reinforcement-learning network in step 3 is then fed into the robot forward kinematics to obtain the end-effector trajectory that moves the arm from the current pose to the predicted pose point. When the end effector reaches the predicted grasping pose, the robot issues a gripper-close signal and attempts the grasp. After the gripper closes, the end effector moves up vertically by 15 cm, a depth image after the lift is acquired from the Intel RealSense D435 camera, and whether the grasp actually succeeded is judged by computing the depth at the two ends of the gripper. If the grasp succeeds, the reinforcement-learning model reward is set to 1; if it fails, the reward is set to 0.
Step 6: Train the reinforcement-learning model so that the robotic arm completes the grasping action:
The trained model weight parameters are loaded onto the real UR5 robotic arm for experimental debugging, and steps 1 to 5 are repeated to complete the six-degree-of-freedom closed-loop grasping task. Over 200 rounds of grasping tests, the TSDF-based six-degree-of-freedom grasping method of the invention achieves a grasp success rate of 85.3% for unstructured objects (10 target objects stacked randomly). When the number of randomly stacked target objects increases to 15, the grasp success rate of the invention is still 81.5%. In summary, the invention designs a reinforcement-learning network that overcomes the cumbersome computation and high time cost of deriving the grasping pose through traditional physics and solves the problem that a 6-DOF grasping pose cannot be applied to target objects not recorded in the data set. This not only guarantees a high grasp success rate of the robotic-arm model but also benefits from the generalization of reinforcement learning: the method can be applied to unrecorded grasping objects, avoids the time-consuming computation of traditional methods, and reduces the instability of partial point-cloud models used as input.
Claims (4)
Priority Applications (1)
- CN202210551360.5A (priority date 2022-05-18, filing date 2022-05-18): A six-degree-of-freedom visual closed-loop grasping method for robotic arm based on TSDF 3D reconstruction (granted as CN114851201B)
Publications (2)
- CN114851201A, published 2022-08-05
- CN114851201B, published 2023-09-05
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant