CN114851201B - A six-degree-of-freedom visual closed-loop grasping method for robotic arm based on TSDF 3D reconstruction

A six-degree-of-freedom visual closed-loop grasping method for robotic arm based on TSDF 3D reconstruction

Info

Publication number
CN114851201B
CN114851201B (application CN202210551360.5A)
Authority
CN
China
Prior art keywords
grasping
camera
voxel
coordinate system
depth
Prior art date
Legal status
Active
Application number
CN202210551360.5A
Other languages
Chinese (zh)
Other versions
CN114851201A (en)
Inventor
欧林林
徐靖
禹鑫燚
周利波
魏岩
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202210551360.5A priority Critical patent/CN114851201B/en
Publication of CN114851201A publication Critical patent/CN114851201A/en
Application granted granted Critical
Publication of CN114851201B publication Critical patent/CN114851201B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00: Programme-controlled manipulators
    • B25J 9/16: Programme controls
    • B25J 9/1679: Programme controls characterised by the tasks executed
    • B25J 9/1692: Calibration of manipulator
    • B25J 9/1694: Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J 9/1697: Vision controlled systems
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 90/00: Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a six-degree-of-freedom visual closed-loop grasping method for a robotic arm based on TSDF three-dimensional reconstruction. The method comprises the following steps. Step 1: calibrate the robotic arm base coordinate system and the camera coordinate system using Zhang Zhengyou's camera calibration method and the ArUco Markers calibration method. Step 2: use the TSDF function to reconstruct the three-dimensional scene from the acquired image information, reducing environmental noise points between objects. Step 3: build a reinforcement learning network model. Step 4: back-project the predicted end-effector grasp pose into the reconstructed three-dimensional scene and judge the quality of the predicted grasp. Step 5: complete the grasping motion of the robotic arm through forward and inverse robot kinematics. Step 6: train the reinforcement learning model so that the robotic arm completes the grasping action. The invention overcomes the shortcomings of the prior art and proposes an easy-to-implement, highly applicable six-degree-of-freedom visual closed-loop grasping system for a robotic arm based on TSDF three-dimensional reconstruction. The system reduces environmental noise caused by occlusion and stacking between objects and lowers the depth error caused by interference with a single visual sensor. In addition, the system achieves fast, real-time target detection and completes grasping actions while maintaining high precision.

Description

A six-degree-of-freedom visual closed-loop grasping method for a robotic arm based on TSDF 3D reconstruction

Technical Field

The invention belongs to the technology of six-degree-of-freedom object grasping by a robotic arm in unstructured environments, and specifically to the field of robot grasping control.

Background Art

Object grasping has long been an important problem in robotics, but a satisfactory solution has remained elusive. The versatile manipulation system of a robotic arm allows flexible control of a high-degree-of-freedom end effector in three-dimensional space, enabling dexterous grasping of objects and a dynamic response to environmental changes. Recently, the rapid development of deep learning and reinforcement learning and the construction of corresponding systems have provided a number of feasible approaches to intelligent grasping with robotic arms.

Although 6-DOF grasp control of a robotic arm has high practical value and suits more complex operating environments, most current data-driven grasping algorithms only perform top-down grasping (4-DOF: x, y, z, yaw) in simple tabletop settings, or rely on physical analysis to obtain a suitable grasp pose. Restricted to three-dimensional translational grasping, these algorithms are severely limited in their application scenarios: the end effector of the robotic arm can only approach an object from the vertically downward direction, and in some cases it cannot grasp the object along this direction. For example, a gripper has difficulty grasping a plate lying flat. In addition, methods based on physical analysis not only require computing a large amount of experimental environment data but also require estimating an accurate object model, which inevitably incurs large time and computational costs, and the physical model of an unstructured target object is often not general. Most such algorithms are therefore difficult to apply directly to grasping novel objects, and the robustness and generalization of the systems are poor.

Hence, the idea of 6-degree-of-freedom (6-DOF) grasping with a robotic arm has been proposed. PointNetGPD, proposed by Hongzhuo Liang, uses a two-step sampling-and-evaluation method, evaluating a large number of samples to determine a reliable grasp pose; however, this approach is quite time-consuming (Hongzhuo Liang et al., "PointNetGPD: Detecting grasp configurations from point sets", in: 2019 International Conference on Robotics and Automation (ICRA), IEEE, 2019, pp. 3629-3635). Florence et al. transfer poses from existing grasp poses, but the success rate of this algorithm is relatively low for target objects and geometries not recorded in the dataset, so it cannot be extended to new application scenarios (Peter Florence, Lucas Manuelli, and Russ Tedrake, "Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation", in: Conference on Robot Learning (CoRL), 2018). More recently, a point cloud scene has been reconstructed from partial views obtained by an RGB-D camera; after extracting object features with a designed PointNet++ network model, 6-DOF grasp pose regression is completed directly and grasp planning is performed on the reconstructed point cloud scene (P. Ni, W. Zhang, X. Zhu, and Q. Cao, "Learning an end-to-end spatial grasp generation and refinement algorithm from simulation", Machine Vision and Applications, vol. 32, no. 1, pp. 1-12, 2021).

Summary of the Invention

The invention overcomes the disadvantages of the prior art and proposes an easy-to-implement, highly applicable six-degree-of-freedom grasping method for a robotic arm. The invention designs a feature extraction network and a Policy Gradient reinforcement learning framework, which can localize objects not recorded in the training dataset and plan six-degree-of-freedom grasps for them.

The invention takes an image sequence as input. First, the robotic arm coordinate system and the camera coordinate system are established through Zhang Zhengyou's camera calibration method and the ArUco Markers calibration method. An Intel RealSense D435 depth camera is then used to acquire color and depth information of the grasping workbench. Three-dimensional scene reconstruction is performed on the acquired image information with the Truncated Signed Distance Function (TSDF) to reduce environmental noise points between objects. A two-dimensional top-down image of the reconstructed scene is rendered and used as the input of the grasping neural network model designed herein; the model encodes the input image, extracts features, and outputs the predicted grasping region of the current environment and the grasping pose of the robotic arm. The structure of the network model is shown in Fig. 1. The model adopts a ResNet fully convolutional feature extraction architecture and learns to extract unstructured features from partial RGB-D observations. The model channel-concatenates the color features and depth features and applies two additional interleaved Batch Normalization (BN) layers, nonlinear activation functions (ReLU), and 1×1 convolutional layers. Average pooling is then used to aggregate the feature embeddings of the same pixel unit, followed by linear upsampling, so that the convolutional features are shared across positions and orientations and the output image has the same resolution as the input. The coordinates of the tile with the maximum grasping probability in the heat map are selected and mapped to the hemispherical three-dimensional grasp pose points defined before the experiment (see Fig. 2), and the corresponding 6-dimensional pose is taken as the predicted 6-DOF grasp pose. Next, the predicted grasp pose output by the model is back-projected into the reconstructed three-dimensional scene; the TSDF function is used to render the depth image under the current pose from the three-dimensional scene, and the grasp quality is judged from the depth information at the two ends of the gripper. Finally, when the depth of the target coordinate point in the rendered image is less than or equal to the depth of the fingertips at the two ends of the gripper (15 cm), the predicted pose is fed into the robot forward kinematics to obtain the corresponding motion trajectory of the robotic arm, and the arm completes the grasping action. If the grasp succeeds, the agent is given a reward of 1; otherwise, the reward is 0. The grasping flow of the system is shown in Fig. 3.

The invention is a six-degree-of-freedom visual closed-loop grasping method for a robotic arm based on TSDF three-dimensional reconstruction; the specific steps are as follows:

Step 1: Calibrate the robotic arm base coordinate system and the camera coordinate system with Zhang Zhengyou's camera calibration method and the ArUco Markers calibration method:

First, an Intel D415 depth camera is fixed vertically at the end of the robotic arm so that there is a fixed pose transformation between the camera and the end-effector gripper jaws, allowing the stereo camera to capture image information of objects on the grasping workbench. Then, the camera intrinsics and extrinsics are calibrated with Zhang Zhengyou's camera calibration method and the ArUco Markers calibration method, and a three-dimensional coordinate system with the robotic arm base as the origin is constructed. The camera intrinsic and extrinsic calibration formulas are as follows:

Here, x_w, y_w, z_w are the three-dimensional coordinates in the world coordinate system; x_c, y_c, z_c are the coordinates in the camera coordinate system; u, v are the coordinates in the pixel coordinate system; (x_m, y_m, z_m) is a point in the assumed camera coordinate system; p is the imaging point in the image coordinate system; f is the imaging distance; d_x, d_y are the coordinates of the pixel point in the image coordinate system; and R, T are the rotation matrix and the translation matrix. Formula (1) transforms from the world coordinate system to the camera coordinate system. Formula (2) transforms from the camera coordinate system to the ideal image coordinate system. Formula (3) is the matrix from the camera coordinate system to the pixel coordinate system, i.e., the intrinsic matrix.
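The original images of formulas (1)-(3) are not reproduced above. For reference only, a standard pinhole-model formulation consistent with these symbol definitions is sketched below; the image-plane coordinates (x_p, y_p), the principal point (u_0, v_0), and reading d_x, d_y as pixel dimensions are assumptions rather than the patent's exact notation.

```latex
% Assumed standard pinhole-model forms; the patent's exact formulas (1)-(3) are not reproduced.
\begin{aligned}
\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix}
  &= R \begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} + T
  && \text{(world to camera, cf. formula (1))} \\[4pt]
x_p = f\,\frac{x_c}{z_c},\qquad y_p &= f\,\frac{y_c}{z_c}
  && \text{(camera to ideal image plane, cf. formula (2))} \\[4pt]
u = \frac{x_p}{d_x} + u_0,\qquad v &= \frac{y_p}{d_y} + v_0
  && \text{(image plane to pixel coordinates, cf. formula (3))}
\end{aligned}
```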

Step 2: Use the TSDF function to perform three-dimensional scene reconstruction on the acquired image information to reduce environmental noise points between objects:

Step 2.1: In the actual experiments, the invention uses a single sensor to acquire visual information; the information captured by the stereo camera of the grasping workbench (color information and depth information) is obtained through OpenCV.

Step 2.2: Environmental factors such as lighting and mutual occlusion between objects cause large noise points in the depth images acquired by the sensor, which degrades the network model's grasp prediction on the images. First, before each grasp the invention samples spherical viewpoints around the grasping workbench and reduces the depth error of any single image by sampling multiple local images. Second, the TSDF function fuses the sampling-point poses and the sampled image information into a three-dimensionally reconstructed voxel grid. A single voxel not only contains intuitive x, y, z coordinates but also carries two further values (SDF and CAM) that represent the distance from the voxel to the nearest surface. SDF is the signed distance function used for fusing depth maps: 0 denotes the surface, positive values denote points outside the model, negative values denote points inside the model, and the magnitude is proportional to the distance from the point to the nearest surface. The surface is easily extracted as the zero crossings of the function.

Step 2.3: Define the coordinate system in which the above voxel grid lies as the world coordinate system; the voxel coordinates can then be expressed in this frame, the camera pose is given by its transformation, and the camera intrinsic matrix is K.

Therefore, using the transformation matrix obtained by ICP registration, the voxel grid in the world coordinate system is projected into the camera coordinate system and then transformed into the image coordinate system I_ij according to the camera intrinsic matrix K, where i ∈ (0, 1, ..., n), j ∈ (0, 1, ..., m).

After projecting the voxel grid into the image coordinate system, the initial SDF value of each voxel must first be computed, as shown in formula (18).

In the formula, the voxel position term denotes the position of the j-th voxel in the world coordinate system, the camera position term denotes the position of the i-th camera pose in the world coordinate system, and dep(I_ij) denotes the depth value of the j-th voxel in the depth image coordinate system of the i-th camera.

The sdf value of each voxel is then truncated according to formula (19).

Here, tsdf_i is the truncated distance value of the voxel, and trunc is the manually set truncation distance, which can be understood as the error margin of the camera depth information: the larger the error, the larger this value should be set, otherwise much of the information acquired by the depth camera may be discarded. In this work, trunc is set to 1.
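The equation images for formulas (18) and (19) are likewise not reproduced. Based on the definitions above, their standard forms would read as follows; the symbols t_i^w (camera position in world coordinates) and p_j^w (voxel position in world coordinates) and the normalization by trunc are assumptions, not the patent's exact notation.

```latex
% Assumed standard forms; the patent's exact equation images are not reproduced.
\begin{aligned}
sdf_{ij} &= dep(I_{ij}) - \bigl\lVert \mathbf{t}_i^{\,w} - \mathbf{p}_j^{\,w} \bigr\rVert
  && \text{(cf. formula (18))} \\[4pt]
tsdf_i &= \max\!\Bigl(-1,\ \min\!\bigl(1,\ \tfrac{sdf_i}{trunc}\bigr)\Bigr)
  && \text{(cf. formula (19))}
\end{aligned}
```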

After truncating the sdf value of each voxel, the tsdf value of each voxel is recomputed according to formula (20):

Here, ω_j(x, y, z) is the weight of the voxel in the global voxel grid of the current frame, ω_{j-1}(x, y, z) is the weight of the voxel in the voxel grid of the previous frame, tsdf_j is the distance from the voxel to the object surface in the global data cube of the current frame, tsdf_{j-1} is the distance from the voxel to the object surface in the global data cube of the previous frame, V.z denotes the z-axis coordinate of the voxel in the camera coordinate system, and D(u, v) denotes the depth value at (u, v) of the depth image of the current frame.

Formula (9) computes the weight value of each voxel.

ω_j(x, y, z) = ω_{j-1}(x, y, z) + w_j(x, y, z)    (9)

The environment information of the grasping workbench is rendered from top to bottom and used as the input of the network model.
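As an illustrative aid only, the following minimal sketch shows how one depth frame can be fused into a TSDF voxel grid in the spirit of formulas (18)-(20) and (9) above (projection, truncation, weighted averaging). It is a simplification under stated assumptions, not the patent's implementation; the function name, per-frame weight of 1, and the array layout are placeholders.

```python
import numpy as np

def fuse_depth_into_tsdf(tsdf, weights, voxel_centers, depth_img, K, T_cam_world,
                         trunc=1.0):
    """Fuse one depth frame into a TSDF volume (illustrative sketch).

    tsdf, weights : (N,) running TSDF value and cumulative weight per voxel
    voxel_centers : (N, 3) voxel centers in world coordinates
    depth_img     : (H, W) depth image of the current frame
    K             : (3, 3) camera intrinsic matrix
    T_cam_world   : (4, 4) world-to-camera transform (e.g. from ICP registration)
    trunc         : truncation distance (error margin of the depth sensor)
    """
    # Project voxel centers into the camera frame, then into pixel coordinates.
    pts_h = np.c_[voxel_centers, np.ones(len(voxel_centers))]       # homogeneous
    cam = (T_cam_world @ pts_h.T).T[:, :3]                           # camera frame
    z = cam[:, 2]
    uv = (K @ cam.T).T
    u = np.round(uv[:, 0] / z).astype(int)
    v = np.round(uv[:, 1] / z).astype(int)

    H, W = depth_img.shape
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    depth = np.where(valid, depth_img[np.clip(v, 0, H - 1), np.clip(u, 0, W - 1)], 0.0)
    valid &= depth > 0

    # Signed distance: positive in front of the surface (voxel closer to the
    # camera than the measured depth), negative behind it; truncate to [-1, 1].
    sdf = depth - z
    tsdf_new = np.clip(sdf / trunc, -1.0, 1.0)

    # Weighted running average, following the cumulative weight update (9).
    w_new = 1.0
    upd = valid & (sdf > -trunc)
    tsdf[upd] = (weights[upd] * tsdf[upd] + w_new * tsdf_new[upd]) / (weights[upd] + w_new)
    weights[upd] += w_new
    return tsdf, weights
```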

Step 3: Build the reinforcement learning network model:

Step 3.1: Design the grasping strategy of the robotic arm and define the action space for reinforcement learning: the invention uses a single two-dimensional convolutional network that directly regresses the 6-DoF grasp pose by raising the network dimension. The grasping actions defined for the gripper are shown in Fig. 2.

Step 3.2: Convert each element of the array into the rotation angle of the robotic arm end about the x, y, and z coordinate axes respectively. The specific conversion formulas are as follows:

a_x = ((best_pix_ind[0] - 14) * 30/28) - π    (10)
b_y = ((best_pix_ind[1] - 14) * 30/28)    (11)
r_z = (best_pix_ind[2] * θ)    (12)

Here, a_x is the rotation angle of the robotic arm end about the x axis, i.e., the roll angle of the end effector; b_y is the rotation angle of the robotic arm end about the y axis, i.e., the pitch angle of the end effector; r_z is the rotation angle of the robotic arm end about the z axis, i.e., the yaw angle of the end effector; and θ is the deviation value set for the experiment.
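A minimal sketch of this index-to-angle conversion is given below, using the concrete constants stated in claim 3 (28×28 bins for the x and y rotations, 16 yaw bins, θ = 180/16). It is purely illustrative; the unit conventions follow the formulas as written in the patent, and the example index is hypothetical.

```python
import math

def pixel_index_to_euler(best_pix_ind, theta=180.0 / 16.0):
    """Map an (i, j, k) action index to end-effector rotations about x, y, z.

    Follows formulas (10)-(12) as written in the claims; best_pix_ind is the
    coordinate of the maximum of the predicted grasp-probability map.
    """
    a_x = ((best_pix_ind[0] - 14) * 30.0 / 28.0) - math.pi   # roll  (formula 10)
    b_y = (best_pix_ind[1] - 14) * 30.0 / 28.0               # pitch (formula 11)
    r_z = best_pix_ind[2] * theta                            # yaw   (formula 12)
    return a_x, b_y, r_z

# Example: the hottest cell of the 28*28*16 action space at index (20, 7, 5).
print(pixel_index_to_euler((20, 7, 5)))
```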

Step 3.3: Design the feature extraction network model: the model adopts a ResNet fully convolutional feature extraction architecture and learns to extract unstructured features from partial RGB-D observations. After multi-layer convolutional feature extraction of the color image and the depth image, the color features and depth features are channel-concatenated with the Concat function; two additional interleaved Batch Normalization (BN) layers, nonlinear activation functions (ReLU), and 1×1 convolutional layers are applied, and upsampling keeps the resolution of the output heat map consistent with the input image. The pose in the action space corresponding to the coordinate of the largest vector value in the tensor map is selected as the predicted grasp pose output.
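As a rough illustration of the described architecture (ResNet trunk features for the color and depth streams, channel concatenation, two interleaved BN/ReLU/1×1-convolution stages, and upsampling back to the input resolution), a PyTorch-style sketch follows. The ResNet variant, layer widths, number of rotation channels, and use of bilinear interpolation instead of the average-pooling/linear-upsampling step are assumptions, not the patent's exact configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class GraspHeatmapNet(nn.Module):
    """Illustrative sketch of the two-stream RGB-D grasp prediction network."""

    def __init__(self, num_rotations=16):
        super().__init__()
        # Two ResNet-18 trunks (assumed variant) used as fully convolutional encoders.
        self.color_trunk = nn.Sequential(*list(models.resnet18(weights=None).children())[:-2])
        self.depth_trunk = nn.Sequential(*list(models.resnet18(weights=None).children())[:-2])
        # Two interleaved BN + ReLU + 1x1 convolution stages after concatenation.
        self.head = nn.Sequential(
            nn.BatchNorm2d(1024),
            nn.ReLU(inplace=True),
            nn.Conv2d(1024, 256, kernel_size=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_rotations, kernel_size=1),
        )

    def forward(self, color, depth):
        # color: (B, 3, H, W); depth: (B, 1, H, W), repeated to 3 channels.
        c = self.color_trunk(color)
        d = self.depth_trunk(depth.repeat(1, 3, 1, 1))
        x = torch.cat([c, d], dim=1)               # channel concatenation
        x = self.head(x)                           # grasp-probability logits
        # Upsample the heat map back to the input resolution.
        return nn.functional.interpolate(x, size=color.shape[-2:],
                                         mode="bilinear", align_corners=False)

# Example: pick the predicted grasp as the argmax over (rotation, row, col).
net = GraspHeatmapNet()
heat = net(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
best_pix_ind = torch.nonzero(heat[0] == heat[0].max())[0]   # (rotation, row, col)
```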

Step 3.4: Forward inference of the network is carried out through the following formulas:

Formula (13) represents the expected return in state s under action a, where g_t denotes the grasping action taken at time t, s_t denotes the state at time t, and r_t denotes the reward at time t; formula (14) is the total return function of the network; formula (15) is the state distribution function; formula (16) is the state-action function.

Step 3.5: Design the loss function of the reinforcement learning network using the cross-entropy function, further improving the detection accuracy of the algorithm:

Here, τ = s_0 a_0 s_1 a_1 ... s_n a_n ... denotes the Markov process.

Since P_r{a|s} = π(s, a), formula (18) can be obtained.

The weight update function is as follows:

Here, f_ω: S×A→R is the approximating function; when f_ω attains its minimum and Δω = 0, formula (21) can be derived.
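Since the equation images for formulas (13)-(21) are not reproduced, the following sketch only illustrates the general idea stated in the text: a REINFORCE-style update in which the cross-entropy of the executed grasp action is weighted by the binary reward. The loss form, optimizer, and tensor shapes are assumptions, not the patent's exact derivation.

```python
import torch
import torch.nn.functional as F

def policy_gradient_step(logits, action_index, reward, optimizer):
    """One REINFORCE-style update (illustrative sketch).

    logits       : (num_actions,) unnormalized scores over the grasp action space
    action_index : int, the action that was executed
    reward       : float, 1.0 for a successful grasp and 0.0 otherwise
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Cross-entropy of the executed action, scaled by the obtained reward
    # (zero-reward episodes therefore contribute no gradient).
    loss = -reward * log_probs[action_index]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with a dummy linear policy over a 16*28*28 action space.
policy = torch.nn.Linear(64, 16 * 28 * 28)
opt = torch.optim.SGD(policy.parameters(), lr=1e-3)
state = torch.randn(64)
policy_gradient_step(policy(state), action_index=1234, reward=1.0, optimizer=opt)
```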

Step 3.6: Set the reinforcement learning reward: if the grasp is successful, the reward assigned to the agent is 1; otherwise, the reward assigned to the agent is 0.

Step 4: Back-project the predicted end-effector grasp pose into the three-dimensional reconstructed scene and judge the quality of the predicted grasp: the 6-DOF grasp pose output by the network is back-projected into the three-dimensional voxel scene built before grasping, and the depth image under the predicted pose is rendered through view-frustum projection; the view-frustum projection is shown in Fig. 4. The viewing frustum is a three-dimensional volume whose position is related to the camera extrinsic matrix, and its shape determines how the model is projected from pixel space onto the plane. Perspective projection uses a pyramid as the viewing volume, with the camera located at the apex of the pyramid. The pyramid is truncated by a front plane and a back plane, forming a frustum called the View Frustum; its model is shown in Fig. 5, and only geometry located inside the frustum is visible. Two planes in this clipping space are special: the near clip plane and the far clip plane.

By comparing the depth information at the two ends of the gripper in the rendered image, the distance between the parallel jaws and the object is inferred, so that the grasp quality of the predicted grasp pose is judged, forming the closed-loop feedback of the 6-DOF grasping system of the robotic arm.
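A minimal sketch of this closed-loop quality check follows: the rendered depth at the two gripper-tip locations in the back-projected view is compared against the fingertip depth (15 cm in the text) to decide whether the predicted pose would straddle the object. The pixel locations of the fingertips, the depth units, and the synthetic example are assumptions.

```python
import numpy as np

FINGERTIP_DEPTH_M = 0.15   # gripper fingertip depth stated in the text (15 cm)

def grasp_pose_is_feasible(rendered_depth, left_tip_px, right_tip_px,
                           fingertip_depth=FINGERTIP_DEPTH_M):
    """Judge predicted grasp quality from a depth image rendered at the pose.

    rendered_depth : (H, W) depth image rendered from the TSDF volume at the
                     predicted 6-DOF pose (view-frustum projection)
    left_tip_px,
    right_tip_px   : (row, col) pixel coordinates of the two gripper tips in
                     that rendered view (assumed known from the gripper model)
    """
    d_left = rendered_depth[left_tip_px]
    d_right = rendered_depth[right_tip_px]
    # If the depth at both tips is no greater than the fingertip depth, the
    # jaws close on either side of the object and the pose is accepted.
    return d_left <= fingertip_depth and d_right <= fingertip_depth

# Example with a synthetic rendered depth image.
depth = np.full((96, 96), 0.30)          # background farther than the fingertips
depth[40:56, 30:66] = 0.10               # object surface in front of the tips
print(grasp_pose_is_feasible(depth, (48, 32), (48, 64)))   # True
```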

Step 5: Complete the grasping motion of the robotic arm through forward and inverse robot kinematics:

First, when the depth at the two ends of the gripper in the depth image rendered in step 4 is less than or equal to the fingertip depth of the gripper, i.e., the predicted gripper pose lies on the two sides of the object, the predicted pose is judged graspable. Then the robotic arm is controlled to execute the grasp plan according to the predicted grasp pose: the six joint angles of the robotic arm in the current state are solved through robot inverse kinematics. The predicted 6-dimensional grasp pose of the robotic arm end output by the reinforcement learning network in step 3 is then fed into the robot forward kinematics to obtain the end-effector trajectory that moves the arm from the current pose to the predicted pose point. When the end effector of the robotic arm has moved to the predicted grasp pose, the robot sends a close-gripper signal and attempts the grasping action. After the gripper closes, the end effector moves up vertically by 15 cm, a depth image after the move is acquired from the Intel RealSense D435 camera, and whether the actual grasp succeeded is judged by computing the depth at the two ends of the gripper. When the grasp succeeds, the reinforcement learning model reward is assigned 1; when the grasp fails, the reinforcement learning network reward is assigned 0.

Step 6: Train the reinforcement learning model so that the robotic arm completes the grasping action:

Steps 2 to 5 above are repeated continuously in the CoppeliaSim Edu simulation environment; the loss function of the reinforcement learning model is reduced through the reinforcement learning reward, and the model weight parameters are updated. Finally, the trained weight parameters are imported into a real UR5 robotic arm for experimental debugging, and steps 1 to 5 are repeated to complete the six-degree-of-freedom closed-loop grasping task of the robotic arm.
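A minimal sketch of such a training loop is given below, tying together the earlier hypothetical helpers; the environment interface (reset/grasp), greedy action selection, and reward-weighted loss are assumptions and stand in for the CoppeliaSim setup described above, not the patent's actual training code.

```python
import torch

def train_grasping_policy(env, policy, episodes=1000, lr=1e-3):
    """Illustrative training loop: repeat perceive -> predict -> grasp -> reward.

    env    : object with reset() -> state tensor and grasp(action_index) -> reward,
             a stand-in for the simulation environment described above
    policy : torch.nn.Module mapping a state tensor to action logits
    """
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(episodes):
        state = env.reset()                      # e.g. TSDF top-down observation
        logits = policy(state)
        action = int(torch.argmax(logits))       # greedy grasp selection
        reward = env.grasp(action)               # 1 on success, 0 on failure
        loss = -reward * torch.log_softmax(logits, dim=-1)[action]
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return policy
```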

In summary, the advantage of the invention is that the 6-dimensional grasp pose of the robotic arm is predicted and output directly by raising the dimension of a two-dimensional neural network. At the same time, the invention performs three-dimensional reconstruction of the complex grasping environment through the TSDF function, reducing the environmental noise caused by occlusion and stacking between objects and lowering the depth error caused by interference with a single visual sensor. A reinforcement learning network is designed for this method, overcoming the cumbersome computation and high time cost of deriving the grasp pose of the robotic arm through traditional physics, and solving the problem that the 6-DOF grasp pose of the robotic arm cannot be applied to target objects not recorded in the dataset. This not only guarantees a high grasp success rate of the robotic arm model but also benefits from the generalization of reinforcement learning, i.e., the method can be applied to new grasp objects, avoiding the time-consuming computation of traditional methods and reducing the instability of partial input point cloud models. The invention realizes real-time detection of grasp objects and 6-DOF grasping.

Brief Description of the Drawings

Fig. 1 is a structural diagram of the network model of the invention;

Fig. 2 is a diagram of the hemispherical three-dimensional grasp pose angles defined by the invention;

Fig. 3 is a flow chart of grasping by the system of the invention;

Fig. 4 is a view-frustum projection diagram;

Fig. 5 is a diagram of the View Frustum model;

Fig. 6 is an example diagram of an experimental grasp.

Detailed Description of the Embodiments

The invention is further described below with reference to the accompanying drawings.

For the six-degree-of-freedom visual closed-loop grasping method of a robotic arm based on TSDF three-dimensional reconstruction of the invention, an example of a physical grasp is shown in Fig. 6; the specific process is as follows:

Step 1: Calibrate the robotic arm base coordinate system and the camera coordinate system with Zhang Zhengyou's camera calibration method and the ArUco Markers calibration method:

First, an Intel D415 depth camera is fixed vertically at the end of the robotic arm so that there is a fixed pose transformation between the camera and the end-effector gripper jaws, allowing the stereo camera to capture image information of objects on the grasping workbench. Then, the camera intrinsics and extrinsics are calibrated with Zhang Zhengyou's camera calibration method and the ArUco Markers calibration method, and a three-dimensional coordinate system with the robotic arm base as the origin is constructed. The camera intrinsic and extrinsic calibration formulas are as follows:

Here, x_w, y_w, z_w are the three-dimensional coordinates in the world coordinate system; x_c, y_c, z_c are the coordinates in the camera coordinate system; u, v are the coordinates in the pixel coordinate system; (x_m, y_m, z_m) is a point in the assumed camera coordinate system; p is the imaging point in the image coordinate system; f is the imaging distance; d_x, d_y are the coordinates of the pixel point in the image coordinate system; and R, T are the rotation matrix and the translation matrix. Formula (1) transforms from the world coordinate system to the camera coordinate system. Formula (2) transforms from the camera coordinate system to the ideal image coordinate system. Formula (3) is the matrix from the camera coordinate system to the pixel coordinate system, i.e., the intrinsic matrix.

Step 2: Use the TSDF function to perform three-dimensional scene reconstruction on the acquired image information to reduce environmental noise points between objects:

Step 2.1: In the actual experiments, the invention uses a single sensor to acquire visual information; the information captured by the stereo camera of the grasping workbench (color information and depth information) is obtained through OpenCV.

Step 2.2: Environmental factors such as lighting and mutual occlusion between objects cause large noise points in the depth images acquired by the sensor, which degrades the network model's grasp prediction on the images. First, before each grasp the invention samples spherical viewpoints around the grasping workbench and reduces the depth error of any single image by sampling multiple local images. Second, the TSDF function fuses the sampling-point poses and the sampled image information into a three-dimensionally reconstructed voxel grid. A single voxel not only contains intuitive x, y, z coordinates but also carries two further values (SDF and CAM) that represent the distance from the voxel to the nearest surface. SDF is the signed distance function used for fusing depth maps: 0 denotes the surface, positive values denote points outside the model, negative values denote points inside the model, and the magnitude is proportional to the distance from the point to the nearest surface. The surface is easily extracted as the zero crossings of the function. If the depth value D(u, v) at location (u, v) of the i-th depth image frame is not 0, D(u, v) is compared with z in the voxel camera coordinates V(x, y, z): if D(u, v) is greater than z, the voxel is closer to the camera and is used to reconstruct the outer surface of the scene; if D(u, v) is less than z, the voxel is farther from the camera and reconstructs the inner surface of the scene. That is, if a voxel point lies outside the surface (on the side closer to the camera), its SDF value is positive; if the voxel lies inside the surface, its SDF value is negative.

Step 2.3: Define the coordinate system in which the above voxel grid lies as the world coordinate system; the voxel coordinates can then be expressed in this frame, the camera pose is given by its transformation, and the camera intrinsic matrix is K.

Therefore, using the transformation matrix obtained by ICP registration, the voxel grid in the world coordinate system is projected into the camera coordinate system and then transformed into the image coordinate system I_ij according to the camera intrinsic matrix K, where i ∈ (0, 1, ..., n), j ∈ (0, 1, ..., m).

After projecting the voxel grid into the image coordinate system, the initial SDF value of each voxel must first be computed, as shown in formula (18).

In the formula, the voxel position term denotes the position of the j-th voxel in the world coordinate system, the camera position term denotes the position of the i-th camera pose in the world coordinate system, and dep(I_ij) denotes the depth value of the j-th voxel in the depth image coordinate system of the i-th camera.

The sdf value of each voxel is then truncated according to formula (19).

Here, trunc is the manually set truncation distance, which can be understood as the error margin of the camera depth information: the larger the error, the larger this value should be set, otherwise much of the information acquired by the depth camera may be discarded. In this work, trunc is set to 1.

After truncating the sdf value of each voxel, the tsdf value of each voxel is recomputed according to formula (20):

Here, ω_j(x, y, z) is the weight of the voxel in the global voxel grid of the current frame, ω_{j-1}(x, y, z) is the weight of the voxel in the voxel grid of the previous frame, tsdf_j is the distance from the voxel to the object surface in the global data cube of the current frame, tsdf_{j-1} is the distance from the voxel to the object surface in the global data cube of the previous frame, V.z denotes the z-axis coordinate of the voxel in the camera coordinate system, and D(u, v) denotes the depth value at (u, v) of the depth image of the current frame.

Formula (9) computes the weight value of each voxel.

ω_j(x, y, z) = ω_{j-1}(x, y, z) + w_j(x, y, z)    (9)

The environment information of the grasping workbench is rendered from top to bottom and used as the input of the network model.

Step 3: Build the reinforcement learning network model:

Step 3.1: Design the grasping strategy of the robotic arm and define the action space for reinforcement learning: target grasping is usually divided into a two-dimensional grasping position region and a three-dimensional rotation, predicted separately by two network models. The invention instead uses a single two-dimensional convolutional network that directly regresses the 6-DoF grasp pose by raising the network dimension. The invention defines the grasping action space of a single object as 16*28*28 actions, where 16 is the number of rotation actions of the end effector about the z coordinate axis (22.5°) and 28*28 is the number of rotation actions of the end effector on the x and y coordinate axes. The grasping actions defined for the gripper are shown in Fig. 2.

Step 3.2: Convert each element of the array into the rotation angle of the robotic arm end about the x, y, and z coordinate axes respectively. The specific conversion formulas are as follows:

a_x = ((best_pix_ind[0] - 14) * 30/28) - π    (10)
b_y = ((best_pix_ind[1] - 14) * 30/28)    (11)
r_z = (best_pix_ind[2] * θ)    (12)

Here, a_x is the rotation angle of the robotic arm end about the x axis, i.e., the roll angle of the end effector; b_y is the rotation angle of the robotic arm end about the y axis, i.e., the pitch angle of the end effector; r_z is the rotation angle of the robotic arm end about the z axis, i.e., the yaw angle of the end effector.

In the actual grasping example, the deviation value θ = 180/16 is substituted to compute the predicted grasp pose.

Step 3.3: Design the feature extraction network model: the model adopts a ResNet fully convolutional feature extraction architecture and learns to extract unstructured features from partial RGB-D observations. After multi-layer convolutional feature extraction of the color image and the depth image, the color features and depth features are channel-concatenated with the Concat function; two additional interleaved Batch Normalization (BN) layers, nonlinear activation functions (ReLU), and 1×1 convolutional layers are applied, and upsampling keeps the resolution of the output heat map consistent with the input image. The pose in the action space corresponding to the coordinate of the largest vector value in the tensor map is selected as the predicted grasp pose output.

Step 3.4: Forward inference of the network is carried out through the following formulas:

Formula (13) represents the expected return in state s under action a, where g_t denotes the grasping action taken at time t, s_t denotes the state at time t, and r_t denotes the reward at time t; formula (14) is the total return function of the network; formula (15) is the state distribution function; formula (16) is the state-action function.

Step 3.5: Design the loss function of the reinforcement learning network using the cross-entropy function, further improving the detection accuracy of the algorithm:

Here, τ = s_0 a_0 s_1 a_1 ... s_n a_n ... denotes the Markov process.

Since P_r{a|s} = π(s, a), formula (18) can be obtained.

The weight update function is as follows:

Here, f_ω: S×A→R is the approximating function; when f_ω attains its minimum and Δω = 0, formula (21) can be derived.

Step 3.6: Set the reinforcement learning reward: if the grasp is successful, the reward assigned to the agent is 1; otherwise, the reward assigned to the agent is 0.

Step 4: Back-project the predicted end-effector grasp pose into the three-dimensional reconstructed scene and judge the quality of the predicted grasp: the 6-DOF grasp pose output by the network is back-projected into the three-dimensional voxel scene built before grasping, and the depth image under the predicted pose is rendered through view-frustum projection; the view-frustum projection is shown in Fig. 4. The viewing frustum is a three-dimensional volume whose position is related to the camera extrinsic matrix, and its shape determines how the model is projected from pixel space onto the plane. Perspective projection uses a pyramid as the viewing volume, with the camera located at the apex of the pyramid. The pyramid is truncated by a front plane and a back plane, forming a frustum called the View Frustum; its model is shown in Fig. 5, and only geometry located inside the frustum is visible. Two planes in this clipping space are special: the near clip plane and the far clip plane.

By comparing the depth information at the two ends of the gripper in the rendered image, the distance between the parallel jaws and the object is inferred, so that the grasp quality of the predicted grasp pose is judged, forming the closed-loop feedback of the 6-DOF grasping system of the robotic arm.

Step 5: Complete the grasping motion of the robotic arm through forward and inverse robot kinematics:

First, when the depth at the two ends of the gripper in the depth image rendered in step 4 is less than or equal to the fingertip depth of the gripper, i.e., the predicted gripper pose lies on the two sides of the object, the predicted pose is judged graspable. Then the robotic arm is controlled to execute the grasp plan according to the predicted grasp pose: the six joint angles of the robotic arm in the current state are solved through robot inverse kinematics. The predicted 6-dimensional grasp pose of the robotic arm end output by the reinforcement learning network in step 3 is then fed into the robot forward kinematics to obtain the end-effector trajectory that moves the arm from the current pose to the predicted pose point. When the end effector of the robotic arm has moved to the predicted grasp pose, the robot sends a close-gripper signal and attempts the grasping action. After the gripper closes, the end effector moves up vertically by 15 cm, a depth image after the move is acquired from the Intel RealSense D435 camera, and whether the actual grasp succeeded is judged by computing the depth at the two ends of the gripper. When the grasp succeeds, the reinforcement learning model reward is assigned 1; when the grasp fails, the reinforcement learning network reward is assigned 0.

Step 6: Train the reinforcement learning model so that the robotic arm completes the grasping action:

The trained model weight parameters are imported into a real UR5 robotic arm for experimental debugging, and steps 1 to 5 are repeated to complete the six-degree-of-freedom closed-loop grasping task of the robotic arm. After 200 rounds of grasping tests, the TSDF-based six-degree-of-freedom grasping method of the invention achieves a grasp success rate of 85.3% for unstructured objects (10 target objects randomly stacked). When the number of randomly stacked target objects increases to 15, the grasp success rate of the invention is still 81.5%. In summary, the invention designs a reinforcement learning network, overcoming the cumbersome computation and high time cost of deriving the grasp pose of the robotic arm through traditional physics, and solving the problem that the 6-DOF grasp pose of the robotic arm cannot be applied to target objects not recorded in the dataset. This not only guarantees a high grasp success rate of the robotic arm model but also benefits from the generalization of reinforcement learning: the method of the invention can be applied to unrecorded grasp objects, avoiding the time-consuming computation of traditional methods and reducing the instability of partial input point cloud models.

Claims (4)

1.一种基于TSDF三维重建的机械臂六自由度视觉闭环抓取方法,其特征在于:包括如下步骤:1. A six-DOF vision closed-loop grasping method based on TSDF three-dimensional reconstruction, is characterized in that: comprise the steps: 步骤1:通过张正友相机标定法和ArUco Markers标定法标定机械臂底座坐标系和相机坐标系;Step 1: Calibrate the coordinate system of the manipulator base and the camera coordinate system through the Zhang Zhengyou camera calibration method and the ArUco Markers calibration method; 步骤2:利用TSDF函数对获取的图像信息进行三维场景重建,以减少物体与物体之间的环境噪声点;具体步骤如下:Step 2: Use the TSDF function to perform 3D scene reconstruction on the acquired image information to reduce the environmental noise points between objects; the specific steps are as follows: 2.1):采用单一传感器获取视觉信息,通过OpenCV获取双目相机对抓取操作台捕获的信息,所捕获的信息包括彩色信息和深度信息;2.1): A single sensor is used to obtain visual information, and the information captured by the binocular camera on the grabbing console is obtained through OpenCV. The captured information includes color information and depth information; 2.2):由于光线、物体之间相互遮挡的环境因素导致传感器获取的深度图像中存在较大噪声点,从而影响网络模型对图像的抓取预测;首先,在每次抓取前对抓取操作台进行球型点采样,通过多次采样局部图像降低单一图像中深度误差;其次,利用TSDF函数将采样点姿态和采样图像信息融合成三维重建体素网格;对于单独一个体素而言,不仅包含直观的x,y,z坐标,还用SDF和CAM两个值来表示体素到最表面的距离,其中SDF为符号距离函数用于深度图的融合,它用0表示表面,正值表示模型外部点,负值表示模型内部点,数值正比于点到最近表面的距离;表面容易提取为函数的零交叉点;如果第i帧深度图像D(u,v)处深度值D(u,v)不为0,则将D(u,v)与体素相机坐标V(x,y,z)中的z相比较,如果D(u,v)大于z,说明此体素距离相机更近,用于重建场景外表面;如果D(u,v)小于z,说明此体素距离相机更远,为重建场景内表面;体素点位于表面之外更靠近相机一侧,则SDF值为正值;体素在表面之内,则SDF值为负值;2.2): Due to the environmental factors of mutual occlusion between light and objects, there are large noise points in the depth image acquired by the sensor, which affects the image capture prediction of the network model; first, the capture operation is performed before each capture Spherical point sampling is carried out on the platform, and the depth error in a single image is reduced by sampling local images multiple times; secondly, the TSDF function is used to fuse the sampling point pose and sampled image information into a 3D reconstruction voxel grid; for a single voxel, It not only contains the intuitive x, y, z coordinates, but also uses SDF and CAM two values to represent the distance from the voxel to the most surface, where SDF is a symbolic distance function used for the fusion of depth maps. 
It uses 0 to represent the surface and a positive value Indicates the external point of the model, the negative value indicates the internal point of the model, and the value is proportional to the distance from the point to the nearest surface; the surface is easily extracted as the zero-crossing point of the function; if the depth value D(u , v) is not 0, then compare D(u, v) with z in the voxel camera coordinates V(x, y, z), if D(u, v) is greater than z, it means that this voxel is far from the camera Closer, used to reconstruct the outer surface of the scene; if D(u, v) is less than z, it means that this voxel is farther away from the camera, which is to reconstruct the inner surface of the scene; the voxel point is located outside the surface and closer to the camera side, then SDF The value is positive; if the voxel is within the surface, the SDF value is negative; 2.3):定义上述体素栅格所在的坐标系为世界坐标系,体素坐标表示为相机的位姿为/>相机的内参矩阵为K;2.3): Define the coordinate system where the above voxel grid is located as the world coordinate system, and the voxel coordinates are expressed as The pose of the camera is /> The internal parameter matrix of the camera is K; 所以根据ICP配准得到的变换矩阵将世界坐标系下的体素栅格投影到相机坐标系,再根据相机内参矩阵K转换到图像坐标系Iij,其中i∈(0,1,...,n),j∈(0,1,...,m);Therefore, according to the transformation matrix obtained by ICP registration, the voxel grid in the world coordinate system Project to the camera coordinate system, and then convert to the image coordinate system I ij according to the camera internal reference matrix K, where i∈(0,1,...,n), j∈(0,1,...,m); 将体素栅格投影至图像坐标系后,首先需要计算每个体素初始化SDF值,如公式(6)所示After projecting the voxel grid to the image coordinate system, it is first necessary to calculate the initial SDF value of each voxel, as shown in formula (6) 式中表示第j个体素在世界坐标系下的位置信息,/>表示第i个相机位姿在世界坐标系下的位置信息,dep(Iij)表示第j个体素在第i个相机深度图像坐标系下的深度值;In the formula Indicates the position information of the jth voxel in the world coordinate system, /> Indicates the position information of the i-th camera pose in the world coordinate system, dep(I ij ) indicates the depth value of the j-th voxel in the i-th camera depth image coordinate system; 根据公式(7)截断每个体素的sdf值;The sdf value of each voxel is truncated according to formula (7); 式中tsdfi为体素截断距离值,trunc表示人为设定的截断距离,可理解为相机深度信息的误差值,如果误差越大,则trunc设置大一些,否则可能造成深度相机获取的很多信息丢掉,将trunc设定为1;In the formula, tsdf i is the voxel truncated distance value, and trunc represents the artificially set truncated distance, which can be understood as the error value of the camera depth information. 
If the error is larger, the trunc setting should be larger, otherwise it may cause a lot of information obtained by the depth camera Lost, set trunc to 1; 将每个体素的sdf值截断后,根据公式(8)重新计算每个体素的tsdf值:After truncating the sdf value of each voxel, recalculate the tsdf value of each voxel according to formula (8): 其中ωj(x,y,z)为当前帧全局体素栅格中体素的权重,ωj-1(x,y,z)为上一帧体素栅格中体素的权重,tsdfj为当前帧全局数据立方体中体素到物体表面的距离,tsdfj-1为上一帧全局数据立方体中体素到物体表面的距离,V.z表示体素在相机坐标系下的z轴坐标,D(u,v)表示当前帧深度图像(u,v)处的深度值;where ω j (x, y, z) is the weight of voxels in the global voxel grid of the current frame, ω j-1 (x, y, z) is the weight of voxels in the previous frame of voxel grid, tsdf j is the distance from the voxel in the global data cube in the current frame to the object surface, tsdf j-1 is the distance from the voxel in the global data cube in the previous frame to the object surface, V .z represents the z-axis of the voxel in the camera coordinate system Coordinates, D(u, v) represent the depth value at the depth image (u, v) of the current frame; 公式(9)计算每个体素的权重值;Formula (9) calculates the weight value of each voxel; wj=wj-1+wj (9)w j =w j-1 +w j (9) 从上至下渲染出抓取操作台的环境信息并以此作为网络模型的输入;Render the environment information of the grab console from top to bottom and use it as the input of the network model; 步骤3:建立强化学习网络模型;Step 3: Establish a reinforcement learning network model; 步骤4:将预测的末端执行器抓取姿态反向投影至三维重建场景中,判断本次预测抓取质量;具体步骤如下:Step 4: Back-project the predicted grasping posture of the end effector into the 3D reconstruction scene to judge the quality of the predicted grasping; the specific steps are as follows: 将预测的末端执行器抓取姿态反向投影至三维重建场景中,判断本次预测抓取质量:将网络输出的6自由度抓取姿态反向投影至抓取前建立的三维体素场景中,通过视角锥体投影截取预测姿态下深度图像;通过比较渲染图像中夹持器两端深度信息,推理平行夹爪与物体间的距离位置,从而判断预测抓取姿态的抓取质量,形成机械臂6自由度抓取系统闭环反馈;Back-project the predicted grasping posture of the end effector into the 3D reconstruction scene to judge the predicted grasping quality: back-project the 6-DOF grasping posture output by the network into the 3D voxel scene created before grasping , the depth image under the predicted attitude is intercepted through the perspective cone projection; by comparing the depth information at both ends of the gripper in the rendered image, the distance between the parallel jaws and the object is deduced, so as to judge the grasping quality of the predicted grasping attitude and form a mechanical Closed-loop feedback of grasping system with 6 degrees of freedom of the arm; 步骤5:通过机器人正逆运动学完成机械臂抓取移动;具体步骤如下:Step 5: Complete the grasping movement of the robotic arm through the forward and reverse kinematics of the robot; the specific steps are as follows: 当渲染出的深度图像中夹持器两端深度小于等于夹持器指尖深度,即夹持器预测姿态位于物体两侧,判断本次预测姿态可以抓取;然后,控制机械臂根据预测姿态执行抓取计划:通过机器人逆运动学求解出当前状态下机械臂的6个关节角度数;然后将所述步骤3中强化学习网络模型输出预测的机械臂末端6维抓取姿态输入机器人正运动学,求得机械臂从当前姿态变换至预测姿态点的末端执行器移动轨迹;当机械臂末端执行器运动至预测姿态后,机器人发出关闭夹持器信号,尝试进行抓取动作;夹持器关闭后,末端执行器垂直上移15cm,从双目相机获取末端执行器上移后深度图像,通过计算夹持器两端的深度判断实际抓取是否成功;当抓取成功时,强化学习模型奖励赋值为1;抓取失败时,强化学习网络奖励赋值为0;When the depth of both ends of the gripper in the rendered depth image is less than or equal to the depth of the fingertip of the gripper, that is, the predicted posture of the gripper is located on both sides of the object, it is judged that the predicted posture can be grasped; then, control the manipulator according to the predicted posture Execute the grasping plan: solve the 6 joint angles of the manipulator in the current state through the inverse kinematics of the robot; then input the 6-dimensional grasping posture at the end of the manipulator predicted by the reinforcement learning network model in step 3 into the forward 
The environment information of the grasping worktable is rendered from a top-down view and used as the input of the network model.

Step 3: Establish the reinforcement learning network model.

Step 4: Back-project the predicted end-effector grasp posture into the 3D reconstructed scene to judge the quality of the predicted grasp. The specific steps are as follows: the 6-DoF grasp posture output by the network is back-projected into the 3D voxel scene built before grasping, and the depth image under the predicted posture is captured by view-frustum projection; by comparing the depth information at the two ends of the gripper in the rendered image, the distance between the parallel jaws and the object is inferred, so that the grasp quality of the predicted grasp posture is judged and closed-loop feedback of the 6-DoF robotic-arm grasping system is formed.

Step 5: Complete the grasping motion of the robotic arm through robot forward and inverse kinematics. The specific steps are as follows: when the depth at the two ends of the gripper in the rendered depth image is less than or equal to the depth of the gripper fingertips, i.e. the predicted gripper posture lies on both sides of the object, the predicted posture is judged to be graspable; the manipulator is then controlled to execute the grasp plan according to the predicted posture: the 6 joint angles of the manipulator in the current state are solved by robot inverse kinematics; the 6-dimensional end-of-arm grasp posture predicted by the reinforcement learning network model in step 3 is then fed into robot forward kinematics to obtain the end-effector trajectory from the current posture to the predicted posture; when the end effector reaches the predicted posture, the robot sends a close-gripper signal and attempts the grasp; after the gripper closes, the end effector moves up vertically by 15 cm, the depth image after the end effector has moved up is acquired from the binocular camera, and whether the actual grasp succeeded is judged by computing the depth at the two ends of the gripper; when the grasp succeeds, the reinforcement learning model reward is assigned 1; when the grasp fails, the reinforcement learning network reward is assigned 0.

Step 6: Train the reinforcement learning model so that the robotic arm completes the grasping action.

2. The six-degree-of-freedom visual closed-loop grasping method for a robotic arm based on TSDF 3D reconstruction according to claim 1, characterized in that the specific steps of step 1 are as follows:

First, the binocular camera is fixed vertically at the end of the robotic arm; the binocular camera is an Intel D415 depth camera, so that there is a fixed posture transformation between the camera and the end effector and the binocular camera can acquire image information of the objects on the grasping worktable, the end effector being a gripper. Then, Zhang Zhengyou's camera calibration method and the ArUco Markers calibration method are used to calibrate the camera intrinsic and extrinsic parameters, and a three-dimensional coordinate system with the manipulator base as the origin is constructed. The calibration relations for the camera intrinsic and extrinsic parameters are given by formulas (1) to (3), where x_w, y_w, z_w are the three-dimensional coordinates in the world coordinate system; x_c, y_c, z_c are the coordinates in the camera coordinate system; u, v are the pixel coordinates; (x_m, y_m, z_m) is a point in the assumed camera coordinate system; (x_p, y_p, z_p) is the imaging point in the image coordinate system; f is the imaging distance; d_x, d_y are the dimensions of a pixel in the image coordinate system; R and T are the rotation matrix and translation matrix; formula (1) maps the world coordinate system to the camera coordinate system, formula (2) maps the camera coordinate system to the ideal image coordinate system, and formula (3) maps the camera coordinate system to the pixel coordinate system, i.e. it is the intrinsic parameter matrix.
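The chain of transformations behind calibration formulas (1) to (3), from the world frame to the camera frame via the extrinsics R and T, and then from the camera frame to pixel coordinates via the intrinsic matrix K, can be sketched as follows; the numeric intrinsic values and the example point are placeholders, not calibration results from the patent.

import numpy as np

def world_to_pixel(p_world, R, T, K):
    """p_world: (3,) point in the world frame.  R: 3x3 rotation, T: (3,) translation,
    K: 3x3 intrinsic matrix of the form [[f/dx, 0, u0], [0, f/dy, v0], [0, 0, 1]]."""
    p_cam = R @ p_world + T                   # formula (1): world frame -> camera frame
    uvw = K @ p_cam                           # formulas (2)-(3): camera frame -> pixel plane
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]   # normalise by depth to get pixel coordinates
    return u, v, p_cam[2]

# Example with placeholder calibration values (not the patent's calibration results)
K = np.array([[615.0, 0.0, 320.0],
              [0.0, 615.0, 240.0],
              [0.0, 0.0, 1.0]])
R, T = np.eye(3), np.array([0.0, 0.0, 0.5])
print(world_to_pixel(np.array([0.10, -0.05, 0.40]), R, T, K))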
3. The six-degree-of-freedom visual closed-loop grasping method for a robotic arm based on TSDF 3D reconstruction according to claim 1, characterized in that the specific steps of step 3 are as follows:

3.1): Design the grasping strategy of the manipulator and define the action space of the reinforcement learning: a two-dimensional convolutional network is designed that directly regresses the 6-DoF grasp posture by raising the network dimension, in contrast to methods that split target grasping into a grasp position region and a three-dimensional rotation and predict the two with separate network models; the grasping action space of a single object is defined as 16*28*28 actions, where 16 is the number of end-effector rotation actions around the z coordinate axis and 28*28 is the number of end-effector rotation actions around the x and y coordinate axes.

3.2): Each element of the array is converted into the angles by which the end of the robotic arm rotates around the x, y and z coordinate axes respectively; the conversion formulas are as follows (a code sketch of this decoding is given after 3.3)):

a_x = ((best_pix_ind[0] - 14) * 30 / 28) - pi (10)

b_y = ((best_pix_ind[1] - 14) * 30 / 28) (11)

r_z = (best_pix_ind[2] * 180 / 16) (12)

where a_x is the rotation angle of the manipulator end around the x axis, i.e. the roll angle of the end effector; b_y is the rotation angle around the y axis, i.e. the pitch angle of the end effector; r_z is the rotation angle around the z axis, i.e. the yaw angle of the end effector; and θ is the deviation value set in the experiment.

3.3): Design the feature extraction network model: the model adopts a ResNet fully convolutional feature extraction architecture and learns to extract unstructured features from RGB-D partial observations; after the multi-layer feature-extraction convolutional layers have extracted the features of the color image and the depth image, the color features and depth features are concatenated along the channel dimension by a Concat operation, two additional BatchNormalization layers, nonlinear activation functions and 1×1 convolutional layers are interleaved, and upsampling keeps the resolution of the output heat map consistent with the input image; the posture in the action space corresponding to the coordinate of the largest vector value in the tensor map is selected as the predicted grasp posture output.
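Below is a small sketch of the action decoding in 3.2): the index of the best cell in the 16*28*28 output tensor is mapped to the roll, pitch and yaw of the end effector using formulas (10) to (12) exactly as printed; how the "- pi" offset in formula (10) combines with the degree-scaled terms is left as in the text, and the function name is illustrative.

import math

def decode_action(best_pix_ind):
    """best_pix_ind: (i_x, i_y, i_z) index of the largest value in the 28 x 28 x 16 action grid."""
    a_x = ((best_pix_ind[0] - 14) * 30 / 28) - math.pi   # roll,  formula (10) as printed
    b_y = (best_pix_ind[1] - 14) * 30 / 28               # pitch, formula (11)
    r_z = best_pix_ind[2] * 180 / 16                     # yaw,   formula (12)
    return a_x, b_y, r_z

print(decode_action((14, 20, 8)))   # example index -> (roll, pitch, yaw) of the end effector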
3.4): Forward inference is performed on the network by formulas (13) to (16), where formula (13) represents the expected return under state s and action a, g_t is the grasping action taken at time t, s_t is the state at time t, and r_t is the reward at time t; formula (14) represents the total return function of the network; formula (15) is the state distribution function; and formula (16) represents the state-action function.

3.5): Design the reinforcement learning network loss function, adopting a cross-entropy loss function so as to further improve the detection accuracy of the algorithm, where τ = s_0 a_0 s_1 a_1 ... s_n a_n ... denotes a Markov process; since P_r{a|s} = π(s, a), formula (18) is obtained. A weight update function is then constructed, where f_ω: S×A→R is an approximation of the state-action value function; when f_ω attains its minimum and Δω = 0, formula (20) can be derived.

3.6): Set the reinforcement learning reward: if the grasp is successful, the reward assigned to the agent is 1; otherwise the reward assigned to the agent is 0.

4. The six-degree-of-freedom visual closed-loop grasping method for a robotic arm based on TSDF 3D reconstruction according to claim 1, characterized in that the specific steps of step 6 are as follows:

Steps 2 to 5 above are repeated continuously in the CoppeliaSim Edu simulation environment, the cross-entropy loss function in the reinforcement learning network model is reduced through the reinforcement learning reward, and the model weight parameters are updated; finally, the trained weight parameters are imported into the real UR5 robotic arm, experimental debugging is carried out, and steps 1 to 5 are repeated to complete the six-degree-of-freedom closed-loop grasping task of the robotic arm.
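The closed-loop training cycle of step 6 and claim 4 can be outlined roughly as below: rebuild the TSDF scene, let the network predict a grasp, execute it, and use the binary reward to drive a cross-entropy-style update. The network interface, the simulator hooks and the reward-scaled loss are assumptions made for illustration and are not details specified in the patent.

import torch
import torch.nn.functional as F

def training_episode(model, optimizer, render_heightmaps, execute_grasp):
    """render_heightmaps: returns the top-down color/depth render of the TSDF scene (step 2).
    execute_grasp: back-projects, moves the arm, lifts 15 cm and reports success (steps 4-5)."""
    rgb, depth = render_heightmaps()
    logits = model(rgb, depth)                      # assumed to return a (16*28*28,) score vector
    action = int(torch.argmax(logits))              # index of the predicted 6-DoF grasp
    success = execute_grasp(action)
    reward = 1.0 if success else 0.0                # binary reward of step 5 and 3.6)

    # Cross-entropy on the executed action, scaled by the reward so that only
    # successful grasps reinforce the prediction (an illustrative simplification)
    target = torch.tensor([action])
    loss = F.cross_entropy(logits.unsqueeze(0), target) * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return success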
CN202210551360.5A 2022-05-18 2022-05-18 A six-degree-of-freedom visual closed-loop grasping method for robotic arm based on TSDF 3D reconstruction Active CN114851201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210551360.5A CN114851201B (en) 2022-05-18 2022-05-18 A six-degree-of-freedom visual closed-loop grasping method for robotic arm based on TSDF 3D reconstruction

Publications (2)

Publication Number Publication Date
CN114851201A CN114851201A (en) 2022-08-05
CN114851201B true CN114851201B (en) 2023-09-05

Family

ID=82638419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210551360.5A Active CN114851201B (en) 2022-05-18 2022-05-18 A six-degree-of-freedom visual closed-loop grasping method for robotic arm based on TSDF 3D reconstruction

Country Status (1)

Country Link
CN (1) CN114851201B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115890744B (en) * 2022-12-15 2024-07-26 武汉理工大学 TD 3-based mechanical arm 6-DOF object manipulation training method and system
CN115984388B (en) * 2023-02-28 2023-06-06 江西省智能产业技术创新研究院 Spatial positioning precision evaluation method, system, storage medium and computer
CN116330284A (en) * 2023-03-24 2023-06-27 杭州电子科技大学 A method and system for autonomous control of a robotic arm suitable for grasping occluded objects
CN116524217B (en) * 2023-07-03 2023-08-25 北京七维视觉传媒科技有限公司 Human body posture image matching method and device, electronic equipment and storage medium
CN117549307B (en) * 2023-12-15 2024-04-16 安徽大学 Robot vision grabbing method and system in unstructured environment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9205562B1 (en) * 2014-08-29 2015-12-08 Google Inc. Integration of depth points into a height map
CN106548466A (en) * 2015-09-16 2017-03-29 富士通株式会社 The method and apparatus of three-dimensional reconstruction object
CN110310362A (en) * 2019-06-24 2019-10-08 中国科学院自动化研究所 Method and system for 3D reconstruction of high dynamic scene based on depth map and IMU
CN112476434A (en) * 2020-11-24 2021-03-12 新拓三维技术(深圳)有限公司 Visual 3D pick-and-place method and system based on cooperative robot
CN112801988A (en) * 2021-02-02 2021-05-14 上海交通大学 Object grabbing pose detection method based on RGBD and deep neural network
CN113192128A (en) * 2021-05-21 2021-07-30 华中科技大学 Mechanical arm grabbing planning method and system combined with self-supervision learning
CN113752255A (en) * 2021-08-24 2021-12-07 浙江工业大学 A real-time grasping method of robotic arm with six degrees of freedom based on deep reinforcement learning


Also Published As

Publication number Publication date
CN114851201A (en) 2022-08-05

Similar Documents

Publication Publication Date Title
CN114851201B (en) A six-degree-of-freedom visual closed-loop grasping method for robotic arm based on TSDF 3D reconstruction
CN110480634B (en) An arm-guided motion control method for robotic arm motion control
CN114912287B (en) Robot autonomous grabbing simulation system and method based on target 6D pose estimation
CN111085997A (en) Capturing training method and system based on point cloud acquisition and processing
CN110480637B (en) An Image Recognition and Grabbing Method of Robot Arm Parts Based on Kinect Sensor
CN110211180A (en) A kind of autonomous grasping means of mechanical arm based on deep learning
CN110298886B (en) Dexterous hand grabbing planning method based on four-stage convolutional neural network
JP2019508273A (en) Deep-layer machine learning method and apparatus for grasping a robot
CN110796700B (en) Multi-object grabbing area positioning method based on convolutional neural network
CN113752255B (en) Mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning
JP7051751B2 (en) Learning device, learning method, learning model, detection device and gripping system
CN115578460A (en) Robot Grasping Method and System Based on Multimodal Feature Extraction and Dense Prediction
Kushwaha et al. Generating quality grasp rectangle using Pix2Pix GAN for intelligent robot grasping
CN114700949B (en) Mechanical arm smart grabbing planning method based on voxel grabbing network
CN115194774A (en) Binocular vision-based control method for double-mechanical-arm gripping system
Wang et al. Six-dimensional target pose estimation for robot autonomous manipulation: Methodology and verification
Du et al. A multi-object grasping detection based on the improvement of YOLOv3 algorithm
Zhu et al. A robotic semantic grasping method for pick-and-place tasks
Ren et al. Autonomous manipulation learning for similar deformable objects via only one demonstration
Wang et al. Robotic grasp pose detection method based on multiscale features
CN116616812A (en) NeRF positioning-based ultrasonic autonomous navigation method
Liang et al. Visual reconstruction and localization-based robust robotic 6-DoF grasping in the wild
Wanyan et al. Scene prediction and manipulator grasp pose estimation based on YOLO-GraspNet
CN117103272B (en) Horizontal pushing and multi-degree-of-freedom grasping control methods for multi-form obstacle sorting manipulators
DU et al. ROBOT MANIPULATOR USING A VISION-BASED HUMAN--MANIPULATOR INTERFACE.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant