CN112894815B - Optimal pose detection method for object grasping by visual servo robotic arm - Google Patents

Optimal pose detection method for object grasping by visual servo robotic arm

Info

Publication number
CN112894815B
CN112894815B (granted publication) · Application CN202110097875.8A · Publication CN112894815A
Authority
CN
China
Prior art keywords
points
coordinate system
image
pixel
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110097875.8A
Other languages
Chinese (zh)
Other versions
CN112894815A (en)
Inventor
田军委
闫明涛
张震
苏宇
赵鹏
徐浩铭
杨寒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongguan Ruidong Intelligent Technology Co ltd
Original Assignee
Xian Technological University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Technological University
Priority to CN202110097875.8A
Publication of CN112894815A
Application granted
Publication of CN112894815B


Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1694 - Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 - Vision controlled systems
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1612 - Programme controls characterised by the hand, wrist, grip control
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1656 - Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1661 - Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1656 - Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 - Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Orthopedic Medicine & Surgery (AREA)
  • Image Analysis (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a method for detecting the optimal grasping pose for a visual servo robotic arm, which mainly comprises the following steps: step 1, read the photo; step 2, extract SURF feature points; step 3, generate the feature vectors of the image from the SURF feature points; step 4, establish an initial set of matching pairs (still containing outliers); step 5, enforce affine consistency and remove the outliers that do not satisfy the transformation; step 6, obtain the polygonal frame of the target object; step 7, solve the optimal pose in the two-dimensional coordinate system; and step 8, servo-control the robotic arm to grasp the object. The method combines monocular vision with a servo robotic arm and derives the optimal pose angle of the target object, i.e. the angle through which the servo gripper must rotate, from the geometric slope relation between two points. It is simple to implement, occupies few resources, computes quickly, and has broad application prospects.

Description

Optimal pose detection method for object grasping by a visual servo robotic arm

Technical Field

The invention relates to the technical field of visual processing, and in particular to a method for detecting the optimal pose for object grasping by a visual servo robotic arm.

Background

Picking up dropped mobile phones from narrow places such as sewers and gaps is a widespread problem. Grasping such objects with a robotic arm is an effective solution, but a method for locating the object and commanding the arm's motion, matched to the arm structure, has been lacking for this class of problem. Before a robot can grasp a target object automatically, it must first determine the pose of the object, i.e. a suitable grasping position and grasping attitude, and machine vision is the most common approach to target pose detection. Typical machine-vision pose extraction methods fall into two categories: monocular vision and binocular vision. Monocular vision is easy to operate and processes image data quickly, but it lacks depth information and therefore struggles with depth measurement. Binocular vision matches the image points of the same spatial point on two cameras, so it can measure depth and can determine the position and attitude of simply shaped objects in simple environments; its drawback is that the computation involves a large amount of image matching and target recognition, so it only suits targets with simple geometric relationships and cannot be applied against complex backgrounds. Picking up a phone in a narrow space, where the object shape is simple, is exactly the case suited to monocular vision.

It is therefore necessary to devise a monocular-vision-based method for detecting the optimal grasping pose for a visual servo robotic arm that computes quickly and occupies few resources.

Summary of the Invention

Purpose of the invention: to overcome the deficiencies of the prior art, the present invention provides a monocular-vision-based method for detecting the optimal grasping pose for a visual servo robotic arm, with fast computation and low resource occupation.

Technical solution: to achieve the above purpose, the method of the present invention for detecting the optimal grasping pose of a visual servo robotic arm comprises the following steps.

Step 1: read the photo.

Step 2: extract SURF feature points.

Step 3: generate the feature vectors of the image from the SURF feature points.

Step 4: establish an initial set of matching pairs (still containing outliers).

Step 5: enforce affine consistency and remove the outliers that do not satisfy the transformation.

Step 6: obtain the polygonal frame of the target object.

Step 7: solve the optimal pose in the two-dimensional coordinate system.

Step 8: servo-control the robotic arm to grasp the object; an illustrative sketch of the whole pipeline follows.
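
For concreteness, the eight steps map directly onto a standard feature-matching pipeline. The sketch below is an illustration under stated assumptions, not the patented implementation: it assumes an opencv-contrib-python build with the non-free xfeatures2d module (SURF is patent-encumbered), and the file names, Hessian threshold and ratio-test constant are placeholders.

```python
import cv2
import numpy as np

template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)  # step 1: reference view of the object
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)        # step 1: photo from the wrist camera

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)     # steps 2-3: keypoints + 64-dim descriptors
kp1, des1 = surf.detectAndCompute(template, None)
kp2, des2 = surf.detectAndCompute(scene, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)                         # step 4: raw matches, outliers included
pairs = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in (p for p in pairs if len(p) == 2)
        if m.distance < 0.7 * n.distance]                    # Lowe-style ratio test

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
# step 5: fit an affine model with RANSAC; matches that violate it are the outliers
M, inlier_mask = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)

h, w = template.shape                                        # step 6: project the template outline
corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
box = cv2.transform(corners, M)                              # polygonal frame of the target in the scene
```

Steps 7 and 8 then operate on the corners of this polygonal frame, as detailed in step 7 below.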

Further, in step 1, a camera imaging model is used for photographing and camera calibration. The camera imaging model consists of the world coordinate system Ow-XwYwZw, the camera coordinate system Oc-XcYcZc, the pixel coordinate system Op-uv and the image coordinate system Oi-XiYi, where P is a point in the camera coordinate system with coordinates (Xc, Yc, Zc), and P' is its imaging point in the image, with coordinates (x, y) in the image coordinate system and (u, v) in the pixel coordinate system. Camera calibration determines the relationship between the camera coordinate system, the image coordinate system, the pixel coordinate system and the real coordinate system.

Further, step 2 comprises the following sub-steps.

Sub-step 1: build the integral image. The sum of the pixels over any region can then be computed with simple additions and subtractions; the sum of the gray values of all pixels inside the rectangle ABCD is given by formula (1):

Σ = A - B - C + D        (1)
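
A minimal numpy sketch of this rectangle-sum trick (the zero-padding convention and names are ours):

```python
import numpy as np

def integral_image(img):
    # ii[y, x] holds the sum of img[:y, :x]; the extra zero row/column
    # makes the four-corner lookup below valid at the image border
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y0, x0, y1, x1):
    # gray-value sum over img[y0:y1, x0:x1] from four lookups,
    # the same A - B - C + D combination as formula (1)
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```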

Sub-step 2: build the Hessian matrix. For a pixel (x, y), the Hessian matrix of the function f(x, y) is defined by formula (2):

$$H(f(x,y))=\begin{bmatrix}\frac{\partial^{2}f}{\partial x^{2}} & \frac{\partial^{2}f}{\partial x\,\partial y}\\ \frac{\partial^{2}f}{\partial x\,\partial y} & \frac{\partial^{2}f}{\partial y^{2}}\end{bmatrix}\tag{2}$$

It can be seen that the H matrix is composed of the second-order partial derivatives of f, and an H matrix can be solved for every pixel. Here Lxx(x, σ) is the convolution of the image f with the second-order Gaussian derivative at scale σ, defined by formula (3):

$$L_{xx}(x,\sigma)=f(x)*\frac{\partial^{2}}{\partial x^{2}}g(\sigma)\tag{3}$$

Lxy(x, σ) and Lyy(x, σ) are defined similarly.
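
As an illustrative sketch, the Hessian-determinant response at one scale can be computed with true Gaussian derivatives; SURF itself approximates them with box filters over the integral image and weights the mixed term by 0.9 to compensate for that approximation (the weight is from the original SURF paper, and scale normalisation is omitted here):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_det_response(img, sigma):
    # L_xx, L_yy, L_xy: the image convolved with second-order Gaussian
    # derivatives at scale sigma; order=(rows, cols) gives the
    # derivative order along each axis
    f = img.astype(np.float64)
    Lxx = gaussian_filter(f, sigma, order=(0, 2))
    Lyy = gaussian_filter(f, sigma, order=(2, 0))
    Lxy = gaussian_filter(f, sigma, order=(1, 1))
    return Lxx * Lyy - (0.9 * Lxy) ** 2   # det(H_approx) per pixel
```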

Sub-step 3: build the scale space. The detection and extraction of SURF extreme feature points is based on scale-space theory: the response at different scales is formed by changing the scale of the Gaussian filter rather than resizing the image itself.

Sub-step 4: extract the feature points. Non-maximum suppression removes weak candidates: each pixel of the Hessian-determinant image is compared with the 26 points of its 3-dimensional (scale-space) neighbourhood, a threshold is set, and after sub-pixel feature points are obtained by 3-dimensional linear interpolation, points whose response is below the threshold are discarded, leaving more stable points.
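
A direct, unoptimised sketch of that 26-neighbour comparison over a stack of Hessian-determinant images (the sub-pixel interpolation refinement is omitted):

```python
import numpy as np

def scale_space_maxima(responses, threshold):
    # responses: det(H) images over scales, shape (n_scales, height, width);
    # a candidate survives if it clears the threshold and is the maximum
    # of its 3x3x3 neighbourhood, i.e. beats its 26 neighbours
    pts = []
    s, h, w = responses.shape
    for k in range(1, s - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = responses[k, y, x]
                if v >= threshold and v >= responses[k-1:k+2, y-1:y+2, x-1:x+2].max():
                    pts.append((k, y, x))
    return pts
```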

Sub-step 5: select the orientation of each feature point. Centred on the feature point, at the scale s determined when the interest point was detected, the Haar wavelet responses of all points inside a sector of a given radius are accumulated, with larger weights given to pixels closer to the feature point. The sum of all Haar wavelet responses in the sector forms a vector, the direction of that sector; all sectors are traversed and the direction of the longest vector is taken as the orientation of the feature point.
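
A sketch of the sliding-sector orientation vote, assuming the Haar responses dx, dy and their polar angles have already been sampled and Gaussian-weighted around the keypoint (a 60° sector stepped by 0.1 rad, as in common SURF implementations):

```python
import numpy as np

def dominant_orientation(dx, dy, angles, sector=np.pi / 3):
    # slide a 60-degree sector around the circle; the sector whose summed
    # response vector is longest gives the keypoint orientation
    best_len, best_dir = -1.0, 0.0
    for start in np.arange(0.0, 2 * np.pi, 0.1):
        inside = ((angles - start) % (2 * np.pi)) < sector
        vx, vy = dx[inside].sum(), dy[inside].sum()
        length = vx * vx + vy * vy
        if length > best_len:
            best_len, best_dir = length, np.arctan2(vy, vx)
    return best_dir
```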

Sub-step 6: construct the SURF feature descriptor. Centred on the feature point, a square window with side length 20s is constructed, where s is the scale of the feature point. The window is divided into 16 sub-regions (4×4); for each sub-region, the sums of the horizontal and vertical Haar wavelet responses of its 25 sample points are ΣHx and ΣHy, and the 4-dimensional descriptor of the sub-region is V = [ΣHx, ΣHy, Σ|Hx|, Σ|Hy|]. Concatenating all sub-regions yields a 64-dimensional feature descriptor.
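
A sketch of assembling the 64-dimensional vector from a 20×20 grid of oriented Haar responses, 5×5 samples per sub-region (the per-sample Gaussian weighting is omitted):

```python
import numpy as np

def surf_descriptor(hx, hy):
    # hx, hy: 20x20 arrays of Haar responses on the 20s x 20s window,
    # already rotated into the keypoint orientation; each 5x5 block is
    # one of the 16 sub-regions and contributes a 4-vector
    desc = []
    for i in range(0, 20, 5):
        for j in range(0, 20, 5):
            bx, by = hx[i:i+5, j:j+5], hy[i:i+5, j:j+5]
            desc += [bx.sum(), by.sum(), np.abs(bx).sum(), np.abs(by).sum()]
    v = np.asarray(desc)                 # 16 sub-regions x 4 values = 64 dims
    return v / np.linalg.norm(v)         # normalised for contrast invariance
```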

Further, step 7 proceeds as follows.

The position of the image in the pixel coordinate system Op-uv is converted into the physical image coordinates Oi-XiYi, so that geometric relationships can be computed in the physical coordinate system of the image; the conversion is the matrix formula (4):

$$\begin{bmatrix}u\\v\\1\end{bmatrix}=\begin{bmatrix}\frac{1}{dx} & 0 & u_{0}\\ 0 & \frac{1}{dy} & v_{0}\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}x\\y\\1\end{bmatrix}\tag{4}$$

Here (u, v) are the pixel coordinates in the horizontal and vertical directions, Oi(u0, v0) is the pixel at the centre of the image coordinate system, and dx and dy are the actual physical size of each pixel along the x and y axes; u0, v0, dx and dy are known from camera calibration. Within the matched rectangular frame CDEF of the object found by the SURF invariant feature point algorithm, image processing yields the pixel coordinates of the object's centroid O4(us, vs) and of CDEF, and from the geometric relationships the pixel coordinates of the grasping points A(u1, v1) and B(u2, v2) of the object are then obtained (see the matched rectangular frame in figure 6).
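
Inverting formula (4) gives the physical image coordinates of any pixel; a small helper (the parameter names are ours) makes the conversion used in formulas (5) and (6) explicit:

```python
def pixel_to_image_plane(u, v, u0, v0, dx, dy):
    # inverse of formula (4): (u0, v0) is the principal point in pixels,
    # dx and dy are the physical pixel sizes from camera calibration
    return (u - u0) * dx, (v - v0) * dy
```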

Substituting the pixel coordinates of the grasping points A(u1, v1) and B(u2, v2) into formula (4) gives the physical coordinates of A and B, as in formula (5):

$$\begin{cases}x_{1}=(u_{1}-u_{0})\,dx\\ y_{1}=(v_{1}-v_{0})\,dy\end{cases}\tag{5}$$

and formula (6):

$$\begin{cases}x_{2}=(u_{2}-u_{0})\,dx\\ y_{2}=(v_{2}-v_{0})\,dy\end{cases}\tag{6}$$

From the physical image coordinates of A(x1, y1) and B(x2, y2) the included angle α, i.e. the optimal pose angle, can be solved. In theory the rotation angle commanded to the gripper should also be α, but the systematic error of the arm makes the actual rotation smaller or larger than α, so the commanded rotation angle is denoted β. Taking clockwise rotation of the gripper relative to the object as positive and counterclockwise as negative, the slope of the line through the two points follows from their geometric relationship in the coordinate system, giving formula (7):

$$\tan\alpha=k=\frac{y_{2}-y_{1}}{x_{2}-x_{1}}\tag{7}$$

From formulas (5), (6) and (7), formula (8) follows:

$$\alpha=\arctan\frac{(v_{2}-v_{1})\,dy}{(u_{2}-u_{1})\,dx}\tag{8}$$

The angle α of formula (8) lies in the range [0°, 90°]; if positive the gripper rotates clockwise, if negative it rotates counterclockwise. α is both the pose angle of the object in the two-dimensional plane and the gripping angle through which the gripper rotates, with gripping points A and B. As formula (8) shows, dx and dy are intrinsic camera parameters, so α depends only on the pixel differences between the gripping points A and B; this simplifies the otherwise complex image computation and thereby improves the visual servoing performance.
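
Formula (8) reduces the pose solution to a few arithmetic operations per grasp; a sketch with made-up pixel coordinates and pixel pitches standing in for calibrated values:

```python
import math

def grasp_angle_deg(uA, vA, uB, vB, dx, dy):
    # formula (8): pose angle of the line AB in the image plane;
    # per the sign convention above, positive means the gripper
    # rotates clockwise, negative counterclockwise
    num = (vB - vA) * dy
    den = (uB - uA) * dx
    if den == 0:
        return 90.0 if num > 0 else -90.0   # AB vertical in the image
    return math.degrees(math.atan(num / den))

alpha = grasp_angle_deg(310, 180, 410, 260, dx=0.0048, dy=0.0048)  # illustrative values only
```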

Beneficial effects: the method of the present invention for detecting the optimal grasping pose of a visual servo robotic arm combines monocular vision with a servo arm, and through camera calibration, image acquisition and image preprocessing it matches SURF invariant feature points to the target object and completes target detection. From the camera imaging principle, the position transformation from the pixel coordinate system to the image coordinate system is computed, and the geometric slope relation between two points then yields the optimal pose angle of the target object, i.e. the angle through which the servo gripper must rotate. The algorithm is simple and easy to implement, effectively avoids the image-matching resource consumption of binocular pose measurement, and computes quickly. Experiments verify the effectiveness and accuracy of the algorithm: it completes optimal pose detection of the object and determination of the gripper's grasping points, meeting the grasping requirements of a visual servo robotic arm. The measurement algorithm can also be applied to fields such as industrial part positioning and palletizing robot handling, and has broad application prospects.

Description of the Drawings

Figure 1 is an overall schematic of the robotic-arm object-grasping system;

Figure 2 is a schematic of the camera imaging model;

Figure 3 is a schematic of the calibration workflow;

Figure 4 is a schematic of fast integral-image computation;

Figure 5 is a schematic of the pose solution;

Figure 6 is a schematic of the matched object's rectangular frame;

Figure 7 is an overview of the MATLAB toolbox calibration experiment;

Figure 8 is a schematic of the pose detection experiment against a single background;

Figure 9 is a schematic of the pose detection experiment against a complex background;

Figure 10 is a physical demonstration of the pose solution and object grasping.

Detailed Description

The present invention is further described below with reference to the accompanying drawings.

There are generally two arrangements of the camera relative to the robotic arm. With the camera mounted on the arm (eye-in-hand), manipulability is high and higher-quality image data can be acquired. With the camera outside the arm (eye-to-hand), the camera has a wide field of view and high flexibility, but the arm is easily affected by the camera's position. Here the eye-in-hand arrangement is adopted; after calibration of the camera against the arm's end effector, a fixed transformation is obtained. O1, O2, O3 and O4 denote the origins of the arm coordinate system, the gripper coordinate system, the camera coordinate system and the object coordinate system, respectively. The principle of the scheme is shown in figure 1.
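
Because the camera-to-gripper transform is fixed after hand-eye calibration, poses chain between the frames O1 to O4 by plain matrix composition; a sketch with identity placeholders standing in for the actual calibrated 4×4 matrices:

```python
import numpy as np

T_base_grip = np.eye(4)   # gripper pose in the arm base frame O1 (from forward kinematics)
T_grip_cam = np.eye(4)    # camera in the gripper frame O2 (fixed hand-eye calibration result)
T_cam_obj = np.eye(4)     # object pose in the camera frame O3 (from the vision pipeline)

# object pose in the arm base frame, ready for motion planning
T_base_obj = T_base_grip @ T_grip_cam @ T_cam_obj
```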

The method for detecting the optimal grasping pose of a visual servo robotic arm comprises the following steps.

Step 1: read the photo.

Step 2: extract SURF feature points.

Step 3: generate the feature vectors of the image from the SURF feature points.

Step 4: establish an initial set of matching pairs (still containing outliers).

Step 5: enforce affine consistency and remove the outliers that do not satisfy the transformation.

Step 6: obtain the polygonal frame of the target object.

Step 7: solve the optimal pose in the two-dimensional coordinate system.

Step 8: servo-control the robotic arm to grasp the object.

In step 1, a camera imaging model is used for photographing and camera calibration. The camera imaging model consists of the world coordinate system Ow-XwYwZw, the camera coordinate system Oc-XcYcZc, the pixel coordinate system Op-uv and the image coordinate system Oi-XiYi, where P is a point in the camera coordinate system with coordinates (Xc, Yc, Zc), and P' is its imaging point in the image, with coordinates (x, y) in the image coordinate system and (u, v) in the pixel coordinate system; the imaging model is shown in figure 2. Camera calibration determines the relationship between the camera coordinate system, the image coordinate system, the pixel coordinate system and the real coordinate system. Specifically, a mathematical model is created from the camera's imaging principle, and the model parameters of the camera are derived from the pixel coordinates of known features and the corresponding world coordinates; the calibration workflow for the camera's intrinsic and extrinsic parameters is shown in figure 3.

Step 2 comprises the following sub-steps.

Sub-step 1: build the integral image. The sum of the pixels over any region can then be computed with simple additions and subtractions; as shown in figure 4, the sum of the gray values of all pixels inside the rectangle ABCD is given by formula (1):

Σ = A - B - C + D        (1)

Sub-step 2: build the Hessian matrix. For a pixel (x, y), the Hessian matrix of the function f(x, y) is defined by formula (2):

$$H(f(x,y))=\begin{bmatrix}\frac{\partial^{2}f}{\partial x^{2}} & \frac{\partial^{2}f}{\partial x\,\partial y}\\ \frac{\partial^{2}f}{\partial x\,\partial y} & \frac{\partial^{2}f}{\partial y^{2}}\end{bmatrix}\tag{2}$$

It can be seen that the H matrix is composed of the second-order partial derivatives of f, and an H matrix can be solved for every pixel. Here Lxx(x, σ) is the convolution of the image f with the second-order Gaussian derivative at scale σ, defined by formula (3):

$$L_{xx}(x,\sigma)=f(x)*\frac{\partial^{2}}{\partial x^{2}}g(\sigma)\tag{3}$$

Lxy(x, σ) and Lyy(x, σ) are defined similarly.

Sub-step 3: build the scale space. The detection and extraction of SURF extreme feature points is based on scale-space theory: the response at different scales is formed by changing the scale of the Gaussian filter rather than resizing the image itself.

Sub-step 4: extract the feature points. Non-maximum suppression removes weak candidates: each pixel of the Hessian-determinant image is compared with the 26 points of its 3-dimensional neighbourhood, a threshold is set, and after sub-pixel feature points are obtained by 3-dimensional linear interpolation, points whose response is below the threshold are discarded, leaving more stable points.

Sub-step 5: select the orientation of each feature point. Centred on the feature point, at the scale s determined when the interest point was detected, the Haar wavelet responses of all points inside a sector of a given radius are accumulated, with larger weights given to pixels closer to the feature point. The sum of all Haar wavelet responses in the sector forms a vector, the direction of that sector; all sectors are traversed and the direction of the longest vector is taken as the orientation of the feature point.

Sub-step 6: construct the SURF feature descriptor. Centred on the feature point, a square window with side length 20s is constructed, where s is the scale of the feature point. The window is divided into 16 sub-regions (4×4); for each sub-region, the sums of the horizontal and vertical Haar wavelet responses of its 25 sample points are ΣHx and ΣHy, and the 4-dimensional descriptor of the sub-region is V = [ΣHx, ΣHy, Σ|Hx|, Σ|Hy|]. Concatenating all sub-regions yields a 64-dimensional feature descriptor.

In step 7:

Optimal pose detection of the object comprises target detection by SURF invariant feature point matching and the pose solution itself. After the camera on the arm has been calibrated and its intrinsic and extrinsic parameter matrices solved, the optimal pose of the object image detected by SURF matching is detected and solved in the two-dimensional coordinate system; the solution process is shown in figures 5(a) and 5(b).

The process is as follows. In the image coordinate system Oi-XiYi, let the centroid of the object be O4(sx, sy), let e be the calibrated offset between the gripper centre O2(zx, zy) at the end of the arm and the optical-axis centre Oi of the camera, and let PQ be the calibrated finger pose of the gripper; the projection relationship with the camera image is shown in figure 5(a), where Op-uv is the pixel coordinate system.

The servo arm moves from state (a) to state (b) so that the gripper centre O2 coincides with the object centroid O4, as shown in figure 5(b). The angle between the y axis of the object pose and the Xi axis of the image coordinate system Oi-XiYi is α; with the object stationary, the gripper is driven to rotate by α so that its fingers PQ coincide in projection with the y axis of the object pose. After the gripper rotates by α, the new finger pose is P1Q1 and the grasping points are A(x1, y1) and B(x2, y2). The rotation angle α and the new finger pose P1Q1 provide the accuracy for the servo arm to grasp the object.
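
The new finger pose P1Q1 is the old pose PQ rotated by α about the object centroid O4; a sketch with illustrative coordinates:

```python
import numpy as np

def rotate_about(point, center, alpha_deg):
    # 2-D rotation of a finger contact point about the centroid O4
    a = np.radians(alpha_deg)
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    return R @ (np.asarray(point, float) - center) + center

O4 = np.array([320.0, 240.0])                  # centroid, illustrative values
P1 = rotate_about([320.0, 200.0], O4, 30.0)    # finger P after rotating by alpha = 30 degrees
Q1 = rotate_about([320.0, 280.0], O4, 30.0)    # finger Q
```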

The position of the image in the pixel coordinate system Op-uv is converted into the physical image coordinates Oi-XiYi, so that geometric relationships can be computed in the physical coordinate system of the image; the conversion is the matrix formula (4):

$$\begin{bmatrix}u\\v\\1\end{bmatrix}=\begin{bmatrix}\frac{1}{dx} & 0 & u_{0}\\ 0 & \frac{1}{dy} & v_{0}\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}x\\y\\1\end{bmatrix}\tag{4}$$

Here (u, v) are the pixel coordinates in the horizontal and vertical directions, Oi(u0, v0) is the pixel at the centre of the image coordinate system, and dx and dy are the actual physical size of each pixel along the x and y axes; u0, v0, dx and dy are known from camera calibration. Within the matched rectangular frame CDEF of the object found by the SURF invariant feature point algorithm, image processing yields the pixel coordinates of the object's centroid O4(us, vs) and of CDEF, and from the geometric relationships the pixel coordinates of the grasping points A(u1, v1) and B(u2, v2) of the object are then obtained (see the matched rectangular frame in figure 6).

Substituting the pixel coordinates of the grasping points A(u1, v1) and B(u2, v2) into formula (4) gives the physical coordinates of A and B, as in formula (5):

$$\begin{cases}x_{1}=(u_{1}-u_{0})\,dx\\ y_{1}=(v_{1}-v_{0})\,dy\end{cases}\tag{5}$$

and formula (6):

$$\begin{cases}x_{2}=(u_{2}-u_{0})\,dx\\ y_{2}=(v_{2}-v_{0})\,dy\end{cases}\tag{6}$$

From the physical image coordinates of A(x1, y1) and B(x2, y2) the included angle α, i.e. the optimal pose angle, can be solved. In theory the rotation angle commanded to the gripper should also be α, but the systematic error of the arm makes the actual rotation smaller or larger than α, so the commanded rotation angle is denoted β. Taking clockwise rotation of the gripper relative to the object as positive and counterclockwise as negative, the slope of the line through the two points follows from their geometric relationship in the coordinate system, giving formula (7):

$$\tan\alpha=k=\frac{y_{2}-y_{1}}{x_{2}-x_{1}}\tag{7}$$

From formulas (5), (6) and (7), formula (8) follows:

$$\alpha=\arctan\frac{(v_{2}-v_{1})\,dy}{(u_{2}-u_{1})\,dx}\tag{8}$$

The angle α of formula (8) lies in the range [0°, 90°]; if positive the gripper rotates clockwise, if negative it rotates counterclockwise. α is both the pose angle of the object in the two-dimensional plane and the gripping angle through which the gripper rotates, with gripping points A and B. As formula (8) shows, dx and dy are intrinsic camera parameters, so α depends only on the pixel differences between the gripping points A and B; this simplifies the otherwise complex image computation and thereby improves the visual servoing performance.

In terms of its design principle and solution results, the algorithm improves considerably on traditional pose detection algorithms in computation. For example, the classic PnP problem, first posed by Fischler et al. in 1981, solves the pose of the target relative to the camera from the image coordinates of target points and the camera imaging model; solving the planar PnP problem requires nonlinear solvers which, although accurate, generally require iteration, are computationally heavy and have stability problems. Later typical algorithms include the POSIT algorithm proposed by Denis Oberkampf, the planar-target pose estimation algorithm (SP) proposed by Gerald Schweighofer, the EPnP algorithm proposed by Lepetit, and the orthogonal iterative algorithm proposed by Lu. All of these solve the pose nonlinearly and therefore consume resources during image solving, whereas the proposed algorithm avoids complex iteration well and reduces resource consumption.

The camera calibration and target detection experiments used MATLAB 2018b with the MATLAB hardware support package usbwebcams on a 64-bit Windows 10 system. The pose solution and servo gripper grasping experiments used the right arm of a Rethink dual-arm robot, running the 64-bit Linux system Ubuntu 16.04, for grasp verification.

Before the right arm of the Rethink dual-arm robot performs the pose detection and object grasping experiments, four things must be completed: first, calibration of the intrinsic and extrinsic parameters of the camera on the right arm; second, position correction between the gripper centre and the camera centre; third, calibration of the robot's UI display window to 640×480 pixels; fourth, calibration of the initial position of the end of the right arm so that it searches by moving along the horizontal plane.

Camera calibration experiment

The camera on the arm has a maximum resolution of 1280×800 pixels; the calibration board is a 10×7 checkerboard with 28 mm × 28 mm squares. Twenty images of 480×640 pixels were collected and imported into the MATLAB toolbox for camera calibration.

In the MATLAB toolbox calibration experiment, the calibration results must be evaluated; this can be done through the camera's reprojection error analysis and through camera-centred and board-centred visualisation of the 3D extrinsic parameters. The calibration process is shown in figure 7.

After the MATLAB toolbox calibration experiment, entering cameraParams.IntrinsicMatrix at the command line gives the camera's intrinsic parameter matrix, and entering RadialDistortion and TangentialDistortion gives the radial and tangential distortion coefficients of the camera. The physical parameters of the camera are listed in Table 1.
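
For readers without MATLAB, the sketch below outlines an equivalent OpenCV calibration, assuming the same 10×7-square board with 28 mm squares (inner-corner grid 9×6) and the images stored under calib/:

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)                       # inner corners of a 10x7-square checkerboard
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * 28.0  # square size in mm

obj_pts, img_pts, size = [], [], None
for path in glob.glob("calib/*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)
        size = gray.shape[::-1]        # (width, height)

# K holds fx, fy, u0, v0; dist holds the radial and tangential coefficients,
# the counterparts of cameraParams.IntrinsicMatrix, RadialDistortion and
# TangentialDistortion in the MATLAB toolbox
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
```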

SURF feature point target detection and pose solution experiment

To verify the effectiveness of the proposed algorithm for grasping by the servo arm, pose detection experiments were completed, comprising a target detection experiment with SURF invariant feature points and a pose solution experiment. To make the experiments more convincing, two groups of comparative pose detection experiments were completed, against a single background and against a complex background (without occlusion), to observe whether a complex background affects the algorithm's performance. Five groups of experiments against a single background and five against a complex background were analysed, compared and evaluated; the analysis of the results is given in Table 2. The experimental procedures for pose detection against single and complex backgrounds are shown in figures 8 and 9.

Table 1. Camera calibration results


The experiments show that SURF extracts relatively more feature points against a single background, but crossing matches occur more easily; against a complex background fewer feature points are extracted, but fewer crossings appear as well. The two are comparable in matching accuracy and matching time; overall the matching effect against a single background is slightly better than against a complex background, but the results as a whole show that the algorithm can still detect the object against a complex background. Against either background, the matching speed and quality of the SURF invariant feature point experiments are comparatively accurate and stable, and the object can be matched at any deflection angle in the o-xy plane, i.e. any pose of the object in the two-dimensional plane; with the image algorithm and the proposed pose solution, the two theoretical finger gripping points A and B can be solved and the pose angle α obtained.

The accuracy of the solved pose angle α was verified through object grasping experiments, which took the gripper as the reference object and used six clockwise and six counterclockwise object states of 640×480 pixels, i.e. poses in the two-dimensional coordinate system against a complex background. The camera calibration experiment and the SURF feature point matching experiment give the intrinsic parameters u0, v0, dx and dy and the two gripping points A and B of the object; substituting into formula (8) solves the optimal pose angle α, i.e. the angle β through which the gripper at the end of the arm should rotate. A certain error exists between the solved pose angle α and the actual gripper rotation angle β; to verify whether this error affects grasping, the error analysis of pose angle versus gripper rotation angle is listed in Table 3.

Table 2. Analysis of pose detection results against single and complex backgrounds


In Table 3 there is a certain error between the pose angle α and the rotation angle β output to the gripper. Analysis shows the main error source to be the rotation precision of the servo in the gripper at the end of the dual-arm robot's arm, which is only accurate to 0.1 of a unit; this is an unavoidable absolute error of the system. Experimental grasping showed that the relative error originates from lighting, the surface texture of the object, and similar factors. During the grasping experiments the error was found to be less than 1-2°, which does not affect grasping. To avoid collisions of the arm during grasping, grasping was first simulated on a PC, and the compiled code was then downloaded to the dual-arm robot system. The twelve groups of grasping experiments selected verify the effectiveness of the proposed algorithm well and provide accuracy for visual grasping. The pose solution and object grasping experiments are shown in figure 10.

The optimal pose detection algorithm measures the object with a monocular vision system, combining monocular vision with a servo robotic arm; through camera calibration, image acquisition and image preprocessing it matches SURF invariant feature points to the target object and completes target detection. From the camera imaging principle, the position transformation from the pixel coordinate system to the image coordinate system is computed, and the geometric slope relation between two points then yields the optimal pose angle of the target object, i.e. the angle through which the servo gripper must rotate. The algorithm is simple and easy to implement and effectively avoids the image-matching resource consumption of binocular pose measurement. Experiments verify its effectiveness and accuracy: it completes optimal pose detection of the object and determination of the gripper's grasping points, meeting the grasping requirements of a visual servo robotic arm. The measurement algorithm can also be applied to fields such as industrial part positioning and palletizing robot handling, and has broad application prospects.

Table 3. Error analysis of pose angle versus gripper rotation angle


The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (1)

1. A method for detecting the optimal pose for object grasping by a visual servo robotic arm, characterised by comprising the following steps:

Step 1: read the photo.

Step 2: extract SURF feature points.

Step 3: generate the feature vectors of the image from the SURF feature points.

Step 4: establish an initial set of matching pairs containing outliers.

Step 5: enforce affine consistency and remove the outliers that do not satisfy the transformation.

Step 6: obtain the polygonal frame of the target object.

Step 7: solve the optimal pose in the two-dimensional coordinate system.

Step 8: servo-control the robotic arm to grasp the object.

In step 1, a camera imaging model is used for photographing and camera calibration; the camera imaging model consists of the world coordinate system Ow-XwYwZw, the camera coordinate system Oc-XcYcZc, the pixel coordinate system Op-uv and the image coordinate system Oi-XiYi, where P is a point in the camera coordinate system with coordinates (Xc, Yc, Zc) and P' is its imaging point in the image, with coordinates (x, y) in the image coordinate system and (u, v) in the pixel coordinate system; camera calibration determines the relationship between the camera coordinate system, the image coordinate system, the pixel coordinate system and the real coordinate system.

Step 2 comprises the following sub-steps.

Sub-step 1: build the integral image; the sum of the pixels over any region is computed with simple additions and subtractions, and the sum of the gray values of all pixels inside the rectangle ABCD is

Σ = A - B - C + D

Sub-step 2: build the Hessian matrix; for a pixel (x, y), the Hessian matrix of the function f(x, y) is defined as:
$$H(f(x,y))=\begin{bmatrix}\frac{\partial^{2}f}{\partial x^{2}} & \frac{\partial^{2}f}{\partial x\,\partial y}\\ \frac{\partial^{2}f}{\partial x\,\partial y} & \frac{\partial^{2}f}{\partial y^{2}}\end{bmatrix}\tag{2}$$
It can be seen that the H matrix is composed of the second-order partial derivatives of f, and an H matrix can be solved for every pixel; here Lxx(x, σ) is the convolution of the function f with the second-order Gaussian derivative at scale σ, defined as:
$$L_{xx}(x,\sigma)=f(x)*\frac{\partial^{2}}{\partial x^{2}}g(\sigma)\tag{3}$$
Lxy(x, σ) and Lyy(x, σ) are defined similarly.

Sub-step 3: build the scale space; the detection and extraction of SURF extreme feature points is based on scale-space theory: the response at different scales is formed by changing the scale of the Gaussian filter rather than resizing the image itself.

Sub-step 4: extract the feature points; non-maximum suppression removes weak candidates: each pixel of the Hessian-determinant image is compared with the 26 points of its 3-dimensional neighbourhood, a threshold is set, and after sub-pixel feature points are obtained by 3-dimensional linear interpolation, points whose response is below the threshold are discarded, leaving more stable points.

Sub-step 5: select the orientation of each feature point; centred on the feature point, at the scale s determined when the interest point was detected, the Haar wavelet responses of all points inside a sector are accumulated, with larger weights given to pixels closer to the feature point; the sum of all Haar wavelet responses in the sector forms a vector, the direction of that sector; all sectors are traversed and the direction of the longest vector is taken as the orientation of the feature point.

Sub-step 6: construct the SURF feature descriptor; centred on the feature point, a square window with side length 20s is constructed, where s is the scale of the feature point; the window is divided into 16 sub-regions (4×4); for each sub-region, the sums of the horizontal and vertical Haar wavelet responses of its 25 sample points are ΣHx and ΣHy, and the 4-dimensional descriptor of the sub-region is V = [ΣHx, ΣHy, Σ|Hx|, Σ|Hy|], yielding a 64-dimensional feature descriptor.

In step 7, the position of the image in the pixel coordinate system Op-uv is converted into the physical image coordinates Oi-XiYi, so that geometric relationships can be computed in the physical coordinate system of the image; the conversion is the matrix formula (4):
$$\begin{bmatrix}u\\v\\1\end{bmatrix}=\begin{bmatrix}\frac{1}{dx} & 0 & u_{0}\\ 0 & \frac{1}{dy} & v_{0}\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}x\\y\\1\end{bmatrix}\tag{4}$$
Here (u, v) are the pixel coordinates in the horizontal and vertical directions, Oi(u0, v0) is the pixel at the centre of the image coordinate system, and dx and dy are the actual physical size of each pixel along the x and y axes; u0, v0, dx and dy are known from camera calibration. Within the matched rectangular frame CDEF of the object found by the SURF invariant feature point algorithm, image processing yields the pixel coordinates of the object's centroid O4(us, vs) and of CDEF, and from the geometric relationships the pixel coordinates of the grasping points A(u1, v1) and B(u2, v2) of the object are then obtained.

Substituting the pixel coordinates of the grasping points A(u1, v1) and B(u2, v2) into formula (4) gives the physical coordinates of A and B, as in formula (5):
$$\begin{cases}x_{1}=(u_{1}-u_{0})\,dx\\ y_{1}=(v_{1}-v_{0})\,dy\end{cases}\tag{5}$$
and formula (6):
$$\begin{cases}x_{2}=(u_{2}-u_{0})\,dx\\ y_{2}=(v_{2}-v_{0})\,dy\end{cases}\tag{6}$$
From the physical image coordinates of A(x1, y1) and B(x2, y2) the included angle α, i.e. the optimal pose angle, can be solved; in theory the rotation angle commanded to the gripper should also be α, but the systematic error of the arm makes the actual rotation smaller or larger than α, so the commanded rotation angle is denoted β; taking clockwise rotation of the gripper relative to the object as positive and counterclockwise as negative, the slope of the line through the two points follows from their geometric relationship in the coordinate system, giving formula (7):
$$\tan\alpha=k=\frac{y_{2}-y_{1}}{x_{2}-x_{1}}\tag{7}$$
From formulas (5), (6) and (7), formula (8) follows:
$$\alpha=\arctan\frac{(v_{2}-v_{1})\,dy}{(u_{2}-u_{1})\,dx}\tag{8}$$
The angle α of formula (8) lies in the range [0°, 90°]; if positive the gripper rotates clockwise, if negative it rotates counterclockwise; α is both the pose angle of the object in the two-dimensional plane and the gripping angle through which the gripper rotates, with gripping points A and B; as formula (8) shows, dx and dy are intrinsic camera parameters, so α depends only on the pixel differences between the gripping points A and B, which simplifies the otherwise complex image computation and thereby improves the visual servoing performance.
CN202110097875.8A 2021-01-25 2021-01-25 Optimal pose detection method for object grasping by visual servo robotic arm Active CN112894815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110097875.8A CN112894815B (en) 2021-01-25 2021-01-25 Optimal pose detection method for object grasping by visual servo robotic arm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110097875.8A CN112894815B (en) 2021-01-25 2021-01-25 Optimal pose detection method for object grasping by visual servo robotic arm

Publications (2)

Publication Number Publication Date
CN112894815A CN112894815A (en) 2021-06-04
CN112894815B true CN112894815B (en) 2022-09-27

Family

ID=76119544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110097875.8A Active CN112894815B (en) 2021-01-25 2021-01-25 Optimal pose detection method for object grasping by visual servo robotic arm

Country Status (1)

Country Link
CN (1) CN112894815B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822946B (en) * 2021-10-09 2023-10-20 上海第二工业大学 Mechanical arm grabbing method based on computer vision
CN113975150A (en) * 2021-12-28 2022-01-28 杭州大力神医疗器械有限公司 Percutaneous acupoint therapeutic instrument
CN114750147B (en) * 2022-03-10 2023-11-24 深圳甲壳虫智能有限公司 Space pose determining method and device of robot and robot
CN115488876A (en) * 2022-06-22 2022-12-20 湖北商贸学院 Robot sorting method and device based on machine vision
CN115008477B (en) * 2022-08-09 2023-03-21 苏州华兴源创科技股份有限公司 Manipulator movement compensation method, manipulator movement compensation device and computer-readable storage medium
CN115648224A (en) * 2022-12-22 2023-01-31 北京钢铁侠科技有限公司 Mechanical arm grabbing method based on double-depth camera recognition and positioning

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2048599A1 (en) * 2007-10-11 2009-04-15 MVTec Software GmbH System and method for 3D object recognition
JP2015085450A (en) * 2013-10-31 2015-05-07 セイコーエプソン株式会社 Robot control system, robot, program, and robot control method
CN107590837A (en) * 2017-09-06 2018-01-16 西安华航唯实机器人科技有限公司 A kind of vision positioning intelligent precise puts together machines people and its camera vision scaling method
CN110211180A (en) * 2019-05-16 2019-09-06 西安理工大学 A kind of autonomous grasping means of mechanical arm based on deep learning
US10456915B1 (en) * 2019-01-25 2019-10-29 Mujin, Inc. Robotic system with enhanced scanning mechanism
CN111179342A (en) * 2019-12-11 2020-05-19 上海非夕机器人科技有限公司 Object pose estimation method and device, storage medium and robot
CN111267095A (en) * 2020-01-14 2020-06-12 大连理工大学 Mechanical arm grabbing control method based on binocular vision
CN111652928A (en) * 2020-05-11 2020-09-11 上海交通大学 Object grasp pose detection method in 3D point cloud
CN111702755A (en) * 2020-05-25 2020-09-25 淮阴工学院 An intelligent control system for robotic arms based on multi-eye stereo vision

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8467596B2 (en) * 2011-08-30 2013-06-18 Seiko Epson Corporation Method and apparatus for object pose estimation
JP5850962B2 (en) * 2014-02-13 2016-02-03 ファナック株式会社 Robot system using visual feedback

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2048599A1 (en) * 2007-10-11 2009-04-15 MVTec Software GmbH System and method for 3D object recognition
JP2015085450A (en) * 2013-10-31 2015-05-07 セイコーエプソン株式会社 Robot control system, robot, program, and robot control method
CN107590837A (en) * 2017-09-06 2018-01-16 西安华航唯实机器人科技有限公司 A kind of vision positioning intelligent precise puts together machines people and its camera vision scaling method
US10456915B1 (en) * 2019-01-25 2019-10-29 Mujin, Inc. Robotic system with enhanced scanning mechanism
CN110211180A (en) * 2019-05-16 2019-09-06 西安理工大学 A kind of autonomous grasping means of mechanical arm based on deep learning
CN111179342A (en) * 2019-12-11 2020-05-19 上海非夕机器人科技有限公司 Object pose estimation method and device, storage medium and robot
CN111267095A (en) * 2020-01-14 2020-06-12 大连理工大学 Mechanical arm grabbing control method based on binocular vision
CN111652928A (en) * 2020-05-11 2020-09-11 上海交通大学 Object grasp pose detection method in 3D point cloud
CN111702755A (en) * 2020-05-25 2020-09-25 淮阴工学院 An intelligent control system for robotic arms based on multi-eye stereo vision

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Multi-target dynamic 3D grasping pose detection method based on deep convolutional networks; Yang Aolei et al.; Chinese Journal of Scientific Instrument (仪器仪表学报); 2019-12-15 (No. 12); pp. 138-145 *

Also Published As

Publication number Publication date
CN112894815A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
CN112894815B (en) Optimal pose detection method for object grasping by visual servo robotic arm
CN113379849B (en) Robot autonomous recognition intelligent grabbing method and system based on depth camera
CN108555908B (en) A method for gesture recognition and picking of stacked workpieces based on RGBD cameras
CN110014426B (en) Method for grabbing symmetrically-shaped workpieces at high precision by using low-precision depth camera
CN110211180A (en) A kind of autonomous grasping means of mechanical arm based on deep learning
CN114474056B (en) A monocular vision high-precision target positioning method for grasping operation
CN111775152A (en) A method and system for guiding a robotic arm to grasp scattered and stacked workpieces based on three-dimensional measurement
CN114952809B (en) Workpiece recognition and pose detection method, system, and grasping control method of a robotic arm
JP4785880B2 (en) System and method for 3D object recognition
Azad et al. Stereo-based 6d object localization for grasping with humanoid robot systems
CN110480637B (en) An Image Recognition and Grabbing Method of Robot Arm Parts Based on Kinect Sensor
CN107953329B (en) Object recognition and attitude estimation method, device and robotic arm grasping system
CN113269723B (en) Unordered grabbing system for parts with three-dimensional visual positioning and manipulator cooperative work
CN112509063A (en) Mechanical arm grabbing system and method based on edge feature matching
CN110378325A (en) A kind of object pose recognition methods during robot crawl
CN111360821A (en) Picking control method, device and equipment and computer scale storage medium
CN112109072B (en) Accurate 6D pose measurement and grabbing method for large sparse feature tray
CN207231476U (en) A kind of courier packages' grabbing device based on binocular vision
CN114494463A (en) Robot sorting method and device based on binocular stereoscopic vision technology
Sun et al. Robotic grasping using semantic segmentation and primitive geometric model based 3D pose estimation
CN116021519A (en) TOF camera-based picking robot hand-eye calibration method and device
JP2024015358A (en) Systems and methods for robotic system with object handling
Ibrayev et al. Recognition of curved surfaces from “one-dimensional” tactile data
CN116091447A (en) Bolt attitude estimation method based on monocular RGB image
Nguyen et al. Grasping moving objects with incomplete information in a low-cost robot production line using contour matching based on the Hu moments

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20240222

Address after: 230000 B-2704, wo Yuan Garden, 81 Ganquan Road, Shushan District, Hefei, Anhui.

Patentee after: HEFEI LONGZHI ELECTROMECHANICAL TECHNOLOGY Co.,Ltd.

Country or region after: China

Address before: No.2 Xuefu Middle Road, Weiyang District, Xi'an City, Shaanxi Province, 720021

Patentee before: XI'AN TECHNOLOGICAL University

Country or region before: China

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20240909

Address after: Room 101, No. 3-8 Hongnan Road, Daojiao Town, Dongguan City, Guangdong Province 523000

Patentee after: Dongguan Ruidong Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: 230000 B-2704, wo Yuan Garden, 81 Ganquan Road, Shushan District, Hefei, Anhui.

Patentee before: HEFEI LONGZHI ELECTROMECHANICAL TECHNOLOGY Co.,Ltd.

Country or region before: China