CN112045676A - Method for grabbing transparent object by robot based on deep learning - Google Patents

Method for grabbing transparent object by robot based on deep learning

Info

Publication number
CN112045676A
CN112045676A
Authority
CN
China
Prior art keywords
coordinate system
camera
robot
depth
grabbing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010755192.2A
Other languages
Chinese (zh)
Inventor
雷渠江
徐杰
李秀昊
桂光超
王雨禾
潘艺芃
周纪民
王卫军
韩彰秀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Institute of Advanced Technology of CAS
Original Assignee
Guangzhou Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Institute of Advanced Technology of CAS filed Critical Guangzhou Institute of Advanced Technology of CAS
Priority to CN202010755192.2A priority Critical patent/CN112045676A/en
Publication of CN112045676A publication Critical patent/CN112045676A/en
Pending legal-status Critical Current

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for a robot to grab a transparent object based on deep learning, which comprises the following steps: S1: completing the establishment of the hardware environment of the system for the robot to grab the transparent object; S2: completing the calibration of the cameras of the system for the robot to grab the transparent object; S3: completing the training of a grasp planning model based on a convolutional neural network and the grabbing by the robot in a real environment. The specific implementation of S3 includes: scanning the transparent object with a depth camera to capture a color image and a depth image; filtering the acquired images; detecting and segmenting the transparent object with the ClearGrasp deep learning algorithm; and searching and scoring candidate grab positions of the object with a contact-line search method, then grabbing the object accurately once the optimal grab position is obtained. The method can accurately predict the 3D data of highly transparent objects through an RGB-D camera, accurately calculate the normals of the curved surfaces of transparent objects from reflected light spots, and improve the prediction accuracy for transparent objects.

Description

Method for grabbing transparent object by robot based on deep learning
Technical Field
The invention relates to the technical field of robot grabbing, in particular to a method for grabbing a transparent object by a robot based on deep learning.
Background
For a service robot, the ability to grasp a target object quickly and accurately in a home environment is essential; only then can the robot effectively help people with limited mobility. The key to successful grasping is the identification and localization of the target, and robots currently rely mainly on vision sensors to recognize objects. Among the many objects to be grasped, transparent objects are very common in daily life, and whether they can be effectively identified and localized plays a crucial role in grasping efficiency. However, when a robot recognizes a transparent object with vision, the transparent region is sensitive to illumination changes, lacks sufficient texture features to extract, depends strongly on the background environment, and its intensity-gradient features are easily affected by external factors, so recognizing transparent objects has long been a problem that is difficult to solve effectively.
At present, the commonly used transparent-object detection approaches are non-visual methods and detection based on two-dimensional RGB images. Non-visual methods are complicated to use and make the robot very expensive, which is inconvenient for service robots; detection from RGB images yields only a two-dimensional result with weak robustness, requires restrictive detection conditions, and cannot recover the spatial position of the object.
Disclosure of Invention
In view of the above, there is a need for a deep-learning-based method for a robot to grab transparent objects, in which the three-dimensional geometry of a transparent object is accurately estimated from an RGB-D image for the robot to act on, so as to solve the task of grabbing transparent objects in a home scene.
To achieve this purpose, the invention adopts the following technical scheme:
a method for grabbing a transparent object by a robot based on deep learning comprises the following steps:
step S1: completing the establishment of a hardware environment of a system for grabbing the transparent object by the robot;
step S2: completing the calibration of a camera of a system for grabbing the transparent object by the robot;
step S3: and finishing the training of a grasping planning model based on the convolutional neural network and the grasping of the robot in a real environment.
Further, the hardware environment of the system for grabbing transparent objects by the robot comprises a depth camera, at least one computer running ROS (Robot Operating System), at least one robot equipped with a gripper, and at least one object to be grabbed;
the depth camera is used for acquiring 3D visual data and is installed on the robot;
the computer is used for finishing the training of grabbing the network model;
the robot is used for grabbing an object to be grabbed.
Further, when the camera shoots an object it captures a depth image and a color image simultaneously; during camera calibration, both the color camera and the depth camera need to be calibrated so that, through calibration, each pixel of the depth image corresponds to a pixel of the color image; step S2 specifically includes the following steps:
step S21: determining internal parameters and external parameters of a binocular camera through camera calibration, and completing the transformation from a world coordinate system to a camera coordinate system;
step S22: and determining the relative position between the camera and the end effector through hand-eye calibration, and finishing the transformation of a camera coordinate system and a robot end effector coordinate system.
Further, the specific implementation method of step S21 includes:
the transformation from the world coordinate system to the camera coordinate system is described by a rotation matrix R and a translation vector T, as shown in equation (1):

$$\begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \end{bmatrix} = R_1 \begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} + T_1, \qquad \begin{bmatrix} X_2 \\ Y_2 \\ Z_2 \end{bmatrix} = R_2 \begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} + T_2 \tag{1}$$

in equation (1), R_1, T_1 are the extrinsic parameters of the left-eye camera and R_2, T_2 are the extrinsic parameters of the right-eye camera, which are obtained through camera calibration; (X_W, Y_W, Z_W) are the coordinates of a point in space in the world coordinate system, (X_1, Y_1, Z_1) are its coordinates in the left-eye camera coordinate system, and (X_2, Y_2, Z_2) are its coordinates in the right-eye camera coordinate system;
taking the left-eye camera coordinate system as the reference, let R' be the rotation matrix and T' the translation vector from the right-eye camera coordinate system to the left-eye camera coordinate system; then:

$$\begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \end{bmatrix} = R' \begin{bmatrix} X_2 \\ Y_2 \\ Z_2 \end{bmatrix} + T' \tag{2}$$

from equations (1) and (2):

$$R' = R_1 R_2^{-1}, \qquad T' = T_1 - R_1 R_2^{-1} T_2 \tag{3}$$

the position of the calibration plate is kept unchanged while the binocular camera shoots, the left-eye and right-eye cameras capture images of the calibration plate simultaneously, several image pairs are collected and imported into the tool box, the tool box automatically calculates the rotation matrix and translation vector between the two cameras, and the rotation matrix and translation vector are used to complete the transformation from the world coordinate system to the camera coordinate system.
Further, the specific implementation method of step S22 includes:
the transformation from the camera coordinate system to the robot end-effector coordinate system is solved by hand-eye calibration, where the hand denotes the end effector and the eye denotes the camera; four coordinate systems are involved in the hand-eye calibration process, namely the calibration plate coordinate system B, the camera coordinate system C, the end-effector coordinate system T and the robot base coordinate system R;
the transformation matrix T^R_B is used to describe the transformation from the calibration plate coordinate system B to the robot base coordinate system R, and T^R_B is expressed as follows:

$$T^R_B = T^R_T \, T^T_C \, T^C_B \tag{4}$$

in equation (4), T^C_B denotes the transformation matrix from the calibration plate coordinate system B to the camera coordinate system C, i.e. the camera extrinsic parameters, which is obtained through camera calibration; T^R_T denotes the transformation matrix from the end-effector coordinate system T to the robot base coordinate system R, which is obtained from the parameters on the robot teach pendant; T^T_C is the hand-eye matrix to be solved;
in the calibration process, the position of the calibration plate is kept unchanged, the robot is controlled to shoot images of the calibration plate from different positions, and two positions are selected for analysis, giving equation (5):

$$T^R_{B,i} = T^R_{T,i} \, T^T_{C,i} \, T^C_{B,i}, \qquad T^R_{B,i+1} = T^R_{T,i+1} \, T^T_{C,i+1} \, T^C_{B,i+1} \tag{5}$$

in equation (5), T^R_{B,i} and T^R_{B,i+1} respectively denote the transformation matrices from the calibration plate coordinate system B to the robot base coordinate system R at position i and position i+1, T^R_{T,i} and T^R_{T,i+1} respectively denote the transformation matrices from the end-effector coordinate system T to the robot base coordinate system R at position i and position i+1, T^T_{C,i} and T^T_{C,i+1} respectively denote the hand-eye matrices to be solved at position i and position i+1, and T^C_{B,i} and T^C_{B,i+1} respectively denote the transformation matrices from the calibration plate coordinate system B to the camera coordinate system C at position i and position i+1; because the relative position between the calibration plate and the robot base does not change, and the relative position between the robot end effector and the camera does not change, we have T^R_{B,i} = T^R_{B,i+1} and T^T_{C,i} = T^T_{C,i+1} = T^T_C; combining the two equations gives equation (6):

$$\left(T^R_{T,i+1}\right)^{-1} T^R_{T,i} \, T^T_C = T^T_C \, T^C_{B,i+1} \left(T^C_{B,i}\right)^{-1} \tag{6}$$

in equation (6), T^R_{T,i}, T^R_{T,i+1}, T^C_{B,i} and T^C_{B,i+1} are all known quantities, and solving finally yields T^T_C, i.e. the transformation matrix from the camera coordinate system to the robot end-effector coordinate system.
Further, the specific implementation method of step S3 includes:
s31: utilizing a depth camera to scan and capture a color image and a depth image of a transparent object;
s32: filtering the acquired image;
s33: completing transparent object detection and segmentation by using a ClearGrasp deep learning algorithm;
s34: and searching and scoring the grabbing position of the object by using a contact line searching method, and accurately grabbing the object after the optimal grabbing position is obtained.
Further, in step S32, a Gaussian filtering algorithm, which balances speed and effect, is selected to filter the acquired images; the Gaussian filtering formula is shown in equation (7):

$$f(x, y) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{7}$$

in equation (7), f(x, y) is the value of the Gaussian function, x and y are the offsets of a pixel in the neighborhood from the center pixel (so x^2 and y^2 are the squared distances along each axis), and σ is the standard deviation.
Further, the specific implementation method of step S33 includes:
predicting a surface normal, identifying a boundary and segmenting a transparent object from the filtered image by adopting a ClearGrasp deep learning method, wherein the segmented mask is used for modifying the input depth image; then, the depth of all the surfaces of the high-transparency objects in the scene is reconstructed by using a global optimization algorithm, and the edges, the occlusion and the segmentation of the 3D reconstruction are optimized by using the predicted surface normal.
Further, in step S33, ClearGrasp comprises 3 neural networks, and the outputs of the 3 neural networks are integrated for global optimization;
the 3 neural networks include: a transparent object segmentation network, an edge identification network and a surface normal vector estimation network;
transparent object segmentation network: inputting a single RGB picture, and outputting a pixel Mask of a transparent object in a scene, namely judging that each pixel point belongs to a transparent or non-transparent object, and removing the pixel judged as the transparent object in subsequent optimization to obtain a modified depth map;
edge identification network: for a single RGB picture, it outputs information about occluding edges and connecting edges, which helps the network better distinguish the different edges in the picture and make more accurate predictions at edges where the depth is discontinuous;
surface normal vector estimation: using the RGB picture as input, and performing L2 regularization on the output;
reconstructing the three-dimensional surface of the regions of the transparent object where depth is missing using the global optimization algorithm: the predicted surface normals of the transparent object are used to fill the removed depth regions while respecting the depth discontinuities indicated by the occluding-edge information, which is expressed by the following formula:

$$E = \lambda_D E_D + \lambda_S E_S + \lambda_N E_N B \tag{8}$$

in equation (8), E is the total error of the predicted depth, E_D measures the distance between the predicted depth and the observed original depth, E_S measures the depth difference between adjacent points, E_N measures the consistency between the normals of the predicted depth and the predicted surface normals, B is a weight based on whether the pixel lies on an occlusion boundary, and λ_D, λ_S, λ_N are the corresponding weighting coefficients.
Further, in step S34, the direction of the best grasp position is the principal direction of the image gradient of the object; principal-direction extraction is performed on the depth image of the object to speed up the selection of the grasp position, i.e. gradient values are calculated along the x-axis and the y-axis, the gradient direction of each pixel is computed, and the directions are collected and counted in a histogram, where the object gradient and the gradient direction are calculated as follows:
the two convolution kernels [-1, 0, 1] and [-1, 0, 1]^T are used to perform a two-dimensional convolution of the image to calculate the object gradient;
the gradient magnitude and direction are calculated as follows:

$$g = \sqrt{g_x^2 + g_y^2} \tag{9}$$

$$\theta = \arctan\frac{g_y}{g_x} \tag{10}$$

in the above formulas, g_x and g_y respectively denote the gradient values in the x and y directions, g denotes the gradient magnitude, and θ denotes the gradient direction;
after the gradient is obtained, a threshold g_Thresh = 250 is set; only where the gradient is greater than this threshold does the robot have enough depth to place the gripper plates for an effective grasp, i.e.

$$g > g_{Thresh} \tag{11}$$
In the process of grabbing a transparent object by a robot, two contact lines exist when a clamping jaw is in contact with the object, and the conditions for selecting the proper contact lines are as follows:
the gradient directions of two contact lines are basically opposite;
the distance between the two contact lines does not exceed the maximum opening distance of the gripper;
the depth of the two contact lines is not more than 1/2 of the maximum depth in the clamping jaw;
the depth difference between the shallowest point in the area contained between the two contact lines and the shallowest point of the contact line does not exceed the internal depth of the clamping jaw;
the following formula is used to evaluate the grasping reliability of a pair of contact lines:

$$G = \frac{l_1 + l_2}{2L} \cdot \frac{l_{\min}}{l_{\max}} \cdot \frac{d_s}{d_l} \cdot \sin\theta \tag{12}$$

where G denotes the grasping reliability; l_1 and l_2 respectively denote the lengths of the two contact lines between the gripper jaws and the transparent object to be grabbed, L denotes the width of the gripper jaws, and the term (l_1 + l_2) / 2L evaluates the length of the contact lines; the term l_min / l_max evaluates the ratio of the lengths of the two contact lines, where l_max denotes the longer contact line and l_min the shorter one; the term d_s / d_l evaluates how well the contact lines fit the gripper, where d_l denotes the shallowest point of the contact lines and d_s denotes the shallowest point inside the rectangular region between them; and sin θ evaluates the misalignment of the two contact lines, where θ is the acute angle between the line connecting the midpoints of the two contact lines and the contact lines;
all contact-line combinations are traversed with equation (12), and the combination with the highest score is selected as the best grasping position.
The invention has the following advantages and positive effects: aiming at the difficulty of grasping transparent objects, the invention provides a ClearGrasp-based deep learning algorithm that can accurately predict the 3D data of highly transparent objects from an RGB-D camera.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for grabbing a transparent object by a robot based on deep learning according to the present invention;
FIG. 2 is a system hardware diagram of the robot based on deep learning for grabbing transparent objects according to the present invention;
fig. 3 is a schematic diagram of a cleargrass algorithm model network structure according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It should be noted that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and all other embodiments obtained by those skilled in the art without any inventive work based on the embodiments of the present invention belong to the protection scope of the present invention.
Examples
Fig. 1 is a schematic flow chart of a method for grabbing a transparent object by a robot based on deep learning according to the present invention, and as shown in fig. 1, the present invention provides a method for grabbing a transparent object by a robot based on deep learning, which includes the following steps:
step S1: completing the establishment of a hardware environment of a system for grabbing the transparent object by the robot;
step S2: completing the calibration of a camera of a system for grabbing the transparent object by the robot;
step S3: and finishing the training of a grasping planning model based on the convolutional neural network and the grasping of the robot in a real environment.
Specifically, the hardware environment of the system for grabbing the transparent object by the robot is shown in fig. 2, and comprises an Intel RealSense depth camera, at least one Ubuntu 18.04 computer running ROS, at least one UR5 robot equipped with a gripper, and at least one object to be grabbed;
the Intel RealSense depth camera is used to collect 3D visual data and is mounted on the UR5 robot;
the Ubuntu 18.04 computer is used to train the grasping network model;
the UR5 robot is used to grab the objects to be grabbed.
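As an illustration of the acquisition step on this hardware, the following sketch shows how aligned color and depth frames might be grabbed from the Intel RealSense camera; it assumes the pyrealsense2 Python bindings, and the stream resolution and frame rate are assumptions not specified in the patent.

```python
import numpy as np
import pyrealsense2 as rs

# Configure the RealSense pipeline for synchronized depth + color streams
# (640x480 @ 30 fps is an assumption; the patent does not fix a resolution).
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

# Align the depth frame to the color frame so that each depth pixel
# corresponds to the same scene point as the color pixel (see step S2).
align = rs.align(rs.stream.color)

try:
    frames = pipeline.wait_for_frames()
    frames = align.process(frames)
    depth_frame = frames.get_depth_frame()
    color_frame = frames.get_color_frame()

    depth_image = np.asanyarray(depth_frame.get_data())   # uint16 depth values
    color_image = np.asanyarray(color_frame.get_data())   # uint8 BGR image
finally:
    pipeline.stop()
```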
Specifically, when the depth camera shoots an object it captures a depth image and a color image simultaneously; during camera calibration, both the color camera and the depth camera need to be calibrated so that, through calibration, each pixel of the depth image corresponds to a pixel of the color image; step S2 specifically includes the following steps:
step S21: determining internal parameters and external parameters of a binocular camera through camera calibration, and completing the transformation from a world coordinate system to a camera coordinate system;
step S22: and determining the relative position between the camera and the end effector through hand-eye calibration, and finishing the transformation of a camera coordinate system and a robot end effector coordinate system.
Specifically, the method for implementing step S21 includes:
the transformation from the world coordinate system to the camera coordinate system is described by a rotation matrix R and a translation vector T, as shown in equation (1):

$$\begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \end{bmatrix} = R_1 \begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} + T_1, \qquad \begin{bmatrix} X_2 \\ Y_2 \\ Z_2 \end{bmatrix} = R_2 \begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} + T_2 \tag{1}$$

in equation (1), R_1, T_1 are the extrinsic parameters of the left-eye camera and R_2, T_2 are the extrinsic parameters of the right-eye camera, which can be obtained through camera calibration; (X_W, Y_W, Z_W) are the coordinates of a point in space in the world coordinate system, (X_1, Y_1, Z_1) are its coordinates in the left-eye camera coordinate system, and (X_2, Y_2, Z_2) are its coordinates in the right-eye camera coordinate system;
taking the left-eye camera coordinate system as the reference, let R' be the rotation matrix and T' the translation vector from the right-eye camera coordinate system to the left-eye camera coordinate system; then:

$$\begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \end{bmatrix} = R' \begin{bmatrix} X_2 \\ Y_2 \\ Z_2 \end{bmatrix} + T' \tag{2}$$

from equations (1) and (2):

$$R' = R_1 R_2^{-1}, \qquad T' = T_1 - R_1 R_2^{-1} T_2 \tag{3}$$

the position of the calibration plate is kept unchanged while the binocular camera shoots, the left-eye and right-eye cameras capture images of the calibration plate simultaneously, several image pairs are collected and imported into the Matlab calibration toolbox, the toolbox automatically calculates the rotation matrix and translation vector between the two cameras, and the transformation from the world coordinate system to the camera coordinate system can be completed using this rotation matrix and translation vector.
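As a minimal numerical illustration of equation (3), the sketch below computes the right-to-left rotation R' and translation T' from the two cameras' extrinsics with NumPy; the example extrinsic values are hypothetical and only serve to exercise the function.

```python
import numpy as np

def stereo_relative_pose(R1, T1, R2, T2):
    """Relative pose of the right camera w.r.t. the left camera, per equation (3):
    R' = R1 * R2^-1,  T' = T1 - R1 * R2^-1 * T2."""
    R2_inv = R2.T                      # rotation matrices are orthogonal, so inverse = transpose
    R_prime = R1 @ R2_inv
    T_prime = T1 - R1 @ R2_inv @ T2
    return R_prime, T_prime

# Hypothetical world-to-camera extrinsics for the two cameras.
R1 = np.eye(3)
T1 = np.array([0.0, 0.0, 0.5])
R2 = np.array([[0.9998, 0.0, 0.02],
               [0.0,    1.0, 0.0 ],
               [-0.02,  0.0, 0.9998]])
T2 = np.array([-0.06, 0.0, 0.5])       # roughly a 6 cm stereo baseline

R_p, T_p = stereo_relative_pose(R1, T1, R2, T2)
print(R_p)
print(T_p)
```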
Specifically, the method for implementing step S22 includes:
the transformation from the camera coordinate system to the robot end-effector coordinate system is solved by hand-eye calibration, where the hand denotes the end effector and the eye denotes the camera; four coordinate systems are involved in the hand-eye calibration process, namely the calibration plate coordinate system B, the camera coordinate system C, the end-effector coordinate system T and the robot base coordinate system R;
the transformation matrix T^R_B is used to describe the transformation from the calibration plate coordinate system B to the robot base coordinate system R, and T^R_B is expressed as follows:

$$T^R_B = T^R_T \, T^T_C \, T^C_B \tag{4}$$

in equation (4), T^C_B denotes the transformation matrix from the calibration plate coordinate system B to the camera coordinate system C, i.e. the camera extrinsic parameters, which can be obtained through camera calibration; T^R_T denotes the transformation matrix from the end-effector coordinate system T to the robot base coordinate system R, which can be obtained from the parameters on the robot teach pendant; T^T_C is the hand-eye matrix to be solved;
in the calibration process, the position of the calibration plate is kept unchanged, the robot is controlled to shoot images of the calibration plate from different positions, and two positions are selected for analysis, giving equation (5):

$$T^R_{B,i} = T^R_{T,i} \, T^T_{C,i} \, T^C_{B,i}, \qquad T^R_{B,i+1} = T^R_{T,i+1} \, T^T_{C,i+1} \, T^C_{B,i+1} \tag{5}$$

in equation (5), T^R_{B,i} and T^R_{B,i+1} respectively denote the transformation matrices from the calibration plate coordinate system B to the robot base coordinate system R at position i and position i+1, T^R_{T,i} and T^R_{T,i+1} respectively denote the transformation matrices from the end-effector coordinate system T to the robot base coordinate system R at position i and position i+1, T^T_{C,i} and T^T_{C,i+1} respectively denote the hand-eye matrices to be solved at position i and position i+1, and T^C_{B,i} and T^C_{B,i+1} respectively denote the transformation matrices from the calibration plate coordinate system B to the camera coordinate system C at position i and position i+1; because the relative position between the calibration plate and the robot base does not change, and the relative position between the robot end effector and the camera does not change, we have T^R_{B,i} = T^R_{B,i+1} and T^T_{C,i} = T^T_{C,i+1} = T^T_C; combining the two equations gives equation (6):

$$\left(T^R_{T,i+1}\right)^{-1} T^R_{T,i} \, T^T_C = T^T_C \, T^C_{B,i+1} \left(T^C_{B,i}\right)^{-1} \tag{6}$$

in equation (6), T^R_{T,i}, T^R_{T,i+1}, T^C_{B,i} and T^C_{B,i+1} are all known quantities, and solving finally yields T^T_C, i.e. the transformation matrix from the camera coordinate system to the robot end-effector coordinate system.
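For the AX = XB problem implied by equation (6), one practical route is OpenCV's calibrateHandEye. The sketch below is a hedged illustration of that API, assuming lists of poses gathered as described above (end-effector-to-base poses from the teach pendant or robot driver, board-to-camera poses from camera calibration); the function name and variable names are placeholders, not from the patent.

```python
import cv2
import numpy as np

def solve_hand_eye(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam):
    """Solve the AX = XB hand-eye problem of equation (6) with OpenCV.

    R_gripper2base[i], t_gripper2base[i]: end-effector pose in the robot base frame at position i
                                          (read from the teach pendant / robot driver).
    R_target2cam[i],  t_target2cam[i]:    calibration-board pose in the camera frame at position i
                                          (from camera calibration, e.g. cv2.solvePnP).
    Returns the 4x4 transformation from the camera frame to the end-effector frame.
    """
    R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
        R_gripper2base, t_gripper2base,
        R_target2cam, t_target2cam,
        method=cv2.CALIB_HAND_EYE_TSAI)

    # Assemble the homogeneous camera-to-gripper (hand-eye) transformation.
    T_cam2gripper = np.eye(4)
    T_cam2gripper[:3, :3] = R_cam2gripper
    T_cam2gripper[:3, 3] = t_cam2gripper.ravel()
    return T_cam2gripper
```

In practice many more than two robot positions would be collected, which over-determines equation (6) and makes the estimate robust to noise.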
Specifically, the method for implementing step S3 includes:
s31: utilizing a RealSense RGB-D camera to scan and capture a color image and a depth image of a transparent object;
s32: filtering the acquired image;
s33: completing transparent object detection and segmentation by using a ClearGrasp deep learning algorithm;
s34: and searching and scoring the grabbing position of the object by using a contact line searching method, and accurately grabbing the object after the optimal grabbing position is obtained.
Specifically, in step S32, a Gaussian filtering algorithm, which balances speed and effect, is selected to filter the acquired images; the Gaussian filtering formula is shown in equation (7):

$$f(x, y) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{7}$$

in equation (7), f(x, y) is the value of the Gaussian function, x and y are the offsets of a pixel in the neighborhood from the center pixel (so x^2 and y^2 are the squared distances along each axis), and σ is the standard deviation.
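As a short illustration of this filtering step, the sketch below applies a Gaussian filter to the captured color and depth images with OpenCV; the 5x5 kernel size and σ = 1.0 are assumptions, since the patent does not state them.

```python
import cv2
import numpy as np

def denoise(color_image, depth_image, ksize=5, sigma=1.0):
    """Gaussian-filter the RGB image and the depth map (equation (7));
    kernel size and sigma are illustrative choices."""
    color_filtered = cv2.GaussianBlur(color_image, (ksize, ksize), sigma)
    # Filter the depth map in float to avoid integer rounding artifacts.
    depth_filtered = cv2.GaussianBlur(depth_image.astype(np.float32), (ksize, ksize), sigma)
    return color_filtered, depth_filtered
```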
Specifically, the network structure of the ClearGrasp deep learning algorithm model is shown in fig. 3, and the specific implementation method of step S33 includes:
predicting a surface normal, identifying a boundary and segmenting a transparent object from the filtered image by adopting a ClearGrasp deep learning method, wherein the segmented mask is used for modifying the input depth image; then, the depth of all the surfaces of the high-transparency objects in the scene is reconstructed by using a global optimization algorithm, and the edges, the occlusion and the segmentation of the 3D reconstruction are optimized by using the predicted surface normal.
Specifically, in step S33, ClearGrasp comprises 3 neural networks, and the outputs of the 3 neural networks are integrated for global optimization;
the 3 neural networks include: a transparent object segmentation network, an edge identification network and a surface normal vector estimation network;
transparent object segmentation network: inputting a single RGB picture, and outputting a pixel Mask of a transparent object in a scene, namely judging that each pixel point belongs to a transparent or non-transparent object, and removing the pixel judged as the transparent object in subsequent optimization to obtain a modified depth map;
edge identification network: for a single RGB picture, it outputs information about occluding edges and connecting edges, which helps the network better distinguish the different edges in the picture and make more accurate predictions at edges where the depth is discontinuous;
surface normal vector estimation: using the RGB picture as input, and performing L2 regularization on the output;
reconstructing the three-dimensional surface of the regions of the transparent object where depth is missing using the global optimization algorithm: the predicted surface normals of the transparent object are used to fill the removed depth regions while respecting the depth discontinuities indicated by the occluding-edge information, which can be expressed by the following formula:

$$E = \lambda_D E_D + \lambda_S E_S + \lambda_N E_N B \tag{8}$$

in equation (8), E is the total error of the predicted depth, E_D measures the distance between the predicted depth and the observed original depth, E_S measures the depth difference between adjacent points, E_N measures the consistency between the normals of the predicted depth and the predicted surface normals, B is a weight based on whether the pixel lies on an occlusion boundary, and λ_D, λ_S, λ_N are the corresponding weighting coefficients.
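To make the weighted objective of equation (8) concrete, the sketch below evaluates E for a candidate depth map with NumPy. It is only a sketch of the scoring terms under stated assumptions (forward differences for the smoothness term, a dot-product penalty for normal consistency, and hypothetical weights), not the solver ClearGrasp actually uses.

```python
import numpy as np

def energy(d_pred, d_obs, obs_mask, n_pred, normals_from_depth, boundary_weight,
           lam_d=1.0, lam_s=0.001, lam_n=1.0):
    """Evaluate E = lam_d*E_D + lam_s*E_S + lam_n*E_N*B for a candidate depth map.

    d_pred:             H x W candidate (reconstructed) depth
    d_obs, obs_mask:    observed raw depth and a boolean mask of pixels where it is valid
    n_pred:             H x W x 3 surface normals predicted by the normal-estimation network
    normals_from_depth: callable that converts a depth map into H x W x 3 unit normals
    boundary_weight:    H x W weight B that down-weights the normal term on occlusion boundaries
    """
    # E_D: distance between predicted depth and the observed (non-transparent) depth
    e_d = np.sum(((d_pred - d_obs) ** 2)[obs_mask])

    # E_S: depth difference between adjacent pixels (forward differences)
    e_s = np.sum(np.diff(d_pred, axis=0) ** 2) + np.sum(np.diff(d_pred, axis=1) ** 2)

    # E_N: disagreement between the normals of the predicted depth and the predicted
    #      normals, weighted by B so occlusion boundaries are not forced to be smooth
    n_from_d = normals_from_depth(d_pred)
    e_n = np.sum(boundary_weight * (1.0 - np.sum(n_from_d * n_pred, axis=2)))

    return lam_d * e_d + lam_s * e_s + lam_n * e_n
```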
Specifically, in step S34, the direction of the best grasp position is the principal direction of the image gradient of the object; principal-direction extraction is performed on the depth image of the object to speed up the selection of the grasp position, i.e. gradient values are calculated along the x-axis and the y-axis, the gradient direction of each pixel is computed, and the directions are collected and counted in a histogram, where the object gradient and the gradient direction are calculated as follows:
the two convolution kernels [-1, 0, 1] and [-1, 0, 1]^T are used to perform a two-dimensional convolution of the image to calculate the object gradient;
the gradient magnitude and direction are calculated as follows:

$$g = \sqrt{g_x^2 + g_y^2} \tag{9}$$

$$\theta = \arctan\frac{g_y}{g_x} \tag{10}$$

in the above formulas, g_x and g_y respectively denote the gradient values in the x and y directions, g denotes the gradient magnitude, and θ denotes the gradient direction;
after the gradient is obtained, a threshold g_Thresh = 250 is set; only where the gradient is greater than this threshold does the robot have enough depth to place the gripper plates for an effective grasp, i.e.

$$g > g_{Thresh} \tag{11}$$
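The gradient computation and thresholding just described can be sketched as follows with OpenCV and NumPy; the [-1, 0, 1] kernels, the threshold of 250 and the direction histogram come from the text, while the number of histogram bins is an assumption.

```python
import cv2
import numpy as np

def gradient_principal_direction(depth_image, g_thresh=250.0, n_bins=36):
    """Compute the gradient of the depth image with [-1, 0, 1] kernels (equations (9)-(10)),
    keep only pixels whose magnitude exceeds g_thresh (equation (11)), and return the
    principal gradient direction from a histogram of directions."""
    kx = np.array([[-1.0, 0.0, 1.0]])          # horizontal kernel [-1, 0, 1]
    ky = kx.T                                   # vertical kernel [-1, 0, 1]^T
    depth = depth_image.astype(np.float32)

    gx = cv2.filter2D(depth, cv2.CV_32F, kx)
    gy = cv2.filter2D(depth, cv2.CV_32F, ky)

    g = np.sqrt(gx ** 2 + gy ** 2)              # gradient magnitude, equation (9)
    theta = np.arctan2(gy, gx)                  # gradient direction, equation (10)

    strong = g > g_thresh                       # equation (11): only strong edges are deep enough
    hist, bin_edges = np.histogram(theta[strong], bins=n_bins, range=(-np.pi, np.pi))
    principal = 0.5 * (bin_edges[hist.argmax()] + bin_edges[hist.argmax() + 1])
    return principal, strong
```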
In the process of grabbing a transparent object by a robot, two contact lines exist when a clamping jaw is in contact with the object, and the conditions for selecting the proper contact lines are as follows:
the gradient directions of two contact lines are basically opposite;
the distance between the two contact lines does not exceed the maximum opening distance of the gripper;
the depth of the two contact lines is not more than 1/2 of the maximum depth in the clamping jaw;
the depth difference between the shallowest point in the area contained between the two contact lines and the shallowest point of the contact line does not exceed the internal depth of the clamping jaw;
the following formula is used to evaluate the grasping reliability of a pair of contact lines:

$$G = \frac{l_1 + l_2}{2L} \cdot \frac{l_{\min}}{l_{\max}} \cdot \frac{d_s}{d_l} \cdot \sin\theta \tag{12}$$

where G denotes the grasping reliability; l_1 and l_2 respectively denote the lengths of the two contact lines between the gripper jaws and the transparent object to be grabbed, L denotes the width of the gripper jaws, and the term (l_1 + l_2) / 2L evaluates the length of the contact lines; the term l_min / l_max evaluates the ratio of the lengths of the two contact lines, where l_max denotes the longer contact line and l_min the shorter one; the term d_s / d_l evaluates how well the contact lines fit the gripper, where d_l denotes the shallowest point of the contact lines and d_s denotes the shallowest point inside the rectangular region between them; and sin θ evaluates the misalignment of the two contact lines, where θ is the acute angle between the line connecting the midpoints of the two contact lines and the contact lines;
all contact-line combinations are traversed with equation (12), and the combination with the highest score is selected as the best grasping position.
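A small helper for scoring a candidate pair of contact lines is sketched below, under the assumption (made explicit here because the rendered formula is not fully recoverable from the text) that G is the product of the four normalized terms described above.

```python
import math

def grasp_score(l1, l2, L, d_line_shallowest, d_region_shallowest, theta):
    """Score a pair of contact lines from the four terms of equation (12).

    l1, l2:               lengths of the two contact lines
    L:                    width of the gripper jaws
    d_line_shallowest:    depth of the shallowest point on the contact lines (d_l)
    d_region_shallowest:  depth of the shallowest point inside the rectangle between them (d_s)
    theta:                acute angle (radians) between the line joining the contact-line
                          midpoints and the contact lines
    """
    length_term = (l1 + l2) / (2.0 * L)                       # contact-line length
    ratio_term = min(l1, l2) / max(l1, l2)                    # balance of the two lines
    depth_term = d_region_shallowest / d_line_shallowest      # fit between lines and gripper
    angle_term = math.sin(theta)                              # misalignment of the two lines
    return length_term * ratio_term * depth_term * angle_term

# Example: nearly equal contact lines, nearly perpendicular connecting line.
print(grasp_score(l1=0.04, l2=0.038, L=0.05,
                  d_line_shallowest=0.30, d_region_shallowest=0.29,
                  theta=math.radians(85)))
```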
The invention has the following advantages and positive effects: aiming at the difficulty of grasping transparent objects, the invention provides a ClearGrasp-based deep learning algorithm that can accurately predict the 3D data of highly transparent objects from an RGB-D camera.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the appended claims.

Claims (10)

1. A method for grabbing a transparent object by a robot based on deep learning is characterized by comprising the following steps:
step S1: completing the establishment of a hardware environment of a system for grabbing the transparent object by the robot;
step S2: completing the calibration of a camera of a system for grabbing the transparent object by the robot;
step S3: and finishing the training of a grasping planning model based on the convolutional neural network and the grasping of the robot in a real environment.
2. The method for grabbing transparent objects by a robot based on deep learning of claim 1, wherein the hardware environment of the system for grabbing transparent objects by the robot comprises a depth camera, at least one computer running ROS (Robot Operating System), at least one robot equipped with a gripper and at least one object to be grabbed;
the depth camera is used for acquiring 3D visual data and is installed on the robot;
the computer is used for finishing the training of grabbing the network model;
the robot is used for grabbing an object to be grabbed.
3. The method for grabbing a transparent object by a robot based on deep learning of claim 1, wherein when the camera shoots the object it captures a depth image and a color image simultaneously; when the camera is calibrated, both the color camera and the depth camera need to be calibrated so that, through calibration, each pixel of the depth image corresponds to a pixel of the color image; the step S2 specifically comprises the following steps:
step S21: determining internal parameters and external parameters of a binocular camera through camera calibration, and completing the transformation from a world coordinate system to a camera coordinate system;
step S22: and determining the relative position between the camera and the end effector through hand-eye calibration, and finishing the transformation of a camera coordinate system and a robot end effector coordinate system.
4. The method for grabbing the transparent object by the deep learning based robot according to claim 3, wherein the step S21 is implemented by the method comprising:
the transformation from the world coordinate system to the camera coordinate system is described by a rotation matrix R and a translation vector T, as shown in equation (1):

$$\begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \end{bmatrix} = R_1 \begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} + T_1, \qquad \begin{bmatrix} X_2 \\ Y_2 \\ Z_2 \end{bmatrix} = R_2 \begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} + T_2 \tag{1}$$

in equation (1), R_1, T_1 are the extrinsic parameters of the left-eye camera and R_2, T_2 are the extrinsic parameters of the right-eye camera, which are obtained through camera calibration; (X_W, Y_W, Z_W) are the coordinates of a point in space in the world coordinate system, (X_1, Y_1, Z_1) are its coordinates in the left-eye camera coordinate system, and (X_2, Y_2, Z_2) are its coordinates in the right-eye camera coordinate system;
taking the left-eye camera coordinate system as the reference, let R' be the rotation matrix and T' the translation vector from the right-eye camera coordinate system to the left-eye camera coordinate system; then:

$$\begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \end{bmatrix} = R' \begin{bmatrix} X_2 \\ Y_2 \\ Z_2 \end{bmatrix} + T' \tag{2}$$

from equations (1) and (2):

$$R' = R_1 R_2^{-1}, \qquad T' = T_1 - R_1 R_2^{-1} T_2 \tag{3}$$

the position of the calibration plate is kept unchanged while the binocular camera shoots, the left-eye and right-eye cameras capture images of the calibration plate simultaneously, several image pairs are collected and imported into the tool box, the tool box automatically calculates the rotation matrix and translation vector between the two cameras, and the rotation matrix and translation vector are used to complete the transformation from the world coordinate system to the camera coordinate system.
5. The method for grabbing the transparent object by the deep learning based robot according to claim 3, wherein the step S22 is implemented by the method comprising:
the transformation from the camera coordinate system to the robot end-effector coordinate system is solved by hand-eye calibration, where the hand denotes the end effector and the eye denotes the camera; four coordinate systems are involved in the hand-eye calibration process, namely the calibration plate coordinate system B, the camera coordinate system C, the end-effector coordinate system T and the robot base coordinate system R;
the transformation matrix T^R_B is used to describe the transformation from the calibration plate coordinate system B to the robot base coordinate system R, and T^R_B is expressed as follows:

$$T^R_B = T^R_T \, T^T_C \, T^C_B \tag{4}$$

in equation (4), T^C_B denotes the transformation matrix from the calibration plate coordinate system B to the camera coordinate system C, i.e. the camera extrinsic parameters, which is obtained through camera calibration; T^R_T denotes the transformation matrix from the end-effector coordinate system T to the robot base coordinate system R, which is obtained from the parameters on the robot teach pendant; T^T_C is the hand-eye matrix to be solved;
in the calibration process, the position of the calibration plate is kept unchanged, the robot is controlled to shoot images of the calibration plate from different positions, and two positions are selected for analysis, giving equation (5):

$$T^R_{B,i} = T^R_{T,i} \, T^T_{C,i} \, T^C_{B,i}, \qquad T^R_{B,i+1} = T^R_{T,i+1} \, T^T_{C,i+1} \, T^C_{B,i+1} \tag{5}$$

in equation (5), T^R_{B,i} and T^R_{B,i+1} respectively denote the transformation matrices from the calibration plate coordinate system B to the robot base coordinate system R at position i and position i+1, T^R_{T,i} and T^R_{T,i+1} respectively denote the transformation matrices from the end-effector coordinate system T to the robot base coordinate system R at position i and position i+1, T^T_{C,i} and T^T_{C,i+1} respectively denote the hand-eye matrices to be solved at position i and position i+1, and T^C_{B,i} and T^C_{B,i+1} respectively denote the transformation matrices from the calibration plate coordinate system B to the camera coordinate system C at position i and position i+1; because the relative position between the calibration plate and the robot base does not change, and the relative position between the robot end effector and the camera does not change, we have T^R_{B,i} = T^R_{B,i+1} and T^T_{C,i} = T^T_{C,i+1} = T^T_C; combining the two equations gives equation (6):

$$\left(T^R_{T,i+1}\right)^{-1} T^R_{T,i} \, T^T_C = T^T_C \, T^C_{B,i+1} \left(T^C_{B,i}\right)^{-1} \tag{6}$$

in equation (6), T^R_{T,i}, T^R_{T,i+1}, T^C_{B,i} and T^C_{B,i+1} are all known quantities, and solving finally yields T^T_C, i.e. the transformation matrix from the camera coordinate system to the robot end-effector coordinate system.
6. The method for grabbing the transparent object by the deep learning based robot according to claim 1, wherein the step S3 is implemented by a method comprising:
s31: utilizing a depth camera to scan and capture a color image and a depth image of a transparent object;
s32: filtering the acquired image;
s33: completing transparent object detection and segmentation by using a ClearGrasp deep learning algorithm;
s34: and searching and scoring the grabbing position of the object by using a contact line searching method, and accurately grabbing the object after the optimal grabbing position is obtained.
7. The method for grabbing a transparent object by a deep learning based robot according to claim 6, wherein in step S32, a Gaussian filtering algorithm, which balances speed and effect, is selected to filter the captured images, the Gaussian filtering formula being shown in equation (7):

$$f(x, y) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{7}$$

in equation (7), f(x, y) is the value of the Gaussian function, x and y are the offsets of a pixel in the neighborhood from the center pixel, and σ is the standard deviation.
8. The method for grabbing the transparent object by the deep learning based robot as claimed in claim 6, wherein the specific implementation method of step S33 includes:
predicting a surface normal, identifying a boundary and segmenting a transparent object from the filtered image by adopting a ClearGrasp deep learning method, wherein the segmented mask is used for modifying the input depth image; then, the depth of all the surfaces of the high-transparency objects in the scene is reconstructed by using a global optimization algorithm, and the edges, the occlusion and the segmentation of the 3D reconstruction are optimized by using the predicted surface normal.
9. The method for grabbing the transparent object by the deep learning based robot as claimed in claim 6, wherein in step S33, ClearGrasp comprises 3 neural networks, and the outputs of the 3 neural networks are combined for global optimization;
the 3 neural networks include: a transparent object segmentation network, an edge identification network and a surface normal vector estimation network;
transparent object segmentation network: inputting a single RGB picture, and outputting a pixel Mask of a transparent object in a scene, namely judging that each pixel point belongs to a transparent or non-transparent object, and removing the pixel judged as the transparent object in subsequent optimization to obtain a modified depth map;
edge identification network: for a single RGB picture, it outputs information about occluding edges and connecting edges, which helps the network better distinguish the different edges in the picture and make more accurate predictions at edges where the depth is discontinuous;
surface normal vector estimation: using the RGB picture as input, and performing L2 regularization on the output;
reconstructing the three-dimensional surface of the regions of the transparent object where depth is missing using the global optimization algorithm: the predicted surface normals of the transparent object are used to fill the removed depth regions while respecting the depth discontinuities indicated by the occluding-edge information, which is expressed by the following formula:

$$E = \lambda_D E_D + \lambda_S E_S + \lambda_N E_N B \tag{8}$$

in equation (8), E is the total error of the predicted depth, E_D measures the distance between the predicted depth and the observed original depth, E_S measures the depth difference between adjacent points, E_N measures the consistency between the normals of the predicted depth and the predicted surface normals, B is a weight based on whether the pixel lies on an occlusion boundary, and λ_D, λ_S, λ_N are the corresponding weighting coefficients.
10. The method for grabbing transparent objects by a robot based on deep learning as claimed in claim 6, wherein in step S34, the direction of the best grabbing position is the principal direction of the image gradient of the object; principal-direction extraction is performed on the depth image of the object to speed up the selection of the grabbing position, i.e. gradient values are calculated along the x-axis and the y-axis, the gradient direction of each pixel is computed, and the directions are collected and counted in a histogram, where the object gradient and the gradient direction are calculated as follows:
the two convolution kernels [-1, 0, 1] and [-1, 0, 1]^T are used to perform a two-dimensional convolution of the image to calculate the object gradient;
the gradient magnitude and direction are calculated as follows:

$$g = \sqrt{g_x^2 + g_y^2} \tag{9}$$

$$\theta = \arctan\frac{g_y}{g_x} \tag{10}$$

in the above formulas, g_x and g_y respectively denote the gradient values in the x and y directions, g denotes the gradient magnitude, and θ denotes the gradient direction;
after the gradient is obtained, a threshold g_Thresh = 250 is set; only where the gradient is greater than this threshold does the robot have enough depth to place the gripper plates for an effective grasp, i.e.

$$g > g_{Thresh} \tag{11}$$
In the process of grabbing a transparent object by a robot, two contact lines exist when a clamping jaw is in contact with the object, and the conditions for selecting the proper contact lines are as follows:
the gradient directions of two contact lines are basically opposite;
the distance between the two contact lines does not exceed the maximum opening distance of the gripper;
the depth of the two contact lines is not more than 1/2 of the maximum depth in the clamping jaw;
the depth difference between the shallowest point in the area contained between the two contact lines and the shallowest point of the contact line does not exceed the internal depth of the clamping jaw;
the following formula is used to evaluate the grasping reliability of a pair of contact lines:

$$G = \frac{l_1 + l_2}{2L} \cdot \frac{l_{\min}}{l_{\max}} \cdot \frac{d_s}{d_l} \cdot \sin\theta \tag{12}$$

where G denotes the grasping reliability; l_1 and l_2 respectively denote the lengths of the two contact lines between the gripper jaws and the transparent object to be grabbed, L denotes the width of the gripper jaws, and the term (l_1 + l_2) / 2L evaluates the length of the contact lines; the term l_min / l_max evaluates the ratio of the lengths of the two contact lines, where l_max denotes the longer contact line and l_min the shorter one; the term d_s / d_l evaluates how well the contact lines fit the gripper, where d_l denotes the shallowest point of the contact lines and d_s denotes the shallowest point inside the rectangular region between them; and sin θ evaluates the misalignment of the two contact lines, where θ is the acute angle between the line connecting the midpoints of the two contact lines and the contact lines;
all contact line combinations are traversed through equation (12), and the combination with the highest score is selected as the best grasping position.
CN202010755192.2A 2020-07-31 2020-07-31 Method for grabbing transparent object by robot based on deep learning Pending CN112045676A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010755192.2A CN112045676A (en) 2020-07-31 2020-07-31 Method for grabbing transparent object by robot based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010755192.2A CN112045676A (en) 2020-07-31 2020-07-31 Method for grabbing transparent object by robot based on deep learning

Publications (1)

Publication Number Publication Date
CN112045676A true CN112045676A (en) 2020-12-08

Family

ID=73601341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010755192.2A Pending CN112045676A (en) 2020-07-31 2020-07-31 Method for grabbing transparent object by robot based on deep learning

Country Status (1)

Country Link
CN (1) CN112045676A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112802105A (en) * 2021-02-05 2021-05-14 梅卡曼德(北京)机器人科技有限公司 Object grabbing method and device
CN112809679A (en) * 2021-01-25 2021-05-18 清华大学深圳国际研究生院 Method and device for grabbing deformable object and computer readable storage medium
CN113160313A (en) * 2021-03-03 2021-07-23 广东工业大学 Transparent object grabbing control method and device, terminal and storage medium
CN114049399A (en) * 2022-01-13 2022-02-15 上海景吾智能科技有限公司 Mirror positioning method combining RGBD image
CN114055501A (en) * 2021-11-17 2022-02-18 长春理工大学 Robot grabbing system and control method thereof
CN114627359A (en) * 2020-12-08 2022-06-14 山东新松工业软件研究院股份有限公司 Out-of-order stacked workpiece grabbing priority evaluation method
CN114750164A (en) * 2022-05-25 2022-07-15 清华大学深圳国际研究生院 Transparent object grabbing method and system and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105014678A (en) * 2015-07-16 2015-11-04 深圳市得意自动化科技有限公司 Robot hand-eye calibration method based on laser range finding
CN108898634A (en) * 2018-07-06 2018-11-27 张显磊 Pinpoint method is carried out to embroidery machine target pinprick based on binocular camera parallax
CN110246193A (en) * 2019-06-20 2019-09-17 南京博蓝奇智能科技有限公司 Industrial robot end camera online calibration method
CN110370286A (en) * 2019-08-13 2019-10-25 西北工业大学 Dead axle motion rigid body spatial position recognition methods based on industrial robot and monocular camera

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105014678A (en) * 2015-07-16 2015-11-04 深圳市得意自动化科技有限公司 Robot hand-eye calibration method based on laser range finding
CN108898634A (en) * 2018-07-06 2018-11-27 张显磊 Pinpoint method is carried out to embroidery machine target pinprick based on binocular camera parallax
CN110246193A (en) * 2019-06-20 2019-09-17 南京博蓝奇智能科技有限公司 Industrial robot end camera online calibration method
CN110370286A (en) * 2019-08-13 2019-10-25 西北工业大学 Dead axle motion rigid body spatial position recognition methods based on industrial robot and monocular camera

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SHREEYAK S. SAJJAN等: "《ClearGrasp:3D Shape Estimation of Transparent Objects for Manipulation》", 《ARXIV:1910.02550V2 [CS.CV] 14 OCT 2019》 *
卢岸潇: "Research on Workpiece Positioning Technology Based on Binocular Stereo Vision" (《基于双目立体视觉的工件定位技术研究》), Information Science and Technology Series (《信息科技辑》) *
王俊义: "Research on Key Technologies of Robot Grasping Based on Visual Servoing" (《基于视觉伺服的机器人抓取关键技术研究》), Information Science and Technology Series (《信息科技辑》) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114627359A (en) * 2020-12-08 2022-06-14 山东新松工业软件研究院股份有限公司 Out-of-order stacked workpiece grabbing priority evaluation method
CN112809679A (en) * 2021-01-25 2021-05-18 清华大学深圳国际研究生院 Method and device for grabbing deformable object and computer readable storage medium
CN112802105A (en) * 2021-02-05 2021-05-14 梅卡曼德(北京)机器人科技有限公司 Object grabbing method and device
CN113160313A (en) * 2021-03-03 2021-07-23 广东工业大学 Transparent object grabbing control method and device, terminal and storage medium
CN114055501A (en) * 2021-11-17 2022-02-18 长春理工大学 Robot grabbing system and control method thereof
CN114049399A (en) * 2022-01-13 2022-02-15 上海景吾智能科技有限公司 Mirror positioning method combining RGBD image
CN114049399B (en) * 2022-01-13 2022-04-12 上海景吾智能科技有限公司 Mirror positioning method combining RGBD image
CN114750164A (en) * 2022-05-25 2022-07-15 清华大学深圳国际研究生院 Transparent object grabbing method and system and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN112045676A (en) Method for grabbing transparent object by robot based on deep learning
CN112270249B (en) Target pose estimation method integrating RGB-D visual characteristics
CN103530599B (en) The detection method and system of a kind of real human face and picture face
CN105279372B (en) A kind of method and apparatus of determining depth of building
JP4429298B2 (en) Object number detection device and object number detection method
CN108470356B (en) Target object rapid ranging method based on binocular vision
US11948344B2 (en) Method, system, medium, equipment and terminal for inland vessel identification and depth estimation for smart maritime
CN110509273B (en) Robot manipulator detection and grabbing method based on visual deep learning features
US20150317821A1 (en) Geodesic Distance Based Primitive Segmentation and Fitting for 3D Modeling of Non-Rigid Objects from 2D Images
CN115272271A (en) Pipeline defect detecting and positioning ranging system based on binocular stereo vision
WO2014044126A1 (en) Coordinate acquisition device, system and method for real-time 3d reconstruction, and stereoscopic interactive device
CN107274483A (en) A kind of object dimensional model building method
CN105894574A (en) Binocular three-dimensional reconstruction method
CN113850865A (en) Human body posture positioning method and system based on binocular vision and storage medium
CN110648362B (en) Binocular stereo vision badminton positioning identification and posture calculation method
WO2023070312A1 (en) Image processing method
CN113393439A (en) Forging defect detection method based on deep learning
CN111996883A (en) Method for detecting width of road surface
WO2024148645A1 (en) Apparatus for estimating under monocular infrared thermal imaging vision pose of object grasped by manipulator, and method thereof
CN114742888A (en) 6D attitude estimation method based on deep learning
CN116152697A (en) Three-dimensional model measuring method and related device for concrete structure cracks
CN112712059A (en) Living body face recognition method based on infrared thermal image and RGB image
CN104243970A (en) 3D drawn image objective quality evaluation method based on stereoscopic vision attention mechanism and structural similarity
JP4935769B2 (en) Plane region estimation apparatus and program
JP4918615B2 (en) Object number detection device and object number detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination