CN112045676A - Method for grabbing transparent object by robot based on deep learning - Google Patents
- Publication number
- CN112045676A (Application CN202010755192.2A)
- Authority
- CN
- China
- Prior art keywords
- coordinate system
- camera
- robot
- depth
- grabbing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1694—Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
- B25J9/1697—Vision controlled systems
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1628—Programme controls characterised by the control loop
- B25J9/163—Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
Landscapes
- Engineering & Computer Science (AREA)
- Robotics (AREA)
- Mechanical Engineering (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for grabbing a transparent object by a robot based on deep learning, which comprises the following steps: S1: completing the establishment of the hardware environment of the system for grabbing the transparent object by the robot; S2: completing the calibration of the camera of the system; S3: completing the training of a grasping planning model based on a convolutional neural network and the grasping by the robot in a real environment. The specific implementation of S3 includes: using a depth camera to scan and capture a color image and a depth image of the transparent object; filtering the acquired images; completing transparent object detection and segmentation with the ClearGrasp deep learning algorithm; and searching and scoring candidate grasping positions of the object with a contact-line search method, then accurately grasping the object once the optimal grasping position is obtained. The method can accurately predict the 3D data of highly transparent objects from an RGB-D camera, accurately calculate the surface normals of the transparent object from the reflected light spots, and improve the prediction accuracy for transparent objects.
Description
Technical Field
The invention relates to the technical field of robot grabbing, in particular to a method for grabbing a transparent object by a robot based on deep learning.
Background
For a service robot, the ability to grasp a target object quickly and accurately in a home environment is essential; only then can the robot effectively help people with limited mobility. The key to a successful grasp is recognizing and locating the target, and at present robots generally rely on vision sensors to recognize objects. Among the many objects to be grasped, transparent objects are very common in daily life, and whether they can be effectively recognized and located is crucial to grasping efficiency. However, when a robot recognizes a transparent object with vision, the transparent region is sensitive to illumination changes, lacks sufficient texture features to extract, depends on the background environment, and its intensity-gradient features are easily disturbed by external conditions, so recognizing transparent objects has long been a problem that is difficult to solve effectively.
At present, the commonly used transparent-object detection methods are non-visual methods and transparent-object detection based on two-dimensional RGB images. Non-visual methods are complicated to use and make the robot very expensive, which is inconvenient for a service robot; detection from RGB images alone has weak robustness, requires harsh detection conditions, and cannot obtain the spatial position of the object.
Disclosure of Invention
In view of the above, there is a need for a deep-learning-based method for a robot to grasp transparent objects, in which the three-dimensional geometry of a transparent object is accurately estimated from an RGB-D image by a deep learning method and used for robot manipulation, so as to solve the task of a robot grasping transparent objects in a home scene.
To achieve the above purpose, the invention is realized by the following technical scheme:
a method for grabbing a transparent object by a robot based on deep learning comprises the following steps:
step S1: completing the establishment of a hardware environment of a system for grabbing the transparent object by the robot;
step S2: completing the calibration of a camera of a system for grabbing the transparent object by the robot;
step S3: and finishing the training of a grasping planning model based on the convolutional neural network and the grasping of the robot in a real environment.
Further, the hardware environment of the system for grabbing the transparent object by the robot comprises a depth camera, at least one computer running ROS, at least one robot equipped with a gripper, and at least one object to be grabbed;
the depth camera is used for acquiring 3D visual data and is installed on the robot;
the computer is used for finishing the training of grabbing the network model;
the robot is used for grabbing an object to be grabbed.
Further, when the camera photographs an object it captures a depth image and a color image at the same time; during camera calibration both the color image and the depth image need to be calibrated, so that through calibration each pixel point of the depth image corresponds to a pixel point of the color image. The step S2 specifically includes the following steps:
step S21: determining internal parameters and external parameters of a binocular camera through camera calibration, and completing the transformation from a world coordinate system to a camera coordinate system;
step S22: and determining the relative position between the camera and the end effector through hand-eye calibration, and finishing the transformation of a camera coordinate system and a robot end effector coordinate system.
Further, the specific implementation method of step S21 includes:
The transformation from the world coordinate system to the camera coordinate system is described by a rotation matrix R and a translation vector T, as shown in equation (1):

[X1, Y1, Z1]^T = R1·[XW, YW, ZW]^T + T1, [X2, Y2, Z2]^T = R2·[XW, YW, ZW]^T + T2 (1)

In equation (1), R1, T1 are the extrinsic parameters of the left-eye camera and R2, T2 are the extrinsic parameters of the right-eye camera, both obtained by camera calibration; (XW, YW, ZW) are the coordinates of a point in space in the world coordinate system, (X1, Y1, Z1) are its coordinates in the left-eye camera coordinate system, and (X2, Y2, Z2) are its coordinates in the right-eye camera coordinate system;
taking the left-eye camera coordinate system as the reference and letting R' be the rotation matrix and T' the translation vector from the right-eye camera coordinate system to the left-eye camera coordinate system, then:

[X1, Y1, Z1]^T = R'·[X2, Y2, Z2]^T + T' (2)

from equation (1) and equation (2):

R' = R1·R2^(-1), T' = T1 − R1·R2^(-1)·T2 (3)

The position of the calibration plate is kept unchanged while shooting with the binocular camera; the left-eye and right-eye cameras photograph the calibration plate at the same time, several groups of image pairs are collected and imported into the calibration toolbox, the toolbox automatically calculates the rotation matrix and translation vector between the two cameras, and with this rotation matrix and translation vector the transformation from the world coordinate system to the camera coordinate system is completed.
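As an illustration, the relations in equations (1)-(3) can be sketched in a few lines of Python/numpy; the function names are placeholders and R1, T1, R2, T2 are assumed to be the 3x3 rotation matrices and 3-vectors returned by the calibration toolbox:

```python
import numpy as np

def world_to_camera(R, T, X_world):
    """Equation (1): map a 3-D world point into a camera frame, X_cam = R · X_world + T."""
    return R @ X_world + T

def stereo_relative_pose(R1, T1, R2, T2):
    """Equations (2)-(3): pose of the right-eye camera expressed in the left-eye frame.

    Returns R', T' such that X_left = R' @ X_right + T'.
    """
    R_prime = R1 @ R2.T                   # R' = R1 · R2^-1 (inverse of a rotation is its transpose)
    T_prime = T1 - R1 @ R2.T @ T2         # T' = T1 - R1 · R2^-1 · T2
    return R_prime, T_prime
```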
Further, the specific implementation method of step S22 includes:
The transformation from the camera coordinate system to the robot end-effector coordinate system is solved by hand-eye calibration, where the hand is the end effector and the eye is the camera; the hand-eye calibration involves 4 coordinate systems, namely the calibration plate coordinate system B, the camera coordinate system C, the end-effector coordinate system T and the robot base coordinate system R;
the transformation matrix T_B^R is used to describe the transformation from the calibration plate coordinate system B to the robot base coordinate system R and is represented as follows:

T_B^R = T_T^R · T_C^T · T_B^C (4)

In equation (4), T_B^C denotes the transformation matrix from the calibration plate coordinate system B to the camera coordinate system C, i.e. the camera extrinsic parameters, which is obtained by camera calibration; T_T^R denotes the transformation matrix from the end-effector coordinate system T to the robot base coordinate system R, which is obtained from the parameters on the robot teach pendant; T_C^T is the hand-eye matrix to be solved;
during the calibration process the position of the calibration plate is kept unchanged, the robot is controlled to photograph the calibration plate from different positions, and two positions are selected for analysis, so that equation (5) is obtained:

T_T^R(i) · T_C^T · T_B^C(i) = T_T^R(i+1) · T_C^T · T_B^C(i+1) (5)

In equation (5), T_T^R(i) and T_T^R(i+1) respectively denote the transformation matrices from the end-effector coordinate system T to the robot base coordinate system R at position i and position i+1, T_B^C(i) and T_B^C(i+1) respectively denote the transformation matrices from the calibration plate coordinate system B to the camera coordinate system C at position i and position i+1, and T_C^T is the hand-eye matrix to be solved, which is the same at both positions; because the relative position between the calibration plate and the robot base does not change and the relative position between the robot end effector and the camera does not change, both sides describe the same calibration-plate-to-base transformation, and rearranging gives equation (6):

(T_T^R(i+1))^(-1) · T_T^R(i) · T_C^T = T_C^T · T_B^C(i+1) · (T_B^C(i))^(-1) (6)

In equation (6), T_T^R(i), T_T^R(i+1), T_B^C(i) and T_B^C(i+1) are all known quantities; solving this equation finally yields T_C^T, i.e. the transformation matrix from the camera coordinate system to the robot end-effector coordinate system.
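Rather than solving equation (6) by hand, the same AX = XB problem can be handed to OpenCV's calibrateHandEye; the sketch below assumes the per-pose rotations and translations have already been collected, and the wrapper name solve_hand_eye is illustrative:

```python
import cv2
import numpy as np

def solve_hand_eye(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam):
    """Solve the AX = XB relation of equation (6) for the eye-in-hand configuration.

    For every calibration pose i the input lists hold:
      R_gripper2base[i], t_gripper2base[i] : end-effector frame T -> robot base frame R
      R_target2cam[i],   t_target2cam[i]   : calibration-plate frame B -> camera frame C
    Returns the 4x4 hand-eye matrix (camera frame C -> end-effector frame T).
    """
    R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
        R_gripper2base, t_gripper2base,
        R_target2cam, t_target2cam,
        method=cv2.CALIB_HAND_EYE_TSAI)
    X = np.eye(4)
    X[:3, :3] = R_cam2gripper
    X[:3, 3] = np.asarray(t_cam2gripper).ravel()
    return X
```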
Further, the specific implementation method of step S3 includes:
s31: utilizing a depth camera to scan and capture a color image and a depth image of a transparent object;
s32: filtering the acquired image;
s33: completing transparent object detection and segmentation by using a ClearGrasp deep learning algorithm;
s34: and searching and scoring the grabbing position of the object by using a contact line searching method, and accurately grabbing the object after the optimal grabbing position is obtained.
Further, in step S32, a Gaussian filtering algorithm, which balances speed and effect, is selected to filter the acquired images, the Gaussian filter being shown in equation (7):

f(x, y) = (1/(2πσ²))·exp(−(x² + y²)/(2σ²)) (7)

In equation (7), f(x, y) represents the value of the Gaussian function, x² and y² represent the squared distances of the other pixels in the neighborhood from the center pixel of the neighborhood along the two axes, and σ represents the standard deviation.
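A minimal sketch of this filtering step using OpenCV's GaussianBlur, which applies the kernel of equation (7); the kernel size and sigma below are illustrative values not prescribed by the text:

```python
import cv2

def filter_rgbd(color, depth, ksize=5, sigma=1.5):
    """Apply the Gaussian filter of equation (7) to the captured color and depth images."""
    color_f = cv2.GaussianBlur(color, (ksize, ksize), sigma)
    depth_f = cv2.GaussianBlur(depth, (ksize, ksize), sigma)
    return color_f, depth_f
```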
Further, the specific implementation method of step S33 includes:
predicting a surface normal, identifying a boundary and segmenting a transparent object from the filtered image by adopting a ClearGrasp deep learning method, wherein the segmented mask is used for modifying the input depth image; then, the depth of all the surfaces of the high-transparency objects in the scene is reconstructed by using a global optimization algorithm, and the edges, the occlusion and the segmentation of the 3D reconstruction are optimized by using the predicted surface normal.
Further, in step S33, ClearGrasp comprises 3 neural networks, and the outputs of the 3 neural networks are combined for global optimization;
the 3 neural networks include: a transparent object segmentation network, an edge identification network and a surface normal vector estimation network;
transparent object segmentation network: inputting a single RGB picture, and outputting a pixel Mask of a transparent object in a scene, namely judging that each pixel point belongs to a transparent or non-transparent object, and removing the pixel judged as the transparent object in subsequent optimization to obtain a modified depth map;
edge identification network: for a single RGB picture, it outputs occlusion-edge and contact-edge information, which helps the network distinguish the different kinds of edges in the picture and make more accurate predictions at edges where the depth is discontinuous;
surface normal vector estimation: using the RGB picture as input, and performing L2 regularization on the output;
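A hedged sketch of how the three network outputs might be gathered before the global optimization; seg_net, edge_net and normal_net are hypothetical callables wrapping the trained models, not part of any released API:

```python
import numpy as np

def run_cleargrasp_networks(rgb, seg_net, edge_net, normal_net):
    """Run the three sub-networks on one RGB image and tidy their outputs."""
    mask = seg_net(rgb)        # per-pixel transparent / non-transparent decision
    edges = edge_net(rgb)      # occlusion-edge and contact-edge maps
    normals = normal_net(rgb)  # HxWx3 surface normal prediction
    # L2-normalize the predicted normals so each pixel carries a unit vector
    normals = normals / (np.linalg.norm(normals, axis=-1, keepdims=True) + 1e-8)
    return mask, edges, normals
```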
The three-dimensional surface of the transparent object's missing-depth region is reconstructed with the global optimization algorithm: the removed depth region is filled in using the predicted surface normals of the transparent object while observing the depth discontinuities indicated by the occlusion-edge information, which is expressed by the following formula:

E = λD·ED + λS·ES + λN·EN·B (8)

In equation (8), E denotes the total error of the predicted depth, ED denotes the distance between the predicted depth and the observed original depth, ES denotes the depth difference between adjacent points, EN denotes the consistency between the predicted depth and the predicted surface normals, B denotes a boundary weight based on whether the pixel lies on an occlusion boundary, and λD, λS, λN denote the corresponding weighting coefficients.
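For illustration, a simplified numpy evaluation of the objective in equation (8); the λ values and the normal-consistency surrogate are assumptions for this sketch, not the exact terms used by the patent:

```python
import numpy as np

def depth_completion_energy(D, D_obs, normals, observed_mask, boundary_weight,
                            lam_d=1000.0, lam_s=0.001, lam_n=1.0):
    """Evaluate a simplified E = λD·ED + λS·ES + λN·EN·B over a candidate depth map D.

    D_obs is the raw depth, valid only where observed_mask is True (transparent pixels removed);
    normals is the HxWx3 predicted surface-normal map; boundary_weight is small on occlusion edges.
    """
    # ED: stay close to the observed raw depth where it was kept
    E_d = np.sum((D - D_obs)[observed_mask] ** 2)
    # ES: smoothness between adjacent pixels
    E_s = np.sum(np.diff(D, axis=0) ** 2) + np.sum(np.diff(D, axis=1) ** 2)
    # EN·B: local tangents of the depth surface should be perpendicular to the predicted normal
    gy, gx = np.gradient(D)
    dot_x = normals[..., 0] + normals[..., 2] * gx   # tangent (1, 0, dD/dx) · n
    dot_y = normals[..., 1] + normals[..., 2] * gy   # tangent (0, 1, dD/dy) · n
    E_n = np.sum((dot_x ** 2 + dot_y ** 2) * boundary_weight)
    return lam_d * E_d + lam_s * E_s + lam_n * E_n
```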
Further, in step S34, the direction of the best grasping position is the main direction of the object's image gradient; principal-direction extraction is performed on the depth image of the object to speed up the selection of the grasping position, that is, gradient values are calculated along the x-axis and the y-axis, the gradient direction of each pixel is computed, and the directions are binned and counted in a histogram, wherein the object gradient and the gradient direction are calculated as follows:
the two convolution kernels [-1, 0, 1] and [-1, 0, 1]^T are used to perform two-dimensional convolution on the image to obtain the object gradient;
the gradient magnitude and direction are then calculated as:

g = √(gx² + gy²), θ = arctan(gy/gx)

in the above formulas, gx and gy respectively represent the gradient values in the x and y directions, g represents the gradient magnitude, and θ represents the gradient direction;
after the gradient is obtained, a threshold gThresh = 250 is set; only where the gradient is greater than this threshold does the robot have enough depth to place the clamping plates for an effective grasp, i.e. g > gThresh.
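A small numpy/OpenCV sketch of this gradient analysis, assuming the depth image is a 2-D array in the same units as the threshold; gThresh = 250 follows the text:

```python
import cv2
import numpy as np

def depth_gradient_main_direction(depth, g_thresh=250.0, bins=36):
    """Gradient of the depth image with the [-1, 0, 1] kernels and the main gradient direction."""
    d = depth.astype(np.float32)
    kx = np.array([[-1.0, 0.0, 1.0]], dtype=np.float32)
    gx = cv2.filter2D(d, -1, kx)        # gradient along x
    gy = cv2.filter2D(d, -1, kx.T)      # gradient along y
    g = np.sqrt(gx ** 2 + gy ** 2)      # gradient magnitude
    theta = np.arctan2(gy, gx)          # gradient direction
    strong = g > g_thresh               # only here is there enough depth for the clamping plates
    hist, edges = np.histogram(theta[strong], bins=bins, range=(-np.pi, np.pi))
    main_dir = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
    return g, theta, strong, main_dir
```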
In the process of grabbing a transparent object by a robot, two contact lines exist when a clamping jaw is in contact with the object, and the conditions for selecting the proper contact lines are as follows:
the gradient directions of two contact lines are basically opposite;
the distance between the two contact lines does not exceed the maximum opening distance of the gripper;
the depth of the two contact lines is not more than 1/2 of the maximum depth in the clamping jaw;
the depth difference between the shallowest point in the area contained between the two contact lines and the shallowest point of the contact line does not exceed the internal depth of the clamping jaw;
The following formula (12) is used to evaluate the grasping reliability of a pair of contact lines:
In equation (12), G represents the grasping reliability; l1 and l2 respectively represent the lengths of the two contact lines between the clamping jaw and the transparent object to be grasped; L represents the width of the clamping jaw; the term (l1 + l2)/(2L) evaluates the length of the contact lines; the term lmin/lmax evaluates the ratio of the lengths of the two contact lines, where lmax denotes the longer contact line and lmin the shorter one; a term formed from dl and ds evaluates how well the contact lines fit the gripper, where dl represents the shallowest point of the contact lines and ds represents the shallowest point in the rectangular frame region; and sinθ evaluates the misalignment of the two contact lines, where θ is the acute angle between the line joining the midpoints of the two contact lines and the contact lines themselves;
all contact-line combinations are traversed with equation (12), and the combination with the highest score is selected as the best grasping position.
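A sketch of this traversal, assuming candidate contact-line pairs have already been extracted from the depth image; the candidate fields and score_fn are hypothetical stand-ins, since the exact form of equation (12) appears in the drawings:

```python
import math

def best_contact_line_pair(candidates, jaw_width, jaw_max_opening, jaw_inner_depth, score_fn):
    """Keep contact-line pairs satisfying the four conditions above and return the best-scoring one."""
    best, best_score = None, float("-inf")
    for c in candidates:
        # 1) gradient directions of the two lines roughly opposite (about pi apart)
        if abs(abs(c["direction_1"] - c["direction_2"]) - math.pi) > 0.35:
            continue
        # 2) distance between the lines fits inside the fully opened gripper
        if c["line_distance"] > jaw_max_opening:
            continue
        # 3) neither line deeper than half of the jaw's maximum inner depth
        if max(c["depth_1"], c["depth_2"]) > 0.5 * jaw_inner_depth:
            continue
        # 4) the region between the lines must not collide with the inside of the jaw
        if abs(c["region_shallowest"] - c["line_shallowest"]) > jaw_inner_depth:
            continue
        score = score_fn(c, jaw_width)   # stands in for the reliability score G of equation (12)
        if score > best_score:
            best, best_score = c, score
    return best, best_score
```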
The invention has the following advantages and positive effects: aimed at the difficulty of grasping transparent objects, the invention provides a ClearGrasp-based deep learning algorithm that can accurately predict the 3D data of highly transparent objects from an RGB-D camera.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for grabbing a transparent object by a robot based on deep learning according to the present invention;
FIG. 2 is a system hardware diagram of the robot based on deep learning for grabbing transparent objects according to the present invention;
fig. 3 is a schematic diagram of the ClearGrasp algorithm model network structure according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It should be noted that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and all other embodiments obtained by those skilled in the art without any inventive work based on the embodiments of the present invention belong to the protection scope of the present invention.
Examples
Fig. 1 is a schematic flow chart of a method for grabbing a transparent object by a robot based on deep learning according to the present invention, and as shown in fig. 1, the present invention provides a method for grabbing a transparent object by a robot based on deep learning, which includes the following steps:
step S1: completing the establishment of a hardware environment of a system for grabbing the transparent object by the robot;
step S2: completing the calibration of a camera of a system for grabbing the transparent object by the robot;
step S3: and finishing the training of a grasping planning model based on the convolutional neural network and the grasping of the robot in a real environment.
Specifically, the hardware environment of the system for grabbing the transparent object by the robot is shown in fig. 2 and comprises an Intel RealSense depth camera, at least one Ubuntu 18.04 computer running ROS, at least one UR5 robot equipped with a gripper, and at least one object to be grabbed;
the Intel RealSense depth camera is used for collecting 3D visual data and is mounted on the UR5 robot;
the Ubuntu 18.04 computer is used for training the grasping network model;
the UR5 robot is used to grasp the objects to be grabbed.
Specifically, when the depth camera photographs an object it captures a depth image and a color image at the same time; during camera calibration both the color image and the depth image need to be calibrated, so that through calibration each pixel point of the depth image corresponds to a pixel point of the color image. The step S2 specifically includes the following steps:
step S21: determining internal parameters and external parameters of a binocular camera through camera calibration, and completing the transformation from a world coordinate system to a camera coordinate system;
step S22: and determining the relative position between the camera and the end effector through hand-eye calibration, and finishing the transformation of a camera coordinate system and a robot end effector coordinate system.
Specifically, the method for implementing step S21 includes:
The transformation from the world coordinate system to the camera coordinate system is described by a rotation matrix R and a translation vector T, as shown in equation (1):

[X1, Y1, Z1]^T = R1·[XW, YW, ZW]^T + T1, [X2, Y2, Z2]^T = R2·[XW, YW, ZW]^T + T2 (1)

In equation (1), R1, T1 are the extrinsic parameters of the left-eye camera and R2, T2 are the extrinsic parameters of the right-eye camera, both of which can be obtained by camera calibration; (XW, YW, ZW) are the coordinates of a point in space in the world coordinate system, (X1, Y1, Z1) are its coordinates in the left-eye camera coordinate system, and (X2, Y2, Z2) are its coordinates in the right-eye camera coordinate system;
taking the left-eye camera coordinate system as the reference and letting R' be the rotation matrix and T' the translation vector from the right-eye camera coordinate system to the left-eye camera coordinate system, then:

[X1, Y1, Z1]^T = R'·[X2, Y2, Z2]^T + T' (2)

from equation (1) and equation (2):

R' = R1·R2^(-1), T' = T1 − R1·R2^(-1)·T2 (3)

The position of the calibration plate is kept unchanged while shooting with the binocular camera; the left-eye and right-eye cameras photograph the calibration plate at the same time, several groups of image pairs are collected and imported into the MATLAB calibration toolbox, the toolbox automatically calculates the rotation matrix and translation vector between the two cameras, and with this rotation matrix and translation vector the transformation from the world coordinate system to the camera coordinate system can be completed.
Specifically, the method for implementing step S22 includes:
The transformation from the camera coordinate system to the robot end-effector coordinate system is solved by hand-eye calibration, where the hand is the end effector and the eye is the camera; the hand-eye calibration involves 4 coordinate systems, namely the calibration plate coordinate system B, the camera coordinate system C, the end-effector coordinate system T and the robot base coordinate system R;
the transformation matrix T_B^R is used to describe the transformation from the calibration plate coordinate system B to the robot base coordinate system R and is represented as follows:

T_B^R = T_T^R · T_C^T · T_B^C (4)

In equation (4), T_B^C denotes the transformation matrix from the calibration plate coordinate system B to the camera coordinate system C, i.e. the camera extrinsic parameters, which can be obtained by camera calibration; T_T^R denotes the transformation matrix from the end-effector coordinate system T to the robot base coordinate system R, which can be obtained from the parameters on the robot teach pendant; T_C^T is the hand-eye matrix to be solved;
during the calibration process the position of the calibration plate is kept unchanged, the robot is controlled to photograph the calibration plate from different positions, and two positions are selected for analysis, so that equation (5) is obtained:

T_T^R(i) · T_C^T · T_B^C(i) = T_T^R(i+1) · T_C^T · T_B^C(i+1) (5)

In equation (5), T_T^R(i) and T_T^R(i+1) respectively denote the transformation matrices from the end-effector coordinate system T to the robot base coordinate system R at position i and position i+1, T_B^C(i) and T_B^C(i+1) respectively denote the transformation matrices from the calibration plate coordinate system B to the camera coordinate system C at position i and position i+1, and T_C^T is the hand-eye matrix to be solved, which is the same at both positions; because the relative position between the calibration plate and the robot base does not change and the relative position between the robot end effector and the camera does not change, both sides describe the same calibration-plate-to-base transformation, and rearranging gives equation (6):

(T_T^R(i+1))^(-1) · T_T^R(i) · T_C^T = T_C^T · T_B^C(i+1) · (T_B^C(i))^(-1) (6)

In equation (6), T_T^R(i), T_T^R(i+1), T_B^C(i) and T_B^C(i+1) are all known quantities; solving this equation finally yields T_C^T, i.e. the transformation matrix from the camera coordinate system to the robot end-effector coordinate system.
Specifically, the method for implementing step S3 includes:
s31: utilizing a RealSense RGB-D camera to scan and capture a color image and a depth image of a transparent object;
s32: filtering the acquired image;
s33: completing transparent object detection and segmentation by using a ClearGrasp deep learning algorithm;
s34: and searching and scoring the grabbing position of the object by using a contact line searching method, and accurately grabbing the object after the optimal grabbing position is obtained.
Specifically, in step S32, a Gaussian filtering algorithm, which balances speed and effect, is selected to filter the acquired images, the Gaussian filter being shown in equation (7):

f(x, y) = (1/(2πσ²))·exp(−(x² + y²)/(2σ²)) (7)

In equation (7), f(x, y) represents the value of the Gaussian function, x² and y² represent the squared distances of the other pixels in the neighborhood from the center pixel of the neighborhood along the two axes, and σ represents the standard deviation.
Specifically, the ClearGrasp deep learning algorithm model network structure is shown in fig. 3, and the specific implementation method of step S33 includes:
predicting a surface normal, identifying a boundary and segmenting a transparent object from the filtered image by adopting a ClearGrasp deep learning method, wherein the segmented mask is used for modifying the input depth image; then, the depth of all the surfaces of the high-transparency objects in the scene is reconstructed by using a global optimization algorithm, and the edges, the occlusion and the segmentation of the 3D reconstruction are optimized by using the predicted surface normal.
Specifically, in step S33, ClearGrasp comprises 3 neural networks, and the outputs of the 3 neural networks are combined for global optimization;
the 3 neural networks include: a transparent object segmentation network, an edge identification network and a surface normal vector estimation network;
transparent object segmentation network: inputting a single RGB picture, and outputting a pixel Mask of a transparent object in a scene, namely judging that each pixel point belongs to a transparent or non-transparent object, and removing the pixel judged as the transparent object in subsequent optimization to obtain a modified depth map;
edge identification network: for a single RGB picture, it outputs occlusion-edge and contact-edge information, which helps the network distinguish the different kinds of edges in the picture and make more accurate predictions at edges where the depth is discontinuous;
surface normal vector estimation: using the RGB picture as input, and performing L2 regularization on the output;
The three-dimensional surface of the transparent object's missing-depth region is reconstructed with the global optimization algorithm: the removed depth region is filled in using the predicted surface normals of the transparent object while observing the depth discontinuities indicated by the occlusion-edge information, which can be expressed by the following formula:

E = λD·ED + λS·ES + λN·EN·B (8)

In equation (8), E denotes the total error of the predicted depth, ED denotes the distance between the predicted depth and the observed original depth, ES denotes the depth difference between adjacent points, EN denotes the consistency between the predicted depth and the predicted surface normals, B denotes a boundary weight based on whether the pixel lies on an occlusion boundary, and λD, λS, λN denote the corresponding weighting coefficients.
Specifically, in step S34, the direction of the best grasping position is the main direction of the object's image gradient; principal-direction extraction is performed on the depth image of the object to speed up the selection of the grasping position, that is, gradient values are calculated along the x-axis and the y-axis, the gradient direction of each pixel is computed, and the directions are binned and counted in a histogram, wherein the object gradient and the gradient direction are calculated as follows:
the two convolution kernels [-1, 0, 1] and [-1, 0, 1]^T are used to perform two-dimensional convolution on the image to obtain the object gradient;
the gradient magnitude and direction are then calculated as:

g = √(gx² + gy²), θ = arctan(gy/gx)

in the above formulas, gx and gy respectively represent the gradient values in the x and y directions, g represents the gradient magnitude, and θ represents the gradient direction;
after the gradient is obtained, a threshold gThresh = 250 is set; only where the gradient is greater than this threshold does the robot have enough depth to place the clamping plates for an effective grasp, i.e. g > gThresh.
In the process of grabbing a transparent object by a robot, two contact lines exist when a clamping jaw is in contact with the object, and the conditions for selecting the proper contact lines are as follows:
the gradient directions of two contact lines are basically opposite;
the distance between the two contact lines does not exceed the maximum opening distance of the gripper;
the depth of the two contact lines is not more than 1/2 of the maximum depth in the clamping jaw;
the depth difference between the shallowest point in the area contained between the two contact lines and the shallowest point of the contact line does not exceed the internal depth of the clamping jaw;
The following formula (12) is used to evaluate the grasping reliability of a pair of contact lines:
In equation (12), G represents the grasping reliability; l1 and l2 respectively represent the lengths of the two contact lines between the clamping jaw and the transparent object to be grasped; L represents the width of the clamping jaw; the term (l1 + l2)/(2L) evaluates the length of the contact lines; the term lmin/lmax evaluates the ratio of the lengths of the two contact lines, where lmax denotes the longer contact line and lmin the shorter one; a term formed from dl and ds evaluates how well the contact lines fit the gripper, where dl represents the shallowest point of the contact lines and ds represents the shallowest point in the rectangular frame region; and sinθ evaluates the misalignment of the two contact lines, where θ is the acute angle between the line joining the midpoints of the two contact lines and the contact lines themselves;
all contact-line combinations are traversed with equation (12), and the combination with the highest score is selected as the best grasping position.
The invention has the following advantages and positive effects: aimed at the difficulty of grasping transparent objects, the invention provides a ClearGrasp-based deep learning algorithm that can accurately predict the 3D data of highly transparent objects from an RGB-D camera.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the appended claims.
Claims (10)
1. A method for grabbing a transparent object by a robot based on deep learning is characterized by comprising the following steps:
step S1: completing the establishment of a hardware environment of a system for grabbing the transparent object by the robot;
step S2: completing the calibration of a camera of a system for grabbing the transparent object by the robot;
step S3: and finishing the training of a grasping planning model based on the convolutional neural network and the grasping of the robot in a real environment.
2. The method for grabbing transparent objects by a robot based on deep learning of claim 1, wherein the hardware environment of the system for grabbing transparent objects by a robot comprises a depth camera, at least one computer running ROS, at least one robot equipped with a gripper, and at least one object to be grabbed;
the depth camera is used for acquiring 3D visual data and is installed on the robot;
the computer is used for finishing the training of grabbing the network model;
the robot is used for grabbing an object to be grabbed.
3. The method for grabbing a transparent object by a robot based on deep learning of claim 1, wherein when the camera shoots the object, the camera captures a depth image and a color image at the same time, when the camera is calibrated, both the color image and the depth image need to be calibrated, and each pixel point of the depth image corresponds to each pixel point of the color image through calibration, the step S2 specifically includes the following steps:
step S21: determining internal parameters and external parameters of a binocular camera through camera calibration, and completing the transformation from a world coordinate system to a camera coordinate system;
step S22: and determining the relative position between the camera and the end effector through hand-eye calibration, and finishing the transformation of a camera coordinate system and a robot end effector coordinate system.
4. The method for grabbing the transparent object by the deep learning based robot according to claim 3, wherein the step S21 is implemented by the method comprising:
The transformation from the world coordinate system to the camera coordinate system is described by a rotation matrix R and a translation vector T, as shown in equation (1):

[X1, Y1, Z1]^T = R1·[XW, YW, ZW]^T + T1, [X2, Y2, Z2]^T = R2·[XW, YW, ZW]^T + T2 (1)

In equation (1), R1, T1 are the extrinsic parameters of the left-eye camera and R2, T2 are the extrinsic parameters of the right-eye camera, both obtained by camera calibration; (XW, YW, ZW) are the coordinates of a point in space in the world coordinate system, (X1, Y1, Z1) are its coordinates in the left-eye camera coordinate system, and (X2, Y2, Z2) are its coordinates in the right-eye camera coordinate system;
taking the left-eye camera coordinate system as the reference and letting R' be the rotation matrix and T' the translation vector from the right-eye camera coordinate system to the left-eye camera coordinate system, then:

[X1, Y1, Z1]^T = R'·[X2, Y2, Z2]^T + T' (2)

from equation (1) and equation (2):

R' = R1·R2^(-1), T' = T1 − R1·R2^(-1)·T2 (3)

The position of the calibration plate is kept unchanged while shooting with the binocular camera; the left-eye and right-eye cameras photograph the calibration plate at the same time, several groups of image pairs are collected and imported into the calibration toolbox, the toolbox automatically calculates the rotation matrix and translation vector between the two cameras, and with this rotation matrix and translation vector the transformation from the world coordinate system to the camera coordinate system is completed.
5. The method for grabbing the transparent object by the deep learning based robot according to claim 3, wherein the step S22 is implemented by the method comprising:
The transformation from the camera coordinate system to the robot end-effector coordinate system is solved by hand-eye calibration, where the hand is the end effector and the eye is the camera; the hand-eye calibration involves 4 coordinate systems, namely the calibration plate coordinate system B, the camera coordinate system C, the end-effector coordinate system T and the robot base coordinate system R;
the transformation matrix T_B^R is used to describe the transformation from the calibration plate coordinate system B to the robot base coordinate system R and is represented as follows:

T_B^R = T_T^R · T_C^T · T_B^C (4)

In equation (4), T_B^C denotes the transformation matrix from the calibration plate coordinate system B to the camera coordinate system C, i.e. the camera extrinsic parameters, which is obtained by camera calibration; T_T^R denotes the transformation matrix from the end-effector coordinate system T to the robot base coordinate system R, which is obtained from the parameters on the robot teach pendant; T_C^T is the hand-eye matrix to be solved;
during the calibration process the position of the calibration plate is kept unchanged, the robot is controlled to photograph the calibration plate from different positions, and two positions are selected for analysis, so that equation (5) is obtained:

T_T^R(i) · T_C^T · T_B^C(i) = T_T^R(i+1) · T_C^T · T_B^C(i+1) (5)

In equation (5), T_T^R(i) and T_T^R(i+1) respectively denote the transformation matrices from the end-effector coordinate system T to the robot base coordinate system R at position i and position i+1, T_B^C(i) and T_B^C(i+1) respectively denote the transformation matrices from the calibration plate coordinate system B to the camera coordinate system C at position i and position i+1, and T_C^T is the hand-eye matrix to be solved, which is the same at both positions; because the relative position between the calibration plate and the robot base does not change and the relative position between the robot end effector and the camera does not change, both sides describe the same calibration-plate-to-base transformation, and rearranging gives equation (6):

(T_T^R(i+1))^(-1) · T_T^R(i) · T_C^T = T_C^T · T_B^C(i+1) · (T_B^C(i))^(-1) (6)
6. The method for grabbing the transparent object by the deep learning based robot according to claim 1, wherein the step S3 is implemented by a method comprising:
s31: utilizing a depth camera to scan and capture a color image and a depth image of a transparent object;
s32: filtering the acquired image;
s33: completing transparent object detection and segmentation by using a ClearGrasp deep learning algorithm;
s34: and searching and scoring the grabbing position of the object by using a contact line searching method, and accurately grabbing the object after the optimal grabbing position is obtained.
7. The method for grabbing a transparent object by a deep learning based robot according to claim 6, wherein in step S32, a Gaussian filtering algorithm, which balances speed and effect, is selected to filter the captured images, the Gaussian filter being shown in equation (7):

f(x, y) = (1/(2πσ²))·exp(−(x² + y²)/(2σ²)) (7)

In equation (7), f(x, y) represents the value of the Gaussian function, x² and y² represent the squared distances of the other pixels in the neighborhood from the center pixel of the neighborhood along the two axes, and σ represents the standard deviation.
8. The method for grabbing the transparent object by the deep learning based robot as claimed in claim 6, wherein the specific implementation method of step S33 includes:
predicting a surface normal, identifying a boundary and segmenting a transparent object from the filtered image by adopting a ClearGrasp deep learning method, wherein the segmented mask is used for modifying the input depth image; then, the depth of all the surfaces of the high-transparency objects in the scene is reconstructed by using a global optimization algorithm, and the edges, the occlusion and the segmentation of the 3D reconstruction are optimized by using the predicted surface normal.
9. The method for grabbing the transparent object by the deep learning based robot as claimed in claim 6, wherein in step S33, ClearGrasp comprises 3 neural networks, and the outputs of the 3 neural networks are combined for global optimization;
the 3 neural networks include: a transparent object segmentation network, an edge identification network and a surface normal vector estimation network;
transparent object segmentation network: inputting a single RGB picture, and outputting a pixel Mask of a transparent object in a scene, namely judging that each pixel point belongs to a transparent or non-transparent object, and removing the pixel judged as the transparent object in subsequent optimization to obtain a modified depth map;
edge identification network: for a single RGB picture, it outputs occlusion-edge and contact-edge information, which helps the network distinguish the different kinds of edges in the picture and make more accurate predictions at edges where the depth is discontinuous;
surface normal vector estimation: using the RGB picture as input, and performing L2 regularization on the output;
The three-dimensional surface of the transparent object's missing-depth region is reconstructed with the global optimization algorithm: the removed depth region is filled in using the predicted surface normals of the transparent object while observing the depth discontinuities indicated by the occlusion-edge information, which is expressed by the following formula:

E = λD·ED + λS·ES + λN·EN·B (8)

In equation (8), E denotes the total error of the predicted depth, ED denotes the distance between the predicted depth and the observed original depth, ES denotes the depth difference between adjacent points, EN denotes the consistency between the predicted depth and the predicted surface normals, B denotes a boundary weight based on whether the pixel lies on an occlusion boundary, and λD, λS, λN denote the corresponding weighting coefficients.
10. The method for grabbing transparent objects by a robot based on deep learning as claimed in claim 6, wherein in step S34, the direction of the best grabbing position is the main direction of the object image gradient, the main position extraction is performed on the depth image of the object to accelerate the speed of selecting the grabbing position, i.e. the gradient values are calculated on the x-axis and the y-axis, and the gradient direction of each pixel is calculated and arranged and counted by a histogram, wherein the object gradient calculation and the gradient direction calculation are performed by the following methods:
the two convolution kernels [-1, 0, 1] and [-1, 0, 1]^T are used to perform two-dimensional convolution on the image to obtain the object gradient;
the gradient magnitude and direction are then calculated as:

g = √(gx² + gy²), θ = arctan(gy/gx)

in the above formulas, gx and gy respectively represent the gradient values in the x and y directions, g represents the gradient magnitude, and θ represents the gradient direction;
after the gradient is obtained, a threshold gThresh = 250 is set; only where the gradient is greater than this threshold does the robot have enough depth to place the clamping plates for an effective grasp, i.e. g > gThresh.
In the process of grabbing a transparent object by a robot, two contact lines exist when a clamping jaw is in contact with the object, and the conditions for selecting the proper contact lines are as follows:
the gradient directions of two contact lines are basically opposite;
the distance between the two contact lines does not exceed the maximum opening distance of the gripper;
the depth of the two contact lines is not more than 1/2 of the maximum depth in the clamping jaw;
the depth difference between the shallowest point in the area contained between the two contact lines and the shallowest point of the contact line does not exceed the internal depth of the clamping jaw;
The following formula (12) is used to evaluate the grasping reliability of a pair of contact lines:
In equation (12), G represents the grasping reliability; l1 and l2 respectively represent the lengths of the two contact lines between the clamping jaw and the transparent object to be grasped; L represents the width of the clamping jaw; the term (l1 + l2)/(2L) evaluates the length of the contact lines; the term lmin/lmax evaluates the ratio of the lengths of the two contact lines, where lmax denotes the longer contact line and lmin the shorter one; a term formed from dl and ds evaluates how well the contact lines fit the gripper, where dl represents the shallowest point of the contact lines and ds represents the shallowest point in the rectangular frame region; and sinθ evaluates the misalignment of the two contact lines, where θ is the acute angle between the line joining the midpoints of the two contact lines and the contact lines themselves;
all contact-line combinations are traversed with equation (12), and the combination with the highest score is selected as the best grasping position.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010755192.2A CN112045676A (en) | 2020-07-31 | 2020-07-31 | Method for grabbing transparent object by robot based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010755192.2A CN112045676A (en) | 2020-07-31 | 2020-07-31 | Method for grabbing transparent object by robot based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112045676A true CN112045676A (en) | 2020-12-08 |
Family
ID=73601341
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010755192.2A Pending CN112045676A (en) | 2020-07-31 | 2020-07-31 | Method for grabbing transparent object by robot based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112045676A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112802105A (en) * | 2021-02-05 | 2021-05-14 | 梅卡曼德(北京)机器人科技有限公司 | Object grabbing method and device |
CN112809679A (en) * | 2021-01-25 | 2021-05-18 | 清华大学深圳国际研究生院 | Method and device for grabbing deformable object and computer readable storage medium |
CN113160313A (en) * | 2021-03-03 | 2021-07-23 | 广东工业大学 | Transparent object grabbing control method and device, terminal and storage medium |
CN114049399A (en) * | 2022-01-13 | 2022-02-15 | 上海景吾智能科技有限公司 | Mirror positioning method combining RGBD image |
CN114055501A (en) * | 2021-11-17 | 2022-02-18 | 长春理工大学 | Robot grabbing system and control method thereof |
CN114627359A (en) * | 2020-12-08 | 2022-06-14 | 山东新松工业软件研究院股份有限公司 | Out-of-order stacked workpiece grabbing priority evaluation method |
CN114750164A (en) * | 2022-05-25 | 2022-07-15 | 清华大学深圳国际研究生院 | Transparent object grabbing method and system and computer readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105014678A (en) * | 2015-07-16 | 2015-11-04 | 深圳市得意自动化科技有限公司 | Robot hand-eye calibration method based on laser range finding |
CN108898634A (en) * | 2018-07-06 | 2018-11-27 | 张显磊 | Pinpoint method is carried out to embroidery machine target pinprick based on binocular camera parallax |
CN110246193A (en) * | 2019-06-20 | 2019-09-17 | 南京博蓝奇智能科技有限公司 | Industrial robot end camera online calibration method |
CN110370286A (en) * | 2019-08-13 | 2019-10-25 | 西北工业大学 | Dead axle motion rigid body spatial position recognition methods based on industrial robot and monocular camera |
-
2020
- 2020-07-31 CN CN202010755192.2A patent/CN112045676A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105014678A (en) * | 2015-07-16 | 2015-11-04 | 深圳市得意自动化科技有限公司 | Robot hand-eye calibration method based on laser range finding |
CN108898634A (en) * | 2018-07-06 | 2018-11-27 | 张显磊 | Pinpoint method is carried out to embroidery machine target pinprick based on binocular camera parallax |
CN110246193A (en) * | 2019-06-20 | 2019-09-17 | 南京博蓝奇智能科技有限公司 | Industrial robot end camera online calibration method |
CN110370286A (en) * | 2019-08-13 | 2019-10-25 | 西北工业大学 | Dead axle motion rigid body spatial position recognition methods based on industrial robot and monocular camera |
Non-Patent Citations (3)
Title |
---|
Shreeyak S. Sajjan et al., "ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation", arXiv:1910.02550v2 [cs.CV], 14 Oct 2019 * |
Lu Anxiao, "Research on Workpiece Positioning Technology Based on Binocular Stereo Vision", Information Science and Technology Series * |
Wang Junyi, "Research on Key Technologies of Robot Grasping Based on Visual Servoing", Information Science and Technology Series * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114627359A (en) * | 2020-12-08 | 2022-06-14 | 山东新松工业软件研究院股份有限公司 | Out-of-order stacked workpiece grabbing priority evaluation method |
CN112809679A (en) * | 2021-01-25 | 2021-05-18 | 清华大学深圳国际研究生院 | Method and device for grabbing deformable object and computer readable storage medium |
CN112802105A (en) * | 2021-02-05 | 2021-05-14 | 梅卡曼德(北京)机器人科技有限公司 | Object grabbing method and device |
CN113160313A (en) * | 2021-03-03 | 2021-07-23 | 广东工业大学 | Transparent object grabbing control method and device, terminal and storage medium |
CN114055501A (en) * | 2021-11-17 | 2022-02-18 | 长春理工大学 | Robot grabbing system and control method thereof |
CN114049399A (en) * | 2022-01-13 | 2022-02-15 | 上海景吾智能科技有限公司 | Mirror positioning method combining RGBD image |
CN114049399B (en) * | 2022-01-13 | 2022-04-12 | 上海景吾智能科技有限公司 | Mirror positioning method combining RGBD image |
CN114750164A (en) * | 2022-05-25 | 2022-07-15 | 清华大学深圳国际研究生院 | Transparent object grabbing method and system and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112045676A (en) | Method for grabbing transparent object by robot based on deep learning | |
CN112270249B (en) | Target pose estimation method integrating RGB-D visual characteristics | |
CN103530599B (en) | The detection method and system of a kind of real human face and picture face | |
CN105279372B (en) | A kind of method and apparatus of determining depth of building | |
JP4429298B2 (en) | Object number detection device and object number detection method | |
CN108470356B (en) | Target object rapid ranging method based on binocular vision | |
US11948344B2 (en) | Method, system, medium, equipment and terminal for inland vessel identification and depth estimation for smart maritime | |
CN110509273B (en) | Robot manipulator detection and grabbing method based on visual deep learning features | |
US20150317821A1 (en) | Geodesic Distance Based Primitive Segmentation and Fitting for 3D Modeling of Non-Rigid Objects from 2D Images | |
CN115272271A (en) | Pipeline defect detecting and positioning ranging system based on binocular stereo vision | |
WO2014044126A1 (en) | Coordinate acquisition device, system and method for real-time 3d reconstruction, and stereoscopic interactive device | |
CN107274483A (en) | A kind of object dimensional model building method | |
CN105894574A (en) | Binocular three-dimensional reconstruction method | |
CN113850865A (en) | Human body posture positioning method and system based on binocular vision and storage medium | |
CN110648362B (en) | Binocular stereo vision badminton positioning identification and posture calculation method | |
WO2023070312A1 (en) | Image processing method | |
CN113393439A (en) | Forging defect detection method based on deep learning | |
CN111996883A (en) | Method for detecting width of road surface | |
WO2024148645A1 (en) | Apparatus for estimating under monocular infrared thermal imaging vision pose of object grasped by manipulator, and method thereof | |
CN114742888A (en) | 6D attitude estimation method based on deep learning | |
CN116152697A (en) | Three-dimensional model measuring method and related device for concrete structure cracks | |
CN112712059A (en) | Living body face recognition method based on infrared thermal image and RGB image | |
CN104243970A (en) | 3D drawn image objective quality evaluation method based on stereoscopic vision attention mechanism and structural similarity | |
JP4935769B2 (en) | Plane region estimation apparatus and program | |
JP4918615B2 (en) | Object number detection device and object number detection method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |