Disclosure of Invention
In view of the above, there is a need to provide a method for a robot to grasp a transparent object based on deep learning, in which the three-dimensional geometry of the transparent object is accurately estimated from an RGB-D image by a deep learning method so that the robot can operate on it, thereby solving the task of a robot grasping transparent objects in a home scene.
In order to achieve the above purpose, the invention is realized by the following technical scheme:
a method for grabbing a transparent object by a robot based on deep learning comprises the following steps:
step S1: completing the establishment of a hardware environment of a system for grabbing the transparent object by the robot;
step S2: completing the calibration of a camera of a system for grabbing the transparent object by the robot;
step S3: completing the training of a grasp planning model based on a convolutional neural network and the grasping of the object by the robot in a real environment.
Further, the hardware environment of the system for grabbing the transparent object by the robot comprises a depth camera, at least one computer running ROS, at least one robot with a gripper, and at least one object to be grabbed;
the depth camera is used for acquiring 3D visual data and is installed on the robot;
the computer is used for completing the training of the grasping network model;
the robot is used for grabbing an object to be grabbed.
Further, when the camera shoots an object, it captures a depth image and a color image at the same time; when the camera is calibrated, the color image and the depth image need to be calibrated so that each pixel of the depth image corresponds to a pixel of the color image, where step S2 specifically includes the following steps:
step S21: determining internal parameters and external parameters of a binocular camera through camera calibration, and completing the transformation from a world coordinate system to a camera coordinate system;
step S22: determining the relative position between the camera and the end effector through hand-eye calibration, and completing the transformation from the camera coordinate system to the robot end effector coordinate system.
Further, the specific implementation method of step S21 includes:
the transformation from the world coordinate system to the camera coordinate system is described by a rotation matrix R and a translation vector T, as shown in equation (1):

$$\begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \end{bmatrix} = R_1 \begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} + T_1, \qquad \begin{bmatrix} X_2 \\ Y_2 \\ Z_2 \end{bmatrix} = R_2 \begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} + T_2 \qquad (1)$$

in equation (1), $R_1$, $T_1$ are the extrinsic parameters of the left-eye camera and $R_2$, $T_2$ are the extrinsic parameters of the right-eye camera, both obtained through camera calibration; $(X_W, Y_W, Z_W)$ are the coordinates of a point in space in the world coordinate system, $(X_1, Y_1, Z_1)$ are its coordinates in the left-eye camera coordinate system, and $(X_2, Y_2, Z_2)$ are its coordinates in the right-eye camera coordinate system;
taking the left-eye camera coordinate system as the reference, and letting the rotation matrix from the right-eye camera coordinate system to the left-eye camera coordinate system be R' and the translation vector be T', then:

$$\begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \end{bmatrix} = R' \begin{bmatrix} X_2 \\ Y_2 \\ Z_2 \end{bmatrix} + T' \qquad (2)$$

from equations (1) and (2):

$$R' = R_1 R_2^{-1}, \qquad T' = T_1 - R_1 R_2^{-1} T_2 \qquad (3)$$
when shooting with the binocular camera, the position of the calibration plate is kept unchanged and the left-eye and right-eye cameras capture images of the calibration plate simultaneously; after several groups of image pairs are collected, they are imported into the calibration toolbox, which automatically calculates the rotation matrix and translation vector between the two cameras, and these are used to complete the transformation from the world coordinate system to the camera coordinate system.
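For illustration only, the binocular calibration above can also be reproduced with OpenCV instead of the calibration toolbox referred to in the text; the sketch below is a minimal example in which the chessboard size, square size and image paths are assumptions, not values from the invention.

import cv2
import numpy as np
import glob

BOARD = (9, 6)          # assumed inner-corner count of the chessboard calibration plate
SQUARE = 0.025          # assumed square size in metres

# 3D coordinates of the board corners in the world (calibration-plate) frame
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

obj_pts, left_pts, right_pts = [], [], []
for lf, rf in zip(sorted(glob.glob("left/*.png")), sorted(glob.glob("right/*.png"))):
    gl = cv2.imread(lf, cv2.IMREAD_GRAYSCALE)
    gr = cv2.imread(rf, cv2.IMREAD_GRAYSCALE)
    okl, cl = cv2.findChessboardCorners(gl, BOARD)
    okr, cr = cv2.findChessboardCorners(gr, BOARD)
    if okl and okr:
        obj_pts.append(objp)
        left_pts.append(cl)
        right_pts.append(cr)

# Intrinsic (internal) parameters of each camera
_, K1, D1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, gl.shape[::-1], None, None)
_, K2, D2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, gr.shape[::-1], None, None)

# Stereo calibration: OpenCV returns R, T mapping points from the first (left) camera
# frame to the second (right) camera frame; inverting gives the right-to-left
# transform R', T' of equations (2)-(3).
_, K1, D1, K2, D2, R, T, _, _ = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, K1, D1, K2, D2, gl.shape[::-1],
    flags=cv2.CALIB_FIX_INTRINSIC)
R_prime = R.T
T_prime = -R.T @ T
print("R' =", R_prime, "\nT' =", T_prime)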
Further, the specific implementation method of step S22 includes:
the transformation from the camera coordinate system to the robot end effector coordinate system is solved through hand-eye calibration, where the "hand" denotes the end effector and the "eye" denotes the camera; the hand-eye calibration process involves 4 coordinate systems, namely the calibration plate coordinate system B, the camera coordinate system C, the end effector coordinate system T and the robot base coordinate system R;
a transformation matrix $T_B^R$ is used to describe the transformation from the calibration plate coordinate system B to the robot base coordinate system R, expressed as follows:

$$T_B^R = T_T^R \, T_C^T \, T_B^C \qquad (4)$$

in equation (4), $T_B^C$ denotes the transformation matrix from the calibration plate coordinate system B to the camera coordinate system C, i.e. the camera extrinsic parameters, obtained through camera calibration; $T_T^R$ denotes the transformation matrix from the end effector coordinate system T to the robot base coordinate system R, obtained from the parameters on the robot teach pendant; and $T_C^T$, the transformation matrix from the camera coordinate system C to the end effector coordinate system T, is the hand-eye matrix to be solved;
in the calibration process, the position of the calibration plate is kept unchanged and the robot is controlled to capture images of the calibration plate from different positions; selecting two positions for analysis gives equation (5):
$$T_B^R(i) = T_T^R(i)\, T_C^T(i)\, T_B^C(i), \qquad T_B^R(i{+}1) = T_T^R(i{+}1)\, T_C^T(i{+}1)\, T_B^C(i{+}1) \qquad (5)$$

in equation (5), $T_B^R(i)$ and $T_B^R(i{+}1)$ denote the transformation matrices from the calibration plate coordinate system B to the robot base coordinate system R at position i and position i+1 respectively, $T_T^R(i)$ and $T_T^R(i{+}1)$ denote the transformation matrices from the end effector coordinate system T to the robot base coordinate system R at position i and position i+1, $T_C^T(i)$ and $T_C^T(i{+}1)$ denote the hand-eye matrices to be solved at position i and position i+1, and $T_B^C(i)$ and $T_B^C(i{+}1)$ denote the transformation matrices from the calibration plate coordinate system B to the camera coordinate system C at position i and position i+1; because the relative position between the calibration plate and the robot base does not change, and the relative position between the robot end effector and the camera does not change, $T_B^R(i) = T_B^R(i{+}1)$ and $T_C^T(i) = T_C^T(i{+}1) = T_C^T$.
Combining these relations simultaneously gives equation (6):

$$\big[T_T^R(i{+}1)\big]^{-1} T_T^R(i)\; T_C^T = T_C^T\; T_B^C(i{+}1)\, \big[T_B^C(i)\big]^{-1} \qquad (6)$$

in equation (6), $T_T^R(i)$, $T_T^R(i{+}1)$, $T_B^C(i)$ and $T_B^C(i{+}1)$ are all known quantities, so $T_C^T$ can finally be solved, i.e. the transformation matrix from the camera coordinate system to the robot end effector coordinate system.
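As a hedged illustration, equation (6) has the classic AX = XB form of eye-in-hand calibration, which OpenCV can solve directly; the sketch below assumes the per-position poses (end effector to base from the teach pendant, calibration plate to camera from the extrinsic calibration) are already collected as lists, and the function name is a hypothetical placeholder.

import cv2
import numpy as np

def solve_hand_eye(R_g2b, t_g2b, R_t2c, t_t2c):
    """R_g2b, t_g2b: rotations/translations from the end effector frame T to the robot
    base frame R at each position (read from the teach pendant);
    R_t2c, t_t2c: rotations/translations from the calibration plate frame B to the
    camera frame C at each position (from camera calibration).
    Returns the 4x4 hand-eye matrix from the camera frame C to the end effector frame T."""
    R_c2g, t_c2g = cv2.calibrateHandEye(
        R_gripper2base=R_g2b, t_gripper2base=t_g2b,
        R_target2cam=R_t2c, t_target2cam=t_t2c,
        method=cv2.CALIB_HAND_EYE_TSAI)
    X = np.eye(4)
    X[:3, :3] = R_c2g
    X[:3, 3] = t_c2g.ravel()
    return X  # corresponds to T_C^T in equations (4)-(6)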
Further, the specific implementation method of step S3 includes:
step S31: using a depth camera to scan and capture a color image and a depth image of the transparent object;
step S32: filtering the acquired images;
step S33: completing transparent object detection and segmentation by using the ClearGrasp deep learning algorithm;
step S34: searching and scoring candidate grabbing positions of the object by a contact line search method, and accurately grabbing the object after the optimal grabbing position is obtained.
Further, in step S32, a Gaussian filtering algorithm, which balances speed and effect, is selected to filter the acquired image; the Gaussian filtering formula is shown in equation (7):

$$f(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \qquad (7)$$

in equation (7), f(x, y) denotes the Gaussian function value, x and y denote the horizontal and vertical distances from a pixel in the neighborhood to the center pixel of the neighborhood, and σ denotes the standard deviation.
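For illustration, a minimal sketch of this filtering step with OpenCV follows; the kernel size and σ are assumed values, not parameters specified by the invention.

import cv2

def denoise(depth_image, color_image, ksize=5, sigma=1.0):
    """Apply Gaussian smoothing to the depth and color images before they are
    passed to the ClearGrasp networks (step S32)."""
    depth_f = cv2.GaussianBlur(depth_image, (ksize, ksize), sigma)
    color_f = cv2.GaussianBlur(color_image, (ksize, ksize), sigma)
    return depth_f, color_f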
Further, the specific implementation method of step S33 includes:
the ClearGrasp deep learning method is adopted to predict surface normals, identify boundaries and segment transparent objects from the filtered image, and the segmentation mask is used to modify the input depth image; a global optimization algorithm is then used to reconstruct the depth of all highly transparent object surfaces in the scene, and the 3D reconstruction is refined using the predicted surface normals together with the edge, occlusion and segmentation predictions.
Further, in step S33, ClearGrasp includes 3 neural networks, and the outputs of the 3 networks are integrated for global optimization;
the 3 neural networks include: a transparent object segmentation network, an edge identification network and a surface normal vector estimation network;
transparent object segmentation network: takes a single RGB picture as input and outputs a pixel mask of the transparent objects in the scene, i.e. it judges whether each pixel belongs to a transparent or non-transparent object; pixels judged to be transparent are removed in the subsequent optimization to obtain a modified depth map;
edge identification network: for a single RGB picture, outputs information on occlusion edges and connecting edges, which helps the network better distinguish different edges in the picture and make more accurate predictions at edges where the depth is discontinuous;
surface normal vector estimation network: takes the RGB picture as input, with L2 regularization applied to the output;
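Purely as a schematic sketch, the three sub-networks could be organized as below; the class and attribute names are hypothetical placeholders rather than the actual ClearGrasp code or API, and showing the "L2" step as unit-normalization of the predicted normals is an interpretation made for this sketch.

import torch

class TransparentPerception(torch.nn.Module):
    def __init__(self, seg_net, edge_net, normal_net):
        super().__init__()
        self.seg_net = seg_net          # transparent object segmentation network
        self.edge_net = edge_net        # occlusion / connecting edge identification network
        self.normal_net = normal_net    # surface normal vector estimation network

    def forward(self, rgb):
        mask = torch.sigmoid(self.seg_net(rgb))           # per-pixel transparency mask
        edges = torch.softmax(self.edge_net(rgb), dim=1)  # edge class probabilities
        normals = self.normal_net(rgb)
        normals = torch.nn.functional.normalize(normals, p=2, dim=1)  # unit-length normals
        return mask, edges, normals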
the global optimization algorithm is used to reconstruct the three-dimensional surface of the transparent object in the regions where depth is missing, filling the removed depth regions with the predicted surface normals of the transparent object while observing the depth discontinuities indicated by the occlusion edges; this is expressed by the following formula:

$$E = \lambda_D E_D + \lambda_S E_S + \lambda_N E_N B \qquad (8)$$

in equation (8), E denotes the overall error of the predicted depth, $E_D$ denotes the distance between the predicted depth and the observed original depth, $E_S$ denotes the depth difference between adjacent points, $E_N$ denotes the consistency between the predicted depth and the predicted surface normals, B weights the normal term according to whether the pixel lies on an occlusion boundary, and $\lambda_D$, $\lambda_S$, $\lambda_N$ denote the corresponding weighting coefficients.
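The following sketch only illustrates, under stated assumptions, how the weighted terms of equation (8) could be evaluated for a candidate depth map; the weight values, the finite-difference approximations and the normal-consistency residual are illustrative choices, not the exact global optimization solved by ClearGrasp.

import numpy as np

def energy(D_pred, D_obs, normals, valid_obs, occlusion_boundary,
           lam_D=1000.0, lam_S=0.001, lam_N=1.0):
    """D_pred, D_obs: HxW predicted / observed depth; normals: HxWx3 unit normals;
    valid_obs: HxW mask of pixels whose observed depth was kept;
    occlusion_boundary: HxW values in [0, 1], close to 1 on depth-discontinuous edges."""
    # E_D: stay close to the observed depth where it is valid
    E_D = np.sum(valid_obs * (D_pred - D_obs) ** 2)

    # E_S: smoothness, penalizing depth differences between adjacent pixels
    E_S = np.sum(np.diff(D_pred, axis=0) ** 2) + np.sum(np.diff(D_pred, axis=1) ** 2)

    # E_N: consistency between the depth gradients and the predicted surface normals,
    # down-weighted by B on occlusion boundaries where the depth is discontinuous
    gy, gx = np.gradient(D_pred)
    nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
    tangency = (nx + nz * gx) ** 2 + (ny + nz * gy) ** 2
    B = 1.0 - occlusion_boundary
    E_N = np.sum(B * tangency)

    return lam_D * E_D + lam_S * E_S + lam_N * E_N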
Further, in step S34, the direction of the best grabbing position is the main direction of the object image gradient; main-direction extraction is performed on the depth image of the object to speed up the selection of the grabbing position, that is, gradient values are calculated along the x-axis and the y-axis, the gradient direction of each pixel is calculated, and the gradient directions are sorted and counted through a histogram, where the object gradient and gradient direction are calculated as follows:
the object gradient is calculated by performing two-dimensional convolution on the image with the two convolution kernels $[-1, 0, 1]$ and $[-1, 0, 1]^T$;
the gradient magnitude and direction are calculated as follows:

$$g = \sqrt{g_x^2 + g_y^2}, \qquad \theta = \arctan\!\left(\frac{g_y}{g_x}\right)$$

in the above formulas, $g_x$ and $g_y$ denote the gradient values in the x and y directions respectively, g denotes the gradient magnitude and θ denotes the gradient direction;
after the gradient is obtained, a threshold $g_{Thresh} = 250$ is set; only when the gradient is greater than this threshold does the robot have enough depth to place the clamping jaws for an effective grasp, i.e. $g > g_{Thresh}$.
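A minimal sketch of this gradient and main-direction computation follows; the histogram bin count is an assumed value for illustration.

import cv2
import numpy as np

def gradient_main_direction(depth_image, g_thresh=250.0, bins=36):
    """Convolve with [-1, 0, 1] and its transpose, threshold the gradient magnitude,
    and return the dominant gradient direction from a histogram of directions."""
    kx = np.array([[-1, 0, 1]], dtype=np.float32)        # horizontal kernel
    ky = kx.T                                            # vertical kernel
    img = depth_image.astype(np.float32)
    gx = cv2.filter2D(img, -1, kx)
    gy = cv2.filter2D(img, -1, ky)

    g = np.sqrt(gx ** 2 + gy ** 2)                       # gradient magnitude
    theta = np.arctan2(gy, gx)                           # gradient direction

    mask = g > g_thresh                                  # keep only strong edges
    hist, edges = np.histogram(theta[mask], bins=bins, range=(-np.pi, np.pi))
    main_dir = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
    return g, theta, mask, main_dir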
In the process of grabbing a transparent object by a robot, two contact lines exist when a clamping jaw is in contact with the object, and the conditions for selecting the proper contact lines are as follows:
the gradient directions of two contact lines are basically opposite;
the distance between the two contact lines does not exceed the maximum opening distance of the gripper;
the depth of the two contact lines is not more than 1/2 of the maximum depth in the clamping jaw;
the depth difference between the shallowest point in the area contained between the two contact lines and the shallowest point of the contact line does not exceed the internal depth of the clamping jaw;
the following formula (12) is used to evaluate the grasping reliability of a pair of contact lines; in it, G denotes the grasping reliability, $l_1$ and $l_2$ denote the lengths of the two contact lines between the clamping jaws and the transparent object to be grasped, and L denotes the width of the clamping jaw; the term in $l_1$, $l_2$ and L evaluates the contact line length; the ratio of the two contact line lengths is evaluated through $l_{max}$, the longer of the two contact lines, and $l_{min}$, the shorter one; the term in $d_l$ and $d_s$ evaluates how well the contact lines fit the gripper, where $d_l$ denotes the shallowest point of the contact lines and $d_s$ denotes the shallowest point in the rectangular frame region; and sin θ evaluates the misalignment of the two contact lines, where θ is the acute angle between the line connecting the midpoints of the two contact lines and the contact lines;
all contact line combinations are traversed through equation (12), and the combination with the highest score is selected as the best grasping position.
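Because the exact form of equation (12) is not reproduced here, the following sketch only mirrors the factors the text names (contact-line lengths, length ratio, fit to the jaw, misalignment angle); the way the terms are combined, and the function names, are illustrative assumptions rather than the invention's scoring formula.

import numpy as np

def score_pair(l1, l2, L, d_l, d_s, theta):
    """l1, l2: contact-line lengths; L: jaw width; d_l: shallowest point on the contact
    lines; d_s: shallowest point in the rectangle between them; theta: acute angle
    between the midpoint connector and the contact lines."""
    length_term = min((l1 + l2) / (2.0 * L), 1.0)        # longer contact is better
    ratio_term = min(l1, l2) / max(l1, l2)               # similar lengths are better
    fit_term = max(0.0, 1.0 - abs(d_l - d_s))            # assumed fit measure
    alignment = 1.0 - np.sin(theta)                      # aligned contact lines are better
    return length_term * ratio_term * fit_term * alignment

def best_grasp(candidates):
    """candidates: iterable of (l1, l2, L, d_l, d_s, theta, pose) tuples that already
    satisfy the four selection conditions listed above; returns the best-scoring pose."""
    scored = [(score_pair(*c[:6]), c[6]) for c in candidates]
    return max(scored, key=lambda s: s[0])[1]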
The invention has the following advantages and positive effects: aiming at the problem that transparent objects are difficult to grasp, the invention provides a ClearGrasp-based deep learning algorithm, by which the 3D data of highly transparent objects can be accurately predicted from an RGB-D camera.
Examples
Fig. 1 is a schematic flow chart of a method for grabbing a transparent object by a robot based on deep learning according to the present invention, and as shown in fig. 1, the present invention provides a method for grabbing a transparent object by a robot based on deep learning, which includes the following steps:
step S1: completing the establishment of a hardware environment of a system for grabbing the transparent object by the robot;
step S2: completing the calibration of a camera of a system for grabbing the transparent object by the robot;
step S3: completing the training of a grasp planning model based on a convolutional neural network and the grasping of the object by the robot in a real environment.
Specifically, the hardware environment of the system for grabbing the transparent object by the robot is shown in fig. 2, and comprises an Intel RealSense depth camera, at least one Ubuntu 18.04 computer running ROS, at least one UR5 robot with a gripper and at least one object to be grabbed;
the Intel RealSense depth camera is used for collecting 3D visual data and is installed on the UR5 robot;
the Ubuntu 18.04 computer is used for completing the training of the grasping network model;
the UR5 robot is used to grab the objects to be grabbed.
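As a hedged illustration of how a depth camera in such a setup can deliver aligned depth and color frames, a minimal pyrealsense2 sketch follows; the stream resolutions and frame rate are assumed values, not parameters specified by the invention.

import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

align = rs.align(rs.stream.color)          # map depth pixels onto color pixels
try:
    frames = align.process(pipeline.wait_for_frames())
    depth_image = np.asanyarray(frames.get_depth_frame().get_data())
    color_image = np.asanyarray(frames.get_color_frame().get_data())
finally:
    pipeline.stop()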
Specifically, when the depth camera shoots an object, it captures a depth image and a color image at the same time; when the camera is calibrated, the color image and the depth image need to be calibrated so that each pixel of the depth image corresponds to a pixel of the color image, where step S2 specifically includes the following steps:
step S21: determining internal parameters and external parameters of a binocular camera through camera calibration, and completing the transformation from a world coordinate system to a camera coordinate system;
step S22: determining the relative position between the camera and the end effector through hand-eye calibration, and completing the transformation from the camera coordinate system to the robot end effector coordinate system.
Specifically, the method for implementing step S21 includes:
the transformation from the world coordinate system to the camera coordinate system is described by a rotation matrix R and a translation vector T, as shown in equation (1):

$$\begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \end{bmatrix} = R_1 \begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} + T_1, \qquad \begin{bmatrix} X_2 \\ Y_2 \\ Z_2 \end{bmatrix} = R_2 \begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} + T_2 \qquad (1)$$

in equation (1), $R_1$, $T_1$ are the extrinsic parameters of the left-eye camera and $R_2$, $T_2$ are the extrinsic parameters of the right-eye camera, which can be obtained through camera calibration; $(X_W, Y_W, Z_W)$ are the coordinates of a point in space in the world coordinate system, $(X_1, Y_1, Z_1)$ are its coordinates in the left-eye camera coordinate system, and $(X_2, Y_2, Z_2)$ are its coordinates in the right-eye camera coordinate system;
taking the left-eye camera coordinate system as the reference, and letting the rotation matrix from the right-eye camera coordinate system to the left-eye camera coordinate system be R' and the translation vector be T', then:

$$\begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \end{bmatrix} = R' \begin{bmatrix} X_2 \\ Y_2 \\ Z_2 \end{bmatrix} + T' \qquad (2)$$

from equations (1) and (2):

$$R' = R_1 R_2^{-1}, \qquad T' = T_1 - R_1 R_2^{-1} T_2 \qquad (3)$$
when shooting with the binocular camera, the position of the calibration plate is kept unchanged and the left-eye and right-eye cameras capture images of the calibration plate simultaneously; after several groups of image pairs are collected, they are imported into the Matlab calibration toolbox, which automatically calculates the rotation matrix and translation vector between the two cameras, and these can be used to complete the transformation from the world coordinate system to the camera coordinate system.
Specifically, the method for implementing step S22 includes:
the transformation from the camera coordinate system to the robot end effector coordinate system is solved through hand-eye calibration, where the "hand" denotes the end effector and the "eye" denotes the camera; the hand-eye calibration process involves 4 coordinate systems, namely the calibration plate coordinate system B, the camera coordinate system C, the end effector coordinate system T and the robot base coordinate system R;
a transformation matrix $T_B^R$ is used to describe the transformation from the calibration plate coordinate system B to the robot base coordinate system R, expressed as follows:

$$T_B^R = T_T^R \, T_C^T \, T_B^C \qquad (4)$$

in equation (4), $T_B^C$ denotes the transformation matrix from the calibration plate coordinate system B to the camera coordinate system C, i.e. the camera extrinsic parameters, which can be obtained through camera calibration; $T_T^R$ denotes the transformation matrix from the end effector coordinate system T to the robot base coordinate system R, which can be obtained from the parameters on the robot teach pendant; and $T_C^T$, the transformation matrix from the camera coordinate system C to the end effector coordinate system T, is the hand-eye matrix to be solved;
in the calibration process, the position of the calibration plate is kept unchanged and the robot is controlled to capture images of the calibration plate from different positions; selecting two positions for analysis gives equation (5):
$$T_B^R(i) = T_T^R(i)\, T_C^T(i)\, T_B^C(i), \qquad T_B^R(i{+}1) = T_T^R(i{+}1)\, T_C^T(i{+}1)\, T_B^C(i{+}1) \qquad (5)$$

in equation (5), $T_B^R(i)$ and $T_B^R(i{+}1)$ denote the transformation matrices from the calibration plate coordinate system B to the robot base coordinate system R at position i and position i+1 respectively, $T_T^R(i)$ and $T_T^R(i{+}1)$ denote the transformation matrices from the end effector coordinate system T to the robot base coordinate system R at position i and position i+1, $T_C^T(i)$ and $T_C^T(i{+}1)$ denote the hand-eye matrices to be solved at position i and position i+1, and $T_B^C(i)$ and $T_B^C(i{+}1)$ denote the transformation matrices from the calibration plate coordinate system B to the camera coordinate system C at position i and position i+1; because the relative position between the calibration plate and the robot base does not change, and the relative position between the robot end effector and the camera does not change, $T_B^R(i) = T_B^R(i{+}1)$ and $T_C^T(i) = T_C^T(i{+}1) = T_C^T$.
Combining these relations simultaneously gives equation (6):

$$\big[T_T^R(i{+}1)\big]^{-1} T_T^R(i)\; T_C^T = T_C^T\; T_B^C(i{+}1)\, \big[T_B^C(i)\big]^{-1} \qquad (6)$$

in equation (6), $T_T^R(i)$, $T_T^R(i{+}1)$, $T_B^C(i)$ and $T_B^C(i{+}1)$ are all known quantities, so $T_C^T$ can finally be solved, i.e. the transformation matrix from the camera coordinate system to the robot end effector coordinate system.
Specifically, the method for implementing step S3 includes:
step S31: using the RealSense RGB-D camera to scan and capture a color image and a depth image of the transparent object;
step S32: filtering the acquired images;
step S33: completing transparent object detection and segmentation by using the ClearGrasp deep learning algorithm;
step S34: searching and scoring candidate grabbing positions of the object by a contact line search method, and accurately grabbing the object after the optimal grabbing position is obtained.
Specifically, in step S32, a Gaussian filtering algorithm, which balances speed and effect, is selected to filter the acquired image; the Gaussian filtering formula is shown in equation (7):

$$f(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \qquad (7)$$

in equation (7), f(x, y) denotes the Gaussian function value, x and y denote the horizontal and vertical distances from a pixel in the neighborhood to the center pixel of the neighborhood, and σ denotes the standard deviation.
Specifically, the network structure of the ClearGrasp deep learning algorithm model is shown in fig. 3, and the specific implementation method of step S33 includes:
the ClearGrasp deep learning method is adopted to predict surface normals, identify boundaries and segment transparent objects from the filtered image, and the segmentation mask is used to modify the input depth image; a global optimization algorithm is then used to reconstruct the depth of all highly transparent object surfaces in the scene, and the 3D reconstruction is refined using the predicted surface normals together with the edge, occlusion and segmentation predictions.
Specifically, in step S33, ClearGrasp includes 3 neural networks, and the outputs of the 3 networks are integrated for global optimization;
the 3 neural networks include: a transparent object segmentation network, an edge identification network and a surface normal vector estimation network;
transparent object segmentation network: takes a single RGB picture as input and outputs a pixel mask of the transparent objects in the scene, i.e. it judges whether each pixel belongs to a transparent or non-transparent object; pixels judged to be transparent are removed in the subsequent optimization to obtain a modified depth map;
edge identification network: for a single RGB picture, outputs information on occlusion edges and connecting edges, which helps the network better distinguish different edges in the picture and make more accurate predictions at edges where the depth is discontinuous;
surface normal vector estimation network: takes the RGB picture as input, with L2 regularization applied to the output;
the global optimization algorithm is used to reconstruct the three-dimensional surface of the transparent object in the regions where depth is missing, filling the removed depth regions with the predicted surface normals of the transparent object while observing the depth discontinuities indicated by the occlusion edges; this can be expressed by the following formula:

$$E = \lambda_D E_D + \lambda_S E_S + \lambda_N E_N B \qquad (8)$$

in equation (8), E denotes the overall error of the predicted depth, $E_D$ denotes the distance between the predicted depth and the observed original depth, $E_S$ denotes the depth difference between adjacent points, $E_N$ denotes the consistency between the predicted depth and the predicted surface normals, B weights the normal term according to whether the pixel lies on an occlusion boundary, and $\lambda_D$, $\lambda_S$, $\lambda_N$ denote the corresponding weighting coefficients.
Specifically, in step S34, the direction of the best grabbing position is the main direction of the object image gradient; main-direction extraction is performed on the depth image of the object to speed up the selection of the grabbing position, that is, gradient values are calculated along the x-axis and the y-axis, the gradient direction of each pixel is calculated, and the gradient directions are sorted and counted through a histogram, where the object gradient and gradient direction are calculated as follows:
the object gradient is calculated by performing two-dimensional convolution on the image with the two convolution kernels $[-1, 0, 1]$ and $[-1, 0, 1]^T$;
the gradient magnitude and direction are calculated as follows:

$$g = \sqrt{g_x^2 + g_y^2}, \qquad \theta = \arctan\!\left(\frac{g_y}{g_x}\right)$$

in the above formulas, $g_x$ and $g_y$ denote the gradient values in the x and y directions respectively, g denotes the gradient magnitude and θ denotes the gradient direction;
after the gradient is obtained, a threshold $g_{Thresh} = 250$ is set; only when the gradient is greater than this threshold does the robot have enough depth to place the clamping jaws for an effective grasp, i.e. $g > g_{Thresh}$.
In the process of grabbing a transparent object by a robot, two contact lines exist when a clamping jaw is in contact with the object, and the conditions for selecting the proper contact lines are as follows:
the gradient directions of two contact lines are basically opposite;
the distance between the two contact lines does not exceed the maximum opening distance of the gripper;
the depth of the two contact lines is not more than 1/2 of the maximum depth in the clamping jaw;
the depth difference between the shallowest point in the area contained between the two contact lines and the shallowest point of the contact line does not exceed the internal depth of the clamping jaw;
the following formula (12) is used to evaluate the grasping reliability of a pair of contact lines; in it, G denotes the grasping reliability, $l_1$ and $l_2$ denote the lengths of the two contact lines between the clamping jaws and the transparent object to be grasped, and L denotes the width of the clamping jaw; the term in $l_1$, $l_2$ and L evaluates the contact line length; the ratio of the two contact line lengths is evaluated through $l_{max}$, the longer of the two contact lines, and $l_{min}$, the shorter one; the term in $d_l$ and $d_s$ evaluates how well the contact lines fit the gripper, where $d_l$ denotes the shallowest point of the contact lines and $d_s$ denotes the shallowest point in the rectangular frame region; and sin θ evaluates the misalignment of the two contact lines, where θ is the acute angle between the line connecting the midpoints of the two contact lines and the contact lines;
all contact line combinations are traversed through equation (12), and the combination with the highest score is selected as the best grasping position.
The invention has the following advantages and positive effects: aiming at the problem that transparent objects are difficult to grasp, the invention provides a ClearGrasp-based deep learning algorithm, by which the 3D data of highly transparent objects can be accurately predicted from an RGB-D camera.
The above-mentioned embodiments only express several embodiments of the present invention, and their description is relatively specific and detailed, but this should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the appended claims.