CN115187781A - Six-degree-of-freedom grabbing detection algorithm based on semantic segmentation network - Google Patents


Info

Publication number
CN115187781A
Authority
CN
China
Prior art keywords
grabbing
degree
freedom
semantic
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210817429.4A
Other languages
Chinese (zh)
Other versions
CN115187781B (en)
Inventor
张向燕
张勤俭
李海源
沈勇
王勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Beijing Information Science and Technology University
Peking University School of Stomatology
Original Assignee
Beijing University of Posts and Telecommunications
Beijing Information Science and Technology University
Peking University School of Stomatology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications, Beijing Information Science and Technology University, Peking University School of Stomatology
Priority to CN202210817429.4A
Publication of CN115187781A
Application granted
Publication of CN115187781B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 - Complex mathematical operations
    • G06F 17/16 - Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 - Computing systems specially adapted for manufacturing

Abstract

The invention discloses a six-degree-of-freedom grabbing detection algorithm based on a semantic segmentation network, which comprises the following steps: acquiring a six-degree-of-freedom grabbing detection data set and converting the grabbing parameter arrays into grabbing parameter semantic label images; scaling, normalizing, and restoring the classifications of the RGB images in the data set and the converted grabbing parameter semantic label images, and dividing them into a training set and a test set; constructing a six-degree-of-freedom grabbing detection semantic segmentation network and training it; feeding RGB image data into the trained six-degree-of-freedom grabbing detection semantic segmentation network to obtain predicted grabbing parameter semantic images; converting the predicted grabbing parameter semantic label images into grabbing parameters relative to the camera coordinate system; and driving a robot equipped with an end effector to grab according to the grabbing parameters. With the six-degree-of-freedom grabbing detection algorithm based on a semantic segmentation network, the network model is more stable and robust, and the prediction results are more accurate.

Description

Six-degree-of-freedom grabbing detection algorithm based on semantic segmentation network
Technical Field
The invention belongs to the technical field of computer vision and robot grabbing, and particularly relates to a six-degree-of-freedom grabbing detection algorithm based on a semantic segmentation network.
Background
Robot grabbing and manipulation are important modes of interaction between a robot and its environment, and grabbing is the more fundamental and important link. Although a large number of researchers currently study the grabbing problem, it remains quite complex: the complexity shows not only in the diversity of grabbing scenes but also in the precision required for grabbing pose analysis, grabbing motion planning, and related aspects.
The robot grabbing problem can be broken down into two main steps: first, grabbing pose detection, and second, motion planning. Detection of the grabbing pose is the key to successful robot grabbing. At present, grabbing pose detection mainly follows two approaches: one is mathematical analysis and the other is data-driven.
The mathematical analysis method needs to obtain the position and posture for grabbing the target object through a series of complex operations; it has low computational efficiency and a complex analysis process, and it is difficult to migrate to new scenes.
With the continuous development of artificial intelligence and machine vision, data-driven neural networks have gradually been applied to grabbing pose detection research. Typical methods include two-dimensional planar grabbing detection and six-degree-of-freedom grabbing detection. Two-dimensional planar grabbing is generally represented by an oriented rectangle; a large number of usable data sets exist, much research has been carried out, and good prediction results have been obtained. However, two-dimensional planar grabbing can only grab perpendicular to a plane and cannot adapt to more flexible and complex scenes. Six-degree-of-freedom grabbing can grab an object from any direction and is therefore more flexible; however, most current research on six-degree-of-freedom grabbing detection takes a depth image or a three-dimensional point cloud as input, which increases prediction instability and is easily affected by depth-image noise. In addition, in six-degree-of-freedom grabbing detection the grabbing attitude is a 3 x 3 matrix, and most studies predict it directly by regression, which leads to non-ideal prediction results.
Disclosure of Invention
The invention aims to provide a six-degree-of-freedom grabbing detection algorithm based on a semantic segmentation network, which can overcome the problems of the existing two-dimensional plane grabbing detection and the existing six-degree-of-freedom grabbing detection, improve the accuracy of a prediction result and adapt to more flexible and complex scenes.
In order to achieve the above purpose, the invention provides a six-degree-of-freedom grabbing detection algorithm based on a semantic segmentation network, which comprises the following steps:
step 1: acquiring a six-degree-of-freedom grabbing detection data set, and converting the grabbing parameter arrays into grabbing parameter semantic label images by the designed data conversion method;
step 2: scaling, normalizing, and restoring the classifications of the RGB images in the data set and the converted grabbing parameter semantic label images, and dividing them into a training set and a test set;
step 3: constructing a six-degree-of-freedom grabbing detection semantic segmentation network, and training the network with the processed data;
step 4: feeding RGB image data into the trained six-degree-of-freedom grabbing detection semantic segmentation network to obtain predicted grabbing parameter semantic images;
step 5: converting the predicted grabbing parameter semantic label images into grabbing parameters relative to the camera coordinate system through the proposed post-processing operations;
step 6: driving the robot equipped with an end effector to grab the object in the scene using the grabbing parameters obtained by post-processing.
Preferably, step 1 specifically comprises:
step 1.1: acquiring a GraspNet-1Billion six-degree-of-freedom grabbing detection data set, reading and converting the grabbing labels in the data set;
step 1.2: converting the specific numerical values of the grabbing depth and the opening and closing width in the processed grabbing label into classifications;
step 1.3: converting the rotation attitude matrix in the processed grabbing label into a rotation vector, decomposing the rotation vector into a unit direction vector and an angle of rotation around the unit direction vector, and obtaining the classifications of the unit direction vector and of the angle of rotation around it through the classification matching templates;
step 1.4: and converting the grabbing parameters into grabbing parameter semantic label images by constructing three-dimensional point cloud of the grabbing parameters.
Preferably, step 2 specifically comprises:
step 2.1: reducing the RGB images in the data set and the converted grabbing parameter semantic label images from the original 720 x 1280 pixel size to 224 x 224 pixels by neighboring-point sampling, so as to improve the model operation speed and adapt the images to the network structure;
step 2.2: dividing the data of the RGB image by 255 for normalization; converting the label images of the captured parameters into actual classification numerical values from the classification in the images;
step 2.3: and randomly scrambling the data in the first 100 scenes for training in the data set according to the proportion of 8:2, and dividing the data into training data and test data.
Preferably, step 3 specifically comprises:
step 3.1: constructing a six-degree-of-freedom grabbing detection semantic segmentation network;
step 3.2: setting the optimizer, the loss function, the batch size for loading data, and the number of iterations;
step 3.3: and loading the training data and the test data after the data processing by using an iterator so as to train the model.
Preferably, step 4 specifically comprises:
step 4.1: performing data preprocessing operation including picture reduction and normalization on the RGB image;
step 4.2: and sending the RGB image subjected to data preprocessing into a six-degree-of-freedom capture detection semantic segmentation network which completes training to obtain a predicted capture parameter semantic image.
Preferably, step 5 specifically comprises:
step 5.1: restoring the predicted 224 x 224 pixel grabbing parameter semantic images to the original 720 x 1280 pixel size by neighboring-point sampling;
step 5.2: filtering out the grabbing parameters that represent the background classification;
step 5.3: restoring the screened predicted grabbing parameters from classifications to the corresponding actual numerical values through a dictionary or an array;
step 5.4: indexing, from the input depth image, the depth at the pixel position of each grabbing point, and then calculating the three-dimensional coordinates of the grabbing point from the depth value;
step 5.5: calculating the rotation vector from its unit direction vector r and the angle θ of rotation around it, and converting the rotation vector into a rotation attitude matrix R to obtain the grabbing parameters G = [X, Y, Z, R, d, w] in the camera coordinate system;
where (X, Y, Z) represents three-dimensional coordinates of the grasping point in the camera coordinate system, R represents a 3 × 3 rotation posture matrix, and d and w are a grasping depth of the gripper near the object in the approaching direction and an opening width of the gripper.
Preferably, step 6 specifically includes:
step 6.1: converting the predicted grabbing parameters relative to the camera coordinate system into grabbing parameters relative to the world coordinate system through a homogeneous transformation matrix of a depth camera used in an actual environment in the world coordinate system;
step 6.2: planning the end-effector pose and the motion path of the robot from the predicted grabbing parameters through inverse kinematics calculation;
step 6.3: the robot is driven to the predicted grasp attitude, and the article is grasped by a gripper provided at the end of the robot.
Therefore, with the six-degree-of-freedom grabbing detection algorithm based on a semantic segmentation network of this structure, taking the RGBD image as input makes the network model more stable and robust, and predicting the grabbing posture by classification rather than regression after the transformation makes the prediction result more accurate.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a schematic diagram illustrating steps of an embodiment of a six-degree-of-freedom capture detection algorithm based on a semantic segmentation network according to the present invention;
FIG. 2 is a schematic flow chart illustrating the process of acquiring the six-degree-of-freedom grabbing detection data set according to an embodiment of the present invention;
FIG. 3 is a flowchart of a data conversion method for converting a capture parameter array into a capture parameter semantic tag image according to an embodiment of the present invention;
FIG. 4 is an exemplary diagram of a captured parameter semantic tag image obtained by conversion according to an embodiment of the present invention;
FIG. 5 is a six-degree-of-freedom capture detection semantic segmentation model according to an embodiment of the present invention;
FIG. 6 is a flow chart of model training, prediction and data post-processing according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating the predicted grasps displayed in the three-dimensional scene point cloud after the encoder-decoder semantic segmentation network and the post-processing, according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further illustrated by the accompanying drawings and examples.
Examples
As shown in fig. 1, a six-degree-of-freedom capture detection algorithm based on a semantic segmentation network includes the following steps:
step 1: and acquiring a six-degree-of-freedom grabbing detection data set, and converting the grabbing parameter array into a grabbing parameter semantic label image by a designed data conversion method.
The step 1 specifically comprises the following steps:
step 1.1: as shown in fig. 2, a GraspNet-1Billion six-degree-of-freedom grabbing detection dataset is obtained, and grabbing tags in the dataset are read and converted.
The labels of the GraspNet-1Billion six-degree-of-freedom grabbing detection data set comprise: collision tags, grabbing tags, and tags of the numbers and poses of the objects in the scene. The grabbing tag gives grabbing poses for the different objects and comprises the coordinates of the grabbing points, the approach vector, the in-plane rotation angle, the grabbing depth, the opening and closing width, and the grabbing score. The approach vector here refers to the direction vector along which the gripper approaches the object, and the in-plane rotation angle refers to the angle around which the gripper rotates. The collision tag gives a binary Boolean attribute for the different objects placed in the scene. The object number and pose tag stores the numbers of the objects placed in the different scenes and their poses in those scenes.
Six-degree-of-freedom grabbing allows the end effector to grab an object from any direction, and the six-degree-of-freedom grab is expressed as follows:
G=[X,Y,Z,R,d,w]
where (X, Y, Z) represents three-dimensional coordinates of the grasping point in the camera coordinate system, R represents a 3 × 3 rotation posture matrix, and d and w are a grasping depth of the gripper near the object in the approaching direction and an opening width of the gripper.
The model output obtained by the six-degree-of-freedom grabbing detection semantic segmentation network is as follows:
G_cls=[point,r,θ,d,w]
the output gripping parameters are all classification numbers, where point denotes whether or not it is a gripping point, r and θ denote classification of a unit direction vector of a rotation vector after converting the rotation posture matrix into the rotation vector and classification of an angle of rotation around the unit direction vector, respectively, and d and w are classification of a gripping depth and classification of an opening width of the gripper.
The specific process of acquiring and screening the grabbing parameters is as follows:
firstly, according to the scene number and the camera view angle number in the data set, the collision tag of the scene, the number of the object in the scene and the pose of the object under the camera view angle corresponding to the scene are indexed.
And secondly, reading in the grabbing label corresponding to the object according to the serial number of the object.
And thirdly, the three-dimensional coordinates of the grabbing points relative to the object coordinate system in the grabbing label are converted into three-dimensional coordinates relative to the camera coordinate system through the poses of the objects, with the calculation formula:
Point_cam = obj_pose · Point_obj
in the formula, Point_obj refers to the three-dimensional coordinates of a grabbing point relative to the object coordinate system in the grabbing label of the data set, Point_cam is the three-dimensional coordinates of the grabbing point relative to the camera coordinate system, and obj_pose is the 4 x 4 pose matrix of the object relative to the camera coordinate system (the point is expressed in homogeneous coordinates for the multiplication).
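For illustration, a minimal NumPy sketch of this frame change follows; the function and array names (points_object_to_camera, points_obj, obj_pose) are hypothetical and not part of the data set's API, and obj_pose is assumed to be the 4 x 4 homogeneous pose of the object in the camera frame.

import numpy as np

def points_object_to_camera(points_obj: np.ndarray, obj_pose: np.ndarray) -> np.ndarray:
    """Transform an N x 3 array of grasp points from the object frame to the
    camera frame using a 4 x 4 homogeneous pose matrix."""
    ones = np.ones((points_obj.shape[0], 1))
    points_h = np.hstack([points_obj, ones])      # N x 4 homogeneous coordinates
    return (obj_pose @ points_h.T).T[:, :3]       # apply the pose, drop the trailing 1s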
And fourthly, calculating to obtain a rotation attitude matrix relative to the object coordinate system according to the number of the proximity vector in the grabbing label and the number of the rotation angle in the plane.
Fifthly, converting the calculated rotation attitude matrix into a rotation attitude matrix under a camera coordinate system through the position and attitude of the object, wherein the calculation formula is as follows;
R = R_obj_pose · R_obj
where view and ang denote the approach vector and the in-plane rotation angle in the grabbing label of the data set, R_1 and R_2 are the 3 x 3 rotation matrices generated from them, R_obj_pose is the 3 x 3 rotation matrix in the pose matrix of the object, R_obj is the rotation attitude matrix relative to the object coordinate system calculated from R_1 and R_2, and R is the rotation attitude matrix relative to the camera coordinate system obtained through the rotation matrix of the object.
And sixthly, screening and retaining a series of grabbing parameters which are not collided and have the highest grabbing scores according to the collision labels and the grabbing scores.
Step 1.2: and converting the specific numerical values of the grabbing depth and the opening and closing width in the grabbing label obtained after the processing into classification.
Firstly, d and w in the grabbing parameters are converted into classification numbers with classification numbers of 4 and 16 respectively through a dictionary:
d_dict={0:0.01,1:0.02,2:0.03,3:0.04}
according to the conversion dictionary for grabbing the depth, the depth data in the d array is converted into corresponding classification from specific numerical values such as 0.01, 0.02 and the like.
Similarly, the opening/closing width w is converted into a classification by a dictionary, and before the classification is converted, a rounding operation for retaining two decimal places is also required for the opening/closing width.
w_dict={0:0, 1:0.01, 2:0.02, 3:0.03, 4:0.04, 5:0.05, 6:0.06, 7:0.07, 8:0.08, 9:0.09, 10:0.10, 11:0.11, 12:0.12, 13:0.13, 14:0.14, 15:0.15}
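A minimal Python sketch of this value-to-class conversion is given below, assuming the class whose dictionary value is nearest to the (rounded) metric value is taken; the helper name value_to_class is illustrative only and not part of the patented implementation.

import numpy as np

d_dict = {0: 0.01, 1: 0.02, 2: 0.03, 3: 0.04}                # 4 grasp-depth classes
w_dict = {i: round(i * 0.01, 2) for i in range(16)}          # 16 width classes: 0.00 ... 0.15

def value_to_class(value: float, table: dict) -> int:
    """Return the class whose table value is closest to the (two-decimal) metric value."""
    value = round(value, 2)
    keys = np.array(list(table.keys()))
    vals = np.array(list(table.values()))
    return int(keys[np.argmin(np.abs(vals - value))])

print(value_to_class(0.02, d_dict), value_to_class(0.07, w_dict))   # -> 1 7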
Step 1.3: and converting the rotation posture matrix in the captured label obtained after the processing into a rotation vector, decomposing the rotation vector into a unit direction vector and an angle rotating around the unit direction vector, and converting to obtain the classification of the unit direction vector and the angle rotating around the unit direction vector through a classification matching template.
To convert the rotation attitude matrix into a classification, the rotation attitude matrix is first converted into a rotation vector, denoted by v = θ · r, where r denotes the unit direction vector of the rotation vector and θ denotes the angle of rotation around the unit direction vector. θ can be solved from the trace of the rotation attitude matrix:
θ = arccos((tr(R) - 1) / 2)
The unit direction vector of the rotation vector is further written as r = (r_x, r_y, r_z); r can then be calculated by the following formula:
r = (1 / (2 · sin θ)) · (R_32 - R_23, R_13 - R_31, R_21 - R_12)
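A small NumPy sketch of this decomposition follows (an epsilon guard is added for the degenerate case sin θ ≈ 0, which the description does not address; the function name is illustrative).

import numpy as np

def rotation_matrix_to_axis_angle(R: np.ndarray, eps: float = 1e-8):
    """Return the unit axis r and angle theta of a 3 x 3 rotation matrix."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    r = np.array([R[2, 1] - R[1, 2],
                  R[0, 2] - R[2, 0],
                  R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta) + eps)
    return r, theta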
next, a unit direction vector and a classification matching template rotated by an angle around the unit direction vector are constructed, and the classification of the template is set to 255 types.
The classification matching template of the unit direction vector is obtained by uniformly sampling 255 points on the three-dimensional unit sphere; these points, together with the sphere-center origin, form 255 unit direction vectors. The calculation formula is as follows:
z_n = (2n - 1) / N - 1
x_n = sqrt(1 - z_n^2) · cos(2πnφ)
y_n = sqrt(1 - z_n^2) · sin(2πnφ)
where (x_n, y_n, z_n) are the coordinates of the n-th sampling point in three-dimensional space, N is the total number of sampling points (here 255), n denotes the n-th sample, and φ is the golden-section ratio, here taking the value (sqrt(5) - 1) / 2 ≈ 0.618.
The sampling points obtained in this way, together with the sphere-center origin, form 255 unit direction vectors that are uniformly distributed in three-dimensional space.
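A sketch of this template construction follows, assuming the standard golden-section (Fibonacci) spiral sampling of the sphere; the exact constants of the patented implementation may differ, and the variable names are illustrative.

import numpy as np

def fibonacci_sphere_directions(n_points: int = 255) -> np.ndarray:
    """Sample n_points approximately uniform unit direction vectors on the sphere."""
    phi = (np.sqrt(5.0) - 1.0) / 2.0              # golden-section ratio, ~0.618
    n = np.arange(1, n_points + 1)
    z = (2.0 * n - 1.0) / n_points - 1.0          # evenly spaced heights in (-1, 1)
    radius = np.sqrt(1.0 - z ** 2)
    angle = 2.0 * np.pi * phi * n
    return np.stack([radius * np.cos(angle), radius * np.sin(angle), z], axis=1)

axis_templates = fibonacci_sphere_directions(255)     # 255 x 3 unit direction vectors
angle_templates = np.linspace(0.0, np.pi, 255)        # class spacing of pi / 254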
The classification matching template for the angle of rotation around the unit direction vector is an array obtained by dividing the rotation angle equally over the range [0, π] with a class interval of π/254.
Finally, the unit direction vector r of the rotation vector and the angle θ of rotation around it are matched one by one against the unit-direction-vector template and the rotation-angle template, respectively. The matching of the unit direction vector against its classification matching template is decided by the cosine similarity between the unit vector and each template vector. Cosine similarity judges how similar two vectors are by computing the cosine of the angle between them: the closer the cosine value is to 1, the more similar the directions of the two vectors. The calculation formula is:
cos<A, B> = (A · B) / (|A| · |B|)
and calculating cosine similarity to obtain the vector with the highest similarity in the unit direction vector of the rotation vector and the unit vector of the unit direction vector classification matching template, and taking the index value corresponding to the vector with the highest similarity as the classification number of the unit direction vector, namely converting the direction vector into a classification integer.
Similarly, the difference between the angle of rotation around the unit direction vector and the angle in the classification matching template array is calculated and squared, the angle of rotation around the unit direction vector and the angle in the classification matching template array with the minimum difference are found, and the index corresponding to the angle is used as the classification number of the angle of rotation around the unit direction vector.
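A compact sketch of both matching steps is shown below; the template arrays are the ones built in the previous sketch, and the function name is illustrative.

import numpy as np

def match_axis_and_angle(r, theta, axis_templates, angle_templates):
    """Return the class index of the most similar template axis (highest cosine
    similarity) and of the closest template angle (smallest squared difference)."""
    sims = axis_templates @ r / (np.linalg.norm(axis_templates, axis=1) * np.linalg.norm(r))
    return int(np.argmax(sims)), int(np.argmin((angle_templates - theta) ** 2))

# Example usage with the templates from the previous sketch:
axis_cls, angle_cls = match_axis_and_angle(np.array([0.0, 0.0, 1.0]), np.pi / 2,
                                            axis_templates, angle_templates)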
Step 1.4: and converting the grabbing parameters into grabbing parameter semantic label images by constructing three-dimensional point cloud of the grabbing parameters.
Converting the grabbing parameters into grabbing parameter semantic label images turns the grabbing problem into a semantic segmentation problem and improves the training effect and prediction precision of the model; in addition, storing the grabbing parameters as images saves memory compared with storing them as arrays.
As shown in fig. 3, the grabbing parameters are converted into the grabbing parameter semantic label image by constructing a three-dimensional point cloud of the grabbing parameters and using the coordinate transformation from camera coordinates to image coordinates. The three-dimensional coordinates of the grabbing points in the grabbing parameters are taken as the position array of the three-dimensional point cloud; the remaining grabbing parameters (whether a pixel is a grabbing point, the unit direction vector, the angle of rotation around the unit direction vector, the grabbing depth, and the opening and closing width) are first converted into the range [0, 1] and then set as the color attributes of the point cloud, so that the grabbing points and the other grabbing parameters are matched correspondingly. The point cloud with color information is then converted into an image through the camera intrinsics with the Open3D library, so that the other grabbing parameters are stored in the image.
The transformation formulas among the camera coordinate system, the image coordinate system, and the pixel coordinate system are as follows:
x = f · X / Z,  y = f · Y / Z
u = s_x · x + c_x = f_x · X / Z + c_x,  v = s_y · y + c_y = f_y · Y / Z + c_y
where s_x and s_y are the transform coefficients relating a unit pixel length to the actual length, and c_x, c_y, f_x, f_y are the internal parameters of the camera, with f_x = f · s_x and f_y = f · s_y. (X, Y, Z) denotes coordinates in the camera coordinate system, (x, y) denotes coordinates in the image coordinate system, and (u, v) denotes coordinates in the pixel coordinate system. Through these formulas, the three-dimensional coordinates of a grabbing point relative to the camera coordinate system can be converted into two-dimensional coordinates in the image.
The classifications of the grabbing parameters (whether a pixel is a grabbing point, the unit direction vector, the angle of rotation around the unit direction vector, the grabbing depth, and the opening and closing width) are converted into the range [0, 1] by the following formula and used as the color array of the point cloud:
color = (current_cls + 1) / num_cls
where color denotes the [0, 1] color array of the different grabbing parameters, num_cls denotes the total number of classes of a grabbing parameter, and current_cls denotes the current class number; 1 is added to distinguish the grabbing parameter classes from the black background. The grabbing parameter semantic label image obtained by the conversion is shown in FIG. 4.
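The patent performs this conversion through the Open3D point-cloud-to-image pipeline; the sketch below performs the equivalent direct projection with NumPy only, writing the (class + 1) / num_cls encoding into a single-channel label image. The function name, image size, and intrinsics arguments are illustrative assumptions.

import numpy as np

def grasp_points_to_label_image(points_cam, classes, num_cls,
                                fx, fy, cx, cy, height=720, width=1280):
    """Project camera-frame grasp points to pixels and store their class encoding.
    points_cam is N x 3, classes is an N-vector of integer class ids.
    Background pixels remain 0 (black)."""
    label = np.zeros((height, width), dtype=np.float32)
    X, Y, Z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    u = np.round(X / Z * fx + cx).astype(int)                 # pixel column
    v = np.round(Y / Z * fy + cy).astype(int)                 # pixel row
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    label[v[valid], u[valid]] = (classes[valid] + 1.0) / num_cls
    return label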
Step 2: scaling, normalizing, and restoring the classifications of the RGB images in the data set and the converted grabbing parameter semantic label images, and dividing them into a training set and a test set.
the step 2 specifically comprises the following steps:
step 2.1: the RGB images in the data set and the captured parameter semantic label images obtained by conversion are reduced to 224 x 224 pixel size from the original 720 x 1280 pixel size in a neighboring point sampling mode, so that the model operation speed is improved, and the model is suitable for a network structure.
Step 2.2: dividing the data of the RGB image by 255 for normalization so as to improve the model precision and the model convergence rate; converting the label image of the grabbing parameter from the classification in the image into an actual classification numerical value, wherein the conversion formula is as follows:
cls_point = [cls_point_img / 255]
cls_r = cls_r_img
cls_θ = cls_θ_img
cls_d = [cls_d_img · 4 / 255]
cls_w = [cls_w_img · 16 / 255]
In the formulas, img denotes the classification numerical value of a color in the image, r denotes the unit direction vector of the rotation vector, θ denotes the angle of rotation around the unit direction vector, d denotes the grabbing depth of the gripper approaching the object, and w denotes the opening and closing width of the gripper. Since the actual number of depth classes is 4 and the number of width classes is 16, the transformation equations for the depth and width classifications are multiplied by the coefficients 4 and 16, respectively; [ ] denotes rounding.
Step 2.3: the data in the first 100 training scenes of the data set are randomly shuffled and divided into training data and test data at a ratio of 8:2.
Step 3: constructing a six-degree-of-freedom grabbing detection semantic segmentation network and training the network.
the step 3 specifically comprises the following steps:
step 3.1: and (3) constructing a six-degree-of-freedom capture detection semantic segmentation network.
Semantic segmentation is an important branch of computer vision, and the essence of semantic segmentation is pixel-level classification. The input of the semantic segmentation is an RGB image, and the label is a mask image of a region of interest having the same size as the input image. Different from the classification problem of images, semantic segmentation can not only obtain the classification of different objects in a scene, but also obtain the region and the position of the objects.
A U-Net model is selected as the semantic segmentation network; it has a simple network structure, high operation efficiency, and is well suited to small training sets. U-Net adopts an Encoder-Decoder structure and has similarities with the fully convolutional network (FCN). The Encoder is mainly used to extract image features: during feature extraction the image size is gradually reduced while key features are continuously extracted. The Decoder is mainly used to restore the resolution of the image; this process includes concatenation with the feature maps obtained during encoding as well as up-sampling of the smaller feature maps. The constructed U-Net semantic segmentation network applied to six-degree-of-freedom grabbing detection is shown in FIG. 5.
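A minimal PyTorch sketch of such an encoder-decoder is given below; the channel widths, the three-stage depth, and the single classification head are illustrative assumptions and do not reproduce the exact network of FIG. 5 (in practice one such head, or one such network, per grabbing parameter with its own class count would be used, and in_ch=4 would accommodate an RGBD input).

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 conv + BatchNorm + ReLU layers, the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    """Encoder-decoder with skip connections: the decoder upsamples and concatenates
    the matching encoder feature map before each conv block."""
    def __init__(self, in_ch=3, num_classes=255):
        super().__init__()
        self.enc1 = conv_block(in_ch, 64)
        self.enc2 = conv_block(64, 128)
        self.enc3 = conv_block(128, 256)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = conv_block(256, 128)
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = conv_block(128, 64)
        self.head = nn.Conv2d(64, num_classes, 1)        # per-pixel class logits

    def forward(self, x):
        e1 = self.enc1(x)                                 # 224 x 224
        e2 = self.enc2(self.pool(e1))                     # 112 x 112
        e3 = self.enc3(self.pool(e2))                     # 56 x 56
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))   # back to 112 x 112
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))   # back to 224 x 224
        return self.head(d1)

logits = MiniUNet(in_ch=3, num_classes=255)(torch.randn(2, 3, 224, 224))
print(logits.shape)   # torch.Size([2, 255, 224, 224])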
Step 3.2: setting an optimizer, a loss function, a batch of loaded data, and the number of iterations. The optimizer in the training process is an Adam optimizer, the initial learning rate is set to 0.0005, and the Adam optimizer can automatically adjust according to performances on a test set. Using cross entropy as a loss function:
L = -(1/N) · Σ_i [a_i · log(y_i) + (1 - a_i) · log(1 - y_i)]
In the formula, N is the number of samples, y is the predicted value, and a is the label value. Accuracy is used as the evaluation index; the batch size for loading data is set to 6 and the number of iterations (epochs) is set to 20.
Step 3.3: and loading the training data and the test data after the data processing by using an iterator to train the model.
Step 4: the RGB image data are fed into the trained six-degree-of-freedom grabbing detection semantic segmentation network to obtain the predicted grabbing parameter semantic images.
The step 4 specifically comprises the following steps:
Step 4.1: the input is the RGB images of the verification-scene data in the data set; these RGB images undergo data preprocessing, including image downscaling and normalization.
Step 4.2: the preprocessed RGB image is fed into the trained six-degree-of-freedom grabbing detection semantic segmentation network to obtain predicted 224 x 224 pixel semantic images of the grabbing parameters: whether a pixel is a grabbing point, the unit direction vector, the angle of rotation around the unit direction vector, the grabbing depth, and the opening and closing width.
Step 5: the predicted grabbing parameter semantic label images are converted into grabbing parameters relative to the camera coordinate system through the proposed post-processing operations.
The step 5 specifically comprises the following steps:
Step 5.1: the predicted 224 x 224 pixel grabbing parameter semantic images are restored to the original 720 x 1280 pixel size by neighboring-point sampling.
Step 5.2: the grabbing parameters representing the background classification are filtered out.
Step 5.3: the remaining predicted grabbing parameters are restored from classifications to the corresponding actual numerical values through a dictionary or an array; this is the inverse of the process that converted the grabbing parameters into class numbers.
Step 5.4: using the input depth image, the depth at the pixel position of each grabbing point is indexed from the image information, and the three-dimensional coordinates of the grabbing point are then calculated from the depth value.
Step 5.5: the rotation vector is calculated from its unit direction vector r and the angle θ of rotation around it, and the rotation vector is converted into a rotation attitude matrix R, giving the grabbing parameters G = [X, Y, Z, R, d, w] in the camera coordinate system.
Step 6: the robot equipped with the end effector is driven to grab the object in the scene using the grabbing parameters obtained by post-processing.
The step 6 specifically comprises the following steps:
Step 6.1: the predicted grabbing parameters relative to the camera coordinate system are converted into grabbing parameters relative to the world coordinate system through the homogeneous transformation matrix of the depth camera used in the actual environment in the world coordinate system.
Step 6.2: and planning the tail end attitude and the running path of the robot by the predicted grabbing parameters and through inverse kinematics calculation.
Step 6.3: the robot is driven to the predicted gripping pose, and the article is gripped by a gripper disposed at the end of the robot.
Therefore, with the six-degree-of-freedom grabbing detection algorithm based on a semantic segmentation network of this structure, the grabbing parameter arrays are converted into grabbing parameter semantic label images, turning the grabbing task into a semantic segmentation problem. Compared with taking a depth map or a three-dimensional point cloud as input, taking the RGBD image as input is more stable and addresses large-scale six-degree-of-freedom grabbing pose detection in dense and cluttered scenes, making the network model more stable and robust; predicting the grabbing pose by classification rather than regression after the transformation makes the prediction result more accurate.
Finally, it should be noted that: the above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and although the present invention is described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the invention without departing from the spirit and scope of the invention.

Claims (7)

1. A six-degree-of-freedom grabbing detection algorithm based on a semantic segmentation network, characterized by comprising the following steps:
step 1: acquiring a six-degree-of-freedom grabbing detection data set, and converting the grabbing parameter arrays into grabbing parameter semantic label images by the designed data conversion method;
step 2: scaling, normalizing, and restoring the classifications of the RGB images in the data set and the converted grabbing parameter semantic label images, and dividing them into a training set and a test set;
step 3: constructing a six-degree-of-freedom grabbing detection semantic segmentation network, and training the network with the processed data;
step 4: feeding RGB image data into the trained six-degree-of-freedom grabbing detection semantic segmentation network to obtain predicted grabbing parameter semantic images;
step 5: converting the predicted grabbing parameter semantic label images into grabbing parameters relative to the camera coordinate system through the proposed post-processing operations;
step 6: driving a robot configured with an end effector to grab the object in the scene using the grabbing parameters obtained by post-processing.
2. The six-degree-of-freedom grabbing detection algorithm based on the semantic segmentation network as claimed in claim 1, wherein: the step 1 specifically comprises:
step 1.1: acquiring a GraspNet-1Billion six-degree-of-freedom grabbing detection data set, reading and converting the grabbing labels in the data set;
step 1.2: converting the specific numerical values of the grabbing depth and the opening and closing width in the processed grabbing label into classifications;
step 1.3: converting the rotation attitude matrix in the processed grabbing label into a rotation vector, decomposing the rotation vector into a unit direction vector and an angle of rotation around the unit direction vector, and obtaining the classifications of the unit direction vector and of the angle of rotation around it through the classification matching templates;
step 1.4: and converting the grabbing parameters into grabbing parameter semantic label images by constructing three-dimensional point cloud of the grabbing parameters.
3. The algorithm for six-degree-of-freedom grabbing detection based on the semantic segmentation network as claimed in claim 2, wherein: the step 2 specifically comprises the following steps:
step 2.1: the RGB images in the data set and the captured parameter semantic label images obtained through conversion are reduced to 224 x 224 pixel sizes from the original 720 x 1280 pixel sizes through a neighboring point sampling mode, so that the model operation speed is increased, and the model is adaptive to a network structure;
step 2.2: dividing the data of the RGB image by 255 for normalization; converting the label images of the captured parameters into actual classification numerical values from the classification in the images;
step 2.3: and randomly scrambling the data in the first 100 scenes for training in the data set according to the proportion of 8:2, and dividing the data into training data and test data.
4. The algorithm for six-degree-of-freedom grabbing detection based on the semantic segmentation network as claimed in claim 3, wherein: the step 3 specifically comprises the following steps:
step 3.1: constructing a six-degree-of-freedom capture detection semantic segmentation network;
step 3.2: setting an optimizer, a loss function, a batch for loading data and iteration times;
step 3.3: and loading the training data and the test data after the data processing by using an iterator to train the model.
5. The algorithm for six-degree-of-freedom grabbing detection based on the semantic segmentation network as claimed in claim 4, wherein: the step 4 specifically comprises the following steps:
step 4.1: carrying out data preprocessing operation including picture reduction and normalization on the RGB image;
step 4.2: and sending the RGB image subjected to data preprocessing into a six-degree-of-freedom capture detection semantic segmentation network which completes training to obtain a predicted capture parameter semantic image.
6. The six-degree-of-freedom grabbing detection algorithm based on the semantic segmentation network as claimed in claim 5, wherein: the step 5 specifically comprises the following steps:
step 5.1: restoring the predicted 224 x 224 pixel grabbing parameter semantic images to the original 720 x 1280 pixel size by neighboring-point sampling;
step 5.2: filtering out the grabbing parameters that represent the background classification;
step 5.3: restoring the screened predicted grabbing parameters from classifications to the corresponding actual numerical values through a dictionary or an array;
step 5.4: indexing, from the input depth image, the depth at the pixel position of each grabbing point, and then calculating the three-dimensional coordinates of the grabbing point in the camera coordinate system from the depth value;
step 5.5: calculating the rotation vector from its unit direction vector r and the angle θ of rotation around it, and converting the rotation vector into a rotation attitude matrix R to obtain all the grabbing parameters G = [X, Y, Z, R, d, w] in the camera coordinate system;
where (X, Y, Z) represents three-dimensional coordinates of the grasping point in the camera coordinate system, R represents a 3 × 3 rotation posture matrix, and d and w are a grasping depth of the gripper near the object in the approaching direction and an opening width of the gripper.
7. The six-degree-of-freedom grabbing detection algorithm based on the semantic segmentation network as claimed in claim 6, wherein: the step 6 specifically comprises the following steps:
step 6.1: converting the predicted grabbing parameters relative to the camera coordinate system into grabbing parameters relative to the world coordinate system through a homogeneous transformation matrix of a depth camera used in an actual environment in the world coordinate system;
step 6.2: planning the end-effector pose and the motion path of the robot from the predicted grabbing parameters through inverse kinematics calculation;
step 6.3: the robot is driven to the predicted gripping pose, and the article is gripped by a gripper disposed at the end of the robot.
CN202210817429.4A 2022-07-12 2022-07-12 Six-degree-of-freedom grabbing detection method based on semantic segmentation network Active CN115187781B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210817429.4A CN115187781B (en) 2022-07-12 2022-07-12 Six-degree-of-freedom grabbing detection method based on semantic segmentation network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210817429.4A CN115187781B (en) 2022-07-12 2022-07-12 Six-degree-of-freedom grabbing detection method based on semantic segmentation network

Publications (2)

Publication Number Publication Date
CN115187781A true CN115187781A (en) 2022-10-14
CN115187781B CN115187781B (en) 2023-05-30

Family

ID=83517855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210817429.4A Active CN115187781B (en) 2022-07-12 2022-07-12 Six-degree-of-freedom grabbing detection method based on semantic segmentation network

Country Status (1)

Country Link
CN (1) CN115187781B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664843A (en) * 2023-06-05 2023-08-29 北京信息科技大学 Residual fitting grabbing detection network based on RGBD image and semantic segmentation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665496A (en) * 2018-03-21 2018-10-16 浙江大学 A kind of semanteme end to end based on deep learning is instant to be positioned and builds drawing method
CN111319044A (en) * 2020-03-04 2020-06-23 达闼科技(北京)有限公司 Article grabbing method and device, readable storage medium and grabbing robot
CN111383263A (en) * 2018-12-28 2020-07-07 阿里巴巴集团控股有限公司 System, method and device for grabbing object by robot
US20210382497A1 (en) * 2019-02-26 2021-12-09 Imperial College Of Science, Technology And Medicine Scene representation using image processing
CN114029941A (en) * 2021-09-22 2022-02-11 中国科学院自动化研究所 Robot grabbing method and device, electronic equipment and computer medium
CN114140418A (en) * 2021-11-26 2022-03-04 上海交通大学宁波人工智能研究院 Seven-degree-of-freedom grabbing posture detection method based on RGB image and depth image

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665496A (en) * 2018-03-21 2018-10-16 浙江大学 A kind of semanteme end to end based on deep learning is instant to be positioned and builds drawing method
CN111383263A (en) * 2018-12-28 2020-07-07 阿里巴巴集团控股有限公司 System, method and device for grabbing object by robot
US20210382497A1 (en) * 2019-02-26 2021-12-09 Imperial College Of Science, Technology And Medicine Scene representation using image processing
CN111319044A (en) * 2020-03-04 2020-06-23 达闼科技(北京)有限公司 Article grabbing method and device, readable storage medium and grabbing robot
CN114029941A (en) * 2021-09-22 2022-02-11 中国科学院自动化研究所 Robot grabbing method and device, electronic equipment and computer medium
CN114140418A (en) * 2021-11-26 2022-03-04 上海交通大学宁波人工智能研究院 Seven-degree-of-freedom grabbing posture detection method based on RGB image and depth image

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664843A (en) * 2023-06-05 2023-08-29 北京信息科技大学 Residual fitting grabbing detection network based on RGBD image and semantic segmentation
CN116664843B (en) * 2023-06-05 2024-02-20 北京信息科技大学 Residual fitting grabbing detection network based on RGBD image and semantic segmentation

Also Published As

Publication number Publication date
CN115187781B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
Khosla et al. Enhancing performance of deep learning models with different data augmentation techniques: A survey
Zhang et al. Loop closure detection for visual SLAM systems using convolutional neural network
Yu et al. A vision-based robotic grasping system using deep learning for 3D object recognition and pose estimation
Mobahi et al. Deep learning from temporal coherence in video
CN109815956B (en) License plate character recognition method based on self-adaptive position segmentation
CN111523486B (en) Mechanical arm grabbing detection method based on improved CenterNet
Goh et al. Mars terrain segmentation with less labels
CN115187781B (en) Six-degree-of-freedom grabbing detection method based on semantic segmentation network
CN107798329B (en) CNN-based adaptive particle filter target tracking method
Manzoor et al. Comparison of object recognition approaches using traditional machine vision and modern deep learning techniques for mobile robot
Salem et al. Semantic image inpainting using self-learning encoder-decoder and adversarial loss
CN116664843B (en) Residual fitting grabbing detection network based on RGBD image and semantic segmentation
Singh et al. Wavelet based histogram of oriented gradients feature descriptors for classification of partially occluded objects
CN116311345A (en) Transformer-based pedestrian shielding re-recognition method
CN114723010A (en) Automatic learning enhancement method and system for asynchronous event data
Tong et al. MBVCNN: joint convolutional neural networks method for image recognition
Hossain et al. A faster r-cnn approach for partially occluded robot object recognition
Cheng et al. Skeleton-based Action Recognition with Multi-scale Spatial-temporal Convolutional Neural Network
Ramos et al. A natural feature representation for unstructured environments
Deshapriya et al. Vec2Instance: Parameterization for deep instance segmentation
Wu et al. Real-Time Pixel-Wise Grasp Detection Based on RGB-D Feature Dense Fusion
Gepperth Object detection and feature base learning with sparse convolutional neural networks
Rahman et al. A Quantitative Analysis of Basic vs. Deep Learning-based Image Data Augmentation Techniques
Li et al. Affordance Action Learning with State Trajectory Representation for Robotic Manipulation
da Silva Mendes et al. Vegetation classification using DeepLabv3+ and YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant