CN111080693A - Robot autonomous classification grabbing method based on YOLOv3 - Google Patents

Robot autonomous classification grabbing method based on YOLOv3

Info

Publication number
CN111080693A
CN111080693A (application CN201911159864.7A)
Authority
CN
China
Prior art keywords
target object
yolov
coordinate
target
yolov3
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911159864.7A
Other languages
Chinese (zh)
Inventor
王太勇
冯志杰
韩文灯
彭鹏
张凌雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201911159864.7A
Publication of CN111080693A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00 Manipulators not otherwise provided for
    • B25J11/008 Manipulators for service tasks
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J19/00 Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • B25J19/02 Sensing devices
    • B25J19/021 Optical sensing devices
    • B25J19/023 Optical sensing devices including video camera means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Abstract

The invention discloses a robot autonomous classification grabbing method based on YOLOv3. The method comprises the steps of: collecting and constructing a sample data set of target objects; training a YOLOv3 target detection network to obtain a target object recognition model; collecting a color image and a depth image of the target object; processing the color image with the trained YOLOv3 target detection network to obtain the category information and position information of the target object to be grabbed, and further processing with the depth image to obtain the point cloud information of the target object; computing the minimum bounding box of the point cloud, calculating the principal direction of the point cloud with the PCA algorithm, calibrating the X-, Y- and Z-axis coordinate data of the target object, and calculating the six-degree-of-freedom pose of the target object relative to the robot coordinate system. By adopting the YOLOv3 algorithm and estimating the object grabbing pose through point cloud preprocessing, PCA and related methods, the invention enables the robot to grab target objects in a classified manner.

Description

Robot autonomous classification grabbing method based on YOLOv3
Technical Field
The invention relates to a robot autonomous classified grabbing method, and in particular to a robot autonomous classified grabbing method based on YOLOv3.
Background
At present, China's population is aging rapidly and labor is in short supply, so the demand for service robots keeps growing. However, the unstructured environments in which service robots work raise many technical problems, a major one being autonomous grabbing in an unstructured environment. Grabbing is one of the main ways a robot interacts with the real world and is an urgent problem to be solved. Unlike industrial robots that grab workpieces in a structured environment, autonomous grabbing by service robots in unstructured environments faces many challenges, such as dynamic surroundings, illumination variation and mutual occlusion between objects; above all, unstructured environments contain many unknown objects in addition to known ones. The mature grabbing planning methods used on most industrial robots rely on obtaining object models in advance to build a database, or on executing fixed actions under a pre-programmed routine. For a service robot working in an unstructured environment, obtaining models of all objects to be grabbed in advance is not practical, so the robot must be able to perform fast, stable and reliable grabbing planning on unknown objects online. The invention adopts a computer-vision approach: a camera captures a color image and a depth image of the target object, and a target detection method then identifies and locates the target object, yielding its category and its position represented by a rectangular box. The specific pose of the object is then obtained through image processing and point cloud algorithms, and the object is grabbed by the mechanical arm. For target object recognition, traditional algorithms generally use image-processing methods such as edge extraction, SURF and SIFT to extract features from the image and then match them against a template. However, such algorithms are easily affected by the working environment, are sensitive to illumination and to object shape and size, and have poor robustness and weak generalization capability.
Disclosure of Invention
To solve the above technical problems in the prior art, the invention provides a robot autonomous classified grabbing method based on YOLOv3 that has better robustness and strong generalization capability.
The technical scheme adopted by the invention to solve the technical problems in the prior art is as follows: a robot autonomous classification grabbing method based on YOLOv3, comprising the steps of: collecting and constructing a sample data set of target objects; training a YOLOv3 target detection network with the sample data set to obtain a target object recognition model; collecting a color image and a depth image of the target object to be grabbed; processing the color image with the trained YOLOv3 target detection network to obtain the category information and position information of the target object to be grabbed, and further processing with the depth image to obtain the point cloud information of the target object to be grabbed; computing the minimum bounding box of the point cloud information, calculating the principal direction of the point cloud with the PCA algorithm, calibrating the X-, Y- and Z-axis coordinate data of the target object to be grabbed, and calculating the six-degree-of-freedom pose of the target object to be grabbed relative to the robot coordinate system.
Further, the step of collecting and constructing the target object sample data set comprises:
step a, using a Kinect camera to acquire images of various target objects and acquiring images of various target object combinations;
step b, constructing a sample data set conforming to the YOLOv3 target detection network and dividing it into a training set, a validation set and a test set at a ratio of 5:1:1 (a minimal split sketch is given below).
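For illustration only, a minimal Python sketch of such a 5:1:1 split is given below; the file names and the use of a simple random shuffle are assumptions made here, not part of the invention.

    import random

    def split_dataset(image_paths, seed=0):
        """Shuffle annotated image paths and split them 5:1:1 into train/val/test."""
        paths = list(image_paths)
        random.Random(seed).shuffle(paths)
        n = len(paths)
        n_train = n * 5 // 7          # 5 parts out of 5 + 1 + 1
        n_val = n // 7                # 1 part
        train = paths[:n_train]
        val = paths[n_train:n_train + n_val]
        test = paths[n_train + n_val:]
        return train, val, test

    # Illustrative file names only.
    train_set, val_set, test_set = split_dataset(
        ["images/%04d.jpg" % i for i in range(700)])
    print(len(train_set), len(val_set), len(test_set))  # e.g. 500 100 100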
Further, the step of training the YOLOv3 target detection network is as follows: construct a YOLOv3 target detection network containing the Darknet-53 network; first input the sample data set into the Darknet-53 neural network, perform down-sampling by changing the stride of the convolution kernels in the Darknet-53 neural network, and at the same time concatenate the up-sampled results of a middle layer and an output layer of the YOLOv3 target detection network, obtaining three feature maps of different scales.
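As a rough illustration of down-sampling with stride-2 convolutions followed by up-sampling and concatenation into three feature scales, a PyTorch sketch is given below. It is a toy stand-in whose layer counts and channel widths do not match the real Darknet-53; only the structural idea is shown.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyMultiScaleBackbone(nn.Module):
        """Toy stand-in for Darknet-53: stride-2 convolutions replace pooling,
        and up-sampled deep feature maps are concatenated with shallower ones."""
        def __init__(self):
            super().__init__()
            self.stage1 = nn.Conv2d(3, 32, 3, stride=2, padding=1)     # 1/2 resolution
            self.stage2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)    # 1/4
            self.stage3 = nn.Conv2d(64, 128, 3, stride=2, padding=1)   # 1/8
            self.stage4 = nn.Conv2d(128, 256, 3, stride=2, padding=1)  # 1/16
            self.stage5 = nn.Conv2d(256, 512, 3, stride=2, padding=1)  # 1/32

        def forward(self, x):
            f1 = F.relu(self.stage1(x))
            f2 = F.relu(self.stage2(f1))
            f3 = F.relu(self.stage3(f2))   # large feature map (1/8)
            f4 = F.relu(self.stage4(f3))   # medium feature map (1/16)
            f5 = F.relu(self.stage5(f4))   # small feature map (1/32)
            # Up-sample the deeper map and splice it with the next-shallower one,
            # as YOLOv3 does when building its three detection scales.
            f4_cat = torch.cat([f4, F.interpolate(f5, scale_factor=2)], dim=1)
            f3_cat = torch.cat([f3, F.interpolate(f4_cat, scale_factor=2)], dim=1)
            return f5, f4_cat, f3_cat      # three feature maps of different scales

    maps = ToyMultiScaleBackbone()(torch.randn(1, 3, 416, 416))
    print([m.shape for m in maps])  # 13x13, 26x26 and 52x52 grids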
Furthermore, when image information of a combination of multiple target objects is processed, category labeling and position labeling are performed on the objects in the image with an image annotation tool.
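As an aside, annotations written by common image annotation tools (for example labelImg, which produces Pascal-VOC-style XML files) can be converted into the plain-text form expected by YOLOv3 training scripts. The sketch below assumes that XML layout and uses an illustrative subset of the class list; it is not part of the invention.

    import xml.etree.ElementTree as ET

    CLASSES = ["banana", "apple", "carambola", "cherry", "grape", "strawberry"]  # illustrative subset

    def voc_to_yolo(xml_path):
        """Convert one Pascal-VOC-style annotation into YOLO lines:
        'class_id x_center y_center width height', all normalized to [0, 1]."""
        root = ET.parse(xml_path).getroot()
        w = float(root.find("size/width").text)
        h = float(root.find("size/height").text)
        lines = []
        for obj in root.iter("object"):
            cls = CLASSES.index(obj.find("name").text)
            box = obj.find("bndbox")
            xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
            xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
            lines.append("%d %.6f %.6f %.6f %.6f" % (
                cls, (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h,
                (xmax - xmin) / w, (ymax - ymin) / h))
        return lines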
Further, the position information of the target object is represented by a rectangular frame; the rectangular box calculation method is as follows:
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · exp(t_w)
b_h = p_h · exp(t_h)
in the formula:
b_x is the X-direction coordinate of the center point of the object bounding box predicted by YOLOv3;
b_y is the Y-direction coordinate of the center point of the object bounding box predicted by YOLOv3;
b_w is the X-direction width of the object bounding box predicted by YOLOv3;
b_h is the Y-direction height of the object bounding box predicted by YOLOv3;
c_x is the X-direction coordinate of the upper-left corner of the grid cell on the feature map;
c_y is the Y-direction coordinate of the upper-left corner of the grid cell on the feature map;
t_x is the X-coordinate offset value of the target object predicted by YOLOv3;
t_y is the Y-coordinate offset value of the target object predicted by YOLOv3;
t_w is the lateral scale scaling value of the target object predicted by YOLOv3;
t_h is the longitudinal scale scaling value of the target object predicted by YOLOv3;
p_w is the preset lateral dimension of the anchor box on the feature map;
p_h is the preset longitudinal dimension of the anchor box on the feature map;
σ is the sigmoid function.
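For clarity, a minimal Python sketch of how these quantities combine is given below; the grid cell, anchor size and stride values are illustrative only and are not parameters of the invention.

    import math

    def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
        """Decode YOLOv3-style raw outputs (t*) of one grid cell into a box.
        cx, cy: top-left corner of the cell in grid units; pw, ph: anchor size
        in pixels; stride: pixels per grid cell on this feature map."""
        sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
        bx = (sigmoid(tx) + cx) * stride      # center x in pixels
        by = (sigmoid(ty) + cy) * stride      # center y in pixels
        bw = pw * math.exp(tw)                # box width in pixels
        bh = ph * math.exp(th)                # box height in pixels
        return bx, by, bw, bh

    # Illustrative values: cell (6, 4) on a 13x13 map of a 416x416 image (stride 32),
    # with an assumed anchor of 116x90 pixels.
    print(decode_box(0.2, -0.1, 0.05, 0.1, 6, 4, 116, 90, 32))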
Further, the point cloud information of the target object is calculated by traversing the pixel mask of the target object within the rectangular box and combining it with the depth image; the calculation formula is as follows:
x_w = (u - u_0) · z_c / f
y_w = (v - v_0) · z_c / f
z_w = z_c
in the formula:
x_w is the X-direction coordinate of the object in the camera coordinate system;
y_w is the Y-direction coordinate of the object in the camera coordinate system;
z_w is the Z-direction coordinate of the object in the camera coordinate system;
z_c is the depth of the object in the camera coordinate system;
u is the horizontal coordinate of the pixel point in the pixel coordinate system;
v is the vertical coordinate of the pixel point in the pixel coordinate system;
u_0 is the horizontal pixel coordinate of the image center;
v_0 is the vertical pixel coordinate of the image center;
f is the focal length of the camera;
where u_0, v_0 and f are camera parameters obtained by calibrating the camera.
Further, the RGB-D image information of the target object is acquired by a vision sensor.
Further, a depth image of the target object is acquired by the Kinect depth camera.
Further, the six-degree-of-freedom pose of the target object relative to the camera coordinate system is calculated from the obtained X-, Y- and Z-axis coordinates of the target object, and the six-degree-of-freedom pose of the target object in the robot base coordinate system is obtained by combining it with the hand-eye calibration result.
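As a hedged illustration of this last step: once hand-eye calibration yields the homogeneous transform from the camera frame to the robot base frame, the object pose is mapped by a single matrix product. The numeric matrices below are placeholders, not calibration results of the invention.

    import numpy as np

    def camera_pose_to_base(T_base_cam, T_cam_obj):
        """Map an object pose expressed in the camera frame into the robot base frame.
        Both arguments are 4x4 homogeneous transforms."""
        return T_base_cam @ T_cam_obj

    # Placeholder hand-eye result: camera 0.5 m above the base, looking straight down.
    T_base_cam = np.array([[1.0,  0.0,  0.0, 0.0],
                           [0.0, -1.0,  0.0, 0.0],
                           [0.0,  0.0, -1.0, 0.5],
                           [0.0,  0.0,  0.0, 1.0]])
    # Placeholder object pose in the camera frame (identity rotation, 0.3 m in front).
    T_cam_obj = np.eye(4)
    T_cam_obj[2, 3] = 0.3

    T_base_obj = camera_pose_to_base(T_base_cam, T_cam_obj)
    print(T_base_obj[:3, 3])  # object position in the base frame: [0. 0. 0.2]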
The invention has the following advantages and positive effects: the method adopts the YOLOv3 target detection algorithm from computer vision to identify and locate the target object. The algorithm is mature, and its accuracy and speed are higher than those of previous target detection algorithms, so it is very suitable for classified grabbing by robots with high real-time requirements. By analyzing the YOLOv3 target detection results and estimating the object grabbing pose through point cloud preprocessing, PCA and related methods, classified grabbing of the target object by the robot is achieved, realizing automatic grabbing by the mechanical arm. The method has good robustness and strong generalization capability.
Drawings
FIG. 1 is a schematic workflow diagram of the present invention.
Detailed Description
For further understanding of the contents, features and effects of the present invention, the following embodiments are enumerated in conjunction with the accompanying drawings, and the following detailed description is given:
the English Chinese in this application is explained as follows:
YOLOv3: a single-stage target detection algorithm proposed by Joseph Redmon in 2018;
PCA: principal component analysis;
Darknet-53: a deep convolutional neural network used to extract image features; the core module of the YOLOv3 algorithm;
Kinect: a vision sensor that can obtain RGB information and depth information of an object;
RGB: a three-channel color image;
RGB-D: a collective name for three-channel color images together with depth images;
R-CNN: region-based convolutional neural network, a target detection algorithm proposed by Ross Girshick et al. in 2014;
Fast R-CNN: a faster region-based convolutional neural network, a target detection algorithm proposed by Ross Girshick et al. in 2015 to improve the detection speed of R-CNN;
SSD: a single-stage multi-category target detector, a target detection algorithm proposed by Wei Liu et al. in 2016;
cell: a grid cell;
ROS: the Robot Operating System, a highly flexible software framework for writing robot software;
message: a communication mechanism in the ROS robot operating system;
anchor: an anchor box;
confidence: a confidence level.
referring to fig. 1, a robot autonomous classification grabbing method based on YOLOv3 collects and constructs a sample data set of a target object; training a YOLOv3 target detection network by using the sample data set to obtain a target object recognition model; collecting a color image and a depth image of a target object to be grabbed; processing the color image by adopting a trained YOLOv3 target detection network to obtain the category information and the position information of a target object to be grabbed, introducing a depth image for further processing to obtain point cloud information of the target object to be grabbed; and (3) solving the point cloud information by a minimum bounding box, calculating the main direction of the point cloud by combining a PCA algorithm, calibrating X, Y, Z-axis coordinate data of the target object to be grabbed, and calculating the six-degree-of-freedom pose of the target object to be grabbed relative to a robot coordinate system. A target object recognition model based on a YOLOv3 target detection network is built, a YOLOv3 target detection network is trained by using a sample data set of the collected and built target object, and category information, position information and the like of the target object are obtained.
Preferably, the step of acquiring and constructing the target object sample data set may be as follows:
step a, a Kinect camera can be used to acquire images of various target objects, and images of various target object combinations can also be acquired;
step b, a sample data set conforming to the YOLOv3 target detection network can be constructed and divided into a training set, a validation set and a test set at a ratio of 5:1:1.
Preferably, the step of training the YOLOv3 target detection network may be as follows: a YOLOv3 target detection network containing the Darknet-53 network can be constructed; the sample data set can first be input into the Darknet-53 neural network, down-sampling can be performed by changing the stride of the convolution kernels in the Darknet-53 neural network, and at the same time the up-sampled results of a middle layer and an output layer of the YOLOv3 target detection network can be concatenated, obtaining three feature maps of different scales.
Preferably, when processing image information of a combination of multiple target objects, an image annotation tool can be used to perform category annotation and position annotation on the objects in the image.
Preferably, the position information of the target object may be represented by a rectangular frame; the rectangular box calculation method can be shown as the following formula:
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · exp(t_w)
b_h = p_h · exp(t_h)
in the formula:
b_x is the X-direction coordinate of the center point of the object bounding box predicted by YOLOv3;
b_y is the Y-direction coordinate of the center point of the object bounding box predicted by YOLOv3;
b_w is the X-direction width of the object bounding box predicted by YOLOv3;
b_h is the Y-direction height of the object bounding box predicted by YOLOv3;
c_x is the X-direction coordinate of the upper-left corner of the grid cell on the feature map;
c_y is the Y-direction coordinate of the upper-left corner of the grid cell on the feature map;
t_x is the X-coordinate offset value of the target object predicted by YOLOv3;
t_y is the Y-coordinate offset value of the target object predicted by YOLOv3;
t_w is the lateral scale scaling value of the target object predicted by YOLOv3;
t_h is the longitudinal scale scaling value of the target object predicted by YOLOv3;
p_w is the preset lateral dimension of the anchor box on the feature map;
p_h is the preset longitudinal dimension of the anchor box on the feature map;
σ is the sigmoid function; it can be used to compress t_x and t_y into the interval [0, 1], preventing excessive offset of the predicted center.
Preferably, the point cloud information of the target object can be calculated by traversing the pixel mask of the target object in the rectangular frame and combining the depth image; the calculation method can be shown as the following formula:
x_w = (u - u_0) · z_c / f
y_w = (v - v_0) · z_c / f
z_w = z_c
in the formula:
x_w is the X-direction coordinate of the object in the camera coordinate system;
y_w is the Y-direction coordinate of the object in the camera coordinate system;
z_w is the Z-direction coordinate of the object in the camera coordinate system;
z_c is the depth of the object in the camera coordinate system;
u is the horizontal coordinate of the pixel point in the pixel coordinate system;
v is the vertical coordinate of the pixel point in the pixel coordinate system;
u_0 is the horizontal pixel coordinate of the image center;
v_0 is the vertical pixel coordinate of the image center;
f is the focal length of the camera;
where u_0, v_0 and f are camera parameters which can be obtained by calibrating the camera.
Preferably, the RGB-D image information of the target object may be acquired with a vision sensor.
Preferably, the depth image of the target object may be acquired by a Kinect depth camera.
Preferably, the six-degree-of-freedom pose of the target object relative to the camera coordinate system can be calculated from the obtained X-, Y- and Z-axis coordinates of the target object, and the six-degree-of-freedom pose of the target object in the robot base coordinate system can then be obtained by combining it with the hand-eye calibration result.
The working process and working principle of the invention are further explained below with reference to the preferred embodiments of the invention:
the YOLOv3 target detection network is an algorithm with the strongest comprehensive performance in the current target detection, the algorithm is mature, the precision is high, the speed is high, and the method is well applied to the robot field and the unmanned field at present. In short, the Prior detection (Prior detection) system of YOLOv3 reuses the classifier or locator for performing the detection task. Applying the model to multiple locations and scales of the feature map improves the accuracy of the identification of small objects. And further performing boundary box regression on the anchor boxes with higher scores by a target scoring method. Furthermore, the network uses a completely different approach to other object detection methods. A neural network is applied to the entire image, which divides the image into different regions, thus predicting the bounding box and probability of each block region, which will be weighted by the predicted probability. In contrast to classifier-based systems, it looks at the entire image under test, so its prediction exploits global information in the image. Unlike R-CNN, which requires thousands of single target images, it predicts through a single network evaluation. This makes Yolov3 very Fast, typically 1000 times faster than R-CNN and 100 times faster than Fast R-CNN. It is also more accurate than SSD single-stage detectors and is about three times faster than SSD. In view of its excellent performance, and excellent real-time performance.
Firstly, acquiring position information and category information of a target object
In the invention, a model based on the YOLOv3 target detection network is adopted for target recognition. The recognized objects are various types of fruit (13 types in total, including bananas, apples, carambola, cherries, grapes and strawberries). The specific steps can be as follows:
1. Various fruits are photographed using a Kinect camera, and different combinations of fruits are also photographed.
2. A data set conforming to the YOLOv3 network is made and divided into a training set, a validation set and a test set at a ratio of 5:1:1.
3. Model training is carried out on the YOLOv3 target detection network using the training set and validation set. The input first passes through the deep CNN network Darknet-53, which performs down-sampling by changing the stride of its convolution kernels; at the same time, the up-sampled results of a middle layer and a later layer of the network are concatenated, so that three feature maps of different scales are obtained, enabling detection of objects of different sizes. This mainly improves the recognition and positioning accuracy for small objects such as strawberries and cherries.
4. The three feature maps are divided into small grids (cells) of corresponding sizes, and three boxes (bounding boxes) are predicted for each grid.
5. Before prediction, logistic regression is used to score the objectness of each box, i.e. how likely that position is to contain a target; unnecessary anchor boxes are eliminated and the best anchor box is selected for the subsequent bounding-box regression, which reduces the amount of computation.
6. Each box contains five basic parameters (x, y, w, h, confidence) and category information. From the network outputs (t_x, t_y, t_w, t_h, t_o), the (x, y, w, h, confidence) of the object can be calculated by formula 1 below:
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · exp(t_w)
b_h = p_h · exp(t_h)          (formula 1)
In formula 1:
b_x is the X-direction coordinate of the center point of the object bounding box predicted by YOLOv3;
b_y is the Y-direction coordinate of the center point of the object bounding box predicted by YOLOv3;
b_w is the X-direction width of the object bounding box predicted by YOLOv3;
b_h is the Y-direction height of the object bounding box predicted by YOLOv3;
c_x is the X-direction coordinate of the upper-left corner of the grid cell on the feature map;
c_y is the Y-direction coordinate of the upper-left corner of the grid cell on the feature map;
t_x is the X-coordinate offset value of the target object predicted by YOLOv3;
t_y is the Y-coordinate offset value of the target object predicted by YOLOv3;
t_w is the lateral scale scaling value of the target object predicted by YOLOv3;
t_h is the longitudinal scale scaling value of the target object predicted by YOLOv3;
p_w is the preset lateral dimension of the anchor box on the feature map;
p_h is the preset longitudinal dimension of the anchor box on the feature map;
σ is the sigmoid function.
7. A message is established in ROS, and the position information, category information and confidence of the objects in the picture recognized by YOLOv3 are published (an illustrative publishing sketch is given below).
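The following sketch (ROS 1 / rospy) illustrates such publishing; the topic name and the use of a plain JSON string instead of a custom message type are simplifying assumptions made here, not the message definition of the invention.

    import json
    import rospy
    from std_msgs.msg import String

    def publish_detections(detections):
        """Publish YOLOv3 detection results (class, box, confidence) on a ROS topic."""
        rospy.init_node('yolo_detection_publisher')
        pub = rospy.Publisher('/yolo/detections', String, queue_size=10)  # illustrative topic name
        rate = rospy.Rate(10)  # publish at 10 Hz
        while not rospy.is_shutdown():
            pub.publish(String(data=json.dumps(detections)))
            rate.sleep()

    if __name__ == '__main__':
        publish_detections([
            {"class": "apple", "confidence": 0.93, "box": [210, 145, 80, 78]},
            {"class": "banana", "confidence": 0.88, "box": [402, 160, 150, 60]},
        ])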
Secondly, acquiring a point cloud picture of the target object
After the picture is recognized by the YOLOv3 algorithm, the position information and category information of the objects in the picture are output. Combining the RGB image and the depth image of the picture, the specified regions are cropped from both, i.e. the target object is extracted from the RGB image and the depth image, and the point cloud of the region where the target object is located is computed from the pixel information of the RGB image and the depth information of the depth image. Before the point cloud is obtained, intrinsic calibration of the Kinect camera and registration of the depth image to the RGB image are required. The specific steps can be as follows:
1. Perform intrinsic calibration and RGB/depth image registration of the Kinect camera using a Kinect toolkit in the ROS system.
2. In ROS, subscribe to the RGB image and depth image topics of the Kinect camera and receive the published messages.
3. Extract the corresponding RGB image and depth image from the received ROS messages, obtaining the RGB image and depth image corresponding to the target object.
4. Traverse each point in the RGB image and the depth image and calculate its spatial position with formula (2), where u_0, v_0 and f are camera intrinsics and u and v are pixel coordinates. Add the obtained spatial coordinates of each pixel to the point cloud, thereby constructing the point cloud information of the target object (a vectorized sketch of this step follows the variable list below).
x_w = (u - u_0) · z_c / f
y_w = (v - v_0) · z_c / f
z_w = z_c          (formula 2)
In formula 2:
x_w is the X-direction coordinate of the object in the camera coordinate system;
y_w is the Y-direction coordinate of the object in the camera coordinate system;
z_w is the Z-direction coordinate of the object in the camera coordinate system;
z_c is the depth of the object in the camera coordinate system;
u is the horizontal coordinate of the pixel point in the pixel coordinate system;
v is the vertical coordinate of the pixel point in the pixel coordinate system;
u_0 is the horizontal pixel coordinate of the image center;
v_0 is the vertical pixel coordinate of the image center;
f is the focal length of the camera;
where u_0, v_0 and f are camera parameters which can be obtained by calibrating the camera.
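The following Python/NumPy sketch illustrates step 4 above in vectorized form, applying formula (2) to every pixel at once; the intrinsic values and the synthetic depth image are placeholders, not calibration results of the invention.

    import numpy as np

    def depth_to_point_cloud(depth, u0, v0, f, rgb=None):
        """Back-project a depth image (in meters) into camera-frame 3D points using
        x_w = (u - u0) * z_c / f, y_w = (v - v0) * z_c / f, z_w = z_c."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth
        x = (u - u0) * z / f
        y = (v - v0) * z / f
        points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
        valid = points[:, 2] > 0                   # drop pixels with no depth reading
        points = points[valid]
        if rgb is not None:
            colors = rgb.reshape(-1, 3)[valid]
            return points, colors
        return points

    # Placeholder intrinsics and a synthetic 480x640 depth image of a flat surface at 0.8 m.
    depth = np.full((480, 640), 0.8)
    cloud = depth_to_point_cloud(depth, u0=320.0, v0=240.0, f=525.0)
    print(cloud.shape)  # (307200, 3)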
Thirdly, estimating the six-degree-of-freedom pose of the target object based on PCA.
PCA (Principal Component Analysis) is one of the most widely used data dimensionality reduction algorithms. Its main idea is to map n-dimensional features onto k dimensions; these k dimensions are completely new orthogonal features, also called principal components, reconstructed from the original n-dimensional features. The PCA algorithm uses the covariance matrix to measure how dispersed the sample set is in different directions. The task of PCA is to find, in order, a set of mutually orthogonal axes in the original space, where the choice of the new axes depends strongly on the data itself. The first new axis is chosen as the direction of largest variance in the original data; the second axis is chosen, among directions orthogonal to the first, as the one with the largest variance; the third axis is the direction of largest variance among those orthogonal to the first two, and so on, until n such axes are obtained. The principal component analysis (PCA) method can therefore be used to extract the principal directions of the point cloud, giving the principal directions of the point cloud object. The obtained principal directions of the point cloud are then converted into the quaternion information required for grabbing by the mechanical arm. The specific steps can be as follows (a consolidated sketch of these steps is given after the list):
1. Denoise the obtained point cloud and filter out outliers and noise points.
2. Perform conditional filtering on the point cloud, removing the points of the plane on which the object rests and keeping the point cloud information of the object.
3. Sparsify the point cloud: using a voxel grid, keep one point per voxel to represent the others, thereby down-sampling the point cloud data; the voxel size can be adjusted to change the down-sampling ratio. This reduces the data volume of the point cloud while preserving the object's features, reducing the amount of computation and improving efficiency.
4. Perform PCA on the point cloud from the preceding steps to obtain the principal-direction coordinate system of the object point cloud.
5. Calculate the minimum bounding box of the point cloud data and the six-degree-of-freedom pose of the target point cloud in the camera coordinate system.
6. Transform the point cloud pose from the camera coordinate system into the robot coordinate system to obtain the six-degree-of-freedom pose of the target object in the robot coordinate system.
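A condensed Python sketch of steps 1-5 using the Open3D library is given below for illustration; the outlier-removal thresholds, plane-segmentation parameters and voxel size are assumptions, and the invention does not prescribe this particular library.

    import numpy as np
    import open3d as o3d

    def estimate_object_pose(points):
        """Steps 1-5: denoise, remove the supporting plane, down-sample, run PCA,
        and return the object's rotation, center and extents in the camera frame."""
        pcd = o3d.geometry.PointCloud()
        pcd.points = o3d.utility.Vector3dVector(points)

        # 1. Remove outliers / noise points (illustrative thresholds).
        pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

        # 2. Remove the plane the object rests on (RANSAC plane segmentation).
        _, inliers = pcd.segment_plane(distance_threshold=0.01,
                                       ransac_n=3, num_iterations=200)
        pcd = pcd.select_by_index(inliers, invert=True)

        # 3. Voxel-grid down-sampling: one representative point per 5 mm voxel.
        pcd = pcd.voxel_down_sample(voxel_size=0.005)

        # 4. PCA: eigenvectors of the covariance matrix give the principal directions.
        pts = np.asarray(pcd.points)
        center = pts.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov((pts - center).T))
        R = eigvecs[:, ::-1]                  # columns ordered by decreasing variance
        if np.linalg.det(R) < 0:              # keep a right-handed frame
            R[:, 2] *= -1

        # 5. Extents in the principal frame give the bounding box of the object.
        extents = (pts - center) @ R
        size = extents.max(axis=0) - extents.min(axis=0)
        return R, center, size

The rotation matrix R and center returned here can then be converted to the quaternion expected by the mechanical arm and transformed into the robot base frame as described in step 6.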
The above-mentioned embodiments are intended only to illustrate the technical ideas and features of the present invention, and their purpose is to enable those skilled in the art to understand the contents of the present invention and carry it out; they do not limit the scope of the present invention, and any equivalent changes or modifications made within the spirit of the present invention shall fall within its scope.

Claims (9)

1. A robot autonomous classification grabbing method based on YOLOv3, characterized by comprising the steps of: collecting and constructing a sample data set of target objects; training a YOLOv3 target detection network with the sample data set to obtain a target object recognition model; collecting a color image and a depth image of the target object to be grabbed; processing the color image with the trained YOLOv3 target detection network to obtain the category information and position information of the target object to be grabbed, and further processing with the depth image to obtain the point cloud information of the target object to be grabbed; and computing the minimum bounding box of the point cloud information, calculating the principal direction of the point cloud with the PCA algorithm, calibrating the X-, Y- and Z-axis coordinate data of the target object to be grabbed, and calculating the six-degree-of-freedom pose of the target object to be grabbed relative to the robot coordinate system.
2. The YOLOv3-based robot autonomous classified grabbing method of claim 1, wherein the step of collecting and constructing the target object sample data set is as follows:
step a, using a Kinect camera to acquire images of various target objects and acquiring images of various target object combinations;
step b, constructing a sample data set conforming to the YOLOv3 target detection network and dividing it into a training set, a validation set and a test set at a ratio of 5:1:1.
3. The YOLOv3-based robot autonomous classification grabbing method according to claim 1, wherein the step of training the YOLOv3 target detection network is as follows: constructing a YOLOv3 target detection network comprising the Darknet-53 network; first inputting the sample data set into the Darknet-53 neural network, performing down-sampling by changing the stride of the convolution kernels in the Darknet-53 neural network, and at the same time concatenating the up-sampled results of a middle layer and an output layer of the YOLOv3 target detection network to obtain three feature maps of different scales.
4. The YOLOv3-based robot autonomous classified grabbing method according to claim 1, wherein, when processing image information of a combination of multiple target objects, category labeling and position labeling are performed on the objects in the image with an image annotation tool.
5. The YOLOv3-based robot autonomous classified grabbing method of claim 1, wherein the position information of the target object is represented by a rectangular box, calculated as follows:
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · exp(t_w)
b_h = p_h · exp(t_h)
in the formula:
b_x is the X-direction coordinate of the center point of the object bounding box predicted by YOLOv3;
b_y is the Y-direction coordinate of the center point of the object bounding box predicted by YOLOv3;
b_w is the X-direction width of the object bounding box predicted by YOLOv3;
b_h is the Y-direction height of the object bounding box predicted by YOLOv3;
c_x is the X-direction coordinate of the upper-left corner of the grid cell on the feature map;
c_y is the Y-direction coordinate of the upper-left corner of the grid cell on the feature map;
t_x is the X-coordinate offset value of the target object predicted by YOLOv3;
t_y is the Y-coordinate offset value of the target object predicted by YOLOv3;
t_w is the lateral scale scaling value of the target object predicted by YOLOv3;
t_h is the longitudinal scale scaling value of the target object predicted by YOLOv3;
p_w is the preset lateral dimension of the anchor box on the feature map;
p_h is the preset longitudinal dimension of the anchor box on the feature map;
σ is the sigmoid function.
6. The YOLOv3-based robot autonomous classification grabbing method according to claim 5, wherein the point cloud information of the target object is calculated by traversing the pixel mask of the target object in the rectangular box in combination with the depth image, using the following formula:
x_w = (u - u_0) · z_c / f
y_w = (v - v_0) · z_c / f
z_w = z_c
in the formula:
x_w is the X-direction coordinate of the object in the camera coordinate system;
y_w is the Y-direction coordinate of the object in the camera coordinate system;
z_w is the Z-direction coordinate of the object in the camera coordinate system;
z_c is the depth of the object in the camera coordinate system;
u is the horizontal coordinate of the pixel point in the pixel coordinate system;
v is the vertical coordinate of the pixel point in the pixel coordinate system;
u_0 is the horizontal pixel coordinate of the image center;
v_0 is the vertical pixel coordinate of the image center;
f is the focal length of the camera;
where u_0, v_0 and f are camera parameters obtained by calibrating the camera.
7. The YOLOv3-based robot autonomous classified grabbing method of claim 1, wherein a vision sensor is used to acquire RGB-D image information of the target object.
8. The YOLOv3-based robot autonomous classified grabbing method according to claim 1, wherein the depth image of the target object is acquired by a Kinect depth camera.
9. The YOLOv3-based robot autonomous classified grabbing method according to claim 7 or 8, wherein the six-degree-of-freedom pose of the target object relative to the camera coordinate system is calculated from the obtained X-, Y- and Z-axis coordinates of the target object, and the six-degree-of-freedom pose of the target object in the robot base coordinate system is obtained by combining the hand-eye calibration result.
CN201911159864.7A 2019-11-22 2019-11-22 Robot autonomous classification grabbing method based on YOLOv3 Pending CN111080693A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911159864.7A CN111080693A (en) 2019-11-22 2019-11-22 Robot autonomous classification grabbing method based on YOLOv3

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911159864.7A CN111080693A (en) 2019-11-22 2019-11-22 Robot autonomous classification grabbing method based on YOLOv3

Publications (1)

Publication Number Publication Date
CN111080693A true CN111080693A (en) 2020-04-28

Family

ID=70311401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911159864.7A Pending CN111080693A (en) 2019-11-22 2019-11-22 Robot autonomous classification grabbing method based on YOLOv3

Country Status (1)

Country Link
CN (1) CN111080693A (en)

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553949A (en) * 2020-04-30 2020-08-18 张辉 Positioning and grabbing method for irregular workpiece based on single-frame RGB-D image deep learning
CN111598172A (en) * 2020-05-18 2020-08-28 东北大学 Dynamic target grabbing posture rapid detection method based on heterogeneous deep network fusion
CN111645080A (en) * 2020-05-08 2020-09-11 覃立万 Intelligent service robot hand-eye cooperation system and operation method
CN111783537A (en) * 2020-05-29 2020-10-16 哈尔滨莫迪科技有限责任公司 Two-stage rapid grabbing detection method based on target detection characteristics
CN111975783A (en) * 2020-08-31 2020-11-24 广东工业大学 Robot grabbing detection method and system
CN112070736A (en) * 2020-09-01 2020-12-11 上海电机学院 Object volume vision measurement method combining target detection and depth calculation
CN112183485A (en) * 2020-11-02 2021-01-05 北京信息科技大学 Deep learning-based traffic cone detection positioning method and system and storage medium
CN112396655A (en) * 2020-11-18 2021-02-23 哈尔滨工程大学 Point cloud data-based ship target 6D pose estimation method
CN112529948A (en) * 2020-12-25 2021-03-19 南京林业大学 Mature pomegranate positioning method based on Mask R-CNN and 3-dimensional sphere fitting
CN112614182A (en) * 2020-12-21 2021-04-06 广州熙锐自动化设备有限公司 Method for identifying machining position based on deep learning, storage device and mobile terminal
CN112733640A (en) * 2020-12-29 2021-04-30 武汉中海庭数据技术有限公司 Traffic indicator lamp positioning and extracting method and system based on point cloud high-precision map
CN112720477A (en) * 2020-12-22 2021-04-30 泉州装备制造研究所 Object optimal grabbing and identifying method based on local point cloud model
CN112750163A (en) * 2021-01-19 2021-05-04 武汉理工大学 Port ship shore power connection method and system and computer readable storage medium
CN112884825A (en) * 2021-03-19 2021-06-01 清华大学 Deep learning model-based grabbing method and device
CN112927297A (en) * 2021-02-20 2021-06-08 华南理工大学 Target detection and visual positioning method based on YOLO series
CN112936275A (en) * 2021-02-05 2021-06-11 华南理工大学 Mechanical arm grabbing system based on depth camera and control method
CN113111712A (en) * 2021-03-11 2021-07-13 稳健医疗用品股份有限公司 AI identification positioning method, system and device for bagged product
CN113129449A (en) * 2021-04-16 2021-07-16 浙江孔辉汽车科技有限公司 Vehicle pavement feature recognition and three-dimensional reconstruction method based on binocular vision
CN113246140A (en) * 2021-06-22 2021-08-13 沈阳风驰软件股份有限公司 Multi-model workpiece disordered grabbing method and device based on camera measurement
CN113284129A (en) * 2021-06-11 2021-08-20 梅卡曼德(北京)机器人科技有限公司 Box pressing detection method and device based on 3D bounding box
CN113524194A (en) * 2021-04-28 2021-10-22 重庆理工大学 Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning
CN113537096A (en) * 2021-07-21 2021-10-22 常熟理工学院 ROS-based AGV forklift storage tray identification and auxiliary positioning method and system
CN113627478A (en) * 2021-07-08 2021-11-09 深圳市优必选科技股份有限公司 Target detection method, target detection device and robot
CN113723389A (en) * 2021-08-30 2021-11-30 广东电网有限责任公司 Method and device for positioning strut insulator
CN113723217A (en) * 2021-08-09 2021-11-30 南京邮电大学 Object intelligent detection method and system based on yolo improvement
CN113808205A (en) * 2021-08-31 2021-12-17 华南理工大学 Rapid dynamic target grabbing method based on detection constraint
CN113984037A (en) * 2021-09-30 2022-01-28 电子科技大学长三角研究院(湖州) Semantic map construction method based on target candidate box in any direction
CN114170521A (en) * 2022-02-11 2022-03-11 杭州蓝芯科技有限公司 Forklift pallet butt joint identification positioning method
CN114683251A (en) * 2022-03-31 2022-07-01 上海节卡机器人科技有限公司 Robot grabbing method and device, electronic equipment and readable storage medium
CN114723827A (en) * 2022-04-28 2022-07-08 哈尔滨理工大学 Grabbing robot target positioning system based on deep learning
CN114897999A (en) * 2022-04-29 2022-08-12 美的集团(上海)有限公司 Object pose recognition method, electronic device, storage medium, and program product
CN114926527A (en) * 2022-06-08 2022-08-19 哈尔滨理工大学 Mechanical arm grabbing pose detection method under complex background
CN115170911A (en) * 2022-09-06 2022-10-11 浙江大学湖州研究院 Human body key part positioning system and method based on image recognition
CN115249333A (en) * 2021-06-29 2022-10-28 达闼科技(北京)有限公司 Grab network training method and system, electronic equipment and storage medium
CN115272791A (en) * 2022-07-22 2022-11-01 仲恺农业工程学院 Multi-target detection positioning method for tea based on YoloV5
CN115578608A (en) * 2022-12-12 2023-01-06 南京慧尔视智能科技有限公司 Anti-interference classification method and device based on millimeter wave radar point cloud
CN115922738A (en) * 2023-03-09 2023-04-07 季华实验室 Electronic component grabbing method, device, equipment and medium in stacking scene
CN116596996A (en) * 2023-05-26 2023-08-15 河北农业大学 Method and system for acquiring spatial pose information of apple fruits
WO2023165161A1 (en) * 2022-05-09 2023-09-07 青岛理工大学 Multi-task convolution-based object grasping and positioning identification algorithm and system, and robot

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171748A (en) * 2018-01-23 2018-06-15 哈工大机器人(合肥)国际创新研究院 A kind of visual identity of object manipulator intelligent grabbing application and localization method
CN109102547A (en) * 2018-07-20 2018-12-28 上海节卡机器人科技有限公司 Robot based on object identification deep learning model grabs position and orientation estimation method
CN109635697A (en) * 2018-12-04 2019-04-16 国网浙江省电力有限公司电力科学研究院 Electric operating personnel safety dressing detection method based on YOLOv3 target detection
CN110363158A (en) * 2019-07-17 2019-10-22 浙江大学 A kind of millimetre-wave radar neural network based cooperates with object detection and recognition method with vision

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171748A (en) * 2018-01-23 2018-06-15 哈工大机器人(合肥)国际创新研究院 A kind of visual identity of object manipulator intelligent grabbing application and localization method
CN109102547A (en) * 2018-07-20 2018-12-28 上海节卡机器人科技有限公司 Robot based on object identification deep learning model grabs position and orientation estimation method
CN109635697A (en) * 2018-12-04 2019-04-16 国网浙江省电力有限公司电力科学研究院 Electric operating personnel safety dressing detection method based on YOLOv3 target detection
CN110363158A (en) * 2019-07-17 2019-10-22 浙江大学 A kind of millimetre-wave radar neural network based cooperates with object detection and recognition method with vision

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU WANMENG; TONG CHUANGMING; WANG TONG; PENG PENG: "Analysis of wideband radar sea clutter characteristics based on an electromagnetic scattering model" *
JIAO TIANCHI et al.: "Object detection method combining inverted residual blocks and YOLOv3" *

Cited By (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553949A (en) * 2020-04-30 2020-08-18 张辉 Positioning and grabbing method for irregular workpiece based on single-frame RGB-D image deep learning
CN111645080A (en) * 2020-05-08 2020-09-11 覃立万 Intelligent service robot hand-eye cooperation system and operation method
CN111598172A (en) * 2020-05-18 2020-08-28 东北大学 Dynamic target grabbing posture rapid detection method based on heterogeneous deep network fusion
CN111598172B (en) * 2020-05-18 2023-08-29 东北大学 Dynamic target grabbing gesture rapid detection method based on heterogeneous depth network fusion
CN111783537A (en) * 2020-05-29 2020-10-16 哈尔滨莫迪科技有限责任公司 Two-stage rapid grabbing detection method based on target detection characteristics
CN111975783B (en) * 2020-08-31 2021-09-03 广东工业大学 Robot grabbing detection method and system
CN111975783A (en) * 2020-08-31 2020-11-24 广东工业大学 Robot grabbing detection method and system
CN112070736A (en) * 2020-09-01 2020-12-11 上海电机学院 Object volume vision measurement method combining target detection and depth calculation
CN112070736B (en) * 2020-09-01 2023-02-24 上海电机学院 Object volume vision measurement method combining target detection and depth calculation
CN112183485A (en) * 2020-11-02 2021-01-05 北京信息科技大学 Deep learning-based traffic cone detection positioning method and system and storage medium
CN112183485B (en) * 2020-11-02 2024-03-05 北京信息科技大学 Deep learning-based traffic cone detection positioning method, system and storage medium
CN112396655A (en) * 2020-11-18 2021-02-23 哈尔滨工程大学 Point cloud data-based ship target 6D pose estimation method
CN112614182A (en) * 2020-12-21 2021-04-06 广州熙锐自动化设备有限公司 Method for identifying machining position based on deep learning, storage device and mobile terminal
CN112614182B (en) * 2020-12-21 2023-04-28 广州熙锐自动化设备有限公司 Deep learning-based method for identifying machining position, storage device and mobile terminal
CN112720477A (en) * 2020-12-22 2021-04-30 泉州装备制造研究所 Object optimal grabbing and identifying method based on local point cloud model
CN112720477B (en) * 2020-12-22 2024-01-30 泉州装备制造研究所 Object optimal grabbing and identifying method based on local point cloud model
CN112529948A (en) * 2020-12-25 2021-03-19 南京林业大学 Mature pomegranate positioning method based on Mask R-CNN and 3-dimensional sphere fitting
CN112733640A (en) * 2020-12-29 2021-04-30 武汉中海庭数据技术有限公司 Traffic indicator lamp positioning and extracting method and system based on point cloud high-precision map
CN112750163A (en) * 2021-01-19 2021-05-04 武汉理工大学 Port ship shore power connection method and system and computer readable storage medium
CN112936275A (en) * 2021-02-05 2021-06-11 华南理工大学 Mechanical arm grabbing system based on depth camera and control method
CN112927297A (en) * 2021-02-20 2021-06-08 华南理工大学 Target detection and visual positioning method based on YOLO series
CN113111712A (en) * 2021-03-11 2021-07-13 稳健医疗用品股份有限公司 AI identification positioning method, system and device for bagged product
CN112884825A (en) * 2021-03-19 2021-06-01 清华大学 Deep learning model-based grabbing method and device
CN112884825B (en) * 2021-03-19 2022-11-04 清华大学 Deep learning model-based grabbing method and device
CN113129449A (en) * 2021-04-16 2021-07-16 浙江孔辉汽车科技有限公司 Vehicle pavement feature recognition and three-dimensional reconstruction method based on binocular vision
CN113524194A (en) * 2021-04-28 2021-10-22 重庆理工大学 Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning
CN113284129A (en) * 2021-06-11 2021-08-20 梅卡曼德(北京)机器人科技有限公司 Box pressing detection method and device based on 3D bounding box
CN113246140B (en) * 2021-06-22 2021-10-15 沈阳风驰软件股份有限公司 Multi-model workpiece disordered grabbing method and device based on camera measurement
CN113246140A (en) * 2021-06-22 2021-08-13 沈阳风驰软件股份有限公司 Multi-model workpiece disordered grabbing method and device based on camera measurement
CN115249333A (en) * 2021-06-29 2022-10-28 达闼科技(北京)有限公司 Grab network training method and system, electronic equipment and storage medium
CN113627478A (en) * 2021-07-08 2021-11-09 深圳市优必选科技股份有限公司 Target detection method, target detection device and robot
CN113537096A (en) * 2021-07-21 2021-10-22 常熟理工学院 ROS-based AGV forklift storage tray identification and auxiliary positioning method and system
CN113537096B (en) * 2021-07-21 2023-08-15 常熟理工学院 AGV forklift warehouse position tray identification and auxiliary positioning method and system based on ROS
CN113723217A (en) * 2021-08-09 2021-11-30 南京邮电大学 Object intelligent detection method and system based on yolo improvement
CN113723389A (en) * 2021-08-30 2021-11-30 广东电网有限责任公司 Method and device for positioning strut insulator
CN113808205A (en) * 2021-08-31 2021-12-17 华南理工大学 Rapid dynamic target grabbing method based on detection constraint
CN113808205B (en) * 2021-08-31 2023-07-18 华南理工大学 Rapid dynamic target grabbing method based on detection constraint
CN113984037A (en) * 2021-09-30 2022-01-28 电子科技大学长三角研究院(湖州) Semantic map construction method based on target candidate box in any direction
CN113984037B (en) * 2021-09-30 2023-09-12 电子科技大学长三角研究院(湖州) Semantic map construction method based on target candidate frame in any direction
CN114170521A (en) * 2022-02-11 2022-03-11 杭州蓝芯科技有限公司 Forklift pallet butt joint identification positioning method
CN114683251A (en) * 2022-03-31 2022-07-01 上海节卡机器人科技有限公司 Robot grabbing method and device, electronic equipment and readable storage medium
CN114723827A (en) * 2022-04-28 2022-07-08 哈尔滨理工大学 Grabbing robot target positioning system based on deep learning
CN114897999A (en) * 2022-04-29 2022-08-12 美的集团(上海)有限公司 Object pose recognition method, electronic device, storage medium, and program product
CN114897999B (en) * 2022-04-29 2023-12-08 美的集团(上海)有限公司 Object pose recognition method, electronic device, storage medium, and program product
WO2023165161A1 (en) * 2022-05-09 2023-09-07 青岛理工大学 Multi-task convolution-based object grasping and positioning identification algorithm and system, and robot
CN114926527A (en) * 2022-06-08 2022-08-19 哈尔滨理工大学 Mechanical arm grabbing pose detection method under complex background
CN115272791A (en) * 2022-07-22 2022-11-01 仲恺农业工程学院 Multi-target detection positioning method for tea based on YoloV5
CN115170911A (en) * 2022-09-06 2022-10-11 浙江大学湖州研究院 Human body key part positioning system and method based on image recognition
CN115578608B (en) * 2022-12-12 2023-02-28 南京慧尔视智能科技有限公司 Anti-interference classification method and device based on millimeter wave radar point cloud
CN115578608A (en) * 2022-12-12 2023-01-06 南京慧尔视智能科技有限公司 Anti-interference classification method and device based on millimeter wave radar point cloud
CN115922738A (en) * 2023-03-09 2023-04-07 季华实验室 Electronic component grabbing method, device, equipment and medium in stacking scene
CN116596996A (en) * 2023-05-26 2023-08-15 河北农业大学 Method and system for acquiring spatial pose information of apple fruits
CN116596996B (en) * 2023-05-26 2024-01-30 河北农业大学 Method and system for acquiring spatial pose information of apple fruits

Similar Documents

Publication Publication Date Title
CN111080693A (en) Robot autonomous classification grabbing method based on YOLOv3
CN111340797B (en) Laser radar and binocular camera data fusion detection method and system
CN109344701B (en) Kinect-based dynamic gesture recognition method
CN109583483B (en) Target detection method and system based on convolutional neural network
CN114202672A (en) Small target detection method based on attention mechanism
CN111462120B (en) Defect detection method, device, medium and equipment based on semantic segmentation model
CN107545263B (en) Object detection method and device
CN109447979B (en) Target detection method based on deep learning and image processing algorithm
CN112836734A (en) Heterogeneous data fusion method and device and storage medium
CN108711172B (en) Unmanned aerial vehicle identification and positioning method based on fine-grained classification
CN111738344A (en) Rapid target detection method based on multi-scale fusion
CN111553949A (en) Positioning and grabbing method for irregular workpiece based on single-frame RGB-D image deep learning
CN115439458A (en) Industrial image defect target detection algorithm based on depth map attention
CN111598172B (en) Dynamic target grabbing gesture rapid detection method based on heterogeneous depth network fusion
CN115330734A (en) Automatic robot repair welding system based on three-dimensional target detection and point cloud defect completion
CN111008576A (en) Pedestrian detection and model training and updating method, device and readable storage medium thereof
CN110909656B (en) Pedestrian detection method and system integrating radar and camera
CN116071315A (en) Product visual defect detection method and system based on machine vision
CN111626241A (en) Face detection method and device
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN111507249A (en) Transformer substation nest identification method based on target detection
CN113076889B (en) Container lead seal identification method, device, electronic equipment and storage medium
CN114331961A (en) Method for defect detection of an object
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN111709269B (en) Human hand segmentation method and device based on two-dimensional joint information in depth image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2020-04-28