CN212724028U - Vision robot grasping system - Google Patents

Vision robot grasping system

Info

Publication number
CN212724028U
Authority
CN
China
Prior art keywords
feature extraction
model
module
grabbing
robot
Prior art date
Legal status
Active
Application number
CN202021517844.0U
Other languages
Chinese (zh)
Inventor
高振清
秦志民
文博宇
杜艳平
Current Assignee
Beijing Institute of Graphic Communication
Original Assignee
Beijing Institute of Graphic Communication
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Graphic Communication filed Critical Beijing Institute of Graphic Communication
Priority to CN202021517844.0U
Application granted
Publication of CN212724028U

Abstract

The utility model provides a vision robot grasping system, comprising a workbench, an image acquisition device, a computing device and a robot. The image acquisition device is positioned above the workbench and is used for acquiring an initial image of a target to be grabbed. The computing device is in communication connection with the image acquisition device and is used for receiving the initial image, inputting it into a pre-trained multi-scale feature extraction model and computing a processed image, from which the grabbing posture information of the target to be grabbed is acquired. The robot is in communication connection with the computing device and is used for receiving control signals sent by the computing device based on the grabbing posture information and, based on these control signals, adjusting to the corresponding grabbing position to grab the target to be grabbed. The vision robot grasping system can rapidly identify and locate the target object and predict its grasp, completes the identification and grabbing tasks of the vision robot in combination with robot motion control, and improves the real-time performance and stability of the system.

Description

Vision robot grasping system
Technical Field
The utility model relates to the technical field of robots, and in particular to a vision robot grasping system.
Background
Traditional vision robots handle the target detection problem with hand-crafted feature descriptions; as the number of target object types increases, feature extraction becomes increasingly cumbersome, the amount of computation grows exponentially, and the real-time performance of robot operation suffers.
Traditional vision robots also adopt a "teaching" mode for tasks such as locating and grasping a target object. This mode has no generalization ability: when the position or shape of the object to be grasped changes, the robot cannot adjust automatically and the operation fails. Poor flexibility and insufficient stability are therefore problems that the traditional vision robot needs to solve.
SUMMARY OF THE UTILITY MODEL
The technical problem to be solved by the utility model is how to improve the flexibility and reliability with which a robot grasps articles; to this end, the utility model provides a vision robot grasping system.
The vision robot grasping system according to the utility model comprises:
the workbench is used for placing an object to be grabbed;
the image acquisition device is positioned above the workbench and used for acquiring an initial image of the target to be grabbed;
the computing device is in communication connection with the image acquisition device and is used for receiving the initial image, inputting the initial image into a pre-trained multi-scale feature extraction model, and computing to obtain a processed image so as to acquire the grabbing posture information of the target to be grabbed;
and the robot is in communication connection with the computing device and is used for receiving a control signal sent by the computing device based on the grabbing attitude information and adjusting to a corresponding grabbing position to grab the target to be grabbed based on the control signal.
In the vision robot grasping system according to the utility model, after the initial image of the target to be grabbed acquired by the image acquisition device is input into the pre-trained multi-scale feature extraction model, it can be processed to obtain the grabbing posture information of the robot, so that the robot is controlled to grab the target to be grabbed automatically and efficiently. The grasping system can realize rapid identification, positioning and grasp prediction of the target object, completes the identification and grabbing tasks of the vision robot in combination with robot kinematics control, and effectively improves the real-time performance and stability of the system.
According to some embodiments of the invention, the robot comprises:
a six-axis mechanical arm;
and the mechanical arm driver is in communication connection with the computing device and the six-axis mechanical arm, receives a control instruction sent by the computing device, and controls the six-axis mechanical arm to move to the grabbing position to grab the target to be grabbed based on the control instruction.
In some embodiments of the present invention, the six-axis mechanical arm switches between the grabbing position and an initial position, and automatically resets to the initial position each time a grab is completed.
According to some embodiments of the invention, the computing device is a computer, and the image acquisition device is a depth camera.
In some embodiments of the present invention, the workbench has a placement area for placing the object to be grabbed, and the image acquisition device is located directly above the placement area.
According to some embodiments of the invention, the grasping system further comprises: an adjusting device, wherein the image acquisition device is arranged on the adjusting device, and the adjusting device adjusts the height and the angle of the image acquisition device.
In some embodiments of the present invention, the target to be grabbed includes: looped cables, express boxes, storage boxes, scissors, screwdrivers, toothbrushes and screws.
According to some embodiments of the invention, the computing device comprises:
the convolutional neural network feature extraction module is used for building a convolutional neural network feature extraction model based on a Darknet-53 skeleton;
the standard convolution separation module is in communication connection with the convolutional neural network feature extraction module and is used for calling a convolution layer creation function to separate separable standard convolutions in the convolutional neural network feature extraction module into unit convolutions to form a base model;
the multi-scale feature extraction module is in communication connection with the standard convolution separation module and is used for connecting different convolution layers in the base model in a jump connection mode to construct a multi-scale feature extraction model;
and the model training module is in communication connection with the multi-scale feature extraction module and is used for training the multi-scale feature extraction model to obtain the pre-trained multi-scale feature extraction model.
According to some embodiments of the invention, the model training module comprises:
the data set creating module is used for acquiring a multi-scale data set;
and the training module is in communication connection with the data set creating module and is used for training the multi-scale feature extraction model by adopting at least one of transfer learning, parallel operation and GPU acceleration methods based on the multi-scale data set.
According to some embodiments of the present invention, the model training module further comprises:
and the model evaluation optimization module is in communication connection with the training module and is used for evaluating and optimizing the trained multi-scale feature extraction model.
Drawings
Fig. 1 is a schematic view of a vision robot gripping system according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for grabbing by a visual robot according to an embodiment of the present invention.
Reference numerals:
the gripping system (100) is provided with,
a workbench 10, an image acquisition device 20, a computing device 30 and a robot 40.
Detailed Description
To further illustrate the technical means and effects of the present invention adopted to achieve the predetermined objects, the present invention will be described in detail below with reference to the accompanying drawings and preferred embodiments.
As shown in fig. 1, the vision robot grasping system 100 according to the embodiment of the present invention includes: a table 10, an image acquisition device 20, a computing device 30, and a robot 40.
The workbench 10 is used for placing an object to be grabbed, and the image acquisition device 20 is located above the workbench 10 and used for acquiring an initial image of the object to be grabbed.
The calculating device 30 is in communication connection with the image obtaining device 20, and is configured to receive the initial image, input the initial image into a pre-trained multi-scale feature extraction model, and calculate to obtain a processed image, so as to obtain the grabbing posture information of the object to be grabbed.
The robot 40 is in communication connection with the computing device 30, and is configured to receive a control signal sent by the computing device 30 based on the grasping posture information, and adjust to a corresponding grasping position based on the control signal to grasp the object to be grasped.
The "acquiring the grabbing attitude information of the object to be grabbed based on the processed image" may be understood as acquiring coordinate information of a frame to be grabbed and a frame to be grabbed of the processed image, and converting and calculating the coordinate information through a coordinate system to acquire the grabbing attitude information of the robot.
For example, based on processing the image, four vertices of the frame to be grabbed are obtainedThe coordinate information of (2): (x)1,y1),(x2,y2), (x3,y3),(x4,y4);
Calculating the grabbing attitude information (X) of the robot according to the following formula0,Y0,H0,W00):
[The formulas are given in the original only as equation images; they compute X0, Y0, H0, W0 and θ0 from the four vertex coordinates (x1, y1) to (x4, y4).]
wherein (X0, Y0) are the coordinates of the center of the frame to be grasped, H0 is the maximum opening height of the robot's two parallel fingers, W0 is the finger width of the two-finger parallel gripper, and θ0 is the angle of the frame to be grasped relative to the horizontal plane.
Thereby, the grasping posture information (X0, Y0, H0, W0, θ0) of the robot can be calculated from the obtained information, and the robot can be adjusted to the corresponding grasping position to grasp the object to be grabbed.
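Since the formula images are not reproduced in the text, the following is only a sketch of the standard conversion from a four-vertex grasp rectangle to the five-parameter pose. The function name, the assumed ordering of the vertices around the rectangle, and the choice of which edge corresponds to the finger width are illustrative assumptions, not details taken from the patent.

```python
import math

def rect_to_grasp_pose(vertices):
    """Convert four grasp-rectangle vertices [(x1, y1), ..., (x4, y4)] into the
    five-parameter pose (X0, Y0, H0, W0, theta0). The vertices are assumed to be
    ordered around the rectangle so that (x1, y1)->(x2, y2) is a finger edge and
    (x2, y2)->(x3, y3) spans the gripper opening."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = vertices
    x0 = (x1 + x2 + x3 + x4) / 4.0           # centre of the grasp frame
    y0 = (y1 + y2 + y3 + y4) / 4.0
    w0 = math.hypot(x2 - x1, y2 - y1)        # finger width of the parallel gripper
    h0 = math.hypot(x3 - x2, y3 - y2)        # maximum opening height of the two fingers
    theta0 = math.atan2(y2 - y1, x2 - x1)    # angle relative to the horizontal
    return x0, y0, h0, w0, theta0

# Example: an axis-aligned 40 x 20 rectangle centred at (50, 30)
print(rect_to_grasp_pose([(30, 20), (70, 20), (70, 40), (30, 40)]))
```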
In the deep-learning-based vision robot grasping system 100 according to the utility model, after the initial image of the target to be grabbed acquired by the image acquisition device 20 is input into the pre-trained multi-scale feature extraction model, it can be processed to obtain the grabbing posture information of the robot 40, so that the robot 40 is controlled to grab the target to be grabbed automatically and efficiently. The grasping system 100 can realize rapid identification, positioning and grasp prediction of the target object, completes the identification and grabbing tasks of the vision robot 40 in combination with the motion control of the robot 40, and effectively improves the real-time performance and stability of the system.
According to some embodiments of the present invention, as shown in fig. 1, the robot 40 includes: six-axis robotic arms and robotic arm drivers.
The mechanical arm driver is in communication connection with the computing device 30 and the six-axis mechanical arm, receives a control instruction sent by the computing device 30, and controls the six-axis mechanical arm to move to a grabbing position to grab an object to be grabbed based on the control instruction.
In some embodiments of the utility model, the six-axis mechanical arm switches between the grabbing position and an initial position, and automatically resets to the initial position each time a grab is completed.
It should be noted that when the robot is controlled to perform a grabbing task, the grasp-frame information must first be obtained from the acquired original image of the target to be grabbed and then converted into the grabbing posture information of the robot; resetting the robot to the originally set position facilitates this calculation and conversion of the coordinate information.
According to some embodiments of the present invention, the computing device is a computer and the image acquisition device is a depth camera.
In some embodiments of the present invention, the working table 10 has a placement area for placing the object to be grasped, and the image acquiring device 20 is located directly above the placement area. Thereby, image acquisition of the object to be captured by the image acquisition device 20 is facilitated.
According to some embodiments of the present invention, the grasping system 100 further includes: the adjusting device, the image acquisition device 20 is arranged on the adjusting device, and the adjusting device adjusts the height and the angle of the image acquisition device 20. For example, the adjusting device may be a stand capable of rotating up and down, the image capturing device 20 is disposed at an end of the stand, and the height and angle of the image capturing device 20 can be conveniently adjusted by the stand.
In some embodiments of the present invention, the object to be grabbed includes: annular cable, express delivery box, receiver, scissors, screwdriver, toothbrush and screw. That is, the grasping system 100 may be used to grasp looped cables, express cassettes, storage boxes, scissors, screwdrivers, toothbrushes, screws, and the like. It is to be understood that the above-mentioned objects to be grabbed are only for illustration and should not be construed as limiting the present invention.
According to some embodiments of the present invention, the computing device 30 comprises:
the convolutional neural network feature extraction module is used for building a convolutional neural network feature extraction model based on a Darknet-53 skeleton;
the standard convolution separation module is in communication connection with the convolutional neural network feature extraction module and is used for calling a convolution layer creation function to separate separable standard convolutions in the convolutional neural network feature extraction module into unit convolutions to form a base model;
the multi-scale feature extraction module is in communication connection with the standard convolution separation module and is used for connecting different convolution layers in the base model in a jump connection mode to construct a multi-scale feature extraction model;
and the model training module is in communication connection with the multi-scale feature extraction module and is used for training the multi-scale feature extraction model so as to obtain a pre-trained multi-scale feature extraction model.
According to some embodiments of the utility model, the model training module includes:
the data set creating module is used for acquiring a multi-scale data set;
and the training module is in communication connection with the data set creating module and is used for training the multi-scale feature extraction model by adopting at least one of transfer learning, parallel operation and GPU acceleration methods based on the multi-scale data set.
According to some embodiments of the utility model, the model training module still includes:
and the model evaluation optimization module is in communication connection with the training module and is used for evaluating and optimizing the trained multi-scale feature extraction model.
The process of grasping a target to be grabbed using the vision robot grasping system of the utility model includes:
the method comprises the steps that an image acquisition device acquires an initial image of a target to be grabbed, wherein the target to be grabbed is placed on a workbench;
the image acquisition device sends the initial image to the computing device, so that the computing device inputs the initial image into a pre-trained multi-scale feature extraction model, a processed image is obtained through computing, and the grabbing attitude information of the target to be grabbed is acquired based on the processed image;
and the robot grabs the target to be grabbed based on the grabbing posture information.
It should be noted that the pre-trained multi-scale feature extraction model may be constructed by a system in the computing device 30, and specifically, the computing device 30 includes:
the convolutional neural network feature extraction module can build a convolutional neural network feature extraction model through a TensorFlow platform based on a Darknet-53 framework;
the standard convolution separation module can separate separable standard convolution in the convolution neural network feature extraction model into unit convolution by calling a convolution layer creating function to form a base model;
the multi-scale feature extraction module can adopt a jump connection mode to connect different convolution layers in the base model to construct a multi-scale feature extraction model;
the data set creating module can acquire a multi-scale data set; for example, a picture to be trained is acquired by using a depth camera or by using an existing data set in the cloud; that is to say, the picture to be trained can be obtained by shooting through the depth camera, and the picture to be trained can also be obtained through the existing data of the cloud. And then, carrying out real grabbing frame marking on the picture to be trained, and carrying out clipping and/or rotation processing on the picture to be trained to amplify the picture to be trained so as to obtain a multi-scale data set.
The training module can train the built multi-scale feature extraction model through a data set, and at least one of transfer learning, parallel operation and GPU acceleration methods is used in the training process to improve the model training speed; it should be noted that after the picture to be trained is obtained, the picture to be trained needs to be labeled with a real capture frame. In order to amplify the pictures to be trained, the pictures to be trained can be cut to obtain pictures to be trained with different sizes; or rotating the picture to be trained to obtain pictures to be trained with different rotation angles; of course, the image to be trained can be cut and rotated to increase the number of the images to be trained.
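As an illustration of the crop-and-rotate augmentation described above, a minimal OpenCV-based sketch follows. The crop ratio, the fixed rotation angles and the use of OpenCV are assumptions made for illustration; in practice the annotated grasp frames would have to be transformed together with the images.

```python
import cv2

def augment(image, crop_ratio=0.9, angles=(90, 180, 270)):
    """Amplify one training picture by cropping and rotating it."""
    h, w = image.shape[:2]
    samples = [image]

    # Centre crop to crop_ratio of the original size, then resize back.
    ch, cw = int(h * crop_ratio), int(w * crop_ratio)
    top, left = (h - ch) // 2, (w - cw) // 2
    samples.append(cv2.resize(image[top:top + ch, left:left + cw], (w, h)))

    # Rotations about the image centre at a few fixed angles.
    for angle in angles:
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        samples.append(cv2.warpAffine(image, m, (w, h)))
    return samples
```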
The model evaluation and optimization module can evaluate and optimize the trained multi-scale feature extraction model; when the processed image computed by the multi-scale feature extraction model meets the preset requirements, training of the model is complete.
The built multi-scale feature extraction model can undergo transfer learning on the commonly used public Pascal VOC data set, which contains 20 object categories.
The model building process is specifically as follows: a convolutional neural network feature extraction model is built on the TensorFlow 1.15 deep learning platform; the convolution-layer creation function conv2d_fixed_padding(inputs, filters1, 1) is called to separate the separable standard convolution into unit convolutions, pairing a 1 x 1 convolution with a 3 x 3 convolution, i.e. conv2d_fixed_padding(inputs, filters2, 3); together these build the basic modules required by the network model, and the basic modules jointly form the base model.
On top of the base model, jump (skip) connections are added by defining a shortcut equal to the block input and feeding it into the next layer, thereby connecting different convolution layers. The base model generates meta-features that serve as input to a secondary model; model stacking is completed by setting the number of repetitions, input pixels, number of convolution kernels and stride of each basic module, and 53 convolution layers at different scales are stacked to build the required model framework, which is used to extract image features and learn the weights. A sketch of such a basic module is given below.
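The following is a minimal sketch of such a basic module, written with tf.keras (available on the TensorFlow 1.15 platform mentioned above): a 1 x 1 unit convolution followed by a 3 x 3 unit convolution, with a jump connection adding the block input back in. The helper names, the layer hyper-parameters and the small stacking example are illustrative assumptions; the patent does not give the exact layer configuration.

```python
import tensorflow as tf  # the patent builds the model on the TensorFlow 1.15 platform

def conv_unit(x, filters, kernel_size, strides=1):
    """One 'unit convolution': convolution + batch norm + LeakyReLU,
    loosely mirroring the conv2d_fixed_padding helper mentioned above."""
    x = tf.keras.layers.Conv2D(filters, kernel_size, strides=strides,
                               padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.LeakyReLU(alpha=0.1)(x)

def base_block(x, filters):
    """Darknet-53-style basic module: a separated 1x1 + 3x3 convolution pair,
    with a jump (skip) connection adding the block input back in."""
    shortcut = x                       # block input kept as the skip connection
    x = conv_unit(x, filters, 1)       # 1 x 1 unit convolution
    x = conv_unit(x, filters * 2, 3)   # 3 x 3 unit convolution
    return tf.keras.layers.Add()([shortcut, x])

# Illustrative stacking of the first stage; the real model stacks 53 conv layers.
inputs = tf.keras.Input(shape=(416, 416, 3))
x = conv_unit(inputs, 32, 3)
x = conv_unit(x, 64, 3, strides=2)     # downsample before the residual block
x = base_block(x, 32)                  # output channels (64) match the shortcut
model = tf.keras.Model(inputs, x)
```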
In the weight optimization process, cross entropy is selected as a loss function. The cross entropy loss function is defined as follows:
C = -(1/n) Σ_x [ y·ln a + (1 - y)·ln(1 - a) ]
where C denotes loss, y denotes actual value, a denotes output value, n denotes total number of samples, and x denotes sample.
An adaptive gradient descent optimizer is used as the main method for weight updating, and finally the normalized exponential function softmax outputs the maximum probability value as the final prediction. During training the weights are stored in the checkpoint folder; the other folders include a dataset folder, a network model folder and several configuration and description folders. In use, the required files and parameters can be modified for the specific application scenario to obtain the optimal configuration. In this way, the multi-scale feature extraction model is constructed by adding separated convolutions and jump connections on top of the Darknet-53 network structure. A sketch of the loss and optimizer setup follows.
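A minimal sketch of this training head is shown below in the TF 1.x graph style mentioned above. Adam is assumed as the "adaptive gradient descent optimizer", and the multi-class softmax cross-entropy is used in place of the binary form given earlier; both are illustrative choices rather than details stated in the text.

```python
import tensorflow as tf  # TF 1.x graph-style API (TensorFlow 1.15)

def classification_head(features, labels, num_classes, learning_rate=1e-3):
    """Softmax output, cross-entropy loss and an adaptive optimizer.
    Adam is assumed here; the text only says 'adaptive gradient descent
    optimizer'. 'features' is the flattened model output, 'labels' one-hot."""
    logits = tf.layers.dense(features, num_classes)
    probs = tf.nn.softmax(logits)                 # normalised exponential output
    loss = tf.reduce_mean(                        # cross-entropy loss C
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits))
    train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    prediction = tf.argmax(probs, axis=-1)        # maximum-probability class
    return loss, train_op, prediction
```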
The training process for the built model specifically comprises the following steps:
First, the Pascal VOC 2007 and Pascal VOC 2012 data sets are downloaded. Second, the script provided by Darknet for processing VOC labels is used to generate the labels of the VOC data sets; the data sets and label categories are substituted by modifying the script files, and the data configuration file, model configuration file, label configuration file and data formats are modified accordingly. The pre-trained network model parameters are then downloaded and the transfer training is started. A sketch of the VOC-to-Darknet label conversion is given below.
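For illustration, converting one Pascal VOC annotation into the Darknet label format (one "class_id x_center y_center width height" line per object, all values normalised to [0, 1]) can be sketched as follows. The function is a stand-in for the Darknet-provided label script mentioned above, not the patent's own code.

```python
import xml.etree.ElementTree as ET

# The 20 Pascal VOC object categories used for transfer learning.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

def voc_xml_to_darknet_label(xml_path):
    """Convert one Pascal VOC annotation file into Darknet label lines of the
    form 'class_id x_center y_center width height', normalised to [0, 1]."""
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = obj.find("name").text
        if cls not in VOC_CLASSES:
            continue
        box = obj.find("bndbox")
        xmin, xmax = float(box.find("xmin").text), float(box.find("xmax").text)
        ymin, ymax = float(box.find("ymin").text), float(box.find("ymax").text)
        lines.append("%d %.6f %.6f %.6f %.6f" % (
            VOC_CLASSES.index(cls),
            (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h,
            (xmax - xmin) / w, (ymax - ymin) / h))
    return lines
```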
The vision robot gripping system 100 according to the present invention is described in detail in one specific embodiment with reference to the accompanying drawings. It is to be understood that the following description is only exemplary, and not restrictive of the invention.
The vision robot grasping system 100 provided by the utility model mainly addresses the insufficient real-time performance of traditional visual detection methods during target detection and classification, and the poor stability of traditional robots during grasp prediction.
As shown in fig. 1, the vision robot gripping system 100 includes: a table 10, an image acquisition device 20, a computing device 30, and a robot 40.
The workbench 10 is used for placing an object to be grabbed, and the image acquisition device 20 is located above the workbench 10 and used for acquiring an initial image of the object to be grabbed.
The calculating device 30 is in communication connection with the image obtaining device 20, and is configured to receive the initial image, input the initial image into a pre-trained multi-scale feature extraction model, and calculate to obtain a processed image, so as to obtain the grabbing posture information of the object to be grabbed.
The robot 40 is in communication connection with the computing device 30, and is configured to receive a control signal sent by the computing device 30 based on the grasping posture information, and adjust to a corresponding grasping position based on the control signal to grasp the object to be grasped.
The "acquiring the grabbing attitude information of the object to be grabbed based on the processed image" may be understood as acquiring coordinate information of a frame to be grabbed and a frame to be grabbed of the processed image, and converting and calculating the coordinate information through a coordinate system to acquire the grabbing attitude information of the robot.
For example, based on the processed image, coordinate information of four vertices of the frame to be grasped is acquired: (x)1,y1),(x2,y2), (x3,y3),(x4,y4);
Calculating the grabbing attitude information (X) of the robot according to the following formula0,Y0,H0,W00):
[The formulas are given in the original only as equation images; they compute X0, Y0, H0, W0 and θ0 from the four vertex coordinates (x1, y1) to (x4, y4).]
wherein (X0, Y0) are the coordinates of the center of the frame to be grasped, H0 is the maximum opening height of the robot's two parallel fingers, W0 is the finger width of the two-finger parallel gripper, and θ0 is the angle of the frame to be grasped relative to the horizontal plane.
Thereby, the grasping posture information (X0, Y0, H0, W0, θ0) of the robot can be calculated from the obtained information, and the robot can be adjusted to the corresponding grasping position to grasp the object to be grabbed.
As shown in fig. 1, the robot 40 includes: six-axis robotic arms and robotic arm drivers.
The mechanical arm driver is in communication connection with the computing device 30 and the six-axis mechanical arm, receives a control instruction sent by the computing device 30, and controls the six-axis mechanical arm to move to a grabbing position to grab an object to be grabbed based on the control instruction.
The six-axis mechanical arm switches between the grabbing position and an initial position, and automatically resets to the initial position each time a grab is completed.
It should be noted that when the robot is controlled to perform a grabbing task, the grasp-frame information must first be obtained from the acquired original image of the target to be grabbed and then converted into the grabbing posture information of the robot; resetting the robot to the originally set position facilitates this calculation and conversion of the coordinate information.
The computing device is a computer, and the image acquisition device is a depth camera.
The table 10 has a placement area where an object to be grasped is placed, and the image pickup device 20 is located directly above the placement area. Thereby, image acquisition of the object to be captured by the image acquisition device 20 is facilitated.
The grasping system 100 further includes: the adjusting device, the image acquisition device 20 is arranged on the adjusting device, and the adjusting device adjusts the height and the angle of the image acquisition device 20. For example, the adjusting device may be a stand capable of rotating up and down, the image capturing device 20 is disposed at an end of the stand, and the height and angle of the image capturing device 20 can be conveniently adjusted by the stand.
The object to be grabbed includes: looped cables, express boxes, storage boxes, scissors, screwdrivers, toothbrushes and screws. That is, the grasping system 100 may be used to grasp looped cables, express boxes, storage boxes, scissors, screwdrivers, toothbrushes, screws, and the like. It is to be understood that the above-mentioned objects to be grabbed are only for illustration and should not be construed as limiting the present invention.
The computing device 30 comprises:
the convolutional neural network feature extraction module is used for building a convolutional neural network feature extraction model based on a Darknet-53 skeleton;
the standard convolution separation module is in communication connection with the convolutional neural network feature extraction module and is used for calling a convolution layer creation function to separate separable standard convolutions in the convolutional neural network feature extraction module into unit convolutions to form a base model;
the multi-scale feature extraction module is in communication connection with the standard convolution separation module and is used for connecting different convolution layers in the base model in a jump connection mode to construct a multi-scale feature extraction model;
and the model training module is in communication connection with the multi-scale feature extraction module and is used for training the multi-scale feature extraction model so as to obtain a pre-trained multi-scale feature extraction model.
The model training module comprises:
the data set creating module is used for acquiring a multi-scale data set;
and the training module is in communication connection with the data set creating module and is used for training the multi-scale feature extraction model by adopting at least one of transfer learning, parallel operation and GPU acceleration methods based on the multi-scale data set.
The model training module further comprises:
and the model evaluation optimization module is in communication connection with the training module and is used for evaluating and optimizing the trained multi-scale feature extraction model.
Before a target object is grasped, the multi-scale feature extraction model is built and trained in advance. Acquisition of the training data set is completed using an existing data set in the cloud and by automatically collecting images with the depth camera; during preprocessing, the data set is annotated with LabelImg software and augmented by cropping and rotation, the annotated ground-truth frames are clustered, and the main types of grasp frames are selected as candidate frames.
A multi-scale feature extraction model is then constructed to extract features from the created data set. During model construction, a single-stage detection model framework is built on the basis of the Darknet-53 skeleton by combining ideas such as separated convolution, rotated convolution and jump connection; an IOU value is set as the matching criterion, and the cross entropy function is used as the loss function.
In the training process, methods such as transfer learning, parallel operation, GPU acceleration and the like are used for accelerating the training speed.
Based on the Cornell Dataset, five-fold cross-validation and ablation experiments are carried out to evaluate recognition speed, generalization and robustness.
To grasp an object, the pose of the target object in the mechanical-arm coordinate system must be determined; the object pose is then converted through coordinate transformation into a redefined grasp frame, which only needs to regress a correction value for the frame.
Specifically, with reference to fig. 2, the step of inputting an initial image into a pre-trained multi-scale feature extraction model by a computer to obtain the grasping posture information of the target to be grasped is as follows:
the method comprises the following steps: multi-scale dataset creation. Firstly, the image information of a target to be captured is obtained by a depth camera, the images are constructed into a target detection data set, and the data set is amplified through cutting and rotation. And preprocessing the amplified data set, and manually labeling the name of the target object and the bounding box in the image by using labelImg software. The marked part in the image is defined as a positive sample, and the unmarked part is defined as a negative sample.
Step two: and (5) a multi-scale feature extraction model. And constructing a multi-scale feature extraction model to extract the features of the created data set.
Step three: and (5) training a model. And training the built multi-scale feature extraction model, and accelerating the training speed by using methods such as transfer learning, parallel operation, GPU acceleration and the like in the training process.
The method comprises the following steps that migration learning is conducted, wherein the format of an ImageNet data set is converted into a format required by training of the darknet by utilizing darknet/script/ImageNet _ laber, so that label of each picture is generated and stored in a labels folder; creating a file data/ImageNet/ImageNet.name, and writing 1000 classes of ImageNet into the file; creating a file data/ImageNet/ImageNet.data, specifying the number of categories, the positions of a training set and a test set, and the like; modifying the network structure, and newly building yolov 3-ImageNet.cfg; modify utils/dataset.py; and downloading the weight weights weight/yolov3. pt of the pre-training, and starting the migration training.
Step four: and evaluating optimization. After the model training is finished, taking a Cornell Dataset (Cornell Dataset) as a reference, performing a five-fold intersection and ablation experiment, and evaluating the recognition speed, the generalization and the robustness of the optimized model.
The five-fold cross-validation algorithm comprises the following steps (a minimal sketch in code follows the list):
a. randomly dividing the Cornell Dataset into 5 packets;
b. using one packet as the test set each time and the remaining 4 packets as the training set;
c. finally, taking the average of the 5 resulting classification rates as the true resolution of the model.
Step five: feasibility analysis. If the result is valid, the next step is carried out; otherwise, the process returns to step two.
Step six: pose conversion. To grasp an object, the pose of the target object in the mechanical-arm coordinate system must be determined, and the object pose is then converted into a redefined grasp frame through coordinate transformation.
The coordinate transformation and capture box redefinition calculation is as follows:
(1) Coordinate conversion. A transformation from the pixel coordinate system to the camera coordinate system is established through camera calibration, and a transformation from the camera coordinate system to the mechanical-arm coordinate system is established through hand-eye calibration. The concrete implementation steps are as follows:
a. acquiring the camera's intrinsic matrix and distortion parameters using Zhang Zhengyou's calibration method;
b. calibrating the extrinsic matrix that converts between image coordinates and world coordinates;
c. setting N feature points (N > 3), calculating their world coordinates, moving the working end of the mechanical arm to each feature point and recording the end coordinates, yielding N groups of data;
d. calculating the rotation matrix and translation matrix between the two sets of data, where the world coordinates of the feature points form set A and the arm-end coordinates form set B (a sketch of this computation is given after this section);
(2) Grasp frame redefinition.
To obtain the grasp pose of the target object, the coordinates of the four vertices of the regular grasp frame given in the training data set are converted into a five-parameter representation (X0, Y0, H0, W0, θ0); this five-parameter representation gives the position and orientation of the two parallel fingers when gripping the object.
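The computation of the rotation and translation between point sets A and B in step d is not spelled out in the text; the SVD-based (Kabsch) solution sketched below is a common choice and is given here only as an assumed illustration.

```python
import numpy as np

def solve_rigid_transform(A, B):
    """Estimate rotation R and translation t with B ≈ R @ A + t, where A are the
    feature points' world coordinates and B the recorded arm-end coordinates
    (steps c-d of the hand-eye calibration above)."""
    A = np.asarray(A, dtype=float)   # N x 3 world coordinates
    B = np.asarray(B, dtype=float)   # N x 3 arm-end coordinates
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)                     # covariance of the centred sets
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                      # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = cb - R @ ca
    return R, t
```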
Step seven: multi-scale real-time object detection.
In summary, the grasping system 100 provided by the utility model can realize rapid identification, positioning and grasp prediction of a target object, completes the identification and grabbing tasks of the vision robot in combination with related techniques such as robot motion control, and effectively improves the real-time performance and stability of the system.
The technical means and functions of the present invention to achieve the intended purpose will be understood more deeply and concretely through the description of the embodiments, however, the attached drawings are only for reference and illustration, and are not intended to limit the present invention.

Claims (10)

1. A visual robotic grasping system, comprising:
the workbench is used for placing an object to be grabbed;
the image acquisition device is positioned above the workbench and used for acquiring an initial image of the target to be grabbed;
the computing device is in communication connection with the image acquisition device and is used for receiving the initial image, inputting the initial image into a pre-trained multi-scale feature extraction model, and computing to obtain a processed image so as to acquire the grabbing posture information of the target to be grabbed;
and the robot is in communication connection with the computing device and is used for receiving a control signal sent by the computing device based on the grabbing attitude information and adjusting to a corresponding grabbing position to grab the target to be grabbed based on the control signal.
2. The visual robotic grasping system according to claim 1, wherein the robot includes:
a six-axis mechanical arm;
and the mechanical arm driver is in communication connection with the computing device and the six-axis mechanical arm, receives a control instruction sent by the computing device, and controls the six-axis mechanical arm to move to the grabbing position to grab the target to be grabbed based on the control instruction.
3. The visual robotic gripper system of claim 2, wherein the six-axis robotic arm switches between a gripping position and an initial position to which the six-axis robotic arm automatically resets each time a grip is completed.
4. The visual robotic grasping system according to claim 1, wherein the computing device is a computer and the image acquisition device is a depth camera.
5. The vision robot gripping system according to claim 1, wherein the table has a placement area where the object to be gripped is placed, and the image acquisition device is located directly above the placement area.
6. The visual robotic grasping system according to claim 1, characterized in that the grasping system further includes: an adjusting device, wherein the image acquisition device is arranged on the adjusting device, and the height and the angle of the image acquisition device are adjusted by the adjusting device.
7. The visual robotic grasping system according to claim 1, wherein the object to be grasped includes: looped cables, express boxes, storage boxes, scissors, screwdrivers, toothbrushes and screws.
8. The visual robotic gripper system according to any one of claims 1-7, wherein the computing device comprises:
the convolutional neural network feature extraction module is used for building a convolutional neural network feature extraction model based on a Darknet-53 skeleton;
the standard convolution separation module is in communication connection with the convolution neural network feature extraction module and is used for calling a convolution layer creation function to separate separable standard convolutions in the convolution neural network feature extraction module into unit convolutions to form a base model;
the multi-scale feature extraction module is in communication connection with the standard convolution separation module and is used for connecting different convolution layers in the base model in a jump connection mode to construct a multi-scale feature extraction model;
and the model training module is in communication connection with the multi-scale feature extraction module and is used for training the multi-scale feature extraction model to obtain the pre-trained multi-scale feature extraction model.
9. The visual robotic grasping system according to claim 8, wherein the model training module includes:
the data set creating module is used for acquiring a multi-scale data set;
and the training module is in communication connection with the data set creating module and is used for training the multi-scale feature extraction model by adopting at least one of transfer learning, parallel operation and GPU acceleration methods based on the multi-scale data set.
10. The visual robotic grasping system according to claim 9, wherein the model training module further includes:
and the model evaluation optimization module is in communication connection with the training module and is used for evaluating and optimizing the trained multi-scale feature extraction model.
CN202021517844.0U 2020-07-28 2020-07-28 Vision robot grasping system Active CN212724028U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202021517844.0U CN212724028U (en) 2020-07-28 2020-07-28 Vision robot grasping system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202021517844.0U CN212724028U (en) 2020-07-28 2020-07-28 Vision robot grasping system

Publications (1)

Publication Number Publication Date
CN212724028U true CN212724028U (en) 2021-03-16

Family

ID=74911633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202021517844.0U Active CN212724028U (en) 2020-07-28 2020-07-28 Vision robot grasping system

Country Status (1)

Country Link
CN (1) CN212724028U (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239786A (en) * 2021-05-11 2021-08-10 重庆市地理信息和遥感应用中心 Remote sensing image country villa identification method based on reinforcement learning and feature transformation


Similar Documents

Publication Publication Date Title
CN111723782A (en) Deep learning-based visual robot grabbing method and system
CN109483554B (en) Robot dynamic grabbing method and system based on global and local visual semantics
CN110692082B (en) Learning device, learning method, learning model, estimating device, and clamping system
CN111590611B (en) Article classification and recovery method based on multi-mode active perception
CN110580725A (en) Box sorting method and system based on RGB-D camera
CN111046948B (en) Point cloud simulation and deep learning workpiece pose identification and robot feeding method
CN109584298B (en) Robot-oriented autonomous object picking task online self-learning method
CN108748149B (en) Non-calibration mechanical arm grabbing method based on deep learning in complex environment
WO2012052615A1 (en) Method for the filtering of target object images in a robot system
CN111923053A (en) Industrial robot object grabbing teaching system and method based on depth vision
CN111145257B (en) Article grabbing method and system and article grabbing robot
CN110969660A (en) Robot feeding system based on three-dimensional stereoscopic vision and point cloud depth learning
CN113610921A (en) Hybrid workpiece grabbing method, device and computer-readable storage medium
CN115816460B (en) Mechanical arm grabbing method based on deep learning target detection and image segmentation
CN115070781B (en) Object grabbing method and two-mechanical-arm cooperation system
CN212724028U (en) Vision robot grasping system
CN112947458B (en) Robot accurate grabbing method based on multi-mode information and computer readable medium
WO2024067006A1 (en) Disordered wire sorting method, apparatus, and system
CN116984269A (en) Gangue grabbing method and system based on image recognition
CN115861780B (en) Robot arm detection grabbing method based on YOLO-GGCNN
CN114347028B (en) Robot tail end intelligent grabbing method based on RGB-D image
CN114193440B (en) Robot automatic grabbing system and method based on 3D vision
CN113762159B (en) Target grabbing detection method and system based on directional arrow model
CN112288819B (en) Multi-source data fusion vision-guided robot grabbing and classifying system and method
CN111331599A (en) Automatic directional article grabbing method and system based on mechanical arm

Legal Events

Date Code Title Description
GR01 Patent grant