CN111482967B - Intelligent detection and grabbing method based on ROS platform - Google Patents

Intelligent detection and grabbing method based on ROS platform

Info

Publication number
CN111482967B
CN111482967B (application CN202010522144.9A)
Authority
CN
China
Prior art keywords
mechanical arm
grabbing
coordinates
notebook computer
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010522144.9A
Other languages
Chinese (zh)
Other versions
CN111482967A (en)
Inventor
陈海永
曹爱斌
王涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei University of Technology
Original Assignee
Hebei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei University of Technology filed Critical Hebei University of Technology
Priority to CN202010522144.9A
Publication of CN111482967A
Application granted
Publication of CN111482967B
Legal status: Active (current)
Anticipated expiration

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/16: Programme controls
    • B25J9/1602: Programme controls characterised by the control system, structure, architecture
    • B25J9/1674: Programme controls characterised by safety, monitoring, diagnostic
    • B25J9/1676: Avoiding collision or forbidden zones
    • B25J9/1679: Programme controls characterised by the tasks executed
    • B25J9/1694: Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697: Vision controlled systems
    • B25J15/00: Gripping heads and other end effectors
    • B25J15/08: Gripping heads and other end effectors having finger members
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses an intelligent detection and grabbing method based on the ROS platform. A somatosensory peripheral and the ROS system are used to collect and process information about the working scene, and a Faster RCNN network model is introduced to search for the target, which improves the target detection speed and shortens the grabbing-point estimation time. The grabbing process is simulated in advance on a simulation platform to avoid collisions caused by errors, and the mechanical arm is then controlled to execute the grabbing task according to the simulation result. With this method, the total time for detecting the target object and estimating the grabbing point lies within [0.074, 0.263] seconds, and over 20 grabbing trials on each of 5 objects the overall grabbing success rate is about 87%.

Description

Intelligent detection and grabbing method based on ROS platform
Technical Field
The invention belongs to the field of mechanical-arm control and motion planning, and in particular relates to an intelligent detection and grabbing method based on the ROS platform. It concerns a method by which a mechanical arm grabs objects: the objects are identified and an optimal grabbing point is found so that they can be grabbed.
Background
Grabbing with existing mechanical arms is usually controlled by teaching. Chinese patent publication No. CN108655026A, entitled "Robot rapid teaching sorting system and method", for example, eliminates part of the programming work but has the following drawbacks: the precision depends on the operator's visual judgment during demonstration, on-line programming by teaching an unordered path rarely gives a satisfactory result, and the teaching process is accident-prone: a light collision damages the equipment and a heavy one injures the operator. In vision-based methods, a camera is mounted above the object to be gripped, an image is acquired and transmitted to an upper computer, the image is processed to obtain the spatial coordinates of the object relative to the camera and the mechanical arm, and a motion plan is generated and sent to a lower computer to carry out the grip. According to the literature, existing algorithms that use deep learning to locate the target and the grabbing point focus mainly on improving recognition accuracy and range; recognition speed has seen little progress, yet recognition is the most time-consuming part of the task.
ROS serves as the primary system for all information and message transmission, which makes programming easier and more adaptable. ROS is a distributed processing system. Its main purpose is to make robot programs more reusable and to increase code reuse, in part because it supports many existing programming languages and allows programs written in different languages to work together. Another advantage of ROS is that, as a distributed system, it can handle multiple tasks on multiple computers; processing can be spread across several machines, so computing resources can be configured effectively.
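To make this concrete, the following minimal rospy sketch shows two nodes that could run on different machines sharing one ROS master; the node names, the topic name "/scene/rgb" and the publish rate are illustrative assumptions, not part of the patent:

```python
#!/usr/bin/env python
# Minimal ROS 1 (rospy) sketch of distributed publish/subscribe.
# Node and topic names are illustrative only.
import rospy
from sensor_msgs.msg import Image

def run_publisher():
    # Runs on the data-collecting machine (e.g. the notebook computer).
    rospy.init_node("scene_camera_publisher")
    pub = rospy.Publisher("/scene/rgb", Image, queue_size=1)
    rate = rospy.Rate(30)  # roughly the camera frame rate
    while not rospy.is_shutdown():
        msg = Image()  # in practice filled from the Kinect driver
        pub.publish(msg)
        rate.sleep()

def run_subscriber():
    # Runs on the processing machine (e.g. the desktop computer).
    rospy.init_node("scene_image_consumer")
    rospy.Subscriber("/scene/rgb", Image, lambda msg: rospy.loginfo("frame received"))
    rospy.spin()
```

Both machines only need to point ROS_MASTER_URI at the same master for the topic to cross machine boundaries.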
Disclosure of Invention
In order to solve the problems of the existing grabbing methods, such as long target-detection time and low grabbing precision, the invention provides an intelligent detection and grabbing method based on the ROS platform. The grabbing algorithm uses a Faster RCNN network model, which can identify the object to be grabbed quickly and effectively, and because the method is built on the ROS platform the scene in which it is deployed is more flexible and its operating efficiency is improved.
The technical solution adopted to solve this problem is as follows. The intelligent detection and grabbing method based on the ROS platform comprises the following steps:
Step one: the Kinect somatosensory peripheral is mounted on a fixed external frame above, or above and to the side of, the mechanical arm, so that the mechanical arm lies within the camera's field of view, and the placing platform for the object to be grabbed is arranged in front of the mechanical arm;
Step two: configure the scene in which the method is implemented, and connect the Kinect somatosensory peripheral, a desktop computer, a portable notebook computer and the mechanical arm for communication;
Step three: acquire the RGB image of the current scene and the registered depth information from the Kinect somatosensory peripheral to obtain a registered depth image;
Step four: run an object detection system based on the Faster RCNN network model on the RGB image to search for the object to be grabbed, and return the coordinates of the center of the object to be grabbed in image coordinates;
Step five: from the registered depth image, find the 3D coordinates of the object center in the coordinate system of the RGB camera of the Kinect somatosensory peripheral; from the center coordinates returned in image coordinates, obtain the coordinates of the object relative to the mechanical arm base through the coordinate transfer matrix, and from these the coordinates of the object's grabbing point relative to the mechanical arm base;
Step six: use the kinematics of the mechanical arm to plan its path: first set the initial position of the mechanical arm in the desktop computer and obtain the angle parameters of all its joints, determine the optimal path from the coordinates of the grabbing point relative to the mechanical arm base obtained in step five, and have the notebook computer control the joints of the mechanical arm so that the target is selected and the grabbing task is executed according to the grabbing point and the determined optimal path.
Compared with the prior art, the invention has the following beneficial effects: the method uses a somatosensory peripheral and the ROS system to collect and process information about the working scene and introduces a Faster Region-based Convolutional Neural Network (Faster RCNN) model; searching for the target with this network model improves the target detection speed and shortens the grabbing-point estimation time. The grabbing process is simulated in advance on a simulation platform to avoid collisions caused by errors, and the mechanical arm is controlled to execute the grabbing task according to the simulation result. With this method, the total time for detecting the target object and estimating the grabbing point lies within [0.074, 0.263] seconds, and over 20 grabbing trials on each of 5 objects the overall grabbing success rate is about 87%.
Drawings
FIG. 1 is a schematic view of the implementation scene of one embodiment of the method of the present invention;
FIG. 2 is a schematic diagram of the Faster RCNN network model framework;
FIG. 3 is a flow chart of target detection based on the Faster RCNN network model;
FIG. 4 is a training flow chart of the object detection system based on the Faster RCNN network model in one embodiment of the method of the invention;
FIG. 5 is a flow chart of the steps of one embodiment of the method of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and preferred embodiments.
The invention provides an intelligent detection and grabbing method based on the ROS platform (hereinafter "the method"), which is implemented by the following steps.
step one: the Kinect somatosensory peripheral equipment is arranged through the peripheral fixed outer frame, the Kinect somatosensory peripheral equipment is arranged above or above the side of the mechanical arm, and the mechanical arm is positioned in the shooting visual field range of the camera of the mechanical arm so as to facilitate shooting, and the object to be grabbed placing platform is arranged in front of the mechanical arm.
Step two: the method is configured to realize the scene that Kinect body feeling peripheral equipment, a desktop computer, a movable notebook computer and a mechanical arm are connected in a communication mode, specifically, the notebook computer is connected to the Internet by using WIFI, the Internet is set as an ROS main server, the IP address of the notebook computer is 192.168.1, the desktop computer is connected to the Internet by using a wireless network card, the address of the desktop computer is 192.168.1.1, the Kinect body feeling peripheral equipment is connected to the notebook computer by using a USB port, and the mechanical arm is connected to the notebook computer by using the USB port. The notebook computer and the desktop computer use an SSH communication mode, and the mechanical arm and the notebook computer are in a serial port communication mode.
The ROS system runs on both the desktop computer and the notebook computer. The desktop computer processes the data (object detection, three-dimensional position calculation and mechanical-arm joint-angle calculation) and issues instructions based on its judgments; the notebook computer acts as a data collector, receiving and forwarding the Kinect somatosensory peripheral's data, and as a controller, receiving the desktop computer's instructions to control the motion of the mechanical arm. The mechanical arm is a multi-degree-of-freedom arm with its own control system: the notebook computer sends motion instructions to this control system, which drives the corresponding joints.
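As an illustration of the notebook computer's controller role (not code from the patent; the topic name, message type, serial port and ASCII command format are all assumptions), a sketch of a bridge node that receives joint-angle commands over ROS and forwards them to the arm's control system over the serial port:

```python
#!/usr/bin/env python
# Sketch of the notebook-computer controller role: receive joint-angle
# commands from the desktop over ROS and forward them to the arm's
# control system over a serial port.  Topic name, message type, port
# name and the ASCII command format are assumptions for illustration.
import rospy
import serial
from std_msgs.msg import Float64MultiArray

arm_port = serial.Serial("/dev/ttyUSB0", baudrate=115200, timeout=0.1)

def on_joint_command(msg):
    # msg.data is assumed to hold the four target joint angles in degrees.
    line = ",".join("%.2f" % a for a in msg.data) + "\n"
    arm_port.write(line.encode("ascii"))

def main():
    rospy.init_node("arm_serial_bridge")
    rospy.Subscriber("/arm/joint_command", Float64MultiArray, on_joint_command)
    rospy.spin()

if __name__ == "__main__":
    main()
```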
Step three: and acquiring the RGB image of the current scene and the registered depth information from the Kinect somatosensory peripheral equipment to obtain a registered depth image. Specifically, the notebook computer acquires an RGB image of the current scene and registration depth information from peripheral equipment of the Kinect somatosensory, and sends the RGB image and registration depth information to a desktop computer provided with the ROS system, and the desktop computer registers a depth coordinate system to an image coordinate system by using a built-in function provided by the Kinect development environment, so as to obtain a registered depth image. The built-in function can find the RGB image at each position on the corresponding point depth image, and based on the test, the built-in function can run perfectly under OpenNI.
Step four: an object detection system based on a fast RCNN network model is operated on the RGB image to search for the object to be grabbed, and the coordinates of the center of the object to be grabbed are returned in the image coordinates.
The method comprises the following steps:
4-1: firstly, preprocessing an object detection system based on a fast RCNN network model in a desktop computer, wherein the specific operation steps are as follows:
4-1-1: a self-annotated image of the ImageNet dataset (an image of the object grabbed by the framed object) is selected as the dataset root set.
4-1-2: image rotation is carried out on the root set of the data set (in order to overcome slight visual angle distortion in the image edge of the RGB image and uneven surface), and each image is rotated at the rotation angle of-5, -4, -3, -2, -1, 2, 3, 4 and 5 degrees so as to process the visual angle distortion of the camera, so that a data sample set is obtained.
4-1-3: and (3) acquiring foreign object position coordinates and classifications in the pictures of the data sample set by using a target detection marking tool Labellmg for marking, and storing the acquired data in a txt file.
4-1-4: and (5) manufacturing the obtained txt file into an xml file in the VOC2007 data set Annotations file.
4-1-5: generating a training set, a testing set and a verification set in the VOC2007 data set according to the xml file; the format is txt.
4-1-6: and (3) downloading the VOC2007 data set, replacing data in the Annogens file with the xml file obtained in the step (4-1-4), replacing data in the ImageSets file with txt data obtained in the step (4-1-5), and placing the picture of the data sample set in the JPEGImages folder.
4-1-7: the pictures of all the data sample sets are randomly and non-repeatedly placed in a text file of 'train. Txt', a text file of 'test. Txt', and a text file of 'val. Txt'.
4-1-8: setting up a Faster RCNN network model frame, setting a class label of the Faster RCNN network model according to the class of the data sample set, and modifying parameters related to the total number of classes in the Faster RCNN network model according to the total number of the classes of the data sample set;
specifically, the label in the script "VOCinit" in the VOCdevkit software package is changed to 5. The verification iteration accounts for 20% of the picture used for verification, and this value is set to 300. Parameters of the Faster RCNN network model are changed, because the parameters are related to the class number of the object; the first modification was made on the "train_val.prototxt" files of the paths "models\fast_rcnn\prototxts\zf" and "models\fast_rcnn\prototxts\zf_fc6", modifying the input dim of "bbox_targets" and "bbox_loss_weights" to 24, which results from the formula input dim= (number of categories + background) # 4. The output number of the 7 th full connection layer is changed to 24, and the same area of the test.
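The rotation augmentation of step 4-1-2 can be sketched as follows (an illustrative OpenCV snippet with assumed directory names, not the patent's own tooling):

```python
import cv2
import glob
import os

# Rotate every root-set image by the angles listed in step 4-1-2.
# Directory names are assumptions for illustration.
ANGLES = [-5, -4, -3, -2, -1, 2, 3, 4, 5]

def augment(src_dir="root_set", dst_dir="data_sample_set"):
    os.makedirs(dst_dir, exist_ok=True)
    for path in glob.glob(os.path.join(src_dir, "*.jpg")):
        img = cv2.imread(path)
        h, w = img.shape[:2]
        name = os.path.splitext(os.path.basename(path))[0]
        for angle in ANGLES:
            # Rotation about the image centre, no scaling.
            m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
            rotated = cv2.warpAffine(img, m, (w, h), borderMode=cv2.BORDER_REPLICATE)
            cv2.imwrite(os.path.join(dst_dir, "%s_rot%+d.jpg" % (name, angle)), rotated)
```

Because the annotation in step 4-1-3 is carried out after this rotation, the bounding boxes do not need to be transformed.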
4-2: and starting a training process by running a corresponding MATLAB script 'script_master_rcnn_VOC 2007_ZF.m', so as to obtain a training model based on the Faster RCNN network model.
The step 4-2 specifically comprises the following steps:
4-2-1: and (3) initializing a Faster RCNN network model according to the weight parameters pre-trained by the ImageNet data set in the step (4-1).
4-2-2: inputting the data sample set in 4-1 into a Faster RCNN network model, and obtaining a characteristic diagram of the image through a convolutional neural first layer network.
4-2-3: and introducing an RPN network generation area suggestion, judging whether a detection target exists or not through an anchor generation layer, and calculating the scaling and translation scale of the prediction frame.
4-2-4: and calculating the scaling and translation scale of the calibration frame, and carrying out fine adjustment on the prediction frame.
4-2-5: a loss function of the predicted object position and RPN is calculated.
4-2-6: the suggested boxes are mapped onto the last layer of the convolutional signature of the convolutional network.
4-2-7: a Pooling layer (RoI Pooling) is used to generate feature maps of the same size for each rectangular box.
4-2-8: classification was performed using a softmax layer.
4-2-9: regression coefficients are applied to the boundaries, and a final bounding box is made for the classification result.
4-2-10: and (3) adjusting the learning rate and the iteration times according to the training result to obtain a final training model based on the Faster RCNN network model.
4-3: and (3) transmitting the RGB images collected by the peripheral Kinect somatosensory devices into the training model based on the fast RCNN network model obtained in the step (4-2), finding out the object to be grabbed, and returning the coordinates of the center of the object to be grabbed in the image coordinates.
Step five: and finding out the 3D coordinates of the object center corresponding to the RGB camera coordinate system arranged outside the Kinect somatosensory periphery through the registered depth image, obtaining the coordinates of the object relative to the mechanical arm base through the coordinate transfer matrix according to the coordinates of the center of the object to be grabbed returned in the image coordinates, and further obtaining the coordinates of the object grabbing points relative to the mechanical arm base.
The method for calculating the coordinate transfer matrix comprises the following steps: the world coordinate system XYZ with an origin at (0, 0) is set as the base coordinate of the mechanical arm, and the origin of the preset camera coordinate system is (-0.5, 0, 0.5) relative to the world coordinate system, so that a coordinate transfer matrix needs to be established, and the steps are as follows: rotated by-90 deg. about the Z axis and rotated by 120 deg. about the Y axis, translated by-0.5, 0,0.5, respectively, with respect to the x, Y and Z axes.
The final coordinate transfer matrix of this embodiment is:
(coordinate transfer matrix given as equation image BDA0002529234550000081 in the original publication)
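Since the matrix itself is only reproduced as an image, the sketch below shows one way to compose a 4x4 homogeneous transform from the operations stated above; the order in which the two rotations are composed is an assumption:

```python
import numpy as np

# Compose a 4x4 homogeneous camera-to-base transform from the stated
# operations: rotate -90 deg about Z, rotate 120 deg about Y, then
# translate by (-0.5, 0, 0.5).  The composition order is an assumption;
# the patent gives the resulting matrix only as an image.
def rot_z(deg):
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0],
                     [np.sin(a),  np.cos(a), 0],
                     [0,          0,         1]])

def rot_y(deg):
    a = np.radians(deg)
    return np.array([[ np.cos(a), 0, np.sin(a)],
                     [ 0,         1, 0        ],
                     [-np.sin(a), 0, np.cos(a)]])

T = np.eye(4)
T[:3, :3] = rot_y(120) @ rot_z(-90)      # rotation part
T[:3, 3] = [-0.5, 0.0, 0.5]              # translation part

camera_point = np.array([0.1, 0.2, 0.8, 1.0])   # homogeneous point, camera frame
base_point = T @ camera_point                   # same point in the arm base frame
```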
step six: step six: and D, planning a path of the manipulator by utilizing the kinematics of the manipulator, firstly setting up an initial position of the manipulator in a desktop computer, obtaining angle parameters of all joints of the manipulator, determining an optimal path according to the coordinates of the object grabbing point obtained in the step five relative to a manipulator base, controlling the joints of the manipulator by a notebook computer, and performing target selection and executing grabbing tasks according to the grabbing point and the determined optimal path.
The specific process of step six is as follows: first, build the model of the object to be grabbed and of the mechanical arm on the Gazebo simulation platform; then subscribe each joint in the model to its corresponding ROS topic, calculate the rotation angle of each joint and the coordinates of the movable joints from the coordinates of the grabbing point relative to the mechanical arm base obtained in step five, simulate the force applied by the mechanical arm to the object to be grabbed, and verify and analyze the practical feasibility of the grab; the notebook computer then controls the mechanical arm to execute the grabbing task according to the feedback from the simulation platform.
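A minimal sketch of the "subscribe each joint to a ROS topic" idea follows; the controller topic names use the common ros_control position-controller convention and are assumptions, since the patent does not name them:

```python
#!/usr/bin/env python
# Sketch of commanding the simulated joints in Gazebo over ROS topics.
# The topic names follow a common ros_control position-controller
# convention and are assumptions; the patent does not list them.
import rospy
from std_msgs.msg import Float64

JOINT_TOPICS = [
    "/arm/joint1_position_controller/command",
    "/arm/joint2_position_controller/command",
    "/arm/joint3_position_controller/command",
    "/arm/joint4_position_controller/command",
]

def send_joint_angles(angles_rad):
    pubs = [rospy.Publisher(t, Float64, queue_size=1) for t in JOINT_TOPICS]
    rospy.sleep(0.5)                      # give publishers time to connect
    for pub, angle in zip(pubs, angles_rad):
        pub.publish(Float64(angle))

if __name__ == "__main__":
    rospy.init_node("grasp_simulation_commander")
    send_joint_angles([0.3, -0.8, 1.1, 0.0])   # example joint solution (radians)
```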
In step one, the Kinect somatosensory peripheral should be installed where it has as clear a view as possible and is not occluded.
In step three, the origin of the registered depth information is at the center of the image, while the origin of the RGB image is at the upper-left corner of the field of view; the two are brought into correspondence by the built-in function of the Kinect somatosensory peripheral's development environment.
In step four, although the Faster RCNN network model processes the data acquired from the Kinect somatosensory peripheral in the two-dimensional image space, the object can still be located in three-dimensional space.
In step four, the object detection system based on the Faster RCNN network model is a combination of the Fast R-CNN architecture and a Region Proposal Network (RPN); the RPN and the detection network share convolutional layers, which brings the region-proposal time down to about 10 ms per image. This gives the Faster RCNN network model real-time detection speed.
In step four, the number of dataset samples is expanded by horizontal flipping, and random cropping can also be used.
In step four, when training the model, the GPU memory of the desktop computer should be no less than 4 GB, otherwise memory overflow may occur.
In step five, the coordinate transfer matrix is fixed for a given grabbing scene.
In step six, following robot kinematics, the z axis of each joint is set to be aligned with the joint axis, and coordinates are assigned to each joint. The mechanical arm in this embodiment has 4 degrees of freedom, and the angle of each joint is limited: the first joint rotates from -90° to 90°, the second and third joints rotate from -120° to 120°, and the fourth joint rotates through 360°.
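Before a planned joint solution is sent to the controller, it can be checked against these limits; the small helper below is illustrative and not part of the patent (the 360° fourth joint is represented here as -180° to 180°):

```python
# Clamp a planned 4-DOF joint solution to the limits stated above (degrees).
# The helper itself is illustrative; representing the full-rotation fourth
# joint as (-180, 180) is an assumption.
JOINT_LIMITS_DEG = [(-90, 90), (-120, 120), (-120, 120), (-180, 180)]

def clamp_to_limits(angles_deg):
    clamped = []
    for angle, (lo, hi) in zip(angles_deg, JOINT_LIMITS_DEG):
        clamped.append(min(max(angle, lo), hi))
    return clamped

print(clamp_to_limits([100, -130, 45, 200]))   # -> [90, -120, 45, 180]
```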
In step six, to avoid a collision the first time the mechanical arm grabs, the test is first carried out on the Gazebo simulation platform (a mechanical-arm simulation platform). Gazebo is an independent 3D simulation environment that supports many robot platforms and is widely used in robot application development, and ROS is one of the platforms that can work together with Gazebo. At run time the Gazebo simulation platform loads a C++ plug-in library; the plug-ins can access the API of the Gazebo simulation platform to perform various tasks, such as moving joints or applying forces to objects.
In step six, after the simulation test the actual test is performed: the joints publish their states and subscribe to the control-command messages sent to the ROS master, and the plug-in program is linked to each joint and its corresponding ROS topic, so that the joint motion brings the arm to the vicinity of the object to be grabbed.
In step six, given that the grabbing points of different objects differ and are not easy to identify, no deep network is trained to identify the optimal grabbing point; instead, the depth image is used to find the grabbing point.
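The patent does not state how the grabbing point is extracted from the depth image; one simple possibility, given purely as an illustration, is to take the valid depth pixel closest to the camera inside the detected bounding box:

```python
import numpy as np

# One simple, purely illustrative way to pick a grabbing point from the
# registered depth image inside the detected bounding box: take the valid
# pixel closest to the camera (the highest point of the object surface).
# The patent states that the depth image is used, but not this exact rule.
def grasp_pixel_from_depth(depth_m, box):
    """depth_m: HxW float array (metres, 0 where invalid); box: (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = [int(round(c)) for c in box]
    roi = depth_m[y1:y2, x1:x2]
    valid = roi > 0
    if not valid.any():
        return None
    masked = np.where(valid, roi, np.inf)
    v, u = np.unravel_index(np.argmin(masked), masked.shape)
    return (x1 + u, y1 + v)   # pixel coordinates of the candidate grabbing point
```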
The method was used for target detection and grabbing of 5 objects to be grabbed: a coffee cup, a bowl, a flashlight, a medicine bottle and a disposable cup. The total time for detecting the target object and estimating the grabbing point lies within [0.074, 0.263] seconds, and with 20 grabbing trials on each of the 5 objects the overall grabbing success rate is about 87%.
Where not otherwise described, the invention applies the prior art.

Claims (7)

1. An intelligent detection and grabbing method based on the ROS platform, characterized by comprising the following steps:
Step one: the Kinect somatosensory peripheral is mounted on a fixed external frame above, or above and to the side of, the mechanical arm, so that the mechanical arm lies within the camera's field of view, and the placing platform for the object to be grabbed is arranged in front of the mechanical arm;
Step two: configure the scene in which the method is implemented, and connect the Kinect somatosensory peripheral, a desktop computer, a portable notebook computer and the mechanical arm for communication;
Step three: acquire the RGB image of the current scene and the registered depth information from the Kinect somatosensory peripheral to obtain a registered depth image;
Step four: run an object detection system based on the Faster RCNN network model on the RGB image to search for the object to be grabbed, and return the coordinates of the center of the object to be grabbed in image coordinates;
Step five: from the registered depth image, find the 3D coordinates of the object center in the coordinate system of the RGB camera of the Kinect somatosensory peripheral; from the center coordinates returned in image coordinates, obtain the coordinates of the object relative to the mechanical arm base through the coordinate transfer matrix, and from these the coordinates of the object's grabbing point relative to the mechanical arm base;
Step six: use the kinematics of the mechanical arm to plan its path: first set the initial position of the mechanical arm in the desktop computer and obtain the angle parameters of all its joints, determine the optimal path from the coordinates of the grabbing point relative to the mechanical arm base obtained in step five, and have the notebook computer control the joints of the mechanical arm so that the target is selected and the grabbing task is executed according to the grabbing point and the determined optimal path;
the specific process of step four is as follows:
4-1: first, prepare the object detection system based on the Faster RCNN network model on the desktop computer;
4-2: start the training process by running the corresponding MATLAB script "script_faster_rcnn_VOC2007_ZF.m" to obtain a trained model based on the Faster RCNN network model;
4-3: feed the RGB image collected by the Kinect somatosensory peripheral into the trained model based on the Faster RCNN network model obtained in step 4-2, find the object to be grabbed, and return the coordinates of the center of the object to be grabbed in image coordinates;
the specific operation steps of step 4-1 are as follows:
4-1-1: select self-annotated images from the ImageNet dataset as the root set of the dataset;
4-1-2: rotate the images of the dataset root set, rotating each image by -5, -4, -3, -2, -1, 2, 3, 4 and 5 degrees to handle the perspective distortion of the camera and obtain the data sample set;
4-1-3: obtain the position coordinates and classifications of the object targets in the pictures of the data sample set using the target-detection annotation tool LabelImg, and save the resulting data in txt files;
4-1-4: convert the obtained txt files into xml files in the Annotations folder of the VOC2007 dataset;
4-1-5: generate the training set, test set and validation set of the VOC2007 dataset from the xml files; the format is txt;
4-1-6: download the VOC2007 dataset, replace the data in its Annotations folder with the xml files obtained in step 4-1-4, replace the data in its ImageSets folder with the txt data obtained in step 4-1-5, and place the pictures of the data sample set in the JPEGImages folder;
4-1-7: distribute the pictures of all the data sample sets randomly and without repetition among the text files "train.txt", "test.txt" and "val.txt";
4-1-8: build the Faster RCNN network model framework, set the class labels of the Faster RCNN network model according to the classes of the data sample set, and modify the parameters related to the total number of classes in the Faster RCNN network model according to that total number.
2. The intelligent detection and grabbing method based on the ROS platform of claim 1, wherein the communication connection in step two is specifically: the notebook computer accesses the network over WIFI and is set as the ROS master server, with IP address 192.168.1; the desktop computer accesses the network through a wireless network card, with address 192.168.1.1; the Kinect somatosensory peripheral is connected to the notebook computer through a USB port, and the mechanical arm is connected to the notebook computer through a USB port; the notebook computer and the desktop computer communicate over SSH, and the mechanical arm and the notebook computer communicate over a serial port.
3. The intelligent detection and grabbing method based on the ROS platform of claim 1, wherein in step two the ROS system runs on both the desktop computer and the notebook computer; the desktop computer processes the data and issues instructions based on its judgments, and the notebook computer acts as a data collector, receiving and forwarding the Kinect somatosensory peripheral's data, and as a controller, receiving the desktop computer's instructions to control the motion of the mechanical arm; the mechanical arm is a multi-degree-of-freedom arm with its own control system, the notebook computer sends motion instructions to the control system of the mechanical arm, and the control system drives the corresponding joints.
4. The intelligent detection and grabbing method based on the ROS platform of claim 1, wherein the specific process of step three is as follows: the notebook computer acquires the RGB image and the registered depth information of the current scene from the Kinect somatosensory peripheral and sends them to the desktop computer on which the ROS system is installed, and the desktop computer registers the depth coordinate system to the image coordinate system using a built-in function provided by the Kinect development environment to obtain the registered depth image.
5. The method of claim 1, wherein in step 4-1-8 the number of labels in the script "VOCinit" in the VOCdevkit package is changed to 5; the validation-iteration parameter, with about 20% of the pictures used for validation, is set to 300; the parameters of the Faster RCNN network model are modified, the first modification being made in the "train_val.prototxt" files under the paths "models\fast_rcnn\prototxts\ZF" and "models\fast_rcnn\prototxts\ZF_fc6", where the input dim of "bbox_targets" and "bbox_loss_weights" is changed to 24, following the formula input dim = (number of classes + background) × 4; the output number of the 7th fully connected layer is likewise changed to 24, and the same modification is made at the corresponding place in the test prototxt file.
6. The intelligent detection and grabbing method based on the ROS platform of claim 1, wherein the specific operation steps of step 4-2 are as follows:
4-2-1: initialize the Faster RCNN network model with the weight parameters pre-trained on the ImageNet dataset of step 4-1;
4-2-2: input the data sample set of step 4-1 into the Faster RCNN network model and obtain the feature map of each image through the first convolutional layers of the network;
4-2-3: introduce the RPN network to generate region proposals, judge whether a detection target is present through the anchor generation layer, and calculate the scaling and translation of the predicted boxes;
4-2-4: calculate the scaling and translation of the calibration (ground-truth) boxes and fine-tune the predicted boxes;
4-2-5: calculate the loss function of the predicted object positions and of the RPN;
4-2-6: map the proposal boxes onto the last convolutional feature map of the convolutional network;
4-2-7: use a pooling layer to generate a feature map of the same size for each rectangular box;
4-2-8: classify with a softmax layer;
4-2-9: apply the regression coefficients to the box boundaries and produce the final bounding box for each classified result;
4-2-10: adjust the learning rate and the number of iterations according to the training results to obtain the final trained model based on the Faster RCNN network model.
7. The intelligent detection and grabbing method based on the ROS platform of claim 1, wherein the specific process of step six is as follows: first, build the model of the object to be grabbed and of the mechanical arm on the Gazebo simulation platform; then subscribe each joint in the model to its corresponding ROS topic, calculate the rotation angle of each joint and the coordinates of the movable joints from the coordinates of the grabbing point relative to the mechanical arm base obtained in step five, simulate the force applied by the mechanical arm to the object to be grabbed, and verify and analyze the practical feasibility of the grab; the notebook computer then controls the mechanical arm to execute the grabbing task according to the feedback from the simulation platform.
CN202010522144.9A 2020-06-08 2020-06-08 Intelligent detection and grabbing method based on ROS platform Active CN111482967B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010522144.9A CN111482967B (en) 2020-06-08 2020-06-08 Intelligent detection and grabbing method based on ROS platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010522144.9A CN111482967B (en) 2020-06-08 2020-06-08 Intelligent detection and grabbing method based on ROS platform

Publications (2)

Publication Number Publication Date
CN111482967A CN111482967A (en) 2020-08-04
CN111482967B true CN111482967B (en) 2023-05-16

Family

ID=71793501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010522144.9A Active CN111482967B (en) 2020-06-08 2020-06-08 Intelligent detection and grabbing method based on ROS platform

Country Status (1)

Country Link
CN (1) CN111482967B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257293A (en) * 2020-11-16 2021-01-22 江苏科技大学 Non-standard object grabbing method and device based on ROS
CN112936275B (en) * 2021-02-05 2023-03-21 华南理工大学 Mechanical arm grabbing system based on depth camera and control method
CN113688825A (en) * 2021-05-17 2021-11-23 海南师范大学 AI intelligent garbage recognition and classification system and method
CN115249333B (en) * 2021-06-29 2023-07-11 达闼科技(北京)有限公司 Grabbing network training method, grabbing network training system, electronic equipment and storage medium
CN114495109A (en) * 2022-01-24 2022-05-13 山东大学 Grabbing robot based on matching of target and scene characters and grabbing method and system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9643314B2 (en) * 2015-03-04 2017-05-09 The Johns Hopkins University Robot control, training and collaboration in an immersive virtual reality environment
CN106826822B (en) * 2017-01-25 2019-04-16 南京阿凡达机器人科技有限公司 A kind of vision positioning and mechanical arm crawl implementation method based on ROS system
CN106826838B (en) * 2017-04-01 2019-12-31 西安交通大学 Interaction bionic mechanical arm control method based on Kinect visual depth sensor
CN108229665A (en) * 2018-02-02 2018-06-29 上海建桥学院 A kind of the System of Sorting Components based on the convolutional neural networks by depth
CN108334955A (en) * 2018-03-01 2018-07-27 福州大学 Copy of ID Card detection method based on Faster-RCNN
CN110480637B (en) * 2019-08-12 2020-10-20 浙江大学 Mechanical arm part image recognition and grabbing method based on Kinect sensor
CN111055281B (en) * 2019-12-19 2021-05-07 杭州电子科技大学 ROS-based autonomous mobile grabbing system and method
CN110948492B (en) * 2019-12-23 2021-10-22 浙江大学 Three-dimensional grabbing platform and grabbing method based on deep learning

Also Published As

Publication number Publication date
CN111482967A (en) 2020-08-04

Similar Documents

Publication Publication Date Title
CN111482967B (en) Intelligent detection and grabbing method based on ROS platform
Tremblay et al. Deep object pose estimation for semantic robotic grasping of household objects
Johns et al. Deep learning a grasp function for grasping under gripper pose uncertainty
CN109816725B (en) Monocular camera object pose estimation method and device based on deep learning
Hamer et al. An object-dependent hand pose prior from sparse training data
Saut et al. Efficient models for grasp planning with a multi-fingered hand
US9857881B2 (en) Electrical device for hand gestures detection
Misimi et al. GRIBBOT–Robotic 3D vision-guided harvesting of chicken fillets
Lin et al. Using synthetic data and deep networks to recognize primitive shapes for object grasping
US20220009091A1 (en) Method for determining a grasping hand model
CN108229678B (en) Network training method, operation control method, device, storage medium and equipment
Mania et al. A framework for self-training perceptual agents in simulated photorealistic environments
US20220402125A1 (en) System and method for determining a grasping hand model
Dyrstad et al. Grasping virtual fish: A step towards robotic deep learning from demonstration in virtual reality
Wang et al. Simulation and deep learning on point clouds for robot grasping
CN114218692A (en) Similar part identification system, medium and method based on deep learning and model simulation
Jiang et al. Mastering the complex assembly task with a dual-arm robot: A novel reinforcement learning method
CN114131603A (en) Deep reinforcement learning robot grabbing method based on perception enhancement and scene migration
Aleotti et al. Grasp programming by demonstration in virtual reality with automatic environment reconstruction
Arents et al. Construction of a smart vision-guided robot system for manipulation in a dynamic environment
RU2745380C1 (en) Method and system for capturing objects using robotic device
CN114882113A (en) Five-finger mechanical dexterous hand grabbing and transferring method based on shape correspondence of similar objects
Adjigble et al. Spectgrasp: Robotic grasping by spectral correlation
Gomes et al. Deep Reinforcement learning applied to a robotic pick-and-place application
CN117769724A (en) Synthetic dataset creation using deep-learned object detection and classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant