CN111055281B - ROS-based autonomous mobile grabbing system and method - Google Patents

ROS-based autonomous mobile grabbing system and method

Info

Publication number
CN111055281B
CN111055281B (application CN201911320327.6A)
Authority
CN
China
Prior art keywords
pose
grabbing
mechanical arm
ros
delta
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911320327.6A
Other languages
Chinese (zh)
Other versions
CN111055281A (en)
Inventor
杨宇翔
孙卫军
高明煜
董哲康
林辉品
曾毓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201911320327.6A priority Critical patent/CN111055281B/en
Publication of CN111055281A publication Critical patent/CN111055281A/en
Application granted granted Critical
Publication of CN111055281B publication Critical patent/CN111055281B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J5/00Manipulators mounted on wheels or on carriages
    • B25J5/007Manipulators mounted on wheels or on carriages mounted on wheels
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems

Abstract

The invention relates to an ROS-based autonomous mobile grabbing system and method. The system comprises an image acquisition module, a visual algorithm processing module, a pose detection module and a grabbing control module. The method enables a mobile robot in an unfamiliar environment to perceive its surroundings, navigate visually, and estimate object poses and grab objects in real time. Combining the mobile robot with machine vision allows the robot to complete more intelligent tasks, provide better service and reduce unnecessary burden. The invention offers high efficiency, high detection precision, real-time detection and good adaptability; it can largely free up human labour in daily work and has strong market potential. It greatly facilitates people's work and life and helps raise living standards.

Description

ROS-based autonomous mobile grabbing system and method
Technical Field
The invention belongs to the field of machine vision, and particularly relates to an autonomous mobile grabbing system and method based on ROS.
Background
With the aging of the population pressing society to free up more manpower, service robots that can look after the daily life of the elderly could greatly reduce the human labour spent on routine work. At present, however, most service robots are single-function, conversational machines; mobile robots with anthropomorphic behaviour, autonomous environment perception and vision-guided task execution are still at the research stage, are not mature enough, and struggle to complete complex integrated tasks. In addition, mobile robots are increasingly used in disaster rescue, military reconnaissance and other scenarios. In high-risk environments, mobile robots can greatly reduce casualties, but integrating visual sensing with automatic control, motion planning and visual navigation to achieve reliable autonomous grasping remains a key technology in the mobile-robot field. Developing an autonomous mobile grabbing system that can perceive its environment, detect in real time, move autonomously and execute related tasks therefore reduces manual labour, improves the safety of high-risk operations, and has important theoretical value and practical significance.
Disclosure of Invention
In view of the above shortcomings of the background art, the present invention provides an ROS-based autonomous mobile grabbing system that accomplishes environment perception and object grabbing through image acquisition, visual algorithm processing, visual navigation and visual grabbing. It integrates visual sensing, visual navigation, motion planning and automatic control, saves unnecessary manual operation, frees up human labour, and has good engineering significance.
An ROS-based autonomous mobile grabbing system is divided into four parts: an image acquisition module, a visual navigation module, a pose detection module and a grabbing control module. The image acquisition module consists of an RGB-D camera, an NVIDIA Jetson TX2 and a rotating pan-tilt module; the RGB-D camera and the NVIDIA Jetson TX2 are responsible for acquiring color and depth images, and the pan-tilt control module consists of a motor and a Bluetooth control board, which turns the camera to the horizontal direction when a visual navigation task is executed and to the vertical direction when a pose detection task is executed. The visual navigation module consists of a mobile trolley, an infrared sensor and an NVIDIA GeForce GTX 1080Ti server: the RGB-D camera collects scene color and depth images and transmits them to the server, the server executes a SLAM algorithm to build a map and uses the ROS move_base library to control the navigation of the mobile trolley, and the infrared sensor is used for obstacle avoidance during navigation. The pose detection module runs on the NVIDIA GeForce GTX 1080Ti server: the RGB-D camera collects target color and depth images and transmits them to the server, and the server uses the pixel-level voting neural network PVnet to regress keypoint coordinates and computes the 6D pose of the object with a PnP algorithm. The grabbing control module comprises a 5-degree-of-freedom mechanical arm and its controller; the controller receives the target grabbing pose and uses the ROS MoveIt motion planning library to plan the motion path of the mechanical arm and execute the grabbing task. ROS is the Robot Operating System; the 6D pose comprises a 3D position and a 3D orientation.
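The modules above exchange data as ROS topics. The following minimal rospy sketch shows how the pose detection module might subscribe to the camera image and publish the resulting grasp pose; the topic names and the estimate_grasp_pose placeholder are illustrative assumptions, not interfaces fixed by the patent.

```python
#!/usr/bin/env python
# Minimal sketch of the pose-detection node's ROS interface (topic names are assumptions).
import rospy
from sensor_msgs.msg import Image
from geometry_msgs.msg import PoseStamped
from cv_bridge import CvBridge

bridge = CvBridge()
pose_pub = None

def estimate_grasp_pose(color_image):
    # Placeholder: run PVnet, vote keypoints, solve PnP, convert to a grasp pose.
    return None

def image_callback(msg):
    color = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    pose = estimate_grasp_pose(color)
    if pose is not None:
        pose_pub.publish(pose)

if __name__ == "__main__":
    rospy.init_node("pose_detection_node")
    pose_pub = rospy.Publisher("/grasp_pose", PoseStamped, queue_size=1)
    rospy.Subscriber("/camera/color/image_raw", Image, image_callback)
    rospy.spin()
```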
A grabbing method of an autonomous mobile grabbing system based on ROS comprises the following steps:
Step one: place the trolley in an unfamiliar environment; the image acquisition module acquires the color image and the depth image;
Step two: the color and depth images acquired by the image acquisition module are transmitted through ROS to the visual navigation module, which executes the SLAM algorithm to construct a map; the target area is marked with a two-dimensional code marker and determined by detecting the marker's pose, and the mobile trolley autonomously moves to the vicinity of the target area along the path planned by the ROS move_base navigation library;
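The "two-dimensional code icon" marking the target area is a fiducial marker whose pose can be estimated from the color image. A minimal sketch using OpenCV's classic ArUco API (opencv-contrib-python) is given below; the marker dictionary, camera intrinsics and marker size are assumed values, and the patent does not state which marker family is actually used.

```python
# Sketch: locating the target area from a fiducial ("two-dimensional code") marker.
import cv2
import numpy as np

camera_matrix = np.array([[615.0, 0.0, 320.0],
                          [0.0, 615.0, 240.0],
                          [0.0, 0.0, 1.0]])   # assumed RGB-D intrinsics
dist_coeffs = np.zeros(5)
marker_length = 0.10                           # marker side length in metres (assumed)

def detect_target_marker(color_image):
    gray = cv2.cvtColor(color_image, cv2.COLOR_BGR2GRAY)
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)
    if ids is None:
        return None
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, marker_length, camera_matrix, dist_coeffs)
    # rvec/tvec of the first marker give the target-area pose in the camera frame.
    return rvecs[0].ravel(), tvecs[0].ravel()
```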
The map construction is based on the RGBD-SLAM-v2 framework: the front-end visual odometry extracts ORB features from each color frame and computes the pose relation between two frames with a RANSAC + ICP algorithm; back-end and loop-closure optimization is then performed with the g2o optimization library, and the optimized poses are combined with the color and depth images to generate a dense map. The pose of the target area is then found from the two-dimensional code marker; the mobile trolley receives this target pose and navigates to the target area using the ROS move_base navigation library.
Step three: after the target area is reached, an instruction is sent to the Bluetooth control board through the MQTT (message queue telemetry transport) protocol to steer the camera; the color image acquired by the image acquisition module is then preprocessed and sent to the pose detection module through ROS;
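A minimal sketch of the MQTT command to the Bluetooth pan-tilt control board, using the paho-mqtt client; the broker address, topic name and payload format are assumptions made for illustration.

```python
# Sketch: telling the pan-tilt control board to point the camera via MQTT.
import paho.mqtt.client as mqtt

def point_camera(direction, broker="192.168.1.10", topic="pan_tilt/cmd"):
    client = mqtt.Client()
    client.connect(broker, 1883, keepalive=60)
    client.publish(topic, payload=direction, qos=1)
    client.disconnect()

point_camera("vertical")    # pose detection: camera looks down at the object
point_camera("horizontal")  # navigation: camera looks forward
```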
the preprocessing of the color image specifically comprises the following steps:
f(x, y, z) denotes a frame image of size H × W, where x, y are the horizontal and vertical pixel coordinates and z is the channel index (the color image has 3 channels). When the color image is used for pose detection, it is normalized channel-wise with the dataset mean and standard deviation:
f(x, y, z) = (f(x, y, z) − mean[z]) / std[z],  z = 1, 2, 3   (1)
where mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225];
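Equation (1) applied channel-wise might look as follows in NumPy; the division by 255 assumes the input is an 8-bit color image scaled to [0, 1] before normalization, which the patent does not state explicitly.

```python
# Sketch of the mean/std normalization in equation (1) for an H x W x 3 color image.
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def preprocess(color_image_uint8):
    f = color_image_uint8.astype(np.float32) / 255.0   # assumed scaling to [0, 1]
    return (f - MEAN) / STD                            # broadcast over the channel axis z
```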
Step four: the pose detection module calculates the 6D pose of the object with the pose detection algorithm, converts it into a target grabbing pose, and sends it to the grabbing control module through ROS;
the pose detection module calculates a 6D pose of the object through a pose detection algorithm, and specifically comprises the following steps:
constructing an improved pixel-level voting neural network PVnet, which specifically comprises the following steps:
The input f(x, y, z) of PVnet is a color image of size H × W. A pre-trained ResNet-18 is used as the backbone; when the feature map reaches size H/8 × W/8, an atrous spatial pyramid pooling (ASPP) module with three different dilation rates is used to extract features, residual connections are used repeatedly in the convolution and upsampling layers, and the features are then upsampled until the output reaches size H × W. A 1 × 1 convolution layer applied to the last feature map outputs the pixel-level predicted direction vectors and the semantic segmentation result. Concretely, the ASPP module processes the feature map with convolution kernels of different dilation rates, concatenates the results to expand the number of channels, and finally obtains an output feature map of the same size through a 1 × 1 convolution layer.
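The ASPP module with three dilation rates can be sketched in PyTorch as below; the dilation rates and channel widths are assumptions, and this is a simplified stand-in for the network described above rather than the authors' exact architecture.

```python
# Minimal ASPP-style block: three dilated 3x3 convolutions concatenated, then fused by a 1x1 convolution.
# Dilation rates (2, 4, 8) and channel widths are assumed values.
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]   # same spatial size (padding = dilation)
        return self.fuse(torch.cat(feats, dim=1))         # concatenate channels, then 1x1 conv

aspp = ASPP(256, 256)
y = aspp(torch.randn(1, 256, 60, 80))   # e.g. an H/8 x W/8 feature map for a 480 x 640 input
print(y.shape)                          # torch.Size([1, 256, 60, 80])
```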
(a) The input picture f(x, y, z) is fed into the pixel-level voting neural network PVnet, which regresses a semantic segmentation result and pixel-level predicted direction vectors. For a pixel p, the network outputs the object label associated with it and a unit direction vector v_k(p) pointing from p to the 2D keypoint x_k of the object, defined as
v_k(p) = (x_k − p) / ||x_k − p||   (2)
(b) Voting to locate the keypoint coordinates: given the semantic labels and unit vectors, keypoint hypotheses are generated with a RANSAC-based voting scheme. The semantic label is first used to find the pixels p belonging to the target object; two of these pixels are then selected at random, and the intersection of their predicted unit direction vectors v_k(p) is taken as a hypothesis h_{k,i} for the keypoint x_k. Repeating this N times yields a set of keypoint hypotheses {h_{k,i} | i = 1, 2, ..., N}, and the voting score w_{k,i} of a hypothesis h_{k,i} is defined as
w_{k,i} = Σ_{p ∈ O} 𝟙( ((h_{k,i} − p) / ||h_{k,i} − p||) · v_k(p) ≥ θ )   (3)
where 𝟙(·) is the indicator function, θ is a threshold and p ∈ O means that pixel p belongs to object O. The hypothesis with the highest score is selected as the predicted 2D keypoint coordinate x̂_k.
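A simplified NumPy sketch of the RANSAC-style voting in equations (2)-(3): random pixel pairs propose a keypoint hypothesis as the intersection of their direction rays, and each hypothesis is scored by how many object pixels' vectors agree with it within the threshold θ. It is an illustrative simplification, not PVnet's actual implementation.

```python
# Simplified RANSAC-based keypoint voting (equations (2)-(3)).
# object_pixels: (M, 2) pixel coordinates of the segmented object; vectors: (M, 2) unit vectors v_k(p).
import numpy as np

def ray_intersection(p1, v1, p2, v2):
    A = np.stack([v1, -v2], axis=1)          # solve p1 + t1*v1 = p2 + t2*v2
    if abs(np.linalg.det(A)) < 1e-8:
        return None                          # near-parallel rays, skip
    t = np.linalg.solve(A, p2 - p1)
    return p1 + t[0] * v1

def vote_keypoint(object_pixels, vectors, n_hyp=128, theta=0.99, seed=0):
    rng = np.random.default_rng(seed)
    best_h, best_score = None, -1
    for _ in range(n_hyp):
        i, j = rng.choice(len(object_pixels), size=2, replace=False)
        h = ray_intersection(object_pixels[i], vectors[i], object_pixels[j], vectors[j])
        if h is None:
            continue
        d = h - object_pixels                                  # from every object pixel towards h
        d /= (np.linalg.norm(d, axis=1, keepdims=True) + 1e-8)
        score = np.sum(np.sum(d * vectors, axis=1) >= theta)   # w_{k,i} in equation (3)
        if score > best_score:
            best_h, best_score = h, score
    return best_h, best_score
```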
Converting the estimated pose of the object into the grabbing pose p_grasp specifically comprises:
(a) From the obtained 2D keypoint coordinates x̂_k and the 3D coordinates of the corresponding keypoints on the known object model, the 2D-3D correspondences are used with a PnP algorithm to obtain the estimated pose of the object relative to the camera coordinate system. Using the hand-eye matrix obtained from hand-eye calibration, this pose is transformed into the pose relative to the base coordinate system of the mechanical arm, from which the estimated object coordinate system C_object relative to the arm base coordinate system C_base is obtained.
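The 2D-3D correspondence step and the hand-eye transform can be sketched with OpenCV's solvePnP as follows; the hand-eye matrix T_base_cam is assumed to be a 4 × 4 homogeneous matrix obtained beforehand from calibration.

```python
# Sketch: keypoint 2D-3D correspondences -> object pose in the camera frame (PnP),
# then into the arm base frame via the hand-eye matrix T_base_cam.
import cv2
import numpy as np

def object_pose_in_base(kp_2d, kp_3d, camera_matrix, dist_coeffs, T_base_cam):
    ok, rvec, tvec = cv2.solvePnP(kp_3d.astype(np.float64),
                                  kp_2d.astype(np.float64),
                                  camera_matrix, dist_coeffs,
                                  flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)
    T_cam_obj = np.eye(4)
    T_cam_obj[:3, :3] = R
    T_cam_obj[:3, 3] = tvec.ravel()
    return T_base_cam @ T_cam_obj   # pose of the object relative to the arm base
```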
A top-down grabbing mode is adopted: when the gripper at the end of the mechanical arm points vertically downward, the Z axis of the end-effector coordinate system also points vertically downward, while the Z axis of the arm base coordinate system points vertically upward. The object coordinate system C_object and the arm base coordinate system C_base are brought to a common origin by a displacement. Let z_base denote the unit vector of the Z axis of the base coordinate system and a_x, a_y, a_z the unit vectors of the axes of the object coordinate system C_object; the included angles between z_base and the object axes are
α_i = arccos(z_base · a_i),  i ∈ {x, y, z}   (4)
from which the acute angles θ_x, θ_y, θ_z between the vectors are obtained:
θ_i = min(α_i, π − α_i),  i ∈ {x, y, z}   (5)
The axis with the smallest θ_i is taken as the final transformation axis, i = argmin(θ_i).
(b) Let the transformation between the estimated object pose coordinate system C_object and the arm base coordinate system C_base have rotation matrix R and displacement T. The Euler angles corresponding to the axes are obtained from the rotation matrix (with entries r_jk) as
θ_x = atan2(r_32, r_33),  θ_y = atan2(−r_31, √(r_32² + r_33²)),  θ_z = atan2(r_21, r_11)   (6)
With the rotation axis i known from (a), the corresponding rotation angle θ_i is read from equation (6); the arm base coordinate system is rotated so that its Z axis points vertically downward, the Z axis is then rotated by θ_i, and the corresponding displacement is applied according to the displacement T, giving the final grabbing pose p_grasp.
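A simplified NumPy sketch of steps (a)-(b): pick the object axis best aligned with the vertical, read the corresponding Euler angle from R, and combine it with the translation into a top-down grasp pose. The reduction to a single yaw angle and the approach-height offset are assumptions made to keep the illustration short.

```python
# Simplified sketch of the grasp-pose conversion (equations (4)-(6)).
import numpy as np

def top_down_grasp(T_base_obj, approach_offset=0.10):
    R, t = T_base_obj[:3, :3], T_base_obj[:3, 3]
    z_base = np.array([0.0, 0.0, 1.0])
    alphas = np.arccos(np.clip(z_base @ R, -1.0, 1.0))   # angles to the object x, y, z axes (eq. (4))
    thetas = np.minimum(alphas, np.pi - alphas)          # acute angles theta_x, theta_y, theta_z (eq. (5))
    axis = int(np.argmin(thetas))                        # final transformation axis

    # Euler angles from the rotation matrix (eq. (6), standard ZYX convention assumed).
    theta_x = np.arctan2(R[2, 1], R[2, 2])
    theta_y = np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2]))
    theta_z = np.arctan2(R[1, 0], R[0, 0])
    angle = (theta_x, theta_y, theta_z)[axis]

    # Gripper approaches from above: keep the object position, rotate only about the vertical.
    return {"x": t[0], "y": t[1], "z": t[2] + approach_offset, "yaw": angle}
```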
Step four: after the mechanical arm controller receives the target grabbing pose, planning mechanical arm movement grabbing through a Moveit movement planning library of the ROS; if the mechanical arm cannot reach the target pose, the position of the trolley is adjusted correspondingly according to the feedback information, and the detection and the grabbing are carried out again until the grabbing is finished, or no object exists in the feedback;
the obtained grabbing pose pgraspThe information is sent to a mechanical arm controller of the grabbing control module in a topic form through the ROS; the mechanical arm receives the grabbing gesture pgraspAnd then, subtracting the target displacements delta x, delta y and delta z from the extreme displacement of the tail end of the mechanical arm, and judging pgraspWhether or not on the displacementExceeds the motion space of the mechanical arm; if the delta x, the delta y and the delta z are positive, the mechanical arm plans the motion of the mechanical arm through a Moveit motion planning library of the ROS, and a grabbing middle position p is set before the mechanical arm moves to a target positiongrasp_mid,pgrasp_midIs formed by pgraspIs translated upwards for a certain distance to obtain a placing position p after being grabbedplaceAlso past this intermediate position pgraspThen reaches the set placing position pplace(ii) a If the sizes of the delta x, the delta y and the delta z are negative numbers, and the sizes of the delta x, the delta y and the delta z are within the threshold values, corresponding direction movement can be carried out to enable the delta x, the delta y and the delta z to be positive, trial grabbing is carried out again, and the final visual grabbing task is completed; if one of the magnitudes of Δ x, Δ y, and Δ z is outside the threshold, no object is present in the feedback.
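The workspace check on Δx, Δy, Δz and the MoveIt planning through the intermediate pose p_grasp_mid might be sketched with moveit_commander as below; the reach limits, lift distance and planning-group name are assumed values, and p_grasp is taken to be a geometry_msgs/Pose.

```python
# Sketch of step five: crude reachability check against assumed arm limits, then MoveIt planning
# through an intermediate pose above the object. Limits and offsets are illustrative assumptions.
import sys
import moveit_commander
from geometry_msgs.msg import Pose

REACH = {"x": 0.35, "y": 0.35, "z": 0.40}   # assumed end-effector reach limits (metres)

def try_grasp(p_grasp, lift=0.08):
    deltas = {k: REACH[k] - abs(getattr(p_grasp.position, k)) for k in REACH}
    if any(d < 0 for d in deltas.values()):
        return False                         # target outside the workspace; move the cart and retry

    moveit_commander.roscpp_initialize(sys.argv)
    arm = moveit_commander.MoveGroupCommander("arm")   # planning group name is an assumption

    p_mid = Pose()
    p_mid.position.x = p_grasp.position.x
    p_mid.position.y = p_grasp.position.y
    p_mid.position.z = p_grasp.position.z + lift       # intermediate pose p_grasp_mid above the object
    p_mid.orientation = p_grasp.orientation

    for target in (p_mid, p_grasp):
        arm.set_pose_target(target)
        if not arm.go(wait=True):
            return False
    arm.stop()
    arm.clear_pose_targets()
    return True
```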
The beneficial effects of the invention are: by combining image acquisition, visual algorithm processing, visual navigation and visual grabbing, the system perceives an unknown environment and moves to grab objects; it fuses information from multiple sensors in a distributed design, has strong execution capability and clearly divided functional modules, can quickly identify and react to targets in the environment, and can replace manual work for specific tasks, freeing up manual labour to a certain extent. The invention has the advantages of autonomous perception, high detection precision and real-time detection of the target pose, and has considerable theoretical and practical value.

Claims (3)

1. An ROS-based autonomous mobile grabbing system, characterized in that the system is divided into four parts: an image acquisition module, a visual navigation module, a pose detection module and a grabbing control module; the image acquisition module consists of an RGB-D camera, an NVIDIA Jetson TX2 and a rotating pan-tilt control module; the RGB-D camera and the NVIDIA Jetson TX2 are responsible for acquiring color and depth images, and the rotating pan-tilt control module consists of a motor and a Bluetooth control board, which turns the camera to the horizontal direction when a visual navigation task is executed and to the vertical direction when a pose detection task is executed; the visual navigation module consists of a mobile trolley, an infrared sensor and an NVIDIA GeForce GTX 1080Ti server; the RGB-D camera collects scene color and depth images and transmits them to the server, the server executes a SLAM algorithm to build a map and uses the ROS move_base library to control the navigation of the mobile trolley, and the infrared sensor is used for obstacle avoidance during navigation; the pose detection module is completed by the NVIDIA GeForce GTX 1080Ti server: the RGB-D camera collects target color and depth images and transmits them to the server, and the server uses the pixel-level voting neural network PVnet to regress keypoint coordinates and computes the 6D pose of the object with a PnP algorithm; the grabbing control module comprises a 5-degree-of-freedom mechanical arm and its controller; the controller receives the target grabbing pose and uses the ROS MoveIt motion planning library to plan the motion path of the mechanical arm and execute the grabbing task; ROS is the Robot Operating System; the 6D pose comprises a 3D position and a 3D orientation.
2. The grabbing method of the ROS-based autonomous mobile grabbing system according to claim 1, characterized in that it comprises the following steps:
Step one: place the trolley in an unfamiliar environment; the image acquisition module acquires the color image and the depth image;
Step two: the color and depth images acquired by the image acquisition module are transmitted through ROS to the visual navigation module, which executes the SLAM algorithm to construct a map; the target area is marked with a two-dimensional code marker and determined by detecting the marker's pose, and the mobile trolley autonomously moves to the vicinity of the target area along the path planned by the ROS move_base navigation library;
The map construction is based on the RGBD-SLAM-v2 framework: the front-end visual odometry extracts ORB features from each color frame and computes the pose relation between two frames with a RANSAC + ICP algorithm; back-end and loop-closure optimization is then performed with the g2o optimization library, and the optimized poses are combined with the color and depth images to generate a dense map; the pose of the target area is then found from the two-dimensional code marker; the mobile trolley receives this target pose and navigates to the target area using the ROS move_base navigation library;
Step three: after the target area is reached, an instruction is sent to the Bluetooth control board through the MQTT (message queue telemetry transport) protocol to steer the RGB-D camera; the color image acquired by the image acquisition module is then preprocessed and sent to the pose detection module through ROS;
the preprocessing of the color image specifically comprises the following steps:
f(x, y, z) denotes a frame image of size H × W, where x, y are the horizontal and vertical pixel coordinates and z is the channel index (the color image has 3 channels); when the color image is used for pose detection, it is normalized channel-wise with the dataset mean and standard deviation:
f(x, y, z) = (f(x, y, z) − mean[z]) / std[z],  z = 1, 2, 3   (1)
where mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225];
Step four: the pose detection module calculates the 6D pose of the object with the pose detection algorithm, converts it into a target grabbing pose, and sends it to the grabbing control module through ROS;
the pose detection module calculates a 6D pose of the object through a pose detection algorithm, and specifically comprises the following steps:
(a) The input picture f(x, y, z) is fed into the improved pixel-level voting neural network PVnet, which regresses a semantic segmentation result and pixel-level predicted direction vectors. For a pixel p, the network outputs the object label associated with it and a unit direction vector v_k(p) pointing from p to the 2D keypoint x_k of the object, defined as
v_k(p) = (x_k − p) / ||x_k − p||   (2)
(b) Voting to locate the keypoint coordinates: given the semantic labels and unit vectors, keypoint hypotheses are generated with a RANSAC-based voting scheme. The semantic label is first used to find the pixels p belonging to the target object; two of these pixels are then selected at random, and the intersection of their predicted unit direction vectors v_k(p) is taken as a hypothesis h_{k,i} for the keypoint x_k. Repeating this N times yields a set of keypoint hypotheses {h_{k,i} | i = 1, 2, ..., N}, and the voting score w_{k,i} of a hypothesis h_{k,i} is defined as
w_{k,i} = Σ_{p ∈ O} 𝟙( ((h_{k,i} − p) / ||h_{k,i} − p||) · v_k(p) ≥ θ )   (3)
where 𝟙(·) is the indicator function, θ is a threshold and p ∈ O means that pixel p belongs to object O; the hypothesis with the highest score is selected as the predicted 2D keypoint coordinate x̂_k.
Converting the estimated pose of the object into the grabbing pose p_grasp specifically comprises:
(a) From the obtained 2D keypoint coordinates x̂_k and the 3D coordinates of the corresponding keypoints on the known object model, the 2D-3D correspondences are used with a PnP algorithm to obtain the estimated pose of the object relative to the camera coordinate system; using the hand-eye matrix obtained from hand-eye calibration, this pose is transformed into the pose relative to the base coordinate system of the mechanical arm, from which the estimated object coordinate system C_object relative to the arm base coordinate system C_base is obtained.
A top-down grabbing mode is adopted: when the gripper at the end of the mechanical arm points vertically downward, the Z axis of the end-effector coordinate system also points vertically downward, while the Z axis of the arm base coordinate system points vertically upward. The object coordinate system C_object and the arm base coordinate system C_base are brought to a common origin by a displacement. Let z_base denote the unit vector of the Z axis of the base coordinate system and a_x, a_y, a_z the unit vectors of the axes of the object coordinate system C_object; the included angles between z_base and the object axes are
α_i = arccos(z_base · a_i),  i ∈ {x, y, z}   (4)
from which the acute angles θ_x, θ_y, θ_z between the vectors are obtained:
θ_i = min(α_i, π − α_i),  i ∈ {x, y, z}   (5)
The axis with the smallest θ_i is taken as the final transformation axis, i = argmin(θ_i).
(b) Let the transformation between the estimated object pose coordinate system C_object and the arm base coordinate system C_base have rotation matrix R and displacement T; the Euler angles corresponding to the axes are obtained from the rotation matrix (with entries r_jk) as
θ_x = atan2(r_32, r_33),  θ_y = atan2(−r_31, √(r_32² + r_33²)),  θ_z = atan2(r_21, r_11)   (6)
With the rotation axis i known from (a), the corresponding rotation angle θ_i is read from equation (6); the arm base coordinate system is rotated so that its Z axis points vertically downward, the Z axis is then rotated by θ_i, and the corresponding displacement is applied according to the displacement T, giving the final grabbing pose p_grasp.
Step four: after the mechanical arm controller receives the target grabbing pose, planning mechanical arm movement grabbing through a Moveit movement planning library of the ROS; if the mechanical arm cannot reach the target pose, the position of the trolley is adjusted correspondingly according to the feedback information, and the detection and the grabbing are carried out again until the grabbing is finished, or no object exists in the feedback;
the obtained grabbing pose pgraspThe information is sent to a mechanical arm controller of the grabbing control module in a topic form through the ROS; mechanical arm receiving and grabbing pose pgraspThen, according to the final limit displacement of the mechanical arm minus the target displacement delta x, delta y and delta z, judging pgraspWhether the movement space of the mechanical arm is exceeded in displacement; if the delta x, the delta y and the delta z are positive, the mechanical arm plans the motion of the mechanical arm through a Moveit motion planning library of the ROS, and a grabbing middle position p is set before the mechanical arm moves to a target positiongrasp_mid,pgrasp_midIs formed by pgraspIs translated upwards for a certain distance to obtain a placing position p after being grabbedplaceAlso past this intermediate position pgraspThen reaches the set placing position pplace(ii) a If the sizes of the delta x, the delta y and the delta 0z are negative numbers, and the sizes of the delta 1x, the delta y and the delta z are within the threshold values, the corresponding direction movement can be carried out to ensure that the delta x, the delta y and the delta z are positive, trial grabbing is carried out again, and the final visual grabbing task is completed; if one of the magnitudes of Δ x, Δ y, and Δ z is outside the threshold, no object is present in the feedback.
3. The grabbing method of the ROS-based autonomous mobile grabbing system according to claim 2, characterized in that the improved pixel-level voting neural network PVnet is as follows:
the input f(x, y, z) of PVnet is a color image of size H × W; a pre-trained ResNet-18 is used as the backbone; when the feature map reaches size H/8 × W/8, an atrous spatial pyramid pooling (ASPP) module with three different dilation rates is used to extract features; residual connections are used repeatedly in the convolution and upsampling layers, and the features are then upsampled until the output reaches size H × W; a 1 × 1 convolution layer applied to the last feature map outputs the pixel-level predicted direction vectors and the semantic segmentation result; concretely, the ASPP module processes the feature map with convolution kernels of different dilation rates, concatenates the results to expand the number of channels, and finally obtains an output feature map of the same size through a 1 × 1 convolution layer.
CN201911320327.6A 2019-12-19 2019-12-19 ROS-based autonomous mobile grabbing system and method Active CN111055281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911320327.6A CN111055281B (en) 2019-12-19 2019-12-19 ROS-based autonomous mobile grabbing system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911320327.6A CN111055281B (en) 2019-12-19 2019-12-19 ROS-based autonomous mobile grabbing system and method

Publications (2)

Publication Number Publication Date
CN111055281A CN111055281A (en) 2020-04-24
CN111055281B true CN111055281B (en) 2021-05-07

Family

ID=70302469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911320327.6A Active CN111055281B (en) 2019-12-19 2019-12-19 ROS-based autonomous mobile grabbing system and method

Country Status (1)

Country Link
CN (1) CN111055281B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111531407B (en) * 2020-05-08 2021-08-17 太原理工大学 Workpiece attitude rapid measurement method based on image processing
CN111482967B (en) * 2020-06-08 2023-05-16 河北工业大学 Intelligent detection and grabbing method based on ROS platform
CN111906785A (en) * 2020-07-23 2020-11-10 谈斯聪 Multi-mode comprehensive information identification mobile double-arm robot device system and method
CN111890366B (en) * 2020-08-05 2023-06-13 北京航空航天大学 Mechanical arm object grabbing planning principle and ROS-based implementation method
CN112372641B (en) * 2020-08-06 2023-06-02 北京航空航天大学 Household service robot character grabbing method based on visual feedforward and visual feedback
CN113084808B (en) * 2021-04-02 2023-09-22 上海智能制造功能平台有限公司 Monocular vision-based 2D plane grabbing method for mobile mechanical arm
CN113110513A (en) * 2021-05-19 2021-07-13 哈尔滨理工大学 ROS-based household arrangement mobile robot
CN115222809B (en) * 2021-06-30 2023-04-25 达闼科技(北京)有限公司 Target pose estimation method, device, computing equipment and storage medium
CN113758415A (en) * 2021-06-30 2021-12-07 广东食品药品职业学院 Machine vision positioning support, system and positioning method based on deep learning
CN113566830B (en) * 2021-07-20 2023-09-26 常州大学 Outdoor high-precision autonomous navigation device and method for wheel-foot composite robot
CN113791620A (en) * 2021-09-14 2021-12-14 上海景吾智能科技有限公司 Dynamic self-adaptive positioning method, positioning system, robot and storage medium
CN113878578B (en) * 2021-09-30 2024-01-16 上海景吾智能科技有限公司 Dynamic self-adaptive positioning method and system suitable for composite robot
CN114102662A (en) * 2021-11-26 2022-03-01 江西省智能产业技术创新研究院 Composite robot
CN114770461B (en) * 2022-04-14 2023-12-01 深圳技术大学 Mobile robot based on monocular vision and automatic grabbing method thereof
CN114912287B (en) * 2022-05-26 2023-07-25 四川大学 Robot autonomous grabbing simulation system and method based on target 6D pose estimation
CN115018876B (en) * 2022-06-08 2023-09-26 哈尔滨理工大学 ROS-based non-cooperative target grabbing control method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101091723B1 (en) * 2011-04-07 2011-12-08 국방과학연구소 System for integrated operating of unmanned robot and method for operating of the system
CN106272415A (en) * 2016-08-30 2017-01-04 上海大学 Omni-mobile transport robot
CN106846468B (en) * 2017-01-25 2021-05-07 南京阿凡达机器人科技有限公司 Method for realizing mechanical arm modeling and motion planning based on ROS system
CN207309951U (en) * 2017-06-30 2018-05-04 南京信息工程大学 A kind of intelligent ball collecting robot
CN110216698A (en) * 2019-03-11 2019-09-10 浙江工业大学 A kind of mechanical arm remote control system based on ROS
CN109927012B (en) * 2019-04-08 2021-07-30 清华大学 Mobile grabbing robot and automatic goods taking method

Also Published As

Publication number Publication date
CN111055281A (en) 2020-04-24

Similar Documents

Publication Publication Date Title
CN111055281B (en) ROS-based autonomous mobile grabbing system and method
CN111496770B (en) Intelligent carrying mechanical arm system based on 3D vision and deep learning and use method
CN110480634B (en) Arm guide motion control method for mechanical arm motion control
JP3994950B2 (en) Environment recognition apparatus and method, path planning apparatus and method, and robot apparatus
CN114728417B (en) Method and apparatus for autonomous object learning by remote operator triggered robots
CN109164829B (en) Flying mechanical arm system based on force feedback device and VR sensing and control method
CN112634318B (en) Teleoperation system and method for underwater maintenance robot
US20230247015A1 (en) Pixelwise Filterable Depth Maps for Robots
JP2013184257A (en) Robot apparatus, method for controlling robot apparatus, and computer program
CN111015649B (en) Driving and controlling integrated control system
JP2022542241A (en) Systems and methods for augmenting visual output from robotic devices
CN111462154A (en) Target positioning method and device based on depth vision sensor and automatic grabbing robot
CN107450556A (en) A kind of independent navigation intelligent wheel chair based on ROS
CN113103230A (en) Human-computer interaction system and method based on remote operation of treatment robot
US11769269B2 (en) Fusing multiple depth sensing modalities
EP4095486A1 (en) Systems and methods for navigating a robot using semantic mapping
CN109079777B (en) Manipulator hand-eye coordination operation system
Kragic et al. Model based techniques for robotic servoing and grasping
CN116175582A (en) Intelligent mechanical arm control system and control method based on machine vision
Schnaubelt et al. Autonomous assistance for versatile grasping with rescue robots
CN114888768A (en) Mobile duplex robot cooperative grabbing system and method based on multi-sensor fusion
KR102452315B1 (en) Apparatus and method of robot control through vision recognition using deep learning and marker
EP3842888A1 (en) Pixelwise filterable depth maps for robots
CN116867611A (en) Fusion static large-view-field high-fidelity movable sensor for robot platform
US20220268939A1 (en) Label transfer between data from multiple sensors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant