CN114800511A - Dual-stage mechanical arm grabbing planning method and system based on multiplexing structure - Google Patents


Info

Publication number
CN114800511A
Authority
CN
China
Prior art keywords
grabbing
posture
attitude
point cloud
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210489365.XA
Other languages
Chinese (zh)
Other versions
CN114800511B (en)
Inventor
彭刚
王浩
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202210489365.XA
Publication of CN114800511A
Application granted
Publication of CN114800511B
Legal status: Active
Anticipated expiration

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1602 Programme controls characterised by the control system, structure, architecture
    • B25J9/1605 Simulation of manipulator lay-out, design, modelling of manipulator
    • B25J9/1612 Programme controls characterised by the hand, wrist, grip control
    • B25J9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 Programme controls characterised by motion, path, trajectory planning
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 Vision controlled systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Orthopedic Medicine & Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dual-stage mechanical arm grabbing planning method and system based on a multiplexing structure. The method comprises the following steps: changing the grabbing scene to acquire multi-view data, generating grabbing postures to form a grabbing posture prediction data set, and training a grabbing posture prediction network with a multiplexing structure until convergence to obtain a grabbing posture prediction model; taking point clouds from the grabbing posture prediction data set as a grabbing posture evaluation data set, and training a grabbing posture evaluation network with a multiplexing structure until convergence to obtain a grabbing posture evaluation model; inputting the single-view point cloud of a scene to be grabbed into the grabbing posture prediction model, inputting the predicted grabbing postures into the grabbing posture evaluation model to obtain quality scores, sorting by quality score, and selecting the top K grabbing postures to guide the mechanical arm to grab. Through two-stage deep-learning grabbing planning based on the multiplexing structure, the method achieves robust grabbing of unknown objects in multi-target stacking scenes.

Description

Dual-stage mechanical arm grabbing planning method and system based on multiplexing structure
Technical Field
The invention belongs to the technical field of robot application, and particularly relates to a dual-stage mechanical arm grabbing planning method and system based on a multiplexing structure.
Background
In recent years, industrial robots have been used in welding, painting, palletizing, assembling, machining and other manufacturing processes. However, most current industrial applications of robots rely on a structured production environment: fixed trajectory points are collected by manual teaching, and the robot completes specific actions according to these preset points. When the production environment of a factory changes, a large amount of time must be spent re-collecting trajectory points, which greatly increases production cost. Meanwhile, demand from unstructured environments such as warehouse logistics and home settings keeps growing, and making robots intelligent has become the mainstream direction. The robot grabbing task is especially common and important, and is a research hotspot in the current robotics field, so in-depth research on robot grabbing technology is significant for advancing robot intelligence and improving production efficiency.
Existing research shows that grabbing methods based on depth vision have better universality, flexibility and performance, but the robustness of a robot grabbing task is affected by factors such as task environment, algorithm performance, data processing method and data quality. In-depth research on depth-vision-based robot grabbing planning and the design of robust algorithm models are therefore important for advancing intelligent robot grabbing operation.
Therefore, the prior art suffers from unreasonable grabbing strategies, poor algorithm robustness and a low grabbing success rate.
Disclosure of Invention
Aiming at the above defects or improvement requirements of the prior art, the invention provides a dual-stage mechanical arm grabbing planning method and system based on a multiplexing structure, so as to solve the technical problems of unreasonable grabbing strategies, poor algorithm robustness and low grabbing success rate in the prior art.
In order to achieve the above object, according to an aspect of the present invention, there is provided a dual-stage manipulator grabbing planning method based on a multiplexing structure, including:
inputting the single-view point cloud of a scene to be grabbed into a grabbing posture prediction model to obtain a grabbing posture for each point in the point cloud, forming a grabbing posture set; inputting the grabbing postures in the set into a grabbing posture evaluation model; sorting the grabbing postures by the resulting quality scores; and selecting the top K grabbing postures to guide the mechanical arm in the grabbing operation;
the grabbing posture prediction model is obtained by training in the following mode:
acquiring multi-view data by changing a grabbing scene, performing three-dimensional reconstruction on the multi-view data to acquire scene complete point cloud, generating grabbing postures by using the scene complete point cloud to form a grabbing posture prediction data set, and training a grabbing posture prediction network to be convergent by using the grabbing posture prediction data set to obtain a grabbing posture prediction model;
the grabbing posture evaluation model is obtained by training in the following mode:
taking the point cloud which does not collide with the gripper at the tail end of the robot and is positioned in the gripping range of the gripper at the tail end of the robot in the gripping posture prediction data set as a gripping posture evaluation data set, and training a gripping posture evaluation network to be convergent by using the gripping posture evaluation data set to obtain a gripping posture evaluation model;
the grabbing posture prediction network and the grabbing posture evaluation network both adopt a multiplexing structure, in which, for every layer after the first, the input of the layer is connected with either the input data of the first layer or the output of the previous layer.
Furthermore, in the grabbing posture prediction network, the input of each layer after the first is connected with the input data of the first layer.
Further, in the grabbing posture evaluation network, the input of each layer after the first is connected with the output of the previous layer.
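As a concrete illustration of the two multiplexing variants just described, the following toy numpy sketch (random weights and illustrative layer sizes, not the patent's PointNet++-based networks) shows how each layer after the first concatenates either the first layer's input data or a shallower layer's output onto its own input:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward_multiplex(x0, layer_dims, reuse="input", seed=0):
    """Toy fully connected forward pass illustrating the multiplexing
    structure: every layer after the first receives, concatenated to its
    normal input, either the first layer's input data (reuse='input', as
    stated for the prediction network) or the output of the layer before
    the previous one (reuse='prev', one reading of 'connected with the
    output of the previous layer'). Random weights only; a sketch, not
    the patent's network."""
    rng = np.random.default_rng(seed)
    h, h_before = x0, x0
    for i, out_dim in enumerate(layer_dims):
        if i == 0:
            z = h
        else:
            extra = x0 if reuse == "input" else h_before
            z = np.concatenate([h, extra], axis=1)  # reuse shallow features
        W = rng.standard_normal((z.shape[1], out_dim)) * 0.1
        h_before, h = h, relu(z @ W)
    return h
```

The extra concatenation grows each layer's input width slightly, which matches the text's claim that the forward-propagation time increase is small.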
Further, the multi-view data comprises a camera pose and a depth image, and is acquired in the following manner:
for a selected grabbing posture, executing the grabbing task, reducing the number of objects in the scene and thereby changing the grabbing scene;
for a selected grabbing posture, pushing the object horizontally a certain distance from the periphery toward the center, changing the object's position and thereby the grabbing scene;
for a selected grabbing posture, performing the grabbing action and then releasing the object above the center of the scene so that it falls freely, thereby changing the grabbing scene;
one of the above scene-changing modes is selected at random; if a grabbing posture is executed successfully, multi-view data acquisition continues; if no grabbing posture can be planned successfully, a mode is selected at random from the remaining scene-changing modes; and if no grabbing posture can be executed at all, the current round of data acquisition ends.
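The mode-selection logic of the three scene-changing strategies above can be sketched as follows; `try_execute` and `make_stub` are hypothetical stand-ins, since the real routine involves planning and executing grasps on the robot:

```python
import random

def run_collection_round(modes, try_execute, seed=0):
    """Control flow of one self-supervised data-collection round: pick a
    scene-changing mode at random; each successful grasp yields another
    multi-view acquisition; a mode under which no grasp can be planned
    is dropped and a remaining mode is picked; the round ends when no
    mode yields an executable grasp."""
    rng = random.Random(seed)
    remaining = list(modes)
    views = 0
    while remaining:
        mode = rng.choice(remaining)
        if try_execute(mode):
            views += 1                # acquire another multi-view sample
        else:
            remaining.remove(mode)    # this mode is exhausted
    return views

def make_stub(budget):
    """Toy executor: each mode succeeds a fixed number of times."""
    counts = dict(budget)
    def try_execute(mode):
        if counts[mode] > 0:
            counts[mode] -= 1
            return True
        return False
    return try_execute
```

With the stub, a round collects exactly as many views as the modes' combined success budget, regardless of selection order.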
Further, the specific way of generating grabbing postures from the scene complete point cloud includes:
randomly sampling n points from a single-view point cloud C' of the scene complete point cloud C to obtain a point set P; for each point in P, obtaining through a radius query the set of normals of all points within a certain radius in the scene complete point cloud C, and using it to establish a local coordinate system;
for each point in P, obtaining through a radius query the set P' of all points within a certain radius in the scene complete point cloud C; rotating the local coordinate system around the y axis to obtain a grabbing coordinate system; transforming each point in P' from the world coordinate system into the grabbing coordinate system; and selecting the points of P' that lie within the height range of the closed region of the robot end gripper to form a local-region point set;
retreating the grabbing coordinate system a certain distance along the x axis away from the grabbed object to obtain the initial grabbing posture position, and arranging several groups of gripper finger positions along the y axis; when the point cloud in the local-region point set does not collide with the gripper finger model and the closed region between the two parallel fingers of the gripper contains points of the cloud, that group of finger positions is taken as a y-direction grabbing position of the grabbing posture of the point;
selecting the central position from all y-direction grabbing positions of the point's grabbing posture, then advancing along the x axis of the grabbing coordinate system in fixed steps until the point cloud collides with the two-finger gripper model, and taking the advance position reached at that moment as the x-direction grabbing position of the grabbing posture of the point.
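A minimal numpy sketch of the finger-position test described above: a y offset is valid when both fingers are collision-free and the closed region between them contains points. All gripper dimensions are illustrative parameters, not the patent's values, and points are assumed to already be in the grabbing coordinate system:

```python
import numpy as np

def valid_y_slots(points, y_offsets, finger_w, opening, depth, height):
    """For each candidate y offset of the two-finger gripper, check the
    two conditions from the text: (1) no point falls inside either
    finger volume, (2) the closed region between the fingers contains
    points. Returns the valid offsets."""
    valid = []
    for dy in y_offsets:
        y = points[:, 1] - dy
        in_depth = (points[:, 0] >= 0) & (points[:, 0] <= depth)
        in_height = np.abs(points[:, 2]) <= height / 2
        # closed region between the two fingers
        closed = in_depth & in_height & (np.abs(y) <= opening / 2)
        # finger volumes on either side of the closed region
        finger = (in_depth & in_height
                  & (np.abs(y) > opening / 2)
                  & (np.abs(y) <= opening / 2 + finger_w))
        if closed.any() and not finger.any():
            valid.append(dy)
    return valid
```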
Further, the training of the grabbing posture prediction model further comprises:
performing quality scoring on the grabbing posture of each point in the local-region point set, the quality score being calculated as follows:
finding the maximum value max and the minimum value min of the local-region point cloud along the y axis of the grabbing posture; counting the points satisfying y > max − thr and the points satisfying y < min + thr as two groups of contact points, where thr is a distance threshold; and taking the mean position of each group as its proxy contact point;
for the vector v formed by the two proxy contact points, counting within each contact group the number of points whose normal makes an angle θ with v smaller than a preset angle value; if this number is larger than a set value, the contact group is considered to satisfy the force-closure condition for friction-cone angle θ; obtaining the minimum qualifying angle θ_left for the left contact point and θ_right for the right contact point;
calculating the final grabbing posture quality evaluation score from score_left, score_right and score_y (the combining formula appears in the source only as an image and is not reproduced), wherein score_left is the score of the left contact point, score_right is the score of the right contact point, and score_y is the score of the line connecting the left and right contact points.
The final grabbing posture quality evaluation score calculated here serves as the real score in subsequent training.
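The contact-point extraction and normal-angle check above can be sketched as follows; the thresholds are illustrative, and the final score-combining formula (an image in the source) is not reproduced:

```python
import numpy as np

def proxy_contacts(points, normals, thr=0.005,
                   max_angle=np.deg2rad(30), min_count=3):
    """Split the local-region cloud (in the grasp frame) into left and
    right contact groups near the y extremes, average each group into a
    proxy contact, and check how many group normals align with the
    vector v between the proxies (a cheap force-closure surrogate)."""
    y = points[:, 1]
    right_mask = y > y.max() - thr
    left_mask = y < y.min() + thr
    c_left = points[left_mask].mean(axis=0)
    c_right = points[right_mask].mean(axis=0)
    v = c_right - c_left
    v = v / np.linalg.norm(v)

    def aligned(ns):
        cos = np.abs(ns @ v) / np.linalg.norm(ns, axis=1)
        return int((np.arccos(np.clip(cos, -1, 1)) < max_angle).sum())

    ok = (aligned(normals[left_mask]) >= min_count
          and aligned(normals[right_mask]) >= min_count)
    return c_left, c_right, ok
```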
Further, the training of the grabbing posture prediction model further comprises:
training the grabbing posture prediction network with the grabbing posture prediction data set, taking: the error between the predicted grabbing posture position offset and the real offset as the offset loss function; the error between the predicted x-direction unit vector of the grabbing posture and the real x-direction vector as the x-direction loss function; the error between the predicted y-direction unit vector and the real y-direction vector as the y-direction loss function; the error between the predicted and real grabbing posture widths as the mean-square-error loss function; and the error between the predicted and real grabbing posture quality evaluation scores as the quality evaluation loss function; the grabbing posture prediction network is trained until convergence with the objective of minimizing the offset, x-direction, y-direction, mean-square-error and quality evaluation loss functions.
The predicted position offset, x-direction unit vector, y-direction unit vector, width and quality evaluation score are all outputs of the grabbing posture prediction model, while the real offset, real x-direction vector, real y-direction vector, real width and real score are obtained by the corresponding calculations above.
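Assuming a plain unweighted sum of mean-squared errors (the patent does not state the exact term weighting at this point), the five-term training objective can be sketched as:

```python
import numpy as np

def grasp_prediction_loss(pred, target):
    """Illustrative combination of the five loss terms named above
    (offset, x-direction, y-direction, width, quality score), each as a
    mean squared error between prediction and ground truth. Returns the
    total loss and the per-term values."""
    terms = {}
    for key in ("offset", "x_dir", "y_dir", "width", "score"):
        diff = np.asarray(pred[key], float) - np.asarray(target[key], float)
        terms[key] = float(np.mean(diff ** 2))
    return sum(terms.values()), terms
```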
Further, the quality evaluation loss function weights the score error by sample group (its formula appears in the source only as an image and is not reproduced). In it, i, j and k index the point sets n_1, n_2 and n_3 containing, respectively, the points whose predicted grabbing posture quality evaluation score is greater than a set large score threshold, between the set small and large score thresholds, and smaller than the set small score threshold; n is the total number of points in the local-region point set; s_i, s_j and s_k are the predicted quality evaluation scores for i, j and k; and ŝ_i, ŝ_j and ŝ_k are the corresponding real scores, i.e. the final grabbing posture quality evaluation scores calculated by the scoring formula. The predicted scores are produced by the grabbing posture prediction model.
Further, the point cloud coordinates (x, y, z), expressed in the grabbing coordinate system, lie within the clamping range of the robot end gripper when:
−outer_diameter/2 ≤ y ≤ outer_diameter/2,  0 ≤ x ≤ hand_depth,  −hand_height/2 ≤ z ≤ hand_height/2
wherein outer_diameter is the outer width of the gripper, hand_depth is the width of the closed area of the gripper, and hand_height is the height of the closed area of the gripper.
According to another aspect of the present invention, there is provided a dual-stage mechanical arm grabbing planning system based on a multiplexing structure, including:
the grabbing attitude prediction model training module is used for changing a grabbing scene to acquire multi-view data, performing three-dimensional reconstruction on the multi-view data to acquire scene complete point cloud, generating grabbing attitudes by using the scene complete point cloud to form a grabbing attitude prediction data set, and training a grabbing attitude prediction network to converge by using the grabbing attitude prediction data set to obtain a grabbing attitude prediction model;
the grabbing posture evaluation model training module is used for taking point clouds which do not collide with the robot tail end clamp holder in the grabbing posture prediction data set and are located in the clamping range of the robot tail end clamp holder as a grabbing posture evaluation data set, and training a grabbing posture evaluation network to be convergent by using the grabbing posture evaluation data set to obtain a grabbing posture evaluation model;
the system comprises a grabbing planning module, a grabbing attitude estimation module and a control module, wherein the grabbing planning module is used for inputting single-view-point cloud of a scene to be grabbed into a grabbing attitude prediction model to obtain grabbing attitudes corresponding to each point in the point cloud to form a grabbing attitude set, inputting the grabbing attitudes in the grabbing attitude set into a grabbing attitude estimation model, sequencing the grabbing attitudes in the obtained grabbing attitude set according to quality scores and selecting K grabbing attitudes which are ranked at the top for guiding a mechanical arm to grab operation;
the grabbing posture prediction network and the grabbing posture evaluation network both adopt a multiplexing structure, and the multiplexing structure indicates that except the first layer network, the input of each subsequent layer network is connected with the input data of the first layer network or the output of the previous layer network.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) The invention acquires real data in a self-supervised manner and constructs real grabbing posture prediction and evaluation data sets, avoiding the dependence of existing mechanical arm grabbing planning methods on pre-built object data sets, so that the grabbing strategy given by the grabbing posture prediction and evaluation models is more reasonable. The invention constructs grabbing posture prediction and evaluation networks based on a multiplexing structure, improving the utilization of shallow-network data by connecting the input of the current layer with the output of a shallower layer. High-quality grabbing postures are predicted by the grabbing posture prediction model and then given a second quality evaluation by the grabbing posture evaluation model, improving score prediction accuracy. The invention can guide a mechanical arm to robustly grab unknown objects in unstructured environments and improves the grabbing success rate.
(2) In the grabbing posture prediction network, the input of each layer after the first is connected with the input data of the first layer; compared with the PointNet++ base network, this improves performance in several respects, reducing the approach-angle and width errors by 6.5% and 6% respectively, while the increase in forward-propagation time of the multiplexed network over the PointNet++ base network is very small.
(3) In the grabbing posture evaluation network, the input of each layer after the first is connected with the output of the previous layer; this multiplexing structure improves recall and precision by 2.4% and 1.8% respectively, improving network performance while preserving real-time operation.
(4) The invention changes the grabbing scene in several ways and thus acquires diverse, realistic data, which transfers better to real grabbing environments and improves the grabbing success rate. The core idea of the grabbing posture generation method is that the x direction of the grabbing posture at a point is parallel to the surface normal of the point cloud at that point, which matches human intuition and yields high-quality grabbing postures.
(5) The invention simplifies the complex force-closure condition into the angle between each contact point's normal and the contact vector, plus the angle between the line connecting the left and right contact points and the y axis, so that grabbing posture quality can be evaluated efficiently. Collision detection is performed on the predicted grabbing postures to filter out low-quality ones, after which the point cloud within the closed region of the grabbing posture is cropped.
(6) The invention improves training accuracy through the constraints of multiple loss functions, making predictions more accurate. High- and low-quality grabbing posture samples in a scene point cloud are imbalanced, with high-quality samples in the minority; the loss function raises the loss weight of high-quality grabbing postures in proportion, improving their prediction accuracy.
Drawings
Fig. 1 is a flowchart of a two-stage mechanical arm grabbing planning method based on a multiplexing structure according to an embodiment of the present invention;
FIG. 2 is a point cloud obtaining method for a closed area of a gripper according to an embodiment of the present invention;
fig. 3 (a) is a schematic diagram of a first multiplexing structure provided in the embodiment of the present invention;
fig. 3 (b) is a schematic diagram of a second multiplexing structure provided in the embodiment of the present invention;
fig. 4 is a diagram of a grab attitude prediction network structure according to an embodiment of the present invention;
fig. 5 is a diagram of a grab pose estimation network structure according to an embodiment of the present invention;
FIG. 6 is a schematic view of a gripper two-finger model provided by an embodiment of the present invention;
FIG. 7 (a1) is a color diagram of a single-object scene according to an embodiment of the present invention;
fig. 7 (b1) is a point cloud diagram of a single-object scene provided by an embodiment of the present invention;
fig. 7 (c1) is a diagram illustrating the result of the single-object scene grabbing gesture according to the embodiment of the present invention;
FIG. 7 (a2) is a color diagram of a multi-object scene provided by an embodiment of the present invention;
fig. 7 (b2) is a point cloud diagram of a multi-object scene provided by an embodiment of the present invention;
fig. 7 (c2) is a diagram illustrating the multi-object scene capture pose result provided by the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, a dual-stage mechanical arm grabbing planning method based on a multiplexing structure includes:
inputting the single-view point cloud of a scene to be grabbed into the grabbing posture prediction model to obtain a grabbing posture for each point in the point cloud, forming a grabbing posture set; inputting the grabbing postures in the set into the grabbing posture evaluation model; sorting the grabbing postures by the resulting quality scores; and selecting the top K grabbing postures to guide the mechanical arm to grab;
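The two-stage planning step above reduces to predict, score, rank and select; in this sketch `predict_model` and `evaluate_model` are hypothetical callables standing in for the two trained networks:

```python
import numpy as np

def plan_grasps(points, predict_model, evaluate_model, k=5):
    """Stage one predicts a grasp pose per point; stage two re-scores
    each pose; the top-K poses by evaluated quality are returned to
    guide the arm."""
    poses = [predict_model(p) for p in points]             # stage 1
    scores = np.array([evaluate_model(g) for g in poses])  # stage 2
    top = np.argsort(scores)[::-1][:k]                     # best K by score
    return [poses[i] for i in top], scores[top]
```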
the grabbing posture prediction model is obtained by training in the following way:
acquiring multi-view data by changing a grabbing scene, performing three-dimensional reconstruction on the multi-view data to acquire scene complete point cloud, generating grabbing postures by using the scene complete point cloud to form a grabbing posture prediction data set, and training a grabbing posture prediction network to be convergent by using the grabbing posture prediction data set to obtain a grabbing posture prediction model;
the grabbing posture evaluation model is obtained by training in the following mode:
taking the point cloud which does not collide with the gripper at the tail end of the robot and is positioned in the gripping range of the gripper at the tail end of the robot in the gripping posture prediction data set as a gripping posture evaluation data set, and training a gripping posture evaluation network to be convergent by using the gripping posture evaluation data set to obtain a gripping posture evaluation model;
the grabbing posture prediction network and the grabbing posture evaluation network both adopt a multiplexing structure, and the multiplexing structure indicates that except the first layer network, the input of each subsequent layer network is connected with the input data of the first layer network or the output of the previous layer network.
Specifically, the push strategy includes:
executing the grabbing task for a selected grabbing posture, reducing the number of objects in the scene and thereby changing the grabbing scene; pushing the object horizontally a certain distance from the periphery toward the center for a selected grabbing posture, changing its position and thereby the grabbing scene; performing the grabbing action for a selected grabbing posture and then releasing the object above the center of the scene so that it falls freely, thereby changing the grabbing scene;
at execution time one push strategy is selected at random; if a grabbing posture is executed successfully, multi-view data acquisition continues; if planning does not succeed, a strategy is selected at random from the remaining ones; and if no grabbing posture can be executed, the current round of data acquisition ends;
the multi-perspective data includes a camera pose and a depth image.
The three-dimensional reconstruction method comprises the following steps:
(1) Building a cuboid bounding box: in the region where three-dimensional reconstruction is required, establish a bounding box with length, width and height L, W, H.
(2) Voxelization: divide the cuboid bounding box into small cubes (voxel units) of side length υ, giving (L/υ) × (W/υ) × (H/υ) voxels.
(3) Obtain one frame of depth image depth_i and its corresponding camera pose T_i.
(4) Take a voxel g from the bounding box, convert it to a point p in the world coordinate system, compute the position v of p in the camera coordinate system using the camera pose, and finally back-project v to the corresponding pixel x in the depth image according to the camera intrinsics K.
(5) If the depth value at pixel x is val(x) and the distance from point v to the camera-coordinate origin is dist(v), the signed distance of voxel g is sdf(g) = val(x) − dist(v), and its truncated value is tsdf(g) = max(−1, min(1, sdf(g)/u)), where u is the truncation distance.
(6) The weight of voxel g is calculated as w(g) = cos(θ)/dist(v), where θ is the angle between the projection ray and the surface normal at p. Repeat steps (4) to (6) until all voxels have been traversed.
(7) Fuse the current-frame tsdf and weight w into the global TSDF and weight W:
TSDF(g) ← (W(g)·TSDF(g) + w(g)·tsdf(g)) / (W(g) + w(g)),  W(g) ← W(g) + w(g).
(8) Repeat steps (3) to (7) until all depth images have been traversed, then output the complete scene point cloud from the final TSDF model using ray casting.
By exploiting the high positioning accuracy of the mechanical arm, the obtained camera poses are highly accurate, which overcomes the problem of poor camera pose estimation in three-dimensional reconstruction; the multi-view depth maps and camera poses are then fused by the TSDF method described above.
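The truncation of step (5) and the fusion of step (7) can be sketched as follows, assuming the standard running weighted-average TSDF update that the text describes:

```python
import numpy as np

def truncate_sdf(sdf, u):
    """Step (5): divide the signed distance by the truncation distance u
    and clamp the result to [-1, 1]."""
    return np.clip(sdf / u, -1.0, 1.0)

def fuse_tsdf(tsdf_global, w_global, tsdf_frame, w_frame):
    """Step (7): weighted running-average fusion of the current frame's
    truncated signed distance into the global volume. Assumes the total
    weight w_global + w_frame is nonzero."""
    fused = (w_global * tsdf_global + w_frame * tsdf_frame) / (w_global + w_frame)
    return fused, w_global + w_frame
```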
The method for generating grabbing postures from the scene complete point cloud comprises the following steps:
(1) Preprocessing the input point cloud: for the complete point cloud C, the specific process is: 1) random sampling; 2) removing invalid points from the point cloud; 3) removing points outside the set space; 4) voxelizing the point cloud; 5) computing the point cloud surface normals; 6) redefining the normal directions. For the single-view point cloud C', n points are randomly sampled within the same process to obtain the point set P.
(2) Local coordinate system calculation: for each point in the point set P, obtain by radius query the set N = {n_1, …, n_n} of normals of all points within a certain radius in the complete point cloud C, where n is the number of points. Taking N as input, the local coordinate system of the point is computed by eigendecomposition (the exact formula appears in the source only as an image): a square matrix M is formed from the normals, its eigenvalues va are computed with an eigenvalue function and its eigenvectors xe with an eigenvector function, the max, min and index functions select the eigenvectors of the largest and smallest eigenvalues as coordinate axes, and the abs function ensures that each axis direction agrees with the normal direction.
(3) Grabbing posture region point cloud interception: for each point in the point set P, obtain the set P' of all points within a certain radius in the complete point cloud C by the radius query method. Taking P' and the coordinate system obtained in step (2) as input, the specific process is: 1) rotate the coordinate system 180 degrees around the y-axis to obtain the grabbing coordinate system; 2) transform P' from the world coordinate system to the grabbing coordinate system; 3) intercept the points in P' that satisfy the closed-region condition [condition formula not reproduced in the source] to obtain the local region point set.
(4) Gripper two-finger placement evaluation: 1) retreat the coordinate system a certain distance along the x-axis to obtain the initial grabbing posture position; 2) arrange several groups of gripper two-finger positions along the y-axis; 3) evaluate whether each two-finger position is valid according to two conditions: the local region point cloud must not collide with the two-finger model, and the closed region must contain point cloud. If no two-finger position is valid, grabbing at this point is infeasible.

(5) Grabbing posture position calculation: the aim is to obtain a suitable three-dimensional coordinate for the grabbing posture: 1) adjust in the y direction by selecting the central position among all two-finger positions satisfying the conditions of step (4); 2) adjust in the x direction by advancing along the x-axis in fixed steps until the local region point cloud collides with the two-finger model.

(6) Grabbing posture width calculation: obtain the closed-region point cloud by the gripper closed-region point cloud acquisition method, then find the maximum and minimum values of all points along the y-axis; the absolute value of their difference is the width.

(7) Grabbing posture quality evaluation: for each grabbing posture, obtain the closed-region point cloud by the gripper closed-region point cloud acquisition method, then evaluate its quality by the force-closure-based grabbing posture quality evaluation method.
The grasping posture quality evaluation method based on force closure comprises the following steps:
(1) Counting left and right contact points: first find the maximum value max and the minimum value min of the local point cloud set on the y-axis, count the points satisfying y > max - thr and the points satisfying y < min + thr respectively, and then take the mean position of each of the two contact-point groups as the proxy contact point. thr denotes a distance threshold.
(2) Contact point angle calculation: according to the vector v formed by the two proxy contact points, count in each group the number of contact points whose angle between v and the point's normal is smaller than a fixed value θ. If this number is larger than a set value min_num, the contact points are considered to satisfy the force-closure condition when the friction cone angle is θ; finally, solve for the minimum value θ_min satisfying this condition, which is associated with the quality evaluation score.
(3) Score calculation: calculate the final grabbing posture quality evaluation score according to a formula [formula image not reproduced in the source] of θ_left, θ_right and θ_y, which are respectively the minimum left contact point angle, the minimum right contact point angle, and the angle between v and the y-axis of the coordinate system.
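The contact-point statistics of steps (1) and (2) can be sketched as follows, assuming the closed-region points are an N×3 NumPy array with matching per-point normals (the names, array layout and default thresholds are assumptions of this sketch):

```python
import numpy as np

def proxy_contacts(points, thr=0.003):
    """Split closed-region points into left/right contact groups along y
    and return the mean position of each group as the proxy contact point."""
    y = points[:, 1]
    y_max, y_min = y.max(), y.min()
    right = points[y > y_max - thr]   # contacts near the +y finger
    left = points[y < y_min + thr]    # contacts near the -y finger
    return left.mean(axis=0), right.mean(axis=0)

def satisfies_force_closure(contact_normals, v, theta, min_num=2):
    """Check whether enough contact normals lie inside the friction cone
    of half-angle theta around the line v between the proxy contacts."""
    v = v / np.linalg.norm(v)
    n = contact_normals / np.linalg.norm(contact_normals, axis=1, keepdims=True)
    angles = np.arccos(np.clip(np.abs(n @ v), -1.0, 1.0))
    return int(np.sum(angles < theta)) > min_num
```

Scanning θ upward and recording the first value for which `satisfies_force_closure` holds yields the θ_min used in the score.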
As shown in fig. 2, the method for intercepting the region point cloud of the grabbing posture prediction data set is:

(1) Coordinate system transformation: because the point cloud and the grabbing postures are both in the world coordinate system and the grabbing postures are diverse, the closed-region point cloud cannot be conveniently intercepted there, so the point cloud is first transformed into each grabbing posture coordinate system;

(2) Collision detection: perform collision detection with the gripper two-finger model;

(3) Point cloud interception: intercept the collision-free point clouds according to the closed-region condition [condition formula not reproduced in the source] to obtain the original local point clouds, which form the original local point cloud set;

(4) Resampling: randomly sample each original local point cloud to a fixed number of points to form the grabbing posture evaluation data set.
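Steps (1), (3) and (4) of the interception method can be sketched as below; the 4×4 world-to-grasp transform, the box limits standing in for the unreproduced interception condition, and the omission of the collision step (2) are all illustrative assumptions:

```python
import numpy as np

def crop_closed_region(points, T_grasp, limits, n_sample=256, rng=None):
    """Transform scene points into the grasp coordinate frame, keep those
    inside the gripper's closed region, and resample to a fixed size.

    T_grasp : 4x4 homogeneous transform from world to grasp frame
    limits  : (xmin, xmax, ymin, ymax, zmin, zmax) of the closed region
    """
    rng = rng or np.random.default_rng(0)
    # Step (1): coordinate transform into the grasp frame.
    homo = np.c_[points, np.ones(len(points))]
    local = (T_grasp @ homo.T).T[:, :3]
    # Step (3): keep points inside the closed-region box.
    xmin, xmax, ymin, ymax, zmin, zmax = limits
    mask = ((local[:, 0] >= xmin) & (local[:, 0] <= xmax) &
            (local[:, 1] >= ymin) & (local[:, 1] <= ymax) &
            (local[:, 2] >= zmin) & (local[:, 2] <= zmax))
    region = local[mask]
    # Step (4): resample to a fixed number of points (with replacement).
    idx = rng.choice(len(region), size=n_sample, replace=True)
    return region[idx]
```

A real pipeline would first reject candidates whose cropped cloud intersects the two-finger model, per step (2).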
As shown in fig. 3 (a), the first multiplexing structure processing manner is: except for the first layer of network, the input of each subsequent layer of network is connected with the input data, so that the input data can be repeatedly utilized by each layer of network.
As shown in fig. 3 (b), the second multiplexing structure processing manner is: except for the first layer of network, the input of each subsequent layer of network is connected with the output of the previous layer, so that each layer of network can repeatedly utilize all previous data.
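One plausible reading of the two multiplexing variants is feature concatenation in a toy fully connected network; the layer sizes, the ReLU choice, and the exact concatenation rule for structure 2 are assumptions of this sketch, not the patent's architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward_multiplex(x, weights, mode=1):
    """Forward pass through layers that reuse earlier data.

    mode 1: every layer after the first sees [its input, original input x],
            so each layer reuses the raw input data (fig. 3(a)).
    mode 2: every layer after the first sees [its input, previous layer's
            full input], so layer k indirectly reuses all earlier features
            (fig. 3(b)).
    """
    h = relu(x @ weights[0])
    prev_in = x
    for w in weights[1:]:
        if mode == 1:
            inp = np.concatenate([h, x], axis=-1)
        else:
            inp = np.concatenate([h, prev_in], axis=-1)
        prev_in = inp
        h = relu(inp @ w)
    return h
```

Note the weight shapes differ between the modes: in mode 1 every hidden layer input has fixed size, while in mode 2 the input width grows with depth, which is why structure 2 introduces more parameters.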
As shown in FIG. 4, the grabbing posture prediction network outputs include the Offset Block, Approach Block, Close Block, Width Block, and Score Block, whose meanings and loss functions are as follows:
The Offset Block aims to predict the spatial offset of the grabbing posture; its loss function [formula image not reproduced in the source] compares Δp_pred_i, the predicted offset of the grabbing posture position for a point pc_i in the point cloud, with Δp_i, the true offset of pc_i. The offset is the offset between the grabbing posture position and the grabbed object position.
The purpose of the Approach Block is to predict the unit vector of the grabbing posture in the x direction and make it as close as possible to the real vector, so the loss function is designed as the angle difference between the predicted vector and the real vector — the smaller the angle difference, the better:

L_approach = (1/n) Σ_i arccos( norm(a_pred_i) · a_i )

where a_pred_i is the predicted x-direction vector of the grabbing posture at a point, a_i is the real Approach unit vector, and norm is the vector normalization function.
The purpose of the Close Block is to predict the unit vector of the grabbing posture in the y direction, with the same angle-difference loss:

L_close = (1/n) Σ_i arccos( norm(c_pred_i) · c_i )

where c_pred_i is the predicted y-direction vector of the grabbing posture at a point and c_i is the real y-direction vector.
The loss function of the Width Block is the mean square error:

L_width = (1/n) Σ_i (w_pred_i - w_i)²

where w_pred_i is the predicted grabbing posture width at a point and w_i is the real grabbing posture width. Similar to the Offset Block, the real width is normalized so that w_i has a value range of [0, 1].
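Under the verbal descriptions above, the Approach/Close angle losses and the Width mean-square loss can be sketched as follows (a hedged sketch only; the patent's exact formulas appear as images in the source):

```python
import numpy as np

def angle_loss(pred, true):
    """Mean angle between predicted direction vectors and true unit vectors,
    as described for the Approach and Close Blocks."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)  # norm()
    cos = np.clip(np.sum(pred * true, axis=1), -1.0, 1.0)
    return float(np.mean(np.arccos(cos)))

def width_loss(pred_w, true_w):
    """Mean square error between predicted and true (normalized) widths."""
    return float(np.mean((pred_w - true_w) ** 2))
```

A perfectly aligned prediction gives an angle loss of zero regardless of the predicted vector's magnitude, which is why the normalization step precedes the dot product.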
The Score Block predicts the grabbing posture quality evaluation score of each point in the point cloud; its loss function [formula image not reproduced in the source] weights the points by score band, where i, j and k respectively index the points with predicted scores s > 0.7, 0.01 < s < 0.7 and s < 0.01, and n denotes the total number of points in the point cloud.
As shown in fig. 5, the grabbing posture evaluation network outputs a Score Block, meaning the predicted grabbing posture quality evaluation score of each point in the point cloud, with the loss function [formula image not reproduced in the source] comparing s_i, the score predicted by the grabbing posture evaluation network for the intercepted point cloud i, with the real score corresponding to i, which is obtained from the force-closure-based grabbing posture quality evaluation formula.
Table 1 Training results of the grabbing posture prediction network with multiplexing structures introduced

[table content appears only as an image in the source and is not reproduced]
The point cloud network models in figs. 4 and 5 use the multiplexing structures. To evaluate the effectiveness of the proposed multiplexing structure, multiplexing structure 1 and multiplexing structure 2 were each introduced into the PointNet++-based grabbing posture prediction network; the experimental results are shown in Table 1. The data show: 1) compared with the PointNet++ baseline, the networks with multiplexing structures improve in every respect, with multiplexing structure 1 giving the best overall performance — the Approach angle error and the width error are reduced by 6.5% and 6% respectively, while the other improvements are more modest; 2) because the multiplexing structures introduce additional model parameters, their forward propagation time increases relative to the PointNet++ baseline, and multiplexing structure 2, which introduces more parameters, is slightly slower than multiplexing structure 1, but the difference is only about 10 ms.
In summary, the invention selects the PointNet++ network with multiplexing structure 1 as the grabbing posture prediction network.
Table 2 Training results of the grabbing posture evaluation network with multiplexing structures introduced

Experimental method       | Recall | Precision | Score error | Time/ms
PointNet                  | 0.822  | 0.835     | 0.093       | 2.2
Multiplexing structure 1  | 0.839  | 0.841     | 0.094       | 2.3
Multiplexing structure 2  | 0.846  | 0.853     | 0.091       | 2.4
To evaluate the effectiveness of the proposed multiplexing structure, multiplexing structure 1 and multiplexing structure 2 were each introduced into the PointNet-based grabbing posture evaluation network; the experimental results are shown in Table 2. The data show: 1) in network performance, both multiplexing structures improve on the baseline, with multiplexing structure 2 giving the largest gains — recall and precision improve by 2.4% and 1.8% respectively; 2) in real-time performance, as with the grabbing posture prediction network, PointNet, multiplexing structure 1 and multiplexing structure 2 rank from fastest to slowest, but the largest difference is only 0.2 ms.
In summary, the invention selects the PointNet network with multiplexing structure 2 as the grabbing posture evaluation network.
As shown in FIG. 6, the unshaded area is the gripper two-finger model body, and the shaded area is the closed region between the gripper's two parallel fingers. The two-finger gripper model is described by four parameters (hand_depth, hand_height, hand_width, finger_width), where hand_depth is the width of the gripper closed region, hand_height is the height of the closed region, hand_width is the length of the closed region, and finger_width is the thickness of each of the two fingers.

hand_outer_diameter, the outer width of the gripper, equals hand_width + 2 × finger_width.

The retreat distance mentioned above is hand_depth − init_bit, where init_bit is the initial bite depth of the gripper.

The advance distance is the distance the grabbing posture moves toward the scene center.

In the embodiment of the present invention, finger_width = 0.01, hand_outer_diameter = 0.11, hand_depth = 0.06, hand_height = 0.02, and init_bit = 0.01.
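The embodiment's parameter relations can be checked numerically; note that hand_width is not stated directly in the text and is derived here from the outer-diameter relation:

```python
# Two-finger gripper parameters from the embodiment (meters).
finger_width = 0.01         # thickness of each finger
hand_outer_diameter = 0.11  # outer width of the gripper
hand_depth = 0.06           # width (depth) of the closed region
hand_height = 0.02          # height of the closed region
init_bit = 0.01             # initial bite depth of the gripper

# hand_outer_diameter = hand_width + 2 * finger_width  =>  solve for hand_width.
hand_width = hand_outer_diameter - 2 * finger_width

# Retreat distance used when seeding the grasp pose: hand_depth - init_bit.
retreat = hand_depth - init_bit
```

The derived retreat distance of 0.05 m matches the 0.05 m retreat used in step (4) of Example 1's grabbing posture generation.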
Example 1
A dual-stage mechanical arm grabbing planning method based on a multiplexing structure comprises the following steps:
controlling the mechanical arm to collect multi-view data with a pushing strategy, and performing three-dimensional reconstruction to obtain the complete scene point cloud;

generating grabbing postures from the complete scene point cloud to obtain a grabbing posture prediction data set, and intercepting the region point clouds of the grabbing posture prediction data set to obtain a grabbing posture evaluation data set;

training the multiplexing-structure-based grabbing posture prediction and evaluation networks on the grabbing posture prediction and evaluation data sets to obtain the grabbing posture prediction and evaluation models;

taking the single-view point cloud of the scene to be grabbed as input to the grabbing posture prediction model, inputting the prediction result into the grabbing posture evaluation model, and obtaining a group of high-quality grabbing postures to guide the mechanical arm's grabbing.
The three-dimensional reconstruction method comprises the following steps:
(1) Building a cuboid bounding box: in the region requiring three-dimensional reconstruction, establish a bounding box with length 40 cm, width 40 cm and height 20 cm.

(2) Voxelization: divide the cuboid bounding box into small grids (voxel units) with a side length of 2 mm, giving 4,000,000 voxels in total.
(3) Obtain one frame of depth image depth_i and the corresponding camera pose T_i.

(4) Take a voxel g from the bounding box, convert it to a point p in the world coordinate system, calculate its position v in the camera coordinate system, and finally back-project it to the corresponding pixel x in the depth image according to the camera intrinsic matrix K.
(5) If the depth value at pixel x is val(x) and the distance from point v to the origin of the camera coordinate system is div(v), the TSDF value of voxel g can be calculated according to the following formulas:

sdf(g) = div(v) - val(x)

tsdf(g) = max(-1, min(1, sdf(g)/u))

where sdf(g) is the signed distance function value and the truncation distance u = 0.01 m.
(6) The weight of voxel g is calculated according to the formula w(g) = cos(θ)/div(v), where θ is the angle between the projection ray and the surface normal vector at point p. Repeat steps (4) to (6) until all voxels are traversed.
(7) Fuse the current frame tsdf value and weight w with the global TSDF and weight W according to the following formulas:

TSDF(g) = (W(g)·TSDF(g) + w(g)·tsdf(g)) / (W(g) + w(g))

W(g) = W(g) + w(g)
(8) Repeat steps (3) to (7) until all depth images are traversed, then output the complete scene point cloud from the final TSDF model using the ray casting method.
The method for generating the grabbing gesture according to the scene complete point cloud comprises the following steps:
(1) Preprocessing the input point cloud: for the complete point cloud C, the specific process is: 1) random sampling; 2) removing invalid points from the point cloud; 3) removing points outside the set space; 4) voxelizing the point cloud; 5) calculating the point cloud surface normals; 6) re-orienting the normals. For the single-view point cloud C', 2048 points are randomly sampled during the complete point cloud processing to obtain a point set P.
(2) Local coordinate system calculation: for each point in the point set P, obtain the normal set N = {n_1, ..., n_n} of all points within a certain radius in the complete point cloud C by the radius query method, where n is the number of points. Taking N as input, the local coordinate system of the point is computed by an eigen-decomposition formula [formula image not reproduced in the source], in which eigenvalue is the eigenvalue calculation function, eigenvector is the eigenvector calculation function, max and min are the maximum and minimum functions respectively, index is the index function, and the abs function ensures that the computed vector has the same direction as the normal.
(3) Grabbing posture region point cloud interception: for each point in the point set P, obtain the set P' of all points within a certain radius in the complete point cloud C by the radius query method. Taking the coordinate system obtained in step (2) and P' as input, the specific process is: 1) rotate the coordinate system 180 degrees around the y-axis to obtain the grabbing coordinate system; 2) transform P' from the world coordinate system to the grabbing coordinate system; 3) intercept the points in P' that satisfy the closed-region condition [condition formula not reproduced in the source] to obtain the local region point set.
(4) Gripper two-finger placement evaluation: 1) retreat the coordinate system 0.05 m along the x-axis to obtain the initial grabbing posture position; 2) arrange 5 groups of gripper two-finger positions along the y-axis; 3) evaluate whether each two-finger position is valid according to two conditions: the point cloud must not collide with the two-finger model, and the closed region must contain point cloud. If none of the 5 groups is valid, grabbing at this point is infeasible.

(5) Grabbing posture position calculation: the aim is to obtain a suitable three-dimensional coordinate for the grabbing posture: 1) adjust in the y direction by selecting the central position among all two-finger positions satisfying the conditions of step (4); 2) adjust in the x direction by advancing along the x-axis in steps of 0.005 m until the point cloud collides with the two-finger model.

(6) Grabbing posture width calculation: obtain the closed-region point cloud by the gripper closed-region point cloud acquisition method, then find the maximum and minimum values of all points along the y-axis; the absolute value of their difference is the width.

(7) Grabbing posture quality evaluation: for each grabbing posture, obtain the closed-region point cloud by the gripper closed-region point cloud acquisition method, then evaluate its quality by the force-closure-based grabbing posture quality evaluation method.
Further, the grabbing posture quality evaluation method based on force closure is as follows:
(1) Counting left and right contact points: first find the maximum value max and the minimum value min on the y-axis, count the points satisfying y > max - 0.003 and the points satisfying y < min + 0.003 respectively, and then take the mean position of each contact-point group as the proxy contact point.

(2) Contact point angle calculation: according to the vector v formed by the two proxy contact points, count in each group the number of contact points whose angle between v and the point's normal is smaller than a fixed value θ. If this number is larger than 2, the contact points are considered to satisfy the force-closure condition when the friction cone angle is θ; finally, solve for the minimum value θ_min satisfying this condition, which is associated with the quality evaluation score.
(3) Score calculation: calculate the final grabbing posture quality evaluation score according to a formula [formula image not reproduced in the source] of θ_left, θ_right and θ_y, which are respectively the minimum left contact point angle, the minimum right contact point angle, and the angle between v and the y-axis of the coordinate system.
The method for intercepting the region point cloud of the grabbing posture prediction data set is:

(1) Coordinate system transformation: because the point cloud and the grabbing postures are both in the world coordinate system and the grabbing postures are diverse, the closed-region point cloud cannot be conveniently intercepted there, so the point cloud is first transformed into each grabbing posture coordinate system;

(2) Collision detection: perform collision detection with the simplified gripper model;

(3) Point cloud interception: intercept the point cloud according to the closed-region condition [condition formula not reproduced in the source] to obtain the original local point cloud;

(4) Resampling: randomly sample each original local point cloud to 256 points.
The deep-learning-based two-stage grabbing planning method comprises the following steps:

(1) Preprocessing: point cloud coordinate transformation, voxelized down-sampling of the point cloud, workspace filtering of the point cloud, and proportional sampling of the point cloud;

(2) Grabbing posture prediction: input the preprocessed single-view point cloud into the improved grabbing posture prediction model to obtain the grabbing posture corresponding to each point in the point cloud;

(3) Post-processing: workspace filtering of grabbing postures, direction filtering of grabbing postures, grabbing posture clustering, and grabbing posture sorting;

(4) Grabbing posture calculation: recalculate the postures of the post-processed high-quality grabbing postures to replace the predicted ones; the calculation method is a simplified version of the grabbing posture generation method described above, with the grabbing posture quality score labeling step removed compared with the original version;

(5) Grabbing posture evaluation: for each grabbing posture, first intercept its local point cloud and input it into the grabbing posture evaluation model to obtain the re-evaluated high-quality grabbing posture set;

(6) Quality sorting and selection: sort by quality score and select the top 5 grabbing postures.
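The six steps can be organized as a simple pipeline skeleton; the predictor, evaluator and helper callables here are illustrative placeholders, not the patent's implementation:

```python
def plan_grasps(cloud, predictor, evaluator, preprocess, postprocess,
                crop_region, top_k=5):
    """Two-stage grasp planning: predict candidate poses from a single-view
    cloud, then re-score each candidate's local region with the evaluator."""
    cloud = preprocess(cloud)                    # (1) transform/voxelize/filter
    candidates = predictor(cloud)                # (2) per-point grasp poses
    candidates = postprocess(candidates)         # (3) filter/cluster/sort
    scored = []
    for pose in candidates:                      # (5) re-evaluate each pose
        local = crop_region(cloud, pose)
        scored.append((evaluator(local), pose))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [pose for _, pose in scored[:top_k]]  # (6) keep the top-K poses
```

Step (4), recalculating the pose geometry of the surviving candidates, would slot in between post-processing and evaluation in a fuller implementation.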
As shown in figs. 7 (a1), (b1), (c1), (a2), (b2) and (c2), the grabbing planning method provided by the invention works effectively in both single-object and multi-object scenarios and is highly robust. In addition, according to the distribution of objects in the scene, the method effectively filters out regions crowded with objects and preferentially searches for isolated objects that favor grabbing, thereby reserving operating space for the mechanical arm's end effector, consistent with human grabbing habits.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A dual-stage mechanical arm grabbing planning method based on a multiplexing structure is characterized by comprising the following steps:
inputting the single view point cloud of a scene to be grabbed into a grabbing posture prediction model to obtain grabbing postures corresponding to each point in the point cloud to form a grabbing posture set, inputting the grabbing postures in the grabbing posture set into a grabbing posture evaluation model, sequencing the grabbing postures in the obtained grabbing posture set according to quality scores, and selecting K grabbing postures in the front of the ranking to guide a mechanical arm to grab operation;
the grabbing posture prediction model is obtained by training in the following mode:
acquiring multi-view data by changing a grabbing scene, performing three-dimensional reconstruction on the multi-view data to acquire scene complete point cloud, generating grabbing postures by using the scene complete point cloud to form a grabbing posture prediction data set, and training a grabbing posture prediction network to be convergent by using the grabbing posture prediction data set to obtain a grabbing posture prediction model;
the grabbing posture evaluation model is obtained by training in the following mode:
taking the point cloud which does not collide with the gripper at the tail end of the robot and is positioned in the gripping range of the gripper at the tail end of the robot in the gripping posture prediction data set as a gripping posture evaluation data set, and training a gripping posture evaluation network to be convergent by using the gripping posture evaluation data set to obtain a gripping posture evaluation model;
the grabbing posture prediction network and the grabbing posture evaluation network both adopt a multiplexing structure, and the multiplexing structure indicates that except the first layer network, the input of each subsequent layer network is connected with the input data of the first layer network or the output of the previous layer network.
2. The method for planning grabbing of a dual-stage mechanical arm based on a multiplexing structure of claim 1, wherein the inputs of each subsequent layer except the first layer of network in the grabbing attitude prediction network are connected with the input data of the first layer of network.
3. The method for planning grabbing of a dual-stage mechanical arm based on a multiplexing structure of claim 1 or 2, wherein the inputs of each subsequent layer except the first layer of network in the grabbing attitude assessment network are connected with the outputs of the previous layer of network.
4. The method for planning grabbing of a dual-stage mechanical arm based on a multiplexing structure of claim 1 or 2, wherein the multi-view data comprises camera pose and depth images, and is acquired by:
aiming at a certain grabbing posture, executing a grabbing task, and changing a grabbing scene by reducing the number of objects in the scene;
aiming at a certain grabbing posture, horizontally pushing a certain distance from the periphery to the center direction, and changing the position of an object so as to change a grabbing scene;
aiming at a certain grabbing posture, carrying out grabbing action and then placing the grabbing posture above the center of the scene to enable the grabbing posture to fall freely, so that the grabbing scene is changed;
and randomly selecting one of the modes for changing the grabbing scene, continuing to acquire multi-view data if a certain grabbing gesture is successfully executed, randomly selecting from the remaining modes for changing the grabbing scene if the certain grabbing gesture cannot be successfully planned, and ending the data acquisition of the current round if no grabbing gesture is executed finally.
5. The method for planning grabbing of a dual-stage mechanical arm based on a multiplexing structure as claimed in claim 1 or 2, wherein the specific way of generating the grabbing gesture by using the scene complete point cloud comprises:
randomly sampling n points for any single-view point cloud C' in the scene complete point cloud C to obtain a point set P; aiming at each point in the point set P, acquiring a normal set of all points in a certain radius in the scene complete point cloud C through radius query, thereby establishing a local coordinate system;
aiming at each point in the point set P, acquiring a set P ' of all points in a certain radius in the scene complete point cloud C through radius query, rotating a local coordinate system around a y axis to acquire a grabbing coordinate system, converting each point in the set P ' from a world coordinate system to the grabbing coordinate system, and selecting points in the set P ' which meet the height range of a closed area of a gripper at the tail end of a robot to form a local area point set;
the method comprises the steps that a grabbing coordinate system is retreated for a certain distance along the direction of an x axis away from a grabbed object to obtain an initial grabbing posture position, a plurality of groups of gripper finger positions are arranged along the direction of a y axis, and when point cloud concentrated in a local area point collides with a gripper finger model and a closed area of two parallel fingers of a gripper contains the point cloud, the group of gripper finger positions are used as grabbing positions in the direction of the y axis in the grabbing posture of the point cloud;
selecting a central position from all the grabbing positions in the y-axis direction in the grabbing posture of the point cloud, then advancing to the x-axis direction in a fixed step length in a grabbing coordinate system until the point cloud collides with the two-finger gripper model, and taking the advancing position in the x-axis direction at the moment as the grabbing position in the x-axis direction in the grabbing posture of the point cloud.
6. The method for planning the grabbing of a dual-stage manipulator based on a multiplexing structure of claim 5, wherein the training of the grabbing pose prediction model further comprises:
performing quality scoring on the grabbing posture of each point cloud in the local area point set, wherein the quality scoring is calculated in the following mode:
finding the maximum value max and the minimum value min on the y-axis of the local-region point cloud relative to the grabbing posture, counting the points satisfying y > max - thr and the points satisfying y < min + thr respectively as two groups of contact points, and calculating the mean position of each group of contact points as a proxy contact point, thr representing a distance threshold;
according to the vector v formed by the two proxy contact points, counting in each group the number of contact points whose angle θ between v and the point's normal is smaller than a preset angle value; if the number of contact points is larger than a set value, the contact points are considered to satisfy the force-closure condition when the friction cone angle is θ, and obtaining the minimum left contact point angle θ_left and the minimum right contact point angle θ_right satisfying the force-closure condition;
calculating the final grabbing posture quality evaluation score according to the following formula:
[formula image FDA0003623756540000031: the final score combines score_left, score_right and score_y]
wherein score_left is the score of the left contact point, score_right is the score of the right contact point, and score_y is the score of the line connecting the left and right contact points.
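The contact-point extraction and force-closure scoring of claim 6 might look like the sketch below. The candidate friction-cone angles, the minimum support count, and in particular the way θ_left, θ_right and score_y are mapped into a single score are assumptions, since the final formula appears only as an image in the source.

```python
import numpy as np

CONE_ANGLES = np.radians([10, 20, 30])   # assumed candidate friction-cone angles

def side_min_angle(contact_normals, v, min_count=3):
    """Smallest candidate cone angle supported by enough contact normals."""
    cosang = np.clip(contact_normals @ v, -1.0, 1.0)
    theta = np.arccos(np.abs(cosang))    # fold antiparallel normals together
    for a in CONE_ANGLES:
        if int((theta < a).sum()) >= min_count:
            return a
    return None                          # force closure not achievable on this side

def grasp_quality(points, normals, thr=0.005):
    """Score one grasp from local-region points/normals in the grasp frame."""
    y = points[:, 1]
    hi_pts, lo_pts = y > y.max() - thr, y < y.min() + thr   # the two contact groups
    proxy_r = points[hi_pts].mean(axis=0)                   # proxy contact points
    proxy_l = points[lo_pts].mean(axis=0)
    v = proxy_r - proxy_l
    v = v / np.linalg.norm(v)
    th_r = side_min_angle(normals[hi_pts], v)
    th_l = side_min_angle(normals[lo_pts], v)
    if th_r is None or th_l is None:
        return 0.0
    # Assumed combination: a smaller required cone angle gives a higher per-side
    # score, and score_y rewards the contact line staying close to the y axis.
    score_r = 1.0 - th_r / CONE_ANGLES[-1]
    score_l = 1.0 - th_l / CONE_ANGLES[-1]
    score_y = abs(v[1])
    return (score_l + score_r + score_y) / 3.0
```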
7. The method for planning grabbing of a dual-stage mechanical arm based on a multiplexing structure according to claim 6, wherein the training of the grabbing posture prediction model further comprises:
training the grabbing posture prediction network with the grabbing posture prediction data set: the error between the output grabbing posture position offset and the real offset is taken as the offset loss function; the error between the predicted unit vector of the grabbing posture in the x direction and the real x-direction vector as the x-direction loss function; the error between the predicted unit vector of the grabbing posture in the y direction and the real y-direction vector as the y-direction loss function; the error between the predicted grabbing posture width and the real grabbing posture width as the mean square error loss function; and the error between the predicted grabbing posture quality evaluation score and the real score as the quality evaluation loss function; the grabbing posture prediction network is trained to convergence with the goal of minimizing the offset loss function, the x-direction loss function, the y-direction loss function, the mean square error loss function and the quality evaluation loss function.
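The five-term objective above can be combined as a weighted sum. The claim fixes only the width term as mean squared error; using MSE for every term, and the dict keys and unit weights, are assumptions of this sketch.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two array-likes."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))

# Assumed names for the five predicted quantities named in claim 7.
LOSS_KEYS = ("offset", "x_axis", "y_axis", "width", "score")

def grasp_prediction_loss(pred, target, weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the five loss terms: offset, x/y direction, width, quality."""
    return sum(w * mse(pred[k], target[k]) for w, k in zip(weights, LOSS_KEYS))
```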
8. The method for planning grabbing of a dual-stage mechanical arm based on a multiplexing structure according to claim 7, wherein the quality evaluation loss function is:
[formula image FDA0003623756540000041: the quality evaluation loss function]
wherein i, j and k respectively index the point sets n_1, n_2 and n_3 in which the predicted grabbing posture quality evaluation score is larger than a set large score threshold, lies between the set small score threshold and the set large score threshold, and is smaller than the set small score threshold; N represents the total number of point clouds in the local area point set; s_i, s_j and s_k respectively represent the predicted grabbing posture quality evaluation scores corresponding to i, j and k; and ŝ_i, ŝ_j and ŝ_k respectively represent the real scores corresponding to i, j and k, the real scores being the final grabbing posture quality evaluation scores calculated according to the formula in claim 6.
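Since the claimed formula itself is only an image, the sketch below shows one plausible banded form: points are partitioned by predicted score into the three sets, each band's squared error is weighted, and the total is averaged over all N points. The thresholds, band weights, and squared-error form are assumptions.

```python
import numpy as np

def quality_eval_loss(s_pred, s_true, lo_thr=0.3, hi_thr=0.7, band_w=(2.0, 1.0, 2.0)):
    """Banded squared-error loss over predicted vs. real quality scores."""
    s_pred = np.asarray(s_pred, dtype=float)
    s_true = np.asarray(s_true, dtype=float)
    n1 = s_pred > hi_thr            # predicted score above the large threshold
    n3 = s_pred < lo_thr            # predicted score below the small threshold
    n2 = ~(n1 | n3)                 # predicted score between the thresholds
    err = (s_pred - s_true) ** 2
    total = band_w[0] * err[n1].sum() + band_w[1] * err[n2].sum() + band_w[2] * err[n3].sum()
    return float(total / len(s_pred))   # average over all N points
```

Weighting the extreme bands more strongly is a common choice when confident mistakes should be penalized hardest; the patent may use a different per-band form.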
9. The method for planning grabbing of a dual-stage mechanical arm based on a multiplexing structure as claimed in claim 1 or 2, wherein the point cloud coordinates in the clamping range of the robot end gripper satisfy the following conditions:
[formula image FDA0003623756540000043: bounds on the point cloud x, y and z coordinates expressed in terms of outer_diameter, hand_depth and hand_height]
wherein outer_diameter is the outer width of the gripper, hand_depth is the width of the closed area of the gripper, and hand_height is the height of the closed area of the gripper.
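Because the exact inequalities appear only as an image in the source, the following is one plausible reading of the clamping-range condition: axis-aligned bounds in the grasp frame built from the three named gripper dimensions. The specific bounds and default values are assumptions.

```python
def within_clamping_range(p, outer_diameter=0.10, hand_depth=0.06, hand_height=0.02):
    """Assumed claim-9 test: is point p = (x, y, z), in the grasp frame,
    inside the gripper's clamping volume? Bounds are illustrative."""
    x, y, z = p
    return (0.0 <= x <= hand_depth              # within the closure depth
            and abs(y) <= outer_diameter / 2    # between the outer finger faces
            and abs(z) <= hand_height / 2)      # within the closure height
```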
10. A dual-stage mechanical arm grabbing planning system based on a multiplexing structure is characterized by comprising:
the grabbing attitude prediction model training module is used for changing a grabbing scene to acquire multi-view data, performing three-dimensional reconstruction on the multi-view data to acquire scene complete point cloud, generating grabbing attitudes by using the scene complete point cloud to form a grabbing attitude prediction data set, and training a grabbing attitude prediction network to converge by using the grabbing attitude prediction data set to obtain a grabbing attitude prediction model;
the grabbing posture evaluation model training module is used for taking point clouds which do not collide with the robot tail end clamp holder in the grabbing posture prediction data set and are located in the clamping range of the robot tail end clamp holder as a grabbing posture evaluation data set, and training a grabbing posture evaluation network to be convergent by using the grabbing posture evaluation data set to obtain a grabbing posture evaluation model;
the grabbing planning module is used for inputting the single-view point cloud of a scene to be grabbed into the grabbing posture prediction model to obtain the grabbing posture corresponding to each point in the point cloud, forming a grabbing posture set, inputting the grabbing postures in the grabbing posture set into the grabbing posture evaluation model, sorting the resulting grabbing postures according to their quality scores, and selecting the top-ranked K grabbing postures for guiding the mechanical arm in the grabbing operation;
the grabbing posture prediction network and the grabbing posture evaluation network both adopt a multiplexing structure, the multiplexing structure indicating that, except for the first layer network, the input of each subsequent layer network is connected with the input data of the first layer network or the output of the previous layer network.
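One plausible reading of this multiplexing structure is a dense-style wiring in which every layer after the first sees the raw first-layer input concatenated with the previous layer's output. That reading, and all layer sizes below, are assumptions; the claim does not fix the join operation.

```python
import numpy as np

class MultiplexMLP:
    """Sketch of a multiplexing MLP: layer 0 sees the raw input; every later
    layer sees [raw input ++ previous layer output] (assumed interpretation)."""

    def __init__(self, in_dim, hidden, n_layers, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = []
        d = in_dim                        # first layer: raw input only
        for _ in range(n_layers):
            self.weights.append(rng.normal(0.0, 0.1, size=(d, hidden)))
            d = in_dim + hidden           # later layers: raw input ++ previous output

    def forward(self, x):
        h = np.maximum(x @ self.weights[0], 0.0)          # ReLU activations
        for w in self.weights[1:]:
            h = np.maximum(np.concatenate([x, h], axis=-1) @ w, 0.0)
        return h
```

Feeding the raw input to every layer keeps low-level point features available deep in the network, which is one motivation for dense-style connections.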
CN202210489365.XA 2022-04-29 2022-04-29 Double-stage mechanical arm grabbing planning method and system based on multiplexing structure Active CN114800511B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210489365.XA CN114800511B (en) 2022-04-29 2022-04-29 Double-stage mechanical arm grabbing planning method and system based on multiplexing structure

Publications (2)

Publication Number Publication Date
CN114800511A true CN114800511A (en) 2022-07-29
CN114800511B CN114800511B (en) 2023-11-14

Family

ID=82511859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210489365.XA Active CN114800511B (en) 2022-04-29 2022-04-29 Double-stage mechanical arm grabbing planning method and system based on multiplexing structure

Country Status (1)

Country Link
CN (1) CN114800511B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102017108727A1 (en) * 2017-04-24 2018-10-25 Roboception Gmbh Method for creating a database with gripper poses, method for controlling a robot, computer-readable storage medium and handling system
US20200086483A1 (en) * 2018-09-15 2020-03-19 X Development Llc Action prediction networks for robotic grasping
CN112819135A (en) * 2020-12-21 2021-05-18 中国矿业大学 Sorting method for guiding mechanical arm to grab materials in different poses based on ConvPoint model
CN113192128A (en) * 2021-05-21 2021-07-30 华中科技大学 Mechanical arm grabbing planning method and system combined with self-supervision learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张森彦; 田国会; 张营; 刘小龙: "A prior-knowledge-guided autonomous grasping strategy based on a two-stage progressive network", Robot (机器人), no. 05 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116533236A (en) * 2023-05-09 2023-08-04 北京航空航天大学 Service robot operation evaluation strategy based on discrete working space
CN116533236B (en) * 2023-05-09 2024-04-12 北京航空航天大学 Service robot operation evaluation strategy based on discrete working space

Similar Documents

Publication Publication Date Title
CN112297013B (en) Robot intelligent grabbing method based on digital twin and deep neural network
CN111251295B (en) Visual mechanical arm grabbing method and device applied to parameterized parts
CN110298886B (en) Dexterous hand grabbing planning method based on four-stage convolutional neural network
CN111243017B (en) Intelligent robot grabbing method based on 3D vision
CN108972494A (en) A kind of Apery manipulator crawl control system and its data processing method
Lundell et al. Ddgc: Generative deep dexterous grasping in clutter
CN112509063A (en) Mechanical arm grabbing system and method based on edge feature matching
CN114571153B (en) Weld joint identification and robot weld joint tracking method based on 3D point cloud
CN112669385B (en) Industrial robot part identification and pose estimation method based on three-dimensional point cloud features
CN110378325B (en) Target pose identification method in robot grabbing process
CN113192128A (en) Mechanical arm grabbing planning method and system combined with self-supervision learning
JP2022187984A (en) Grasping device using modularized neural network
CN114800511B (en) Double-stage mechanical arm grabbing planning method and system based on multiplexing structure
JP2022187983A (en) Network modularization to learn high dimensional robot tasks
CN116673963A (en) Double mechanical arm cooperation flexible assembly system and method for unordered breaker parts
CN115861780B (en) Robot arm detection grabbing method based on YOLO-GGCNN
CN113436293B (en) Intelligent captured image generation method based on condition generation type countermeasure network
CN113822933B (en) ResNeXt-based intelligent robot grabbing method
CN115284279A (en) Mechanical arm grabbing method and device based on aliasing workpiece and readable medium
CN115194774A (en) Binocular vision-based control method for double-mechanical-arm gripping system
Cao et al. Grasp pose detection based on shape simplification
Xiao et al. Dexterous robotic hand grasp modeling using piecewise linear dynamic model
Tao et al. An improved RRT algorithm for the motion planning of robot manipulator picking up scattered piston
Xu et al. Vision‐Based Intelligent Perceiving and Planning System of a 7‐DoF Collaborative Robot
Fang et al. A pick-and-throw method for enhancing robotic sorting ability via deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant