CN114851201A - Mechanical arm six-degree-of-freedom vision closed-loop grabbing method based on TSDF three-dimensional reconstruction - Google Patents

Mechanical arm six-degree-of-freedom vision closed-loop grabbing method based on TSDF three-dimensional reconstruction Download PDF

Info

Publication number
CN114851201A
CN114851201A (Application CN202210551360.5A)
Authority
CN
China
Prior art keywords
grabbing
mechanical arm
camera
voxel
coordinate system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210551360.5A
Other languages
Chinese (zh)
Other versions
CN114851201B (en)
Inventor
欧林林
徐靖
禹鑫燚
周利波
魏岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202210551360.5A priority Critical patent/CN114851201B/en
Publication of CN114851201A publication Critical patent/CN114851201A/en
Application granted granted Critical
Publication of CN114851201B publication Critical patent/CN114851201B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1679Programme controls characterised by the tasks executed
    • B25J9/1692Calibration of manipulator
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Abstract

The invention relates to a mechanical arm six-degree-of-freedom vision closed-loop grabbing method based on TSDF three-dimensional reconstruction. The method comprises the following steps. Step one: calibrate the mechanical arm base coordinate system and the camera coordinate system by the Zhang Zhengyou camera calibration method and the ArUco Markers calibration method. Step two: perform three-dimensional scene reconstruction on the acquired image information with the TSDF function to reduce environmental noise between objects. Step three: establish a reinforcement learning network model. Step four: back-project the predicted grabbing posture of the end effector into the three-dimensional reconstructed scene and judge the quality of the predicted grab. Step five: complete the grabbing motion of the mechanical arm through robot forward and inverse kinematics. Step six: train the reinforcement learning model so that the mechanical arm completes the grabbing action. The invention overcomes the defects of the prior art and provides a mechanical arm six-degree-of-freedom vision closed-loop grabbing system based on TSDF three-dimensional reconstruction that is easy to implement and highly applicable; it reduces the environmental noise caused by occlusion and stacking between objects and reduces the depth errors caused by interference with a single vision sensor. In addition, the system realizes fast, real-time target detection and completes the grabbing action while ensuring high precision.

Description

Mechanical arm six-degree-of-freedom vision closed-loop grabbing method based on TSDF three-dimensional reconstruction
Technical Field
The invention belongs to the technology of grabbing objects with a six-degree-of-freedom mechanical arm in an unstructured environment, and in particular to the field of robot grabbing control.
Background
Object grabbing has always been an important problem in robotics, yet it still lacks a fully satisfactory solution. With a multifunctional control system, a mechanical arm can flexibly control a high-degree-of-freedom end effector in three-dimensional space, enabling dexterous grabbing of objects and dynamic response to environmental changes. Recently, with the rapid development of deep learning and reinforcement learning and the construction of corresponding systems, many feasible ideas have been proposed for intelligent mechanical arm grabbing.
Although six-degree-of-freedom grabbing control of a mechanical arm has high practical value and suits more complex operating environments, most current data-driven grabbing algorithms only perform top-down grabbing (4-DoF: x, y, z, yaw) in a simple tabletop setting, or rely on physical analysis to derive a suitable grabbing posture. Because motion is restricted to top-down grabbing, such algorithms are strongly limited in their application scenarios: the mechanical arm end effector can only approach the object from the vertical downward direction and in some cases cannot grab the object along that direction at all. For example, it is difficult for the gripper to grab a horizontally placed plate. In addition, methods based on physical analysis not only require a large amount of experimental environment data but also need to compute and estimate an accurate target model, which inevitably incurs a large amount of time and computational cost; moreover, the physical model of an unstructured target object is often not universal, so most of these algorithms are difficult to apply directly to grabbing novel objects, and the robustness and generalization of such systems are poor.
Thus, six-degree-of-freedom (6-DOF) grabbing for mechanical arms has been proposed. PointNetGPD, proposed by Hongzhuo Liang, adopts a two-step sampling-and-evaluation method in which a large number of samples are evaluated to determine a reliable grab pose; however, this method is quite time consuming. (Hongzhuo Liang et al., "PointNetGPD: Detecting grasp configurations from point sets". In: 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 3629-3635). Florence et al. perform pose transfer from existing grab poses; however, the success rate of the algorithm is relatively low for target geometries and objects not recorded in the data set, so it cannot be generalized to new application scenarios. (Peter Florence, Lucas Manuelli, and Russ Tedrake. "Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation". In: Conference on Robot Learning (CoRL) (2018)). More recently, a point cloud scene is reconstructed from partial views acquired by an RGB-D camera, object features are extracted with a designed PointNet++ network model, and 6-DoF grab pose regression is then completed directly and grab planning is performed on the reconstructed point cloud scene. (P. Ni, W. Zhang, X. Zhu, and Q. Cao, "Learning an end-to-end spatial grasp generation and refinement algorithm from simulation," Machine Vision and Applications, vol. 32, no. 1, pp. 1-12, 2021.)
Disclosure of Invention
The invention overcomes the defects of the prior art and provides a six-degree-of-freedom grabbing method for a mechanical arm that is easy to implement and highly applicable. The invention designs a feature extraction network and a Policy Gradient reinforcement learning framework, and can realize positioning and six-degree-of-freedom planned grabbing of objects not recorded in the training data set.
The method takes an image sequence as input. First, the mechanical arm coordinate system and the camera coordinate system are established through the Zhang Zhengyou camera calibration method and the ArUco Markers calibration method. Color and depth information of the grabbing stage is then acquired with an Intel RealSense D435 depth camera, and three-dimensional scene reconstruction is performed on the acquired image information through a Truncated Signed Distance Function (TSDF) so as to reduce environmental noise between objects. A two-dimensional top-down image of the reconstructed scene is captured and used as the input of the grabbing neural network model designed in this work; the model encodes the input image, extracts features, and outputs the predicted grab region and mechanical arm grabbing posture for the current environment. The structure of the network model is shown in figure 1. The model adopts a ResNet fully convolutional feature extraction architecture to learn to extract unstructured features from RGB-D partial observations. The model channel-concatenates color features and depth features, using 2 additional Batch Normalization (BN) layers, nonlinear activation functions (ReLU) and interleaved 1×1 convolutional layers. Average Pooling is then used to aggregate the feature embeddings of the same pixel unit, and linear upsampling is performed so that convolutional features are shared across different positions and orientations and the output picture matches the input resolution. The coordinates of the hotspot block with the maximum grabbing probability are selected and mapped to the hemispherical three-dimensional grabbing pose points defined before the experiment (see figure 2), and the corresponding 6-dimensional pose is taken as the predicted 6-degree-of-freedom grabbing posture. Next, the predicted grabbing posture output by the model is back-projected into the reconstructed three-dimensional scene, a depth image under the current posture is captured from the three-dimensional scene with the TSDF function, and the grab quality is judged from the depth information at the two ends of the gripper. Finally, when the depth of the target coordinate point in the rendered image is less than or equal to the depth (15 cm) of the fingertips at the two ends of the gripper, the predicted posture is fed into the robot forward kinematics to obtain the corresponding mechanical arm motion trajectory, and the current grabbing action of the mechanical arm is completed. If the grab succeeds, the reward of the agent is assigned 1; otherwise it is assigned 0. The system grabbing flow chart is shown in figure 3.
The invention relates to a mechanical arm six-degree-of-freedom vision closed-loop grabbing method based on TSDF three-dimensional reconstruction, which comprises the following specific steps of:
Step 1: calibrating the mechanical arm base coordinate system and the camera coordinate system by the Zhang Zhengyou camera calibration method and the ArUco Markers calibration method:
Firstly, the Intel D415 depth camera is fixed vertically at the end of the mechanical arm, so that the camera and the clamping jaw of the end effector are related by a fixed pose transformation and the binocular camera can acquire image information of objects on the grabbing operation table. Then, the internal and external parameters of the camera are calibrated by the Zhang Zhengyou camera calibration method and the ArUco Markers calibration method, and a three-dimensional coordinate system with the mechanical arm base as the origin is constructed. The calibration formulas of the camera internal and external parameters are as follows:
[x_c, y_c, z_c, 1]^T = [R, T; 0, 1] · [x_w, y_w, z_w, 1]^T        (1)

x = f·x_c/z_c,   y = f·y_c/z_c        (2)

z_m·[u, v, 1]^T = [f/d_x, 0, u_0; 0, f/d_y, v_0; 0, 0, 1] · [x_c, y_c, z_c]^T        (3)
where x_w, y_w, z_w are the three-dimensional coordinates in the world coordinate system; x_c, y_c, z_c are the coordinates in the camera coordinate system; u and v are the pixel coordinates, with (u_0, v_0) the principal point; z_m is the depth of the point in the camera coordinate system; p is the imaging point in the image coordinate system; f is the imaging distance; d_x, d_y describe the pixel dimensions in the image coordinate system; R and T are the rotation matrix and the translation matrix. Equation (1) is the transformation from the world coordinate system to the camera coordinate system. Equation (2) is the transformation from the camera coordinate system to the ideal image coordinate system. Equation (3) is the matrix from the camera coordinate system to the pixel coordinate system, i.e. the internal reference (intrinsic) matrix.
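For illustration, a minimal Python sketch of step 1 is given below: intrinsic calibration with Zhang Zhengyou's chessboard method and an ArUco-based pose estimate for the extrinsics, using OpenCV. The board size, square size, marker length and the cv2.aruco calls are assumptions (the ArUco API differs between OpenCV versions); this is not the exact calibration code of the invention.

```python
import cv2
import numpy as np

def calibrate_intrinsics(gray_images, board_size=(9, 6), square=0.025):
    """Zhang-style calibration: returns the camera matrix K and distortion coefficients."""
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2) * square
    obj_pts, img_pts = [], []
    for gray in gray_images:
        found, corners = cv2.findChessboardCorners(gray, board_size)
        if found:
            obj_pts.append(objp)
            img_pts.append(corners)
    _, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts,
                                           gray_images[0].shape[::-1], None, None)
    return K, dist

def marker_pose(gray, K, dist, marker_len=0.05):
    """Camera pose relative to one ArUco marker (legacy cv2.aruco API, pre-4.7)."""
    aruco = cv2.aruco
    corners, ids, _ = aruco.detectMarkers(gray, aruco.getPredefinedDictionary(aruco.DICT_4X4_50))
    rvecs, tvecs, _ = aruco.estimatePoseSingleMarkers(corners, marker_len, K, dist)
    R, _ = cv2.Rodrigues(rvecs[0])      # rotation vector -> rotation matrix
    return R, tvecs[0].reshape(3)       # extrinsic rotation and translation
```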
Step 2: and (3) carrying out three-dimensional scene reconstruction on the acquired image information by using a TSDF function so as to reduce the environmental noise points between the objects:
step 2.1: in an actual experiment, a single sensor is adopted to obtain visual information, and information (color information and depth information) captured by a binocular camera on a grabbing operating platform is obtained through OpenCV.
Step 2.2: due to light, mutual shielding of objects and other environmental factors, a large noise point exists in a depth image acquired by a sensor, and therefore the capturing and prediction of the network model on the image are influenced. Firstly, spherical point sampling is carried out on a grabbing operation table before grabbing each time, and depth errors in a single image are reduced by sampling local images for multiple times. Secondly, the sampling point posture and the sampling image information are fused into a three-dimensional reconstruction voxel grid by utilizing a TSDF function. For a single voxel, not only the intuitive x, y, z coordinates are contained, but also two other values (SDF and CAM) are used to represent the voxel-to-surface distance. Where SDF is a signed distance function for depth map fusion, which uses 0 for the surface, positive values for points outside the model, negative values for points inside the model, and a value proportional to the distance of the point to the nearest surface. The surface is easily extracted as the zero crossing point of the function.
Step 2.3: the coordinate system where the voxel grid is located is defined as a world coordinate system, and the voxel coordinate can be expressed as
V_j^w = (x, y, z)^T. The pose of the ith camera is T_i^w, and the camera intrinsic (internal reference) matrix is K. The voxel grid in the world coordinate system is therefore projected into the camera coordinate system through the transformation matrix (T_i^w)^(-1) obtained by ICP registration, and converted into the image coordinate system I_ij according to the camera intrinsic matrix K, where i ∈ (0, 1, ..., n), j ∈ (0, 1, ..., m):

I_ij = K·(T_i^w)^(-1)·V_j^w
After projecting the voxel grid to the image coordinate system, it is first necessary to calculate the initial SDF value for each voxel, as shown in equation (18)
sdf_j = dep(I_ij) − ||V_j^w − t_i^w||_2        (18)

where V_j^w denotes the position of the jth voxel in the world coordinate system, t_i^w denotes the position of the ith camera in the world coordinate system, and dep(I_ij) denotes the depth value of the jth voxel in the ith camera depth image coordinate system.
The sdf value for each voxel is then truncated according to equation (19).
tsdf_i = max(−1, min(1, sdf_i/trunc))        (19)

where tsdf_i is the truncated voxel distance value and trunc denotes the manually set truncation distance, which can be understood as the error bound of the camera depth information: the larger the error, the larger this value should be set, otherwise much of the information acquired by the depth camera may be lost. Here trunc is set to 1.
After truncating the sdf value for each voxel, the tsdf value for each voxel is recalculated according to equation (20):
TSDF_j(x, y, z) = (ω_(j−1)(x, y, z)·tsdf_(j−1) + ω_j(x, y, z)·tsdf_j) / (ω_(j−1)(x, y, z) + ω_j(x, y, z))        (20)

where ω_j(x, y, z) is the weight of the voxel in the current-frame global voxel grid, ω_(j−1)(x, y, z) is the weight of the voxel in the previous-frame voxel grid, tsdf_j is the distance from the voxel in the current-frame global data cube to the object surface, tsdf_(j−1) is the distance from the voxel in the previous-frame global data cube to the object surface, V_z denotes the z-axis coordinate of the voxel in the camera coordinate system, and D(u, v) denotes the depth value at pixel (u, v) of the current-frame depth image.
Equation (9) calculates a weight value for each voxel.
w_j = w_(j−1) + w_j        (9)
The environment information of the grabbing operation table is then rendered from a top-down viewpoint and used as the input of the network model.
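To make the per-frame TSDF update of step 2 concrete, the following NumPy sketch projects the voxel grid with K and the camera transform, computes the signed distance, truncates it and fuses it with a weighted running average. The flat voxel layout, the frame weight of 1 and the variable names are assumptions rather than the exact implementation of the invention.

```python
import numpy as np

def integrate_frame(tsdf, weight, voxel_xyz, K, T_cam_world, depth, trunc=1.0):
    """voxel_xyz: (N, 3) voxel centers in the world frame; depth: HxW depth image in metres."""
    ones = np.ones((voxel_xyz.shape[0], 1))
    cam = (T_cam_world @ np.hstack([voxel_xyz, ones]).T).T[:, :3]   # world -> camera
    z = cam[:, 2]
    u = np.round(K[0, 0] * cam[:, 0] / z + K[0, 2]).astype(int)     # camera -> pixel via K
    v = np.round(K[1, 1] * cam[:, 1] / z + K[1, 2]).astype(int)
    h, w = depth.shape
    valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.zeros_like(z)
    d[valid] = depth[v[valid], u[valid]]
    valid &= d > 0
    sdf = d - z                                  # measured depth minus voxel depth
    tsdf_new = np.clip(sdf / trunc, -1.0, 1.0)   # truncation, equation (19)
    w_new = np.where(valid, weight + 1.0, weight)
    fused = np.where(valid,
                     (weight * tsdf + 1.0 * tsdf_new) / np.maximum(w_new, 1e-6),
                     tsdf)                       # weighted fusion, equation (20)
    return fused, w_new
```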
Step 3: establishing a reinforcement learning network model:
step 3.1: designing a mechanical arm grabbing strategy, and defining an action space for reinforcement learning: the invention designs a two-dimensional convolution network to directly regress the 6-DoF grabbing posture by increasing the network dimension. The gripper defines the gripping action as shown in figure 2.
Step 3.2: converting each element in the array into an angle of rotation of the tail end of the mechanical arm around three coordinate axes of x, y and z respectively, wherein a specific conversion formula is as follows:
a_x = ((best_pix_ind[0] − 14)·φ) − π        (10)
b_y = (best_pix_ind[1] − 14)·φ              (11)
r_z = best_pix_ind[2]·θ                     (12)

where a_x denotes the rotation angle of the end of the mechanical arm around the x axis, i.e. the roll angle of the end effector; b_y denotes the rotation angle around the y axis, i.e. the pitch angle of the end effector; r_z denotes the rotation angle around the z axis, i.e. the yaw angle of the end effector; and φ and θ are deviation values set for the experiment.
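A small Python helper illustrating equations (10)-(12): the argmax index of the heatmap (best_pix_ind) is turned into roll, pitch and yaw of the end effector. The deviation values φ = 30/28 and θ = 180/16 follow the worked example given later in the text and the formulas are kept exactly as written (no unit conversion is added), so treat those values as assumptions.

```python
import math

def index_to_euler(best_pix_ind, phi=30.0 / 28.0, theta=180.0 / 16.0):
    # best_pix_ind = (x_bin, y_bin, z_bin) selected from the grasp heatmap
    a_x = (best_pix_ind[0] - 14) * phi - math.pi   # roll about the x axis, equation (10)
    b_y = (best_pix_ind[1] - 14) * phi             # pitch about the y axis, equation (11)
    r_z = best_pix_ind[2] * theta                  # yaw about the z axis, equation (12)
    return a_x, b_y, r_z
```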
Step 3.3: designing a feature extraction network model: the model adopts a Resnet full convolution feature extraction network architecture to learn to extract unstructured features from RGB-D partial observation. The multilayer feature extraction Convolutional layer extracts the features of the color image and the depth image, and then channel-connects the color feature and the depth feature by using a Concat function, and keeps the resolution of the output thermodynamic diagram consistent with the input image by using 2 additional Batch Normalization (BN), Nonlinear activation functions (ReLU), 1 × 1 volumetric Layers, and upsampling. And selecting the corresponding gesture in the action space corresponding to the maximum vector value coordinate in the tensor map as the predicted grabbing gesture to be output.
Step 3.4: the network is forward-inferred by the following formula:
[Equations (13)-(16) are reproduced as images in the original publication.]

where equation (13) represents the expected return under state s and action a, g_t denotes the grabbing action taken at time t, s_t denotes the state at time t, and r_t denotes the reward at time t; equation (14) represents the total reward function of the network; equation (15) is the state distribution function; and equation (16) represents the state-action function.
Step 3.5, designing a reinforcement learning network loss function, and further improving the detection precision of the algorithm by adopting a calculation cross entropy function:
[Equation (17) is reproduced as an image in the original publication.]

where τ = s_0 a_0 s_1 a_1 ... s_n a_n ... is the Markov process (the state-action trajectory).
Due to the fact that
[an identity reproduced as an image in the original publication] and P_r{a|s} = π(s, a), equation (18) can be obtained:

[Equation (18) is reproduced as an image in the original publication.]
The weight update function is as follows:
[Equations (19)-(21) are reproduced as images in the original publication.]

where f_ω: S × A → R is an approximation function; when f_ω attains its minimum and Δω = 0, equation (21) can be derived.
Step 3.6: setting reinforcement learning reward: if the crawling is successful, the reward of the agent is assigned to be 1, otherwise, the reward of the agent is assigned to be 0.
Step 4: back-projecting the predicted grabbing posture of the end effector into the three-dimensional reconstructed scene and judging the quality of the predicted grab: the 6-degree-of-freedom grabbing posture output by the network is back-projected into the three-dimensional voxel scene established before grabbing, and a depth image under the predicted posture is captured through view-frustum projection, as shown in figure 4. The view frustum is a three-dimensional volume whose position is related to the camera's extrinsic matrix, and its shape determines how the model is projected from pixel space onto the plane. Perspective projection uses a pyramid as the view frustum, with the camera located at the apex of the pyramid. The pyramid is truncated by two planes, one in front of the other, forming a frustum called the View Frustum; its model is shown in figure 5, and only geometry located inside the frustum is visible. In this clipping space the two planes are special and are called the near clip plane and the far clip plane, respectively.
The distance between the parallel clamping jaws and the object is inferred by comparing the depth information at the two ends of the gripper in the rendered image, so that the grab quality of the predicted grabbing posture is judged; this forms the closed-loop feedback of the 6-degree-of-freedom mechanical arm grabbing system.
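A minimal sketch of this closed-loop check: the depth rendered from the TSDF volume is sampled at the two fingertip pixels of the parallel gripper and compared with the 15 cm fingertip depth mentioned above. The fingertip pixel locations are assumed to come from the back-projected grasp pose.

```python
import numpy as np

def grasp_is_feasible(rendered_depth, tip_px_left, tip_px_right, fingertip_depth=0.15):
    """rendered_depth: HxW depth image rendered under the predicted grasp pose (metres)."""
    d_left = rendered_depth[tip_px_left[1], tip_px_left[0]]
    d_right = rendered_depth[tip_px_right[1], tip_px_right[0]]
    # the object must lie within reach of both fingertips
    return bool(d_left <= fingertip_depth and d_right <= fingertip_depth)
```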
Step 5: the grabbing motion of the mechanical arm is completed through robot forward and inverse kinematics:
Firstly, when the depth at the two ends of the gripper in the depth image rendered in step 4 is less than or equal to the fingertip depth of the gripper, i.e. the predicted gripper posture straddles the two sides of the object, the predicted posture is judged to be graspable. The mechanical arm is then controlled to execute the grab plan according to the predicted grabbing posture: the 6 joint angles of the mechanical arm in the current state are solved through robot inverse kinematics. The predicted 6-dimensional grabbing posture of the mechanical arm end output by the reinforcement learning network in step 3 is then fed into the robot forward kinematics to solve the motion trajectory that moves the mechanical arm end effector from the current posture to the predicted posture point. When the end effector of the mechanical arm reaches the predicted grabbing posture, the robot sends a gripper closing signal to attempt the grab. After the gripper closes, the end effector moves vertically upward by 15 cm, a depth image after the upward motion is obtained from the Intel RealSense D435 camera, and whether the actual grab succeeded is judged by computing the depth at the two ends of the gripper. When the grab succeeds, the reinforcement learning model reward is assigned 1; when the grab fails, the reinforcement learning network reward is assigned 0.
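The step 5 routine can be summarised by the sketch below. The robot and camera objects and their methods (inverse_kinematics, plan_to, follow, close_gripper, move_up, get_depth) are hypothetical placeholders standing in for the UR5 and RealSense interfaces, not a real driver API.

```python
def execute_grasp(robot, camera, predicted_pose, check_fn, lift=0.15):
    # hypothetical robot/camera interface; only the control flow mirrors the text
    joints_now = robot.inverse_kinematics(robot.current_pose())   # 6 joint angles in the current state
    trajectory = robot.plan_to(predicted_pose, start=joints_now)  # forward-kinematics-based motion plan
    robot.follow(trajectory)
    robot.close_gripper()
    robot.move_up(lift)                                           # lift 15 cm vertically
    depth_after = camera.get_depth()
    success = check_fn(depth_after)                               # depth at the gripper tips decides success
    return 1 if success else 0                                    # reinforcement learning reward
```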
Step 6: performing reinforcement learning model training so that the mechanical arm completes the grabbing action:
and continuously repeating the steps 2 to 5 in a CoppelliaSim Edu simulation environment, reducing a loss function in the reinforcement learning model through reinforcement learning reward, and updating the model weight parameter. And finally, importing the trained weight parameters into a real mechanical arm UR5, carrying out experimental debugging, repeating the steps 1 to 5, and completing a six-degree-of-freedom closed-loop grabbing task of the mechanical arm.
In conclusion, the method has the advantage that the 6-dimensional grabbing posture of the mechanical arm is directly predicted and output by increasing the dimension of a two-dimensional neural network. Meanwhile, the invention performs three-dimensional reconstruction of the complex grabbing environment through the TSDF function, which reduces the environmental noise caused by occlusion and stacking between objects and reduces the depth error caused by interference with a single vision sensor. A reinforcement learning network is designed for the method, which overcomes the drawbacks of complex computation and high time cost when the mechanical arm grabbing posture is derived by traditional physical analysis, and solves the problem that 6-DOF grabbing postures cannot be applied to target objects not recorded in the data set. The method not only guarantees a high grabbing success rate for the mechanical arm model but also benefits the generalization of reinforcement learning, i.e. it can be applied to new grabbed objects, and it addresses the time-consuming computation of traditional methods and the instability caused by partial point cloud models as input. The invention realizes real-time detection of the grabbed object and 6-DOF grabbing.
Drawings
FIG. 1 is a block diagram of a network model in the present invention;
FIG. 2 is a diagram of the attitude angles of the hemispherical three-dimensional gripper defined by the present invention;
FIG. 3 is a flow chart of the system capture of the present invention;
FIG. 4 is a view of a viewing pyramid projection;
FIG. 5 is a View Frustum model diagram;
FIG. 6 is an example of an experimental grab;
Detailed Description
The invention is further illustrated in the following with reference to the accompanying drawings.
The invention discloses a mechanical arm six-degree-of-freedom vision closed-loop grabbing method based on TSDF three-dimensional reconstruction; figure 6 shows an example of object grabbing. The specific process is as follows:
Step 1: calibrating the mechanical arm base coordinate system and the camera coordinate system by the Zhang Zhengyou camera calibration method and the ArUco Markers calibration method:
Firstly, an Intel D415 depth camera is fixed vertically at the end of the mechanical arm, so that the camera and the clamping jaw of the end effector are related by a fixed pose transformation and the binocular camera can acquire image information of objects on the grabbing operation table. Then, the internal and external parameters of the camera are calibrated by the Zhang Zhengyou camera calibration method and the ArUco Markers calibration method, and a three-dimensional coordinate system with the mechanical arm base as the origin is constructed. The calibration formulas of the camera internal and external parameters are as follows:
[x_c, y_c, z_c, 1]^T = [R, T; 0, 1] · [x_w, y_w, z_w, 1]^T        (1)

x = f·x_c/z_c,   y = f·y_c/z_c        (2)

z_m·[u, v, 1]^T = [f/d_x, 0, u_0; 0, f/d_y, v_0; 0, 0, 1] · [x_c, y_c, z_c]^T        (3)
where x_w, y_w, z_w are the three-dimensional coordinates in the world coordinate system; x_c, y_c, z_c are the coordinates in the camera coordinate system; u and v are the pixel coordinates, with (u_0, v_0) the principal point; z_m is the depth of the point in the camera coordinate system; p is the imaging point in the image coordinate system; f is the imaging distance; d_x, d_y describe the pixel dimensions in the image coordinate system; R and T are the rotation matrix and the translation matrix. Equation (1) is the transformation from the world coordinate system to the camera coordinate system. Equation (2) is the transformation from the camera coordinate system to the ideal image coordinate system. Equation (3) is the matrix from the camera coordinate system to the pixel coordinate system, i.e. the internal reference (intrinsic) matrix.
Step 2: and (3) carrying out three-dimensional scene reconstruction on the acquired image information by using a TSDF function so as to reduce the environmental noise points between the objects:
step 2.1: in an actual experiment, a single sensor is adopted to obtain visual information, and information (color information and depth information) captured by a binocular camera on a grabbing operating platform is obtained through OpenCV.
Step 2.2: due to light, mutual shielding of objects and other environmental factors, a large noise point exists in a depth image acquired by a sensor, and therefore the capturing and prediction of the network model on the image are influenced. Firstly, spherical point sampling is carried out on a grabbing operation table before grabbing each time, and depth errors in a single image are reduced by sampling local images for multiple times. Secondly, the sampling point posture and the sampling image information are fused into a three-dimensional reconstruction voxel grid by utilizing a TSDF function. For a single voxel, not only the intuitive x, y, z coordinates are contained, but also two other values (SDF and CAM) are used to represent the voxel-to-surface distance. Where SDF is a signed distance function for depth map fusion, which uses 0 for the surface, positive values for points outside the model, negative values for points inside the model, and a value proportional to the distance of the point to the nearest surface. The surface is easily extracted as the zero crossing point of the function. If the depth value D (u, V) at the ith frame depth image D (u, V) is not 0, comparing D (u, V) with z in the voxel camera coordinates V (x, y, z), and if D (u, V) is larger than z, indicating that the voxel is closer to the camera for reconstructing the outer surface of the scene; if D (u, v) is less than z, this voxel is further from the camera and is reconstructed as the interior surface of the scene. I.e. the voxel point is outside the surface (closer to the camera side), the SDF value is positive and the voxel is inside the surface, the SDF value is negative.
Step 2.3: the coordinate system where the voxel grid is located is defined as a world coordinate system, and the voxel coordinate can be expressed as
V_j^w = (x, y, z)^T. The pose of the ith camera is T_i^w, and the camera intrinsic (internal reference) matrix is K. The voxel grid in the world coordinate system is therefore projected into the camera coordinate system through the transformation matrix (T_i^w)^(-1) obtained by ICP registration, and converted into the image coordinate system I_ij according to the camera intrinsic matrix K, where i ∈ (0, 1, ..., n), j ∈ (0, 1, ..., m):

I_ij = K·(T_i^w)^(-1)·V_j^w
After projecting the voxel grid to the image coordinate system, it is first necessary to calculate the initial SDF value for each voxel, as shown in equation (18)
sdf_j = dep(I_ij) − ||V_j^w − t_i^w||_2        (18)

where V_j^w denotes the position of the jth voxel in the world coordinate system, t_i^w denotes the position of the ith camera in the world coordinate system, and dep(I_ij) denotes the depth value of the jth voxel in the ith camera depth image coordinate system.
The sdf value for each voxel is then truncated according to equation (19).
tsdf_i = max(−1, min(1, sdf_i/trunc))        (19)

where trunc denotes the manually set truncation distance, which can be understood as the error bound of the camera depth information: the larger the error, the larger this value should be set, otherwise much of the information acquired by the depth camera may be lost. Here trunc is set to 1.
After truncating the sdf value for each voxel, the tsdf value for each voxel is recalculated according to equation (20):
TSDF_j(x, y, z) = (ω_(j−1)(x, y, z)·tsdf_(j−1) + ω_j(x, y, z)·tsdf_j) / (ω_(j−1)(x, y, z) + ω_j(x, y, z))        (20)

where ω_j(x, y, z) is the weight of the voxel in the current-frame global voxel grid, ω_(j−1)(x, y, z) is the weight of the voxel in the previous-frame voxel grid, tsdf_j is the distance from the voxel in the current-frame global data cube to the object surface, tsdf_(j−1) is the distance from the voxel in the previous-frame global data cube to the object surface, V_z denotes the z-axis coordinate of the voxel in the camera coordinate system, and D(u, v) denotes the depth value at pixel (u, v) of the current-frame depth image.
Equation (9) calculates a weight value for each voxel.
w_j = w_(j−1) + w_j        (9)
The environment information of the grabbing operation table is then rendered from a top-down viewpoint and used as the input of the network model.
Step 3: establishing a reinforcement learning network model:
Step 3.1: designing the mechanical arm grabbing strategy and defining the action space for reinforcement learning: target grabbing is commonly divided into a grab position region and a three-dimensional rotation amount, predicted respectively by two network models. This method instead uses a single two-dimensional convolutional network that directly regresses the 6-DoF grabbing posture by increasing the network output dimension. The invention defines the grabbing action space for a single object as 16 × 28 × 28 actions, where 16 is the number of rotation steps of the end effector around the z coordinate axis (22.5° each) and 28 × 28 is the number of rotation steps of the end effector around the x and y coordinate axes. The grabbing actions of the gripper are defined as shown in figure 2.
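A small sketch of how the 16 × 28 × 28 discrete action space could be enumerated, and how a flat argmax index is unravelled into the bins used by equations (10)-(12). The indexing order is an assumption about how the heatmap is flattened.

```python
import numpy as np

N_YAW, N_X, N_Y = 16, 28, 28          # 16 yaw bins of 22.5 degrees, 28 x 28 roll/pitch bins

def unravel_action(flat_index):
    yaw_bin, x_bin, y_bin = np.unravel_index(flat_index, (N_YAW, N_X, N_Y))
    return int(x_bin), int(y_bin), int(yaw_bin)   # ordered as best_pix_ind for equations (10)-(12)
```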
Step 3.2: converting each element in the array into an angle of rotation of the tail end of the mechanical arm around three coordinate axes of x, y and z respectively, wherein a specific conversion formula is as follows:
a_x = ((best_pix_ind[0] − 14)·φ) − π        (10)
b_y = (best_pix_ind[1] − 14)·φ              (11)
r_z = best_pix_ind[2]·θ                     (12)

where a_x denotes the rotation angle of the end of the mechanical arm around the x axis, i.e. the roll angle of the end effector; b_y denotes the rotation angle around the y axis, i.e. the pitch angle of the end effector; r_z denotes the rotation angle around the z axis, i.e. the yaw angle of the end effector. In the actual grabbing example, the deviation values φ = 30/28 and θ = 180/16 are substituted to calculate the predicted grabbing posture.
Step 3.3: designing a feature extraction network model: the model adopts a Resnet full convolution feature extraction network architecture to learn to extract unstructured features from RGB-D partial observation. The multi-layer feature extraction Convolutional layer extracts the features of the color image and the depth image, then channel connects the color features and the depth features through the Concat function, and keeps the resolution of the output thermal diagram consistent with the resolution of the input image by adopting 2 additional Batch Normalization (BN), Nonlinear activation functions (ReLU), 1 × 1 conditional Layers, and upsampling. And selecting the corresponding gesture in the action space corresponding to the maximum vector value coordinate in the tensor map as the predicted grabbing gesture to be output.
Step 3.4: the network is forward-inferred by the following formula:
[Equations (13)-(16) are reproduced as images in the original publication.]

where equation (13) represents the expected return under state s and action a, g_t denotes the grabbing action taken at time t, s_t denotes the state at time t, and r_t denotes the reward at time t; equation (14) represents the total reward function of the network; equation (15) is the state distribution function; and equation (16) represents the state-action function.
Step 3.5, designing a reinforcement learning network loss function, and further improving the detection precision of the algorithm by adopting a calculation cross entropy function:
[Equation (17) is reproduced as an image in the original publication.]

where τ = s_0 a_0 s_1 a_1 ... s_n a_n ... is the Markov process (the state-action trajectory).
Due to the fact that
[an identity reproduced as an image in the original publication] and P_r{a|s} = π(s, a), equation (18) can be obtained:

[Equation (18) is reproduced as an image in the original publication.]
The weight update function is as follows:
[Equations (19)-(21) are reproduced as images in the original publication.]

where f_ω: S × A → R is an approximation function; when f_ω attains its minimum and Δω = 0, equation (21) can be derived.
Step 3.6: setting reinforcement learning reward: if the crawling is successful, the reward of the agent is assigned to be 1, otherwise, the reward of the agent is assigned to be 0.
Step 4: back-projecting the predicted grabbing posture of the end effector into the three-dimensional reconstructed scene and judging the quality of the predicted grab: the 6-degree-of-freedom grabbing posture output by the network is back-projected into the three-dimensional voxel scene established before grabbing, and a depth image under the predicted posture is captured through view-frustum projection, as shown in figure 4. The view frustum is a three-dimensional volume whose position is related to the camera's extrinsic matrix, and its shape determines how the model is projected from pixel space onto the plane. Perspective projection uses a pyramid as the view frustum, with the camera located at the apex of the pyramid. The pyramid is truncated by two planes, one in front of the other, forming a frustum called the View Frustum; its model is shown in figure 5, and only geometry located inside the frustum is visible. In this clipping space the two planes are special and are called the near clip plane and the far clip plane, respectively.
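For reference, the view-frustum projection described here is the conventional perspective projection used in computer graphics; the sketch below builds the standard matrix from the field of view, aspect ratio and the near/far clip planes. The patent does not give this matrix explicitly, so the concrete form is the usual OpenGL-style convention, not text from the filing.

```python
import numpy as np

def perspective(fov_y_rad, aspect, near, far):
    """Standard perspective projection matrix defined by the near and far clip planes."""
    f = 1.0 / np.tan(fov_y_rad / 2.0)
    m = np.zeros((4, 4))
    m[0, 0] = f / aspect
    m[1, 1] = f
    m[2, 2] = (far + near) / (near - far)
    m[2, 3] = 2.0 * far * near / (near - far)
    m[3, 2] = -1.0
    # a homogeneous point p is inside the frustum when |x|, |y|, |z| <= w after m @ p
    return m
```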
The distance between the parallel clamping jaws and the object is inferred by comparing the depth information at the two ends of the gripper in the rendered image, so that the grab quality of the predicted grabbing posture is judged; this forms the closed-loop feedback of the 6-degree-of-freedom mechanical arm grabbing system.
Step 5: the grabbing motion of the mechanical arm is completed through robot forward and inverse kinematics:
Firstly, when the depth at the two ends of the gripper in the depth image rendered in step 4 is less than or equal to the fingertip depth of the gripper, i.e. the predicted gripper posture straddles the two sides of the object, the predicted posture is judged to be graspable. The mechanical arm is then controlled to execute the grab plan according to the predicted grabbing posture: the 6 joint angles of the mechanical arm in the current state are solved through robot inverse kinematics. The predicted 6-dimensional grabbing posture of the mechanical arm end output by the reinforcement learning network in step 3 is then fed into the robot forward kinematics to solve the motion trajectory that moves the mechanical arm end effector from the current posture to the predicted posture point. When the end effector of the mechanical arm reaches the predicted grabbing posture, the robot sends a gripper closing signal to attempt the grab. After the gripper closes, the end effector moves vertically upward by 15 cm, a depth image after the upward motion is obtained from the Intel RealSense D435 camera, and whether the actual grab succeeded is judged by computing the depth at the two ends of the gripper. When the grab succeeds, the reinforcement learning model reward is assigned 1; when the grab fails, the reinforcement learning network reward is assigned 0.
Step 6: performing reinforcement learning model training so that the mechanical arm completes the grabbing action:
The trained model weight parameters are loaded onto a real UR5 mechanical arm, experimental debugging is carried out, and steps 1 to 5 are repeated to complete the six-degree-of-freedom closed-loop grabbing task of the mechanical arm. Over 200 rounds of grabbing tests, the grabbing success rate of the TSDF-based six-degree-of-freedom mechanical arm grabbing method for unstructured objects is 85.3% (with 10 target objects stacked randomly). When the number of target objects is increased to 15 randomly stacked objects, the grabbing success rate of the invention is still 81.5%. In conclusion, the designed reinforcement learning network overcomes the drawbacks of complex computation and high time cost when the mechanical arm grabbing posture is derived by traditional physical analysis, and solves the problem that 6-DOF grabbing postures cannot be applied to target objects not recorded in the data set. The proposed method not only guarantees a high grabbing success rate for the mechanical arm model but also benefits the generalization of reinforcement learning, can be applied to unrecorded grabbed objects, and addresses the time-consuming computation of traditional methods and the instability caused by partial point cloud models as input.

Claims (7)

1. A mechanical arm six-degree-of-freedom vision closed-loop grabbing method based on TSDF three-dimensional reconstruction is characterized in that: the method comprises the following steps:
step 1: calibrating a mechanical arm base coordinate system and a camera coordinate system by the Zhang Zhengyou camera calibration method and the ArUco Markers calibration method;
step 2: performing three-dimensional scene reconstruction on the acquired image information by using a TSDF function so as to reduce environmental noise points between objects;
step 3: establishing a reinforcement learning network model;
step 4: reversely projecting the predicted grabbing attitude of the end effector into the three-dimensional reconstructed scene, and judging the grabbing quality of this prediction;
step 5: completing the grabbing movement of the mechanical arm through the forward and inverse kinematics of the robot;
step 6: performing reinforcement learning model training to enable the mechanical arm to complete the grabbing action.
2. The mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction as claimed in claim 1, wherein: the step 1 comprises the following specific steps:
Firstly, the Intel D415 depth camera is fixed vertically at the end of the mechanical arm, so that the camera and the clamping jaw of the end effector are related by a fixed pose transformation and the binocular camera can acquire image information of objects on the grabbing operation table. Then, the internal and external parameters of the camera are calibrated by the Zhang Zhengyou camera calibration method and the ArUco Markers calibration method, and a three-dimensional coordinate system with the mechanical arm base as the origin is constructed. The calibration formulas of the camera internal and external parameters are as follows:
[x_c, y_c, z_c, 1]^T = [R, T; 0, 1] · [x_w, y_w, z_w, 1]^T        (1)

x = f·x_c/z_c,   y = f·y_c/z_c        (2)

z_m·[u, v, 1]^T = [f/d_x, 0, u_0; 0, f/d_y, v_0; 0, 0, 1] · [x_c, y_c, z_c]^T        (3)
where x_w, y_w, z_w are the three-dimensional coordinates in the world coordinate system; x_c, y_c, z_c are the coordinates in the camera coordinate system; u and v are the pixel coordinates, with (u_0, v_0) the principal point; z_m is the depth of the point in the camera coordinate system; p is the imaging point in the image coordinate system; f is the imaging distance; d_x, d_y describe the pixel dimensions in the image coordinate system; R and T are the rotation matrix and the translation matrix. Equation (1) is the transformation from the world coordinate system to the camera coordinate system. Equation (2) is the transformation from the camera coordinate system to the ideal image coordinate system. Equation (3) is the matrix from the camera coordinate system to the pixel coordinate system, i.e. the internal reference (intrinsic) matrix.
3. The mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction as claimed in claim 1, wherein: the step 2 comprises the following specific steps:
2.1): in an actual experiment, a single sensor is adopted to obtain visual information, and information (color information and depth information) captured by a binocular camera on a grabbing operating platform is obtained through OpenCV.
2.2): due to lighting, mutual occlusion between objects and other environmental factors, the depth image acquired by the sensor contains significant noise, which affects the capture and prediction of the network model on the image. Firstly, spherical point sampling is performed over the grabbing operation table before each grab, and the depth error of a single image is reduced by sampling local images multiple times. Secondly, the poses of the sampling points and the sampled image information are fused into a three-dimensional reconstruction voxel grid using the TSDF function. A single voxel contains not only the intuitive x, y, z coordinates but also two other values (SDF and CAM) used to represent the voxel-to-surface distance, where SDF is the signed distance function used for depth map fusion: it takes the value 0 on the surface, positive values for points outside the model, negative values for points inside the model, and a magnitude proportional to the distance from the point to the nearest surface. The surface can then easily be extracted as the zero crossing of the function. As shown in figures 2-4, if the depth value D(u, v) of the ith frame depth image is not 0, D(u, v) is compared with the z component of the voxel camera coordinate V(x, y, z): if D(u, v) is greater than z, the voxel is closer to the camera and is reconstructed as the outer surface of the scene; if D(u, v) is less than z, the voxel is farther from the camera and is reconstructed as the interior surface of the scene. That is, when the voxel point is outside the surface (on the camera side) the SDF value is positive, and when the voxel is inside the surface the SDF value is negative.
2.3): the coordinate system where the voxel grid is located is defined as a world coordinate system, and the voxel coordinate can be expressed as
V_j^w = (x, y, z)^T. The pose of the ith camera is T_i^w, and the camera intrinsic (internal reference) matrix is K. The voxel grid in the world coordinate system is therefore projected into the camera coordinate system through the transformation matrix (T_i^w)^(-1) obtained by ICP registration, and converted into the image coordinate system I_ij according to the camera intrinsic matrix K, where i ∈ (0, 1, ..., n), j ∈ (0, 1, ..., m):

I_ij = K·(T_i^w)^(-1)·V_j^w
After projecting the voxel grid into the image coordinate system, the initial SDF value of each voxel is first calculated, as shown in equation (6):

sdf_j = dep(I_ij) − ||V_j^w − t_i^w||_2        (6)

where V_j^w denotes the position of the jth voxel in the world coordinate system, t_i^w denotes the position of the ith camera in the world coordinate system, and dep(I_ij) denotes the depth value of the jth voxel in the ith camera depth image coordinate system.
The sdf value for each voxel is then truncated according to equation (19).
tsdf_i = max(−1, min(1, sdf_i/trunc))        (19)

where tsdf_i is the truncated voxel distance value and trunc denotes the manually set truncation distance, which can be understood as the error bound of the camera depth information: the larger the error, the larger this value should be set, otherwise much of the information acquired by the depth camera may be lost. Here trunc is set to 1.
After truncating the sdf value for each voxel, the tsdf value for each voxel is recalculated according to equation (20):
TSDF_j(x, y, z) = (ω_(j−1)(x, y, z)·tsdf_(j−1) + ω_j(x, y, z)·tsdf_j) / (ω_(j−1)(x, y, z) + ω_j(x, y, z))        (20)

where ω_j(x, y, z) is the weight of the voxel in the current-frame global voxel grid, ω_(j−1)(x, y, z) is the weight of the voxel in the previous-frame voxel grid, tsdf_j is the distance from the voxel in the current-frame global data cube to the object surface, tsdf_(j−1) is the distance from the voxel in the previous-frame global data cube to the object surface, V_z denotes the z-axis coordinate of the voxel in the camera coordinate system, and D(u, v) denotes the depth value at pixel (u, v) of the current-frame depth image.
Equation (9) calculates a weight value for each voxel.
w_j = w_(j−1) + w_j        (9)
And rendering the environment information of the grabbing operation table from top to bottom, and taking the environment information as the input of the network model.
4. The mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction as claimed in claim 1, wherein: the step 3 comprises the following specific steps:
3.1): designing the mechanical arm grabbing strategy and defining the action space for reinforcement learning: target grabbing is commonly divided into a grab position region and a three-dimensional rotation amount, predicted respectively by two network models. This method instead uses a single two-dimensional convolutional network that directly regresses the 6-DoF grabbing posture by increasing the network output dimension. The invention defines the grabbing action space for a single object as 16 × 28 × 28 actions, where 16 is the number of rotation steps of the end effector around the z coordinate axis (22.5° each) and 28 × 28 is the number of rotation steps of the end effector around the x and y coordinate axes. The grabbing actions of the gripper are defined as shown in figure 2.
3.2): converting each element in the array into an angle of rotation of the tail end of the mechanical arm around three coordinate axes of x, y and z respectively, wherein a specific conversion formula is as follows:
a_x = ((best_pix_ind[0] − 14) * 30/28) − pi        (10)
b_y = ((best_pix_ind[1] − 14) * 30/28)             (11)
r_z = (best_pix_ind[2] * 180/16)                   (12)

where a_x denotes the rotation angle of the end of the mechanical arm around the x axis, i.e. the roll angle of the end effector; b_y denotes the rotation angle around the y axis, i.e. the pitch angle of the end effector; r_z denotes the rotation angle around the z axis, i.e. the yaw angle of the end effector; and θ is the deviation value set for the experiment.
3.3): designing the feature extraction network model: the model adopts a ResNet fully convolutional feature extraction architecture to learn to extract unstructured features from RGB-D partial observations. Multi-layer convolutional feature extraction layers extract the features of the color image and the depth image, the color features and depth features are then channel-concatenated with a Concat operation, and 2 additional Batch Normalization (BN) layers, nonlinear activation functions (ReLU), 1×1 convolutional layers and upsampling are used to keep the resolution of the output heatmap consistent with the input image. The pose in the action space corresponding to the coordinate of the maximum value in the output tensor map is selected as the predicted grabbing posture to be output.
3.4): the network is forward-inferred by the following formula:
[Equations (13)-(16) are reproduced as images in the original publication.]

where equation (13) represents the expected return under state s and action a, g_t denotes the grabbing action taken at time t, s_t denotes the state at time t, and r_t denotes the reward at time t; equation (14) represents the total reward function of the network; equation (15) is the state distribution function; and equation (16) represents the state-action function.
And 3.5) designing a reinforcement learning network loss function, and adopting a calculation cross entropy function, thereby further improving the detection precision of the algorithm:
[Equation (17) is reproduced as an image in the original publication.]

where τ = s_0 a_0 s_1 a_1 ... s_n a_n ... is the Markov process (the state-action trajectory).
Due to the fact that
[an identity reproduced as an image in the original publication] and P_r{a|s} = π(s, a), equation (18) can be obtained:

[Equation (18) is reproduced as an image in the original publication.]
The weight update function is as follows:
[Equations (19)-(21) are reproduced as images in the original publication.]

where f_ω: S × A → R is an approximation function; when f_ω attains its minimum and Δω = 0, equation (21) can be derived.
3.6): setting the reinforcement learning reward: if the grab succeeds, the reward of the agent is assigned 1; otherwise it is assigned 0.
5. The mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction as claimed in claim 1, wherein: the step 4 comprises the following specific steps:
back-projecting the predicted grabbing posture of the end effector into the three-dimensional reconstructed scene and judging the quality of the predicted grab: the 6-degree-of-freedom grabbing posture output by the network is back-projected into the three-dimensional voxel scene established before grabbing, and a depth image under the predicted posture is captured through view-frustum projection, as shown in figure 4. The view frustum is a three-dimensional volume whose position is related to the camera's extrinsic matrix, and its shape determines how the model is projected from pixel space onto the plane. Perspective projection uses a pyramid as the view frustum, with the camera located at the apex of the pyramid. The pyramid is truncated by two planes, one in front of the other, forming a frustum called the View Frustum; its model is shown in figure 5, and only geometry located inside the frustum is visible. In this clipping space the two planes are special and are called the near clip plane and the far clip plane, respectively.
The distance between the parallel clamping jaws and the object is inferred by comparing the depth information at the two ends of the gripper in the rendered image, so that the grab quality of the predicted grabbing posture is judged; this forms the closed-loop feedback of the 6-degree-of-freedom mechanical arm grabbing system.
6. The mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction as claimed in claim 1, wherein: the step 5 comprises the following steps:
When the depth at the two ends of the gripper in the rendered depth image is less than or equal to the fingertip depth of the gripper, i.e. the predicted gripper posture straddles the two sides of the object, the predicted posture is judged to be graspable. The mechanical arm is then controlled to execute the grab plan according to the predicted grabbing posture: the 6 joint angles of the mechanical arm in the current state are solved through robot inverse kinematics. The predicted 6-dimensional grabbing posture of the mechanical arm end output by the reinforcement learning network in step 3 is then fed into the robot forward kinematics to solve the motion trajectory that moves the mechanical arm end effector from the current posture to the predicted posture point. When the end effector of the mechanical arm reaches the predicted grabbing posture, the robot sends a gripper closing signal to attempt the grab. After the gripper closes, the end effector moves vertically upward by 15 cm, a depth image after the upward motion is obtained from the Intel RealSense D435 camera, and whether the actual grab succeeded is judged by computing the depth at the two ends of the gripper. When the grab succeeds, the reinforcement learning model reward is assigned 1; when the grab fails, the reinforcement learning network reward is assigned 0.
7. The mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction as claimed in claim 1, wherein: the step 6 comprises the following specific steps:
Steps 2 to 5 are repeated continuously in the CoppeliaSim Edu simulation environment, the loss function of the reinforcement learning model is reduced through the reinforcement learning reward, and the model weight parameters are updated. Finally, the trained weight parameters are loaded onto the real UR5 mechanical arm for experimental debugging, and steps 1 to 5 are repeated to complete the six-degree-of-freedom closed-loop grabbing task of the mechanical arm.
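A minimal sketch of the outer training loop, assuming a PyTorch-style agent; `env`, `agent`, and their methods are illustrative placeholders standing in for the simulation environment and the reinforcement learning network, not the actual implementation.

```python
import torch

def train_in_simulation(env, agent, optimizer, episodes=1000,
                        ckpt_path="grasp_agent.pt"):
    """Repeat steps 2-5 in simulation and update the network weights from
    the binary grasp reward; finally export the weights for the real UR5."""
    for _ in range(episodes):
        tsdf = env.reconstruct_scene()           # step 2: TSDF reconstruction
        grasp = agent.predict_grasp(tsdf)        # step 3: 6-DoF grasp posture
        if env.verify_in_tsdf(grasp):            # step 4: closed-loop check
            reward = env.execute_grasp(grasp)    # step 5: 1 on success, else 0
        else:
            reward = 0.0
        loss = agent.compute_loss(tsdf, grasp, reward)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Trained weights are saved and later loaded onto the real UR5 arm.
    torch.save(agent.state_dict(), ckpt_path)
```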
CN202210551360.5A 2022-05-18 2022-05-18 Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction Active CN114851201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210551360.5A CN114851201B (en) 2022-05-18 2022-05-18 Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction

Publications (2)

Publication Number Publication Date
CN114851201A true CN114851201A (en) 2022-08-05
CN114851201B CN114851201B (en) 2023-09-05

Family

ID=82638419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210551360.5A Active CN114851201B (en) 2022-05-18 2022-05-18 Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction

Country Status (1)

Country Link
CN (1) CN114851201B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9205562B1 (en) * 2014-08-29 2015-12-08 Google Inc. Integration of depth points into a height map
CN106548466A (en) * 2015-09-16 2017-03-29 富士通株式会社 The method and apparatus of three-dimensional reconstruction object
CN110310362A (en) * 2019-06-24 2019-10-08 中国科学院自动化研究所 High dynamic scene three-dimensional reconstruction method, system based on depth map and IMU
CN112476434A (en) * 2020-11-24 2021-03-12 新拓三维技术(深圳)有限公司 Visual 3D pick-and-place method and system based on cooperative robot
CN112801988A (en) * 2021-02-02 2021-05-14 上海交通大学 Object grabbing pose detection method based on RGBD and deep neural network
CN113192128A (en) * 2021-05-21 2021-07-30 华中科技大学 Mechanical arm grabbing planning method and system combined with self-supervision learning
CN113752255A (en) * 2021-08-24 2021-12-07 浙江工业大学 Mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984388A (en) * 2023-02-28 2023-04-18 江西省智能产业技术创新研究院 Spatial positioning accuracy evaluation method, system, storage medium and computer
CN115984388B (en) * 2023-02-28 2023-06-06 江西省智能产业技术创新研究院 Spatial positioning precision evaluation method, system, storage medium and computer
CN116524217A (en) * 2023-07-03 2023-08-01 北京七维视觉传媒科技有限公司 Human body posture image matching method and device, electronic equipment and storage medium
CN116524217B (en) * 2023-07-03 2023-08-25 北京七维视觉传媒科技有限公司 Human body posture image matching method and device, electronic equipment and storage medium
CN117549307A (en) * 2023-12-15 2024-02-13 安徽大学 Robot vision grabbing method and system in unstructured environment
CN117549307B (en) * 2023-12-15 2024-04-16 安徽大学 Robot vision grabbing method and system in unstructured environment

Also Published As

Publication number Publication date
CN114851201B (en) 2023-09-05

Similar Documents

Publication Publication Date Title
JP6546618B2 (en) Learning apparatus, learning method, learning model, detection apparatus and gripping system
CN114851201B (en) Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction
CN109934864B (en) Residual error network deep learning method for mechanical arm grabbing pose estimation
CN110298886B (en) Dexterous hand grabbing planning method based on four-stage convolutional neural network
CN113524194A (en) Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning
CN114912287B (en) Robot autonomous grabbing simulation system and method based on target 6D pose estimation
CN111085997A (en) Capturing training method and system based on point cloud acquisition and processing
CN113276106B (en) Climbing robot space positioning method and space positioning system
Taryudi et al. Eye to hand calibration using ANFIS for stereo vision-based object manipulation system
CN111152227A (en) Mechanical arm control method based on guided DQN control
CN112512755A (en) Robotic manipulation using domain-invariant 3D representations predicted from 2.5D visual data
CN114882109A (en) Robot grabbing detection method and system for sheltering and disordered scenes
Gulde et al. RoPose: CNN-based 2D pose estimation of industrial robots
Van Tran et al. BiLuNetICP: A deep neural network for object semantic segmentation and 6D pose recognition
CN113752255B (en) Mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning
Schaub et al. 6-dof grasp detection for unknown objects
CN112975968A (en) Mechanical arm simulation learning method based on third visual angle variable main body demonstration video
JP7051751B2 (en) Learning device, learning method, learning model, detection device and gripping system
CN116852353A (en) Method for capturing multi-target object by dense scene mechanical arm based on deep reinforcement learning
CN115578460A (en) Robot grabbing method and system based on multi-modal feature extraction and dense prediction
JP7349423B2 (en) Learning device, learning method, learning model, detection device and grasping system
CN114998573A (en) Grabbing pose detection method based on RGB-D feature depth fusion
CN114882113A (en) Five-finger mechanical dexterous hand grabbing and transferring method based on shape correspondence of similar objects
CN113436293A (en) Intelligent captured image generation method based on condition generation type countermeasure network
Bai et al. Kinect-based hand tracking for first-person-perspective robotic arm teleoperation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant