CN117103277A - Mechanical arm sensing method based on multi-mode data fusion - Google Patents

Mechanical arm sensing method based on multi-mode data fusion

Info

Publication number
CN117103277A
Authority
CN
China
Prior art keywords
mechanical arm
camera
matrix
target object
coordinate system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311297470.4A
Other languages
Chinese (zh)
Inventor
谢世朋
苏士伟
谢静岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202311297470.4A priority Critical patent/CN117103277A/en
Publication of CN117103277A publication Critical patent/CN117103277A/en
Pending legal-status Critical Current

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/161Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1661Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems

Abstract

The invention relates to the technical field of computer vision, and discloses a mechanical arm sensing method based on multi-mode data fusion, which comprises the following steps: performing camera calibration and hand-eye calibration; acquiring an image and a depth map with an ROS (Robot Operating System) driven camera, and calculating a center point and a pose matrix of a target object with a YOLOv5 image recognition algorithm; designing a coordinate system transfer algorithm, and transferring the homogeneous matrix of the target object under the camera coordinate system to a homogeneous matrix under the mechanical arm base coordinate system; and designing a motion control algorithm to realize motion control of the mechanical arm and complete a set task. The invention fuses data of a plurality of modes such as the RGB image, the depth image and the tactile system of the dexterous hand to identify the target object, calculates the pose information of the center point of the object, operates the target object through the dexterous hand on the mechanical arm, improves the accuracy with which the tactile system of the dexterous hand senses the operated target object, and realizes accurate grabbing of the target object by the mechanical arm.

Description

Mechanical arm sensing method based on multi-mode data fusion
Technical Field
The invention relates to the technical field of computer vision, in particular to a mechanical arm sensing method based on multi-mode data fusion.
Background
With the continuous development of computing, control theory, machine vision and sensor technology, robotics has entered a stage of rapid development. Robots use mechanical arms to perform increasingly meaningful tasks and functions, which improves production efficiency and quality of life. In factories, for example, robots can carry out cargo handling and loading and unloading tasks; in a welding workshop, robots can help people accomplish tasks such as welding and screwing. When executing such tasks, the tool at the end of the mechanical arm needs to be guided by vision to reach a designated position so that the designated task can be completed effectively. The robot acquires environmental information from multi-mode sensor data such as RGB images, depth maps and the tactile data of a dexterous hand, identifies the category of the target object, estimates its pose, and guides the end of the mechanical arm to the target object before executing the task, which makes the robot more intelligent. The RGB image reflects real-world appearance features, while the depth image reflects key information such as real-world dimensions. By fusing the RGB image and the depth image, the advantages of both can be combined, which suits the varied scenes encountered in practical applications.
In a conventional pipeline, YOLOv5 identifies the pixel center point of the object surface in the RGB image, and the three-dimensional coordinates of that surface point can be calculated from the depth information. This works for regular objects, but YOLOv5 alone cannot identify the center point of the whole object. Moreover, the traditional sensing method uses a single mode, which reduces the accuracy with which the tactile system of the dexterous hand senses the operated target object, so the mechanical arm cannot grab the target object accurately.
Disclosure of Invention
The invention provides a mechanical arm sensing method based on multi-mode data fusion, which identifies a target object by fusing data of a plurality of modes such as RGB images, depth images and the tactile system of a dexterous hand, calculates the pose information of the center point of the object, and operates the target object through the dexterous hand on the mechanical arm, thereby improving the accuracy with which the tactile system of the dexterous hand senses the operated target object and realizing accurate grabbing of the target object by the mechanical arm.
The invention provides a mechanical arm sensing method based on multi-mode data fusion, which comprises the following steps:
performing camera calibration and hand-eye calibration, wherein the camera calibration is performed with the calibrateCamera function of opencv, and the hand-eye calibration is performed with the calibrateHandEye function of opencv;
acquiring an image and a depth map with an ROS (Robot Operating System) driven camera, and calculating a center point and a pose matrix of a target object with a YOLOv5 image recognition algorithm;
designing a coordinate system transfer algorithm, and transferring a homogeneous matrix of the target object under a camera coordinate system to a homogeneous matrix under a mechanical arm base coordinate system;
and designing a motion control algorithm to realize motion control of the mechanical arm and complete a set task.
Further, the step of designing the motion control algorithm to realize motion control of the mechanical arm and complete the set task comprises the following steps:
judging whether a target object is identified by using a YOLOv5 image identification algorithm; if not, moving the robot body and continuing to identify the target object; if yes, obtaining the pose of the object through the depth information and the camera internal parameters;
judging whether an object center point needs to be identified, if so, starting VoteNet three-dimensional point cloud identification to obtain an object center point pose, and calculating an object surface pixel center point by combining with YOLOv5 pixel center point weighting, so as to perform a coordinate system transfer algorithm; if not, directly carrying out a coordinate system transfer algorithm;
performing a mechanical arm motion control algorithm, and judging whether the mechanical arm can reach the position; if not, returning to the step of moving the robot body to identify the target object; if yes, the mechanical arm is operated, and a dexterous hand is adopted to grasp or operate the target object until the task is completed.
Further, the step of acquiring an image and a depth map with the ROS (Robot Operating System) driven camera and calculating a center point and a pose matrix of the target object with a YOLOv5 image recognition algorithm comprises the following steps:
writing an ROS camera driver and launching the camera, subscribing to the camera's RGB image and depth information topics, converting the RGB images into Mat matrices of the cv library through cv_bridge, and storing the pictures as jpg files;
loading a model of deep learning YOLOv5, identifying, calculating pixels of each target object pixel center point, and storing an identification result as a structure type;
carrying out VoteNet three-dimensional point cloud identification, loading a model for identification, determining the three-dimensional coordinates of the center of each target object, and assisting a YOLOv5 image identification algorithm to obtain the surface pixel position;
the depth camera obtains depth information of the position of the pixel, and the three-dimensional coordinates (x, y, z) of the center point of the target object are obtained from the internal reference matrix, wherein the depth information is d; the coordinates of the center pixel point of the target object are known as (u, v), and the internal reference matrix is K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]; the three-dimensional coordinates are then z = d, x = (u - c_x) * z / f_x, y = (v - c_y) * z / f_y.
Further, the step of designing a coordinate system transfer algorithm to transfer the homogeneous matrix of the target object under the camera coordinate system to the homogeneous matrix under the mechanical arm base coordinate system includes:
the target object is identified with YOLOv5 and its translation vector (x, y, z) is obtained; the rotation vector of the object defaults to (0, 0, 0) and is converted into a rotation matrix; the rotation matrix and the translation vector are then assembled into a 4×4 pose matrix M_target2cam;
the hand-eye calibration matrix M_cam2gripper is obtained through hand-eye calibration;
the homogeneous matrix M_gripper2base of the mechanical arm end under the mechanical arm base coordinate system is obtained through the interface of the mechanical arm, and the homogeneous matrix M_target2base of the target object under the mechanical arm base coordinate system is calculated with the formula M_target2base = M_gripper2base * M_cam2gripper * M_target2cam;
the pose homogeneous matrix M_gripperWithTool2base that the mechanical arm end should actually reach is calculated through the transformation matrix M_tool2gripper of the tool tip under the TCP-calibrated mechanical arm end coordinate system; this comprises: the homogeneous matrix of the target object under the base coordinate system is set equal to the homogeneous matrix of the tool tip under the base coordinate system, namely M_target2base = M_tool2base; according to the coordinate system conversion relation, the homogeneous matrix of the tool tip under the mechanical arm end coordinate system is transferred to the mechanical arm base coordinate system: M_tool2base = M_gripperWithTool2base * M_tool2gripper;
from the two formulas above, the homogeneous matrix of the position the mechanical arm end should reach under the mechanical arm base coordinate system is calculated as M_gripperWithTool2base = M_target2base * (M_tool2gripper)^-1;
the 4×4 homogeneous matrix is decomposed into a 3×3 rotation matrix and a translation vector (x, y, z) by matrix decomposition; the rotation matrix is converted into a rotation vector (Rx, Ry, Rz), and the translation vector and rotation vector are provided to the moveL interface of the mechanical arm to move linearly to the specified position.
Further, in the step of performing camera calibration and hand-eye calibration, firstly obtaining an internal parameter of the camera includes:
after the camera is started, the topic message camera_info of the camera is checked, and the internal reference coefficient and distortion coefficient of the camera can be checked;
calibrating the camera, shooting a plurality of pictures with different poses on the calibration plate by adopting the camera, wherein the three-dimensional actual coordinates of each corner point of the calibration plate and the two-dimensional pixel coordinates on the picture are in one-to-one correspondence, and thus the correspondence between the world actual coordinates and the pixel coordinates can be obtained;
calibrating the camera with the calibrateCamera function of opencv, inputting the 2D point coordinates and 3D point coordinates of the calibration plate and the size of the calibration plate grid, and outputting the internal reference matrix, distortion coefficients and external reference coefficients of the camera, wherein the external reference coefficients comprise rotation vectors and translation vectors.
Further, the step of calibrating the camera specifically includes:
preparing a calibration plate with an 11 × 8 grid, and fixing the calibration plate on a wall surface;
the position of the base coordinate system of the mechanical arm is kept unchanged, the end of the mechanical arm is moved with the mechanical arm teach pendant, and the camera moves along with the end of the mechanical arm;
the end of the mechanical arm is moved to a position, a picture is taken, and the position and orientation information displayed on the teach pendant is recorded;
the end of the mechanical arm is moved to another position and the shooting and recording are repeated, 14 pictures with different positions and angles are taken in total, and 14 groups of data are recorded;
programming by adopting VScode programming software and C++ programming language, setting parameters of the number of pictures, the size of each checkerboard on the calibration plate and the number of corner points of each row and each column on the calibration plate, extracting 88 corner points on the calibration plate and storing, and then carrying out sub-pixel refinement on each corner point, wherein the aim is to refine the corner points extracted in the prior course and store sub-pixel corner points;
assuming that the calibration plate is placed on the plane Z = 0 in the world coordinate system, the three-dimensional coordinates of each corner point on the calibration plate are initialized, and the camera is calibrated with the calibrateCamera function of opencv; the input is the three-dimensional coordinates and corresponding two-dimensional coordinates of each corner point, and the output is the internal reference matrix and distortion coefficients of the camera together with the rotation vector and translation vector of each picture;
and evaluating the calibration result, carrying out reprojection calculation on the three-dimensional coordinates of each angular point through the obtained internal and external parameters of the camera to obtain a new projection point, and then carrying out error estimation with the previous two-dimensional projection point to obtain the average pixel error of each image.
Further, the step of performing hand-eye calibration specifically includes:
the camera calibration yields the external parameter matrices corresponding to the 14 groups of images, namely a rotation vector and a translation vector for each image, and the rotation-vector representation of the orientation is converted into a rotation-matrix representation;
the pose matrix of the mechanical arm end under the mechanical arm base coordinate system corresponding to each image consists of a translation vector and Euler angles, and the Euler-angle representation is converted into a rotation-matrix representation so that it can later serve as an input to the opencv hand-eye calibration computation;
adopting the calibrateHandEye function of opencv, wherein the input is the rotation matrix and translation vector of the mechanical arm end under the mechanical arm base coordinate system and the rotation matrix and translation vector of the calibration plate under the camera coordinate system, and the output is the hand-eye calibration matrix; the calibration matrix at this point consists of a rotation matrix and a translation vector and is converted into a 4 × 4 hand-eye calibration homogeneous matrix;
the hand-eye calibration accuracy test is performed, because the relative positions of the calibration plate and the mechanical arm base coordinate system are kept unchanged, the calibration effect is verified, the homogeneous matrix at the tail end of the mechanical arm under the mechanical arm base coordinate system, the hand-eye calibration homogeneous matrix and the homogeneous matrix of the calibration plate under the camera coordinate system corresponding to each image are multiplied, and then a translation matrix is extracted from the result;
and (3) carrying out error analysis, wherein the 14 groups of translation matrixes can observe the calibration effect, and the maximum error in the X, Y, Z direction is obtained to determine whether the requirements are met.
Further, the motion process of the mechanical arm comprises:
logging in a mechanical arm, initializing initial angles of 6 joints by the mechanical arm, and ensuring that a camera at the tail end of the mechanical arm can observe a target object;
initializing the ROS, subscribing RGB topic information and depth information of the camera, carrying out image recognition, and returning recognition information, wherein the recognition information comprises a category and a pixel of a center point of a recognition frame;
establishing a state machine, and judging the running state of the mechanical arm and whether a camera identifies a target object or not; wherein, the running state of the mechanical arm includes: the mechanical arm is in a standby state, and the robot can normally walk at the moment; the mechanical arm is in an abnormal state, and the robot stops waiting at the moment; the mechanical arm is in a working state, and the robot stops waiting at the moment; the camera on the end of the mechanical arm does not observe the target;
identifying a target object, and judging whether the distance between the target object and a base coordinate system of the mechanical arm exceeds the control range of the mechanical arm;
newly starting a thread, releasing the ROS topic message in real time, and sending the running state of the real-time mechanical arm to the robot main control;
calculating the position of the target object under the base coordinate system of the mechanical arm according to the pixel and depth information of the center of the target object and the internal reference of the camera, and giving an initial pose;
the mechanical arm is operated to the front of the target object, the target object is grabbed, a reasonable path planning is designed by utilizing the SDK of the mechanical arm, the mechanical arm is operated, and the mechanical arm is restored to an initial state after the mechanical arm is operated;
the mechanical arm tail end tool uses a mechanical arm dexterous hand with a touch system to grasp or operate the target object, and judges whether the target object is successfully grasped or not or whether the target object is successfully operated or not through the touch system and feedback information.
The beneficial effects of the invention are as follows:
1. Visual guidance for hand-eye cooperative work is established: the hand-eye calibration matrix is calculated through hand-eye calibration and the internal parameters of the camera are obtained; the pose information of the target object is obtained through the RGB image and the depth image and transmitted to the controller, the pose information of the target object under the mechanical arm base coordinate system is obtained through the corresponding calculation, and the mechanical arm is operated to the designated position, which is the initial position for executing the task; the precision of the pose information obtained through the camera is the premise for the mechanical arm to reach the designated position accurately.
2. An improved recognition method combining YOLOv5 with VoteNet three-dimensional point cloud target detection is provided; the two-dimensional RGB image and the three-dimensional point cloud are fused, a visual guidance system is constructed, and the grabbing and operating tasks on the target object are realized.
3. A multi-mode data perception method based on two-dimensional RGB images, depth images, point clouds and dexterous-hand touch is provided, combining the advantages of each mode: image recognition is carried out with the improved YOLOv5, the bounding box recognized by the VoteNet three-dimensional point cloud combined with the depth information of the depth camera enhances the accuracy of the YOLOv5 pixel-point recognition, the tactile system of the dexterous hand senses the operated target object accurately, and the mechanical arm can grasp the target object accurately;
4. The RGB image provides geometric, semantic and texture information, which enhances the point cloud features; the improved VoteNet three-dimensional point cloud target detection method obtains the center point of the object bounding box, and a weighted calculation with the pixel center point identified by YOLOv5 improves the accuracy of the center-point pixel coordinate identification.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The traditional visual pose estimation method is a pose recognition method based on ArUco codes: ArUco markers must be attached near the target object, which is visually conspicuous and unattractive. The deep-learning pose recognition method of the invention first uses YOLOv5 to recognize the target object and obtain its detection frame and the category it belongs to; the two-dimensional RGB image is then aligned with the depth image to obtain the depth information of the center-point pixel of the target object, and thereby the position information of the target object. The specific flow comprises the following steps:
as shown in fig. 1, the invention provides a mechanical arm sensing method based on multi-mode data fusion, which comprises the following steps:
S1, camera calibration and hand-eye calibration are carried out, wherein the camera calibration is performed with the calibrateCamera function of opencv, and the hand-eye calibration is performed with the calibrateHandEye function of opencv.
S1.1, firstly, obtaining an internal parameter coefficient of a camera, wherein the main method for obtaining the internal parameter coefficient of the camera is as follows:
s1.1.1, installing a camera driver and an ROS, and after the camera is started, checking a topic message camera_info of the camera to check an internal parameter coefficient and a distortion coefficient of the camera;
S1.1.2, calibrating the camera with a calibration plate: the camera shoots a plurality of pictures of the calibration plate in different poses; the three-dimensional actual coordinates of each corner point of the calibration plate correspond one-to-one with the two-dimensional pixel coordinates on the picture, so the correspondence between the world actual coordinates and the pixel coordinates can be obtained.
S1.1.3, calibrating the camera with the calibrateCamera function of opencv, inputting the 2D point coordinates and 3D point coordinates of the calibration plate and the size of the calibration plate grid, and outputting the internal reference matrix, distortion coefficients and external reference coefficients of the camera, wherein the external reference coefficients comprise rotation vectors and translation vectors.
S1.2, a specific calibration method for camera calibration and hand-eye calibration comprises the following steps:
s1.2.1, camera calibration: the camera calibration method of step S1.1.3 is selected because the rotation and translation vectors of each calibration plate picture need to be acquired in order to provide input data for subsequent hand-eye calibration matrix determinations. The obtained internal reference matrix, distortion coefficient and external reference coefficient are shown in the figure. The specific implementation method comprises the following steps:
Step one: preparing a calibration plate with an 11 × 8 grid, and fixing the calibration plate on a wall surface;
Step two: the position of the base coordinate system of the mechanical arm is kept unchanged, the mechanical arm teach pendant is used to move the end of the mechanical arm, and the camera is fixed at the end of the mechanical arm in an eye-in-hand configuration so that the camera moves along with the end of the mechanical arm;
Step three: the end of the mechanical arm is moved to a position, a picture is taken, and the position and orientation information displayed on the teach pendant is recorded;
Step four: the end of the mechanical arm is moved to another position and the shooting and recording of step three are repeated, 14 pictures at different positions and angles are taken altogether, and 14 groups of data are recorded.
Step five: programming by adopting VScode programming software and C++ programming language, setting parameters such as the number of pictures, the size of each checkerboard on a calibration plate, the number of corner points of each row and each column on the calibration plate and the like, extracting 88 corner points on the calibration plate and storing, and then carrying out sub-pixel refinement on each corner point, wherein the aim is to refine the corner points extracted in the prior course and store sub-pixel corner points;
Step six: the calibration plate is assumed to lie on the plane Z = 0 in the world coordinate system, the three-dimensional coordinates of each corner point on the calibration plate are initialized, and the camera is calibrated with the calibrateCamera function of opencv; the input is the three-dimensional coordinates and corresponding two-dimensional coordinates of each corner point, and the output is the internal reference matrix and distortion coefficients of the camera together with the rotation vector and translation vector of each picture (a code sketch of this call is given after step eight below).
Step seven: evaluating the calibration result, carrying out reprojection calculation on the three-dimensional coordinates of each angular point through the obtained internal and external parameters of the camera to obtain a new projection point, and then carrying out error estimation with the previous two-dimensional projection point to obtain an average pixel error of each image;
Step eight: store the calibration results.
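As an illustration of steps five to seven, a minimal C++ sketch using opencv is given below; the corner pattern follows the 11 × 8 grid described above, while the square size and the image file names are placeholder assumptions rather than values from the method.

```cpp
// Hedged sketch of OpenCV camera calibration (steps five to seven above).
#include <opencv2/opencv.hpp>
#include <cmath>
#include <iostream>
#include <string>
#include <vector>

int main() {
    const cv::Size patternSize(11, 8);   // inner corners per row/column (88 corners, as above)
    const float squareSize = 0.02f;      // checkerboard square edge in metres (assumed value)

    std::vector<std::vector<cv::Point3f>> objectPoints;  // 3D corners on the Z = 0 plane
    std::vector<std::vector<cv::Point2f>> imagePoints;   // detected 2D corners
    cv::Size imageSize;

    // Template of the board corners in the world frame (Z = 0), as initialised in step six.
    std::vector<cv::Point3f> board;
    for (int r = 0; r < patternSize.height; ++r)
        for (int c = 0; c < patternSize.width; ++c)
            board.emplace_back(c * squareSize, r * squareSize, 0.0f);

    for (int i = 0; i < 14; ++i) {       // the 14 calibration pictures
        cv::Mat img = cv::imread("calib_" + std::to_string(i) + ".jpg");  // hypothetical names
        if (img.empty()) continue;
        cv::Mat gray;
        cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);
        imageSize = gray.size();

        std::vector<cv::Point2f> corners;
        if (!cv::findChessboardCorners(gray, patternSize, corners)) continue;  // step five
        cv::cornerSubPix(gray, corners, cv::Size(11, 11), cv::Size(-1, -1),    // sub-pixel refinement
                         cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.001));
        imagePoints.push_back(corners);
        objectPoints.push_back(board);
    }

    cv::Mat cameraMatrix, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;   // one rotation/translation vector per picture
    cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                        cameraMatrix, distCoeffs, rvecs, tvecs);               // step six

    // Step seven: re-project the corners and report the RMS pixel error.
    double sumSq = 0.0; size_t n = 0;
    for (size_t i = 0; i < objectPoints.size(); ++i) {
        std::vector<cv::Point2f> reproj;
        cv::projectPoints(objectPoints[i], rvecs[i], tvecs[i], cameraMatrix, distCoeffs, reproj);
        double err = cv::norm(imagePoints[i], reproj, cv::NORM_L2);
        sumSq += err * err;
        n += objectPoints[i].size();
    }
    std::cout << "RMS reprojection error: " << std::sqrt(sumSq / n) << " px\n";
    return 0;
}
```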
S1.2.2, hand-eye calibration: the purpose of hand-eye calibration is to design an algorithm to obtain the hand-eye calibration matrix; with the adopted eye-in-hand configuration, solving the hand-eye calibration matrix gives the conversion relation between the mechanical arm end coordinate system and the camera coordinate system. The specific implementation method comprises the following steps:
Step one: when the camera is calibrated, the external parameter matrices corresponding to the 14 groups of images, namely a rotation vector and a translation vector for each image, are obtained, and the rotation-vector representation of the orientation is then converted into a rotation-matrix representation;
Step two: the pose matrix of the mechanical arm end under the mechanical arm base coordinate system corresponding to each image consists of a translation vector and Euler angles, and the Euler-angle representation needs to be converted into a rotation-matrix representation so that it can later serve as an input to the opencv hand-eye calibration computation.
Step three: the calibrateHandEye function of opencv is used; the input is the rotation matrix and translation vector of the mechanical arm end under the mechanical arm base coordinate system and the rotation matrix and translation vector of the calibration plate under the camera coordinate system, and the output is the hand-eye calibration matrix; the calibration matrix at this point consists of a rotation matrix and a translation vector and needs to be converted into a 4 × 4 hand-eye calibration homogeneous matrix (a code sketch of this call is given after step five below).
Step four: and (3) testing the hand-eye calibration accuracy, wherein the calibration effect is verified because the relative positions of the calibration plate and the mechanical arm base coordinate system are kept unchanged. The homogeneous matrix at the tail end of the mechanical arm under the mechanical arm base coordinate system, the homogeneous matrix for calibrating the hand and the eye and the homogeneous matrix for calibrating the board under the camera coordinate system corresponding to each image are multiplied, and then the translation matrix is extracted from the result.
Step five: error analysis; the calibration effect can be observed from the 14 groups of translation matrices. The X values lie between -1.196 m and -1.120 m, with a maximum X error of 0.004 m; in the Y direction the minimum is 0.121 m and the maximum is 0.132 m, with a maximum Y error of 0.011 m; in the Z direction the minimum is -0.102 m and the maximum is -0.095 m, with a maximum Z error of 0.007 m. The maximum error across the X, Y and Z directions is 1.1 cm, which meets the requirements.
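A minimal sketch of the calibrateHandEye call from step three is shown below; it assumes the 14 per-image rotation matrices and translation vectors have already been prepared as described in steps one and two (the per-image camera rotations come from the calibration rvecs after cv::Rodrigues). The helper name is illustrative.

```cpp
// Hedged sketch of the eye-in-hand calibration (step three above).
#include <opencv2/opencv.hpp>
#include <vector>

// Inputs: per-image arm-end poses in the base frame and calibration-plate poses in the
// camera frame (14 of each). Output: the 4x4 hand-eye homogeneous matrix M_cam2gripper.
cv::Mat handEyeHomogeneous(const std::vector<cv::Mat>& R_gripper2base,
                           const std::vector<cv::Mat>& t_gripper2base,
                           const std::vector<cv::Mat>& R_target2cam,
                           const std::vector<cv::Mat>& t_target2cam) {
    cv::Mat R_cam2gripper, t_cam2gripper;
    cv::calibrateHandEye(R_gripper2base, t_gripper2base,
                         R_target2cam, t_target2cam,
                         R_cam2gripper, t_cam2gripper);   // default Tsai method

    // Assemble the rotation matrix and translation vector into a 4x4 homogeneous matrix.
    cv::Mat M = cv::Mat::eye(4, 4, CV_64F);
    R_cam2gripper.copyTo(M(cv::Rect(0, 0, 3, 3)));
    t_cam2gripper.copyTo(M(cv::Rect(3, 0, 1, 3)));
    return M;
}
```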
S2, acquiring an image and a depth map with the ROS (Robot Operating System) driven camera, and calculating the center point and pose matrix of the target object with a YOLOv5 image recognition algorithm; the specific method comprises the following steps:
S2.1, writing and building the ROS camera driver and launching the camera, for example with the command: roslaunch orbbec_camera gemini2.launch
S2.2, subscribing to the camera's RGB image and depth information topics; the RGB image is converted into a Mat matrix of the cv library through cv_bridge, and the pictures are stored as jpg files;
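As a hedged illustration of S2.2, the topic subscription and cv_bridge conversion might look like the sketch below; the node name and the topic name /camera/color/image_raw are assumptions that depend on the actual camera driver.

```cpp
// Hedged sketch: subscribe to the RGB topic and convert each message to cv::Mat (S2.2).
#include <ros/ros.h>
#include <sensor_msgs/Image.h>
#include <sensor_msgs/image_encodings.h>
#include <cv_bridge/cv_bridge.h>
#include <opencv2/imgcodecs.hpp>

void rgbCallback(const sensor_msgs::ImageConstPtr& msg) {
    // Convert the ROS image message into an OpenCV BGR matrix.
    cv_bridge::CvImagePtr cvPtr = cv_bridge::toCvCopy(msg, sensor_msgs::image_encodings::BGR8);
    cv::imwrite("frame.jpg", cvPtr->image);   // store the picture as a jpg file
}

int main(int argc, char** argv) {
    ros::init(argc, argv, "rgb_grabber");     // node name is illustrative
    ros::NodeHandle nh;
    ros::Subscriber sub = nh.subscribe("/camera/color/image_raw", 1, rgbCallback);
    ros::spin();
    return 0;
}
```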
S2.3, loading the trained deep-learning YOLOv5 model, performing recognition, calculating the pixel coordinates of each target object's center point, and storing the recognition result as a structure type;
For the YOLOv5 image recognition, a data set is collected by shooting pictures of the target object at different positions and poses, and the pictures are then labeled with labelImg; the labeled pictures are sent to YOLOv5 for training to obtain a model; the PyTorch-based .pt model is then converted into an ONNX intermediate model; with the Atlas 200 DK developer kit, after the environment is deployed, the original model needs to be converted with the ATC tool into a model (.om) suitable for the Ascend hardware; the converted model is copied into the model folder, inference is run on the pictures to be identified, the target object is identified, and its two-dimensional pixel coordinates are given; the three-dimensional coordinates of the target object under the camera coordinate system are then solved from the camera internal parameters and the camera depth information at the pixel point. The robot is assumed by default to face the object directly, and the initial pose of the camera on the mechanical arm can be adjusted to face the object, so the initial rotation defaults to (0, 0, 0); if the robot does not face the target object, the pose needs to be calculated from the angle between the vehicle body and the target object.
S2.4, voteNet three-dimensional point cloud recognition, namely, loading a model for recognition, determining the three-dimensional coordinates of the center of each target object, and assisting a YOLOv5 image recognition algorithm to obtain the surface pixel position.
S2.5, the depth camera obtains depth information of the position of the pixel, and three-dimensional coordinates (x, y, z) of a center point of the target object are obtained according to the internal reference matrix, wherein the depth information is d;
the coordinates of the center pixel point of the target object are known as (u, v), and the internal reference matrix is K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]];
the three-dimensional coordinates are then: z = d; x = (u - c_x) * z / f_x; y = (v - c_y) * z / f_y.
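A direct C++ transcription of this back-projection, with the intrinsic parameters passed in explicitly, is sketched below.

```cpp
// Hedged sketch: recover the 3D camera-frame point from pixel (u, v) and depth d (S2.5).
#include <opencv2/core.hpp>

cv::Point3d pixelToCamera(double u, double v, double d,
                          double fx, double fy, double cx, double cy) {
    const double z = d;                  // the depth gives the Z coordinate directly
    const double x = (u - cx) * z / fx;  // back-project along the X axis
    const double y = (v - cy) * z / fy;  // back-project along the Y axis
    return {x, y, z};
}
```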
S3, designing a coordinate system transfer algorithm, and transferring the homogeneous matrix of the target object under the camera coordinate system to the homogeneous matrix under the mechanical arm base coordinate system; the specific method comprises the following steps:
S3.1, the target object is identified with YOLOv5 and its translation vector (x, y, z) is obtained; the rotation vector of the object defaults to (0, 0, 0) and is converted into a rotation matrix; the rotation matrix and the translation vector are then assembled into a 4×4 pose matrix M_target2cam
S3.2, the hand-eye calibration matrix M_cam2gripper is obtained through hand-eye calibration
S3.3, the homogeneous matrix M_gripper2base of the mechanical arm end under the mechanical arm base coordinate system is obtained through the interface of the mechanical arm, and the homogeneous matrix M_target2base of the target object under the mechanical arm base coordinate system is calculated with the following formula:
M_target2base = M_gripper2base * M_cam2gripper * M_target2cam
S3.4, considering that the end of the mechanical arm carries a tool, such as a dexterous hand, the pose homogeneous matrix M_gripperWithTool2base that the mechanical arm should actually reach is calculated through the transformation matrix M_tool2gripper of the tool tip under the TCP-calibrated mechanical arm end coordinate system:
the homogeneous matrix of the target object under the base coordinate system is equal to the homogeneous matrix of the tool tip under the base coordinate system, namely:
M_target2base = M_tool2base
according to the coordinate system conversion relation, the homogeneous matrix of the tool tip under the mechanical arm end coordinate system is transferred to the mechanical arm base coordinate system:
M_tool2base = M_gripperWithTool2base * M_tool2gripper
s3.5, calculating a homogeneous matrix of the position to be reached of the tail end of the mechanical arm under the mechanical arm base coordinate system according to two formulas in the step S3.4:
M_gripperWithTool2base = M_target2base * (M_tool2gripper)^-1
The 4×4 homogeneous matrix is decomposed into a 3×3 rotation matrix and a translation vector (x, y, z) by matrix decomposition; the rotation matrix is converted into a rotation vector (Rx, Ry, Rz), and the translation vector and rotation vector are provided to the moveL interface of the mechanical arm to move linearly to the specified position.
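Putting S3.1 to S3.5 together, the coordinate-system transfer can be sketched with 4×4 homogeneous matrices as below; the function and variable names are illustrative, all matrices are assumed to be CV_64F, and the hand-eye, gripper and tool matrices are assumed to be already available from the earlier steps.

```cpp
// Hedged sketch of the coordinate-system transfer algorithm (S3.1 to S3.5).
#include <opencv2/opencv.hpp>

// Helper: build a 4x4 homogeneous matrix from a rotation vector and a translation vector.
static cv::Mat toHomogeneous(const cv::Vec3d& rvec, const cv::Vec3d& tvec) {
    cv::Mat R;
    cv::Rodrigues(rvec, R);                        // rotation vector -> 3x3 rotation matrix
    cv::Mat M = cv::Mat::eye(4, 4, CV_64F);
    R.copyTo(M(cv::Rect(0, 0, 3, 3)));
    for (int i = 0; i < 3; ++i) M.at<double>(i, 3) = tvec[i];
    return M;
}

// Inputs: target translation in the camera frame, hand-eye matrix, arm-end pose in the
// base frame and the TCP-calibrated tool offset. Outputs: rvec/tvec for the moveL call.
void computeMoveLTarget(const cv::Vec3d& t_target2cam,
                        const cv::Mat& M_cam2gripper,
                        const cv::Mat& M_gripper2base,
                        const cv::Mat& M_tool2gripper,
                        cv::Vec3d& rvecOut, cv::Vec3d& tvecOut) {
    // S3.1: default object rotation (0, 0, 0), translation from YOLOv5 plus depth.
    cv::Mat M_target2cam = toHomogeneous(cv::Vec3d(0, 0, 0), t_target2cam);

    // S3.3: M_target2base = M_gripper2base * M_cam2gripper * M_target2cam
    cv::Mat M_target2base = M_gripper2base * M_cam2gripper * M_target2cam;

    // S3.4 / S3.5: M_gripperWithTool2base = M_target2base * inv(M_tool2gripper)
    cv::Mat M_goal = M_target2base * M_tool2gripper.inv();

    // Decompose into rotation vector (Rx, Ry, Rz) and translation (x, y, z) for moveL.
    cv::Mat R = M_goal(cv::Rect(0, 0, 3, 3)).clone(), rvec;
    cv::Rodrigues(R, rvec);
    rvecOut = cv::Vec3d(rvec.at<double>(0), rvec.at<double>(1), rvec.at<double>(2));
    tvecOut = cv::Vec3d(M_goal.at<double>(0, 3), M_goal.at<double>(1, 3), M_goal.at<double>(2, 3));
}
```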
And S4, designing a motion control algorithm to realize motion control of the mechanical arm and complete a set task.
As shown in fig. 1, completing grabbing or manipulating the target object includes:
s41, judging whether a target object is identified by adopting a YOLOv5 image identification algorithm; if not, moving the robot body and continuing to identify the target object; if yes, obtaining the pose of the object through the depth information and the camera internal parameters;
S42, judging whether the object center point needs to be identified; if yes, VoteNet three-dimensional point cloud recognition is started to obtain the pose of the object center point, which is combined by weighting with the YOLOv5 pixel center point to calculate the object surface pixel center point (one hedged sketch of this weighting is given after step S43 below), and the coordinate system transfer algorithm is then performed; if not, the coordinate system transfer algorithm is performed directly;
s43, performing a mechanical arm motion control algorithm, and judging whether the mechanical arm can reach the position; if not, returning to the step of moving the robot body to identify the target object; if yes, the mechanical arm is operated, and a dexterous hand is adopted to grasp or operate the target object until the task is completed.
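The text does not specify how the VoteNet center and the YOLOv5 pixel center are weighted in step S42. One plausible reading is sketched below: the 3D center is projected into the image with the camera intrinsics and blended with the YOLOv5 pixel center; both the projection step and the weight w are assumptions, not values given by the method.

```cpp
// Hedged sketch of a possible S42 center-point fusion.
#include <opencv2/core.hpp>

cv::Point2d fuseCenters(const cv::Point3d& voteNetCenterCam,   // VoteNet center, camera frame
                        const cv::Point2d& yoloPixelCenter,    // YOLOv5 box center, pixels
                        double fx, double fy, double cx, double cy,
                        double w = 0.5) {                      // assumed blending weight
    // Project the 3D center onto the image plane with the pinhole model.
    const double u = fx * voteNetCenterCam.x / voteNetCenterCam.z + cx;
    const double v = fy * voteNetCenterCam.y / voteNetCenterCam.z + cy;
    // Weighted average of the two pixel estimates.
    return { w * yoloPixelCenter.x + (1.0 - w) * u,
             w * yoloPixelCenter.y + (1.0 - w) * v };
}
```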
Calculating the pose matrix of the object comprises adopting VScode programming software and C++ programming language to carry out algorithm design, and the specific implementation method comprises the following steps:
step one: observing an object in real time through a camera, and taking a picture;
step two: obtaining a pose matrix of a target object center point under a camera coordinate system through the YOLOv5 image recognition algorithm, the depth image information and the camera internal reference matrix;
Step three: the algorithm is designed; the pose matrix of the object is converted into a 4 × 4 homogeneous matrix through VScode software and C++ programming, which makes matrix operations convenient;
Step four: the pose matrix of the mechanical arm end under the mechanical arm base coordinate system is acquired from the mechanical arm controller, and the Euler-angle-plus-translation representation is then converted into a homogeneous-matrix representation (a sketch of this conversion is given after step six below);
step five: the hand-eye calibration homogeneous matrix calculated by the previous hand-eye calibration is added, three homogeneous matrices are obtained at the moment, matrix multiplication is carried out on the three matrices, the homogeneous matrix of the target object under the mechanical arm base coordinate system can be obtained, and then the homogeneous matrix is converted into a rotation vector and a translation vector.
Step six: through mechanical arm TCP calibration, the position and orientation from the tip of the end-effector dexterous hand to the end of the mechanical arm is obtained, and the pose of the mechanical arm end under the mechanical arm base coordinate system is transferred to the pose of the dexterous hand tip under the base coordinate system.
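A sketch of the step-four conversion from the controller's Euler-angle-plus-translation representation to a 4 × 4 homogeneous matrix is given below; a Z-Y-X composition order is assumed, since the actual Euler convention depends on the mechanical arm controller.

```cpp
// Hedged sketch of step four: Euler angles + translation -> 4x4 homogeneous matrix.
#include <opencv2/core.hpp>
#include <cmath>

cv::Mat eulerToHomogeneous(double rx, double ry, double rz,   // Euler angles in radians
                           double x, double y, double z) {    // translation
    const double cr = std::cos(rx), sr = std::sin(rx);
    const double cp = std::cos(ry), sp = std::sin(ry);
    const double cyaw = std::cos(rz), syaw = std::sin(rz);

    cv::Mat Rx = (cv::Mat_<double>(3, 3) << 1, 0, 0,   0, cr, -sr,   0, sr, cr);
    cv::Mat Ry = (cv::Mat_<double>(3, 3) << cp, 0, sp,   0, 1, 0,   -sp, 0, cp);
    cv::Mat Rz = (cv::Mat_<double>(3, 3) << cyaw, -syaw, 0,   syaw, cyaw, 0,   0, 0, 1);
    cv::Mat R = Rz * Ry * Rx;            // assumed Z-Y-X composition; depends on the controller

    cv::Mat M = cv::Mat::eye(4, 4, CV_64F);
    R.copyTo(M(cv::Rect(0, 0, 3, 3)));
    M.at<double>(0, 3) = x; M.at<double>(1, 3) = y; M.at<double>(2, 3) = z;
    return M;
}
```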
The invention is developed with a UR5 mechanical arm. The UR5 collaborative robot is well suited to optimizing lightweight collaborative processes such as picking, placing and testing. The UR5 is easy to program and quick to set up, and it achieves a desirable balance between machine size and power, making it an ideal choice for automated lightweight machining tasks. The UR robot can reproduce the range of motion of a human arm and supports drag teaching, so waypoints can be set simply by moving the robot to the target position. The mechanical arm rotates or translates according to the rotation vector and translation vector to reach the specified position. The movement process of the mechanical arm is as follows:
a. logging in a mechanical arm, initializing initial angles of 6 joints by the mechanical arm, and ensuring that a camera at the tail end of the mechanical arm can observe a target object;
initializing the ROS, subscribing RGB topic information and depth information of the camera, providing topic information of an RGB image for Atlas200DK developer suite to perform image recognition, and returning recognition information, wherein the recognition information comprises category and pixels of a recognition frame center point;
c. a state machine is established to judge the running state of the mechanical arm and whether the camera recognizes the target object (a sketch is given after item h below). There are 4 states in total: (1) the mechanical arm is in a standby state, and the robot can walk normally; (2) the mechanical arm is in an abnormal state, and the robot stops and waits; (3) the mechanical arm is in a working state, and the robot stops and waits; (4) the camera on the end of the mechanical arm does not observe a target object.
d. Identifying a target object, and judging whether the distance between the target object and a base coordinate system of the mechanical arm exceeds the control range of the mechanical arm;
e. and newly starting a thread, releasing the ROS topic message in real time, and sending the running state of the real-time mechanical arm to the robot main control.
f. According to the pixel and depth information of the center of the target object and the internal parameters of the camera, the position of the target object under the base coordinate system of the mechanical arm can be calculated, and an initial pose is given.
g. The mechanical arm is operated to the front of the target object, the target object is grabbed, a reasonable path planning is designed by utilizing the SDK of the mechanical arm, the mechanical arm is operated, and the mechanical arm is restored to an initial state after the mechanical arm is operated.
h. The mechanical arm tail end tool uses a mechanical arm dexterous hand with a touch system to grasp or operate the target object, and judges whether the target object is successfully grasped or not or whether the target object is successfully operated or not through the touch system and feedback information.
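A minimal sketch of the item-c state machine is shown below; the state names and decision logic are illustrative, and the interface to the robot main control is omitted.

```cpp
// Hedged sketch of the item-c state machine; names and logic are illustrative.
enum class ArmState {
    Standby,     // (1) arm idle, the robot may walk normally
    Abnormal,    // (2) fault state, the robot stops and waits
    Working,     // (3) arm executing a task, the robot stops and waits
    NoTarget     // (4) the camera on the arm end sees no target object
};

ArmState updateState(bool armFault, bool armBusy, bool targetVisible) {
    if (armFault)       return ArmState::Abnormal;
    if (armBusy)        return ArmState::Working;
    if (!targetVisible) return ArmState::NoTarget;
    return ArmState::Standby;
}
```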
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, apparatus, article or method that comprises the element.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes using the descriptions and drawings of the present invention or directly or indirectly applied to other related technical fields are included in the scope of the invention.

Claims (8)

1. A mechanical arm sensing method based on multi-mode data fusion is characterized by comprising the following steps:
performing camera calibration and hand-eye calibration, wherein the camera calibration is performed with the calibrateCamera function of opencv, and the hand-eye calibration is performed with the calibrateHandEye function of opencv;
acquiring an image and a depth map with an ROS (Robot Operating System) driven camera, and calculating a center point and a pose matrix of a target object with a YOLOv5 image recognition algorithm;
designing a coordinate system transfer algorithm, and transferring a homogeneous matrix of the target object under a camera coordinate system to a homogeneous matrix under a mechanical arm base coordinate system;
and designing a motion control algorithm to realize motion control of the mechanical arm and complete a set task.
2. The method for sensing the mechanical arm based on multi-mode data fusion according to claim 1, wherein the step of designing the motion control algorithm to realize motion control of the mechanical arm and complete the given task comprises the following steps:
judging whether a target object is identified by using a YOLOv5 image identification algorithm; if not, moving the robot body and continuing to identify the target object; if yes, obtaining the pose of the object through the depth information and the camera internal parameters;
judging whether an object center point needs to be identified, if so, starting VoteNet three-dimensional point cloud identification to obtain an object center point pose, and calculating an object surface pixel center point by combining with YOLOv5 pixel center point weighting, so as to perform a coordinate system transfer algorithm; if not, directly carrying out a coordinate system transfer algorithm;
performing a mechanical arm motion control algorithm, and judging whether the mechanical arm can reach the position; if not, returning to the step of moving the robot body to identify the target object; if yes, the mechanical arm is operated, and a dexterous hand is adopted to grasp or operate the target object until the task is completed.
3. The method for sensing a mechanical arm based on multi-mode data fusion according to claim 1, wherein the step of acquiring the image and the depth map by using the ROS-driven camera and calculating the center point and the pose matrix of the target by using the YOLOv5 image recognition algorithm comprises the following steps:
compiling an ROS camera to drive and drive the camera, calling topics of RGB images and depth information of the camera, converting the RGB images into Mat matrixes of a cv library through cv_bridge, and storing pictures into jpg files;
loading a model of deep learning YOLOv5, identifying, calculating pixels of each target object pixel center point, and storing an identification result as a structure type;
carrying out VoteNet three-dimensional point cloud identification, loading a model for identification, determining the three-dimensional coordinates of the center of each target object, and assisting a YOLOv5 image identification algorithm to obtain the surface pixel position;
the depth camera obtains depth information of the position of the pixel, and the three-dimensional coordinates (x, y, z) of the center point of the target object are obtained from the internal reference matrix, wherein the depth information is d; the coordinates of the center pixel point of the target object are known as (u, v), and the internal reference matrix is K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]; the three-dimensional coordinates are then z = d, x = (u - c_x) * z / f_x, y = (v - c_y) * z / f_y.
4. The method for sensing a manipulator arm based on multi-modal data fusion according to claim 1, wherein the step of designing a coordinate system transfer algorithm to transfer a homogeneous matrix of the target object in a camera coordinate system to a homogeneous matrix of the target object in a manipulator arm base coordinate system comprises:
the target object is identified with YOLOv5 and its translation vector (x, y, z) is obtained; the rotation vector of the object defaults to (0, 0, 0) and is converted into a rotation matrix; the rotation matrix and the translation vector are then assembled into a 4×4 pose matrix M_target2cam;
the hand-eye calibration matrix M_cam2gripper is obtained through hand-eye calibration;
the homogeneous matrix M_gripper2base of the mechanical arm end under the mechanical arm base coordinate system is obtained through the interface of the mechanical arm, and the homogeneous matrix M_target2base of the target object under the mechanical arm base coordinate system is calculated with the formula M_target2base = M_gripper2base * M_cam2gripper * M_target2cam;
the pose homogeneous matrix M_gripperWithTool2base that the mechanical arm end should actually reach is calculated through the transformation matrix M_tool2gripper of the tool tip under the TCP-calibrated mechanical arm end coordinate system; this comprises: the homogeneous matrix of the target object under the base coordinate system is set equal to the homogeneous matrix of the tool tip under the base coordinate system, namely M_target2base = M_tool2base; according to the coordinate system conversion relation, the homogeneous matrix of the tool tip under the mechanical arm end coordinate system is transferred to the mechanical arm base coordinate system: M_tool2base = M_gripperWithTool2base * M_tool2gripper;
from the two formulas above, the homogeneous matrix of the position the mechanical arm end should reach under the mechanical arm base coordinate system is calculated as M_gripperWithTool2base = M_target2base * (M_tool2gripper)^-1;
the 4×4 homogeneous matrix is decomposed into a 3×3 rotation matrix and a translation vector (x, y, z) by matrix decomposition; the rotation matrix is converted into a rotation vector (Rx, Ry, Rz), and the translation vector and rotation vector are provided to the moveL interface of the mechanical arm to move linearly to the specified position.
5. The method for sensing a manipulator arm based on multi-modal data fusion according to claim 1, wherein in the steps of performing camera calibration and hand-eye calibration, the internal reference coefficients of the camera are obtained first, comprising:
after the camera is started, the topic message camera_info of the camera is checked, and the internal reference coefficient and distortion coefficient of the camera can be checked;
calibrating the camera, shooting a plurality of pictures with different poses on the calibration plate by adopting the camera, wherein the three-dimensional actual coordinates of each corner point of the calibration plate and the two-dimensional pixel coordinates on the picture are in one-to-one correspondence, and thus the correspondence between the world actual coordinates and the pixel coordinates can be obtained;
calibrating the camera with the calibrateCamera function of opencv, inputting the 2D point coordinates and 3D point coordinates of the calibration plate and the size of the calibration plate grid, and outputting the internal reference matrix, distortion coefficients and external reference coefficients of the camera, wherein the external reference coefficients comprise rotation vectors and translation vectors.
6. The method for sensing a manipulator arm based on multi-modal data fusion of claim 5, wherein the step of calibrating the camera specifically comprises:
preparing a calibration plate with an 11 × 8 grid, and fixing the calibration plate on a wall surface;
the position of the base coordinate system of the mechanical arm is kept unchanged, the end of the mechanical arm is moved with the mechanical arm teach pendant, and the camera moves along with the end of the mechanical arm;
the end of the mechanical arm is moved to a position, a picture is taken, and the position and orientation information displayed on the teach pendant is recorded;
the end of the mechanical arm is moved to another position and the shooting and recording are repeated, 14 pictures with different positions and angles are taken in total, and 14 groups of data are recorded;
programming by adopting VScode programming software and C++ programming language, setting parameters of the number of pictures, the size of each checkerboard on the calibration plate and the number of corner points of each row and each column on the calibration plate, extracting 88 corner points on the calibration plate and storing, and then carrying out sub-pixel refinement on each corner point, wherein the aim is to refine the corner points extracted in the prior course and store sub-pixel corner points;
assuming that the calibration plate is placed on the plane Z = 0 in the world coordinate system, the three-dimensional coordinates of each corner point on the calibration plate are initialized, and the camera is calibrated with the calibrateCamera function of opencv; the input is the three-dimensional coordinates and corresponding two-dimensional coordinates of each corner point, and the output is the internal reference matrix and distortion coefficients of the camera together with the rotation vector and translation vector of each picture;
and evaluating the calibration result, carrying out reprojection calculation on the three-dimensional coordinates of each angular point through the obtained internal and external parameters of the camera to obtain a new projection point, and then carrying out error estimation with the previous two-dimensional projection point to obtain the average pixel error of each image.
7. The method for sensing a manipulator arm based on multi-modal data fusion of claim 6, wherein the step of performing hand-eye calibration comprises:
the camera calibration yields the external parameter matrices corresponding to the 14 groups of images, namely a rotation vector and a translation vector for each image, and the rotation-vector representation of the orientation is converted into a rotation-matrix representation;
the pose matrix of the mechanical arm end under the mechanical arm base coordinate system corresponding to each image consists of a translation vector and Euler angles, and the Euler-angle representation is converted into a rotation-matrix representation so that it can later serve as an input to the opencv hand-eye calibration computation;
adopting the calibrateHandEye function of opencv, wherein the input is the rotation matrix and translation vector of the mechanical arm end under the mechanical arm base coordinate system and the rotation matrix and translation vector of the calibration plate under the camera coordinate system, and the output is the hand-eye calibration matrix; the calibration matrix at this point consists of a rotation matrix and a translation vector and is converted into a 4 × 4 hand-eye calibration homogeneous matrix;
the hand-eye calibration accuracy test is performed, because the relative positions of the calibration plate and the mechanical arm base coordinate system are kept unchanged, the calibration effect is verified, the homogeneous matrix at the tail end of the mechanical arm under the mechanical arm base coordinate system, the hand-eye calibration homogeneous matrix and the homogeneous matrix of the calibration plate under the camera coordinate system corresponding to each image are multiplied, and then a translation matrix is extracted from the result;
and (3) carrying out error analysis, wherein the 14 groups of translation matrixes can observe the calibration effect, and the maximum error in the X, Y, Z direction is obtained to determine whether the requirements are met.
8. The method for sensing the mechanical arm based on multi-mode data fusion according to claim 1, wherein the motion process of the mechanical arm comprises:
logging in a mechanical arm, initializing initial angles of 6 joints by the mechanical arm, and ensuring that a camera at the tail end of the mechanical arm can observe a target object;
initializing the ROS, subscribing RGB topic information and depth information of the camera, carrying out image recognition, and returning recognition information, wherein the recognition information comprises a category and a pixel of a center point of a recognition frame;
establishing a state machine, and judging the running state of the mechanical arm and whether a camera identifies a target object or not; wherein, the running state of the mechanical arm includes: the mechanical arm is in a standby state, and the robot can normally walk at the moment; the mechanical arm is in an abnormal state, and the robot stops waiting at the moment; the mechanical arm is in a working state, and the robot stops waiting at the moment; the camera on the end of the mechanical arm does not observe the target;
identifying a target object, and judging whether the distance between the target object and a base coordinate system of the mechanical arm exceeds the control range of the mechanical arm;
newly starting a thread, releasing the ROS topic message in real time, and sending the running state of the real-time mechanical arm to the robot main control;
calculating the position of the target object under the base coordinate system of the mechanical arm according to the pixel and depth information of the center of the target object and the internal reference of the camera, and giving an initial pose;
the mechanical arm is operated to the front of the target object, the target object is grabbed, a reasonable path planning is designed by utilizing the SDK of the mechanical arm, the mechanical arm is operated, and the mechanical arm is restored to an initial state after the mechanical arm is operated;
the mechanical arm tail end tool uses a mechanical arm dexterous hand with a touch system to grasp or operate the target object, and judges whether the target object is successfully grasped or not or whether the target object is successfully operated or not through the touch system and feedback information.
CN202311297470.4A 2023-10-09 2023-10-09 Mechanical arm sensing method based on multi-mode data fusion Pending CN117103277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311297470.4A CN117103277A (en) 2023-10-09 2023-10-09 Mechanical arm sensing method based on multi-mode data fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311297470.4A CN117103277A (en) 2023-10-09 2023-10-09 Mechanical arm sensing method based on multi-mode data fusion

Publications (1)

Publication Number Publication Date
CN117103277A true CN117103277A (en) 2023-11-24

Family

ID=88796649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311297470.4A Pending CN117103277A (en) 2023-10-09 2023-10-09 Mechanical arm sensing method based on multi-mode data fusion

Country Status (1)

Country Link
CN (1) CN117103277A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117381800A (en) * 2023-12-12 2024-01-12 菲特(天津)检测技术有限公司 Hand-eye calibration method and system
CN117381800B (en) * 2023-12-12 2024-02-06 菲特(天津)检测技术有限公司 Hand-eye calibration method and system

Similar Documents

Publication Publication Date Title
JP2019508273A (en) Deep-layer machine learning method and apparatus for grasping a robot
US11911912B2 (en) Robot control apparatus and method for learning task skill of the robot
CN113677486A (en) System and method for constraint management of one or more robots
JP2013193202A (en) Method and system for training robot using human assisted task demonstration
WO2016193781A1 (en) Motion control system for a direct drive robot through visual servoing
CN113379849B (en) Robot autonomous recognition intelligent grabbing method and system based on depth camera
Lepora et al. Pose-based tactile servoing: Controlled soft touch using deep learning
CN117103277A (en) Mechanical arm sensing method based on multi-mode data fusion
Mavrakis et al. Analysis of the inertia and dynamics of grasped objects, for choosing optimal grasps to enable torque-efficient post-grasp manipulations
CN113103230A (en) Human-computer interaction system and method based on remote operation of treatment robot
Huang et al. Grasping novel objects with a dexterous robotic hand through neuroevolution
Shahzad et al. A vision-based path planning and object tracking framework for 6-DOF robotic manipulator
Nuzzi et al. Hands-Free: a robot augmented reality teleoperation system
Nandikolla et al. Teleoperation robot control of a hybrid eeg-based bci arm manipulator using ros
Makita et al. Offline direct teaching for a robotic manipulator in the computational space
US20220032468A1 (en) Robotic drawing
CN112384335A (en) System and method for natural task assignment for one or more robots
CN211890823U (en) Four-degree-of-freedom mechanical arm vision servo control system based on RealSense camera
CN114888768A (en) Mobile duplex robot cooperative grabbing system and method based on multi-sensor fusion
KR20230138487A (en) Object-based robot control
JP2021061014A (en) Learning device, learning method, learning model, detector, and gripping system
Lin Embedding Intelligence into Robotic Systems-Programming, Learning, and Planning
Rivera-Calderón et al. Online assessment of computer vision and robotics skills based on a digital twin
Silva et al. Aros: An anthropomorphic robot for human-robot interaction and coordination studies
Jiménez et al. Autonomous object manipulation and transportation using a mobile service robot equipped with an RGB-D and LiDAR sensor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination