CN117103277A - Mechanical arm sensing method based on multi-mode data fusion - Google Patents

Mechanical arm sensing method based on multi-mode data fusion

Info

Publication number
CN117103277A
Authority
CN
China
Prior art keywords
mechanical arm
camera
matrix
target object
coordinate system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311297470.4A
Other languages
Chinese (zh)
Inventor
谢世朋
苏士伟
谢静岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202311297470.4A priority Critical patent/CN117103277A/en
Publication of CN117103277A publication Critical patent/CN117103277A/en
Pending legal-status Critical Current

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/161Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1661Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems

Abstract

The invention relates to the technical field of computer vision, and discloses a mechanical arm sensing method based on multi-mode data fusion, which comprises the following steps: performing camera calibration and hand-eye calibration; acquiring an image and a depth map with an ROS (Robot Operating System) driven camera, and calculating a center point and a pose matrix of a target object with a YOLOv5 image recognition algorithm; designing a coordinate system transfer algorithm, and transferring the homogeneous matrix of the target object under the camera coordinate system to a homogeneous matrix under the mechanical arm base coordinate system; and designing a motion control algorithm to realize motion control of the mechanical arm and complete a set task. The invention fuses data of a plurality of modes such as the RGB image, the depth image and the tactile system of the dexterous hand to identify the target object, calculates the pose information of the center point of the object, operates the target object through the dexterous hand on the mechanical arm, improves the accuracy with which the tactile system of the dexterous hand senses the operated target object, and realizes accurate grabbing of the target object by the mechanical arm.

Description

Mechanical arm sensing method based on multi-mode data fusion
Technical Field
The invention relates to the technical field of computer vision, in particular to a mechanical arm sensing method based on multi-mode data fusion.
Background
With the continuous development of computing, control theory, machine vision and sensor technology, robotics has entered a stage of rapid development. Robots use mechanical arms to perform increasingly meaningful tasks and functions, which improves production efficiency and quality of life. In factories, for example, robots can carry out cargo handling and loading and unloading tasks; in a welding workshop, robots can help people accomplish tasks such as welding and screwing. When executing such tasks, the tool at the end of the mechanical arm needs to be guided by vision to reach a designated position so that the designated task can be completed effectively. The robot acquires environmental information from multi-mode sensor data such as RGB images, depth maps and the tactile data of a dexterous hand, identifies the category of the target object, estimates its pose, and guides the end of the mechanical arm to the target object before executing the task, which makes the robot more intelligent. The RGB image reflects real-world appearance features, while the depth image reflects key information such as real-world dimensions. By fusing the RGB image and the depth image, the advantages of both can be combined, which suits the varied scenes encountered in practical applications.
In a conventional pipeline, YOLOv5 identifies the pixel center point of the object surface in the RGB image, and the three-dimensional coordinates of that surface point can be calculated from the depth information. This works for regular objects, but YOLOv5 alone cannot identify the center point of the whole object. Moreover, the traditional sensing method uses a single mode, which reduces the accuracy with which the tactile system of the dexterous hand senses the operated target object, so the mechanical arm cannot grab the target object accurately.
Disclosure of Invention
The invention provides a mechanical arm sensing method based on multi-mode data fusion, which identifies a target object by fusing data of a plurality of modes such as RGB images, depth images and the tactile system of a dexterous hand, calculates the pose information of the center point of the object, and operates the target object through the dexterous hand on the mechanical arm, thereby improving the accuracy with which the tactile system of the dexterous hand senses the operated target object and realizing accurate grabbing of the target object by the mechanical arm.
The invention provides a mechanical arm sensing method based on multi-mode data fusion, which comprises the following steps:
performing camera calibration and hand-eye calibration, wherein the camera calibration is performed with the calibrateCamera function of opencv, and the hand-eye calibration is performed with the calibrateHandEye function of opencv;
acquiring an image and a depth map with an ROS (Robot Operating System) driven camera, and calculating a center point and a pose matrix of a target object with a YOLOv5 image recognition algorithm;
designing a coordinate system transfer algorithm, and transferring a homogeneous matrix of the target object under a camera coordinate system to a homogeneous matrix under a mechanical arm base coordinate system;
and designing a motion control algorithm to realize motion control of the mechanical arm and complete a set task.
Further, the step of designing the motion control algorithm to realize motion control of the mechanical arm and complete the set task comprises the following steps:
judging whether a target object is identified by using a YOLOv5 image identification algorithm; if not, moving the robot body and continuing to identify the target object; if yes, obtaining the pose of the object through the depth information and the camera internal parameters;
judging whether an object center point needs to be identified, if so, starting VoteNet three-dimensional point cloud identification to obtain an object center point pose, and calculating an object surface pixel center point by combining with YOLOv5 pixel center point weighting, so as to perform a coordinate system transfer algorithm; if not, directly carrying out a coordinate system transfer algorithm;
performing a mechanical arm motion control algorithm, and judging whether the mechanical arm can reach the position; if not, returning to the step of moving the robot body to identify the target object; if yes, the mechanical arm is operated, and a dexterous hand is adopted to grasp or operate the target object until the task is completed.
Further, the step of acquiring an image and a depth map with the ROS (Robot Operating System) driven camera and calculating a center point and a pose matrix of the target object with a YOLOv5 image recognition algorithm comprises the following steps:
writing an ROS camera driver and launching the camera, subscribing to the camera's RGB image and depth information topics, converting the RGB images into Mat matrices of the cv library through cv_bridge, and storing the pictures as jpg files;
loading a model of deep learning YOLOv5, identifying, calculating pixels of each target object pixel center point, and storing an identification result as a structure type;
carrying out VoteNet three-dimensional point cloud identification, loading a model for identification, determining the three-dimensional coordinates of the center of each target object, and assisting a YOLOv5 image identification algorithm to obtain the surface pixel position;
the depth camera obtains depth information of the position of the pixel, and the three-dimensional coordinates (x, y, z) of the center point of the target object are obtained from the internal reference matrix, wherein the depth information is d; the coordinates of the center pixel point of the target object are known as (u, v), and the internal reference matrix is K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]; the three-dimensional coordinates are then z = d, x = (u - c_x) * z / f_x, y = (v - c_y) * z / f_y.
Further, the step of designing a coordinate system transfer algorithm to transfer the homogeneous matrix of the target object under the camera coordinate system to the homogeneous matrix under the mechanical arm base coordinate system includes:
the target object is identified with YOLOv5 and its translation vector (x, y, z) is obtained; the rotation vector of the object defaults to (0, 0, 0) and is converted into a rotation matrix; the rotation matrix and the translation vector are then assembled into a 4×4 pose matrix M_target2cam;
the hand-eye calibration matrix M_cam2gripper is obtained through hand-eye calibration;
the homogeneous matrix M_gripper2base of the mechanical arm end under the mechanical arm base coordinate system is obtained through the interface of the mechanical arm, and the homogeneous matrix M_target2base of the target object under the mechanical arm base coordinate system is calculated with the formula M_target2base = M_gripper2base * M_cam2gripper * M_target2cam;
the pose homogeneous matrix M_gripperWithTool2base that the mechanical arm end should actually reach is calculated through the transformation matrix M_tool2gripper of the tool tip under the TCP-calibrated mechanical arm end coordinate system; this comprises: the homogeneous matrix of the target object under the base coordinate system is set equal to the homogeneous matrix of the tool tip under the base coordinate system, namely M_target2base = M_tool2base; according to the coordinate system conversion relation, the homogeneous matrix of the tool tip under the mechanical arm end coordinate system is transferred to the mechanical arm base coordinate system: M_tool2base = M_gripperWithTool2base * M_tool2gripper;
from the two formulas above, the homogeneous matrix of the position the mechanical arm end should reach under the mechanical arm base coordinate system is calculated as M_gripperWithTool2base = M_target2base * (M_tool2gripper)^-1;
the 4×4 homogeneous matrix is decomposed into a 3×3 rotation matrix and a translation vector (x, y, z) by matrix decomposition; the rotation matrix is converted into a rotation vector (Rx, Ry, Rz), and the translation vector and rotation vector are provided to the moveL interface of the mechanical arm to move linearly to the specified position.
Further, in the step of performing camera calibration and hand-eye calibration, firstly obtaining an internal parameter of the camera includes:
after the camera is started, the topic message camera_info of the camera is checked, and the internal reference coefficient and distortion coefficient of the camera can be checked;
calibrating the camera, shooting a plurality of pictures with different poses on the calibration plate by adopting the camera, wherein the three-dimensional actual coordinates of each corner point of the calibration plate and the two-dimensional pixel coordinates on the picture are in one-to-one correspondence, and thus the correspondence between the world actual coordinates and the pixel coordinates can be obtained;
calibrating the camera with the calibrateCamera function of opencv, inputting the 2D point coordinates and 3D point coordinates of the calibration plate and the size of the calibration plate grid, and outputting the internal reference matrix, distortion coefficients and external reference coefficients of the camera, wherein the external reference coefficients comprise rotation vectors and translation vectors.
Further, the step of calibrating the camera specifically includes:
preparing a calibration plate with an 11 × 8 grid, and fixing the calibration plate on a wall surface;
the position of the base coordinate system of the mechanical arm is kept unchanged, the end of the mechanical arm is moved with the mechanical arm teach pendant, and the camera moves along with the end of the mechanical arm;
the end of the mechanical arm is moved to a position, a picture is taken, and the position and orientation information displayed on the teach pendant is recorded;
the end of the mechanical arm is moved to another position and the shooting and recording are repeated, 14 pictures with different positions and angles are taken in total, and 14 groups of data are recorded;
programming by adopting VScode programming software and C++ programming language, setting parameters of the number of pictures, the size of each checkerboard on the calibration plate and the number of corner points of each row and each column on the calibration plate, extracting 88 corner points on the calibration plate and storing, and then carrying out sub-pixel refinement on each corner point, wherein the aim is to refine the corner points extracted in the prior course and store sub-pixel corner points;
assuming that the calibration plate is placed on the plane Z = 0 in the world coordinate system, the three-dimensional coordinates of each corner point on the calibration plate are initialized, and the camera is calibrated with the calibrateCamera function of opencv; the input is the three-dimensional coordinates and corresponding two-dimensional coordinates of each corner point, and the output is the internal reference matrix and distortion coefficients of the camera together with the rotation vector and translation vector of each picture;
and evaluating the calibration result, carrying out reprojection calculation on the three-dimensional coordinates of each angular point through the obtained internal and external parameters of the camera to obtain a new projection point, and then carrying out error estimation with the previous two-dimensional projection point to obtain the average pixel error of each image.
Further, the step of performing hand-eye calibration specifically includes:
the camera calibration yields the external parameter matrices corresponding to the 14 groups of images, namely a rotation vector and a translation vector for each image, and the rotation-vector representation of the orientation is converted into a rotation-matrix representation;
the pose matrix of the mechanical arm end under the mechanical arm base coordinate system corresponding to each image consists of a translation vector and Euler angles, and the Euler-angle representation is converted into a rotation-matrix representation so that it can later serve as an input to the opencv hand-eye calibration computation;
adopting the calibrateHandEye function of opencv, wherein the input is the rotation matrix and translation vector of the mechanical arm end under the mechanical arm base coordinate system and the rotation matrix and translation vector of the calibration plate under the camera coordinate system, and the output is the hand-eye calibration matrix; the calibration matrix at this point consists of a rotation matrix and a translation vector and is converted into a 4 × 4 hand-eye calibration homogeneous matrix;
the hand-eye calibration accuracy test is performed, because the relative positions of the calibration plate and the mechanical arm base coordinate system are kept unchanged, the calibration effect is verified, the homogeneous matrix at the tail end of the mechanical arm under the mechanical arm base coordinate system, the hand-eye calibration homogeneous matrix and the homogeneous matrix of the calibration plate under the camera coordinate system corresponding to each image are multiplied, and then a translation matrix is extracted from the result;
and (3) carrying out error analysis, wherein the 14 groups of translation matrixes can observe the calibration effect, and the maximum error in the X, Y, Z direction is obtained to determine whether the requirements are met.
Further, the motion process of the mechanical arm comprises:
logging in a mechanical arm, initializing initial angles of 6 joints by the mechanical arm, and ensuring that a camera at the tail end of the mechanical arm can observe a target object;
initializing the ROS, subscribing RGB topic information and depth information of the camera, carrying out image recognition, and returning recognition information, wherein the recognition information comprises a category and a pixel of a center point of a recognition frame;
establishing a state machine, and judging the running state of the mechanical arm and whether a camera identifies a target object or not; wherein, the running state of the mechanical arm includes: the mechanical arm is in a standby state, and the robot can normally walk at the moment; the mechanical arm is in an abnormal state, and the robot stops waiting at the moment; the mechanical arm is in a working state, and the robot stops waiting at the moment; the camera on the end of the mechanical arm does not observe the target;
identifying a target object, and judging whether the distance between the target object and a base coordinate system of the mechanical arm exceeds the control range of the mechanical arm;
newly starting a thread, releasing the ROS topic message in real time, and sending the running state of the real-time mechanical arm to the robot main control;
calculating the position of the target object under the base coordinate system of the mechanical arm according to the pixel and depth information of the center of the target object and the internal reference of the camera, and giving an initial pose;
the mechanical arm is operated to the front of the target object, the target object is grabbed, a reasonable path planning is designed by utilizing the SDK of the mechanical arm, the mechanical arm is operated, and the mechanical arm is restored to an initial state after the mechanical arm is operated;
the mechanical arm tail end tool uses a mechanical arm dexterous hand with a touch system to grasp or operate the target object, and judges whether the target object is successfully grasped or not or whether the target object is successfully operated or not through the touch system and feedback information.
The beneficial effects of the invention are as follows:
1. Visual guidance for hand-eye cooperative work is established: the hand-eye calibration matrix is calculated through hand-eye calibration and the internal parameters of the camera are obtained; the pose information of the target object is obtained through the RGB image and the depth image and transmitted to the controller, the pose information of the target object under the mechanical arm base coordinate system is obtained through the corresponding calculation, and the mechanical arm is operated to the designated position, which is the initial position for executing the task; the precision of the pose information obtained through the camera is the premise for the mechanical arm to reach the designated position accurately.
2. An improved recognition method combining YOLOv5 with VoteNet three-dimensional point cloud target detection is provided; the two-dimensional RGB image and the three-dimensional point cloud are fused, a visual guidance system is constructed, and the grabbing and operating tasks on the target object are realized.
3. A multi-mode data perception method based on two-dimensional RGB images, depth images, point clouds and dexterous-hand touch is provided, combining the advantages of each mode: image recognition is carried out with the improved YOLOv5, the bounding box recognized by the VoteNet three-dimensional point cloud combined with the depth information of the depth camera enhances the accuracy of the YOLOv5 pixel-point recognition, the tactile system of the dexterous hand senses the operated target object accurately, and the mechanical arm can grasp the target object accurately;
4. The RGB image provides geometric, semantic and texture information, which enhances the point cloud features; the improved VoteNet three-dimensional point cloud target detection method obtains the center point of the object bounding box, and a weighted calculation with the pixel center point identified by YOLOv5 improves the accuracy of the center-point pixel coordinate identification.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The traditional visual pose estimation method is a pose recognition method based on ArUco codes: ArUco markers must be attached near the target object, which is visually conspicuous and unattractive. The deep-learning pose recognition method of the invention first uses YOLOv5 to recognize the target object and obtain its detection frame and the category it belongs to; the two-dimensional RGB image is then aligned with the depth image to obtain the depth information of the center-point pixel of the target object, and thereby the position information of the target object. The specific flow comprises the following steps:
as shown in fig. 1, the invention provides a mechanical arm sensing method based on multi-mode data fusion, which comprises the following steps:
S1, camera calibration and hand-eye calibration are carried out, wherein the camera calibration is performed with the calibrateCamera function of opencv, and the hand-eye calibration is performed with the calibrateHandEye function of opencv.
S1.1, firstly, obtaining an internal parameter coefficient of a camera, wherein the main method for obtaining the internal parameter coefficient of the camera is as follows:
s1.1.1, installing a camera driver and an ROS, and after the camera is started, checking a topic message camera_info of the camera to check an internal parameter coefficient and a distortion coefficient of the camera;
S1.1.2, calibrating the camera with a calibration plate: the camera shoots a plurality of pictures of the calibration plate in different poses; the three-dimensional actual coordinates of each corner point of the calibration plate correspond one-to-one with the two-dimensional pixel coordinates on the picture, so the correspondence between the world actual coordinates and the pixel coordinates can be obtained.
S1.1.3, calibrating the camera with the calibrateCamera function of opencv, inputting the 2D point coordinates and 3D point coordinates of the calibration plate and the size of the calibration plate grid, and outputting the internal reference matrix, distortion coefficients and external reference coefficients of the camera, wherein the external reference coefficients comprise rotation vectors and translation vectors.
S1.2, a specific calibration method for camera calibration and hand-eye calibration comprises the following steps:
s1.2.1, camera calibration: the camera calibration method of step S1.1.3 is selected because the rotation and translation vectors of each calibration plate picture need to be acquired in order to provide input data for subsequent hand-eye calibration matrix determinations. The obtained internal reference matrix, distortion coefficient and external reference coefficient are shown in the figure. The specific implementation method comprises the following steps:
Step one: preparing a calibration plate with an 11 × 8 grid, and fixing the calibration plate on a wall surface;
Step two: the position of the base coordinate system of the mechanical arm is kept unchanged, the mechanical arm teach pendant is used to move the end of the mechanical arm, and the camera is fixed at the end of the mechanical arm in an eye-in-hand configuration so that the camera moves along with the end of the mechanical arm;
Step three: the end of the mechanical arm is moved to a position, a picture is taken, and the position and orientation information displayed on the teach pendant is recorded;
Step four: the end of the mechanical arm is moved to another position and the shooting and recording of step three are repeated, 14 pictures at different positions and angles are taken altogether, and 14 groups of data are recorded.
Step five: programming by adopting VScode programming software and C++ programming language, setting parameters such as the number of pictures, the size of each checkerboard on a calibration plate, the number of corner points of each row and each column on the calibration plate and the like, extracting 88 corner points on the calibration plate and storing, and then carrying out sub-pixel refinement on each corner point, wherein the aim is to refine the corner points extracted in the prior course and store sub-pixel corner points;
Step six: the calibration plate is assumed to lie on the plane Z = 0 in the world coordinate system, the three-dimensional coordinates of each corner point on the calibration plate are initialized, and the camera is calibrated with the calibrateCamera function of opencv; the input is the three-dimensional coordinates and corresponding two-dimensional coordinates of each corner point, and the output is the internal reference matrix and distortion coefficients of the camera together with the rotation vector and translation vector of each picture (a code sketch of this call is given after step eight below).
Step seven: evaluating the calibration result, carrying out reprojection calculation on the three-dimensional coordinates of each angular point through the obtained internal and external parameters of the camera to obtain a new projection point, and then carrying out error estimation with the previous two-dimensional projection point to obtain an average pixel error of each image;
Step eight: store the calibration results.
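As an illustration of steps five to seven, a minimal C++ sketch using opencv is given below; the corner pattern follows the 11 × 8 grid described above, while the square size and the image file names are placeholder assumptions rather than values from the method.

```cpp
// Hedged sketch of OpenCV camera calibration (steps five to seven above).
#include <opencv2/opencv.hpp>
#include <cmath>
#include <iostream>
#include <string>
#include <vector>

int main() {
    const cv::Size patternSize(11, 8);   // inner corners per row/column (88 corners, as above)
    const float squareSize = 0.02f;      // checkerboard square edge in metres (assumed value)

    std::vector<std::vector<cv::Point3f>> objectPoints;  // 3D corners on the Z = 0 plane
    std::vector<std::vector<cv::Point2f>> imagePoints;   // detected 2D corners
    cv::Size imageSize;

    // Template of the board corners in the world frame (Z = 0), as initialised in step six.
    std::vector<cv::Point3f> board;
    for (int r = 0; r < patternSize.height; ++r)
        for (int c = 0; c < patternSize.width; ++c)
            board.emplace_back(c * squareSize, r * squareSize, 0.0f);

    for (int i = 0; i < 14; ++i) {       // the 14 calibration pictures
        cv::Mat img = cv::imread("calib_" + std::to_string(i) + ".jpg");  // hypothetical names
        if (img.empty()) continue;
        cv::Mat gray;
        cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);
        imageSize = gray.size();

        std::vector<cv::Point2f> corners;
        if (!cv::findChessboardCorners(gray, patternSize, corners)) continue;  // step five
        cv::cornerSubPix(gray, corners, cv::Size(11, 11), cv::Size(-1, -1),    // sub-pixel refinement
                         cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.001));
        imagePoints.push_back(corners);
        objectPoints.push_back(board);
    }

    cv::Mat cameraMatrix, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;   // one rotation/translation vector per picture
    cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                        cameraMatrix, distCoeffs, rvecs, tvecs);               // step six

    // Step seven: re-project the corners and report the RMS pixel error.
    double sumSq = 0.0; size_t n = 0;
    for (size_t i = 0; i < objectPoints.size(); ++i) {
        std::vector<cv::Point2f> reproj;
        cv::projectPoints(objectPoints[i], rvecs[i], tvecs[i], cameraMatrix, distCoeffs, reproj);
        double err = cv::norm(imagePoints[i], reproj, cv::NORM_L2);
        sumSq += err * err;
        n += objectPoints[i].size();
    }
    std::cout << "RMS reprojection error: " << std::sqrt(sumSq / n) << " px\n";
    return 0;
}
```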
S1.2.2, hand-eye calibration: the purpose of hand-eye calibration is to design an algorithm to obtain the hand-eye calibration matrix; with the adopted eye-in-hand configuration, solving the hand-eye calibration matrix gives the conversion relation between the mechanical arm end coordinate system and the camera coordinate system. The specific implementation method comprises the following steps:
Step one: when the camera is calibrated, the external parameter matrices corresponding to the 14 groups of images, namely a rotation vector and a translation vector for each image, are obtained, and the rotation-vector representation of the orientation is then converted into a rotation-matrix representation;
Step two: the pose matrix of the mechanical arm end under the mechanical arm base coordinate system corresponding to each image consists of a translation vector and Euler angles, and the Euler-angle representation needs to be converted into a rotation-matrix representation so that it can later serve as an input to the opencv hand-eye calibration computation.
Step three: the calibrateHandEye function of opencv is used; the input is the rotation matrix and translation vector of the mechanical arm end under the mechanical arm base coordinate system and the rotation matrix and translation vector of the calibration plate under the camera coordinate system, and the output is the hand-eye calibration matrix; the calibration matrix at this point consists of a rotation matrix and a translation vector and needs to be converted into a 4 × 4 hand-eye calibration homogeneous matrix (a code sketch of this call is given after step five below).
Step four: and (3) testing the hand-eye calibration accuracy, wherein the calibration effect is verified because the relative positions of the calibration plate and the mechanical arm base coordinate system are kept unchanged. The homogeneous matrix at the tail end of the mechanical arm under the mechanical arm base coordinate system, the homogeneous matrix for calibrating the hand and the eye and the homogeneous matrix for calibrating the board under the camera coordinate system corresponding to each image are multiplied, and then the translation matrix is extracted from the result.
Step five: error analysis; the calibration effect can be observed from the 14 groups of translation matrices. The X values lie between -1.196 m and -1.120 m, with a maximum X error of 0.004 m; in the Y direction the minimum is 0.121 m and the maximum is 0.132 m, with a maximum Y error of 0.011 m; in the Z direction the minimum is -0.102 m and the maximum is -0.095 m, with a maximum Z error of 0.007 m. The maximum error across the X, Y and Z directions is 1.1 cm, which meets the requirements.
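A minimal sketch of the calibrateHandEye call from step three is shown below; it assumes the 14 per-image rotation matrices and translation vectors have already been prepared as described in steps one and two (the per-image camera rotations come from the calibration rvecs after cv::Rodrigues). The helper name is illustrative.

```cpp
// Hedged sketch of the eye-in-hand calibration (step three above).
#include <opencv2/opencv.hpp>
#include <vector>

// Inputs: per-image arm-end poses in the base frame and calibration-plate poses in the
// camera frame (14 of each). Output: the 4x4 hand-eye homogeneous matrix M_cam2gripper.
cv::Mat handEyeHomogeneous(const std::vector<cv::Mat>& R_gripper2base,
                           const std::vector<cv::Mat>& t_gripper2base,
                           const std::vector<cv::Mat>& R_target2cam,
                           const std::vector<cv::Mat>& t_target2cam) {
    cv::Mat R_cam2gripper, t_cam2gripper;
    cv::calibrateHandEye(R_gripper2base, t_gripper2base,
                         R_target2cam, t_target2cam,
                         R_cam2gripper, t_cam2gripper);   // default Tsai method

    // Assemble the rotation matrix and translation vector into a 4x4 homogeneous matrix.
    cv::Mat M = cv::Mat::eye(4, 4, CV_64F);
    R_cam2gripper.copyTo(M(cv::Rect(0, 0, 3, 3)));
    t_cam2gripper.copyTo(M(cv::Rect(3, 0, 1, 3)));
    return M;
}
```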
S2, acquiring an image and a depth map with the ROS (Robot Operating System) driven camera, and calculating the center point and pose matrix of the target object with a YOLOv5 image recognition algorithm; the specific method comprises the following steps:
S2.1, writing and building the ROS camera driver and launching the camera, for example with the command: roslaunch orbbec_camera gemini2.launch
S2.2, subscribing to the camera's RGB image and depth information topics; the RGB image is converted into a Mat matrix of the cv library through cv_bridge, and the pictures are stored as jpg files;
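As a hedged illustration of S2.2, the topic subscription and cv_bridge conversion might look like the sketch below; the node name and the topic name /camera/color/image_raw are assumptions that depend on the actual camera driver.

```cpp
// Hedged sketch: subscribe to the RGB topic and convert each message to cv::Mat (S2.2).
#include <ros/ros.h>
#include <sensor_msgs/Image.h>
#include <sensor_msgs/image_encodings.h>
#include <cv_bridge/cv_bridge.h>
#include <opencv2/imgcodecs.hpp>

void rgbCallback(const sensor_msgs::ImageConstPtr& msg) {
    // Convert the ROS image message into an OpenCV BGR matrix.
    cv_bridge::CvImagePtr cvPtr = cv_bridge::toCvCopy(msg, sensor_msgs::image_encodings::BGR8);
    cv::imwrite("frame.jpg", cvPtr->image);   // store the picture as a jpg file
}

int main(int argc, char** argv) {
    ros::init(argc, argv, "rgb_grabber");     // node name is illustrative
    ros::NodeHandle nh;
    ros::Subscriber sub = nh.subscribe("/camera/color/image_raw", 1, rgbCallback);
    ros::spin();
    return 0;
}
```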
S2.3, loading the trained deep-learning YOLOv5 model, performing recognition, calculating the pixel coordinates of each target object's center point, and storing the recognition result as a structure type;
For the YOLOv5 image recognition, a data set is collected by shooting pictures of the target object at different positions and poses, and the pictures are then labeled with labelImg; the labeled pictures are sent to YOLOv5 for training to obtain a model; the PyTorch-based .pt model is then converted into an ONNX intermediate model; with the Atlas 200 DK developer kit, after the environment is deployed, the original model needs to be converted with the ATC tool into a model (.om) suitable for the Ascend hardware; the converted model is copied into the model folder, inference is run on the pictures to be identified, the target object is identified, and its two-dimensional pixel coordinates are given; the three-dimensional coordinates of the target object under the camera coordinate system are then solved from the camera internal parameters and the camera depth information at the pixel point. The robot is assumed by default to face the object directly, and the initial pose of the camera on the mechanical arm can be adjusted to face the object, so the initial rotation defaults to (0, 0, 0); if the robot does not face the target object, the pose needs to be calculated from the angle between the vehicle body and the target object.
S2.4, voteNet three-dimensional point cloud recognition, namely, loading a model for recognition, determining the three-dimensional coordinates of the center of each target object, and assisting a YOLOv5 image recognition algorithm to obtain the surface pixel position.
S2.5, the depth camera obtains depth information of the position of the pixel, and three-dimensional coordinates (x, y, z) of a center point of the target object are obtained according to the internal reference matrix, wherein the depth information is d;
the coordinates of the center pixel point of the target object are known as (u, v), and the internal reference matrix is K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]];
the three-dimensional coordinates are then: z = d; x = (u - c_x) * z / f_x; y = (v - c_y) * z / f_y.
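A direct C++ transcription of this back-projection, with the intrinsic parameters passed in explicitly, is sketched below.

```cpp
// Hedged sketch: recover the 3D camera-frame point from pixel (u, v) and depth d (S2.5).
#include <opencv2/core.hpp>

cv::Point3d pixelToCamera(double u, double v, double d,
                          double fx, double fy, double cx, double cy) {
    const double z = d;                  // the depth gives the Z coordinate directly
    const double x = (u - cx) * z / fx;  // back-project along the X axis
    const double y = (v - cy) * z / fy;  // back-project along the Y axis
    return {x, y, z};
}
```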
S3, designing a coordinate system transfer algorithm, and transferring the homogeneous matrix of the target object under the camera coordinate system to the homogeneous matrix under the mechanical arm base coordinate system; the specific method comprises the following steps:
S3.1, the target object is identified with YOLOv5 and its translation vector (x, y, z) is obtained; the rotation vector of the object defaults to (0, 0, 0) and is converted into a rotation matrix; the rotation matrix and the translation vector are then assembled into a 4×4 pose matrix M_target2cam
S3.2, the hand-eye calibration matrix M_cam2gripper is obtained through hand-eye calibration
S3.3, the homogeneous matrix M_gripper2base of the mechanical arm end under the mechanical arm base coordinate system is obtained through the interface of the mechanical arm, and the homogeneous matrix M_target2base of the target object under the mechanical arm base coordinate system is calculated with the following formula:
M_target2base = M_gripper2base * M_cam2gripper * M_target2cam
S3.4, considering that the end of the mechanical arm carries a tool, such as a dexterous hand, the pose homogeneous matrix M_gripperWithTool2base that the mechanical arm should actually reach is calculated through the transformation matrix M_tool2gripper of the tool tip under the TCP-calibrated mechanical arm end coordinate system:
the homogeneous matrix of the target object under the base coordinate system is equal to the homogeneous matrix of the tool tip under the base coordinate system, namely:
M_target2base = M_tool2base
according to the coordinate system conversion relation, the homogeneous matrix of the tool tip under the mechanical arm end coordinate system is transferred to the mechanical arm base coordinate system:
M_tool2base = M_gripperWithTool2base * M_tool2gripper
s3.5, calculating a homogeneous matrix of the position to be reached of the tail end of the mechanical arm under the mechanical arm base coordinate system according to two formulas in the step S3.4:
M_gripperWithTool2base = M_target2base * (M_tool2gripper)^-1
The 4×4 homogeneous matrix is decomposed into a 3×3 rotation matrix and a translation vector (x, y, z) by matrix decomposition; the rotation matrix is converted into a rotation vector (Rx, Ry, Rz), and the translation vector and rotation vector are provided to the moveL interface of the mechanical arm to move linearly to the specified position.
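Putting S3.1 to S3.5 together, the coordinate-system transfer can be sketched with 4×4 homogeneous matrices as below; the function and variable names are illustrative, all matrices are assumed to be CV_64F, and the hand-eye, gripper and tool matrices are assumed to be already available from the earlier steps.

```cpp
// Hedged sketch of the coordinate-system transfer algorithm (S3.1 to S3.5).
#include <opencv2/opencv.hpp>

// Helper: build a 4x4 homogeneous matrix from a rotation vector and a translation vector.
static cv::Mat toHomogeneous(const cv::Vec3d& rvec, const cv::Vec3d& tvec) {
    cv::Mat R;
    cv::Rodrigues(rvec, R);                        // rotation vector -> 3x3 rotation matrix
    cv::Mat M = cv::Mat::eye(4, 4, CV_64F);
    R.copyTo(M(cv::Rect(0, 0, 3, 3)));
    for (int i = 0; i < 3; ++i) M.at<double>(i, 3) = tvec[i];
    return M;
}

// Inputs: target translation in the camera frame, hand-eye matrix, arm-end pose in the
// base frame and the TCP-calibrated tool offset. Outputs: rvec/tvec for the moveL call.
void computeMoveLTarget(const cv::Vec3d& t_target2cam,
                        const cv::Mat& M_cam2gripper,
                        const cv::Mat& M_gripper2base,
                        const cv::Mat& M_tool2gripper,
                        cv::Vec3d& rvecOut, cv::Vec3d& tvecOut) {
    // S3.1: default object rotation (0, 0, 0), translation from YOLOv5 plus depth.
    cv::Mat M_target2cam = toHomogeneous(cv::Vec3d(0, 0, 0), t_target2cam);

    // S3.3: M_target2base = M_gripper2base * M_cam2gripper * M_target2cam
    cv::Mat M_target2base = M_gripper2base * M_cam2gripper * M_target2cam;

    // S3.4 / S3.5: M_gripperWithTool2base = M_target2base * inv(M_tool2gripper)
    cv::Mat M_goal = M_target2base * M_tool2gripper.inv();

    // Decompose into rotation vector (Rx, Ry, Rz) and translation (x, y, z) for moveL.
    cv::Mat R = M_goal(cv::Rect(0, 0, 3, 3)).clone(), rvec;
    cv::Rodrigues(R, rvec);
    rvecOut = cv::Vec3d(rvec.at<double>(0), rvec.at<double>(1), rvec.at<double>(2));
    tvecOut = cv::Vec3d(M_goal.at<double>(0, 3), M_goal.at<double>(1, 3), M_goal.at<double>(2, 3));
}
```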
And S4, designing a motion control algorithm to realize motion control of the mechanical arm and complete a set task.
As shown in fig. 1, completing grabbing or manipulating the target object includes:
s41, judging whether a target object is identified by adopting a YOLOv5 image identification algorithm; if not, moving the robot body and continuing to identify the target object; if yes, obtaining the pose of the object through the depth information and the camera internal parameters;
S42, judging whether the object center point needs to be identified; if yes, VoteNet three-dimensional point cloud recognition is started to obtain the pose of the object center point, which is combined by weighting with the YOLOv5 pixel center point to calculate the object surface pixel center point (one hedged sketch of this weighting is given after step S43 below), and the coordinate system transfer algorithm is then performed; if not, the coordinate system transfer algorithm is performed directly;
s43, performing a mechanical arm motion control algorithm, and judging whether the mechanical arm can reach the position; if not, returning to the step of moving the robot body to identify the target object; if yes, the mechanical arm is operated, and a dexterous hand is adopted to grasp or operate the target object until the task is completed.
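The text does not specify how the VoteNet center and the YOLOv5 pixel center are weighted in step S42. One plausible reading is sketched below: the 3D center is projected into the image with the camera intrinsics and blended with the YOLOv5 pixel center; both the projection step and the weight w are assumptions, not values given by the method.

```cpp
// Hedged sketch of a possible S42 center-point fusion.
#include <opencv2/core.hpp>

cv::Point2d fuseCenters(const cv::Point3d& voteNetCenterCam,   // VoteNet center, camera frame
                        const cv::Point2d& yoloPixelCenter,    // YOLOv5 box center, pixels
                        double fx, double fy, double cx, double cy,
                        double w = 0.5) {                      // assumed blending weight
    // Project the 3D center onto the image plane with the pinhole model.
    const double u = fx * voteNetCenterCam.x / voteNetCenterCam.z + cx;
    const double v = fy * voteNetCenterCam.y / voteNetCenterCam.z + cy;
    // Weighted average of the two pixel estimates.
    return { w * yoloPixelCenter.x + (1.0 - w) * u,
             w * yoloPixelCenter.y + (1.0 - w) * v };
}
```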
Calculating the pose matrix of the object comprises adopting VScode programming software and C++ programming language to carry out algorithm design, and the specific implementation method comprises the following steps:
step one: observing an object in real time through a camera, and taking a picture;
step two: obtaining a pose matrix of a target object center point under a camera coordinate system through the YOLOv5 image recognition algorithm, the depth image information and the camera internal reference matrix;
Step three: the algorithm is designed; the pose matrix of the object is converted into a 4 × 4 homogeneous matrix through VScode software and C++ programming, which makes matrix operations convenient;
Step four: the pose matrix of the mechanical arm end under the mechanical arm base coordinate system is acquired from the mechanical arm controller, and the Euler-angle-plus-translation representation is then converted into a homogeneous-matrix representation (a sketch of this conversion is given after step six below);
step five: the hand-eye calibration homogeneous matrix calculated by the previous hand-eye calibration is added, three homogeneous matrices are obtained at the moment, matrix multiplication is carried out on the three matrices, the homogeneous matrix of the target object under the mechanical arm base coordinate system can be obtained, and then the homogeneous matrix is converted into a rotation vector and a translation vector.
Step six: through mechanical arm TCP calibration, the position and orientation from the tip of the end-effector dexterous hand to the end of the mechanical arm is obtained, and the pose of the mechanical arm end under the mechanical arm base coordinate system is transferred to the pose of the dexterous hand tip under the base coordinate system.
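A sketch of the step-four conversion from the controller's Euler-angle-plus-translation representation to a 4 × 4 homogeneous matrix is given below; a Z-Y-X composition order is assumed, since the actual Euler convention depends on the mechanical arm controller.

```cpp
// Hedged sketch of step four: Euler angles + translation -> 4x4 homogeneous matrix.
#include <opencv2/core.hpp>
#include <cmath>

cv::Mat eulerToHomogeneous(double rx, double ry, double rz,   // Euler angles in radians
                           double x, double y, double z) {    // translation
    const double cr = std::cos(rx), sr = std::sin(rx);
    const double cp = std::cos(ry), sp = std::sin(ry);
    const double cyaw = std::cos(rz), syaw = std::sin(rz);

    cv::Mat Rx = (cv::Mat_<double>(3, 3) << 1, 0, 0,   0, cr, -sr,   0, sr, cr);
    cv::Mat Ry = (cv::Mat_<double>(3, 3) << cp, 0, sp,   0, 1, 0,   -sp, 0, cp);
    cv::Mat Rz = (cv::Mat_<double>(3, 3) << cyaw, -syaw, 0,   syaw, cyaw, 0,   0, 0, 1);
    cv::Mat R = Rz * Ry * Rx;            // assumed Z-Y-X composition; depends on the controller

    cv::Mat M = cv::Mat::eye(4, 4, CV_64F);
    R.copyTo(M(cv::Rect(0, 0, 3, 3)));
    M.at<double>(0, 3) = x; M.at<double>(1, 3) = y; M.at<double>(2, 3) = z;
    return M;
}
```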
The invention is developed with a UR5 mechanical arm. The UR5 collaborative robot is well suited to optimizing lightweight collaborative processes such as picking, placing and testing. The UR5 is easy to program and quick to set up, and it achieves a desirable balance between machine size and power, making it an ideal choice for automated lightweight machining tasks. The UR robot can reproduce the range of motion of a human arm and supports drag teaching, so waypoints can be set simply by moving the robot to the target position. The mechanical arm rotates or translates according to the rotation vector and translation vector to reach the specified position. The movement process of the mechanical arm is as follows:
a. logging in a mechanical arm, initializing initial angles of 6 joints by the mechanical arm, and ensuring that a camera at the tail end of the mechanical arm can observe a target object;
initializing the ROS, subscribing RGB topic information and depth information of the camera, providing topic information of an RGB image for Atlas200DK developer suite to perform image recognition, and returning recognition information, wherein the recognition information comprises category and pixels of a recognition frame center point;
c. a state machine is established to judge the running state of the mechanical arm and whether the camera recognizes the target object (a sketch is given after item h below). There are 4 states in total: (1) the mechanical arm is in a standby state, and the robot can walk normally; (2) the mechanical arm is in an abnormal state, and the robot stops and waits; (3) the mechanical arm is in a working state, and the robot stops and waits; (4) the camera on the end of the mechanical arm does not observe a target object.
d. Identifying a target object, and judging whether the distance between the target object and a base coordinate system of the mechanical arm exceeds the control range of the mechanical arm;
e. and newly starting a thread, releasing the ROS topic message in real time, and sending the running state of the real-time mechanical arm to the robot main control.
f. According to the pixel and depth information of the center of the target object and the internal parameters of the camera, the position of the target object under the base coordinate system of the mechanical arm can be calculated, and an initial pose is given.
g. The mechanical arm is operated to the front of the target object, the target object is grabbed, a reasonable path planning is designed by utilizing the SDK of the mechanical arm, the mechanical arm is operated, and the mechanical arm is restored to an initial state after the mechanical arm is operated.
h. The mechanical arm tail end tool uses a mechanical arm dexterous hand with a touch system to grasp or operate the target object, and judges whether the target object is successfully grasped or not or whether the target object is successfully operated or not through the touch system and feedback information.
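A minimal sketch of the item-c state machine is shown below; the state names and decision logic are illustrative, and the interface to the robot main control is omitted.

```cpp
// Hedged sketch of the item-c state machine; names and logic are illustrative.
enum class ArmState {
    Standby,     // (1) arm idle, the robot may walk normally
    Abnormal,    // (2) fault state, the robot stops and waits
    Working,     // (3) arm executing a task, the robot stops and waits
    NoTarget     // (4) the camera on the arm end sees no target object
};

ArmState updateState(bool armFault, bool armBusy, bool targetVisible) {
    if (armFault)       return ArmState::Abnormal;
    if (armBusy)        return ArmState::Working;
    if (!targetVisible) return ArmState::NoTarget;
    return ArmState::Standby;
}
```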
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, apparatus, article or method that comprises the element.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes using the descriptions and drawings of the present invention or directly or indirectly applied to other related technical fields are included in the scope of the invention.

Claims (8)

1. A mechanical arm sensing method based on multi-mode data fusion is characterized by comprising the following steps:
performing camera calibration and hand-eye calibration, wherein the camera calibration is performed with the calibrateCamera function of opencv, and the hand-eye calibration is performed with the calibrateHandEye function of opencv;
acquiring an image and a depth map with an ROS (Robot Operating System) driven camera, and calculating a center point and a pose matrix of a target object with a YOLOv5 image recognition algorithm;
designing a coordinate system transfer algorithm, and transferring a homogeneous matrix of the target object under a camera coordinate system to a homogeneous matrix under a mechanical arm base coordinate system;
and designing a motion control algorithm to realize motion control of the mechanical arm and complete a set task.
2. The method for sensing the mechanical arm based on multi-mode data fusion according to claim 1, wherein the step of designing the motion control algorithm to realize motion control of the mechanical arm and complete the given task comprises the following steps:
judging whether a target object is identified by using a YOLOv5 image identification algorithm; if not, moving the robot body and continuing to identify the target object; if yes, obtaining the pose of the object through the depth information and the camera internal parameters;
judging whether an object center point needs to be identified, if so, starting VoteNet three-dimensional point cloud identification to obtain an object center point pose, and calculating an object surface pixel center point by combining with YOLOv5 pixel center point weighting, so as to perform a coordinate system transfer algorithm; if not, directly carrying out a coordinate system transfer algorithm;
performing a mechanical arm motion control algorithm, and judging whether the mechanical arm can reach the position; if not, returning to the step of moving the robot body to identify the target object; if yes, the mechanical arm is operated, and a dexterous hand is adopted to grasp or operate the target object until the task is completed.
3. The method for sensing a mechanical arm based on multi-mode data fusion according to claim 1, wherein the step of acquiring the image and the depth map by using the ROS-driven camera and calculating the center point and the pose matrix of the target by using the YOLOv5 image recognition algorithm comprises the following steps:
compiling an ROS camera to drive and drive the camera, calling topics of RGB images and depth information of the camera, converting the RGB images into Mat matrixes of a cv library through cv_bridge, and storing pictures into jpg files;
loading a model of deep learning YOLOv5, identifying, calculating pixels of each target object pixel center point, and storing an identification result as a structure type;
carrying out VoteNet three-dimensional point cloud identification, loading a model for identification, determining the three-dimensional coordinates of the center of each target object, and assisting a YOLOv5 image identification algorithm to obtain the surface pixel position;
the depth camera obtains depth information of the position of the pixel, and the three-dimensional coordinates (x, y, z) of the center point of the target object are obtained from the internal reference matrix, wherein the depth information is d; the coordinates of the center pixel point of the target object are known as (u, v), and the internal reference matrix is K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]; the three-dimensional coordinates are then z = d, x = (u - c_x) * z / f_x, y = (v - c_y) * z / f_y.
4. The method for sensing a manipulator arm based on multi-modal data fusion according to claim 1, wherein the step of designing a coordinate system transfer algorithm to transfer a homogeneous matrix of the target object in a camera coordinate system to a homogeneous matrix of the target object in a manipulator arm base coordinate system comprises:
the target object is identified with YOLOv5 and its translation vector (x, y, z) is obtained; the rotation vector of the object defaults to (0, 0, 0) and is converted into a rotation matrix; the rotation matrix and the translation vector are then assembled into a 4×4 pose matrix M_target2cam;
the hand-eye calibration matrix M_cam2gripper is obtained through hand-eye calibration;
the homogeneous matrix M_gripper2base of the mechanical arm end under the mechanical arm base coordinate system is obtained through the interface of the mechanical arm, and the homogeneous matrix M_target2base of the target object under the mechanical arm base coordinate system is calculated with the formula M_target2base = M_gripper2base * M_cam2gripper * M_target2cam;
the pose homogeneous matrix M_gripperWithTool2base that the mechanical arm end should actually reach is calculated through the transformation matrix M_tool2gripper of the tool tip under the TCP-calibrated mechanical arm end coordinate system; this comprises: the homogeneous matrix of the target object under the base coordinate system is set equal to the homogeneous matrix of the tool tip under the base coordinate system, namely M_target2base = M_tool2base; according to the coordinate system conversion relation, the homogeneous matrix of the tool tip under the mechanical arm end coordinate system is transferred to the mechanical arm base coordinate system: M_tool2base = M_gripperWithTool2base * M_tool2gripper;
from the two formulas above, the homogeneous matrix of the position the mechanical arm end should reach under the mechanical arm base coordinate system is calculated as M_gripperWithTool2base = M_target2base * (M_tool2gripper)^-1;
the 4×4 homogeneous matrix is decomposed into a 3×3 rotation matrix and a translation vector (x, y, z) by matrix decomposition; the rotation matrix is converted into a rotation vector (Rx, Ry, Rz), and the translation vector and rotation vector are provided to the moveL interface of the mechanical arm to move linearly to the specified position.
5. The method for sensing a manipulator arm based on multi-modal data fusion according to claim 1, wherein in the steps of performing camera calibration and hand-eye calibration, the internal reference coefficients of the camera are obtained first, comprising:
after the camera is started, the topic message camera_info of the camera is checked, and the internal reference coefficient and distortion coefficient of the camera can be checked;
calibrating the camera, shooting a plurality of pictures with different poses on the calibration plate by adopting the camera, wherein the three-dimensional actual coordinates of each corner point of the calibration plate and the two-dimensional pixel coordinates on the picture are in one-to-one correspondence, and thus the correspondence between the world actual coordinates and the pixel coordinates can be obtained;
calibrating the camera with the calibrateCamera function of opencv, inputting the 2D point coordinates and 3D point coordinates of the calibration plate and the size of the calibration plate grid, and outputting the internal reference matrix, distortion coefficients and external reference coefficients of the camera, wherein the external reference coefficients comprise rotation vectors and translation vectors.
6. The method for sensing a manipulator arm based on multi-modal data fusion of claim 5, wherein the step of calibrating the camera specifically comprises:
preparing a calibration plate with an 11 × 8 grid, and fixing the calibration plate on a wall surface;
the position of the base coordinate system of the mechanical arm is kept unchanged, the end of the mechanical arm is moved with the mechanical arm teach pendant, and the camera moves along with the end of the mechanical arm;
the end of the mechanical arm is moved to a position, a picture is taken, and the position and orientation information displayed on the teach pendant is recorded;
the end of the mechanical arm is moved to another position and the shooting and recording are repeated, 14 pictures with different positions and angles are taken in total, and 14 groups of data are recorded;
programming by adopting VScode programming software and C++ programming language, setting parameters of the number of pictures, the size of each checkerboard on the calibration plate and the number of corner points of each row and each column on the calibration plate, extracting 88 corner points on the calibration plate and storing, and then carrying out sub-pixel refinement on each corner point, wherein the aim is to refine the corner points extracted in the prior course and store sub-pixel corner points;
assuming that the calibration plate is placed on the plane Z = 0 in the world coordinate system, the three-dimensional coordinates of each corner point on the calibration plate are initialized, and the camera is calibrated with the calibrateCamera function of opencv; the input is the three-dimensional coordinates and corresponding two-dimensional coordinates of each corner point, and the output is the internal reference matrix and distortion coefficients of the camera together with the rotation vector and translation vector of each picture;
and evaluating the calibration result, carrying out reprojection calculation on the three-dimensional coordinates of each angular point through the obtained internal and external parameters of the camera to obtain a new projection point, and then carrying out error estimation with the previous two-dimensional projection point to obtain the average pixel error of each image.
7. The method for sensing a manipulator arm based on multi-modal data fusion of claim 6, wherein the step of performing hand-eye calibration comprises:
the camera calibration yields the external parameter matrices corresponding to the 14 groups of images, namely a rotation vector and a translation vector for each image, and the rotation-vector representation of the orientation is converted into a rotation-matrix representation;
the pose matrix of the mechanical arm end under the mechanical arm base coordinate system corresponding to each image consists of a translation vector and Euler angles, and the Euler-angle representation is converted into a rotation-matrix representation so that it can later serve as an input to the opencv hand-eye calibration computation;
adopting the calibrateHandEye function of opencv, wherein the input is the rotation matrix and translation vector of the mechanical arm end under the mechanical arm base coordinate system and the rotation matrix and translation vector of the calibration plate under the camera coordinate system, and the output is the hand-eye calibration matrix; the calibration matrix at this point consists of a rotation matrix and a translation vector and is converted into a 4 × 4 hand-eye calibration homogeneous matrix;
the hand-eye calibration accuracy test is performed, because the relative positions of the calibration plate and the mechanical arm base coordinate system are kept unchanged, the calibration effect is verified, the homogeneous matrix at the tail end of the mechanical arm under the mechanical arm base coordinate system, the hand-eye calibration homogeneous matrix and the homogeneous matrix of the calibration plate under the camera coordinate system corresponding to each image are multiplied, and then a translation matrix is extracted from the result;
and (3) carrying out error analysis, wherein the 14 groups of translation matrixes can observe the calibration effect, and the maximum error in the X, Y, Z direction is obtained to determine whether the requirements are met.
8. The method for sensing the mechanical arm based on multi-mode data fusion according to claim 1, wherein the motion process of the mechanical arm comprises:
logging in a mechanical arm, initializing initial angles of 6 joints by the mechanical arm, and ensuring that a camera at the tail end of the mechanical arm can observe a target object;
initializing the ROS, subscribing RGB topic information and depth information of the camera, carrying out image recognition, and returning recognition information, wherein the recognition information comprises a category and a pixel of a center point of a recognition frame;
establishing a state machine, and judging the running state of the mechanical arm and whether a camera identifies a target object or not; wherein, the running state of the mechanical arm includes: the mechanical arm is in a standby state, and the robot can normally walk at the moment; the mechanical arm is in an abnormal state, and the robot stops waiting at the moment; the mechanical arm is in a working state, and the robot stops waiting at the moment; the camera on the end of the mechanical arm does not observe the target;
identifying a target object, and judging whether the distance between the target object and a base coordinate system of the mechanical arm exceeds the control range of the mechanical arm;
newly starting a thread, releasing the ROS topic message in real time, and sending the running state of the real-time mechanical arm to the robot main control;
calculating the position of the target object under the base coordinate system of the mechanical arm according to the pixel and depth information of the center of the target object and the internal reference of the camera, and giving an initial pose;
the mechanical arm is operated to the front of the target object, the target object is grabbed, a reasonable path planning is designed by utilizing the SDK of the mechanical arm, the mechanical arm is operated, and the mechanical arm is restored to an initial state after the mechanical arm is operated;
the mechanical arm tail end tool uses a mechanical arm dexterous hand with a touch system to grasp or operate the target object, and judges whether the target object is successfully grasped or not or whether the target object is successfully operated or not through the touch system and feedback information.
CN202311297470.4A 2023-10-09 2023-10-09 Mechanical arm sensing method based on multi-mode data fusion Pending CN117103277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311297470.4A CN117103277A (en) 2023-10-09 2023-10-09 Mechanical arm sensing method based on multi-mode data fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311297470.4A CN117103277A (en) 2023-10-09 2023-10-09 Mechanical arm sensing method based on multi-mode data fusion

Publications (1)

Publication Number Publication Date
CN117103277A true CN117103277A (en) 2023-11-24

Family

ID=88796649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311297470.4A Pending CN117103277A (en) 2023-10-09 2023-10-09 Mechanical arm sensing method based on multi-mode data fusion

Country Status (1)

Country Link
CN (1) CN117103277A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117381800A (en) * 2023-12-12 2024-01-12 菲特(天津)检测技术有限公司 Hand-eye calibration method and system
CN117381800B (en) * 2023-12-12 2024-02-06 菲特(天津)检测技术有限公司 Hand-eye calibration method and system

Similar Documents

Publication Publication Date Title
JP2019508273A (en) Deep-layer machine learning method and apparatus for grasping a robot
US11911912B2 (en) Robot control apparatus and method for learning task skill of the robot
CN113677486A (en) System and method for constraint management of one or more robots
JP2013193202A (en) Method and system for training robot using human assisted task demonstration
WO2016193781A1 (en) Motion control system for a direct drive robot through visual servoing
CN113379849B (en) Robot autonomous recognition intelligent grabbing method and system based on depth camera
Lepora et al. Pose-based tactile servoing: Controlled soft touch using deep learning
CN117103277A (en) Mechanical arm sensing method based on multi-mode data fusion
Mavrakis et al. Analysis of the inertia and dynamics of grasped objects, for choosing optimal grasps to enable torque-efficient post-grasp manipulations
CN113103230A (en) Human-computer interaction system and method based on remote operation of treatment robot
Huang et al. Grasping novel objects with a dexterous robotic hand through neuroevolution
Shahzad et al. A vision-based path planning and object tracking framework for 6-DOF robotic manipulator
Nuzzi et al. Hands-Free: a robot augmented reality teleoperation system
Nandikolla et al. Teleoperation robot control of a hybrid eeg-based bci arm manipulator using ros
Makita et al. Offline direct teaching for a robotic manipulator in the computational space
US20220032468A1 (en) Robotic drawing
CN112384335A (en) System and method for natural task assignment for one or more robots
CN211890823U (en) Four-degree-of-freedom mechanical arm vision servo control system based on RealSense camera
CN114888768A (en) Mobile duplex robot cooperative grabbing system and method based on multi-sensor fusion
KR20230138487A (en) Object-based robot control
JP2021061014A (en) Learning device, learning method, learning model, detector, and gripping system
Lin Embedding Intelligence into Robotic Systems-Programming, Learning, and Planning
Rivera-Calderón et al. Online assessment of computer vision and robotics skills based on a digital twin
Silva et al. Aros: An anthropomorphic robot for human-robot interaction and coordination studies
Jiménez et al. Autonomous object manipulation and transportation using a mobile service robot equipped with an RGB-D and LiDAR sensor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination