CN116587280A - Robot 3D laser vision disordered grabbing control method, medium and system - Google Patents

Robot 3D laser vision disordered grabbing control method, medium and system

Info

Publication number
CN116587280A
Authority
CN
China
Prior art keywords
workpiece
image
grabbing
robot
pose
Prior art date
Legal status
Pending
Application number
CN202310674934.2A
Other languages
Chinese (zh)
Inventor
吴清生
黄华飞
王安丽
Current Assignee
Anhui Yunhua Intelligent Equipment Co ltd
Original Assignee
Anhui Yunhua Intelligent Equipment Co ltd
Priority date
Filing date
Publication date
Application filed by Anhui Yunhua Intelligent Equipment Co ltd
Priority to CN202310674934.2A
Publication of CN116587280A
Legal status: Pending

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1679 Programme controls characterised by the tasks executed
    • B25J9/1692 Calibration of manipulator
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 Vision controlled systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)

Abstract

The invention provides a robot 3D laser vision disordered grabbing control method, medium and system, belonging to the technical field of machine grabbing, and comprising the following steps: acquiring three-dimensional models of the robot, the fixture and the material frame; comparing the imported three-dimensional model of the workpiece with the 3D image captured by the camera to identify the pose of the workpiece; acquiring the pose conversion relationship between the robot and the 3D camera and completing hand-eye calibration between the robot and the 3D camera; acquiring the 2D image, depth image and point cloud view of the target workpiece stack obtained by the 3D camera and forming a workpiece stack fusion image; computing on the workpiece stack fusion image with a pre-trained unordered grabbing model to obtain the optimal grabbing workpiece and its pose; and controlling the robot to grab the workpiece according to the optimal grabbing workpiece and its pose. The method, medium and system can solve the technical problem of low grabbing efficiency caused by poor real-time performance when a robot performs disordered grabbing with 3D laser vision.

Description

Robot 3D laser vision disordered grabbing control method, medium and system
Technical Field
The invention belongs to the technical field of machine grabbing, and particularly relates to a robot 3D laser vision disordered grabbing control method, medium and system.
Background
To date, research on industrial robots built on machine vision technology has become increasingly extensive, has attracted great attention, and has produced many scientific and technical achievements and good applications in actual production. However, as the pace of automation keeps rising, the shortcomings of traditional industrial production modes are exposed more and more and can hardly meet modern production needs. Some businesses still sort products manually, for example for box sealing, boxing and handling. To reduce the labor intensity of workers and improve production efficiency, machine vision has become the preferred application mode and has effectively improved and advanced traditional operation methods. Nowadays more and more industrial robot systems integrate vision interfaces, and vision technology working with the robot realizes the positioning, classification and detection of target objects. The core of machine vision is to process the images collected by the camera, extract their feature information and infer the higher-level semantics of the image, thereby replacing the human eye and brain to complete assigned tasks, and even tasks that human eyes cannot complete in special environments.
In the actual production process, objects differ greatly in shape, size, surface texture and other characteristics, and their placement is also disordered. Traditional grabbing control methods usually need to model the object in advance and grab it with a preset grabbing strategy. When dealing with disordered objects, however, such methods have low grabbing efficiency and impose strict requirements on the shape and placement of the objects, which limits their practical application.
To solve the above problems, researchers have begun exploring computer vision techniques to assist robotic grabbing control. Computer vision can perceive and analyze the scene to obtain the position, posture and other information of an object, providing a basis for robot grabbing. However, conventional computer vision techniques are mainly based on two-dimensional image processing, and their effect on grabbing objects in three-dimensional space is limited. How to use three-dimensional vision to grab disordered objects quickly and accurately has therefore become a problem to be solved in the field of robot grabbing control.
In recent years, three-dimensional laser vision has received wide attention and study. It obtains three-dimensional point cloud data of an object with a laser scanner and then identifies and locates the object with point cloud processing algorithms. Compared with traditional two-dimensional vision, three-dimensional laser vision offers higher accuracy in acquiring spatial information and greater potential for disordered robotic grabbing.
However, existing robot grabbing control methods based on three-dimensional laser vision still have shortcomings. First, in point cloud processing, existing methods often adopt complex computation methods and models, resulting in a large computational load and poor real-time performance. Second, in the grabbing strategy, existing methods mainly rely on manually designed features and rules, whose generalization ability is weak and whose grabbing effect on different kinds of objects is limited. In addition, existing methods often need multiple grabbing attempts when handling disordered objects, so the grabbing efficiency is low.
Disclosure of Invention
In view of the above, the invention provides a robot 3D laser vision disordered grabbing control method, medium and system, which can solve the technical problem of low grabbing efficiency caused by poor real-time performance when a robot uses 3D laser vision for disordered grabbing.
The invention is realized in the following way:
the first aspect of the invention provides a robot 3D laser vision disordered grabbing control method, which comprises the following steps:
S10, acquiring three-dimensional models of the robot, the fixture and the material frame;
S20, comparing the imported three-dimensional model of the workpiece with the 3D image captured by the camera, and identifying the pose of the workpiece;
S30, acquiring the pose conversion relationship between the robot and the 3D camera, and completing hand-eye calibration between the robot and the 3D camera;
S40, acquiring the 2D image, depth image and point cloud view of the target workpiece stack obtained by the 3D camera, and forming a workpiece stack fusion image;
S50, computing on the workpiece stack fusion image with the pre-trained unordered grabbing model to obtain the optimal grabbing workpiece and its pose;
S60, controlling the robot to grab the workpiece according to the optimal grabbing workpiece and its pose;
wherein the optimal grabbing workpiece is the workpiece whose grabbing has the least influence on the poses of the other workpieces.
On the basis of the technical scheme, the robot 3D laser vision disordered grabbing control method can be further improved as follows:
the step of comparing the three-dimensional model of the imported workpiece with the 3D image shot by the camera to identify the pose of the workpiece specifically comprises the following steps:
preprocessing a three-dimensional model of a workpiece to obtain a point cloud representation;
extracting the features of the 3D image shot by the camera and the imported workpiece three-dimensional model by adopting a PFH or FPFH algorithm;
performing feature matching, and adopting nearest neighbor searching and a random sample consistency algorithm;
Estimating a pose transformation matrix between the feature matching point pairs;
and applying the estimated pose transformation matrix to the imported workpiece three-dimensional model to obtain the pose of the workpiece three-dimensional model in the 3D image shot by the camera.
The step of acquiring the pose conversion relationship between the robot and the 3D camera and completing hand-eye calibration between the robot and the 3D camera specifically comprises:
setting parameters of a 3D camera and a calibration plate;
controlling the 3D camera to shoot and collect calibration plate data of different poses;
calculating the calibration result according to the added list of calibration points;
and optimizing and reducing errors of the calculation result according to the calibration precision, and finally obtaining the pose conversion relation between the robot and the 3D camera to finish the hand-eye calibration between the robot and the 3D camera.
The step of acquiring a 2D image, a depth image and a point cloud view of a target workpiece stack obtained by a 3D camera and forming a workpiece stack fusion image specifically comprises the following steps:
acquiring a 2D image, a depth image and a point cloud view of a target workpiece stack obtained by a 3D camera;
registering the 2D image and the depth image to obtain a depth image corresponding to the 2D image;
registering the point cloud data with the 2D image, and endowing color information to the point cloud data;
And carrying out downsampling and filtering processing on the fused point cloud data so as to reduce data quantity and noise.
The steps of establishing and training the unordered grabbing model specifically comprise:
constructing a training sample comprising a plurality of pairs of workpiece pile images; the workpiece pile image pair comprises a workpiece pile fusion image, which is marked as a first image, and a workpiece pile fusion image after one workpiece is randomly taken out in the first image, which is marked as a second image;
and establishing an unordered grabbing model prototype by using the convolutional neural network, and training by using a training sample to obtain an unordered grabbing model.
The step of controlling the robot to grab the workpiece according to the optimal grabbing workpiece and its pose specifically comprises the following steps:
step 1: recording the workpiece stack fusion image as a first fusion image;
step 2: controlling the robot to grasp the optimal grasping workpiece in the first fusion image;
step 3: after the optimal grabbing workpiece has been grabbed, acquiring a second fusion image of the workpiece stack at that moment;
step 4: calculating the similarity between the second fusion image and the first fusion image; if the similarity is greater than or equal to the grabbing threshold, deleting the optimally grabbed workpiece from the first fusion image, taking the result as the new first fusion image, and calculating the optimal grabbing workpiece of that first fusion image;
step 5: iteratively executing steps 2-4 until the similarity is smaller than the grabbing threshold, then taking the current image as the first fusion image, calculating the optimal grabbing workpiece of the first fusion image, and iteratively executing steps 2-5.
Further, in step 5, if the loop of step 2-4 is executed iteratively for more than 10-20 times, the current image is used as the first fused image, the optimal workpiece to be grasped of the first fused image is calculated, and step 2-step 5 is executed iteratively.
The beneficial effect of adopting the above improvement is that it sets a maximum number of loop iterations, which prevents the grabbing error from gradually increasing because a fixed image is always used as the first fusion image.
Further, the step 2 of controlling the robot to grasp the optimal grasping workpiece in the first fused image further includes a step of accurately determining a pose of the optimal grasping workpiece, and specifically includes:
deleting the part except the optimal grabbing workpiece in the first fusion image, and only reserving the optimal grabbing workpiece as an image to be grabbed;
and carrying out pose recognition on the optimal grabbing workpiece to obtain the pose of the optimal grabbing workpiece.
A second aspect of the present invention provides a computer readable storage medium storing program instructions which, when executed, perform the robot 3D laser vision disordered grabbing control method described above.
A third aspect of the present invention provides a robot 3D laser vision disordered grabbing control system comprising the computer readable storage medium described above.
Compared with the prior art, the robot 3D laser vision disordered grabbing control method, medium and system provided by the invention have the following beneficial effects. A disordered grabbing model is first established with a convolutional neural network; operating on the workpiece stack fusion image yields the optimal grabbing workpiece and its pose, and the robot is controlled to grab that workpiece. After the grab, the fusion image of the remaining workpiece stack changes very little, i.e. most workpieces do not change pose, so the grabbed workpiece is deleted from the fusion image to produce a new image, which is then processed to find the next optimal grabbing workpiece. Because the fusion image changes so little before and after a grab, the pre-grab image held in memory, with the grabbed workpiece deleted, can be used directly to search for the next optimal grabbing workpiece. Compared with taking a new photograph and computing a new fusion image for every grab, this reduces the amount of image processing, raises processing efficiency, effectively shortens computation time and improves grabbing real-time performance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of the method provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, based on the embodiments of the invention, which are apparent to those of ordinary skill in the art without inventive faculty, are intended to be within the scope of the invention.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, based on the embodiments of the invention, which are apparent to those of ordinary skill in the art without inventive faculty, are intended to be within the scope of the invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
As shown in fig. 1, the first aspect of the present invention provides a robot 3D laser vision disordered grabbing control method, which includes the following steps:
S10, acquiring three-dimensional models of the robot, the fixture and the material frame;
S20, comparing the imported three-dimensional model of the workpiece with the 3D image captured by the camera, and identifying the pose of the workpiece;
S30, acquiring the pose conversion relationship between the robot and the 3D camera, and completing hand-eye calibration between the robot and the 3D camera;
S40, acquiring the 2D image, depth image and point cloud view of the target workpiece stack obtained by the 3D camera, and forming a workpiece stack fusion image;
S50, computing on the workpiece stack fusion image with the pre-trained unordered grabbing model to obtain the optimal grabbing workpiece and its pose;
S60, controlling the robot to grab the workpiece according to the optimal grabbing workpiece and its pose;
wherein the optimal grabbing workpiece is the workpiece whose grabbing has the least influence on the poses of the other workpieces.
In step S10, obtaining a three-dimensional map of the robot, the fixture, and the material frame is one of key steps for implementing the robot 3D laser vision disordered grabbing control method. To obtain these three-dimensional maps, we need to first know the structure and parameters of the robot and then build a corresponding three-dimensional model. The specific implementation mode is as follows:
Acquiring structure and parameters of robot
First, the structure and parameters of the robot need to be acquired. The structure of a robot generally includes a plurality of joints and links, and the manner in which the joints and links are connected and the relative positional relationship therebetween determine the motion performance of the robot. Therefore, parameters such as length, angle, etc. of each joint and link of the robot need to be measured or queried. These parameters may be obtained from technical data provided by the robot manufacturer or from self-measurements.
Establishing a three-dimensional model of a robot
Based on the obtained robot structure and parameters, a three-dimensional model of the robot is built by using Computer-Aided Design (CAD) software, such as SolidWorks, Pro/E, etc. In the modeling process, the following points need to be noted:
the model should reflect the structure and parameters of the actual robot as accurately as possible to ensure the precision in the subsequent grabbing control process;
each joint in the model needs to be set as a movable joint so as to realize the movement of the joint in the subsequent grabbing control process;
the three-dimensional model of the jig and the material frame should be included in the model so that their influence is taken into consideration in the subsequent grip control process.
Acquiring three-dimensional figures of robot, clamp and material frame
After the three-dimensional models of the robot, the jig and the material frame are established, the models need to be exported into a three-dimensional map. The three-dimensional map may use a general three-dimensional map format such as STL, OBJ, etc. These formats may be conveniently exchanged and processed between different computer platforms and software.
In deriving the three-dimensional map, attention is paid to the following points:
the resolution of the three-dimensional map should be set according to the actual requirements. The higher the resolution ratio is, the higher the precision of the three-dimensional graph is, but the larger the calculation amount is, and the real-time performance of the follow-up grabbing control process can be affected;
the coordinate system in the three-dimensional graph is consistent with the coordinate system of the actual robot so as to carry out coordinate transformation in the follow-up grabbing control process;
in order to facilitate subsequent processing, three-dimensional graphs of the robot, the clamp and the material frame can be respectively exported;
after three-dimensional modeling is completed, a coordinate system needs to be established
After the three-dimensional modeling is completed, a unified coordinate system needs to be established to describe the relative positional relationships among the robot, the fixture and the material frame. In general, the coordinate system of the robot base can be taken as the world coordinate system W, and local coordinate systems T_C and T_F can be established at the reference points of the fixture and the material frame, respectively. By establishing these coordinate systems, the pose relationships among the components can be conveniently described.
Coordinate transformation
In order to describe the relative positional relationship among the robot, the jig, and the material frame, coordinate transformation is required. The coordinate transformation may be performed by the following formula:
T_WC = T_WF · T_FC;
wherein T_WC denotes the transformation matrix from the fixture coordinate system T_C to the world coordinate system W; T_WF denotes the transformation matrix from the material frame coordinate system T_F to the world coordinate system W; and T_FC denotes the transformation matrix from the fixture coordinate system T_C to the material frame coordinate system T_F. These transformation matrices may be obtained with three-dimensional modeling software or measurement devices (e.g., laser trackers, optical gauges, etc.).
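For example, assuming the poses are represented as 4x4 homogeneous transformation matrices, the composition T_WC = T_WF · T_FC can be sketched as follows (all numeric values are illustrative placeholders):

```python
import numpy as np

def make_transform(R, t):
    """Assemble a 4x4 homogeneous transform from a 3x3 rotation R and a translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Placeholder poses: material frame F in the world W, and fixture C in the frame F.
T_WF = make_transform(np.eye(3), np.array([1.0, 0.0, 0.0]))
T_FC = make_transform(np.eye(3), np.array([0.0, 0.5, 0.2]))

# Chained transform: the fixture coordinate system expressed in the world frame W.
T_WC = T_WF @ T_FC
```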
Generating point cloud data of robots, fixtures and material frames
For subsequent workpiece recognition and grabbing planning, the three-dimensional model needs to be converted into point cloud data. This may be achieved with point cloud generation tools such as MeshLab or the Point Cloud Library (PCL). The point cloud data contain the geometric and color information of the model and can be used for workpiece identification and pose estimation.
Data fusion
After the point cloud data of the robot, the fixture and the material frame are acquired, the data are fused into a unified data structure. This can be done by the following formula:
P_W = T_WC · P_C + T_WF · P_F;
wherein P_W denotes the fused point cloud data, P_C denotes the point cloud data of the fixture, and P_F denotes the point cloud data of the material frame. Through data fusion, the relative positional relationships of the robot, the fixture and the material frame can be described in a unified coordinate system, which provides basic data for subsequent workpiece identification and grabbing planning and avoids interference when planning the robot's grabbing path.
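A minimal numpy sketch of this fusion step, assuming the transforms and point clouds are already available (the random arrays below are stand-ins for the exported model clouds):

```python
import numpy as np

def transform_points(T, points):
    """Apply a 4x4 homogeneous transform T to an (N, 3) array of points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :3]

# Placeholder transforms and clouds; in practice T_WC and T_WF come from the
# modeling/measurement step and P_C, P_F from the exported three-dimensional models.
T_WC = np.eye(4)
T_WF = np.eye(4)
P_C = np.random.rand(500, 3)   # fixture cloud in the fixture frame
P_F = np.random.rand(500, 3)   # material-frame cloud in its own frame

# Fused cloud P_W: both clouds expressed in the world frame W.
P_W = np.vstack([transform_points(T_WC, P_C), transform_points(T_WF, P_F)])
```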
In the above technical solution, the step of comparing the three-dimensional model of the introduced workpiece with the 3D image shot by the camera to identify the pose of the workpiece specifically includes:
preprocessing a three-dimensional model of a workpiece to obtain a point cloud representation;
extracting the features of the 3D image shot by the camera and the imported workpiece three-dimensional model by adopting a PFH or FPFH algorithm;
performing feature matching, and adopting nearest neighbor searching and a random sample consistency algorithm;
estimating a pose transformation matrix between the feature matching point pairs;
and applying the estimated pose transformation matrix to the imported workpiece three-dimensional model to obtain the pose of the workpiece three-dimensional model in the 3D image shot by the camera.
In step S20, we need to compare the three-dimensional model of the introduced workpiece with the 3D image captured by the camera, and identify the pose of the workpiece. The key of the step is to realize quick recognition and pose estimation of the workpiece in a disordered environment. To achieve this object, we can employ the following embodiments.
First, we need to pre-process the three-dimensional model of the workpiece. When importing a three-dimensional model of a workpiece, we can use a point cloud representation, i.e. discretizing the surface of the workpiece into a set of points. These points may be calculated by scanning the physical workpiece with a three-dimensional scanner or using a three-dimensional model derived using CAD software. The workpiece model represented by the point cloud can be conveniently compared and matched with the 3D image shot by the camera.
Next, we need to extract features from the 3D image captured by the camera. A feature is a mathematical quantity describing attributes such as the surface shape and texture of an object; it can be used to distinguish different objects and estimate their pose. To extract features from the 3D image, the following algorithms can be employed:
A Point Feature Histogram (PFH) algorithm is used to extract the local feature of each point in the point cloud. The PFH feature is derived by computing the geometric relationships between the k nearest neighbors of each point. Specifically, for each point p_i in the point cloud, its k nearest neighbors p_i1, p_i2, ..., p_ik are found, and then the normal-vector angle differences α and β and the curvature γ between these points are calculated. These values are binned into a multi-dimensional histogram to obtain the PFH feature of the point p_i.
A Fast Point Feature Histogram (FPFH) algorithm is used to extract the local feature of each point in the point cloud. The FPFH feature is an improvement on the PFH feature and is obtained by computing the relative positional relationships between the k nearest neighbors of each point. Specifically, for each point p_i in the point cloud, its k nearest neighbors p_i1, p_i2, ..., p_ik are found, and then the distance d, angle θ and curvature φ between these points are calculated. These values are binned into a multi-dimensional histogram to obtain the FPFH feature of the point p_i.
After extracting the features of the 3D image captured by the camera and the imported three-dimensional model of the workpiece, we need to perform feature matching. Feature matching is to find out the corresponding point pairs by calculating the similarity between two sets of features. To achieve feature matching, we can employ the following algorithm:
A nearest neighbor search (NNS) algorithm is used to calculate the distance between the two sets of features. Given a feature f_q to be matched, the closest feature f_i is found in the target feature set {f_1, f_2, ..., f_n}. This process may be accelerated by data structures such as a kd-tree or a ball tree.
A random sample consensus (RANSAC) algorithm is used to estimate the pose transformation between the feature matching point pairs. Given a set of matched point pairs (p_q1, p_t1), (p_q2, p_t2), ..., (p_qm, p_tm), the RANSAC algorithm finds an optimal pose transformation matrix T that minimizes the distance between the transformed point pairs. Specifically, RANSAC randomly selects a subset of point pairs, computes the corresponding pose transformation matrix T_i, applies T_i to all point pairs, and computes the distances between the transformed point pairs. This process is repeated several times, and the pose transformation matrix T that minimizes the distance is selected as the optimal solution.
Finally, the estimated pose transformation matrix T can be applied to the imported workpiece three-dimensional model to obtain the pose of the workpiece three-dimensional model in the 3D image shot by the camera. Thus, the workpiece pose recognition is completed.
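A minimal sketch of this FPFH-plus-RANSAC pipeline using the Open3D library; the file names, voxel size and distance thresholds are assumptions, and keyword names may differ slightly between Open3D releases:

```python
import open3d as o3d

def preprocess(pcd, voxel_size=0.005):
    """Downsample one cloud, estimate normals, and compute FPFH descriptors."""
    down = pcd.voxel_down_sample(voxel_size)
    down.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=voxel_size * 2, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        down, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel_size * 5, max_nn=100))
    return down, fpfh

# "model.ply" (imported workpiece model) and "scene.ply" (camera 3D image) are placeholders.
model, model_fpfh = preprocess(o3d.io.read_point_cloud("model.ply"))
scene, scene_fpfh = preprocess(o3d.io.read_point_cloud("scene.ply"))

# RANSAC over FPFH correspondences estimates the pose transformation matrix T.
dist = 0.01
result = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
    model, scene, model_fpfh, scene_fpfh,
    mutual_filter=True,
    max_correspondence_distance=dist,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(False),
    ransac_n=4,
    checkers=[o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(dist)],
    criteria=o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))
T = result.transformation  # estimated pose of the workpiece model in the camera frame
```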
In another embodiment of this step, the following embodiments may be employed:
first, a 3D image photographed by a camera needs to be preprocessed. Due to the problems of illumination, shielding and the like in an actual application scene, noise and incomplete conditions may exist in the acquired 3D image. Therefore, before workpiece recognition, the 3D image needs to be subjected to noise reduction, filtering, and other processes to improve the accuracy of recognition. Specifically, the following method may be used:
Noise points in the 3D image are excluded using an outlier removal algorithm, such as a random sample consensus (RANSAC) algorithm.
The 3D image is subjected to smoothing processing, for example, using a gaussian filter or a bilateral filter, etc., to eliminate high-frequency noise in the image.
Morphological operations, such as opening and closing, are used to remove isolated points and fill holes in the image.
After the pretreatment is completed, the treated 3D image is required to be registered with a three-dimensional model of the workpiece so as to obtain the pose of the workpiece. In order to achieve efficient registration, the following method may be employed:
feature descriptors such as a Point Feature Histogram (PFH) or a Fast Point Feature Histogram (FPFH) are used to extract feature points in the 3D image and the workpiece three-dimensional model. The feature descriptors can effectively describe local geometric information of points in the point cloud, and are beneficial to realizing efficient point cloud registration.
And estimating a rigid body transformation matrix between the 3D image and the workpiece three-dimensional model by using a random sampling consistency (RANSAC) algorithm or a least square method and other methods, so as to realize coarse registration of the point cloud. Specifically, the following steps may be employed:
(1) Randomly selecting a point pair from the characteristic points of the 3D image and the workpiece three-dimensional model, and calculating the distance between the two point pairs;
(2) Setting a threshold value, taking the point pairs with the distance smaller than the threshold value as inner points and taking other point pairs as outer points;
(3) Solving a rigid body transformation matrix between inner points by a least square method and other methods;
(4) Repeating the above process for a plurality of times, and selecting the rigid body transformation matrix with the largest number of inner points as a rough registration result.
Based on coarse registration, fine registration of the point cloud is achieved using an Iterative Closest Point (ICP) algorithm or a variation thereof (e.g., point-to-plane ICP, nonlinear least squares, etc.). Specifically, the following steps may be employed:
(1) According to the rough registration result, initially aligning the 3D image with the workpiece three-dimensional model;
(2) Finding the nearest point in the three-dimensional model of the workpiece for each point in the 3D image;
(3) Calculating the distance between the point pairs, and solving the minimum distance and the corresponding rigid body transformation matrix;
(4) And updating the pose of the 3D image, and repeating the process until the preset convergence condition (such as iteration times, error thresholds and the like) is met.
By the method, registration of the 3D image and the workpiece three-dimensional model can be achieved, and therefore the pose of the workpiece is obtained. It should be noted that in practical applications, the above method may need to be appropriately adjusted and optimized for specific scenarios and requirements. For example, a deep learning method (e.g., pointNet, 3DMatch, etc.) may be used to extract point cloud features to improve accuracy and robustness of registration; meanwhile, strategies such as multi-view fusion, layering optimization and the like can be adopted, so that the registering effect is further improved.
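A compact sketch of the fine-registration step described above, using Open3D's point-to-plane ICP; the file names and thresholds are illustrative, and T_coarse would normally come from the coarse registration rather than the identity used here to keep the sketch runnable:

```python
import numpy as np
import open3d as o3d

def refine_pose(model, scene, T_coarse, threshold=0.005):
    """Refine a coarse rigid transform with point-to-plane ICP (Open3D)."""
    scene.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=0.01, max_nn=30))
    result = o3d.pipelines.registration.registration_icp(
        model, scene, threshold, T_coarse,
        o3d.pipelines.registration.TransformationEstimationPointToPlane(),
        o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration=50))
    return result.transformation

# "model.ply" / "scene.ply" are placeholder file names.
model = o3d.io.read_point_cloud("model.ply")
scene = o3d.io.read_point_cloud("scene.ply")
T_workpiece = refine_pose(model, scene, np.eye(4))
```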
In the above technical solution, the step of obtaining the pose conversion relationship between the robot and the 3D camera and completing the hand-eye calibration between the robot and the 3D camera specifically includes:
setting parameters of a 3D camera and a calibration plate;
controlling the 3D camera to shoot and collect calibration plate data of different poses;
calculating the calibration result according to the added list of calibration points;
and optimizing and reducing errors of the calculation result according to the calibration precision, and finally obtaining the pose conversion relation between the robot and the 3D camera to finish the hand-eye calibration between the robot and the 3D camera.
In step S30, a pose conversion relationship between the robot and the 3D camera is obtained, and hand-eye calibration between the robot and the 3D camera is completed. The hand-eye calibration is an important link in a robot vision system, and mainly solves the coordinate system relation between the tail end of a robot arm and a camera. The purpose of hand-eye calibration is to solve a transformation matrix from the tail end of the robot to a camera coordinate system, so that the robot can position and grasp a target object according to image information shot by the camera.
In order to realize the hand-eye calibration, the following method can be adopted:
(1) Calibration plate-based method
In this method, a calibration plate with specific marks is required. The calibration plate may be a checkerboard, circular mark, or other identifiable feature point set. First, it is necessary to place a calibration plate at the end of the robot and collect images photographed by the camera at different positions and attitudes. By identifying feature points in the image, feature point coordinates in the camera coordinate system can be calculated. And simultaneously, recording pose information of the tail end of the robot at each position and each pose. With the data, a transformation matrix from the tail end of the robot to a camera coordinate system can be solved by adopting a hand-eye calibration algorithm such as a Tsai-Lenz algorithm, a DLT algorithm and the like.
Specifically, a Tsai-Lenz algorithm can be adopted for hand-eye calibration. The basic principle of the Tsai-Lenz algorithm is to convert the hand-eye calibration problem into a linear least squares problem. First, the coordinates of each feature point in the camera coordinate system need to be calculated, and the following formula may be used:
X_c = R·X_w + T;
wherein X_c denotes the coordinates of a feature point in the camera coordinate system, X_w denotes its coordinates in the world coordinate system, R denotes the rotation matrix, and T denotes the translation vector. Expanding the formula gives:
[X_c, Y_c, Z_c]^T = [[r_11, r_12, r_13], [r_21, r_22, r_23], [r_31, r_32, r_33]] · [X_w, Y_w, Z_w]^T + [t_x, t_y, t_z]^T;
Since the coordinates of the feature points in both the world coordinate system and the camera coordinate system are known, the above formula can be converted into a system of linear equations, from which the rotation matrix R and the translation vector T are solved.
In practical use, a calibration plate can be installed at a suitable position and the robot's 3D calibration program run. The communication parameters between the Mech-Hub software and the industrial robot system are set, the robot is operated manually through the Mech-Viz software to obtain control of the industrial robot, and the parameters of the 3D camera and the calibration plate are set in Mech-Viz. The 3D camera is then controlled to capture calibration plate data at different poses, the list of calibration points is added, and the calibration result is calculated. The result is optimized and its error analyzed according to the calibration accuracy, finally giving the pose conversion relationship between the robot and the 3D camera and completing the hand-eye calibration between the industrial robot and the 3D camera.
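A hedged sketch of the Tsai-Lenz solution using OpenCV's calibrateHandEye (available in OpenCV >= 4.1); the pose lists below are random stand-ins that only keep the sketch runnable, and real data would come from the capture step described above:

```python
import cv2
import numpy as np

# For each of N calibration shots:
#   R_gripper2base, t_gripper2base - robot flange pose read back from the controller
#   R_target2cam,  t_target2cam    - calibration-board pose estimated by the 3D camera
N = 10
rand_R = lambda: cv2.Rodrigues(np.random.uniform(-0.5, 0.5, 3))[0]
R_gripper2base = [rand_R() for _ in range(N)]
t_gripper2base = [np.random.rand(3, 1) for _ in range(N)]
R_target2cam = [rand_R() for _ in range(N)]
t_target2cam = [np.random.rand(3, 1) for _ in range(N)]

# Tsai-Lenz solution of the hand-eye calibration equation.
R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
    R_gripper2base, t_gripper2base, R_target2cam, t_target2cam,
    method=cv2.CALIB_HAND_EYE_TSAI)
```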
(2) Method based on non-calibration plate
Besides the calibration plate, the method of non-calibration plate can also be used for hand-eye calibration. The method mainly utilizes the motion information of the tail end of the robot at different positions and postures and the image information shot by the camera to solve the hand-eye calibration problem. Typical non-calibrated plate methods are self-alignment based methods, motion relative relationship based methods, and the like.
Taking the method based on automatic alignment as an example, it is first necessary to fix the robot tip in a position and let the camera take a still scene. The robot tip may then be rotated along an axis while the images taken by the camera are recorded. By analyzing the feature point motion in the image, feature point coordinates in the camera coordinate system can be calculated. And simultaneously, recording pose information of the tail end of the robot at each position and each pose. With the data, a transformation matrix from the tail end of the robot to a camera coordinate system can be solved by adopting a hand-eye calibration algorithm such as a method based on a motion relative relation.
In the above technical solution, the step of acquiring the 2D map, the depth map and the point cloud view of the target workpiece stack obtained by the 3D camera and forming the workpiece stack fusion image specifically includes:
Acquiring a 2D image, a depth image and a point cloud view of a target workpiece stack obtained by a 3D camera;
registering the 2D image and the depth image to obtain a depth image corresponding to the 2D image;
registering the point cloud data with the 2D image, and endowing color information to the point cloud data;
and carrying out downsampling and filtering processing on the fused point cloud data so as to reduce data quantity and noise.
In step S40, we need to acquire a 2D map, a depth map and a point cloud view of the target workcell stack obtained by the 3D camera and form a workcell stack fusion image. Key technologies involved in this process include: 2D image acquisition, depth image acquisition, point cloud data acquisition and generation of a fusion image.
2D image acquisition
2D image acquisition captures the target workpiece stack with an RGB camera to obtain its color information. In this process, the camera needs to be calibrated to eliminate distortion and obtain its intrinsic and extrinsic parameters; a common method is Zhang Zhengyou's calibration method. After calibration, points in the pixel coordinate system can be converted into points in the camera coordinate system through the camera intrinsics.
Depth image acquisition
The depth image refers to an image in which the distance from each pixel point to the camera is recorded. Common depth image capture devices are ToF cameras, structured light cameras, etc. Similar to the 2D image, we also need to calibrate the depth camera. In the process of depth image acquisition, the problems of noise, invalid pixel points and the like of the depth image are required to be paid attention to, and corresponding filtering and complementing methods are adopted for processing.
Point cloud data acquisition
The point cloud data is a set of three-dimensional coordinate data representing discrete points of the surface of the target object. We can convert the pixels of the image into three-dimensional point cloud data by registration (alignment) of the 2D image and the depth image, and camera parameters. Common point cloud registration methods include ICP (Iterative Closest Point) algorithm and NDT (Normal Distribution Transform) algorithm.
Generation of fused images
After the acquisition of the 2D image, the depth image and the point cloud data is completed, the information needs to be fused into a workpiece pile fusion image. The fusion image contains color, depth and geometric information of the target workpiece stack, and provides rich input data for a subsequent unordered grabbing model. The specific fusion method is as follows:
first, the 2D image and the depth image are registered, and a depth image corresponding to the 2D image is obtained. Interpolation may be performed by bilinear interpolation or the like to obtain a depth image of the same size as the 2D image.
Secondly, registering the point cloud data with the 2D image, and endowing color information to the point cloud data. Here, the pixels of the 2D image may be mapped into the point cloud data by camera internal parameters, thereby assigning each point cloud point a corresponding color value.
And finally, carrying out downsampling and filtering processing on the fused point cloud data so as to reduce the data quantity and noise. The usual downsampling method is voxel grid filtering (Voxel Grid Filter), and the filtering method is statistical outlier filtering (Statistical Outlier Removal).
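For example, with the Open3D library (the file name and parameter values are assumptions):

```python
import open3d as o3d

# "fused.ply" is a placeholder for the colored, fused workpiece-stack cloud.
pcd = o3d.io.read_point_cloud("fused.ply")

# Voxel-grid downsampling reduces the point count.
down = pcd.voxel_down_sample(voxel_size=0.003)

# Statistical outlier removal suppresses isolated noise points.
clean, kept_indices = down.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
```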
In addition, the workpiece pile fusion image can be obtained by the following method:
First, for each pixel point (u, v) in the 2D image, the depth value z of that point can be read from the depth image. The pixel coordinates (u, v) can then be converted to coordinates (x, y) in the normalized camera coordinate system using the camera's intrinsic matrix K and distortion parameter d:
(x, y) = K^{-1} · (u, v, 1)^T - d;
next, we can calculate the coordinates (X, Y, Z) of the point in the world coordinate system:
[X, Y, Z]^T = z · [x, y, 1]^T;
in this way, we can map each pixel point in the 2D image to the corresponding three-dimensional coordinates in the point cloud data. Meanwhile, color information (such as RGB values) in the 2D image is also endowed to the corresponding three-dimensional points to form point cloud data with the color information.
And finally, taking the point cloud data with the color information as a workpiece pile fusion image. The fusion image contains three-dimensional structure, color and texture information of the target workpiece stack, and can provide richer information for the subsequent unordered grabbing model.
Through the steps, the fusion image of the target workpiece stack can be obtained, and rich input data is provided for a subsequent unordered grabbing model.
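A minimal numpy sketch of this pixel-to-point mapping, assuming the depth image is already registered to the 2D image and lens distortion has been corrected (units and array shapes are assumptions):

```python
import numpy as np

def depth_to_colored_cloud(depth, rgb, K):
    """Back-project a registered depth image into a colored point cloud.

    depth: (H, W) array of depth values; rgb: (H, W, 3) uint8 image; K: 3x3 intrinsic matrix.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    valid = points[:, 2] > 0            # drop invalid zero-depth pixels
    return points[valid], colors[valid]
```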
In the above technical solution, the steps of establishing and training the unordered grabbing model specifically include:
constructing a training sample comprising a plurality of pairs of workpiece pile images; the workpiece pile image pair comprises a workpiece pile fusion image, which is marked as a first image, and a workpiece pile fusion image after one workpiece is randomly taken out in the first image, which is marked as a second image;
and establishing an unordered grabbing model prototype by using the convolutional neural network, and training by using a training sample to obtain an unordered grabbing model.
In step S50, we will operate on the workpiece stack fusion image according to the pre-trained unordered grabbing model to obtain the optimal grabbing workpiece and its coordinates and pose. The specific implementation mode is as follows:
feature extraction
First, we need to extract features from the workpiece stack fusion image; these features will be analyzed by the trained unordered grabbing model. Feature extraction may use common computer vision algorithms such as SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features) or ORB (Oriented FAST and Rotated BRIEF). The workpiece stack image pairs may be collected as follows:
(a) The workpieces are randomly placed in a material frame to form a disordered stacking state. The variety, shape and size of the workpieces in the workpiece stack are ensured to be diversified to ensure that the training sample has better generalization capability;
(b) Shooting a fusion image of the workpiece stack by using a 3D laser camera, wherein the fusion image comprises a 2D image, a depth image and a point cloud view; fusing the views together to form a first image;
(c) Randomly selecting a workpiece, and recording the position and the pose of the workpiece. And then removed from the stack of workpieces in a real or virtual environment;
(d) Shooting the fusion image of the workpiece stack again to obtain a second image;
(e) The first image, the second image and the position and pose information of the removed workpiece are taken as a training sample. Repeating the steps and collecting enough training samples.
Taking SIFT as an example, the key points in the workpiece stack fusion image and their descriptors can be extracted for subsequent grabbing model analysis (a short code sketch follows the list below); the main steps of the SIFT algorithm comprise:
and (3) detecting a scale space extremum: detecting key points in images of different scales;
positioning key points: precisely determining the position and the scale of the key points;
key point direction distribution: assigning one or more directions to each keypoint;
Generating key point descriptors: generating a descriptor of the key point.
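For example, with OpenCV (the image file name is a placeholder for the 2D channel of the fusion image):

```python
import cv2

# "fused_rgb.png" is a placeholder for the 2D channel of the workpiece-stack fusion image.
gray = cv2.imread("fused_rgb.png", cv2.IMREAD_GRAYSCALE)

# Detect scale-space keypoints and compute their 128-dimensional SIFT descriptors.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)
```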
Grabbing model
After feature extraction of the workpiece pile fusion image, we need to analyze these features with a pre-trained unordered grabbing model to find the optimal grabbing workpiece and its coordinates and pose. The grasping model may be a deep learning-based method such as Convolutional Neural Network (CNN) or Recurrent Neural Network (RNN), etc.
Taking a convolutional neural network as an example, we can input the extracted features into a trained CNN model and calculate the grabbing probability of each workpiece through forward propagation. The structure of a CNN model typically includes multiple convolution layers, pooling layers and fully connected layers, in which local and global features of the image can be learned automatically. To fit our problem, we can design a CNN model with the following structure:
(a) Input layer: the fused image is received as input.
(b) Convolution layer: The input image is convolved with a plurality of convolution kernels to extract local features.
(c) Pooling layer: and the output of the convolution layer is downsampled, so that the characteristic dimension is reduced, and the calculated amount is reduced.
(d) Full tie layer: the output of the pooling layer is connected to a fully connected layer to achieve nonlinear combination of features.
(e) Output layer: and outputting the predicted workpiece position and pose.
To train the CNN model, we need to define a loss function to measure the difference between the workpiece position and pose predicted by the model and the actual values. Common loss functions are the mean square error (MSE) and the cross-entropy loss. We can select an appropriate loss function and use stochastic gradient descent (SGD) or another optimization algorithm for model training.
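A minimal PyTorch sketch of such a CNN with an MSE loss and one SGD training step; the network shape, the 4-channel fused input, the 6-value pose output and the random data are illustrative assumptions, not the patented model:

```python
import torch
import torch.nn as nn

class GraspNet(nn.Module):
    """Minimal CNN: fused RGB-D input (4 channels) -> predicted grasp pose (x, y, z, rx, ry, rz)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 6))

    def forward(self, x):
        return self.head(self.features(x))

model = GraspNet()
criterion = nn.MSELoss()                                  # mean-square-error loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # stochastic gradient descent

# One illustrative training step on random stand-in data (batch of 8 fused images).
images = torch.randn(8, 4, 128, 128)
targets = torch.randn(8, 6)
loss = criterion(model(images), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```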
Optimal gripping workpiece screening
After the grabbing probability of each workpiece is obtained, the optimal grabbing workpiece needs to be screened out, i.e. the workpiece whose grabbing has the least influence on the positions and poses of the other workpieces. This can be achieved by setting a grabbing threshold: only a workpiece whose grabbing probability is higher than the threshold is considered as the optimal grabbing workpiece.
The specific screening process is as follows:
sequencing the grabbing probabilities of all the workpieces;
starting from the workpiece with the highest grabbing probability, checking whether the workpiece meets the grabbing threshold condition;
if the grabbing threshold condition is met, taking the workpiece as an optimal grabbing workpiece; otherwise, continuing to check the next workpiece with higher grabbing probability;
the above process is repeated until an optimal gripping workpiece is found or all workpieces are inspected.
The above grabbing threshold may be expressed with the similarity between the first image and the second image: the higher the similarity, the smaller the pose change of the remaining workpieces after the first workpiece is grabbed. In actual use, after the optimal workpiece is grabbed, the next optimal grabbing workpiece can be computed directly from the first image, without computing it from a newly captured workpiece stack image; this saves photographing steps, reduces the image analysis workload and improves computational efficiency. In general, the grabbing threshold is calculated as 1 - K_0/P_0, where K_0 is a coefficient, generally 2-10, and P_0 is the number of workpieces observable on the surface in the workpiece stack image; preferably, K_0 = 2.
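A small sketch of this threshold-based screening, assuming the grabbing probabilities have already been predicted (function and variable names are illustrative):

```python
def select_best_grasp(grasp_probs, visible_count, k0=2):
    """Pick the first workpiece whose grasp probability clears the threshold 1 - K0/P0.

    grasp_probs: dict of workpiece id -> predicted grabbing probability;
    visible_count: number of workpieces observable on the surface of the stack (P0).
    """
    threshold = 1.0 - k0 / visible_count
    for wid, prob in sorted(grasp_probs.items(), key=lambda kv: kv[1], reverse=True):
        if prob >= threshold:
            return wid, prob
    return None, None  # no workpiece meets the threshold

best_id, best_prob = select_best_grasp({"a": 0.95, "b": 0.7, "c": 0.4}, visible_count=10)
```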
Coordinate and pose calculation
After determining the optimal gripping of the workpiece, we need to calculate its coordinates and pose in the stack of workpieces. This may be accomplished by matching the extracted keypoints with a three-dimensional model of the workpiece. Common matching algorithms are RANSAC (random sample consensus) and ICP (iterative closest point), etc.
Taking RANSAC as an example, we can calculate the coordinates and pose of the best gripping workpiece by:
Randomly selecting a group of key point pairs, and calculating a transformation matrix between the key point pairs;
applying the transformation matrix to other key points, and calculating the distance between the transformed points and the target point;
counting the number of key point pairs meeting a distance threshold;
repeating the above process for a plurality of times, and selecting a transformation matrix with the most number of key points meeting the distance threshold as a final result; the distance threshold is the size of the optimal grabbing workpiece;
and calculating the coordinates and the pose of the optimally grabbed workpiece in the workpiece stack according to the final transformation matrix.
Result output
And outputting the coordinates and the pose of the optimal grabbing workpiece for grabbing by a robot in the following step S60.
Summarizing, in step S50, we need to calculate the workpiece pile fusion image according to the pre-trained unordered grabbing model, so as to obtain the optimal grabbing workpiece, its coordinates and pose. The specific implementation mode comprises feature extraction, grabbing model analysis, optimal grabbing workpiece screening, coordinate and pose calculation and the like.
In another embodiment of the present invention, step S50 may also obtain the optimal grabbing workpiece and its coordinates directly by means of computer simulation.
In the above technical solution, the step of controlling the robot to grab the workpiece according to the optimal grabbing workpiece and its pose specifically comprises the following steps (a code sketch of this loop follows step 5 below):
Step 1: recording the workpiece stack fusion image as a first fusion image;
step 2: controlling the robot to grasp the optimal grasping workpiece in the first fusion image;
step 3: after the optimal grabbing workpiece has been grabbed, acquiring a second fusion image of the workpiece stack at that moment;
step 4: calculating the similarity between the second fusion image and the first fusion image; if the similarity is greater than or equal to the grabbing threshold, deleting the optimally grabbed workpiece from the first fusion image, taking the result as the new first fusion image, and calculating the optimal grabbing workpiece of that first fusion image;
step 5: iteratively executing steps 2-4 until the similarity is smaller than the grabbing threshold, then taking the current image as the first fusion image, calculating the optimal grabbing workpiece of the first fusion image, and iteratively executing steps 2-5.
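An illustrative sketch of this control loop; every callable below is an assumed hook for the corresponding step rather than a real API, and the iteration cap follows the 10-20 range mentioned later:

```python
def grasp_loop(capture_fused_image, compute_best_grasp, execute_grasp,
               image_similarity, remove_from_image, grasp_threshold, max_reuse=15):
    """Sketch of the iterative grabbing loop in steps 1-5.

    capture_fused_image(): take a new fused image of the workpiece stack;
    compute_best_grasp(img): run the trained model, return (workpiece, pose) or None;
    execute_grasp(workpiece, pose): drive the robot to grab the workpiece;
    image_similarity(a, b): similarity of two fused images in [0, 1];
    remove_from_image(img, workpiece): delete the grabbed workpiece from the image.
    """
    first = capture_fused_image()                      # step 1
    reuse_count = 0
    while True:
        best = compute_best_grasp(first)
        if best is None:                               # stack is empty
            break
        workpiece, pose = best
        execute_grasp(workpiece, pose)                 # step 2
        second = capture_fused_image()                 # step 3
        similar = image_similarity(second, first) >= grasp_threshold
        if similar and reuse_count < max_reuse:        # step 4: keep reusing the old image
            first = remove_from_image(first, workpiece)
            reuse_count += 1
        else:                                          # step 5: fall back to the fresh image
            first = second
            reuse_count = 0
```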
The step of deleting the optimally grabbed workpiece from the first fusion image and using the result as the new first fusion image specifically comprises: marking the optimal grabbing workpiece in the first fusion image; and deleting the marked optimal grabbing workpiece from the first fusion image and taking the resulting image as the first fusion image. Specifically, this may be done in the following manner:
Optimal gripping workpiece marking
After obtaining the position and pose information of the optimal gripping workpiece, we can use an image processing algorithm, such as edge detection, contour extraction, etc., to mark the optimal gripping workpiece in the first fused image. Specifically, the fusion image is firstly converted into a gray level image, and then the outline of the optimal grabbing workpiece is extracted from the gray level image according to the position and pose information of the optimal grabbing workpiece. Next, we can use an edge detection algorithm, such as the Canny edge detection algorithm, to further process the extracted contour to obtain the edge information of the best gripping workpiece. Finally, we can mark the edges of the optimally gripped workpiece with a specific color (e.g., red) in the first fused image for subsequent processing.
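For instance, the marking described above could be sketched with OpenCV as follows; the region-of-interest mask derived from the workpiece pose is an assumed input, and red is used as the marking color:

```python
import cv2

def mark_best_workpiece(fused_rgb, roi_mask):
    """Draw the contour of the optimal grabbing workpiece in red on the fused image.
    roi_mask: uint8 mask (255 inside the workpiece region derived from its pose)."""
    gray = cv2.cvtColor(fused_rgb, cv2.COLOR_BGR2GRAY)
    gray = cv2.bitwise_and(gray, gray, mask=roi_mask)          # keep only the workpiece region
    edges = cv2.Canny(gray, 50, 150)                           # edge information of the workpiece
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    marked = fused_rgb.copy()
    cv2.drawContours(marked, contours, -1, (0, 0, 255), 2)     # red in BGR
    return marked, edges
```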
Deleting marked optimal grabbing workpiece
After the marking of the best-grip workpiece is completed, the marked best-grip workpiece needs to be deleted from the first fused image. This can be achieved by an image processing algorithm such as image dilation and erosion.
Firstly, the edges of the marked optimal grabbing workpiece in the first fused image can be dilated. Specifically, a structuring element (e.g., a rectangular or circular structuring element) may be used to dilate the marked edges so that they expand toward the interior of the optimally grabbed workpiece. In this way, we can obtain a region covering the optimal grabbing workpiece.
The dilated region may then be subjected to an erosion operation to eliminate noise that may be introduced during dilation. Specifically, the same structuring element as used for the dilation can be used to erode the dilated region, thereby obtaining a noise-removed region that covers the optimal grabbing workpiece.
Finally, we can delete the area in the first fused image that covers the best gripping workpiece using image processing techniques, such as image subtraction, etc. Specifically, the image subtraction operation can be performed on the first fused image and the corroded area covering the optimal grabbing workpiece, so that the first fused image with the optimal grabbing workpiece removed is obtained. In this way, we can continue to analyze and process the first fused image, which removes the optimally grabbed workpiece, in a subsequent step.
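A hedged sketch of the dilation, erosion and subtraction steps above, again using OpenCV; the kernel size and the edge_mask produced by the marking step are assumptions of this example:

```python
import cv2

def remove_marked_workpiece(fused_gray, edge_mask, kernel_size=7):
    """Dilate the marked edges into a filled region, erode to suppress noise,
    then subtract that region from the first fused image."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_size, kernel_size))
    region = cv2.dilate(edge_mask, kernel)        # expand edges toward the workpiece interior
    region = cv2.erode(region, kernel)            # same structuring element removes dilation noise
    # image subtraction: pixels inside the workpiece region are cleared
    masked = cv2.bitwise_and(fused_gray, fused_gray, mask=region)
    return cv2.subtract(fused_gray, masked)
```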
Further, in the above technical solution, in step 5, if the loop of step 2-4 is executed iteratively for more than 10-20 times, the current image is used as the first fused image, the optimal workpiece to be gripped of the first fused image is calculated, and step 2-step 5 is executed iteratively.
Further, in the above technical solution, step 2 controls the robot to grasp the optimal grasping workpiece in the first fused image, and further includes a step of accurately determining a pose of the optimal grasping workpiece, and specifically includes:
Deleting the part except the optimal grabbing workpiece in the first fusion image, and only reserving the optimal grabbing workpiece as an image to be grabbed;
and carrying out pose recognition on the optimal grabbing workpiece to obtain the pose of the optimal grabbing workpiece.
Wherein, the step of deleting the part other than the optimal grabbing workpiece in the first fused image and retaining only the optimal grabbing workpiece can be realized by setting the pixel points in the first fused image that do not fall within the bounding box B to transparent or to the background color. To delete the portions other than the workpiece in the image, the following algorithm may be used (a vectorized sketch follows the steps):
a. Traverse all pixel points in the first fused image, each denoted (x_i, y_i, z_i).
b. Check whether the pixel point (x_i, y_i, z_i) lies within the bounding box B. The pixel point is within the bounding box if all of the following conditions are satisfied:
x_min ≤ x_i ≤ x_max, y_min ≤ y_i ≤ y_max, z_min ≤ z_i ≤ z_max.
c. If the pixel point (x_i, y_i, z_i) lies outside the bounding box, set it to transparent or to the background color. Specifically, the color value of the pixel may be set to (0, 0, 0, 0) (RGBA format, indicating full transparency) or (255, 255, 255) (RGB format, indicating a white background).
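The a-c traversal can be vectorized with NumPy; a minimal sketch, assuming the fused data is stored as an N×3 coordinate array with an RGBA color array in the same pixel order:

```python
import numpy as np

def keep_only_bounding_box(points_xyz, rgba, bbox_min, bbox_max):
    """points_xyz: (N, 3) coordinates of the fused-image pixels;
    rgba: (N, 4) color values; bbox_min/bbox_max: (3,) corners of bounding box B."""
    inside = np.all((points_xyz >= bbox_min) & (points_xyz <= bbox_max), axis=1)
    out = rgba.copy()
    out[~inside] = (0, 0, 0, 0)     # fully transparent outside the bounding box
    return out
```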
Alternatively, the method is realized in the following way:
determination of bounding box for optimal gripping of workpiece
First, we need to determine the bounding box of the best gripping workpiece in the first fused image. To achieve this, we can employ a segmentation algorithm for 3D point clouds, such as a RANSAC-based planar segmentation algorithm. Specifically, we can find the corresponding point set in the 3D point cloud by using the coordinate and pose information of the optimally grasped workpiece. We can then calculate the smallest bounding rectangle of these points as the bounding box for the best gripping workpiece in the first fused image.
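One way to sketch the bounding-box determination is shown below, using a plain axis-aligned 3D box from NumPy and a minimum bounding rectangle from OpenCV in the image plane; the point set of the grasped workpiece and its pixel projections are assumed inputs, and the RANSAC plane segmentation mentioned above is not reproduced here:

```python
import cv2
import numpy as np

def workpiece_bounding_box(workpiece_points, pixel_xy):
    """workpiece_points: (N, 3) 3D points of the optimal grabbing workpiece;
    pixel_xy: (N, 2) their projections into the first fused image."""
    bbox_min = workpiece_points.min(axis=0)                    # axis-aligned 3D bounding box
    bbox_max = workpiece_points.max(axis=0)
    rect = cv2.minAreaRect(pixel_xy.astype(np.float32))        # minimum bounding rectangle
    corners = cv2.boxPoints(rect)                              # its four corner points
    return bbox_min, bbox_max, corners
```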
Creating a mask
Next, we need to create a binary mask of the same size as the first fused image. In this mask, the pixel values in the boundary box region of the optimally gripped workpiece are set to 1, and the pixel values of the remaining regions are set to 0. In this way, we can achieve the goal of retaining only the optimal gripping workpiece by applying a mask to the first fused image.
Specifically, we can create the mask as follows:
M(x, y) = 1 if (x, y) lies within the bounding box of the optimal grabbing workpiece, and M(x, y) = 0 otherwise;
where M(x, y) represents the pixel value of the mask at the (x, y) position.
Using masks
Finally, we can achieve the goal of retaining only the optimal grabbing workpiece by applying the mask to the first fused image. Specifically, the masked image can be calculated using the following formula: I'(x, y) = I(x, y) · M(x, y), where I(x, y) represents the pixel value of the first fused image at the (x, y) position, and I'(x, y) represents the pixel value of the masked image at the (x, y) position.
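A minimal NumPy sketch of the mask construction and the formula I'(x, y) = I(x, y) · M(x, y); the rectangular bounding-box corners are assumed to be given in pixel coordinates:

```python
import numpy as np

def apply_bbox_mask(image, x0, y0, x1, y1):
    """image: (H, W) or (H, W, C) first fused image; (x0, y0)-(x1, y1): bounding box."""
    mask = np.zeros(image.shape[:2], dtype=image.dtype)
    mask[y0:y1, x0:x1] = 1                         # M(x, y) = 1 inside the bounding box
    if image.ndim == 3:
        mask = mask[..., None]                     # broadcast over color channels
    return image * mask                            # I'(x, y) = I(x, y) * M(x, y)
```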
The first method of performing pose recognition on the optimal grabbing workpiece to obtain its pose is as follows:
First, the keypoint set K_{j*} corresponding to the optimal grabbing workpiece j* is found. For each keypoint in K_{j*}, its spatial coordinates (x_i, y_i, z_i) can be obtained. Then, principal component analysis (PCA) is performed on these coordinates to obtain the principal axis directions of the workpiece. Let u_1, u_2 and u_3 be the three principal axis directions obtained by the principal component analysis; the pose matrix R of the workpiece can then be calculated as:
R = [u_1 u_2 u_3];
Next, the centroid coordinates (x_c, y_c, z_c) of the workpiece are calculated as:
(x_c, y_c, z_c) = (1/N) · Σ_{i=1}^{N} (x_i, y_i, z_i),
where N is the size of the keypoint set K_{j*}.
Finally, the centroid coordinates and the pose matrix of the workpiece are combined into a homogeneous pose matrix T ∈ R^{4×4}, with R as its rotation block and (x_c, y_c, z_c) as its translation column.
The pose matrix represents the pose of the optimally grasped workpiece. The method is quick and efficient, and has small calculated amount.
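A sketch of this PCA-based pose computation, using NumPy only; the keypoint array is an assumed input:

```python
import numpy as np

def pca_pose(keypoints_xyz):
    """keypoints_xyz: (N, 3) keypoints of the optimal grabbing workpiece.
    Returns the 4x4 homogeneous pose matrix T built from the PCA axes and the centroid."""
    centroid = keypoints_xyz.mean(axis=0)                      # (x_c, y_c, z_c)
    centered = keypoints_xyz - centroid
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)    # rows of Vt are principal axes
    R = Vt.T                                                   # R = [u1 u2 u3]
    if np.linalg.det(R) < 0:                                   # keep a right-handed frame
        R[:, -1] *= -1
    T = np.eye(4)
    T[:3, :3] = R                                              # rotation block
    T[:3, 3] = centroid                                        # translation column
    return T
```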
The second method of performing pose recognition on the optimal grabbing workpiece to obtain its pose is as follows:
identifying an image to be grabbed by using a trained pose identification model to obtain the pose of the optimal grabbed workpiece in the image to be grabbed, wherein the steps of establishing and training the pose identification model specifically comprise the following steps:
data preparation
In order to train the pose recognition model, a large amount of workpiece image data including 2D images, depth maps and point cloud data under different angles, illumination conditions and occlusion conditions needs to be collected first. Meanwhile, the data are required to be marked, and the real pose of each workpiece is recorded;
Let the workpiece image dataset be D = {(I_i, D_i, P_i, T_i) | i = 1, ..., N}, wherein I_i represents the 2D image of the i-th sample, D_i represents the corresponding depth map, P_i represents the point cloud data, and T_i represents the real pose of the workpiece. The dataset D contains N samples;
data preprocessing
To enhance the generalization ability of the model, data enhancement, such as translation, rotation, scaling, flipping, etc., may be performed on the data set. Meanwhile, carrying out normalization processing on the 2D image, the depth map and the point cloud data to ensure that the numerical range is between 0 and 1;
model construction
And constructing a deep learning model for extracting features from the input 2D image, depth map and point cloud data and predicting the pose of the workpiece. The multi-mode fusion method can be adopted to fuse the characteristics of the 2D image, the depth map and the point cloud data so as to improve the prediction performance of the model;
Let the model be a function M(I_m, D_m, P_m; θ), wherein I_m, D_m and P_m respectively represent the input 2D image, depth map and point cloud data, and θ represents the parameters of the model. The model output is the predicted pose, denoted T̂_m;
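A hedged PyTorch sketch of one possible multi-modal fusion network M(I_m, D_m, P_m; θ); the layer sizes, the PointNet-style point branch and the flattened 4×4 pose output are illustrative assumptions, not the disclosed architecture:

```python
import torch
import torch.nn as nn

class PoseNet(nn.Module):
    """Fuses 2D image, depth map and point cloud features and regresses a 4x4 pose."""
    def __init__(self):
        super().__init__()
        self.img_branch = nn.Sequential(               # 2D image branch
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.depth_branch = nn.Sequential(             # depth map branch
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.point_branch = nn.Sequential(             # shared MLP over points, then max-pool
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1), nn.Flatten())
        self.head = nn.Sequential(                     # fused features -> flattened 4x4 pose
            nn.Linear(32 + 16 + 128, 128), nn.ReLU(),
            nn.Linear(128, 16))

    def forward(self, image, depth, points):
        # image: (B,3,H,W), depth: (B,1,H,W), points: (B,3,N)
        feats = torch.cat([self.img_branch(image),
                           self.depth_branch(depth),
                           self.point_branch(points)], dim=1)
        return self.head(feats).view(-1, 4, 4)         # predicted pose
```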
Loss function
To evaluate the predictive performance of the model, a loss function L(T̂, T) needs to be defined for measuring the difference between the predicted pose T̂ and the true pose T. Here the mean square error (MSE) can be used as the loss function: L(θ) = (1/N) · Σ_{i=1}^{N} ||M(I_i, D_i, P_i; θ) − T_i||²;
Model training
A stochastic gradient descent (SGD) or other optimization algorithm is used to minimize the loss function and thereby update the model parameters θ. During each iteration, a small batch of samples is randomly drawn from the dataset D. Then the gradient of the loss function L with respect to the model parameters θ is calculated, and the parameters are updated as: θ ← θ − ζ · ∇_θ L;
where ζ represents the learning rate, the step size used to control the parameter update.
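A short training-loop sketch of the update θ ← θ − ζ·∇_θ L with the MSE loss, assuming the PoseNet sketch above and a DataLoader that yields (image, depth, points, true_pose) mini-batches; both assumptions are illustrative:

```python
import torch

def train(model, loader, epochs=10, lr=1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)    # lr plays the role of ζ
    loss_fn = torch.nn.MSELoss()                              # mean square error on the pose
    for _ in range(epochs):
        for image, depth, points, true_pose in loader:        # random mini-batches from D
            pred_pose = model(image, depth, points)
            loss = loss_fn(pred_pose, true_pose)
            optimizer.zero_grad()
            loss.backward()                                   # gradient of L w.r.t. θ
            optimizer.step()                                  # θ ← θ - ζ * ∇θ L
```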
Model verification and tuning
During training, a validation set is required to evaluate the generalization performance of the model. Training may be stopped when the performance of the model on the validation set reaches expectations. In addition, the prediction performance of the model can be improved by adjusting super parameters such as a model structure, a loss function, an optimization algorithm and the like.
Model testing
After training is completed, a separate test set is used to evaluate the predictive performance of the model. The samples in the test set should not be repeated with the samples in the training set and the validation set to ensure reliability of the evaluation results.
This approach provides higher accuracy in identifying the pose of the optimal grabbing workpiece.
A second aspect of the present invention provides a computer readable storage medium, where the computer readable storage medium stores program instructions, where the program instructions are executed to perform a method for controlling robot 3D laser vision disorder grabbing as described above.
A third aspect of the present invention provides a robotic 3D laser vision chaotic grasping control system, comprising the computer readable storage medium described above.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. The robot 3D laser vision disordered grabbing control method is characterized by comprising the following steps of:
s10, acquiring a three-dimensional diagram of a robot, a clamp and a material frame;
s20, comparing the three-dimensional model of the imported workpiece with a 3D image shot by a camera, and identifying the pose of the workpiece;
s30, acquiring a pose conversion relation between the robot and the 3D camera, and completing hand-eye calibration between the robot and the 3D camera;
s40, acquiring a 2D image, a depth image and a point cloud view of a target workpiece stack obtained by a 3D camera, and forming a workpiece stack fusion image;
s50, calculating a workpiece pile fusion image according to a pre-trained unordered grabbing model to obtain an optimal grabbing workpiece and a pose thereof;
S60, controlling a robot to grasp the workpiece according to the optimal grasping workpiece and the pose thereof;
wherein the optimal grabbing workpiece is the workpiece whose grabbing has the least influence on the pose of the other workpieces.
2. The method for controlling disordered grabbing of 3D laser vision of a robot according to claim 1, wherein the step of comparing the three-dimensional model of the introduced workpiece with the 3D image shot by the camera to identify the pose of the workpiece specifically comprises the steps of:
preprocessing a three-dimensional model of a workpiece to obtain a point cloud representation;
extracting the features of the 3D image shot by the camera and the imported workpiece three-dimensional model by adopting a PFH or FPFH algorithm;
performing feature matching, and adopting nearest neighbor searching and a random sample consistency algorithm;
estimating a pose transformation matrix between the feature matching point pairs;
and applying the estimated pose transformation matrix to the imported workpiece three-dimensional model to obtain the pose of the workpiece three-dimensional model in the 3D image shot by the camera.
3. The method for controlling disordered grabbing of 3D laser vision of a robot according to claim 1, wherein the step of obtaining the pose conversion relationship between the robot and the 3D camera and completing the hand-eye calibration between the robot and the 3D camera comprises the following steps:
Setting parameters of a 3D camera and a calibration plate;
controlling the 3D camera to shoot and collect calibration plate data of different poses;
calculating a calibration result according to the added calibration point column;
and optimizing and reducing errors of the calculation result according to the calibration precision, and finally obtaining the pose conversion relation between the robot and the 3D camera to finish the hand-eye calibration between the robot and the 3D camera.
4. The method for controlling disordered grabbing of 3D laser vision of a robot according to claim 1, wherein the step of acquiring the 2D map, the depth map and the point cloud view of the target workpiece stack obtained by the 3D camera and forming the workpiece stack fusion image specifically comprises the following steps:
acquiring a 2D image, a depth image and a point cloud view of a target workpiece stack obtained by a 3D camera;
registering the 2D image and the depth image to obtain a depth image corresponding to the 2D image;
registering the point cloud data with the 2D image, and endowing color information to the point cloud data;
and carrying out downsampling and filtering processing on the fused point cloud data so as to reduce data quantity and noise.
5. The method for controlling the disordered grabbing of the 3D laser vision of the robot according to claim 1, wherein the steps of establishing and training the disordered grabbing model specifically comprise the following steps:
Constructing a training sample comprising a plurality of pairs of workpiece pile images; the workpiece pile image pair comprises a workpiece pile fusion image, which is marked as a first image, and a workpiece pile fusion image after one workpiece is randomly taken out in the first image, which is marked as a second image;
and establishing an unordered grabbing model prototype by using the convolutional neural network, and training by using a training sample to obtain an unordered grabbing model.
6. The method for controlling disordered grabbing of 3D laser vision of a robot according to claim 1, wherein the step of controlling the robot to grab the workpiece according to the optimal grabbing of the workpiece and the pose thereof specifically comprises the following steps:
step 1: recording the workpiece stack fusion image as a first fusion image;
step 2: controlling the robot to grasp the optimal grasping workpiece in the first fusion image;
step 3: after the optimal grabbing of the workpieces is completed, acquiring a second fusion image of the workpiece stack at the moment;
step 4: calculating the similarity of the second fusion image and the first fusion image, if the similarity is greater than or equal to a grabbing threshold value, deleting the image obtained by optimally grabbing the workpiece from the first fusion image as the first fusion image, and calculating the optimally grabbing workpiece of the first fusion image;
Step 5: and (3) iteratively executing the steps 2-4 until the similarity is smaller than the grabbing threshold, taking the current image as a first fusion image, calculating the optimal grabbing workpiece of the first fusion image, and iteratively executing the steps 2-5.
7. The method for controlling disordered grabbing of 3D laser vision of a robot according to claim 6, wherein in step 5, if the loop of step 2-4 is performed iteratively more than 10-20 times, the current image is used as the first fused image, the optimal grabbing workpiece of the first fused image is calculated, and steps 2-5 are performed iteratively.
8. The method for controlling disordered grabbing of the 3D laser vision of the robot according to claim 6, wherein the step 2 controls the robot to grab the optimal grabbing workpiece in the first fused image, and the method further comprises the step of accurately determining the pose of the optimal grabbing workpiece, specifically comprising:
deleting the part except the optimal grabbing workpiece in the first fusion image, and only reserving the optimal grabbing workpiece as an image to be grabbed;
and carrying out pose recognition on the optimal grabbing workpiece to obtain the pose of the optimal grabbing workpiece.
9. A computer readable storage medium, wherein program instructions are stored in the computer readable storage medium, and the program instructions are used for executing a robot 3D laser vision disorder grabbing control method according to any one of claims 1-8 when running.
10. A robotic 3D laser vision disorder capture control system comprising the computer-readable storage medium of claim 9.