CN113580149A - Unordered aliasing workpiece grabbing method and system based on key point prediction network - Google Patents

Unordered aliasing workpiece grabbing method and system based on key point prediction network

Info

Publication number
CN113580149A
CN113580149A · CN202111156483.0A
Authority
CN
China
Prior art keywords
coordinate system
workpiece
pixel
key point
robot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111156483.0A
Other languages
Chinese (zh)
Other versions
CN113580149B (en)
Inventor
王耀南
伍俊岚
朱青
刘学兵
周鸿敏
毛建旭
周显恩
吴成中
冯明涛
曾琼
童琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202111156483.0A priority Critical patent/CN113580149B/en
Publication of CN113580149A publication Critical patent/CN113580149A/en
Application granted granted Critical
Publication of CN113580149B publication Critical patent/CN113580149B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/161Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J19/00Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • B25J19/02Sensing devices
    • B25J19/04Viewing devices
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1661Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1679Programme controls characterised by the tasks executed

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a disordered aliasing workpiece grabbing method and system based on a key point prediction network. A real-time RGB image is input, and a preset key point prediction network model segments each workpiece and predicts its key point positions, giving the pixel coordinates of each key point in the image. The conversion relation between the workpiece model coordinate system and the camera coordinate system is then solved by combining the 3D coordinates of each key point in the workpiece model coordinate system with the camera intrinsic parameters, the conversion relation between the camera coordinate system and the robot coordinate system is obtained through hand-eye calibration, and the 6DoF position and pose information of the workpiece in the robot coordinate system is solved from these. Even when a key point is occluded, the method can predict, through voting, the pixel position most likely to represent that key point, which solves the problem of calculating the pose of occluded key points when workpieces overlap, enables the robot to pick workpieces in more complex scenes, and effectively improves the picking success rate.

Description

Unordered aliasing workpiece grabbing method and system based on key point prediction network
Technical Field
The invention belongs to the technical field of intelligent robots, and relates to a method and a system for capturing disordered aliasing workpieces based on a key point prediction network.
Background
Vision technology occupies an important position in industrial robot applications. Against the background of large-scale deployment of industrial robots, vision enhancement enables intelligent industrial robots to adapt to more complex scenes and solve more complex problems, and has a huge market prospect. Industrial sorting systems are an important part of industrial robot technology, and today, on various industrial production lines, robot technology is gradually replacing traditional manual operation.
Most existing industrial automatic sorting systems complete sorting tasks by programming industrial robots in advance. This mode supports long-duration repeated operation, but the placing positions of the sorted objects must be strictly fixed, so the robot cannot cope with flexibly changing scenes. At present, industrial sorting involves many scenes in which multiple kinds of parts are randomly placed, and the classification and placement of parts still mostly rely on manual work, so fully automatic production cannot be realized. Facing this market demand, research on autonomous robot sorting in complex scenes is of great significance. Pose estimation is very effective for picking target parts in machine vision industrial sorting scenes where stacked parts are placed in disorder, but many problems remain to be overcome: when the part surfaces have no texture, the parts are stacked and occlude each other, or the lighting environment is complex, the workpiece pose cannot be calculated accurately, and the workpiece therefore cannot be picked up accurately.
According to existing research, workpiece picking can generally be classified into 2D planar picking and 6DoF (six degrees of freedom) picking according to the robot working space. The former locates the target with an object detection method at a known operating plane height, while the latter must rely on the 6D pose of the target to complete the pick-up. Traditional target pose estimation methods complete the estimation task by matching features between the target image and a key point template, or by matching global features of the target image. However, these methods are sensitive to surface texture and illumination, cannot handle non-textured workpieces, and often cannot predict poses in occluded scenes. With the development of deep learning, many studies have combined the pose estimation problem with convolutional neural networks (CNNs), converting it into an end-to-end combination of object detection and pose regression that takes RGB or RGB-D images as input. Yu Xiang et al., in "PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes" (2017), proposed the pose estimation network PoseCNN, which takes RGB images as input, uses a CNN-based backbone for feature extraction, and finally uses three network branches for object classification, 3D localization and rotation regression. Later work sought to better exploit RGB-D image information as depth cameras became popular: in "DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion" by Chen Wang et al. (2019), two heterogeneous backbones extract color and depth features respectively, which are then fused for pose regression.
The end-to-end approaches described above are data-driven in nature and require a large amount of real data to train. However, 6DoF pose labeling is a very complex and time-consuming task that requires an accurate three-dimensional model to compute the training loss of the network. Therefore, most of these methods are trained and tested on public data sets and are not easy to deploy in an actual robot picking system, which leads to a low picking success rate because the workpiece pose is difficult to calculate when the key points of the workpiece are occluded.
Disclosure of Invention
Aiming at the technical problems, the invention provides a method and a system for capturing disordered aliasing workpieces based on a key point prediction network, which can effectively improve the success rate of picking.
The technical scheme adopted by the invention for solving the technical problems is as follows:
the unordered aliasing workpiece grabbing method based on the key point prediction network comprises the following steps:
step S100: calibrating and determining a camera internal reference matrix by Zhang's calibration method according to a preset first calibration picture;
step S200: determining a conversion matrix from a camera coordinate system to a robot coordinate system by a nine-point calibration method according to a preset second calibration picture;
step S500: acquiring a real-time image, inputting the real-time image into a preset pixel-level key point prediction network model, and regressing to obtain pixel coordinates of preset key points in the real-time image;
step S600: acquiring 3D coordinates of preset key points in a workpiece model coordinate system, and acquiring a conversion matrix between the workpiece model coordinate system and a camera coordinate system according to a camera internal reference matrix, the 3D coordinates of the preset key points in the workpiece model coordinate system and pixel coordinates of the preset key points in a real-time image;
step S700: acquiring coordinates of a picking point of a workpiece to be picked under a workpiece model coordinate system and initial direction information of robot grabbing equipment, acquiring workpiece initial pose information according to the coordinates of the picking point of the workpiece to be picked under the workpiece model coordinate system and the initial direction information of the robot grabbing equipment, and acquiring 6DoF position and posture information of the workpiece under the robot coordinate system according to the workpiece initial pose information, a conversion matrix from a camera coordinate system to the robot coordinate system and a conversion matrix between the workpiece model coordinate system and the camera coordinate system;
step S800: and controlling the robot grabbing equipment to grab the target workpiece according to the 6DoF position and posture information of the workpiece in the robot coordinate system.
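For orientation, the following is a minimal sketch of how steps S100 to S800 could be chained in code. It assumes the camera intrinsics K and distortion coefficients dist (step S100) and the hand-eye transform T_c2r (step S200) have already been computed; camera.capture(), predict_keypoints() and robot.move_to() are hypothetical interfaces standing in for the image acquisition module, the preset key point prediction network model and the pick-up module, and are not part of this disclosure.

import numpy as np
import cv2

def picking_cycle(K, dist, T_c2r, model_pts_3d, pick_point_model,
                  camera, predict_keypoints, robot):
    # One feeding cycle for a single detected workpiece.
    image = camera.capture()                          # step S500: real-time RGB image
    kps_2d = predict_keypoints(image)                 # step S500: (N, 2) pixel keypoints

    # step S600: PnP gives the workpiece-model -> camera transform
    ok, rvec, tvec = cv2.solvePnP(model_pts_3d.astype(np.float64),
                                  kps_2d.astype(np.float64), K, dist)
    if not ok:
        return False
    R, _ = cv2.Rodrigues(rvec)
    T_o2c = np.eye(4)
    T_o2c[:3, :3], T_o2c[:3, 3] = R, tvec.ravel()

    # step S700: compose transforms to express the pick point in the robot frame
    p_model = np.append(np.asarray(pick_point_model, dtype=float), 1.0)
    p_robot = T_c2r @ T_o2c @ p_model

    # step S800: command the robot (orientation handling omitted in this sketch)
    robot.move_to(p_robot[:3])
    return True

In this sketch only the 3-DoF position of the pick point is commanded; the full 6DoF composition with the initial direction information of step S700 is described further below.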
Preferably, step S100 includes:
step S110: shooting images of preset first calibration pictures at different angles by using a camera;
step S120: extracting corner information from each image of a preset first calibration picture with different angles;
step S130: and calibrating by using Zhang's calibration method according to the corner information, and calculating the camera internal reference data to obtain the camera internal reference matrix.
Preferably, the preset second calibration picture includes nine dots, and the step S200 includes:
step S210: shooting an image of a preset second calibration picture placed in a random posture;
step S220: calculating the circle center pixel position of each dot in the image;
step S230: moving the suction cup at the end of the robot arm to each dot, and recording the corresponding 3D coordinate in the robot coordinate system;
step S240: repeating the step S210 to the step S230 for a first preset number of times to obtain a group of 2D-3D data of each dot;
step S250: and calculating a conversion matrix from the camera coordinate system to the robot coordinate system according to the 2D-3D data of each dot.
Preferably, step S250 includes:
step S251: and calculating to obtain a rotation matrix between the robot coordinate system and the camera coordinate system and a translation matrix between the robot coordinate system and the camera coordinate system according to the 2D-3D data of each dot, wherein the calculation specifically comprises the following steps:
s [u, v, 1]^T = K [R_{r2c} | t_{r2c}] [X, Y, Z, 1]^T

wherein s is a scale factor, R_{r2c} represents the rotation matrix from the robot coordinate system to the camera coordinate system, t_{r2c} represents the translation matrix from the robot coordinate system to the camera coordinate system, K is the camera internal reference matrix, (u, v) are the pixel coordinates corresponding to each dot, and (X, Y, Z) are the 3D coordinates of each dot under the robot coordinate system;
step S252: calculating a transformation matrix from the robot coordinate system to the camera coordinate system according to a rotation matrix from the robot coordinate system to the camera coordinate system and a translation matrix from the robot coordinate system to the camera coordinate system, and specifically:
T_{r2c} = [R_{r2c}, t_{r2c}; 0, 1]

wherein T_{r2c} is the transformation matrix from the robot coordinate system to the camera coordinate system;
step S253: the method can obtain a conversion matrix from the camera coordinate system to the robot coordinate system according to the conversion matrix from the robot coordinate system to the camera coordinate system, and specifically comprises the following steps:
T_{c2r} = T_{r2c}^{-1}

wherein T_{c2r} is the transformation matrix from the camera coordinate system to the robot coordinate system.
Preferably, after step S200 and before step S500, the method further includes:
step S300: building a pixel-level key point prediction network, acquiring a training data set, marking the training data set to obtain a marked data set, and training the pixel-level key point prediction network according to the marked data set to obtain a pixel-level key point prediction network;
step S400: calculating the loss value of the pixel-level key point prediction network according to a preset loss function, performing back propagation to update the network parameters of the pixel-level key point prediction network according to the loss value, and obtaining the updated pixel-level key point prediction network as a preset pixel-level key point prediction network model.
Preferably, the preset pixel-level keypoint prediction network model includes a convolutional neural network, a region candidate network, and four branches, where the four branches are a classification branch, a bounding box acquisition branch, a mask acquisition branch, and a pixel-level keypoint prediction branch, and step S500 includes:
step S510: inputting the image in the marked data set into a convolutional neural network to extract the characteristic information of the image, and transmitting the characteristic information into a regional candidate network;
step S520: the regional candidate network acquires the detection frame of each target workpiece according to the characteristic information and inputs the detection frame to the four branches;
step S530: the classification branch is used for classifying the target workpiece and the background according to the received detection frame; the boundary frame obtaining branch is used for obtaining the coordinates of the preset position point of the boundary frame of each target workpiece according to the received detection frame; the mask obtaining branch is used for obtaining a pixel area where each target workpiece is located according to the received detection frame; the pixel-level key point prediction branch is used for obtaining a unit vector diagram pointing to a preset number of key points according to the received detection frame;
step S540: normalizing the offset of each pixel position and the position of the 2D key point into a unit vector according to the pixel position of each pixel point of the pixel region where each target workpiece is located and the position of the 2D key point;
step S550: acquiring all pixel-level vectors of a single target workpiece, randomly selecting two pixel points, and taking the intersection point of the pixel vectors corresponding to the two pixel points as an initial hypothesis of a 2D key point;
step S560: and repeating the step S550 for a second preset number of times to obtain a group of hypotheses, using a clustering algorithm K-means to obtain a point with the highest score as a pixel point of the key point, and obtaining pixel coordinates of the pixel point as the key point.
Preferably, the loss function preset in step S400 is specifically:
L = λ_1·L_cls + λ_2·L_box + λ_3·L_mask + λ_4·L_kp

wherein λ_1, λ_2, λ_3 and λ_4 are the weighting factors of the classification branch, the bounding box acquisition branch, the mask acquisition branch and the pixel-level key point prediction branch respectively, L_cls is the classification loss function, L_box is the bounding box detection loss function, L_mask is the mask detection loss function, and L_kp is the pixel-level key point prediction branch loss function.
Preferably, step S600 includes:
step S610: the method comprises the steps of obtaining 3D coordinates of preset key points in a workpiece model coordinate system, and obtaining a rotation matrix and a translation matrix between the workpiece model coordinate system and a camera coordinate system according to a camera internal reference matrix, the 3D coordinates of the preset key points in the workpiece model coordinate system and pixel coordinates of the preset key points in a real-time image, wherein the method specifically comprises the following steps:
s [u, v, 1]^T = K [R_{o2c} | t_{o2c}] [X, Y, Z, 1]^T

wherein s is a scale factor, R_{o2c} represents the rotation matrix from the workpiece model coordinate system to the camera coordinate system, t_{o2c} represents the translation matrix from the workpiece model coordinate system to the camera coordinate system, K is the camera internal reference matrix, (u, v) are the pixel coordinates corresponding to the preset key points, and (X, Y, Z) are the 3D coordinates of the preset key points in the workpiece model coordinate system;
step S620: obtaining a conversion matrix between the workpiece model coordinate system and the camera coordinate system according to a rotation matrix and a translation matrix between the workpiece model coordinate system and the camera coordinate system, which comprises the following specific steps:
T_{o2c} = [R_{o2c}, t_{o2c}; 0, 1]

wherein T_{o2c} represents the transformation matrix between the workpiece model coordinate system and the camera coordinate system.
Preferably, in step S700, the 6DoF position and posture information of the workpiece in the robot coordinate system is obtained according to the initial pose information of the workpiece, the transformation matrix from the camera coordinate system to the robot coordinate system, and the transformation matrix between the workpiece model coordinate system and the camera coordinate system, and specifically:
P = T_{c2r} · T_{o2c} · P_0

wherein P represents the 6DoF position and posture information of the workpiece in the robot coordinate system, T_{c2r} represents the transformation matrix from the camera coordinate system to the robot coordinate system, T_{o2c} represents the transformation matrix between the workpiece model coordinate system and the camera coordinate system, and P_0 represents the initial pose information of the workpiece.
The unordered aliasing workpiece grabbing system based on the key point prediction network comprises an image acquisition module, a pose calculation module, a communication module and a pickup module, wherein the image acquisition module is connected with the pose calculation module, the pose calculation module is connected with the pickup module through the communication module,
the image acquisition module is used for acquiring a real-time image and sending the real-time image to the pose calculation module;
the pose calculation module is used for executing the method to obtain the 6DoF position and the posture information of the workpiece in the robot coordinate system and sending the position and the posture information to the pickup module through the communication device;
and the picking module picks the target workpiece according to the received 6DoF position and posture information of the workpiece under the robot coordinate system.
According to the disordered aliasing workpiece grabbing method and system based on the key point prediction network, a real-time RGB image is input, and the preset key point prediction network model segments each workpiece and predicts its key point positions, giving the pixel coordinates of the key points in the image; together with the 3D coordinates of the key points in the workpiece model coordinate system and the camera internal reference matrix, the conversion relation between the workpiece model coordinate system and the camera coordinate system is calculated, and the 6DoF position and posture information of the workpiece in the robot coordinate system is then solved. Through the preset pixel-level key point prediction network model, the pixel position most likely to represent a key point can be predicted by voting even when that key point is occluded, which solves the pose calculation problem for occluded key points when workpieces overlap. The robot can therefore pick workpieces in more complex scenes, overcoming the limitation of the traditional feeding method of fixing the workpiece position and teaching the robot to grasp, which suits only a single scene. Feeding and processing on an industrial production line become more flexible, the picking success rate is effectively improved, and the system can be popularized to feeding scenes of different parts, so it has a strong market prospect.
Drawings
FIG. 1 is a flowchart of a method for capturing an unordered aliasing workpiece based on a keypoint prediction network according to an embodiment of the invention;
FIG. 2 is a flow chart illustrating the overall architecture of the system according to an embodiment of the present invention;
FIG. 3 is a system hardware platform diagram according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for capturing an unordered aliased workpiece based on a keypoint prediction network according to another embodiment of the present invention;
FIG. 5 is a schematic diagram of a keypoint prediction network model according to an embodiment of the present invention;
fig. 6 is a schematic diagram of coordinate transformation of the robot arm, the industrial camera, and the target workpiece according to an embodiment of the present invention, where (a) is a schematic diagram of coordinates of the robot arm, (b) is a schematic diagram of coordinates of the industrial camera, (c) is a schematic diagram of coordinates of the target workpiece, and (d) is a schematic diagram of coordinate transformation.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention is further described in detail below with reference to the accompanying drawings.
In one embodiment, as shown in fig. 1, the method for capturing an unordered aliasing artifact based on a keypoint prediction network comprises the following steps:
step S100: calibrating and determining a camera internal reference matrix by Zhang's calibration method according to a preset first calibration picture;
step S200: determining a conversion matrix from a camera coordinate system to a robot coordinate system by a nine-point calibration method according to a preset second calibration picture;
step S500: acquiring a real-time image, inputting the real-time image into a preset pixel-level key point prediction network model, and regressing to obtain pixel coordinates of preset key points in the real-time image;
step S600: acquiring 3D coordinates of preset key points in a workpiece model coordinate system, and acquiring a conversion matrix between the workpiece model coordinate system and a camera coordinate system according to a camera internal reference matrix, the 3D coordinates of the preset key points in the workpiece model coordinate system and pixel coordinates of the preset key points in a real-time image;
step S700: acquiring coordinates of a picking point of a workpiece to be picked under a workpiece model coordinate system and initial direction information of robot grabbing equipment, acquiring workpiece initial pose information according to the coordinates of the picking point of the workpiece to be picked under the workpiece model coordinate system and the initial direction information of the robot grabbing equipment, and acquiring 6DoF position and posture information of the workpiece under the robot coordinate system according to the workpiece initial pose information, a conversion matrix from a camera coordinate system to the robot coordinate system and a conversion matrix between the workpiece model coordinate system and the camera coordinate system;
step S800: and controlling the robot grabbing equipment to grab the target workpiece according to the 6DoF position and posture information of the workpiece in the robot coordinate system.
Specifically, as shown in fig. 2, the real-time image is acquired by an image acquisition module consisting of a Baumer VCXG-13C monocular RGB camera and bar light sources. The camera has an image resolution of 1280 × 720 and is fixed directly above the feeding table to acquire images of the workpieces randomly placed on it; the light sources are two 32 cm bar lights installed on either side of the feeding table, whose angle and brightness can be adjusted for a better imaging effect. Steps S100 to S800 are executed by a pose calculation module realized on an industrial PC carrying a server and divided into an offline part and an online part. The offline part acquires a large number of real scene pictures through the camera to build a training data set; this self-made data set is used to train the pixel-level key point prediction network model, and the trained model is stored as the preset pixel-level key point prediction network model. The online part acquires a real-time image through the API of the Baumer monocular camera, inputs it into the preset pixel-level key point prediction network model, regresses the pixel coordinates of the corresponding key points in the image, and finally solves the workpiece pose with the PnP (Perspective-n-Point) algorithm to obtain the position and attitude information of the workpiece in the robot coordinate system. The position and posture information of the workpiece in the robot coordinate system is sent to the robot by a communication module, which completes the data transmission between the industrial PC and the robot: the PC and the robot communicate over gigabit Ethernet, and the PC transmits the calculated 6DoF pose of the target workpiece in the robot coordinate system to the robot via TCP/IP. Grabbing of the target workpiece is completed by a picking module comprising a robot with an end picker; the robot is a Sawyer collaborative robot from Rethink Robotics with a single 7-degree-of-freedom arm, suitable for both wide and narrow spaces. According to the shape characteristics of the grasped object, a single suction cup is selected as the end picker. The motion control and trajectory planning of the robot end picker are completed through the robot's built-in software platform: the arm starts at an initial position, the end is moved to the target workpiece position according to the transmitted pose information, the air pump is switched on and held to pick up the target workpiece, the arm then moves to a position above and close to the production line conveyor belt, the air pump is switched off to feed the part, and the arm returns to the initial position, completing one feeding cycle.
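As a hedged illustration of the communication module's role, the following sketch sends a 6DoF pose from the industrial PC to the robot over TCP/IP; the port number, byte order and message layout are assumptions for illustration only and are not specified by this disclosure.

import socket
import struct

def send_pose(robot_ip, pose_6dof, port=5000):
    # pose_6dof = (x, y, z, Rx, Ry, Rz), packed as six little-endian doubles
    assert len(pose_6dof) == 6
    payload = struct.pack('<6d', *pose_6dof)
    with socket.create_connection((robot_ip, port), timeout=2.0) as sock:
        sock.sendall(payload)

# Example (illustrative address and values):
# send_pose("192.168.1.10", (0.42, -0.13, 0.25, 0.0, 3.14, 0.0))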
Further, before grabbing, the camera and robot grabbing hardware are first set up. As shown in fig. 3, 1 denotes the industrial camera, 2 the end picker, 3 the vacuum tube, 4 the bar light source, 5 the industrial part, 6 the material table, and 7 the Sawyer robot. The setup specifically comprises: a1) fixing the working position of the Sawyer robot according to its working space distribution map; a2) building an industrial part placing platform with a height of 800 mm, a length of 850 mm and a width of 340 mm, located 1000 mm in front of the fixed position of the robot; a3) horizontally fixing the monocular camera (corresponding to the industrial camera), pointing vertically downward, 610 mm above the part placing platform; a4) to pick up the targeted mobile phone shell parts, and given the regular shape and smooth surface of the phone shell, mounting a vacuum pneumatic suction cup at the end of the robot arm, the pick-up being realized by controlling the vacuum suction state of the cylinder (corresponding to the vacuum tube) through a control switch.
According to the unordered aliasing workpiece grabbing method based on the key point prediction network, a real-time RGB image is input, and the preset key point prediction network model segments each workpiece and predicts its key point positions, giving the pixel coordinates of the key points in the image; together with the 3D coordinates of the key points in the workpiece model coordinate system and the camera internal reference matrix, the conversion relation between the workpiece model coordinate system and the camera coordinate system is calculated, and the 6DoF position and posture information of the workpiece in the robot coordinate system is then solved. Through the preset pixel-level key point prediction network model, the pixel position most likely to represent a key point can be predicted by voting even when that key point is occluded, which solves the pose calculation problem for occluded key points when workpieces overlap. The robot can therefore pick workpieces in more complex scenes, overcoming the limitation of the traditional feeding method of fixing the workpiece position and teaching the robot to grasp, which suits only a single scene. Feeding and processing on an industrial production line become more flexible, the picking success rate is effectively improved, and the system can be popularized to feeding scenes of different parts, so it has a strong market prospect.
In one embodiment, step S100 includes:
step S110: shooting images of preset first calibration pictures at different angles by using a camera;
step S120: extracting corner information from each image of a preset first calibration picture with different angles;
step S130: and calibrating by using Zhang's calibration method according to the corner information, and calculating the camera internal reference data to obtain the camera internal reference matrix.
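A minimal sketch of steps S110 to S130 using OpenCV, assuming the preset first calibration picture is a planar chessboard pattern; the board size and square size below are illustrative values rather than values taken from this disclosure.

import glob
import cv2
import numpy as np

def calibrate_intrinsics(image_glob, board_size=(9, 6), square_mm=20.0):
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2)
    objp *= square_mm

    obj_points, img_points, img_shape = [], [], None
    for path in glob.glob(image_glob):              # step S110: images at different angles
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, board_size)   # step S120: corners
        if found:
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_points.append(objp)
            img_points.append(corners)
            img_shape = gray.shape[::-1]

    # step S130: Zhang's method as implemented by cv2.calibrateCamera
    rms, K, dist, _, _ = cv2.calibrateCamera(obj_points, img_points,
                                             img_shape, None, None)
    return K, dist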
In one embodiment, the preset second calibration picture includes nine dots, and the step S200 includes:
step S210: shooting an image of a preset second calibration picture placed in a random posture;
step S220: calculating the circle center pixel position of each dot in the image;
step S230: moving the suction cup at the end of the robot arm to each dot, and recording the corresponding 3D coordinate in the robot coordinate system;
step S240: repeating the step S210 to the step S230 for a first preset number of times to obtain a group of 2D-3D data of each dot;
step S250: and calculating a conversion matrix from the camera coordinate system to the robot coordinate system according to the 2D-3D data of each dot.
In one embodiment, step S250 includes:
step S251: and calculating to obtain a rotation matrix between the robot coordinate system and the camera coordinate system and a translation matrix between the robot coordinate system and the camera coordinate system according to the 2D-3D data of each dot, wherein the calculation specifically comprises the following steps:
s [u, v, 1]^T = K [R_{r2c} | t_{r2c}] [X, Y, Z, 1]^T

wherein s is a scale factor, R_{r2c} represents the rotation matrix from the robot coordinate system to the camera coordinate system, t_{r2c} represents the translation matrix from the robot coordinate system to the camera coordinate system, K is the camera internal reference matrix, (u, v) are the pixel coordinates corresponding to each dot, and (X, Y, Z) are the 3D coordinates of each dot under the robot coordinate system;
step S252: calculating a transformation matrix from the robot coordinate system to the camera coordinate system according to a rotation matrix from the robot coordinate system to the camera coordinate system and a translation matrix from the robot coordinate system to the camera coordinate system, and specifically:
T_{r2c} = [R_{r2c}, t_{r2c}; 0, 1]

wherein T_{r2c} is the transformation matrix from the robot coordinate system to the camera coordinate system;
step S253: the method can obtain a conversion matrix from the camera coordinate system to the robot coordinate system according to the conversion matrix from the robot coordinate system to the camera coordinate system, and specifically comprises the following steps:
T_{c2r} = T_{r2c}^{-1}

wherein T_{c2r} is the transformation matrix from the camera coordinate system to the robot coordinate system.
Specifically, the transformation matrix from the camera coordinate system to the robot coordinate system and the transformation matrix from the robot coordinate system to the camera coordinate system are in an inverse relationship, so that the transformation matrix from the camera coordinate system to the robot coordinate system can be obtained according to the transformation matrix from the robot coordinate system to the camera coordinate system.
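A minimal sketch of steps S251 to S253, assuming the averaged circle-center pixel coordinates of the nine dots (uv) and their 3D coordinates in the robot coordinate system (xyz_robot) have been collected as described in steps S210 to S240, and that K and dist come from step S100.

import cv2
import numpy as np

def hand_eye_from_dots(uv, xyz_robot, K, dist):
    # step S251: solve for the rotation/translation taking robot-frame points
    # onto their image projections (a PnP problem posed in the robot frame)
    ok, rvec, tvec = cv2.solvePnP(xyz_robot.astype(np.float64),
                                  uv.astype(np.float64), K, dist)
    assert ok, "PnP failed"
    R_r2c, _ = cv2.Rodrigues(rvec)

    # step S252: assemble the homogeneous robot -> camera transform
    T_r2c = np.eye(4)
    T_r2c[:3, :3] = R_r2c
    T_r2c[:3, 3] = tvec.ravel()

    # step S253: the camera -> robot transform is its inverse
    T_c2r = np.linalg.inv(T_r2c)
    return T_c2r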
In one embodiment, as shown in fig. 4, after step S200 and before step S500, the method further includes:
step S300: and constructing a pixel-level key point prediction network, acquiring a training data set, marking the training data set to obtain a marked data set, and training the pixel-level key point prediction network according to the marked data set to obtain the pixel-level key point prediction network.
Specifically, the method for labeling the training data set comprises the following steps: the iPhone6S mobile phone shell is used as a pickup object, and aiming at network training requirements, a labeling process comprises key point selection, naming rule definition and data storage.
Key point selection: seven stable key points are selected on the iPhone 6S mobile phone shell, namely the upper-left, upper-right, lower-left and lower-right corner points, the two corner points of the apple logo, and the center of the letter 'o' in the 'iPhone' text. Naming rule definition: the outline of each mobile phone shell in the scene is marked with the data labeling software labelme, with the label defined as phone(i), where i is the index of the target instance in the picture; the key points of each mobile phone shell are marked in order from top to bottom and from left to right along the positive direction of the shell, with the label name defined as phone(i)_kp(j), where j indicates that the point is the j-th key point of that shell; when a key point is occluded, its position is estimated and still marked in the picture. Data storage: the marking information is stored in json file format, with the picture content, mobile phone shell outline, key points and other information saved as text.
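A hedged sketch of reading one annotation file produced by the labeling rule above, assuming labelme's standard JSON layout (a "shapes" list with "label" and "points" fields); the exact label spelling in a real data set may differ from the phone(i) / phone(i)_kp(j) pattern assumed here.

import json
import re

def load_annotation(json_path):
    with open(json_path, "r", encoding="utf-8") as f:
        ann = json.load(f)

    contours, keypoints = {}, {}
    for shape in ann["shapes"]:
        label = shape["label"]
        m = re.fullmatch(r"phone\((\d+)\)_kp\((\d+)\)", label)
        if m:                                   # keypoint j of instance i
            i, j = int(m.group(1)), int(m.group(2))
            keypoints.setdefault(i, {})[j] = shape["points"][0]
        else:
            m = re.fullmatch(r"phone\((\d+)\)", label)
            if m:                               # instance contour polygon
                contours[int(m.group(1))] = shape["points"]
    return contours, keypoints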
In one embodiment, the preset pixel-level keypoint prediction network model includes a convolutional neural network, a region candidate network, and four branches, which are a classification branch, a bounding box fetch branch, a mask fetch branch, and a pixel-level keypoint prediction branch, respectively, and step S500 includes:
step S510: inputting the image in the marked data set into a convolutional neural network to extract the characteristic information of the image, and transmitting the characteristic information into a regional candidate network;
step S520: the regional candidate network acquires the detection frame of each target workpiece according to the characteristic information and inputs the detection frame to the four branches;
step S530: the classification branch is used for classifying the target workpiece and the background according to the received detection frame; the boundary frame obtaining branch is used for obtaining the coordinates of the preset position point of the boundary frame of each target workpiece according to the received detection frame; the mask obtaining branch is used for obtaining a pixel area where each target workpiece is located according to the received detection frame; the pixel-level key point prediction branch is used for obtaining a unit vector diagram pointing to a preset number of key points according to the received detection frame;
step S540: normalizing the offset of each pixel position and the position of the 2D key point into a unit vector according to the pixel position of each pixel point of the pixel region where each target workpiece is located and the position of the 2D key point;
step S550: acquiring all pixel-level vectors of a single target workpiece, randomly selecting two pixel points, and taking the intersection point of the pixel vectors corresponding to the two pixel points as an initial hypothesis of a 2D key point;
step S560: and repeating the step S550 for a second preset number of times to obtain a group of hypotheses, using a clustering algorithm K-means to obtain a point with the highest score as a pixel point of the key point, and obtaining pixel coordinates of the pixel point as the key point.
Specifically, the pixel-level key point prediction network provided by the invention adopts Mask-RCNN as the backbone and adds a pixel-level key point detection branch on top of instance segmentation. The network first uses a convolutional neural network (CNN) to extract the feature information of the image, then obtains the detection frame of each target instance through a region proposal network (RPN), and finally connects four branches that perform the four regression tasks of classification, bounding box acquisition, mask acquisition and pixel-level vector calculation, as shown in FIG. 5.
If N pictures are input, there are N tensor inputs of size [H × W × C]. The detection frame of each target is obtained through the RPN, and within each detection frame the classification task branch produces an N × 2 tensor, where 2 corresponds to target workpiece and background; the bounding box detection branch produces a tensor of size N × 4, where 4 corresponds to the four coordinate values of the upper-left and lower-right corner points of the bounding box; the instance mask branch produces an N × (H × W) tensor representing the pixel area where each instance is located; and the key point detection branch produces a tensor of size N × [H × W × (K × 2)] representing the unit vector maps pointing to the K key points.
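As an illustration of the added branch only, the following PyTorch sketch shows a small convolutional head that maps ROI features to K × 2 channels, i.e. one per-pixel 2D unit vector field for each of the K key points; the channel sizes are illustrative, and the backbone, RPN and the other three branches are assumed to come from a standard Mask-RCNN implementation rather than being reproduced here.

import torch.nn as nn
import torch.nn.functional as F

class KeypointVectorHead(nn.Module):
    def __init__(self, in_channels=256, num_keypoints=7):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 256, 3, padding=1)
        self.conv2 = nn.Conv2d(256, 256, 3, padding=1)
        self.out = nn.Conv2d(256, num_keypoints * 2, 1)

    def forward(self, roi_feats):                 # (N, C, H, W) ROI features
        x = F.relu(self.conv1(roi_feats))
        x = F.relu(self.conv2(x))
        v = self.out(x)                           # (N, K*2, H, W)
        # normalise each per-pixel 2D vector to unit length
        n, _, h, w = v.shape
        v = v.view(n, -1, 2, h, w)
        v = v / (v.norm(dim=2, keepdim=True) + 1e-8)
        return v                                   # (N, K, 2, H, W)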
Under the pixel-level vector regression task, for each pixel position p in the pixel area of each instance (corresponding to a target workpiece), a vector v_k(p) is defined representing the offset from the pixel position p to the 2D key point x_k. In order to avoid the influence of the size and position of the workpiece and to distinguish different key points, the offset is normalized to a unit vector:

v_k(p) = (x_k − p) / ||x_k − p||

wherein x_k − p is the offset from each pixel position to the 2D key point x_k, x_k is the position coordinate of the key point, and p ranges over all pixel positions within the pixel area of the target workpiece.

Then all pixel-level vectors of a single instance are obtained using the mask of that instance, two pixel points are randomly selected, and the intersection point of their pixel vectors is taken as an initial hypothesis h_{k,1} of the key point x_k. Repeating this J times yields a set of hypotheses {h_{k,1}, …, h_{k,J}}, and finally the clustering algorithm K-means is used to obtain the point with the highest score, i.e. the pixel most likely to be the key point, whose pixel coordinates are taken as the key point.
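A minimal sketch of the voting procedure of steps S540 to S560 for a single instance and a single key point, assuming the per-pixel unit vectors have already been predicted; the number of hypotheses J and the number of K-means clusters are illustrative choices, and the "largest cluster" rule stands in for the highest-score selection described above.

import numpy as np
from sklearn.cluster import KMeans

def vote_keypoint(pixels, vectors, J=128, n_clusters=3, seed=0):
    # pixels:  (M, 2) pixel positions inside the instance mask
    # vectors: (M, 2) predicted unit vectors pointing to the keypoint
    rng = np.random.default_rng(seed)
    hyps = []
    for _ in range(J):                                   # steps S550/S560: J hypotheses
        i, j = rng.choice(len(pixels), size=2, replace=False)
        p1, v1, p2, v2 = pixels[i], vectors[i], pixels[j], vectors[j]
        A = np.stack([v1, -v2], axis=1)                  # intersect p1 + t1*v1 and p2 + t2*v2
        if abs(np.linalg.det(A)) < 1e-6:                 # nearly parallel rays, skip
            continue
        t = np.linalg.solve(A, p2 - p1)
        hyps.append(p1 + t[0] * v1)
    hyps = np.asarray(hyps, dtype=float)

    # step S560: cluster the hypotheses and keep the centre of the largest cluster
    km = KMeans(n_clusters=min(n_clusters, len(hyps)), n_init=10).fit(hyps)
    labels, counts = np.unique(km.labels_, return_counts=True)
    return km.cluster_centers_[labels[np.argmax(counts)]]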
Step S300 further includes supervising the pixel-level key point prediction branch for training using the vector angle error, with the loss function defined as:

L_kp = (1 / (N·M)) · Σ_{k=1..N} Σ_{i=1..M} (1 − ṽ_k(p_i) · v_k(p_i))

wherein v_k(p_i) denotes the unit direction vector with which point p_i in the pixel region of each target workpiece points to key point k, ṽ_k(p_i) and v_k(p_i) denote the predicted vector and the real vector respectively, N denotes the number of key points, and M denotes the number of pixel points in the pixel region of a single target workpiece.
Step S400: calculating the loss value of the pixel-level key point prediction network according to a preset loss function, performing back propagation to update the network parameters of the pixel-level key point prediction network according to the loss value, and obtaining the updated pixel-level key point prediction network as a preset pixel-level key point prediction network model.
Specifically, the network parameters of the pixel-level key point prediction network are updated through back propagation according to the loss values, and when the back propagation stopping condition is met, the updated pixel-level key point prediction network is used as a preset pixel-level key point prediction network model.
In an embodiment, the loss function preset in step S400 is specifically:
L = λ_1·L_cls + λ_2·L_box + λ_3·L_mask + λ_4·L_kp

wherein λ_1, λ_2, λ_3 and λ_4 are the weighting factors of the classification branch, the bounding box acquisition branch, the mask acquisition branch and the pixel-level key point prediction branch respectively, L_cls is the classification loss function, L_box is the bounding box detection loss function, L_mask is the mask detection loss function, and L_kp is the pixel-level key point prediction branch loss function.
Specifically, the loss function of the pixel-level key point prediction branch is described in detail above, and the loss functions of the other branches use the default values in Mask-RCNN: the classification loss function L_cls uses the softmax loss, the bounding box detection loss L_box uses the smooth L1 loss function, and the mask generation loss L_mask uses the cross-entropy loss.
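An illustrative PyTorch-style combination of the four branch losses with the weights λ_1 to λ_4; the individual terms follow the defaults named above (softmax/cross-entropy, smooth L1, cross-entropy), while the key point term uses a smooth L1 distance between predicted and ground-truth unit vectors as a stand-in for the vector angle error.

import torch.nn.functional as F

def total_loss(cls_logits, cls_gt, box_pred, box_gt,
               mask_logits, mask_gt, vec_pred, vec_gt,
               lambdas=(1.0, 1.0, 1.0, 1.0)):
    # cls_gt: class indices; mask_gt: float tensor of 0/1; vec_*: unit-vector fields
    l_cls = F.cross_entropy(cls_logits, cls_gt)                         # classification
    l_box = F.smooth_l1_loss(box_pred, box_gt)                          # bounding box
    l_mask = F.binary_cross_entropy_with_logits(mask_logits, mask_gt)   # mask
    l_kp = F.smooth_l1_loss(vec_pred, vec_gt)                           # keypoint vectors
    w1, w2, w3, w4 = lambdas
    return w1 * l_cls + w2 * l_box + w3 * l_mask + w4 * l_kp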
In one embodiment, step S600 includes:
step S610: the method comprises the steps of obtaining 3D coordinates of preset key points in a workpiece model coordinate system, and obtaining a rotation matrix and a translation matrix between the workpiece model coordinate system and a camera coordinate system according to a camera internal reference matrix, the 3D coordinates of the preset key points in the workpiece model coordinate system and pixel coordinates of the preset key points in a real-time image, wherein the method specifically comprises the following steps:
s [u, v, 1]^T = K [R_{o2c} | t_{o2c}] [X, Y, Z, 1]^T

wherein s is a scale factor, R_{o2c} represents the rotation matrix from the workpiece model coordinate system to the camera coordinate system, t_{o2c} represents the translation matrix from the workpiece model coordinate system to the camera coordinate system, K is the camera internal reference matrix, (u, v) are the pixel coordinates corresponding to the preset key points, and (X, Y, Z) are the 3D coordinates of the preset key points in the workpiece model coordinate system;
step S620: obtaining a conversion matrix between the workpiece model coordinate system and the camera coordinate system according to a rotation matrix and a translation matrix between the workpiece model coordinate system and the camera coordinate system, which comprises the following specific steps:
T_{o2c} = [R_{o2c}, t_{o2c}; 0, 1]

wherein T_{o2c} represents the transformation matrix between the workpiece model coordinate system and the camera coordinate system.
Specifically, the 2D key point coordinates [u, v] of the workpiece are obtained from the key point prediction network model, the hypothesis generated by the key points on the workpiece surface is selected, a mobile phone shell model coordinate system is established, and the coordinates of the key points in the workpiece model coordinate system are obtained, as shown in Table 1.

TABLE 1: 3D coordinates of the preset key points in the workpiece model coordinate system

The 2D-3D correspondence of the key points can then be obtained.

Since the conversion relationship between the robot coordinate system and the camera coordinate system is fixed, step S200 obtains T_{c2r} through the hand-eye calibration, and the 3D coordinates of each key point in the workpiece model coordinate system are fixed as shown in Table 1. The pixel coordinates of the key points in the image and the 3D coordinates of the key points in the workpiece model coordinate system are obtained through the key point prediction in step S500, and the conversion relation T_{o2c} between the workpiece model coordinate system and the camera coordinate system is calculated through the PnP algorithm, giving the pose of the workpiece in the robot coordinate system as

T_{o2r} = T_{c2r} · T_{o2c}
The posture conversion relationship is shown in fig. 6, and the posture of the workpiece in the robot coordinate system can be obtained.
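A small sanity check that is often useful in practice (an assumption of good practice, not a step of this disclosure): after solving T_{o2c} with PnP, the 3D key points can be reprojected and compared with the predicted 2D key points before the pose is trusted.

import cv2
import numpy as np

def reprojection_error(model_pts_3d, kps_2d, K, dist, rvec, tvec):
    # Mean pixel error between predicted keypoints and reprojected model points.
    proj, _ = cv2.projectPoints(model_pts_3d.astype(np.float64), rvec, tvec, K, dist)
    proj = proj.reshape(-1, 2)
    return float(np.linalg.norm(proj - kps_2d, axis=1).mean())

# A large error suggests a wrong 2D-3D association or a poor keypoint vote,
# and the grasp attempt can be skipped for that instance.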
In one embodiment, in step S700, the 6DoF position and posture information of the workpiece in the robot coordinate system is obtained according to the initial pose information of the workpiece, the transformation matrix from the camera coordinate system to the robot coordinate system, and the transformation matrix between the workpiece model coordinate system and the camera coordinate system, and specifically:
P = T_{c2r} · T_{o2c} · P_0

wherein P represents the 6DoF position and posture information of the workpiece in the robot coordinate system, T_{c2r} represents the transformation matrix from the camera coordinate system to the robot coordinate system, T_{o2c} represents the transformation matrix between the workpiece model coordinate system and the camera coordinate system, and P_0 represents the initial pose information of the workpiece.
Specifically, a 3D coordinate system is created for the workpiece model, so that the 3D coordinates (X_i, Y_i, Z_i) of each key point in the workpiece model coordinate system can be obtained by measuring the workpiece model. The pixel coordinates (u_i, v_i) corresponding to each key point in the real-time image are obtained by the key point prediction network, and the conversion matrix T_{o2c} between the model coordinate system and the camera coordinate system can then be calculated with the PnP algorithm.

The geometric center of the mobile phone shell is selected as the picking point, and the coordinates of this center in the workpiece model coordinate system are (x_0, y_0, z_0). The transformation matrix T_{o2c} from the workpiece model to the camera coordinate system and the transformation matrix T_{c2r} from the camera to the robot coordinate system are known. The initial direction of the workpiece is defined as the direction coordinates of the end picker in the robot coordinate system in the vertical picking state when the mobile phone shell is placed horizontally, i.e. the deflection angles (Rx, Ry, Rz) of the current robot-arm end picker about the three axes x, y and z of the robot reference coordinate system, denoted (Rx_0, Ry_0, Rz_0). The initial 6DoF information of the workpiece is then defined as P_0 = (x_0, y_0, z_0, Rx_0, Ry_0, Rz_0).

With the conversion relation T_{c2r} between the camera coordinate system and the robot coordinate system obtained through hand-eye calibration and the transformation matrix T_{o2c} between the workpiece model coordinate system and the camera coordinate system, the 6DoF position and attitude information of the workpiece in the robot coordinate system is obtained as P = T_{c2r} · T_{o2c} · P_0.

The robot picks up the workpiece according to the 6DoF position and posture information of the workpiece. The 6DoF information comprises 3 degrees of freedom of position and 3 degrees of freedom of direction, so the robot can realize accurate pickup at different angles. Therefore, no matter in what posture the mobile phone shell is placed, the direction coordinates with which the robot end picker picks up the shell vertically can always be calculated for that placing posture, and 6DoF pickup is realized.
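A hedged sketch of assembling the final 6DoF command from the two transforms. The position part follows the composition above directly; composing the orientation by rotating the initial picking direction (Rx_0, Ry_0, Rz_0) with the workpiece rotation is one reasonable reading of P = T_{c2r} · T_{o2c} · P_0, and the 'xyz' Euler convention is an assumption, since the robot's convention is not spelled out here.

import numpy as np
from scipy.spatial.transform import Rotation as Rot

def workpiece_6dof(T_c2r, T_o2c, pick_point_model, init_dir_deg):
    T_o2r = T_c2r @ T_o2c                               # workpiece model -> robot frame
    p = T_o2r @ np.append(np.asarray(pick_point_model, float), 1.0)

    # rotate the initially defined vertical picking direction by the workpiece rotation
    R_init = Rot.from_euler('xyz', init_dir_deg, degrees=True).as_matrix()
    R_pick = T_o2r[:3, :3] @ R_init
    rx, ry, rz = Rot.from_matrix(R_pick).as_euler('xyz', degrees=True)
    return (p[0], p[1], p[2], rx, ry, rz)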
According to the disordered aliasing workpiece grabbing method based on the key point prediction network, a pixel-level vector calculation branch is introduced into the key point prediction network; the vector calculated for each pixel represents the direction from that pixel to the key point, so that, by the consistency of these directions, the pixel position most likely to represent the key point can be obtained by voting even when the key point of the workpiece is occluded, solving the pose calculation problem for occluded key points when workpieces overlap. Secondly, the pose calculation is divided into two stages: 2D-3D key points are first matched by the key point prediction network, and the pose is then solved by PnP. The data set of the invention only requires marking the outlines and key points of the target workpieces in each picture, and the pose of each workpiece does not need to be calculated during labeling, so the data set is simple and convenient to produce. The PnP method needs only four or more groups of key points to solve, and the present method determines seven key points, so the workpiece pose can be solved with a PnP calculation on seven point pairs, reflecting the light weight of the data calculation. This avoids the problems of existing methods that regress the workpiece pose with a deep neural network, which must label the 6D pose of the training data (a very difficult labeling task) and whose pose regression relies on brute-force matching, making it computationally heavy and time-consuming. Finally, the invention uses 6DoF pose information when picking up workpieces and adds the calculation of direction information, so that the robot grasps in the initially defined relative posture (the robot end picker perpendicular to the workpiece plane), making grasping more accurate and improving the success rate.
In one embodiment, the unordered aliasing workpiece grabbing system based on the key point prediction network comprises an image acquisition module, a pose calculation module, a communication module and a pickup module, wherein the image acquisition module is connected with the pose calculation module, the pose calculation module is connected with the pickup module through the communication module, and the image acquisition module is used for acquiring a real-time image and sending the real-time image to the pose calculation module; the pose calculation module is used for executing a disordered aliasing workpiece grabbing method based on a key point prediction network to obtain 6DoF position and posture information of a workpiece in a robot coordinate system and sending the position and posture information to the pickup module through the communication device; and the picking module picks the target workpiece according to the received 6DoF position and posture information of the workpiece under the robot coordinate system.
For the specific definition of the unordered aliasing workpiece grabbing system based on the key point prediction network, reference may be made to the above definition of the unordered aliasing workpiece grabbing method based on the key point prediction network, and details are not repeated here.
The method and the system for capturing the disordered aliasing workpiece based on the key point prediction network are described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the core concepts of the present invention. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (10)

1. An unordered aliasing workpiece grabbing method based on a key point prediction network, characterized by comprising the following steps:
step S100: calibrating and determining a camera internal reference matrix by Zhang's calibration method according to a preset first calibration picture;
step S200: determining a conversion matrix from a camera coordinate system to a robot coordinate system by a nine-point calibration method according to a preset second calibration picture;
step S500: acquiring a real-time image, inputting the real-time image into a preset pixel-level key point prediction network model, and regressing to obtain pixel coordinates of preset key points in the real-time image;
step S600: acquiring 3D coordinates of preset key points in a workpiece model coordinate system, and obtaining a conversion matrix between the workpiece model coordinate system and a camera coordinate system according to the camera internal reference matrix, the 3D coordinates of the preset key points in the workpiece model coordinate system and pixel coordinates of the preset key points in the real-time image;
step S700: acquiring coordinates of a picking point of a workpiece to be picked under a workpiece model coordinate system and initial direction information of robot grabbing equipment, acquiring workpiece initial pose information according to the coordinates of the picking point of the workpiece to be picked under the workpiece model coordinate system and the initial direction information of the robot grabbing equipment, and acquiring 6DoF position and posture information of the workpiece under the robot coordinate system according to the workpiece initial pose information, a conversion matrix from a camera coordinate system to the robot coordinate system and a conversion matrix between the workpiece model coordinate system and the camera coordinate system;
step S800: and controlling robot grabbing equipment to grab the target workpiece according to the 6DoF position and posture information of the workpiece in the robot coordinate system.
2. The method according to claim 1, wherein step S100 comprises:
step S110: shooting images of preset first calibration pictures at different angles by using a camera;
step S120: extracting corner information from each of the images of the preset first calibration picture shot at different angles;
step S130: calibrating by Zhang's calibration method according to the corner information, and calculating the camera internal reference data to obtain the camera internal reference matrix.
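One possible OpenCV realisation of claim 2 is sketched below (assumptions: a chessboard-style first calibration picture with a 9x6 inner-corner layout, 10 mm squares, and images stored under calib/; none of these values come from the patent): corners are extracted from the images shot at different angles and passed to cv2.calibrateCamera, whose output K is the camera internal reference matrix.

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)     # assumed inner-corner layout of the calibration picture
square = 0.01        # assumed square size in metres
model = np.zeros((pattern[0] * pattern[1], 3), np.float32)
model[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, img_pts, size = [], [], None
for path in glob.glob("calib/*.png"):                            # images at different angles (step S110)
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)    # corner information (step S120)
    if found:
        obj_pts.append(model)
        img_pts.append(corners)
        size = gray.shape[::-1]

# Zhang's calibration (step S130): K is the camera internal reference matrix
_, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
```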
3. The method according to claim 2, wherein the preset second calibration picture includes nine dots, and step S200 comprises:
step S210: shooting an image of a preset second calibration picture placed in a random posture;
step S220: calculating the circle-center pixel position of each dot in the image of the randomly placed preset second calibration picture;
step S230: moving the suction cup at the end of the mechanical arm of the robot grabbing equipment to each dot, and recording the corresponding 3D coordinate in the robot coordinate system;
step S240: repeating the step S210 to the step S230 for a first preset number of times to obtain a group of 2D-3D data of each dot;
step S250: and calculating a conversion matrix from the camera coordinate system to the robot coordinate system according to the 2D-3D data of each dot.
4. The method of claim 3, wherein step S250 comprises:
step S251: calculating a rotation matrix between the robot coordinate system and the camera coordinate system and a translation matrix between the robot coordinate system and the camera coordinate system from the 2D-3D data of each dot, specifically:

$$ s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\left( R_{rc}\begin{bmatrix} X_r \\ Y_r \\ Z_r \end{bmatrix} + t_{rc} \right) $$

wherein $s$ is a scale factor, $R_{rc}$ represents the rotation matrix from the robot coordinate system to the camera coordinate system, $t_{rc}$ represents the translation matrix from the robot coordinate system to the camera coordinate system, $K$ is the camera internal reference matrix, $(u, v)$ are the pixel coordinates corresponding to each dot, and $(X_r, Y_r, Z_r)$ are the 3D coordinates of each dot in the robot coordinate system;

step S252: calculating a conversion matrix from the robot coordinate system to the camera coordinate system from the rotation matrix and the translation matrix obtained in step S251, specifically:

$$ T_{rc} = \begin{bmatrix} R_{rc} & t_{rc} \\ 0 & 1 \end{bmatrix} $$

wherein $T_{rc}$ is the conversion matrix from the robot coordinate system to the camera coordinate system;

step S253: obtaining the conversion matrix from the camera coordinate system to the robot coordinate system by inverting the conversion matrix from the robot coordinate system to the camera coordinate system, specifically:

$$ T_{cr} = T_{rc}^{-1} $$

wherein $T_{cr}$ is the conversion matrix from the camera coordinate system to the robot coordinate system.
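The computation in claim 4 can be sketched with OpenCV as follows (variable names are illustrative; dist holds the lens distortion coefficients from the calibration above): the dot pixel coordinates, their 3D coordinates recorded in the robot coordinate system and the camera internal reference matrix give the robot-to-camera rotation and translation via PnP, and the camera-to-robot conversion matrix follows by inversion.

```python
import cv2
import numpy as np

def camera_to_robot(K, dist, pix_2d, robot_3d):
    """pix_2d: Nx2 dot-centre pixel coordinates; robot_3d: Nx3 dot positions in the robot frame."""
    ok, rvec, tvec = cv2.solvePnP(robot_3d.astype(np.float32),
                                  pix_2d.astype(np.float32), K, dist)
    R_rc, _ = cv2.Rodrigues(rvec)                      # rotation: robot frame -> camera frame
    T_rc = np.eye(4)
    T_rc[:3, :3], T_rc[:3, 3] = R_rc, tvec.ravel()     # step S252: robot -> camera
    return np.linalg.inv(T_rc)                         # step S253: camera -> robot
```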
5. The method according to claim 1, wherein after step S200 and before step S500, the method further comprises:
step S300: building a pixel-level key point prediction network, acquiring a training data set, marking the training data set to obtain a marked data set, and training the pixel-level key point prediction network according to the marked data set to obtain a trained pixel-level key point prediction network;
step S400: calculating the loss value of the pixel-level key point prediction network according to a preset loss function, and performing back propagation to update the network parameters of the pixel-level key point prediction network according to the loss value to obtain an updated pixel-level key point prediction network as a preset pixel-level key point prediction network model.
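Steps S300 to S400 amount to a standard supervised training loop; a hedged PyTorch-style sketch is given below (the network, data loader and optimizer are assumed to exist, and the network is assumed to return a dictionary with the four branch losses, in the style of common detection frameworks):

```python
import torch

def train(model, loader, optimizer, epochs=10):
    """Update the pixel-level key point prediction network by back-propagating
    the weighted multi-branch loss (see claim 7)."""
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            branch_losses = model(images, targets)   # assumed: dict of the four branch losses
            loss = sum(branch_losses.values())       # total loss (or the weighted sum of claim 7)
            optimizer.zero_grad()
            loss.backward()                          # back propagation (step S400)
            optimizer.step()                         # update the network parameters
```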
6. The method according to claim 1, wherein the preset pixel-level key point prediction network model comprises a convolutional neural network, a region candidate network, and four branches, namely a classification branch, a bounding box obtaining branch, a mask obtaining branch, and a pixel-level key point prediction branch, and step S500 comprises:
step S510: inputting the real-time image into the convolutional neural network to extract the characteristic information of the image, and transmitting the characteristic information into the region candidate network;
step S520: the region candidate network acquires a detection frame of each target workpiece according to the characteristic information and inputs the detection frame to the four branches;
step S530: the classification branch is used for distinguishing the target workpieces from the background according to the received detection frame; the bounding box obtaining branch is used for obtaining the coordinates of preset position points of the bounding box of each target workpiece according to the received detection frame; the mask obtaining branch is used for obtaining the pixel region where each target workpiece is located according to the received detection frame; the pixel-level key point prediction branch is used for obtaining unit vector maps pointing to a preset number of key points according to the received detection frame;
step S540: for each pixel point of the pixel region where each target workpiece is located, normalizing the offset between the pixel position and the position of the 2D key point into a unit vector;
step S550: acquiring all pixel-level vectors of a single target workpiece, randomly selecting two pixel points, and taking the intersection point of the pixel vectors corresponding to the two pixel points as an initial hypothesis of a 2D key point;
step S560: repeating step S550 a second preset number of times to obtain a group of hypotheses, applying the K-means clustering algorithm to the group of hypotheses to obtain the pixel point receiving the most votes as the key point, and taking the pixel coordinate of that pixel point as the key point coordinate.
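Steps S540 to S560 can be sketched as follows (Python; the hypothesis count, cluster count and random seed are assumptions for this example): pixel pairs inside one workpiece mask are sampled repeatedly, each pair votes for the intersection of its two rays, and K-means over the hypotheses returns the centre of the largest cluster as the key point pixel coordinate.

```python
import numpy as np
from sklearn.cluster import KMeans

def vote_keypoint(pixels, vectors, n_hypotheses=200, n_clusters=5, seed=0):
    """pixels: Mx2 pixel positions inside one workpiece mask;
    vectors: Mx2 unit vectors at those pixels pointing towards the key point."""
    rng = np.random.default_rng(seed)
    hyps = []
    while len(hyps) < n_hypotheses:                        # repeat step S550 (step S560)
        i, j = rng.choice(len(pixels), size=2, replace=False)
        A = np.column_stack((vectors[i], -vectors[j]))     # intersect the two pixel rays
        if abs(np.linalg.det(A)) < 1e-8:
            continue                                       # nearly parallel, no intersection
        t = np.linalg.solve(A, pixels[j] - pixels[i])
        hyps.append(pixels[i] + t[0] * vectors[i])         # one 2D key point hypothesis
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(np.array(hyps))
    best = int(np.bincount(km.labels_).argmax())           # cluster receiving the most votes
    return km.cluster_centers_[best]                       # pixel coordinate of the key point
```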
7. The method according to claim 6, wherein the loss function preset in step S400 is specifically:
$$ L = \lambda_{1} L_{cls} + \lambda_{2} L_{box} + \lambda_{3} L_{mask} + \lambda_{4} L_{kp} $$

wherein $\lambda_{1}$, $\lambda_{2}$, $\lambda_{3}$ and $\lambda_{4}$ are the weighting factors of the classification branch, the bounding box obtaining branch, the mask obtaining branch and the pixel-level key point prediction branch respectively, $L_{cls}$ is the classification loss function, $L_{box}$ is the bounding box detection loss function, $L_{mask}$ is the mask detection loss function, and $L_{kp}$ is the pixel-level key point prediction branch loss function.
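The weighted loss of claim 7 transcribes directly into code; a minimal sketch (the default weights shown are placeholders, and each branch loss is assumed to be computed elsewhere):

```python
def total_loss(l_cls, l_box, l_mask, l_kp,
               w_cls=1.0, w_box=1.0, w_mask=1.0, w_kp=1.0):
    """L = w_cls*L_cls + w_box*L_box + w_mask*L_mask + w_kp*L_kp."""
    return w_cls * l_cls + w_box * l_box + w_mask * l_mask + w_kp * l_kp
```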
8. The method of claim 1, wherein step S600 comprises:
step S610: acquiring a 3D coordinate of a preset key point in a workpiece model coordinate system, and obtaining a rotation matrix and a translation matrix between the workpiece model coordinate system and a camera coordinate system according to the camera internal reference matrix, the 3D coordinate of the preset key point in the workpiece model coordinate system and a pixel coordinate of the preset key point in the real-time image, wherein the method specifically comprises the following steps:
$$ s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\left( R_{mc}\begin{bmatrix} X_m \\ Y_m \\ Z_m \end{bmatrix} + t_{mc} \right) $$

wherein $s$ is a scale factor, $R_{mc}$ represents the rotation matrix from the workpiece model coordinate system to the camera coordinate system, $t_{mc}$ represents the translation matrix from the workpiece model coordinate system to the camera coordinate system, $K$ is the camera internal reference matrix, $(u, v)$ are the pixel coordinates corresponding to the preset key points, and $(X_m, Y_m, Z_m)$ are the 3D coordinates of the preset key points in the workpiece model coordinate system;

step S620: obtaining a conversion matrix between the workpiece model coordinate system and the camera coordinate system from the rotation matrix and the translation matrix between the workpiece model coordinate system and the camera coordinate system, specifically:

$$ T_{mc} = \begin{bmatrix} R_{mc} & t_{mc} \\ 0 & 1 \end{bmatrix} $$

wherein $T_{mc}$ represents the conversion matrix between the workpiece model coordinate system and the camera coordinate system.
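Claim 8 is a classical PnP problem and mirrors the nine-point sketch given after claim 4; one hedged OpenCV version (the seven key points mentioned in the description would supply the 2D-3D pairs; variable names are illustrative):

```python
import cv2
import numpy as np

def model_to_camera(K, dist, keypoints_2d, keypoints_3d):
    """keypoints_2d: Nx2 pixel coordinates regressed by the network (N >= 4);
    keypoints_3d: Nx3 coordinates of the same key points in the workpiece model frame."""
    ok, rvec, tvec = cv2.solvePnP(keypoints_3d.astype(np.float32),
                                  keypoints_2d.astype(np.float32), K, dist)
    R_mc, _ = cv2.Rodrigues(rvec)
    T_mc = np.eye(4)
    T_mc[:3, :3], T_mc[:3, 3] = R_mc, tvec.ravel()     # step S620: model -> camera
    return T_mc
```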
9. The method according to claim 1, wherein in step S700, the 6DoF position and posture information of the workpiece in the robot coordinate system is obtained according to the initial pose information of the workpiece, the transformation matrix from the camera coordinate system to the robot coordinate system, and the transformation matrix between the workpiece model coordinate system and the camera coordinate system, specifically:
$$ T_{robot} = T_{cr}\, T_{mc}\, T_{init} $$

wherein $T_{robot}$ represents the 6DoF position and posture information of the workpiece in the robot coordinate system, $T_{cr}$ represents the conversion matrix from the camera coordinate system to the robot coordinate system, $T_{mc}$ represents the conversion matrix between the workpiece model coordinate system and the camera coordinate system, and $T_{init}$ represents the initial pose information of the workpiece.
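The chain of transformations in claim 9 reduces to a single matrix product; a minimal sketch (all inputs are 4x4 homogeneous matrices; names are illustrative):

```python
import numpy as np

def workpiece_pose_in_robot(T_cam2robot, T_model2cam, T_init):
    """T_cam2robot: camera -> robot (nine-point calibration);
    T_model2cam: workpiece model -> camera (PnP result);
    T_init: initial pose of the pick-up point and gripper direction in the model frame."""
    return T_cam2robot @ T_model2cam @ T_init   # 6DoF pose of the workpiece in the robot frame
```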
10. An unordered aliasing workpiece grabbing system based on a key point prediction network, characterized by comprising an image acquisition module, a pose calculation module, a communication module and a picking module, wherein the image acquisition module is connected with the pose calculation module, and the pose calculation module is connected with the picking module through the communication module,
the image acquisition module is used for acquiring a real-time image and sending the real-time image to the pose calculation module;
the pose calculation module is used for executing the method of any one of claims 1 to 9 to obtain the 6DoF position and posture information of the workpiece in the robot coordinate system and sending it to the picking module through the communication module;
and the picking module picks a target workpiece according to the received 6DoF position and posture information of the workpiece in the robot coordinate system.
CN202111156483.0A 2021-09-30 2021-09-30 Unordered aliasing workpiece grabbing method and system based on key point prediction network Active CN113580149B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111156483.0A CN113580149B (en) 2021-09-30 2021-09-30 Unordered aliasing workpiece grabbing method and system based on key point prediction network

Publications (2)

Publication Number Publication Date
CN113580149A 2021-11-02
CN113580149B CN113580149B (en) 2021-12-21

Family

ID=78242657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111156483.0A Active CN113580149B (en) 2021-09-30 2021-09-30 Unordered aliasing workpiece grabbing method and system based on key point prediction network

Country Status (1)

Country Link
CN (1) CN113580149B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9102055B1 (en) * 2013-03-15 2015-08-11 Industrial Perception, Inc. Detection and reconstruction of an environment to facilitate robotic interaction with the environment
CN109108965A (en) * 2018-07-27 2019-01-01 武汉精锋微控科技有限公司 A kind of cartesian space motion forecast method applied to mechanical arm
CN109986560A (en) * 2019-03-19 2019-07-09 埃夫特智能装备股份有限公司 A kind of mechanical arm self-adapting grasping method towards multiple target type
DE102020103398A1 (en) * 2020-02-11 2021-08-12 Heidelberger Druckmaschinen Aktiengesellschaft Method for moving a stack of products with a robot
CN112109086A (en) * 2020-09-03 2020-12-22 清华大学深圳国际研究生院 Grabbing method for industrial stacked parts, terminal equipment and readable storage medium
CN112936257A (en) * 2021-01-22 2021-06-11 熵智科技(深圳)有限公司 Workpiece grabbing method and device, computer equipment and storage medium
CN113334395A (en) * 2021-08-09 2021-09-03 常州唯实智能物联创新中心有限公司 Multi-clamp mechanical arm disordered grabbing method and system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114714365A (en) * 2022-06-08 2022-07-08 湖南大学 Disordered workpiece grabbing method and system based on cloud platform
CN114998804A (en) * 2022-06-14 2022-09-02 湖南大学 Posture-gesture overall posture capturing method based on two stages
CN117067219A (en) * 2023-10-13 2023-11-17 广州朗晴电动车有限公司 Sheet metal mechanical arm control method and system for trolley body molding
CN117067219B (en) * 2023-10-13 2023-12-15 广州朗晴电动车有限公司 Sheet metal mechanical arm control method and system for trolley body molding
CN117140558A (en) * 2023-10-25 2023-12-01 菲特(天津)检测技术有限公司 Coordinate conversion method, system and electronic equipment
CN117140558B (en) * 2023-10-25 2024-01-16 菲特(天津)检测技术有限公司 Coordinate conversion method, system and electronic equipment

Also Published As

Publication number Publication date
CN113580149B (en) 2021-12-21

Similar Documents

Publication Publication Date Title
CN113580149B (en) Unordered aliasing workpiece grabbing method and system based on key point prediction network
CN108885459B (en) Navigation method, navigation system, mobile control system and mobile robot
CN112070818B (en) Robot disordered grabbing method and system based on machine vision and storage medium
CN109255813B (en) Man-machine cooperation oriented hand-held object pose real-time detection method
CN109074083B (en) Movement control method, mobile robot, and computer storage medium
CN111738261B (en) Single-image robot unordered target grabbing method based on pose estimation and correction
CN108656107B (en) Mechanical arm grabbing system and method based on image processing
CN110580725A (en) Box sorting method and system based on RGB-D camera
CN111368852A (en) Article identification and pre-sorting system and method based on deep learning and robot
CN110084243B (en) File identification and positioning method based on two-dimensional code and monocular camera
CN108942923A (en) A kind of mechanical arm crawl control method
JP2020512646A (en) Imaging system for localization and mapping of scenes containing static and dynamic objects
CN114912287A (en) Robot autonomous grabbing simulation system and method based on target 6D pose estimation
CN114714365B (en) Disordered workpiece grabbing method and system based on cloud platform
CN112927264B (en) Unmanned aerial vehicle tracking shooting system and RGBD tracking method thereof
CN113276106A (en) Climbing robot space positioning method and space positioning system
CN115816460B (en) Mechanical arm grabbing method based on deep learning target detection and image segmentation
CN114882109A (en) Robot grabbing detection method and system for sheltering and disordered scenes
CN115147488B (en) Workpiece pose estimation method and grabbing system based on dense prediction
CN113034575A (en) Model construction method, pose estimation method and object picking device
CN112949452A (en) Robot low-light environment grabbing detection method based on multitask shared network
CN114782628A (en) Indoor real-time three-dimensional reconstruction method based on depth camera
Chang et al. GhostPose: Multi-view pose estimation of transparent objects for robot hand grasping
US20210304411A1 (en) Map construction method, apparatus, storage medium and electronic device
CN114193440A (en) Robot automatic grabbing system and method based on 3D vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant