CN115008454A - Robot online hand-eye calibration method based on multi-frame pseudo label data enhancement - Google Patents


Info

Publication number: CN115008454A
Application number: CN202210607581.XA
Authority: CN (China)
Prior art keywords: key point, robot, pseudo, detection model, point detection
Legal status: Pending
Inventors: 钟小品, 虞金龙, 朱文轩, 邓元龙, 吴宗泽
Current and original assignee: Shenzhen University
Application filed by Shenzhen University
Other languages: Chinese (zh)

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/16: Programme controls
    • B25J9/1679: Programme controls characterised by the tasks executed
    • B25J9/1692: Calibration of manipulator
    • B25J19/00: Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • B25J19/0095: Means or methods for testing manipulators

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a robot online hand-eye calibration method based on multi-frame pseudo-label data enhancement, which comprises the following steps: generating a number of synthetic robot pictures with key point labels as a synthetic data set; training a key point detection model on the synthetic data set to obtain a pre-trained key point detection model; detecting real-scene robot pictures with the pre-trained model to obtain real-scene robot key point pseudo-label data; enhancing the pseudo-label data with a multi-frame pseudo-label data enhancement method and fine-tuning the pre-trained key point detection model to obtain the final robot key point detection model; and calculating the key point three-dimensional coordinate set, the two-dimensional key point set and the camera intrinsics to complete online hand-eye calibration. By combining multi-frame robot image information to jointly estimate the hand-eye relationship and then re-projecting to obtain the labels, the invention improves the accuracy of the pseudo labels and thereby the online hand-eye calibration performance of the system.

Description

Robot online hand-eye calibration method based on multi-frame pseudo label data enhancement
Technical Field
The invention relates to the technical field of image data processing, and in particular to a robot online hand-eye calibration method, system, terminal and computer-readable storage medium based on multi-frame pseudo-label data enhancement.
Background
Giving a robot vision allows it to adapt flexibly to complex and novel tasks. How to obtain the pose relationship between the robot and an externally mounted camera is a fundamental problem in robot vision research, also called the hand-eye calibration problem (the robot is regarded as the hand and the externally mounted camera as the eye).
For the hand-eye calibration problem, taking the eye-to-hand configuration as an example, the traditional solution is to fix the robot base and the camera in the world coordinate system, mount a calibration plate on the robot end, and then change the pose of the robot end. Using the invariance of the base and camera positions between poses, an equation of the form $AX = XB$ is established and solved, where, writing ${}^{y}_{x}T$ for the homogeneous transform from frame $x$ to frame $y$ (with $b$ the base, $e$ the end effector, $c$ the camera and $t$ the calibration plate):

$${}^{e_1}_{b}T \; {}^{b}_{c}T \; {}^{c}_{t_1}T = {}^{e_2}_{b}T \; {}^{b}_{c}T \; {}^{c}_{t_2}T$$

Here ${}^{e_1}_{b}T$ denotes the base-to-end transform at pose 1, ${}^{b}_{e_2}T = ({}^{e_2}_{b}T)^{-1}$ the end-to-base transform at pose 2, ${}^{c}_{t_1}T$ the plate-to-camera transform at pose 1, ${}^{t_2}_{c}T = ({}^{c}_{t_2}T)^{-1}$ the camera-to-plate transform at pose 2, and ${}^{b}_{c}T$ the camera-to-base transform. Rearranging gives $\big({}^{b}_{e_2}T \, {}^{e_1}_{b}T\big)\, {}^{b}_{c}T = {}^{b}_{c}T \, \big({}^{c}_{t_2}T \, {}^{t_1}_{c}T\big)$, i.e. $AX = XB$ with the unknown $X = {}^{b}_{c}T$.
Finally, the pose relationship between the robot base and the camera, i.e. the hand-eye relationship, is solved. For the solution of $AX = XB$, specific methods include separable solutions, simultaneous solutions and iterative solutions. Although the hand-eye relationship solved in this way can satisfy basic robot manipulation applications, the calibration process is very tedious and must be carried out offline. It is undoubtedly difficult to rapidly deploy robots in industrial production when relying on the traditional method, so it is necessary to explore an online robot hand-eye calibration method.
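For reference, the traditional offline procedure can be sketched with OpenCV's built-in hand-eye solver. This is an illustrative sketch only, not part of the invention; it assumes opencv-python 4.1 or later, and the choice of the Tsai separable solver and the variable names are placeholders:

```python
# Illustrative sketch of traditional offline eye-to-hand calibration with
# OpenCV (assumes opencv-python >= 4.1). The pose lists come from moving the
# robot end through several poses with a calibration plate attached to it.
import cv2
import numpy as np

def offline_eye_to_hand(R_base2end, t_base2end, R_plate2cam, t_plate2cam):
    """Each argument is a list with one entry per robot pose: R_* are 3x3
    rotations, t_* are 3x1 translations. For the eye-to-hand case, OpenCV
    expects the inverted end-effector poses (base -> end), and the returned
    transform is then camera -> base."""
    R, t = cv2.calibrateHandEye(
        R_base2end, t_base2end, R_plate2cam, t_plate2cam,
        method=cv2.CALIB_HAND_EYE_TSAI)  # a separable (rotation-then-translation) solver
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t.ravel()
    return T  # homogeneous camera-to-base transform
```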
In recent years, with the rise of deep neural networks and continued research on online hand-eye calibration, many online hand-eye calibration methods based on deep neural networks have emerged to address the tedious steps of the traditional hand-eye calibration algorithms. These methods usually train a key point detection network and combine the forward kinematics of the robot with a PnP algorithm to complete the hand-eye calibration. (Perspective-n-Point is a method for solving 3D-to-2D point-pair motion: given n 3D space points and their projected image positions, it estimates the pose of the camera. Since the 3D position of a feature point can be determined by triangulation or by the depth map of an RGB-D camera, the camera motion in a binocular or RGB-D visual odometry can be estimated directly with PnP.)
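As a minimal illustration of the PnP building block these methods rely on (a sketch assuming OpenCV; the EPnP flag selects the efficient PnP variant used later in this document, and the array names are illustrative):

```python
# Minimal PnP sketch with OpenCV: estimate the camera pose from n known
# 3D points (pts3d, shape (n, 3)) and their 2D projections (pts2d, (n, 2)),
# given the intrinsic matrix K. Lens distortion is assumed to be zero.
import cv2
import numpy as np

def solve_pose_epnp(pts3d, pts2d, K):
    ok, rvec, tvec = cv2.solvePnP(pts3d.astype(np.float32),
                                  pts2d.astype(np.float32),
                                  K, None, flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        raise RuntimeError("EPnP failed")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return R, tvec              # pose of the points' frame in the camera frame
```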
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
The invention mainly aims to provide a robot online hand-eye calibration method, system, terminal and computer-readable storage medium based on multi-frame pseudo-label data enhancement, in order to solve the problems in the prior art that deep-learning-based hand-eye calibration algorithms use models trained only on synthetic data, which generalize poorly, while labeling real data is costly.
In order to achieve the above object, the present invention provides an online robot hand-eye calibration method based on multi-frame pseudo tag data enhancement, which comprises the following steps:
generating a plurality of robot model animation sequences, performing domain randomization on the robot model animation sequences, and outputting a plurality of synthetic robot pictures and key point labels as a synthetic data set;
preprocessing the synthetic data set, training a key point detection model by using the preprocessed synthetic data set to obtain a pre-trained key point detection model, extracting feature information of pictures by using an encoder based on the pre-trained key point detection model to obtain a feature map, mapping the feature map into a plurality of confidence maps by using a decoder, and obtaining key point coordinates according to the confidence maps;
acquiring robot key point images in a real scene, predicting the acquired robot key point images by using the pre-trained key point detection model, taking the prediction results as original key point pseudo labels, solving the hand-eye relationship by combining the original key point pseudo labels of multi-frame images and using the EPnP (Efficient Perspective-n-Point) algorithm, re-projecting the key points by using the hand-eye relationship to obtain enhanced key point pseudo label data, and fine-tuning the pre-trained key point detection model by using the enhanced key point pseudo label data to obtain a final robot key point detection model;
and calculating to obtain a key point three-dimensional coordinate set, a two-dimensional key point set and camera internal parameters, and completing online hand-eye calibration based on the robot key point detection model according to the key point three-dimensional coordinate set, the two-dimensional key point set and the camera internal parameters.
Optionally, the method for calibrating hands and eyes of a robot online based on multi-frame pseudo tag data enhancement, where the generating of the animation sequences of multiple robot models specifically includes:
importing an STL three-dimensional model of the robot into the Blender simulation software, which supports the STL format; setting key points whose coordinates need to be recorded on the robot model in Blender to complete the predefinition of the key points, and establishing the connection relationships between the links so that each joint of the robot can rotate randomly;
and automatically generating a preset number of robot model animation sequences with randomly rotating joints by using the python interface in Blender.
Optionally, the method for calibrating the hands and eyes of the robot on line based on the multi-frame pseudo tag data enhancement includes performing domain randomization on the robot model animation sequence, and outputting multiple synthetic robot pictures and key point tags as a synthetic data set, and specifically includes:
importing the robot model animation sequences generated by the Blender simulation software into the UE4 game engine in FBX format, and performing domain randomization in the UE4 game engine, the domain randomization comprising randomizing the joint angles of the robot, randomizing the shooting background, randomizing the colors of each part of the robot, and randomizing the relative pose of the robot and the camera;
shooting pictures with a virtual camera of the UE4 game engine, recording the key point coordinates of each frame with the NDDS plug-in, and outputting a preset number of synthetic robot pictures and two-dimensional key point coordinate labels as the synthetic data set.
Optionally, the method for calibrating hands and eyes of a robot on line based on multi-frame pseudo tag data enhancement includes preprocessing the synthesized data set, training a key point detection model by using the preprocessed synthesized data set to obtain a pre-trained key point detection model, extracting feature information of a picture by using an encoder of the pre-trained key point detection model to obtain a feature map, mapping the feature map into multiple confidence maps by using a decoder, and obtaining coordinates of key points according to the confidence maps, and specifically includes:
taking the synthetic data set as a training set, and preprocessing the training set;
training the key point detection model by using the preprocessed training set to obtain a pre-training key point detection model;
extracting feature information of the picture by an encoder based on the pre-training key point detection model to obtain a feature map, and mapping the feature map into a plurality of confidence maps by a decoder;
and after denoising each confidence map based on Gaussian smoothing, taking coordinates of peak values to obtain 2D coordinates of each key point of the robot in the image.
Optionally, in the method for calibrating the robot online hand-eye based on multi-frame pseudo tag data enhancement, the number of channels of the confidence map is the number of the key points, and the value of the confidence map is the probability that the pixel is the key point.
Optionally, the method for calibrating a robot online hand-eye based on multi-frame pseudo tag data enhancement includes acquiring a robot key point image in a real scene, predicting the acquired robot key point image by using the pre-trained key point detection model, using a prediction result as an original key point pseudo tag, solving a hand-eye relationship by combining the original key point pseudo tag of the multi-frame image and using an EPnP algorithm, re-projecting key points by using the hand-eye relationship to obtain enhanced key point pseudo tag data, and adjusting the pre-trained key point detection model by using the enhanced key point pseudo tag data to obtain a final robot key point detection model, which specifically includes:
under the condition of keeping the relative postures of a robot base and a camera unchanged in a real scene, changing the joint rotation angle of the robot and collecting multi-frame images as a group of data, then changing the angle of the camera, and repeatedly collecting a plurality of groups of robot key point images;
predicting the acquired robot key point image by using the pre-training key point detection model, and taking a prediction result as an original key point pseudo label, wherein the specific expression is as follows:
$$l^{m} = F(I^{m};\, \theta) = \{\, p^{m}_{1},\, p^{m}_{2},\, \dots,\, p^{m}_{n} \,\}$$

$$L_{ori} = \{\, l^{1},\, l^{2},\, \dots,\, l^{m} \,\}$$

wherein $I$ represents a robot picture, $\theta$ represents the model parameters, $F$ represents the model detection process, $p^{m}_{n}$ represents the pixel coordinates of the $n$-th key point on the picture, $l^{m}$ represents the original pseudo label of all key points on the $m$-th picture, and $L_{ori}$ represents the set of all original key point pseudo labels on a group of data;
set L of pseudo labels of all original key points on a group of data ori Three-dimensional coordinate point set P corresponding to pseudo label label And using an EPnP algorithm to solve the reference K in the camera to obtain the original hand-eye relation
Figure BDA0003672028140000065
By passing
Figure BDA0003672028140000066
And K sets of three-dimensional coordinate points P label Carrying out re-projection to obtain a preliminarily optimized pseudo label set L rep And calculating L ori And L rep The euclidean distance of each element in (a) is specifically expressed as follows:
Figure BDA0003672028140000067
Figure BDA0003672028140000068
Figure BDA0003672028140000069
Figure BDA00036720281400000610
wherein m is the number of pictures, n is the number of key points,
Figure BDA0003672028140000071
a reprojected pseudo label representing all keypoints on the set of ith pictures,
Figure BDA0003672028140000072
representing a reprojected pseudo label of a j key point on the graph, wherein W is the Euclidean distance between an original pseudo label and a reprojected pseudo label of a certain key point;
setting a threshold and, according to the $W$ of each key point, removing from the original key point pseudo label set $L_{ori}$ the outliers whose Euclidean distance between the original pseudo label and the re-projected pseudo label exceeds the threshold, then solving again with the EPnP algorithm to obtain a new hand-eye relation ${}^{c}_{b}T_{opt}$; using ${}^{c}_{b}T_{opt}$ to re-project the key point three-dimensional coordinate set ${}^{b}P$ in the robot base coordinate system to obtain the enhanced robot key point pseudo label data, specifically expressed as follows:

$$L_{opt} = \pi\!\left( K \; {}^{c}_{b}T_{opt} \; {}^{b}P \right)$$

wherein $L_{opt}$ represents the enhanced robot key point pseudo label data;
and adding the enhanced pseudo label data of the robot key points into a training set, and adjusting the pre-training key point detection model according to the updated training set to obtain a final robot key point detection model.
Optionally, the method for calibrating the hands and eyes of the robot on line based on the multi-frame pseudo tag data enhancement includes the steps of obtaining a three-dimensional coordinate set of key points, a two-dimensional key point set and camera parameters through calculation, completing the calibration of the hands and eyes on line based on the robot key point detection model according to the three-dimensional coordinate set of key points, the two-dimensional key point set and the camera parameters, and specifically including:
solving the predefined three-dimensional coordinate set of the $n$ key points in the robot base coordinate system, ${}^{b}P$, from the angle sensor data of each robot joint and the forward kinematics, as follows:

$${}^{b}P = \{\, {}^{b}P_{1},\, {}^{b}P_{2},\, \dots,\, {}^{b}P_{n} \,\}$$

calibrating the camera with Zhang Zhengyou's camera calibration method to obtain the camera intrinsics $K$, whose complete matrix is expressed as follows:

$$K = \begin{bmatrix} f_{x} & 0 & c_{x} \\ 0 & f_{y} & c_{y} \\ 0 & 0 & 1 \end{bmatrix}$$

wherein $f_{x}$ and $f_{y}$ represent the camera focal length in pixel units, and $c_{x}$ and $c_{y}$ represent the offsets of the origin of the physical imaging plane along the $u$-axis and $v$-axis;

shooting a robot picture with the camera, and using the final robot key point detection model to predict the two-dimensional coordinates of the key points in the RGB image, expressed as follows:

$$p = \{\, p_{1},\, p_{2},\, \dots,\, p_{n} \,\}$$

wherein $p$ represents the set of $n$ two-dimensional key points in the image, and $p_{i}$ represents the position coordinates of the $i$-th key point in the image;

taking the key point three-dimensional coordinate set ${}^{b}P$, the two-dimensional key point set $p$ and the camera intrinsics $K$ as input, solving with the EPnP algorithm to obtain the hand-eye relation ${}^{c}_{b}T$, and completing online hand-eye calibration with a single image.
In addition, in order to achieve the above object, the present invention further provides an online robot hand-eye calibration system based on multi-frame pseudo tag data enhancement, wherein the online robot hand-eye calibration system based on multi-frame pseudo tag data enhancement comprises:
the synthetic data set generation module is used for generating a plurality of robot model animation sequences, performing domain randomization on the robot model animation sequences, and outputting a plurality of synthetic robot pictures and key point labels as a synthetic data set;
the feature extraction module is used for preprocessing the synthetic data set, training a key point detection model by using the preprocessed synthetic data set to obtain a pre-training key point detection model, extracting feature information of pictures by using an encoder based on the pre-training key point detection model to obtain a feature map, mapping the feature map into a plurality of confidence maps by using a decoder, and obtaining key point coordinates according to the confidence maps;
the data enhancement and fine adjustment module is used for acquiring a robot key point image in a real scene, predicting the acquired robot key point image by using the pre-training key point detection model, taking a prediction result as an original key point pseudo label, solving a hand-eye relationship by combining the original key point pseudo label of a multi-frame image and using an EPnP algorithm, re-projecting key points by using the hand-eye relationship to obtain enhanced key point pseudo label data, and adjusting the pre-training key point detection model by using the enhanced key point pseudo label data to obtain a final robot key point detection model;
and the online hand-eye calibration module is used for calculating to obtain a key point three-dimensional coordinate set, a two-dimensional key point set and camera internal parameters, completing online hand-eye calibration according to the key point three-dimensional coordinate set, the two-dimensional key point set and the camera internal parameters and based on the robot key point detection model.
In addition, to achieve the above object, the present invention further provides a terminal, wherein the terminal comprises: a memory, a processor, and a robot online hand-eye calibration program based on multi-frame pseudo-label data enhancement that is stored in the memory and executable on the processor; when the robot online hand-eye calibration program based on multi-frame pseudo-label data enhancement is executed by the processor, the steps of the robot online hand-eye calibration method based on multi-frame pseudo-label data enhancement described above are implemented.
In addition, in order to achieve the above object, the present invention further provides a computer readable storage medium, wherein the computer readable storage medium stores a robot online hand-eye calibration program enhanced based on multi-frame pseudo tag data, and when the robot online hand-eye calibration program enhanced based on multi-frame pseudo tag data is executed by a processor, the steps of the robot online hand-eye calibration method enhanced based on multi-frame pseudo tag data as described above are implemented.
The invention adopts the pseudo-label idea from semi-supervised learning to label key points in real robot images that lack two-dimensional key point coordinate labels. Because pseudo labels often contain wrong labels, a multi-frame pseudo-label joint data enhancement method is proposed to improve the accuracy of the pseudo labels and thereby the online hand-eye calibration performance of the system. The pseudo-label processing adopted for the robot key points, in particular the process of jointly estimating the hand-eye relationship from multi-frame robot image information and then re-projecting to obtain the labels, is a processing method validated by both theory and practice.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the robot online hand-eye calibration method based on multi-frame pseudo tag data enhancement of the present invention;
FIG. 2 is a schematic diagram of the method for robot online hand-eye calibration based on multi-frame pseudo tag data enhancement according to the present invention, in which the key points are predefined and the connection relationships between the links are established;
FIG. 3 is a schematic diagram of multi-frame pseudo tag data joint enhancement in a preferred embodiment of the robot online hand-eye calibration method based on multi-frame pseudo tag data enhancement of the present invention;
FIG. 4 is a schematic diagram of online hand-eye calibration in a preferred embodiment of the method for robot online hand-eye calibration based on multi-frame pseudo tag data enhancement of the present invention;
FIG. 5 is a schematic diagram of a preferred embodiment of the robot online hand-eye calibration system based on multi-frame pseudo tag data enhancement according to the present invention;
FIG. 6 is a diagram illustrating an operating environment of a terminal according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the method for calibrating the online hand-eye of the robot based on the enhancement of the multi-frame pseudo tag data according to the preferred embodiment of the present invention, as shown in fig. 1, the method for calibrating the online hand-eye of the robot based on the enhancement of the multi-frame pseudo tag data includes the following steps:
and step S10, generating a plurality of robot model animation sequences, performing field randomization on the robot model animation sequences, and outputting a plurality of synthetic robot pictures and key point labels as a synthetic data set.
Specifically, an STL three-dimensional model of the robot is imported into the Blender simulation software, which supports the STL format (Blender is three-dimensional graphics software providing a complete pipeline of modeling, animation, materials, rendering, audio processing and video editing). Key points whose coordinates need to be recorded are set on the robot model in Blender to complete the predefinition of the key points (as shown in fig. 2, there are 8 key points, i.e. 8 key points per picture), and the connection relationships (i.e. parent-child relationships) between the links are established so that each joint of the robot can rotate freely. A preset number (for example, 20000) of robot model animation sequences with randomly rotating joints are then generated automatically using the python interface in Blender.
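For illustration, the Blender-side generation can be sketched as follows; this runs inside Blender's Python environment, and the object names, joint axis and angle limits are placeholders for the actual rig rather than values prescribed by the patent:

```python
# Sketch of generating random joint animations in Blender (bpy). Assumes the
# STL links are already imported and parented into a kinematic chain, and
# that each joint rotates about its local z-axis.
import bpy
import math
import random

NUM_FRAMES = 200  # frames per sequence; the patent generates e.g. 20000 samples
JOINT_NAMES = ["link1", "link2", "link3", "link4", "link5", "link6"]

for frame in range(1, NUM_FRAMES + 1):
    for name in JOINT_NAMES:
        joint = bpy.data.objects[name]
        joint.rotation_euler.z = math.radians(random.uniform(-170.0, 170.0))
        joint.keyframe_insert(data_path="rotation_euler", frame=frame)

# The predefined key points can be "empty" objects parented to each link;
# their per-frame world coordinates are read from obj.matrix_world.translation.
```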
Then, the robot model animation sequences generated by Blender are imported into the UE4 (Unreal Engine 4) game engine in FBX format, and domain randomization is performed in the UE4 game engine. The domain randomization includes randomizing the joint angles of the robot, the shooting background, the colors of each part of the robot, and the relative pose of the robot and the camera. Finally, pictures are shot with a virtual camera of the UE4 game engine, the key point coordinates of each frame are recorded with the NDDS plug-in, and a preset number (for example, 20000) of synthetic robot pictures with two-dimensional key point coordinate labels are output as the synthetic data set.
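Once rendered, the exported annotations can be gathered into a training list. The sketch below assumes an NDDS-style layout of one JSON annotation file per frame; the field names used ("objects", "projected_keypoints") are illustrative and should be adapted to the plugin's actual export schema:

```python
# Hedged sketch of collecting the exported synthetic data set. The JSON field
# names below are assumptions, not the guaranteed NDDS schema.
import glob
import json

def load_synthetic_dataset(folder):
    samples = []
    for ann_path in sorted(glob.glob(f"{folder}/*.json")):
        with open(ann_path) as f:
            ann = json.load(f)
        keypoints = ann["objects"][0]["projected_keypoints"]  # n x 2 pixel labels
        samples.append({"image": ann_path.replace(".json", ".png"),
                        "keypoints": keypoints})
    return samples
```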
Step S20: preprocessing the synthetic data set, training a key point detection model by using the preprocessed synthetic data set to obtain a pre-trained key point detection model, extracting feature information of pictures with the encoder of the pre-trained key point detection model to obtain a feature map, mapping the feature map into a plurality of confidence maps with the decoder, and obtaining the key point coordinates from the confidence maps.
Specifically, the synthetic data set (for example, 20000 pictures) is used as the training set, and the training set is preprocessed before network training, mainly by cropping and rotating the pictures to various degrees, in order to enhance the generalization ability of the model. The key point detection model (the Simple Baselines key point detection model from the field of human key point detection; the network follows an encoder-decoder architecture) is trained with the preprocessed training set to obtain a pre-trained key point detection model, so that the trained network has a certain robot key point detection capability. Because of the domain gap between the synthetic scene and the real scene, a model trained only on the synthetic data set performs poorly when applied to a real scene; it is therefore used as the pre-trained key point detection model and subsequently fine-tuned. Based on the pre-trained key point detection model, the encoder extracts feature information from the picture to obtain a feature map, and the decoder maps the feature map into several confidence maps, where the number of channels of the confidence maps equals the number of key points and the value of a confidence map is the probability that a pixel is the corresponding key point. After denoising each confidence map with Gaussian smoothing (to reduce the influence of noise), the coordinates of the peak are taken to obtain the 2D coordinates of each robot key point in the image.
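The decoding step at the end of this paragraph can be sketched as follows (an illustrative sketch; the smoothing sigma is a tunable choice, not a value fixed by the patent):

```python
# Sketch of confidence-map decoding: Gaussian-smooth each keypoint channel
# to reduce noise, then take the peak coordinate as the 2D keypoint.
import numpy as np
from scipy.ndimage import gaussian_filter

def decode_keypoints(confidence_maps, sigma=2.0):
    """confidence_maps: (n_keypoints, H, W) array output by the decoder."""
    coords = []
    for heatmap in confidence_maps:
        smoothed = gaussian_filter(heatmap, sigma=sigma)
        v, u = np.unravel_index(np.argmax(smoothed), smoothed.shape)
        coords.append((u, v))  # (x, y) pixel coordinates of the peak
    return np.array(coords, dtype=np.float32)
```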
Step S30: collecting robot key point images in a real scene, predicting the collected robot key point images by using the pre-trained key point detection model, taking the prediction results as original key point pseudo labels, solving the hand-eye relationship by combining the original key point pseudo labels of multi-frame images and using the EPnP algorithm, re-projecting the key points by using the hand-eye relationship to obtain enhanced key point pseudo label data, and fine-tuning the pre-trained key point detection model by using the enhanced key point pseudo label data to obtain the final robot key point detection model.
Specifically, as shown in fig. 3, real-scene data acquisition proceeds as follows: a camera is erected in the laboratory; keeping the relative pose of the robot base and the camera unchanged, the joint rotation angles of the robot are changed and multiple frames are collected as one group of data; the camera angle is then changed and several further groups of robot key point images are collected, providing more viewing-angle information for the subsequent model fine-tuning.
Generating an original pseudo label: predicting the acquired robot key point image by using the pre-training key point detection model, and taking a prediction result as an original key point pseudo label, wherein the specific expression is as follows:
$$l^{m} = F(I^{m};\, \theta) = \{\, p^{m}_{1},\, p^{m}_{2},\, \dots,\, p^{m}_{n} \,\}$$

$$L_{ori} = \{\, l^{1},\, l^{2},\, \dots,\, l^{m} \,\}$$

wherein $I$ represents a robot picture, $\theta$ represents the model parameters, $F$ represents the model detection process, $p^{m}_{n}$ represents the pixel coordinates of the $n$-th key point on the picture, $l^{m}$ represents the original pseudo label of all key points on the $m$-th picture, and $L_{ori}$ represents the set of all original key point pseudo labels on a group of data.
Multi-frame pseudo-label data joint enhancement: combining the set $L_{ori}$ of all original key point pseudo labels on a group of data, the three-dimensional coordinate point set $P_{label}$ corresponding to the pseudo labels, and the camera intrinsics $K$, the EPnP algorithm is used to solve for the original hand-eye relation ${}^{c}_{b}T_{ori}$. As shown in fig. 3, re-projecting the three-dimensional coordinate point set $P_{label}$ through ${}^{c}_{b}T_{ori}$ and $K$ yields the preliminarily optimized pseudo label set $L_{rep}$ (in fig. 3, red represents $L_{ori}$ and yellow represents $L_{rep}$), and the Euclidean distance between the corresponding elements of $L_{ori}$ and $L_{rep}$ is calculated, specifically expressed as follows:

$$L_{rep} = \{\, l^{1}_{rep},\, l^{2}_{rep},\, \dots,\, l^{m}_{rep} \,\}$$

$$l^{i}_{rep} = \pi\!\left( K \; {}^{c}_{b}T_{ori} \; P^{i}_{label} \right) = \{\, p^{i}_{rep,1},\, p^{i}_{rep,2},\, \dots,\, p^{i}_{rep,n} \,\}$$

$$W^{i}_{j} = \left\| p^{i}_{j} - p^{i}_{rep,j} \right\|_{2}$$

wherein $m$ is the number of pictures, $n$ is the number of key points, $\pi(\cdot)$ denotes perspective projection onto the image plane, $l^{i}_{rep}$ represents the re-projected pseudo label of all key points on the $i$-th picture of the group, $p^{i}_{rep,j}$ represents the re-projected pseudo label of the $j$-th key point on that picture, and $W$ is the Euclidean distance between the original pseudo label and the re-projected pseudo label of a given key point.
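The re-projection and distance computation can be sketched in Python as follows (assuming OpenCV; the array shapes and names are illustrative). Because the joints move between the frames of a group, one base-frame 3D point set is used per frame, while the hand-eye pose is shared across the group:

```python
# Sketch of computing L_rep and the per-keypoint Euclidean distances W.
import cv2
import numpy as np

def reprojection_distances(L_ori, P_label, K, rvec, tvec):
    """L_ori: (m, n, 2) original pseudo labels over m frames;
    P_label: (m, n, 3) keypoint 3D coordinates in the base frame, one set
    per frame (from forward kinematics, since the joints move);
    rvec, tvec: the shared hand-eye pose (base -> camera) solved by EPnP."""
    m, n, _ = L_ori.shape
    L_rep = np.empty((m, n, 2), dtype=np.float32)
    for i in range(m):
        proj, _ = cv2.projectPoints(P_label[i].astype(np.float32),
                                    rvec, tvec, K, None)
        L_rep[i] = proj.reshape(n, 2)
    W = np.linalg.norm(L_ori - L_rep, axis=-1)  # (m, n) distance matrix
    return L_rep, W
```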
As shown in fig. 3, a threshold is set and, according to the $W$ of each key point, the outliers (obvious false detections) whose Euclidean distance between the original pseudo label and the re-projected pseudo label exceeds the threshold are removed from the original key point pseudo label set $L_{ori}$; the EPnP algorithm is then applied again to obtain a new hand-eye relation ${}^{c}_{b}T_{opt}$. This effectively reduces the influence of pseudo labels with excessive error relative to the real labels on the EPnP solution and improves the accuracy of the solved hand-eye relationship, and thereby the accuracy of the pseudo labels. Using ${}^{c}_{b}T_{opt}$ to re-project the key point three-dimensional coordinate set ${}^{b}P$ in the robot base coordinate system yields the enhanced robot key point pseudo label data, specifically expressed as follows:

$$L_{opt} = \pi\!\left( K \; {}^{c}_{b}T_{opt} \; {}^{b}P \right)$$

wherein $L_{opt}$ represents the enhanced robot key point pseudo label data.
Through multi-frame joint pseudo-label data enhancement, the accuracy of the pose solving result is improved, and pseudo labels can even be provided for missed detections. Finally, fine-tuning the model with the enhanced pseudo-label data strengthens its performance in real scenes.
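One round of the joint enhancement can be sketched as follows, reusing the reprojection_distances helper above; the pixel threshold is a tunable assumption, not a value given by the patent:

```python
# Sketch of one multi-frame joint enhancement round: solve EPnP on all
# pseudo labels, drop keypoints whose re-projection distance exceeds a
# threshold, re-solve EPnP on the inliers, and re-project to obtain L_opt.
import cv2
import numpy as np

THRESHOLD_PX = 10.0  # outlier threshold in pixels (illustrative choice)

def enhance_pseudo_labels(L_ori, P_base, K):
    """L_ori: (m, n, 2) original pseudo labels; P_base: (m, n, 3) base-frame
    keypoints per frame. Returns the enhanced pseudo labels L_opt."""
    pts3d = P_base.reshape(-1, 3).astype(np.float32)
    pts2d = L_ori.reshape(-1, 2).astype(np.float32)
    _, rvec, tvec = cv2.solvePnP(pts3d, pts2d, K, None,
                                 flags=cv2.SOLVEPNP_EPNP)
    _, W = reprojection_distances(L_ori, P_base, K, rvec, tvec)
    keep = W.reshape(-1) < THRESHOLD_PX  # remove obvious false detections
    _, rvec, tvec = cv2.solvePnP(pts3d[keep], pts2d[keep], K, None,
                                 flags=cv2.SOLVEPNP_EPNP)
    L_opt, _ = reprojection_distances(L_ori, P_base, K, rvec, tvec)
    return L_opt  # a full, re-projected label set for every frame
```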
As shown in fig. 3, the enhanced robot key point pseudo-label data are added to the training set, and the pre-trained key point detection model is adjusted on the updated training set (i.e. training continues from the pre-trained model to fine-tune it). During fine-tuning, the model is trained to convergence, multi-frame pseudo-label joint enhancement is performed again, and the model is fine-tuned again; the pseudo-label enhancement and fine-tuning steps are iterated until the model converges, yielding the final robot key point detection model.
Step S40: calculating the key point three-dimensional coordinate set, the two-dimensional key point set and the camera intrinsics, and completing online hand-eye calibration based on the robot key point detection model according to the key point three-dimensional coordinate set, the two-dimensional key point set and the camera intrinsics.
Specifically, as shown in fig. 4, the predefined three-dimensional coordinate set of the $n$ key points in the robot base coordinate system, ${}^{b}P$, is solved from the angle sensor data of each robot joint and the forward kinematics, as follows:

$${}^{b}P = \{\, {}^{b}P_{1},\, {}^{b}P_{2},\, \dots,\, {}^{b}P_{n} \,\}$$
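For illustration, ${}^{b}P$ can be computed with standard Denavit-Hartenberg forward kinematics; the DH table and the per-link keypoint offsets below are placeholders for the actual robot model, not values given by the patent:

```python
# Sketch of forward kinematics for the base-frame keypoint set ^bP.
import numpy as np

def dh_transform(theta, d, a, alpha):
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([[ ct, -st * ca,  st * sa, a * ct],
                     [ st,  ct * ca, -ct * sa, a * st],
                     [0.0,       sa,       ca,      d],
                     [0.0,      0.0,      0.0,    1.0]])

def keypoints_in_base(joint_angles, dh_table, keypoint_offsets):
    """dh_table: one (d, a, alpha) tuple per joint; keypoint_offsets: one
    homogeneous point (4,) per link, expressed in that link's frame."""
    T = np.eye(4)
    points = []
    for theta, (d, a, alpha), offset in zip(joint_angles, dh_table,
                                            keypoint_offsets):
        T = T @ dh_transform(theta, d, a, alpha)  # accumulate base->link pose
        points.append((T @ offset)[:3])           # keypoint in the base frame
    return np.stack(points)                       # ^bP, shape (n, 3)
```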
As shown in fig. 4, the camera is calibrated with Zhang Zhengyou's camera calibration method to obtain the camera intrinsics $K$, whose complete matrix is expressed as follows:

$$K = \begin{bmatrix} f_{x} & 0 & c_{x} \\ 0 & f_{y} & c_{y} \\ 0 & 0 & 1 \end{bmatrix}$$

wherein $f_{x}$ and $f_{y}$ represent the camera focal length in pixel units, and $c_{x}$ and $c_{y}$ represent the offsets of the origin of the physical imaging plane along the $u$-axis and $v$-axis.
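This step maps directly onto OpenCV's implementation of Zhang's method; the sketch below assumes a checkerboard target, and the board dimensions and square size are placeholders:

```python
# Sketch of intrinsic calibration (Zhang's method) from checkerboard images.
import cv2
import numpy as np

def calibrate_intrinsics(images, board=(9, 6), square_m=0.025):
    objp = np.zeros((board[0] * board[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square_m
    obj_pts, img_pts, size = [], [], None
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        size = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, board)
        if found:
            obj_pts.append(objp)
            img_pts.append(corners)
    _, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
    return K, dist  # 3x3 intrinsic matrix and distortion coefficients
```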
As shown in fig. 4, a robot picture is shot with the camera, and the final robot key point detection model is used to predict the two-dimensional coordinates of the key points in the RGB image, expressed as follows:

$$p = \{\, p_{1},\, p_{2},\, \dots,\, p_{n} \,\}$$

wherein $p$ represents the set of $n$ two-dimensional key points in the image, and $p_{i}$ represents the position coordinates of the $i$-th key point in the image;
As shown in fig. 4, the key point three-dimensional coordinate set ${}^{b}P$, the two-dimensional key point set $p$ and the camera intrinsics $K$ are taken as input, and the EPnP algorithm is used to solve for the hand-eye relation ${}^{c}_{b}T$, completing online hand-eye calibration with a single image.
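Putting the pieces together, the single-image online calibration step can be sketched as follows, reusing the decode_keypoints and keypoints_in_base helpers sketched earlier; model, dh_table and keypoint_offsets stand in for the fine-tuned detector and the robot description:

```python
# Sketch of single-image online hand-eye calibration: detect 2D keypoints,
# compute ^bP from the joint angles, and solve EPnP for the hand-eye pose.
import cv2
import numpy as np

def online_hand_eye(image, joint_angles, K, model, dh_table, keypoint_offsets):
    p = decode_keypoints(model(image))  # (n, 2) 2D keypoints from the detector
    bP = keypoints_in_base(joint_angles, dh_table, keypoint_offsets)  # (n, 3)
    _, rvec, tvec = cv2.solvePnP(bP.astype(np.float32), p.astype(np.float32),
                                 K, None, flags=cv2.SOLVEPNP_EPNP)
    R, _ = cv2.Rodrigues(rvec)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T  # base -> camera transform; invert for the camera -> base pose
```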
In the robot online hand-eye calibration method based on multi-frame pseudo-label data enhancement of the invention, the Blender simulation software and the UE4 game engine are first used to predefine the key points and perform the various domain randomization operations, and the synthetic data set is output from the UE4 game engine. A key point detection model based on an encoder-decoder network is then pre-trained with the synthetic data set to obtain a pre-trained key point detection model. The pre-trained model is used to detect the robot key points collected in the real scene, giving the original pseudo label of each robot image; the key point pseudo labels of the multi-frame images are combined into the multi-frame key point pseudo label set $L_{ori}$, EPnP pose estimation is performed on $L_{ori}$ to obtain the initial hand-eye relationship, and the key points are re-projected onto the image with this hand-eye relationship to obtain the preliminarily optimized pseudo labels $L_{rep}$. The Euclidean distance between the corresponding elements of $L_{ori}$ and $L_{rep}$ is calculated, a threshold is set to remove the outliers from the original pseudo label set $L_{ori}$, EPnP pose estimation is performed again to obtain a more reliable hand-eye relationship, and the key points are re-projected onto the image to obtain the enhanced robot key point pseudo label data $L_{opt}$. Finally, the enhanced pseudo label data $L_{opt}$ are added to the training set to fine-tune the model, improving its key point detection capability in real scenes and thus the hand-eye calibration performance of the system; the fine-tuned final robot key point detection model, combined with forward kinematics and the EPnP algorithm, completes the online hand-eye calibration task.
Beneficial effects:
(1) Compared with traditional hand-eye calibration algorithms, the method can achieve hand-eye calibration with only one image and has strong real-time performance;
(2) Compared with existing deep-learning-based online hand-eye calibration algorithms, the method adopts the idea of semi-supervised learning and introduces high-quality real-scene pseudo labels, addressing the poor generalization of models trained on synthetic data and the high cost of labeling real data;
(3) Compared with existing robot pseudo-label generation methods, the method improves the quality of the pseudo labels with multi-frame joint pseudo-label enhancement, further improving the accuracy of online hand-eye calibration.
Practical tests of the scheme of the invention show an error of 4.65 cm when hand-eye calibration is performed with only one picture; this can complete grasping tasks in scenes without high-precision requirements, and in vision-based grasping tasks the camera position can be changed freely. The invention can obtain high-quality pseudo-label data to improve the robot key point detection model, and builds a system to complete online hand-eye calibration.
Further, as shown in fig. 5, based on the above method for calibrating a robot online hand-eye based on multi-frame pseudo tag data enhancement, the present invention also provides a system for calibrating a robot online hand-eye based on multi-frame pseudo tag data enhancement, wherein the system for calibrating a robot online hand-eye based on multi-frame pseudo tag data enhancement comprises:
a synthetic data set generating module 51, configured to generate a plurality of robot model animation sequences, perform domain randomization on the robot model animation sequences, and output a plurality of synthetic robot pictures and key point tags as a synthetic data set;
the feature extraction module 52 is configured to pre-process the synthetic dataset, train a key point detection model with the pre-processed synthetic dataset to obtain a pre-trained key point detection model, extract feature information of a picture based on an encoder of the pre-trained key point detection model to obtain a feature map, map the feature map into a plurality of confidence maps with a decoder, and obtain a key point coordinate according to the confidence map;
the data enhancement and fine adjustment module 53 is configured to collect a robot key point image in a real scene, predict the collected robot key point image by using the pre-training key point detection model, use a prediction result as an original key point pseudo tag, solve a hand-eye relationship by combining the original key point pseudo tag of a multi-frame image and using an EPnP algorithm, re-project key points by using the hand-eye relationship to obtain enhanced key point pseudo tag data, and adjust the pre-training key point detection model by using the enhanced key point pseudo tag data to obtain a final robot key point detection model;
and the online hand-eye calibration module 54 is configured to calculate a three-dimensional coordinate set of key points, a two-dimensional key point set and camera parameters, complete online hand-eye calibration according to the three-dimensional coordinate set of key points, the two-dimensional key point set and the camera parameters, and based on the robot key point detection model.
Further, as shown in fig. 6, based on the above method and system for calibrating hands and eyes of a robot on line based on multi-frame pseudo tag data enhancement, the present invention further provides a terminal, where the terminal includes a processor 10, a memory 20, and a display 30. Fig. 6 shows only some of the components of the terminal, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
The memory 20 may in some embodiments be an internal storage unit of the terminal, such as a hard disk or a memory of the terminal. The memory 20 may also be an external storage device of the terminal in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal. Further, the memory 20 may also include both an internal storage unit and an external storage device of the terminal. The memory 20 is used for storing application software installed in the terminal and various types of data, such as program codes of the installation terminal. The memory 20 may also be used to temporarily store data that has been output or is to be output. In an embodiment, the memory 20 stores thereon a multi-frame pseudo tag data enhancement-based robot online hand-eye calibration program 40, and the multi-frame pseudo tag data enhancement-based robot online hand-eye calibration program 40 can be executed by the processor 10, so as to implement the multi-frame pseudo tag data enhancement-based robot online hand-eye calibration method in the present application.
The processor 10 may be a Central Processing Unit (CPU), a microprocessor or other data Processing chip in some embodiments, and is configured to execute program codes stored in the memory 20 or process data, for example, execute the robot online hand-eye calibration method based on multi-frame pseudo tag data enhancement, and the like.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like in some embodiments. The display 30 is used for displaying information at the terminal and for displaying a visual user interface. The components 10-30 of the terminal communicate with each other via a system bus.
In one embodiment, when the processor 10 executes the robot online hand-eye calibration program 40 enhanced based on multi-frame pseudo tag data in the memory 20, the following steps are implemented:
generating a plurality of robot model animation sequences, performing domain randomization on the robot model animation sequences, and outputting a plurality of synthetic robot pictures and key point labels as a synthetic data set;
preprocessing the synthetic data set, training a key point detection model by using the preprocessed synthetic data set to obtain a pre-trained key point detection model, extracting feature information of pictures by using an encoder based on the pre-trained key point detection model to obtain a feature map, mapping the feature map into a plurality of confidence maps by using a decoder, and obtaining key point coordinates according to the confidence maps;
acquiring robot key point images in a real scene, predicting the acquired robot key point images by using the pre-trained key point detection model, taking the prediction results as original key point pseudo labels, solving the hand-eye relationship by combining the original key point pseudo labels of multi-frame images and using the EPnP (Efficient Perspective-n-Point) algorithm, re-projecting the key points by using the hand-eye relationship to obtain enhanced key point pseudo label data, and fine-tuning the pre-trained key point detection model by using the enhanced key point pseudo label data to obtain a final robot key point detection model;
and calculating to obtain a key point three-dimensional coordinate set, a two-dimensional key point set and camera internal parameters, and completing online hand-eye calibration based on the robot key point detection model according to the key point three-dimensional coordinate set, the two-dimensional key point set and the camera internal parameters.
The generating of the animation sequence of the multiple robot models specifically comprises the following steps:
importing an STL three-dimensional model of the robot into the Blender simulation software, which supports the STL format; setting key points whose coordinates need to be recorded on the robot model in Blender to complete the predefinition of the key points, and establishing the connection relationships between the links so that each joint of the robot can rotate randomly;
and automatically generating a preset number of robot model animation sequences with randomly rotating joints by using the python interface in Blender.
Performing domain randomization on the robot model animation sequences and outputting a plurality of synthetic robot pictures and key point labels as a synthetic data set specifically includes the following steps:
importing the robot model animation sequences generated by the Blender simulation software into the UE4 game engine in FBX format, and performing domain randomization in the UE4 game engine, the domain randomization comprising randomizing the joint angles of the robot, randomizing the shooting background, randomizing the colors of each part of the robot, and randomizing the relative pose of the robot and the camera;
shooting pictures with a virtual camera of the UE4 game engine, recording the key point coordinates of each frame with the NDDS plug-in, and outputting a preset number of synthetic robot pictures and two-dimensional key point coordinate labels as the synthetic data set.
The method includes the steps of preprocessing the synthetic data set, training a key point detection model by using the preprocessed synthetic data set to obtain a pre-trained key point detection model, extracting feature information of pictures by using an encoder based on the pre-trained key point detection model to obtain a feature map, mapping the feature map into a plurality of confidence maps by using a decoder, and obtaining key point coordinates according to the confidence maps, and specifically includes the following steps:
taking the synthetic data set as a training set, and preprocessing the training set;
training the key point detection model by using the preprocessed training set to obtain a pre-training key point detection model;
extracting feature information of the picture by an encoder based on the pre-training key point detection model to obtain a feature map, and mapping the feature map into a plurality of confidence maps by a decoder;
and after denoising each confidence map based on Gaussian smoothing, taking coordinates of peak values to obtain 2D coordinates of each key point of the robot in the image.
The channel number of the confidence map is the number of the key points, and the value of the confidence map is the probability that the pixel is the key point.
The method includes the steps of collecting a robot key point image in a real scene, predicting the collected robot key point image by using a pre-training key point detection model, using a prediction result as an original key point pseudo label, solving a hand-eye relationship by combining the original key point pseudo label of a multi-frame image and using an EPnP algorithm, re-projecting key points by using the hand-eye relationship to obtain enhanced key point pseudo label data, and adjusting the pre-training key point detection model by using the enhanced key point pseudo label data to obtain a final robot key point detection model, and specifically includes the steps of:
under the condition of keeping the relative postures of a robot base and a camera unchanged in a real scene, changing the joint rotation angle of the robot and collecting multi-frame images as a group of data, then changing the angle of the camera, and repeatedly collecting a plurality of groups of robot key point images;
predicting the acquired robot key point image by using the pre-training key point detection model, and taking a prediction result as an original key point pseudo label, wherein the specific expression is as follows:
$$l^{m} = F(I^{m};\, \theta) = \{\, p^{m}_{1},\, p^{m}_{2},\, \dots,\, p^{m}_{n} \,\}$$

$$L_{ori} = \{\, l^{1},\, l^{2},\, \dots,\, l^{m} \,\}$$

wherein $I$ represents a robot picture, $\theta$ represents the model parameters, $F$ represents the model detection process, $p^{m}_{n}$ represents the pixel coordinates of the $n$-th key point on the picture, $l^{m}$ represents the original pseudo label of all key points on the $m$-th picture, and $L_{ori}$ represents the set of all original key point pseudo labels on a group of data;
combining the set $L_{ori}$ of all original key point pseudo labels on a group of data, the three-dimensional coordinate point set $P_{label}$ corresponding to the pseudo labels, and the camera intrinsics $K$, using the EPnP algorithm to solve for the original hand-eye relation ${}^{c}_{b}T_{ori}$; re-projecting the three-dimensional coordinate point set $P_{label}$ through ${}^{c}_{b}T_{ori}$ and $K$ to obtain the preliminarily optimized pseudo label set $L_{rep}$, and calculating the Euclidean distance between the corresponding elements of $L_{ori}$ and $L_{rep}$, specifically expressed as follows:

$$L_{rep} = \{\, l^{1}_{rep},\, l^{2}_{rep},\, \dots,\, l^{m}_{rep} \,\}$$

$$l^{i}_{rep} = \pi\!\left( K \; {}^{c}_{b}T_{ori} \; P^{i}_{label} \right) = \{\, p^{i}_{rep,1},\, p^{i}_{rep,2},\, \dots,\, p^{i}_{rep,n} \,\}$$

$$W^{i}_{j} = \left\| p^{i}_{j} - p^{i}_{rep,j} \right\|_{2}$$

wherein $m$ is the number of pictures, $n$ is the number of key points, $\pi(\cdot)$ denotes perspective projection onto the image plane, $l^{i}_{rep}$ represents the re-projected pseudo label of all key points on the $i$-th picture of the group, $p^{i}_{rep,j}$ represents the re-projected pseudo label of the $j$-th key point on that picture, and $W$ is the Euclidean distance between the original pseudo label and the re-projected pseudo label of a given key point;
setting a threshold and, according to the $W$ of each key point, removing from the original key point pseudo label set $L_{ori}$ the outliers whose Euclidean distance between the original pseudo label and the re-projected pseudo label exceeds the threshold, then solving again with the EPnP algorithm to obtain a new hand-eye relation ${}^{c}_{b}T_{opt}$; using ${}^{c}_{b}T_{opt}$ to re-project the key point three-dimensional coordinate set ${}^{b}P$ in the robot base coordinate system to obtain the enhanced robot key point pseudo label data, specifically expressed as follows:

$$L_{opt} = \pi\!\left( K \; {}^{c}_{b}T_{opt} \; {}^{b}P \right)$$

wherein $L_{opt}$ represents the enhanced robot key point pseudo label data;
and adding the enhanced pseudo label data of the robot key points into a training set, and adjusting the pre-training key point detection model according to the updated training set to obtain a final robot key point detection model.
The method comprises the following steps of calculating to obtain a three-dimensional coordinate set of key points, a two-dimensional key point set and camera internal parameters, completing online hand-eye calibration based on the robot key point detection model according to the three-dimensional coordinate set of key points, the two-dimensional key point set and the camera internal parameters, and specifically comprising the following steps of:
solving the predefined three-dimensional coordinate set of the $n$ key points in the robot base coordinate system, ${}^{b}P$, from the angle sensor data of each robot joint and the forward kinematics, as follows:

$${}^{b}P = \{\, {}^{b}P_{1},\, {}^{b}P_{2},\, \dots,\, {}^{b}P_{n} \,\}$$

calibrating the camera with Zhang Zhengyou's camera calibration method to obtain the camera intrinsics $K$, whose complete matrix is expressed as follows:

$$K = \begin{bmatrix} f_{x} & 0 & c_{x} \\ 0 & f_{y} & c_{y} \\ 0 & 0 & 1 \end{bmatrix}$$

wherein $f_{x}$ and $f_{y}$ represent the camera focal length in pixel units, and $c_{x}$ and $c_{y}$ represent the offsets of the origin of the physical imaging plane along the $u$-axis and $v$-axis;

shooting a robot picture with the camera, and using the final robot key point detection model to predict the two-dimensional coordinates of the key points in the RGB image, expressed as follows:

$$p = \{\, p_{1},\, p_{2},\, \dots,\, p_{n} \,\}$$

wherein $p$ represents the set of $n$ two-dimensional key points in the image, and $p_{i}$ represents the position coordinates of the $i$-th key point in the image;

taking the key point three-dimensional coordinate set ${}^{b}P$, the two-dimensional key point set $p$ and the camera intrinsics $K$ as input, solving with the EPnP algorithm to obtain the hand-eye relation ${}^{c}_{b}T$, and completing online hand-eye calibration with a single image.
The invention also provides a computer-readable storage medium, wherein the computer-readable storage medium stores a robot online hand-eye calibration program based on multi-frame pseudo tag data enhancement, and when being executed by a processor, the robot online hand-eye calibration program based on multi-frame pseudo tag data enhancement realizes the steps of the robot online hand-eye calibration method based on multi-frame pseudo tag data enhancement.
In summary, the present invention provides a robot online hand-eye calibration method based on multi-frame pseudo-label data enhancement and related devices. The method comprises: generating a plurality of robot model animation sequences, performing domain randomization on them, and outputting a plurality of synthetic robot pictures and key point labels as the synthetic data set; preprocessing the synthetic data set, training a key point detection model with it to obtain a pre-trained key point detection model, extracting feature information from pictures with the encoder of the pre-trained model to obtain a feature map, mapping the feature map into several confidence maps with the decoder, and obtaining the key point coordinates from the confidence maps; collecting robot key point images in a real scene, predicting them with the pre-trained model and taking the prediction results as original key point pseudo labels, combining the original key point pseudo labels of multi-frame images and solving the hand-eye relationship with the EPnP algorithm (which effectively reduces the influence of pseudo labels with excessive error on the EPnP solution and improves the accuracy of the solved hand-eye relationship), re-projecting the key points with this hand-eye relationship to obtain enhanced key point pseudo-label data of improved accuracy, and fine-tuning the pre-trained key point detection model with the enhanced data to obtain the final robot key point detection model; and calculating the key point three-dimensional coordinate set, the two-dimensional key point set and the camera intrinsics to complete online hand-eye calibration based on the robot key point detection model. By combining multi-frame robot image information to jointly estimate the hand-eye relationship and then re-projecting to obtain the labels, the invention improves the accuracy of the pseudo labels and thereby the online hand-eye calibration performance of the system.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
Of course, it will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by instructing relevant hardware (such as a processor, a controller, etc.) through a computer program, and the program can be stored in a computer readable storage medium, and when executed, the program can include the processes of the embodiments of the methods described above. The computer readable storage medium may be a memory, a magnetic disk, an optical disk, etc.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (10)

1. A robot online hand-eye calibration method based on multi-frame pseudo-label data enhancement, characterized by comprising the following steps:
generating a plurality of robot model animation sequences, performing domain randomization on the robot model animation sequences, and outputting a plurality of synthetic robot pictures and key point labels as a synthetic data set;
preprocessing the synthetic data set, training a key point detection model with the preprocessed synthetic data set to obtain a pre-trained key point detection model, extracting feature information from a picture with the encoder of the pre-trained key point detection model to obtain a feature map, mapping the feature map into a plurality of confidence maps with the decoder, and obtaining key point coordinates from the confidence maps;
acquiring robot key point images in a real scene, predicting the acquired images with the pre-trained key point detection model, taking the prediction results as original key point pseudo labels, solving the hand-eye relation with the EPnP algorithm by combining the original key point pseudo labels of multiple frames of images, re-projecting the key points with the hand-eye relation to obtain enhanced key point pseudo-label data, and fine-tuning the pre-trained key point detection model with the enhanced key point pseudo-label data to obtain a final robot key point detection model;
and calculating a key point three-dimensional coordinate set, a two-dimensional key point set and the camera intrinsics, and completing online hand-eye calibration based on the robot key point detection model according to the key point three-dimensional coordinate set, the two-dimensional key point set and the camera intrinsics.
2. The robot online hand-eye calibration method based on multi-frame pseudo-label data enhancement according to claim 1, wherein generating a plurality of robot model animation sequences specifically comprises:
importing an STL three-dimensional model of the robot into the Blender simulation software, which supports the STL format; setting, in Blender, the key points on the robot model whose coordinates need to be recorded, thereby completing the predefinition of the key points; and establishing the connection relations among all links so that every joint of the robot can rotate randomly;
and automatically generating a preset number of robot model animation sequences, with each joint rotating randomly, through the Python interface of the Blender simulation software (an illustrative script sketch follows this claim).
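A minimal sketch of this animation-generation step, assuming the robot has been imported as a Blender armature named "robot" with one pose bone per joint; both names, the rotation axis, and the frame count are illustrative assumptions:

```python
import math
import random

import bpy  # Blender's built-in Python API

arm = bpy.data.objects["robot"]  # hypothetical imported robot armature

# Generate a preset number of frames, each with every joint at a random angle.
for frame in range(1, 201):
    for bone in arm.pose.bones:  # one pose bone per robot joint
        bone.rotation_mode = 'XYZ'
        bone.rotation_euler.z = random.uniform(-math.pi, math.pi)
        bone.keyframe_insert(data_path="rotation_euler", frame=frame)
```

Run inside Blender's scripting workspace, this produces an animation sequence whose predefined key point coordinates can then be recorded per frame.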
3. The robot online hand-eye calibration method based on multi-frame pseudo-label data enhancement according to claim 2, wherein performing domain randomization on the robot model animation sequences and outputting a plurality of synthetic robot pictures and key point labels as a synthetic data set specifically comprises:
importing the robot model animation sequences generated by the Blender simulation software into the UE4 game engine in the FBX format, and performing domain randomization in the UE4 game engine, the domain randomization comprising randomizing the joint angles of the robot, randomizing the shooting background, randomizing the colors of the robot parts, and randomizing the relative pose of the robot and the camera (a parameter-sampling sketch follows this claim);
and shooting pictures with a virtual camera of the UE4 game engine, recording the key point coordinates of each frame with the NDDS plug-in, and outputting a preset number of synthetic robot pictures and key point two-dimensional coordinate labels as the synthetic data set.
4. The robot online hand-eye calibration method based on multi-frame pseudo-label data enhancement according to claim 3, wherein preprocessing the synthetic data set, training a key point detection model with the preprocessed synthetic data set to obtain a pre-trained key point detection model, extracting feature information from a picture with the encoder of the pre-trained model to obtain a feature map, mapping the feature map into a plurality of confidence maps with the decoder, and obtaining key point coordinates from the confidence maps specifically comprises:
taking the synthetic data set as a training set and preprocessing the training set;
training the key point detection model with the preprocessed training set to obtain the pre-trained key point detection model;
extracting feature information from the picture with the encoder of the pre-trained key point detection model to obtain a feature map, and mapping the feature map into a plurality of confidence maps with the decoder;
and denoising each confidence map with Gaussian smoothing and taking the coordinates of its peak to obtain the 2D coordinates of each robot key point in the image (a decoding sketch follows this claim).
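A short sketch of this confidence-map decoding, assuming NumPy/SciPy and an (n_keypoints, H, W) heatmap array from the decoder; the shapes and the sigma value are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def decode_keypoints(confidence_maps, sigma=2.0):
    """Return (n_keypoints, 2) pixel coordinates (u, v), one per confidence map."""
    coords = []
    for heatmap in confidence_maps:
        smoothed = gaussian_filter(heatmap, sigma=sigma)  # Gaussian denoising
        v, u = np.unravel_index(np.argmax(smoothed), smoothed.shape)
        coords.append((u, v))  # peak location = key point estimate
    return np.asarray(coords, dtype=np.float64)
```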
5. The robot online hand-eye calibration method based on multi-frame pseudo-label data enhancement according to claim 4, wherein the number of channels of the confidence maps equals the number of key points, and the value of a confidence map at a pixel is the probability that the pixel is the corresponding key point.
6. The robot online hand-eye calibration method based on multi-frame pseudo-label data enhancement according to claim 4, wherein acquiring robot key point images in a real scene, predicting the acquired images with the pre-trained key point detection model, taking the prediction results as original key point pseudo labels, solving the hand-eye relation with the EPnP algorithm by combining the original key point pseudo labels of multiple frames of images, re-projecting the key points with the hand-eye relation to obtain enhanced key point pseudo-label data, and fine-tuning the pre-trained key point detection model with the enhanced key point pseudo-label data to obtain a final robot key point detection model specifically comprises:
keeping the relative pose of the robot base and the camera unchanged in a real scene while changing the joint rotation angles of the robot, collecting multiple frames of images as one group of data, then changing the camera angle and repeating the collection to obtain a plurality of groups of robot key point images;
predicting the acquired robot key point images with the pre-trained key point detection model and taking the prediction results as the original key point pseudo labels, specifically expressed as:

$$ l_m = F(I_m; \theta) = \{ \hat{p}_1, \hat{p}_2, \dots, \hat{p}_n \} $$

$$ L_{ori} = \{ l_1, l_2, \dots, l_m \} $$

wherein $I$ denotes a robot picture, $\theta$ the model parameters, $F$ the model detection process, $\hat{p}_n$ the pixel coordinates of the n-th key point on the picture, $l_m$ the original pseudo labels of all key points on the m-th picture, and $L_{ori}$ the set of all original key point pseudo labels on a group of data;
taking the set $L_{ori}$ of all original key point pseudo labels on a group of data, the three-dimensional coordinate point set $P_{label}$ corresponding to the pseudo labels, and the camera intrinsics $K$, and solving with the EPnP algorithm to obtain the original hand-eye relation ${}^{c}T_{b}^{ori}$;
re-projecting the three-dimensional coordinate point set $P_{label}$ through ${}^{c}T_{b}^{ori}$ and $K$ to obtain a preliminarily optimized pseudo-label set $L_{rep}$, and computing the Euclidean distance between corresponding elements of $L_{ori}$ and $L_{rep}$, specifically expressed as:

$$ L_{rep} = \{ l_1^{rep}, l_2^{rep}, \dots, l_m^{rep} \} $$

$$ l_i^{rep} = \{ \hat{p}_1^{rep}, \hat{p}_2^{rep}, \dots, \hat{p}_n^{rep} \} $$

$$ \hat{p}_j^{rep} = \pi\!\left( K \, {}^{c}T_{b}^{ori} \, P_{label,j} \right) $$

$$ W = \left\| \hat{p}_j - \hat{p}_j^{rep} \right\|_2 $$

wherein m is the number of pictures, n is the number of key points, $l_i^{rep}$ denotes the re-projected pseudo labels of all key points on the i-th picture, $\hat{p}_j^{rep}$ denotes the re-projected pseudo label of the j-th key point on a picture, $\pi(\cdot)$ denotes projection onto the image plane, and W is the Euclidean distance between the original pseudo label and the re-projected pseudo label of a given key point;
setting a threshold and, according to the W of each key point, removing from the original key point pseudo-label set $L_{ori}$ the outliers whose Euclidean distance between the original and re-projected pseudo labels exceeds the threshold, then solving again with the EPnP algorithm to obtain a new hand-eye relation ${}^{c}T_{b}^{new}$;
re-projecting the key point three-dimensional coordinate set ${}^{b}P$ under the robot base coordinate system with ${}^{c}T_{b}^{new}$ to obtain the enhanced robot key point pseudo-label data, specifically expressed as:

$$ L_{opt} = \pi\!\left( K \, {}^{c}T_{b}^{new} \, {}^{b}P \right) $$

wherein $L_{opt}$ denotes the enhanced robot key point pseudo-label data;
and adding the enhanced robot key point pseudo-label data to the training set, and fine-tuning the pre-trained key point detection model on the updated training set to obtain the final robot key point detection model (an end-to-end sketch of this enhancement step follows this claim).
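A condensed sketch of the enhancement loop above, assuming OpenCV; here each frame's base-frame 3D key point set is supplied by forward kinematics at that frame's joint angles, and the 5-pixel threshold is an illustrative assumption:

```python
import cv2
import numpy as np

def enhance_pseudo_labels(pseudo_2d, points_3d, K, threshold=5.0):
    """pseudo_2d: list of (n, 2) original pseudo labels per frame (L_ori);
    points_3d: list of (n, 3) base-frame key point coordinates per frame;
    returns the enhanced labels L_opt, one (n, 2) array per frame."""
    # 1) joint EPnP over all frames (camera pose is fixed w.r.t. the base)
    obj = np.vstack([p.astype(np.float64) for p in points_3d])
    img = np.vstack([p.astype(np.float64) for p in pseudo_2d])
    _, rvec, tvec = cv2.solvePnP(obj, img, K, None, flags=cv2.SOLVEPNP_EPNP)

    # 2) re-project and measure the per-point Euclidean distance W
    rep, _ = cv2.projectPoints(obj, rvec, tvec, K, None)
    W = np.linalg.norm(img - rep.reshape(-1, 2), axis=1)

    # 3) drop outliers (W above threshold) and re-solve EPnP -> cTb_new
    keep = W < threshold
    _, rvec, tvec = cv2.solvePnP(obj[keep], img[keep], K, None,
                                 flags=cv2.SOLVEPNP_EPNP)

    # 4) re-project each frame's 3D points with the refined pose -> L_opt
    return [cv2.projectPoints(p.astype(np.float64), rvec, tvec, K, None)[0]
            .reshape(-1, 2) for p in points_3d]
```

Because all frames in one group share the camera pose, stacking them into a single EPnP solve is what dilutes the influence of any one badly predicted pseudo label.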
7. The robot online hand-eye calibration method based on multi-frame pseudo-label data enhancement according to claim 6, wherein calculating a key point three-dimensional coordinate set, a two-dimensional key point set and the camera intrinsics, and completing online hand-eye calibration based on the robot key point detection model according to them specifically comprises:
solving the predefined three-dimensional coordinate set of the n key points under the robot base coordinate system, ${}^{b}P$, from the angle sensor data of each robot joint and the forward kinematics, as follows:

$$ {}^{b}P = \{ {}^{b}P_1, {}^{b}P_2, \dots, {}^{b}P_n \} $$

calibrating the camera with Zhang Zhengyou's camera calibration method to obtain the camera intrinsics K (an illustrative calibration sketch follows this claim), whose complete matrix is expressed as:

$$ K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} $$

wherein $f_x$ and $f_y$ denote the camera focal lengths in pixel units, and $c_x$ and $c_y$ denote the offsets of the origin of the physical imaging plane along the u-axis and the v-axis;
shooting a robot picture with the camera and predicting the two-dimensional coordinates of the key points in the RGB image with the final robot key point detection model, expressed as:

$$ p = \{ p_1, p_2, \dots, p_n \} $$

wherein p denotes the set of the n two-dimensional key points in the image and $p_i$ the position coordinates of the i-th key point in the image;
and taking the key point three-dimensional coordinate set ${}^{b}P$, the two-dimensional key point set p and the camera intrinsics K as input, solving with the EPnP algorithm to obtain the hand-eye relation ${}^{c}T_{b}$, and completing online hand-eye calibration with a single image.
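For the intrinsic-calibration step of claim 7, a standard sketch of Zhang Zhengyou's method with OpenCV and a chessboard target; the board geometry and square size are illustrative assumptions:

```python
import cv2
import numpy as np

def calibrate_intrinsics(gray_images, board=(9, 6), square=0.025):
    """Estimate the intrinsic matrix K from grayscale chessboard photos."""
    # one planar 3D template shared by every view (Z = 0), in metres
    template = np.zeros((board[0] * board[1], 3), np.float32)
    template[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square

    obj_points, img_points = [], []
    for gray in gray_images:
        found, corners = cv2.findChessboardCorners(gray, board)
        if found:
            obj_points.append(template)
            img_points.append(corners)

    # Zhang's method: homography-based initialization plus refinement
    _, K, dist, _, _ = cv2.calibrateCamera(
        obj_points, img_points, gray_images[0].shape[::-1], None, None)
    return K, dist
```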
8. A robot online hand-eye calibration system based on multi-frame pseudo-label data enhancement, characterized by comprising:
a synthetic data set generation module, configured to generate a plurality of robot model animation sequences, perform domain randomization on the robot model animation sequences, and output a plurality of synthetic robot pictures and key point labels as a synthetic data set;
a feature extraction module, configured to preprocess the synthetic data set, train a key point detection model with the preprocessed synthetic data set to obtain a pre-trained key point detection model, extract feature information from a picture with the encoder of the pre-trained model to obtain a feature map, map the feature map into a plurality of confidence maps with the decoder, and obtain key point coordinates from the confidence maps;
a data enhancement and fine-tuning module, configured to acquire robot key point images in a real scene, predict the acquired images with the pre-trained key point detection model, take the prediction results as original key point pseudo labels, solve the hand-eye relation with the EPnP algorithm by combining the original key point pseudo labels of multiple frames, re-project the key points with the hand-eye relation to obtain enhanced key point pseudo-label data, and fine-tune the pre-trained key point detection model with the enhanced data to obtain a final robot key point detection model;
and an online hand-eye calibration module, configured to calculate a key point three-dimensional coordinate set, a two-dimensional key point set and the camera intrinsics, and complete online hand-eye calibration based on the robot key point detection model according to them (a skeleton of this module structure follows this claim).
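As a purely structural illustration of the four claimed modules, a hypothetical Python skeleton; all class and method names are assumptions and the bodies are deliberately elided:

```python
class OnlineHandEyeCalibrator:
    """Skeleton mirroring the four modules of claim 8."""

    def generate_synthetic_dataset(self):
        """Synthetic data set generation module: animation sequences,
        domain randomization, synthetic pictures + key point labels."""
        ...

    def pretrain_keypoint_detector(self):
        """Feature extraction module: preprocess, train the encoder-decoder,
        decode confidence maps into key point coordinates."""
        ...

    def enhance_and_finetune(self):
        """Data enhancement and fine-tuning module: pseudo labels,
        multi-frame EPnP, re-projection, model fine-tuning."""
        ...

    def calibrate_online(self, image):
        """Online hand-eye calibration module: detect key points in one
        image and solve EPnP against the forward-kinematics 3D set."""
        ...
```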
9. A terminal, characterized in that the terminal comprises: a memory, a processor, and a robot online hand-eye calibration program based on multi-frame pseudo-label data enhancement that is stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the robot online hand-eye calibration method based on multi-frame pseudo-label data enhancement according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a robot online hand-eye calibration program based on multi-frame pseudo-label data enhancement, and the program, when executed by a processor, implements the steps of the robot online hand-eye calibration method based on multi-frame pseudo-label data enhancement according to any one of claims 1 to 7.
CN202210607581.XA 2022-05-31 2022-05-31 Robot online hand-eye calibration method based on multi-frame pseudo label data enhancement Pending CN115008454A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210607581.XA CN115008454A (en) 2022-05-31 2022-05-31 Robot online hand-eye calibration method based on multi-frame pseudo label data enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210607581.XA CN115008454A (en) 2022-05-31 2022-05-31 Robot online hand-eye calibration method based on multi-frame pseudo label data enhancement

Publications (1)

Publication Number Publication Date
CN115008454A true CN115008454A (en) 2022-09-06

Family

ID=83070376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210607581.XA Pending CN115008454A (en) 2022-05-31 2022-05-31 Robot online hand-eye calibration method based on multi-frame pseudo label data enhancement

Country Status (1)

Country Link
CN (1) CN115008454A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830408A (en) * 2022-10-22 2023-03-21 北京百度网讯科技有限公司 Pseudo tag generation method, device, equipment and storage medium
CN115830408B (en) * 2022-10-22 2024-03-08 北京百度网讯科技有限公司 Pseudo tag generation method, pseudo tag generation device, pseudo tag generation equipment and storage medium
CN115830642A (en) * 2023-02-13 2023-03-21 粤港澳大湾区数字经济研究院(福田) 2D whole body key point labeling method and 3D human body grid labeling method
CN115830642B (en) * 2023-02-13 2024-01-12 粤港澳大湾区数字经济研究院(福田) 2D whole body human body key point labeling method and 3D human body grid labeling method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination