CN112857215B

CN112857215B - Monocular 6D pose estimation method based on regular icosahedron

Info

Publication number: CN112857215B
Application number: CN202110023412.7A
Authority: CN
Inventors: 孙昊; 段伦辉; 崔睿; 吴梦坤; 谭英伦
Original assignee: Hebei University of Technology
Current assignee: Hebei University of Technology
Priority date: 2021-01-08
Filing date: 2021-01-08
Publication date: 2022-02-08
Anticipated expiration: 2041-01-08
Also published as: CN112857215A

Abstract

The invention discloses a monocular 6D pose estimation method based on a regular icosahedron. The method combines the Aruco code and the regular icosahedron, then uses a monocular camera to collect images of the regular icosahedron marked by the Aruco code, calculates the spatial three-dimensional coordinates of the central point of the Aruco code, obtains sparse point cloud of an inscribed sphere of the Aruco code, further compares the virtual regular icosahedron at the camera coordinates by utilizing multivariate nonlinear fitting and an iterative closest point algorithm, further calculates the spatial displacement and the rotation angle of the current regular icosahedron, and indirectly calculates the pose of a measured object. The method overcomes the high requirements of the existing method on illumination, environment and equipment, the detection is rapid, and the stability is improved; a large amount of effort is not required to be invested in template acquisition or model training in the early stage, the pose estimation cost is obviously reduced, the pose estimation precision is ensured, the universality is enhanced, and the pose estimation of large-amplitude object motion can be carried out.

Description

Monocular 6D pose estimation method based on regular icosahedron

Technical Field

The invention belongs to the field of image recognition pose detection, and particularly relates to a monocular 6D pose estimation method based on a regular icosahedron.

Background

The pose estimation of the target plays an important role in both production and life, and is a precondition for realizing the flexible and effective motion of the robot. At present, the industrial automation industry is developed rapidly, the requirements of real-time detection and feedback of the state of the robot are continuously improved, and the target pose estimation based on vision has important significance for improving the performance of the robot. The monocular vision system only uses one camera, has a simple structure and low cost, and is widely applied. The main methods for estimating the pose of the target at present comprise:

in documents LEPEIT V, PILET J, FUA P.Point matching as a classification scheme for fast and robust object position estimation [ C ]// Proceedings of the 2004IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2004.CVPR 2004. volume 2.IEEE,2004: II-II. However, the pose estimation method based on the feature points is very susceptible to the influence of illumination and environment, so that the tracking loss of the feature points is caused, and the estimation quality of the target pose is reduced.

In the documents PAVLAKOS G, ZHOU X, CHAN A, et al.6-dof object position from a semantic key points [ C ]//2017IEEE international conference on robotics and automation (ICRA), IEEE,2017:2011 2018, a template-based method is adopted, a two-dimensional or three-dimensional template of an object to be detected is constructed, and a template closest to an actual pose is searched by comparing the relationship between an actual pose image of the current object and the template, so that the current pose of the object is estimated. However, the method needs to construct a large number of template matching libraries for two-dimensional and three-dimensional mapping in the prior period, the workload is huge, the result precision depends on the size of the template matching libraries, and one template library is only suitable for pose estimation of a single object and cannot meet the universality.

The method based on deep learning is used, the deep learning is widely applied in recent years, and the effect of identifying the pose of an object in an image is achieved by training a deep neural network and constructing a loss function. In the documents "REDMON J, DIVVAL S, GIRSHICK R, et al. you only look on:. Unifield, real-time object detection [ C ]// Proceedings of the IEEE conference on computer vision and pattern recognition.2016:779 and 788", the YoLO network is adopted, but the method based on deep learning needs a large amount of sample resources to construct a large amount of sample data and label the real pose, needs a large amount of computing resources to train the network, is difficult to control the cost, is mostly purposefully developed, and cannot meet the universality.

Disclosure of Invention

Aiming at the defects of the prior art, the invention aims to solve the technical problem of providing a monocular 6D pose estimation method based on a regular icosahedron.

The technical scheme for solving the technical problem is to provide a monocular 6D pose estimation method based on a regular icosahedron, and the method is characterized by comprising the following steps:

firstly, marking a regular icosahedron by using an Aruco code to obtain the Aruco code marked regular icosahedron; calibrating a monocular camera, and observing the regular icosahedron marked by the Aruco code through the calibrated monocular camera;

secondly, positioning a plurality of observed Aruco codes in the current frame and the initial frame respectively, and then obtaining the coding id of all the observed Aruco codes in the current frame and the initial frame respectively_i1,2,3, n and the spatial coordinates a of the central point of the ArUco code_i(x_i,y_i,z_i) 1,2,3, n, n represents the number of observed ArUco codes for a frame;

thirdly, performing multivariate nonlinear fitting on the spatial coordinates of the central point of the Aruco code of the current frame obtained in the second step to calculate the coordinates of the spherical center of the regular icosahedron inscribed sphere of the current frame as the spatial coordinates t of the body center of the regular icosahedron of the current frame₀Performing multivariate nonlinear fitting on the spatial coordinates of the central point of the Aruco code of the initial frame to calculate the spherical center coordinates of the positive icosahedron inscribed sphere of the initial frame as the spatial coordinates t of the body center of the positive icosahedron of the initial frame_O(ii) a Then by the formula t_rel＝t₀-t_OObtaining the relative space coordinate t of the body center of the regular icosahedron of the current frame relative to the initial frame_rel；

Constructing a virtual regular icosahedron at the origin of a camera coordinate system, wherein the center of the virtual regular icosahedronCoinciding with the origin of a camera coordinate system, marking the virtual regular icosahedron by using the Aruco codes and obtaining the codes of all the Aruco codes of the virtual regular icosahedron and the space coordinate A of the central point of the codes_Wi(x_Wi,y_Wi,z_Wi,id_Wi) 1,2,3, ·, 20; then the space coordinate of the central point of the Aruco code of the current frame obtained in the second step and the code thereof are compared with A_Wi(x_Wi,y_Wi,z_Wi,id_Wi) 1,2, 3.. 20, calculating a rotation matrix R of a current frame regular icosahedron by an iterative closest point method, and combining the spatial coordinates of a central point of an Aruco code of an initial frame and the coding thereof with A_Wi(x_Wi,y_Wi,z_Wi,id_Wi) 1,2,3, 20 calculates a rotation matrix R of the regular icosahedron of the initial frame by an iterative closest point method_OThen by the formula

Calculating to obtain a relative rotation matrix R of the current frame regular icosahedron relative to the initial frame_rel；

According to the relative rotation matrix R_relAnd relative spatial coordinate t_relObtaining a position matrix T of the current frame regular icosahedron relative to the initial frame regular icosahedron_rel：

In the formula (2), SE (3) represents an attribute of the matrix;

fourthly, firstly, a measured object coordinate system is constructed, the regular icosahedron is fixed on the measured object, and the relative space coordinate t of the regular icosahedron relative to the measured object is calculated_refAnd relative rotation matrix R_refObtaining a pose matrix T of the regular icosahedron relative to the measured object_ref：

And then the relative pose matrix T of the measured object is as follows:

and obtaining the spatial attitude change of the measured object of the current frame relative to the initial frame.

Compared with the prior art, the invention has the beneficial effects that:

(1) the method creatively combines the Aruco code with the regular icosahedron, expands the pose identification from two-dimensional label identification to three-dimensional space body identification, greatly improves the pose identification performance, uses a monocular camera to collect images of the regular icosahedron marked by the Aruco code, calculates the spatial three-dimensional coordinate of the central point of the Aruco code by using a PnP algorithm, obtains sparse point cloud of an inscribed sphere, compares the virtual regular icosahedron at the camera coordinate position by using multivariate nonlinear fitting and an iterative closest point algorithm, converts the position solving and rotation matrix solving problem into an optimization problem, further calculates the spatial displacement and the rotation angle of the current regular icosahedron, and indirectly calculates the pose of a measured object.

(2) The method overcomes the high requirements of the existing method on illumination, environment and equipment, the detection is rapid, the stability is improved, and the average image recognition time of each frame is about 10 ms; a large amount of effort is not required to be invested in template acquisition or model training in the early period, so that the pose estimation cost is obviously reduced, the pose estimation precision is ensured, and the universality is enhanced; the three-dimensional object is used as a mark, and the pose estimation of the object large-amplitude motion in the space can be carried out.

(3) The regular icosahedron adopted by the invention is a regular polyhedron with the largest number of spatial surfaces, which is an important factor for realizing the identification of any spatial angle and large-amplitude motion pose, and has higher precision compared with other regular polyhedrons.

(4) Compared with a method based on feature points, the method for identifying the Aruco codes by the monocular camera overcomes the sensitivity of the method to factors such as illumination, environment and the like, and can stably operate in most environments.

(5) Compared with a template-based method, the method provided by the invention has the advantages that the target pose is indirectly measured by using the regular icosahedron, the universality is stronger, a large amount of template libraries do not need to be constructed, and the memory space and the workload are saved.

(6) Compared with a deep learning-based method, the algorithm does not need relatively strong hardware performance, saves cost and meets the universality.

Drawings

FIG. 1 is a schematic representation of an Aruco code-tagged regular icosahedron of the present invention;

FIG. 2 is a schematic diagram of a regular icosahedron and its inscribed sphere of the present invention;

FIG. 3 is a schematic diagram of the position relationship between the virtual regular icosahedron of the present invention and the actual regular icosahedron marked by Aruco code.

Detailed Description

Specific examples of the present invention are given below. The specific examples are only intended to illustrate the invention in further detail and do not limit the scope of protection of the claims of the present application.

The invention provides a monocular 6D pose estimation method (short for method) based on a regular icosahedron, which is characterized by comprising the following steps:

firstly, marking the icosahedron by using an Aruco code to obtain the icosahedron marked by the Aruco code (shown in figure 1); calibrating a monocular camera, and observing the regular icosahedron marked by the Aruco code through the calibrated monocular camera;

preferably, in the first step, the process of labeling the n-icosahedron using the ArUco code is: the method comprises the steps of generating 20 Aruco codes by using an OpenCV image processing library, respectively attaching the 20 Aruco codes to 20 faces of a regular icosahedron according to the encoding sequence, and enabling the geometric center of the Aruco codes to coincide with the geometric midpoint of the face to which the Aruco codes are attached.

Preferably, in the first step, the monocular camera is calibrated by using a checkerboard standard calibration board and by using an OpenCV image processing library to calibrate the internal parameters of the camera, so as to generate a camera internal parameter matrix K;

the camera internal reference matrix K is a matrix for converting the coordinates in the space into plane coordinates, and the standardized formula is as follows:

in the formula (1), f_x、f_yIs the focal length parameter of the camera; c. C_x、c_yIs the amount of translation of the pixel.

Secondly, positioning a plurality of observed Aruco codes in the current frame and the initial frame respectively by using an image processing algorithm, and obtaining the coding ids of all observed Aruco codes in the current frame and the initial frame respectively by using the image processing algorithm_i1,2,3, n and obtaining the spatial coordinates A of the center points of all observed Aruco codes in the current frame and the initial frame by using a pose calculation method_i(x_i,y_i,z_i) 1,2,3, n, n represents the number of observed ArUco codes for a frame;

the current frame is an image of the current position of the regular icosahedron, and the initial frame is an image of the initial position of the regular icosahedron shot by the monocular camera;

preferably, in the second step, the positioning process of the ArUco code is as follows: carrying out gray processing, median filtering and self-adaptive threshold segmentation on the image acquired by the monocular camera in sequence, and extracting a candidate region meeting the requirement from the segmented image.

Preferably, in the second step, the coding of the ArUco code is obtained: firstly, applying perspective transformation to a candidate area to obtain a standard square mark, separating black and white color positions, and dividing the mark into different cells according to the size of the mark; and determining the color of each cell according to the color corresponding to the pixel with the largest number on each cell, and finally converting the color into a binary value to determine the code of the mark.

Preferably, in the second step, the spatial coordinates of the central point of the ArUco code are obtained: the spatial coordinates of the central point of the Aruco code are obtained according to the four corner point information of the candidate area, the pose calculation function of the Aruco library function in the OpenCV framework is used, the four corner point information is used as input, and the spatial coordinates of the central point of the Aruco code under a camera coordinate system can be obtained by using a PnP mode (namely a 2D-3D matching algorithm).

Thirdly, performing multivariate nonlinear fitting on the space coordinates of the center points of all Aruco codes of the current frame obtained in the second step to calculate the coordinates of the sphere center of the regular icosahedron inscribed sphere of the current frame as the space coordinates t of the body center of the regular icosahedron of the current frame₀＝(x₀,y₀,z₀) Performing multivariate nonlinear fitting on the spatial coordinates of the central point of the Aruco code of the initial frame to calculate the spherical center coordinates of the positive icosahedron inscribed sphere of the initial frame as the spatial coordinates t of the body center of the positive icosahedron of the initial frame_O＝(x_O,y_O,z_O) (ii) a Then the space coordinate t of the body center of the current frame regular icosahedron is used₀＝(x₀,y₀,z₀) Subtracting the spatial coordinate t of the body center of the regular icosahedron of the initial frame_O＝(x_O,y_O,z_O) By the formula t_rel(x,y,z)＝t₀(x₀,y₀,z₀)-t_O(x_O,y_O,z_O) Obtaining the relative space coordinate t of the body center of the regular icosahedron of the current frame relative to the initial frame_rel；

Constructing a virtual regular icosahedron at the origin of a camera coordinate system, wherein the position relation between the virtual regular icosahedron and the actual regular icosahedron is shown in figure 3, the body center of the virtual regular icosahedron is coincided with the origin Oc of the camera coordinate system, then using Aruco codes to mark the virtual regular icosahedron and obtaining the codes of all Aruco codes of the virtual regular icosahedron and the space coordinates A of the center point of the Aruco codes_Wi(x_Wi,y_Wi,z_Wi,id_Wi) 1,2,3, ·, 20; then the space coordinate of the central point of the Aruco code of the current frame obtained in the second step and the code A thereof_i(x_i,y_i,z_i,id_i) 1,2,3, n and a_Wi(x_Wi,y_Wi,z_Wi,id_Wi) 1,2, 3.. 20, calculating a rotation matrix R of a current frame regular icosahedron by an iterative closest point method, and combining the spatial coordinates of a central point of an Aruco code of an initial frame and the coding thereof with A_Wi(x_Wi,y_Wi,z_Wi,id_Wi) 1,2,3, 20 calculates the initial point by the iterative closest point methodRotation matrix R of start frame regular icosahedron_OThen by the formula

In the formula (2), SE (3) represents a plum group, is a mathematical symbolic expression and represents the attribute of a matrix;

preferably, in the third step, the spatial coordinates of the current frame regular icosahedron body center are obtained: since the geometric center of the Aruco code coincides with the geometric midpoint of the surface to which the Aruco code is attached, and the intersection point of the inscribed sphere of the regular icosahedron and the regular icosahedron is exactly positioned at the geometric center of each surface of the regular icosahedron (as shown in FIG. 2), the spatial coordinate A of the central point of each Aruco code on the partial surface of the regular icosahedron observed by the monocular camera obtained in the second step at the current frame can be used for determining the spatial coordinate A of the central point of each Aruco code on the partial surface of the regular icosahedron_i(x_i,y_i,z_i) 1,2,3, n is regarded as the current frame regular icosahedron inscribed sphere point (x)_i,y_i,z_i) The sparse point cloud coordinates of (1); when the monocular camera obtains the space coordinates larger than three central points of the Aruco codes, the data condition of multivariate nonlinear fitting is met (under the general condition, the monocular camera can obtain 6-8 central point coordinates of the Aruco codes in each observation, namely the condition can be met in each observation), and the current frame is processed into the regular icosahedron inscribed sphere points (x)_i,y_i,z_i) And the radius of the inscribed sphere is taken as a parameter, the standard equation of the inscribed sphere is as follows:

(x_i-x₀)²+(y_i-y₀)²+(z_i-z₀)²-R_{inner part} ²＝0 (3)

In the formula (3), R_{Inner part}Is the radius of the tangent sphere of a regular icosahedron,

a is the edge length of a regular icosahedron;

the spherical center of the spherical multivariate nonlinear function is fitted to the formula (3) by using nonlinear least squares, and a loss function is constructed as follows:

since the requirement of multivariate nonlinear fitting on the initial point is high, the initial point is easy to fall into a local minimum value, and therefore the known spherical point coordinate (x) is utilized_i,y_i,z_i) Generating a group of initial values R of multivariate nonlinear fitting for the coordinates of the spherical center of the inscribed sphere_{First stage}：

Then R is put_{First stage}Obtaining a value of the loss function J in the formula (4), and then continuously changing R_{First stage}The value of (a) is that the value of the loss function J in the formula (4) is continuously reduced, and a group of (x) which enables the loss function J to be minimum can be obtained after multiple iterative solution₀,y₀,z₀) That is, the coordinates of the current frame inscribed sphere center (i.e. the space coordinates t of the current frame regular icosahedron body center under the camera coordinate system)₀＝(x₀,y₀,z₀))。

Preferably, in the third step, the spatial coordinates of the positive icosahedron body center of the initial frame are obtained: taking the spatial coordinates of the central point of each Aruco code on the partial surface of the regular icosahedron observed by the monocular camera in the initial frame as the sparse point cloud coordinates of the spherical surface point of the regular icosahedron inscribed sphere of the initial frame, and obtaining the spatial coordinates of the body center of the regular icosahedron of the initial frame in the same way as the spatial coordinates of the body center of the regular icosahedron of the current frame, namely obtaining the spatial coordinates t of the body center of the regular icosahedron of the initial frame in the camera coordinate system_O＝(x_O,y_O,z_O)。

Preferably, in the third step, the rotation matrix of the current frame regular icosahedron is obtained:

search A_WiAnd matching the codes of the virtual regular icosahedron Aruco code corresponding to the code of the actual regular icosahedron Aruco code of the current frame obtained in the second step, so as to obtain a group of matching points in the space:

(A_Wi,A_i),i＝1,2,3,...,n (6)

the group of matching points consists of the spatial coordinates of the central point of the Aruco code observed on the actual regular icosahedron and the spatial coordinates of the central point of the Aruco code corresponding to the actual regular icosahedron, and centroiding is respectively carried out on the spatial coordinates through a formula (7) (namely, the spatial coordinates of the central point of the Aruco code observed on the actual regular icosahedron and the spatial coordinates of the central point of the Aruco code corresponding to the actual regular icosahedron are translated to the original points):

obtained q_iDe-centrode, q, of the spatial coordinates of the center point of the actual regular icosahedral Aruco code_i' is a centroid removing point of the space coordinate of the central point of the virtual regular icosahedron Aruco code, and then q is obtained_iAnd q is_iThe relationship of' is as follows:

defining an error term:

the rotation matrix R of the current frame regular icosahedron, which minimizes the error term E, is found according to equation (9), the first term in equation (9)

Independent of the optimization objective, R in the second term^TI is also independent of the optimization objective, the error term becomes:

in the formula (10), tr represents a trace of the matrix;

to solve the optimization objective R in equation (10), a matrix is defined:

performing singular value decomposition on W in equation (11) to obtain:

W＝UΣV^T (12)

in equation (12), Σ is the eigenvalue matrix of W, U and V are diagonal matrices, and then the rotation matrix R of the current frame regular icosahedron is:

R＝UV^T (13)。

preferably, in the third step, the rotation matrix of the positive icosahedron of the initial frame is obtained: search A_WiThe coding of the virtual regular icosahedron Aruco code corresponding to the coding of the actual regular icosahedron Aruco code of the initial frame obtained in the second step is matched with the same coding, and the rotation matrix R of the initial frame regular icosahedron can be obtained by the same method as the obtaining of the rotation matrix of the current frame regular icosahedron_O。

And fourthly, indirectly calculating the 6D pose of the measured object according to the space relative position relation between the regular icosahedron and the measured object.

Preferably, the fourth step is specifically: firstly, a measured object coordinate system is constructed, the regular icosahedron is fixed on the measured object, and the relative position and posture relation between the regular icosahedron and the measured object is calculated (namely, the relative space coordinate t of the regular icosahedron relative to the measured object is calculated_refAnd relative rotation matrix R_ref) Obtaining a pose matrix T of the regular icosahedron relative to the measured object_ref：

And then the relative pose matrix T of the measured object is as follows:

and obtaining the space attitude change of the measured object of the current frame relative to the initial frame.

The camera calibration method, the Aruco code generation method, the image processing algorithm and the pose calculation method are known methods in the field.

Nothing in this specification is said to apply to the prior art.

Claims

1. A monocular 6D pose estimation method based on a regular icosahedron is characterized by comprising the following steps:

thirdly, performing multivariate nonlinear fitting on the spatial coordinates of the central point of the Aruco code of the current frame obtained in the second step to calculate the coordinates of the spherical center of the regular icosahedron inscribed sphere of the current frame as the spatial coordinates t of the body center of the regular icosahedron of the current frame₀Performing multivariate nonlinear fitting on the spatial coordinates of the central point of the Aruco code of the initial frame to calculate the spherical center coordinates of the positive icosahedron inscribed sphere of the initial frame as the positive icosahedron body of the initial frameSpatial coordinates t of the heart_O(ii) a Then by the formula t_rel＝t₀-t_OObtaining the relative space coordinate t of the body center of the regular icosahedron of the current frame relative to the initial frame_rel；

Constructing a virtual regular icosahedron at the origin of a camera coordinate system, enabling the body center of the virtual regular icosahedron to coincide with the origin of the camera coordinate system, marking the virtual regular icosahedron by using the Aruco codes and obtaining the codes of all the Aruco codes of the virtual regular icosahedron and the space coordinates A of the center points of the codes_Wi(x_Wi,y_Wi,z_Wi,id_Wi) 1,2,3, ·, 20; then the space coordinate of the central point of the Aruco code of the current frame obtained in the second step and the code thereof are compared with A_Wi(x_Wi,y_Wi,z_Wi,id_Wi) 1,2, 3.. 20, calculating a rotation matrix R of a current frame regular icosahedron by an iterative closest point method, and combining the spatial coordinates of a central point of an Aruco code of an initial frame and the coding thereof with A_Wi(x_Wi,y_Wi,z_Wi,id_Wi) 1,2,3, 20 calculates a rotation matrix R of the regular icosahedron of the initial frame by an iterative closest point method_OThen by the formula

In the formula (2), SE (3) represents an attribute of the matrix;

in the third step, obtaining a rotation matrix of the current frame regular icosahedron:

search A_WiCoding pair of actual regular icosahedron Aruco codes of current frame obtained in the middle and second stepsAnd (3) the corresponding codes of the Aruco code of the virtual regular icosahedron are matched with each other, so that a group of matching points in the space are obtained:

(A_Wi,A_i),i＝1,2,3,...,n (6)

the group of matching points consists of the spatial coordinates of the central point of the Aruco code observed on the actual regular icosahedron and the spatial coordinates of the central point of the Aruco code corresponding to the actual regular icosahedron, and the centroiding of the Aruco code is respectively removed by the formula (7):

defining an error term:

in the formula (10), tr represents a trace of the matrix;

to solve the optimization objective R in equation (10), a matrix is defined:

performing singular value decomposition on W in equation (11) to obtain:

W＝UΣV^T (12)

R＝UV^T (13)

And then the relative pose matrix T of the measured object is as follows:

2. The method for estimating the monocular 6D pose based on a regular icosahedron as claimed in claim 1, wherein in the first step, the process of marking the regular icosahedron by using Aruco code is as follows: the method comprises the steps of generating 20 Aruco codes by using an OpenCV image processing library, respectively attaching the 20 Aruco codes to 20 faces of a regular icosahedron according to the encoding sequence, and enabling the geometric center of the Aruco codes to coincide with the geometric center of the face to which the Aruco codes are attached.

3. The monocular 6D pose estimation method based on a regular icosahedron as claimed in claim 1, wherein in the first step, a chessboard grid standard calibration plate is used for calibration of the monocular camera, and an OpenCV image processing library is used for calibration of camera internal parameters to generate a camera internal parameter matrix K;

4. The monocular 6D pose estimation method based on a regular icosahedron as claimed in claim 1, wherein in the second step, the positioning process of the ArUco code is: carrying out gray processing, median filtering and self-adaptive threshold segmentation on the image acquired by the monocular camera in sequence, and extracting a candidate region meeting the requirement from the segmented image.

5. The method for estimating monocular 6D pose based on regular icosahedron according to claim 4, wherein in the second step, the coded acquisition of ArUco code: firstly, applying perspective transformation to a candidate area to obtain a standard square mark, separating black and white color positions, and dividing the mark into different cells according to the size of the mark; and determining the color of each cell according to the color corresponding to the pixel with the largest number on each cell, and finally converting the color into a binary value to determine the code of the mark.

6. The monocular 6D pose estimation method based on a regular icosahedron as claimed in claim 4, wherein in the second step, the spatial coordinates of the central point of the ArUco code are obtained: and solving the spatial coordinates of the central point of the Aruco code according to the four corner point information of the candidate area, using a pose calculation function of an Aruco library function in an OpenCV frame, taking the four corner point information as input, and obtaining the spatial coordinates of the central point of the Aruco code under a camera coordinate system by using a PnP mode.

7. The monocular 6D pose estimation method based on a regular icosahedron as claimed in claim 1, characterized in that, in the third step, the spatial coordinates of the body center of the regular icosahedron of the current frame are obtained: the space coordinate of the central point of the Aruco code on the partial surface of the regular icosahedron observed by the current frame obtained in the second step is taken as the inscribed sphere spherical point (x) of the regular icosahedron of the current frame_i,y_i,z_i) The sparse point cloud coordinates of (1); when the monocular camera obtains the space coordinate which is larger than the central point of three Aruco codes, the spherical surface point (x) of the tangent sphere of the regular icosahedron of the current frame is obtained_i,y_i,z_i) And the radius of the inscribed sphere is taken as a parameter, the standard equation of the inscribed sphere is as follows:

(x_i-x₀)²+(y_i-y₀)²+(z_i-z₀)²-R_{inner part} ²＝0 (3)

a is the edge length of a regular icosahedron;

Then R is put_{First stage}Obtaining a value of the loss function J in the formula (4), and then continuously changing R_{First stage}The value of (a) is that the value of the loss function J in the formula (4) is continuously reduced, and a group of (x) which enables the loss function J to be minimum can be obtained after multiple iterative solution₀,y₀,z₀) I.e. the coordinates of the spherical center of the inscribed sphere of the current frame.