CN113393524A - Target pose estimation method combining deep learning and contour point cloud reconstruction - Google Patents

Target pose estimation method combining deep learning and contour point cloud reconstruction

Info

Publication number
CN113393524A
Authority
CN
China
Prior art keywords
point cloud
straight line
target
point
line segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110676959.7A
Other languages
Chinese (zh)
Other versions
CN113393524B (en)
Inventor
陈从平
姚威
张力
江高勇
周正旺
丁坤
张屹
戴国洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou University
Original Assignee
Changzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2021-06-18
Publication date: 2021-09-14
Application filed by Changzhou University
Priority to CN202110676959.7A
Publication of CN113393524A
Application granted
Publication of CN113393524B
Legal status: Active

Classifications

    • G06T 7/73: Image analysis; determining position or orientation of objects or cameras using feature-based methods
    • G06N 3/045: Neural network architectures; combinations of networks
    • G06T 7/13: Image analysis; segmentation and edge detection
    • G06T 7/85: Camera calibration; stereo camera calibration
    • G06T 2207/10012: Image acquisition modality; stereo images
    • G06T 2207/20081: Special algorithmic details; training and learning

Abstract

The invention relates to the technical field of target pose estimation, and in particular to a target pose estimation method combining deep learning and contour point cloud reconstruction, comprising the following steps: S1, calibrate the binocular vision system and perform stereo rectification; S2, recognize the targets in the left and right camera images with the trained target detection network model and obtain the target boundary regions; S3, perform straight line segment detection on the detected target boundary regions in the left and right camera images with the LSD algorithm; S4, match the straight line segments by combining the output of the deep-learning target detection network with a multi-constraint method; S5, reconstruct the contour point cloud of the target; and S6, estimate the pose of the target. Using the left and right cameras, the YOLOv4 deep-learning algorithm and contour point cloud reconstruction, the method keeps the stereo-matching computation time and the computational load small, and the use of ordinary cameras greatly reduces cost.

Description

Target pose estimation method combining deep learning and contour point cloud reconstruction
Technical Field
The invention relates to the technical field of target pose estimation, in particular to a target pose estimation method combining deep learning and contour point cloud reconstruction.
Background
Pose estimation aims to obtain the three-dimensional coordinates and the three-dimensional rotation vector of a target in the camera coordinate system. In many cases, only an accurately estimated 6D pose of the target enables a machine to carry out its next operation and decision. For example, in intelligent-robot tasks, recognizing the 6D pose of a target provides useful information for grasping and motion planning; in virtual-reality applications, the 6D pose of a target is the key to supporting virtual interaction between arbitrary objects.
Existing 6D pose estimation methods are mainly point cloud registration methods, which can handle targets with complex shapes and weak textures and offer good accuracy and robustness. According to how the point cloud data are acquired, point cloud registration methods can be divided into binocular-vision-based methods and depth-camera-based methods.
Most existing binocular-vision-based methods first solve the disparity map of the scene with the SGBM (Semi-Global Block Matching) stereo-matching algorithm, then reconstruct the scene point cloud from the disparity map and segment the target, and finally register it against a template point cloud to obtain the target pose; because the point cloud of the whole scene is reconstructed, the stereo-matching computation time is excessive. Depth-camera-based schemes first acquire the point cloud of the target, compute the three-dimensional features of every point in the point cloud, and then estimate the pose from those features, but the depth-camera hardware is comparatively expensive. Therefore, how to provide a pose estimation method with a small computational load, low cost and high accuracy is a problem urgently needing to be solved by those skilled in the art.
Disclosure of Invention
To solve the above technical problem, the invention obtains the target boundary regions in the left and right camera images with the YOLOv4 deep-learning algorithm, performs straight line segment detection and matching on the target with the LSD algorithm and a multi-constraint method, reconstructs the contour point cloud of the target, and estimates the target pose, so that the stereo-matching computation time and the computational load are small; in addition, acquiring the images with ordinary cameras greatly reduces development cost.
The technical solution adopted by the invention is as follows: a target pose estimation method combining deep learning and contour point cloud reconstruction comprises the following steps:
S1, calibrate the binocular vision system with Zhang Zhengyou's checkerboard calibration method, stereo-rectify the binocular cameras with the Bouguet algorithm based on the calibrated parameters, select multiple kinds of targets as analysis objects, train them with a YOLOv4 network, and establish the target detection network models (a brief calibration and rectification sketch follows this list);
S2, recognize the targets in the left and right camera images with the trained target detection network model and obtain the target boundary regions;
S3, perform straight line segment detection on the detected target boundary regions in the left and right camera images with the LSD algorithm;
S4, match the straight line segments by combining the category and boundary region output by the deep-learning target detection network with a multi-constraint method;
S5, reconstruct the contour point cloud of the target;
S6, estimate the pose of the target with a point cloud registration method.
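As a non-limiting illustration of step S1, the Python sketch below calibrates and rectifies a binocular rig with OpenCV; the corner arrays, image size and function names are assumptions, and OpenCV's stereoRectify follows Bouguet's method for calibrated cameras.

```python
# Sketch of step S1 (assumed inputs: objpoints/imgpoints_l/imgpoints_r are checkerboard
# corner arrays gathered beforehand; image_size is (width, height)).
import cv2

def calibrate_and_rectify(objpoints, imgpoints_l, imgpoints_r, image_size):
    # Zhang Zhengyou checkerboard calibration of each camera individually
    _, K1, D1, _, _ = cv2.calibrateCamera(objpoints, imgpoints_l, image_size, None, None)
    _, K2, D2, _, _ = cv2.calibrateCamera(objpoints, imgpoints_r, image_size, None, None)
    # Stereo calibration: rotation R and translation T between the two cameras
    _, K1, D1, K2, D2, R, T, _, _ = cv2.stereoCalibrate(
        objpoints, imgpoints_l, imgpoints_r, K1, D1, K2, D2, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    # Stereo rectification (Bouguet's method for calibrated rigs) and remap tables
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, image_size, R, T)
    map_l = cv2.initUndistortRectifyMap(K1, D1, R1, P1, image_size, cv2.CV_32FC1)
    map_r = cv2.initUndistortRectifyMap(K2, D2, R2, P2, image_size, cv2.CV_32FC1)
    return map_l, map_r, Q

# Rectified frames are then obtained with cv2.remap(frame, *map_l, cv2.INTER_LINEAR).
```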
Further, step S3 includes:
S31, take out the end points of all straight line segments and group the end points whose Euclidean distances are smaller than a set threshold d into the same group;
S32, for any specific group of end points, if the number of end points in the group is greater than or equal to 2, merge the end points in the group into a single point, denoted P_r, which is calculated as follows:
P_r = (1 / C(n, 2)) · Σ_i P_i      (1)
where P_i is the intersection point of the extension lines of the straight line segments to which any two end points in the current group respectively belong, n is the number of end points in the group, and C(n, 2) is the number of combinations of 2 end points selected from the n end points;
S33, use P_r as the common end point of the straight line segments to which the grouped end points belong, obtaining the optimized and reconstructed straight line segments.
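As an illustration of steps S31 to S33, the Python sketch below (helper names, the threshold d and the parallel-line tolerance are assumptions, not the patent's code) merges grouped end points into the mean intersection point P_r of formula (1):

```python
# Sketch of the end-point merging in S31-S33.
import itertools
import numpy as np

def line_intersection(seg_a, seg_b):
    """Intersection of the infinite lines through two segments (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = seg_a
    x3, y3, x4, y4 = seg_b
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-9:                          # parallel lines: no usable intersection
        return None
    px = ((x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4)) / d
    py = ((x1 * y2 - y1 * x2) * (y3 - y4) - (y1 - y2) * (x3 * y4 - y3 * x4)) / d
    return np.array([px, py])

def merge_endpoints(segments, d_thresh=5.0):
    """segments: list of (x1, y1, x2, y2). End points closer than d_thresh are grouped
    and replaced by P_r, the mean of the pairwise extension-line intersections."""
    endpoints = [(si, ei, np.array(seg[2 * ei:2 * ei + 2], float))
                 for si, seg in enumerate(segments) for ei in (0, 1)]
    segments = [list(map(float, s)) for s in segments]
    used = [False] * len(endpoints)
    for i, (_, _, p_i) in enumerate(endpoints):
        if used[i]:
            continue
        group = [i] + [j for j in range(i + 1, len(endpoints))
                       if not used[j] and np.linalg.norm(p_i - endpoints[j][2]) < d_thresh]
        if len(group) < 2:
            continue
        inters = []
        for a, b in itertools.combinations(group, 2):
            p = line_intersection(segments[endpoints[a][0]], segments[endpoints[b][0]])
            if p is not None:
                inters.append(p)
        if not inters:
            continue
        p_r = np.mean(inters, axis=0)          # formula (1)
        for j in group:                        # S33: common end point of the group
            si, ei, _ = endpoints[j]
            segments[si][2 * ei:2 * ei + 2] = p_r.tolist()
            used[j] = True
    return segments
```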
Further, step S4 includes:
S41, calculate the lengths s of all optimized and reconstructed straight line segments and their included angles θ with the positive direction of the image's horizontal axis;
S42, for a straight line segment l_l to be matched in the left camera image, record the target boundary region Rect_l to which it belongs, then find the boundary region Rect_r of the same target in the right camera image and take out all straight line segments in Rect_r, denoted as the straight line segment set L_r;
S43, remove from the set L_r the straight line segments whose midpoint abscissa is greater than the midpoint abscissa of l_l, obtaining a new straight line segment set L'_r;
S44, calculate, for l_l and each straight line segment l_r (l_r ∈ L'_r), the horizontal error E_e, the length error E_s and the angle error E_θ:
E_e = |y_ls - y_rs| + |y_le - y_re|,  E_s = |s_l - s_r|,  E_θ = |θ_l - θ_r|      (2)
where y_ls and y_rs are the ordinates of the starting end points of l_l and l_r respectively, and y_le and y_re are the ordinates of their terminating end points; s_l and s_r are the lengths of l_l and l_r; θ_l and θ_r are the included angles of l_l and l_r with the positive direction of the image's horizontal axis;
S45, concatenate E_e, E_s and E_θ into a matching error vector E = [E_e  E_s  E_θ] and normalize each value in E;
S46, calculate the matching error value E_total between l_l and each straight line segment in the set L'_r:
E_total = E_e + E_s + E_θ  (computed from the normalized components of E)      (3)
The straight line segment with the smallest E_total is taken as the matching straight line segment of l_l.
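A minimal Python sketch of the multi-constraint matching of steps S41 to S46 follows; the per-component max-normalization and the equal-weight sum used for E_total are assumptions where the patent's exact formulas are not reproduced.

```python
# Sketch of S41-S46: segments are (x1, y1, x2, y2) with a consistent end-point ordering
# (e.g. after the optimization step above); all candidates lie in the same target's box.
import numpy as np

def seg_features(seg):
    """Length s and angle theta (with the image x-axis) of one segment."""
    x1, y1, x2, y2 = seg
    return np.hypot(x2 - x1, y2 - y1), np.arctan2(y2 - y1, x2 - x1)

def match_segment(l_left, right_segments):
    """Return the index of the right-image segment best matching l_left, or None."""
    xm_left = (l_left[0] + l_left[2]) / 2.0
    # S43: keep only right segments whose midpoint abscissa is not larger (positive disparity)
    candidates = [(i, s) for i, s in enumerate(right_segments)
                  if (s[0] + s[2]) / 2.0 <= xm_left]
    if not candidates:
        return None
    s_l, th_l = seg_features(l_left)
    errors = []
    for _, s in candidates:
        s_r, th_r = seg_features(s)
        e_e = abs(l_left[1] - s[1]) + abs(l_left[3] - s[3])   # ordinate (row) error E_e
        e_s = abs(s_l - s_r)                                   # length error E_s
        e_t = abs(th_l - th_r)                                 # angle error E_theta
        errors.append([e_e, e_s, e_t])
    E = np.asarray(errors)
    E = E / (E.max(axis=0) + 1e-9)     # S45: normalize each error component (assumed)
    e_total = E.sum(axis=1)            # S46: aggregate matching error (assumed equal weights)
    return candidates[int(np.argmin(e_total))][0]
```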
Further, step S5 includes:
S51, let the starting and terminating end points of a pair of matched straight line segments in the left and right camera images be p_ls(u_ls, v_ls), p_le(u_le, v_le) and p_rs(u_rs, v_rs), p_re(u_re, v_re) respectively; substitute the values u_ls, u_rs and v_ls of p_ls and p_rs for u_l, u_r and v_l in formula (4) to reconstruct the starting end point P_s(x_s, y_s, z_s) of the three-dimensional space straight line segment; then substitute the values u_le, u_re and v_le of p_le and p_re for u_l, u_r and v_l in formula (4) to reconstruct the terminating end point P_e(x_e, y_e, z_e):
X_c = b·(u_l - u_0) / (u_l - u_r),  Y_c = b·(v_l - v_0) / (u_l - u_r),  Z_c = b·f / (u_l - u_r)      (4)
Wherein u isl、urFor the abscissa, v, of the point to be reconstructed in the left and right camera images, respectivelylFor the ordinate of the point to be reconstructed in the left camera image, b is the base-line distance of the binocular camera, (u)0,v0) Is the coordinate value of the center of the optical axis of the left camera, f is the focal length of the two cameras, (X)c,Yc,Zc) Three-dimensional coordinates of the reconstructed point in a left camera coordinate system;
S52, from the two reconstructed three-dimensional end points P_s and P_e, calculate the spatial straight-line equation L(x, y, z) that they define:
(x - x_s) / (x_e - x_s) = (y - y_s) / (y_e - y_s) = (z - z_s) / (z_e - z_s)      (5)
the direction vector of the straight line L(x, y, z) is n(x_e - x_s, y_e - y_s, z_e - z_s), which is then normalized to the unit vector n_unit(x_unit, y_unit, z_unit);
S53, substitute the starting end point P_s(x_s, y_s, z_s) of the three-dimensional space straight line segment for (x_{i-1}, y_{i-1}, z_{i-1}) in formula (6), then iterate step by step toward the terminating end point P_e(x_e, y_e, z_e) to generate the point cloud of the spatial straight line segment:
x_i = x_{i-1} + ΔS·x_unit,  y_i = y_{i-1} + ΔS·y_unit,  z_i = z_{i-1} + ΔS·z_unit      (6)
where (x_i, y_i, z_i) are the coordinates of the current point in the iteration, (x_{i-1}, y_{i-1}, z_{i-1}) are the coordinates of the previous point, and ΔS is a preset iteration step length, i.e., the spatial distance between adjacent points of the discretized three-dimensional point cloud;
and S54, generating point clouds of all the matched straight line segments to obtain the contour point cloud of the target.
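The Python sketch below illustrates steps S51 to S54, triangulating the matched end points with formula (4) and sampling the reconstructed segment with step ΔS; function names, the camera-parameter dictionary and the default step are assumptions.

```python
# Sketch of S51-S54 for one matched pair of straight line segments.
import numpy as np

def triangulate(u_l, v_l, u_r, b, f, u0, v0):
    """Rectified-stereo triangulation of one point, formula (4)."""
    disparity = u_l - u_r
    return np.array([b * (u_l - u0) / disparity,
                     b * (v_l - v0) / disparity,
                     b * f / disparity])

def segment_point_cloud(p_left, p_right, cam, step=1.0):
    """p_left / p_right: ((u_s, v_s), (u_e, v_e)) end points of a matched segment pair.
    cam: dict with baseline 'b', focal length 'f' and principal point 'u0', 'v0'.
    Returns an (N, 3) array of points sampled along the reconstructed 3D segment."""
    (uls, vls), (ule, vle) = p_left
    (urs, _), (ure, _) = p_right
    P_s = triangulate(uls, vls, urs, cam["b"], cam["f"], cam["u0"], cam["v0"])
    P_e = triangulate(ule, vle, ure, cam["b"], cam["f"], cam["u0"], cam["v0"])
    direction = P_e - P_s
    length = np.linalg.norm(direction)
    n_unit = direction / length
    n_steps = int(length // step)      # formula (6): march from P_s toward P_e with step ΔS
    points = [P_s + i * step * n_unit for i in range(n_steps + 1)] + [P_e]
    return np.array(points)

# S54: the target contour point cloud is the concatenation over all matched segment pairs:
# contour = np.vstack([segment_point_cloud(pl, pr, cam) for pl, pr in matched_pairs])
```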
Further, step S6 includes:
s61, in an off-line state, generating a complete contour point cloud of the target to be detected as a template point cloud by using CAD, and calculating a Fast Point Feature Histogram (FPFH) of the template point cloud;
s62, taking the template point cloud generated in the step S61 as a source point cloud P, taking the reconstructed contour point cloud of the target as a target point cloud Q, and calculating the FPFH of the target point cloud Q;
S63, randomly select k sampling points from the source point cloud P, where k is an integer greater than 3, search the target point cloud Q for points whose FPFH features are similar to those of the sampling points, and then randomly select one of them as the corresponding point of each sampling point;
S64, calculate the transformation matrix from these point correspondences, then evaluate the transformation error with a Huber penalty function, recorded as
Σ_{i=1..k} H(e_i)      (7)
Wherein H (e)i) The calculation formula is as follows:
H(e_i) = (1/2)·e_i^2  for |e_i| ≤ t_e;  H(e_i) = (1/2)·t_e·(2|e_i| - t_e)  for |e_i| > t_e      (8)
where t_e is a preset threshold value and e_i is the distance difference of the i-th point pair after transformation;
s65, repeatedly executing the steps S63 and S64 until the preset iteration times are reached, and finally taking the transformation matrix which enables the transformation error to be minimum as an initial transformation matrix;
s66, applying the initial transformation matrix to the source point cloud P to obtain a new source point cloud P';
S67, for each point in the new source point cloud P', find the point with the smallest Euclidean distance in the target point cloud Q as its corresponding point, then calculate the transformation matrix and the corresponding error E(R, T):
E(R, T) = (1/n) · Σ_{i=1..n} || q_i - (R·p_i + T) ||^2      (9)
where E(R, T) is the error between the new source point cloud P' and the target point cloud Q under the transformation matrix (R, T), p_i and q_i are the coordinates of corresponding points in the source point cloud P' and the target point cloud Q respectively, and n is the number of corresponding point pairs;
S68, apply the transformation matrix obtained in step S67 to the source point cloud P' to obtain an updated source point cloud, and calculate its error E(R, T) with respect to the target point cloud Q;
S69, repeat steps S67 and S68 until E(R, T) or the iteration count meets the set condition (i.e., E(R, T) falls below a preset error value, or steps S67 and S68 have been repeated a preset number of times); finally solve the rotation-and-translation matrix between the two point clouds and decompose it into a three-dimensional coordinate and a three-dimensional rotation vector, i.e., the pose of the target.
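As an illustration of the coarse-to-fine registration in steps S61 to S69, the sketch below uses Open3D's FPFH-based RANSAC registration followed by point-to-point ICP; this library pipeline stands in for the patent's own sampling and Huber-error scheme, and the voxel size, thresholds and parameter names are assumptions (the exact API varies slightly across Open3D versions).

```python
# Sketch: template (CAD contour) as source point cloud P, reconstructed contour as target Q.
import open3d as o3d

def estimate_pose(template_pcd, contour_pcd, voxel=2.0):
    def preprocess(pcd):
        down = pcd.voxel_down_sample(voxel)
        down.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
        fpfh = o3d.pipelines.registration.compute_fpfh_feature(
            down, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))
        return down, fpfh

    src, src_fpfh = preprocess(template_pcd)   # source point cloud P (S61)
    tgt, tgt_fpfh = preprocess(contour_pcd)    # target point cloud Q (S62)

    # Coarse registration from FPFH correspondences (analogous to S63-S65)
    coarse = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        src, tgt, src_fpfh, tgt_fpfh, mutual_filter=True,
        max_correspondence_distance=voxel * 1.5,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(False),
        ransac_n=4,
        checkers=[o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(voxel * 1.5)],
        criteria=o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))

    # Fine registration: point-to-point ICP (analogous to S66-S69)
    fine = o3d.pipelines.registration.registration_icp(
        src, tgt, voxel * 0.8, coarse.transformation,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())

    T = fine.transformation                    # 4x4 rotation-and-translation matrix
    return T[:3, :3], T[:3, 3]                 # rotation matrix R and translation vector t
```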
The beneficial effects of the invention are:
1. A target pose estimation method based on binocular vision and combining deep learning with contour point cloud reconstruction is proposed and implemented;
2. Using a binocular camera scheme costs less than a depth camera scheme;
3. The contour point cloud of the target is reconstructed and registered, which is computationally more efficient than processing a dense point cloud of the target while still guaranteeing adequate accuracy.
Drawings
FIG. 1 is a flow chart of a target pose estimation method combining deep learning and contour point cloud reconstruction according to the present invention;
FIG. 2 is a graph of training epochs versus loss value for the YOLOv4 network of the present invention;
FIG. 3 shows confidence results of the YOLOv4 network of the present invention on the test set;
FIG. 4 is a comparison of straight line segments before and after the optimized reconstruction of the present invention;
FIG. 5 shows the final target detection/recognition and straight line segment matching results of the present invention;
FIG. 6 illustrates the process of generating the point clouds of all matched straight line segments in the present invention;
FIG. 7 is the target contour point cloud reconstructed by the present invention;
FIG. 8 is the CAD-generated complete contour point cloud in the target coordinate system according to the present invention;
FIG. 9 is the final point cloud registration result of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples, which are simplified schematic drawings and illustrate only the basic structure of the invention in a schematic manner, and therefore only show the structures relevant to the invention.
As shown in Fig. 1: S1, calibrate and stereo-rectify the binocular cameras, select multiple kinds of targets as analysis objects, train them with the YOLOv4 network, and establish the target detection network models;
The embodiment is verified with three representative target types (square-type, slice-type and angle-type) as objects. 400 images are collected per type, 1200 in total, and the labeled samples are divided into a training set, a validation set and a test set at a ratio of 8:1:1. The training set is input into the YOLOv4 network to start learning, with the initial momentum set to 0.9, the initial learning rate to 0.001 and the training batch size to 8; the backbone network is frozen for 30 warm-up training epochs, then the learning rate is set to 0.0001 and the unfrozen whole network is trained for another 30 epochs. The validation samples are input after every training epoch for recognition and verification and the loss value is computed; as shown in Fig. 2, the network converges after 60 epochs. The test set is then evaluated with the trained model; part of the confidence results predicted by the YOLOv4 network are shown in Fig. 3, showing that the trained model accurately recognizes and boxes the targets.
The binocular vision system is calibrated with Zhang Zhengyou's checkerboard calibration method, and the binocular cameras are stereo-rectified with the Bouguet rectification method based on the calibrated parameters.
S2: acquire field images with the stereo-rectified binocular cameras, and recognize all targets in the left and right camera images with the trained YOLOv4 network to obtain the target boundary regions.
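For step S2, a possible inference path is OpenCV's DNN module with the trained YOLOv4 weights; the file names, input size and thresholds below are assumptions, and any YOLOv4 inference framework could be substituted.

```python
# Sketch of running the trained YOLOv4 detector on a rectified camera frame.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov4_targets.cfg", "yolov4_targets.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

def detect_targets(image, conf_thresh=0.5, nms_thresh=0.4):
    """Return (class_id, confidence, bounding_box) for each detected target."""
    class_ids, scores, boxes = model.detect(image, conf_thresh, nms_thresh)
    return list(zip(class_ids, scores, boxes))

# boxes_left  = detect_targets(rectified_left)
# boxes_right = detect_targets(rectified_right)
```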
S3: perform straight line segment detection on the target boundary regions detected in the left and right camera images with the LSD algorithm;
because the LSD algorithm assumes that each pixel can belong to at most one straight line segment, intersecting straight line segments are generally broken at the intersection point when two or more of them are detected, so the end points of the straight line segments need to be recalculated and merged to optimize and reconstruct them. First, the end points of all straight line segments are taken out, end points whose mutual distance is smaller than a set threshold are merged, and the merged end point P_r is used as the common end point of these straight line segments, i.e., their intersection:
P_r = (1 / C(n, 2)) · Σ_i P_i      (1)
This yields the optimized and reconstructed straight line segments; the process is shown in Fig. 4. Before optimization, the plain LSD algorithm breaks intersecting straight line segments at the intersection point, whereas after optimization the previously discontinuous target contour is correctly connected.
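A corresponding LSD call, feeding the end-point optimization above, might look as follows; OpenCV's line segment detector is assumed to be available, which depends on the OpenCV version/build, and another LSD implementation could be substituted.

```python
# Sketch of S3: detect straight line segments inside one target boundary region (grayscale ROI).
import cv2

def detect_segments(gray_roi):
    lsd = cv2.createLineSegmentDetector()
    lines = lsd.detect(gray_roi)[0]            # N x 1 x 4 array of (x1, y1, x2, y2), or None
    return [] if lines is None else [tuple(l[0]) for l in lines]
```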
S4: match the straight line segments by combining the category and boundary region output by the deep-learning target detection network with a multi-constraint method;
S41, calculate the lengths s of all optimized and reconstructed straight line segments and their included angles θ with the positive direction of the image's horizontal axis;
S42, for a straight line segment l_l to be matched in the left camera image, record the target boundary region Rect_l to which it belongs, then find the boundary region Rect_r of the same target in the right camera image and take out all straight line segments in Rect_r, denoted as the straight line segment set L_r;
S43, remove from the set L_r the straight line segments whose midpoint abscissa is greater than the midpoint abscissa of l_l, obtaining a new straight line segment set L'_r;
S46, calculate the matching error value E_total between l_l and each straight line segment l_r (l_r ∈ L'_r), and take the straight line segment with the smallest E_total as the matching straight line segment of l_l;
The final target detection/recognition and straight line segment matching results are shown in Fig. 5; the numerical values in the figure are the YOLOv4 recognition confidences for the square-type, slice-type and angle-type target pictures of the left and right cameras. For clarity of display, only the matching result of the square-type target is shown in the figure; the method correctly matches the straight line segments in the left and right camera images.
S5, reconstructing contour point cloud of the target;
S52: the three-dimensional space line equation is obtained by reconstructing the two three-dimensional end points of each matched straight line segment, and the three-dimensional space line is sampled iteratively to generate the point clouds of all matched straight line segments; the generation process is shown in Fig. 6, and the finally reconstructed target contour point cloud is shown in Fig. 7. As Fig. 7 shows, the contour point cloud of the target is accurately reconstructed and its position in space matches the actual position. Because the method matches and reconstructs only the contour of the target, its computational load is smaller and its efficiency higher than reconstructing the whole scene.
S6, performing pose estimation on the target by using a point cloud registration method;
A complete contour point cloud defined in the target coordinate system, shown in Fig. 8, is generated from the CAD model and then registered against the reconstructed contour point cloud to obtain the pose of the target; the final point cloud registration result is shown in Fig. 9. After registration the two point clouds approximately coincide, indicating that the pose estimation result is correct and of high accuracy.
In addition, a registration experiment with the dense point cloud of the target surface is carried out and its running time recorded, and it is compared with the contour point cloud registration of this method; the results are shown in Table 1:
TABLE 1
[Table 1 is rendered as an image in the original publication and is not reproduced here.]
It can be seen that the average processing speed is improved by about 50 times, because the contour point cloud contains far fewer points while retaining the target's structural information to the greatest extent.
The errors between the manually measured actual pose and the pose computed by the present algorithm are calculated and their absolute values taken; the results are shown in Table 2:
TABLE 2
[Table 2 is rendered as an image in the original publication and is not reproduced here.]
The position errors estimated by the method are smaller than 0.7 mm in every direction and the attitude errors are smaller than 0.9°, which meets the requirements of practical applications.
The beneficial effects of the invention are: a target pose estimation method based on binocular vision and combining deep learning with contour point cloud reconstruction is proposed and implemented; using a binocular camera scheme costs less than a depth camera scheme; and the contour point cloud of the target is reconstructed and registered, which is computationally more efficient than processing a dense point cloud of the target while still guaranteeing adequate accuracy.
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.

Claims (5)

1. A target pose estimation method combining deep learning and contour point cloud reconstruction is characterized by comprising the following steps:
S1, calibrate the binocular cameras and perform stereo rectification, select multiple kinds of targets as analysis objects, train them with a YOLOv4 network, and establish the target detection network models;
S2, recognize the targets in the left and right camera images with the trained target detection network model and obtain the target boundary regions;
S3, perform straight line segment detection on the detected target boundary regions in the left and right camera images with the LSD algorithm;
S4, match the straight line segments by combining the category and boundary region output by the deep-learning target detection network with a multi-constraint method;
S5, reconstruct the contour point cloud of the target;
S6, estimate the pose of the target with a point cloud registration method.
2. The method for estimating the pose of an object by combining deep learning and reconstruction of a contour point cloud according to claim 1, wherein the step S3 comprises:
S31, take out the end points of all straight line segments and group the end points whose Euclidean distances are smaller than a set threshold d into the same group;
S32, for any specific group of end points, if the number of end points in the group is greater than or equal to 2, merge the end points in the group into a single point, denoted P_r, which is calculated as follows:
P_r = (1 / C(n, 2)) · Σ_i P_i      (1)
where P_i is the intersection point of the extension lines of the straight line segments to which any two end points in the current group respectively belong, n is the number of end points in the group, and C(n, 2) is the number of combinations of 2 end points selected from the n end points;
S33, use P_r as the common end point of the straight line segments to which the grouped end points belong, obtaining the optimized and reconstructed straight line segments.
3. The method for estimating the pose of an object by combining deep learning and reconstruction of a contour point cloud according to claim 1, wherein the step S4 comprises:
S41, calculate the lengths s of all optimized and reconstructed straight line segments and their included angles θ with the positive direction of the image's horizontal axis;
S42, for a straight line segment l_l to be matched in the left camera image, record the target boundary region Rect_l to which it belongs, then find the boundary region Rect_r of the same target in the right camera image and take out all straight line segments in Rect_r, denoted as the straight line segment set L_r;
S43, remove from the set L_r the straight line segments whose midpoint abscissa is greater than the midpoint abscissa of l_l, obtaining a new straight line segment set L'_r;
S44, calculate, for l_l and each straight line segment l_r (l_r ∈ L'_r), the horizontal error E_e, the length error E_s and the angle error E_θ:
E_e = |y_ls - y_rs| + |y_le - y_re|,  E_s = |s_l - s_r|,  E_θ = |θ_l - θ_r|      (2)
where y_ls and y_rs are the ordinates of the starting end points of l_l and l_r respectively, and y_le and y_re are the ordinates of their terminating end points; s_l and s_r are the lengths of l_l and l_r; θ_l and θ_r are the included angles of l_l and l_r with the positive direction of the image's horizontal axis;
S45, concatenate E_e, E_s and E_θ into a matching error vector E = [E_e  E_s  E_θ] and normalize each value in E;
S46, calculate the matching error value E_total between l_l and each straight line segment in the set L'_r:
E_total = E_e + E_s + E_θ  (computed from the normalized components of E)      (3)
The straight line segment with the smallest E_total is taken as the matching straight line segment of l_l.
4. The method for estimating the pose of an object by combining deep learning and reconstruction of a contour point cloud according to claim 1, wherein the step S5 comprises:
S51, let the starting and terminating end points of a pair of matched straight line segments in the left and right camera images be p_ls(u_ls, v_ls), p_le(u_le, v_le) and p_rs(u_rs, v_rs), p_re(u_re, v_re) respectively; substitute the values u_ls, u_rs and v_ls of p_ls and p_rs for u_l, u_r and v_l in formula (4) to reconstruct the starting end point P_s(x_s, y_s, z_s) of the three-dimensional space straight line segment; then substitute the values u_le, u_re and v_le of p_le and p_re for u_l, u_r and v_l in formula (4) to reconstruct the terminating end point P_e(x_e, y_e, z_e):
X_c = b·(u_l - u_0) / (u_l - u_r),  Y_c = b·(v_l - v_0) / (u_l - u_r),  Z_c = b·f / (u_l - u_r)      (4)
where u_l and u_r are the abscissas of the point to be reconstructed in the left and right camera images respectively, v_l is the ordinate of the point to be reconstructed in the left camera image, b is the baseline distance of the binocular cameras, (u_0, v_0) are the coordinates of the optical-axis center of the left camera, f is the focal length of the two cameras, and (X_c, Y_c, Z_c) are the three-dimensional coordinates of the reconstructed point in the left camera coordinate system;
S52, from the two reconstructed three-dimensional end points P_s and P_e, calculate the spatial straight-line equation L(x, y, z) that they define; the calculation formula is as follows:
(x - x_s) / (x_e - x_s) = (y - y_s) / (y_e - y_s) = (z - z_s) / (z_e - z_s)      (5)
the direction vector of the straight line L(x, y, z) is n(x_e - x_s, y_e - y_s, z_e - z_s), which is then normalized to the unit vector n_unit(x_unit, y_unit, z_unit);
S53, substitute the starting end point P_s(x_s, y_s, z_s) of the three-dimensional space straight line segment for (x_{i-1}, y_{i-1}, z_{i-1}) in formula (6), then iterate step by step toward the terminating end point P_e(x_e, y_e, z_e) to generate the point cloud of the spatial straight line segment:
x_i = x_{i-1} + ΔS·x_unit,  y_i = y_{i-1} + ΔS·y_unit,  z_i = z_{i-1} + ΔS·z_unit      (6)
where (x_i, y_i, z_i) are the coordinates of the current point in the iteration, (x_{i-1}, y_{i-1}, z_{i-1}) are the coordinates of the previous point, and ΔS is a preset iteration step length, i.e., the spatial distance between adjacent points of the discretized three-dimensional point cloud;
and S54, generating point clouds of all the matched straight line segments to obtain the contour point cloud of the target.
5. The method for estimating the pose of an object by combining deep learning and reconstruction of a contour point cloud according to claim 1, wherein the step S6 comprises:
s61, in an off-line state, generating a complete contour point cloud of the target to be detected as a template point cloud by using CAD, and calculating a Fast Point Feature Histogram (FPFH) of the template point cloud;
s62, taking the template point cloud generated in the step S61 as a source point cloud P, taking the reconstructed contour point cloud of the target as a target point cloud Q, and calculating the FPFH of the target point cloud Q;
S63, randomly select k sampling points from the source point cloud P, where k is an integer greater than 3, search the target point cloud Q for points whose FPFH features are similar to those of the sampling points, and then randomly select one of them as the corresponding point of each sampling point;
S64, calculate the transformation matrix from these point correspondences, then evaluate the transformation error with a Huber penalty function, recorded as
Σ_{i=1..k} H(e_i)      (7)
where H(e_i) is calculated as follows:
H(e_i) = (1/2)·e_i^2  for |e_i| ≤ t_e;  H(e_i) = (1/2)·t_e·(2|e_i| - t_e)  for |e_i| > t_e      (8)
where t_e is a preset threshold value and e_i is the distance difference of the i-th point pair after transformation;
s65, repeatedly executing the steps S63 and S64 until the preset iteration times are reached, and finally taking the transformation matrix which enables the transformation error to be minimum as an initial transformation matrix;
s66, applying the initial transformation matrix to the source point cloud P to obtain a new source point cloud P';
S67, for each point in the new source point cloud P', find the point with the smallest Euclidean distance in the target point cloud Q as its corresponding point, then calculate the transformation matrix and the corresponding error E(R, T):
E(R, T) = (1/n) · Σ_{i=1..n} || q_i - (R·p_i + T) ||^2      (9)
where E(R, T) is the error between the new source point cloud P' and the target point cloud Q under the transformation matrix (R, T), p_i and q_i are the coordinates of corresponding points in the source point cloud P' and the target point cloud Q respectively, and n is the number of corresponding point pairs;
S68, apply the transformation matrix obtained in step S67 to the source point cloud P' to obtain an updated source point cloud, and calculate its error E(R, T) with respect to the target point cloud Q;
S69, repeat steps S67 and S68 until E(R, T) or the iteration count meets the set condition; finally solve the rotation-and-translation matrix between the two point clouds and decompose it into a three-dimensional coordinate and a three-dimensional rotation vector, i.e., the pose of the target.
Application CN202110676959.7A, priority date 2021-06-18, filing date 2021-06-18: Target pose estimation method combining deep learning and contour point cloud reconstruction. Status: Active; granted as CN113393524B.

Priority Applications (1)

CN202110676959.7A (priority date 2021-06-18, filing date 2021-06-18): Target pose estimation method combining deep learning and contour point cloud reconstruction; granted as CN113393524B.

Publications (2)

CN113393524A (application publication): 2021-09-14
CN113393524B (granted publication): 2023-09-26

Family

ID=77621867

Family Applications (1)

CN202110676959.7A (priority/filing date 2021-06-18): Target pose estimation method combining deep learning and contour point cloud reconstruction; Active, granted as CN113393524B.

Country Status (1)

CN: CN113393524B

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190026919A1 (en) * 2016-01-20 2019-01-24 Canon Kabushiki Kaisha Information processing apparatus, information processing method, and program
US20170212516A1 (en) * 2016-01-27 2017-07-27 National Institute Of Advanced Industrial Science And Technology Position control system and position control method for an unmanned surface vehicle
CN106909877A (en) * 2016-12-13 2017-06-30 浙江大学 A kind of vision based on dotted line comprehensive characteristics builds figure and localization method simultaneously
WO2019005999A1 (en) * 2017-06-28 2019-01-03 Magic Leap, Inc. Method and system for performing simultaneous localization and mapping using convolutional image transformation
US20190102601A1 (en) * 2017-09-21 2019-04-04 Lexset.Ai Llc Detecting one or more objects in an image, or sequence of images, and determining a category and one or more descriptors for each of the one or more objects, generating synthetic training data, and training a neural network with the synthetic training data
US20200218929A1 (en) * 2017-09-22 2020-07-09 Huawei Technologies Co., Ltd. Visual slam method and apparatus based on point and line features
CN108305277A (en) * 2017-12-26 2018-07-20 中国航天电子技术研究院 A kind of heterologous image matching method based on straightway
CN108133458A (en) * 2018-01-17 2018-06-08 视缘(上海)智能科技有限公司 A kind of method for automatically split-jointing based on target object spatial point cloud feature
CN109035200A (en) * 2018-06-21 2018-12-18 北京工业大学 A kind of bolt positioning and position and posture detection method based on the collaboration of single binocular vision
CN109934862A (en) * 2019-02-22 2019-06-25 上海大学 A kind of binocular vision SLAM method that dotted line feature combines
CN111768449A (en) * 2019-03-30 2020-10-13 北京伟景智能科技有限公司 Object grabbing method combining binocular vision with deep learning
CN110782494A (en) * 2019-10-16 2020-02-11 北京工业大学 Visual SLAM method based on point-line fusion
CN110825743A (en) * 2019-10-31 2020-02-21 北京百度网讯科技有限公司 Data importing method and device of graph database, electronic equipment and medium
CN111462210A (en) * 2020-03-31 2020-07-28 华南理工大学 Monocular line feature map construction method based on epipolar constraint
CN112967217A (en) * 2021-03-11 2021-06-15 大连理工大学 Image splicing method based on linear feature matching and constraint

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANWEI REN: "An improved binocular LSD_SLAM method for object localization", 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, page 30 *
RONG SHEN: "Research on binocular vision SLAM based on integrated point-line features", China Master's Theses Full-text Database, Information Science and Technology, no. 01, pages 138-1565 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091706A (en) * 2023-04-07 2023-05-09 山东建筑大学 Three-dimensional reconstruction method for multi-mode remote sensing image deep learning matching
CN117237451A (en) * 2023-09-15 2023-12-15 南京航空航天大学 Industrial part 6D pose estimation method based on contour reconstruction and geometric guidance
CN117237451B (en) * 2023-09-15 2024-04-02 南京航空航天大学 Industrial part 6D pose estimation method based on contour reconstruction and geometric guidance

Also Published As

CN113393524B: 2023-09-26


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant