CN111311679A - Free floating target pose estimation method based on depth camera - Google Patents

Free floating target pose estimation method based on depth camera

Info

Publication number
CN111311679A
Authority
CN
China
Prior art keywords
pose
target
point cloud
template
final
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010077687.4A
Other languages
Chinese (zh)
Other versions
CN111311679B (en)
Inventor
肖晓晖
赵尚宇
张勇
汤自林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202010077687.4A priority Critical patent/CN111311679B/en
Publication of CN111311679A publication Critical patent/CN111311679A/en
Application granted granted Critical
Publication of CN111311679B publication Critical patent/CN111311679B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration by the use of local operators
    • G06T5/30Erosion or dilatation, e.g. thinning
    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Abstract

The invention relates to the field of visual positioning, and discloses a free floating target pose estimation method based on a depth camera, which comprises the steps of: obtaining the RGB image, depth image and point cloud information of a free floating target; importing a three-dimensional model of the target into OpenGL and selecting suitable viewpoints to acquire templates; detecting the target with the acquired templates, the RGB image and the depth image, and calculating the similarity ε_s to obtain a set of matching templates; estimating the initial pose of the target from the training pose information (R, t) contained in the matching templates to obtain the initial attitude φ_init and the initial position p_init; and, after the point cloud is preprocessed, correcting the initial pose with an ICP algorithm to obtain the final pose result (φ_final, p_final), thereby completing the pose estimation of the free floating target.

Description

Free floating target pose estimation method based on depth camera
Technical Field
The invention relates to a free floating target pose estimation method based on a depth camera, and belongs to the technical field of visual positioning.
Background
With the continuous deepening of space exploration, the number of in-orbit spacecraft keeps growing, and faulty satellites and orbital debris accumulate continuously. Using space robots to capture faulty satellites, service satellites and remove orbital debris has become a popular research direction in recent years. Most such targets are non-cooperative and in a free-floating state. For a space robot to capture the target correctly, the pose of the target capture point must be estimated accurately. The existing mainstream pose estimation methods are based either on RGB images or on depth images: the RGB-image-based methods extract feature points in a color image and compute the pose, while the depth-image-based methods extract feature descriptors in a point cloud and compute and match the pose. Neither approach is very robust, and both place high demands on the environment. The method provided by the invention combines the RGB image, the depth image and the point cloud information, improving accuracy and computational efficiency while reducing the requirements on the environment.
Disclosure of Invention
The invention provides a free floating target pose estimation method based on a depth camera, which combines RGB (red, green and blue) images, depth images and point cloud information, improves the accuracy and the calculation efficiency and reduces the requirement on the environment.
The invention can be realized by the following technical scheme:
a free floating target pose estimation method based on a depth camera is characterized by comprising the following steps:
step 1, target template acquisition, namely importing a three-dimensional model of a target into OpenGL, selecting a viewpoint and acquiring a template;
step 2, detecting the target, and detecting the target by utilizing the acquired template, the RGB image and the depth image and a template matching-based method to obtain a group of matching templates;
step 3, calculating the initial pose of the target, namely calculating the initial attitude φ_init and the initial position p_init of the target using the training pose information (R, t) contained in the matching templates from the previous step;
Step 4, correcting the pose of the target: after point cloud preprocessing, the pose is corrected based on an ICP (Iterative Closest Point) algorithm to obtain the final pose result (φ_final, p_final).
In the above free floating target pose estimation method based on the depth camera, the template acquisition process includes two processes of viewpoint sampling and redundancy reduction; the specific operation process of viewpoint sampling is as follows: after a three-dimensional model of a target is imported into OpenGL, an upper hemisphere with 162 vertexes is adopted, and the sampling angle step length of an azimuth angle is 15 degrees; the radius of the enclosing sphere is varied in steps of 10 cm; meanwhile, in order to acquire the template rotating in the plane, the camera rotates around the Z axis of the camera when acquiring each viewpoint, and the step length of the rotation angle is set to be 10 degrees.
In the above method for estimating the pose of a free-floating target based on a depth camera, in the detection of the target a template matching based method is adopted, taking the gradient direction in the RGB image and the normal vector direction in the depth image as features, and a template is defined as T = ({O_m}_{m∈M}, P);
wherein O is a template feature representing a gradient direction or a normal vector direction; m is a modality, representing either the RGB image or the depth image; P is a set of pairs (r, m), r being the location of the feature in the template image;
the similarity between each template and the input image I at location c is calculated in a sliding-window manner:
ε_s(I, T, c) = Σ_{(r,m)∈P} max_{t∈W(c+r)} | cos( ori(O_m, r) − ori(I_m, t) ) |
wherein W denotes a window area centered at c + r, and t denotes a position in that window; when the template similarity ε_s is above the threshold τ_s, the template is matched.
In the method for estimating the pose of the free-floating target based on the depth camera, in the calculation of the initial pose of the target, because the detection result of the previous step comprises a group of templates, each template comprises training pose information (R, t), and the training pose information is used for calculating the initial pose of the target; the calculation of the attitude is performed first, and then the calculation of the position is performed.
In the above free floating target pose estimation method based on the depth camera, in the attitude calculation, outliers in the detected templates are eliminated based on the channel chromaticity of the RGB image to obtain the initial attitude;
each detected template contains a rough estimate of the target pose; based on this pose estimate, the pixels located on the object projection are considered and the number of them that have the expected color is calculated; if the difference of each channel grey value between a projected pixel and the corresponding image pixel is less than a specified threshold, that pixel is judged to have the expected color;
if the percentage of pixels with the expected color is less than seventy percent, the detection result is determined to be invalid and the template is rejected; during this operation, pixels at the projection boundary are removed by eroding the projection beforehand;
after the outliers are eliminated, the remaining templates are considered to provide sufficient confidence in the pose of the detection result, and their attitudes are averaged to obtain the initial attitude φ_init of the detection result.
In the above free-floating target pose estimation method based on the depth camera, in the position calculation, the three-dimensional model of the object is first rendered based on the initial attitude φ_init and the training distance d to obtain the model point cloud and the mask image of the object at that viewing angle, and the object position is recorded as p_render = [0, 0, d]; the mask image is then projected into the scene depth image, and the region of interest corresponding to the mask is segmented; the region of interest is then converted, using the camera intrinsic parameters, into a three-dimensional point cloud called the scene point cloud;
the translation vector t from the model point cloud to the scene point cloud is then calculated: the geometric centers of the model point cloud and of the scene point cloud are computed; the points closest to these two centers are then searched for in the model point cloud and the scene point cloud respectively and taken as a pair of scene-model corresponding points; finally, the translation vector t is obtained by subtracting the two points;
the initial position of the object is then: p_init = p_render + t.
In the above free floating target pose estimation method based on the depth camera, the point cloud is registered using the ICP (Iterative Closest Point) method to realize pose correction and obtain the final pose (φ_final, p_final); a given three-dimensional reference point set S = {s_1, s_2, ..., s_n} is taken as the target of the point cloud registration algorithm; a given three-dimensional point set M = {m_1, m_2, ..., m_n} is continuously rigidly transformed during the iteration so as to gradually approach S; the objective function is:
E(R, t) = Σ_{i=1}^{n} || s_i − (R·m_i + t) ||^2
wherein R denotes the rotation transformation matrix and t denotes the translation vector; the objective function expresses finding a rigid transformation such that the sum of squared position errors between the transformed M and S is minimal; the rotation matrix R and the translation vector t obtained after the ICP pose correction are then used to update the initial pose (φ_init, p_init) of the detection result and obtain the final pose (φ_final, p_final); the method specifically comprises two processes, point cloud preprocessing and point cloud registration; the point cloud preprocessing comprises three steps: denoising, point cloud smoothing and point cloud downsampling:
the denoising removes discrete isolated points by Gaussian filtering; the point cloud smoothing is performed using the moving least squares method (MLS); the downsampling uses a voxel grid to downsample the point cloud;
the point cloud registration registers the preprocessed point cloud with the ICP algorithm to realize pose correction; in the ICP algorithm, a given three-dimensional reference point set S = {s_1, s_2, ..., s_n} is taken as the target of the point cloud registration algorithm; a given three-dimensional point set M = {m_1, m_2, ..., m_n} is continuously rigidly transformed during the iteration so as to gradually approach S; the objective function is:
E(R, t) = Σ_{i=1}^{n} || s_i − (R·m_i + t) ||^2
wherein R denotes the rotation transformation matrix and t denotes the translation vector; the objective function expresses finding a rigid transformation such that the sum of squared position errors between the transformed M and S is minimal; the rotation matrix R and the translation vector t obtained after the ICP pose correction are then used to update the initial pose (φ_init, p_init) of the detection result and obtain the final pose (φ_final, p_final).
In the above free-floating target pose estimation method based on the depth camera, the point cloud registration specifically includes:
(1) initializing the point set M;
(2) for each point m_i in the point set M, searching for the closest point s_j in S and constructing a set of corresponding points (m_i, s_j);
(3) using a distance threshold d_t for mismatch rejection, i.e. if the distance between m_i and s_j is greater than d_t, deleting that pair of corresponding points;
(4) obtaining the corresponding rotation matrix R and translation vector t by minimizing the objective function; the obtained R and t are used to update M; the iteration stops when the error E(R, t) is smaller than a set threshold or the number of iterations exceeds a set limit, otherwise the procedure returns to step (2) to find the closest points again.
The beneficial technical effects of the invention are as follows:
the template collection is carried out by utilizing the three-dimensional model of the target, the characteristic redundancy of the collected template is reduced after the template collection, the operation speed is improved on the premise of ensuring the identification precision, and the template collection method has high efficiency and timeliness. Then combining the RGB image and the depth image to obtain a group of templates based on a template matching method; when the gradient feature is utilized, the size of the gradient feature is ignored and only the size of the gradient feature is used, so that the method has strong applicability to the space scene background. And then, screening the template according to the intensity characteristics of each channel of the RGB image, and further preliminarily calculating the pose of the target. And finally, the pose is corrected by combining an ICP method to obtain a final pose recognition result, so that the recognition accuracy is improved. The invention is simple and reliable, convenient to operate, easy to realize and convenient to popularize and apply.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Fig. 2 is a schematic diagram of free floating target capture.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
As shown in fig. 2, which is a schematic diagram of capturing a free-floating target, a propeller with high recognizability is selected as the capture point of the target, and a depth camera mounted on the base of the mechanical arm acquires the RGB image, depth image and point cloud information of the target.
As shown in fig. 1, the specific steps are as follows:
step one, target template acquisition: since one template can only represent the observation of an object under a single viewing angle, in order to realize the three-dimensional object detection under any pose, template images must be acquired from multiple viewing angles and at different distances. The template collection comprises the following two steps:
step I, viewpoint sampling: after the three-dimensional model of the target is imported into OpenGL, an upper hemisphere with 162 vertexes is adopted, and the sampling angle step length of the azimuth angle is 15 degrees. The radius of the enclosing sphere is varied in steps of 10 cm. Meanwhile, in order to acquire the template rotating in the plane, the camera rotates around the Z axis of the camera when acquiring each viewpoint, and the step length of the rotation angle is set to be 10 degrees.
Step II, feature redundancy reduction: to speed up the computation, only a part of the collected template features is used. For the color gradient features, only the dominant color gradient features on the target contour are retained, because the target surface is not necessarily textured and the available three-dimensional model of the target does not carry detailed texture. For the surface normal features, they are selected inside the object contour, because the surface normals on the projected object boundary usually cannot be estimated reliably, or cannot be estimated at all.
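A minimal sketch of this selection, assuming a binary object mask and a gradient-magnitude map are already available for the rendered template (OpenCV assumed available); the feature counts, erosion width and stride are illustrative values rather than values taken from the patent.

```python
import numpy as np
import cv2

def reduce_template_features(grad_mag, mask, k_grad=63, erode_px=3, normal_stride=4):
    """Keep only the strongest gradient features on the contour and sparse
    normal features inside the eroded silhouette (illustrative only)."""
    mask = (mask > 0).astype(np.uint8)
    interior = cv2.erode(mask, np.ones((erode_px, erode_px), np.uint8))
    contour = mask - interior                      # thin band along the silhouette

    # color-gradient features: top-k gradient magnitudes on the contour
    ys, xs = np.nonzero(contour)
    order = np.argsort(grad_mag[ys, xs])[::-1][:k_grad]
    grad_feats = list(zip(ys[order], xs[order]))

    # surface-normal features: sparse grid strictly inside the object
    ys, xs = np.nonzero(interior)
    keep = (ys % normal_stride == 0) & (xs % normal_stride == 0)
    normal_feats = list(zip(ys[keep], xs[keep]))
    return grad_feats, normal_feats
```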
Step two, target detection: a template matching based method is adopted, taking the gradient direction in the RGB image and the normal vector direction in the depth image as features, and a template is defined as T = ({O_m}_{m∈M}, P),
wherein O is a template feature representing a gradient direction or a normal vector direction; m is a modality, representing either the RGB image or the depth image; P is a set of pairs (r, m), and r is the location of the feature in the template image.
The acquisition of the target image is obtained by a binocular depth camera as shown in fig. 2, which is fixed to the base of the robotic arm.
The similarity between each template and the input image I at location c is calculated in a sliding-window manner:
ε_s(I, T, c) = Σ_{(r,m)∈P} max_{t∈W(c+r)} | cos( ori(O_m, r) − ori(I_m, t) ) |
wherein W denotes a window area centered at c + r, and t denotes a position in that window. When the template similarity ε_s is above the threshold τ_s, the template is matched.
As can be seen from the above formula, when the similarity of the gradient features is calculated, only the absolute value of the cosine of the orientation difference is used and the gradient magnitude is not involved, so the black background has no influence on the detection accuracy of the method, and the method has strong applicability to space scenes.
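The following sketch (not part of the original disclosure) evaluates this score at one image location for one modality, assuming ori_image holds per-pixel orientations in radians; the window half-width and the feature layout are assumptions made for the example.

```python
import numpy as np

def template_similarity(ori_image, template_feats, c, win=2):
    """epsilon_s at location c: sum over template features of the best
    |cos(orientation difference)| inside a (2*win+1)^2 search window around c + r."""
    h, w = ori_image.shape
    score = 0.0
    for (r_y, r_x), feat_ori in template_feats:    # (offset r in the template, feature orientation)
        cy, cx = c[0] + r_y, c[1] + r_x
        y0, y1 = max(cy - win, 0), min(cy + win + 1, h)
        x0, x1 = max(cx - win, 0), min(cx + win + 1, w)
        if y0 >= y1 or x0 >= x1:
            continue
        window = ori_image[y0:y1, x0:x1]            # orientations inside the search window
        score += np.max(np.abs(np.cos(window - feat_ori)))
    return score
```

A template is then accepted at location c wherever this score exceeds the threshold τ_s.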
(1) Calculation of the gradient direction: the gradients are computed separately in the three RGB channels of the image, and only the gradient with the largest modulus is kept. For the RGB image I, the gradient direction I_g(x) at pixel x is
I_g(x) = ori(Ĉ(x)),  with  Ĉ(x) = argmax_{C∈{R,G,B}} || ∂C/∂x ||
where ori(·) denotes the orientation of the gradient of channel Ĉ at x.
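A hedged OpenCV/NumPy sketch of this per-channel computation; Sobel filtering and the 3×3 kernel size are choices made for the example rather than details stated in the patent.

```python
import numpy as np
import cv2

def dominant_gradient_orientation(rgb):
    """I_g(x): orientation of the RGB channel whose gradient magnitude is largest."""
    gx = np.stack([cv2.Sobel(rgb[..., c], cv2.CV_32F, 1, 0, ksize=3) for c in range(3)], axis=-1)
    gy = np.stack([cv2.Sobel(rgb[..., c], cv2.CV_32F, 0, 1, ksize=3) for c in range(3)], axis=-1)
    mag = np.hypot(gx, gy)                          # per-channel gradient magnitude
    best = np.argmax(mag, axis=-1)                  # channel with the largest modulus at each pixel
    rows, cols = np.indices(best.shape)
    return np.arctan2(gy[rows, cols, best], gx[rows, cols, best])   # gradient direction map
```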
(2) Calculation of the normal vector: a first-order Taylor expansion of the depth function D(x) in the depth image gives:
D(x + dx) − D(x) = dx^T ∇D + h.o.t.
Within a neighborhood around x every offset dx satisfies the above formula, and the optimal depth gradient ∇D is obtained by the least squares method. Based on this estimate, a plane through the three points X, X_1 and X_2 can be constructed:
X = v(x) D(x)
X_1 = v(x + [1, 0]^T) (D(x) + [1, 0]·∇D)
X_2 = v(x + [0, 1]^T) (D(x) + [0, 1]·∇D)
where v(x) is the line-of-sight vector of x calculated from the camera intrinsic parameters. The cross product of the vectors X_1 − X and X_2 − X is then normalized to obtain the normal vector at pixel x.
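A sketch of this estimate for a single pixel, assuming a dense depth image, known pinhole intrinsics and a pixel away from the image border; the neighborhood half-width is an illustrative choice.

```python
import numpy as np

def pixel_normal(depth, x, y, K, half=2):
    """Surface normal at pixel (x, y): least-squares fit of the depth gradient
    over the neighborhood, then normalize((X1 - X) x (X2 - X)) with X = v(x) D(x)."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]

    def ray(u, v):                                  # line-of-sight vector from the intrinsics
        return np.array([(u - cx) / fx, (v - cy) / fy, 1.0])

    # least-squares depth gradient over the (2*half+1)^2 neighborhood
    A, b = [], []
    for dv in range(-half, half + 1):
        for du in range(-half, half + 1):
            if du == 0 and dv == 0:
                continue
            A.append([du, dv])
            b.append(depth[y + dv, x + du] - depth[y, x])
    g, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)

    X  = ray(x, y) * depth[y, x]
    X1 = ray(x + 1, y) * (depth[y, x] + g[0])       # one-pixel step along u
    X2 = ray(x, y + 1) * (depth[y, x] + g[1])       # one-pixel step along v
    n = np.cross(X1 - X, X2 - X)
    return n / np.linalg.norm(n)
```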
Step three, calculating the initial pose of the target: since the detection result of the previous step comprises a set of templates, and each template contains training pose information (R, t), these training poses are used to calculate the initial pose of the target. This comprises two steps: calculation of the attitude and calculation of the position.
Step I, attitude calculation: outliers in the detected templates are eliminated based on the grey value of each channel of the RGB image to obtain the initial attitude.
Each detected template contains a rough estimate of the pose of the object. Based on this pose estimate, the pixels located on the object projection are considered and the number of them that have the expected color is calculated. If the difference of each channel grey value between a projected pixel and the corresponding image pixel is less than a prescribed threshold, that pixel is judged to have the expected color.
If the percentage of pixels with the expected color is less than seventy percent, the detection result is determined to be invalid and the template is rejected. During this operation, pixels at the projection boundary are removed by eroding the projection beforehand.
The black and white components of the captured object are mapped to the channel grey values of similar colors: black is mapped to blue and white to yellow. Before the difference of the grey values is calculated, the corresponding saturation and value components are checked: if the value component is below the threshold t_v, the hue is set to blue; if the value component is greater than t_v and the saturation component is less than the threshold t_s, the hue is set to yellow.
After the outliers are eliminated, the remaining templates are considered to provide sufficient confidence in the pose of the detection result, and their attitudes are averaged to obtain the initial attitude φ_init of the detection result.
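An illustrative sketch of the color-consistency check and of one possible way to average the surviving attitudes (a chordal mean projected back onto SO(3)); the patent only states that the attitudes are averaged, so the averaging method, the thresholds and the function names here are assumptions.

```python
import numpy as np
import cv2

def color_consistent(render_rgb, render_mask, image_rgb, diff_thresh=40, min_ratio=0.7, erode_px=3):
    """Keep a detection only if at least 70% of its projected pixels have the expected color."""
    mask = cv2.erode((render_mask > 0).astype(np.uint8), np.ones((erode_px, erode_px), np.uint8))
    ys, xs = np.nonzero(mask)                      # projected object pixels, boundary eroded away
    if len(ys) == 0:
        return False
    diff = np.abs(render_rgb[ys, xs].astype(int) - image_rgb[ys, xs].astype(int))
    ok = np.all(diff < diff_thresh, axis=1)        # per-channel grey-value difference test
    return ok.mean() >= min_ratio

def average_attitude(rotations):
    """Chordal mean of the surviving template rotations (one way to average attitudes)."""
    M = np.sum(np.stack(rotations), axis=0)
    U, _, Vt = np.linalg.svd(M)
    R_avg = U @ Vt
    if np.linalg.det(R_avg) < 0:                   # keep a proper rotation
        U[:, -1] *= -1
        R_avg = U @ Vt
    return R_avg
```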
Step II, position calculation: the three-dimensional model of the object is first rendered based on the initial attitude φ_init and the training distance d to obtain the model point cloud and the mask image of the object at that viewing angle, and the object position is recorded as p_render = [0, 0, d]. The mask image is then projected into the scene depth image, and the region of interest corresponding to the mask is segmented. The region of interest is then converted into a three-dimensional point cloud, called the scene point cloud, using the camera intrinsic parameters.
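A minimal sketch of the back-projection of the masked region of interest into the scene point cloud, assuming a pinhole model with known intrinsic matrix K; the depth-scale handling is an assumption for the example.

```python
import numpy as np

def roi_to_point_cloud(depth, mask, K, depth_scale=1.0):
    """Back-project the masked region of a depth image into a scene point cloud.

    depth: HxW depth image; mask: HxW boolean region of interest;
    K: 3x3 camera intrinsic matrix (assumed known from calibration)."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    vs, us = np.nonzero(mask)                      # pixel coordinates inside the mask
    z = depth[vs, us].astype(float) * depth_scale
    valid = z > 0                                  # skip missing depth readings
    us, vs, z = us[valid], vs[valid], z[valid]
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    return np.column_stack([x, y, z])              # N x 3 scene point cloud
```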
The deviation introduced by taking the center of the scene point cloud as the result is compensated by calculating the translation vector t from the model point cloud to the scene point cloud. The geometric centers of the model point cloud and of the scene point cloud are computed; the points closest to these two centers are then searched for in the model point cloud and the scene point cloud respectively and taken as a pair of scene-model corresponding points; finally, the two points are subtracted to obtain the translation vector t.
The initial position of the object is then: p_init = p_render + t.
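A sketch of this translation step under the same notation; the inputs are the rendered model point cloud, the segmented scene point cloud and p_render, and the function name is illustrative.

```python
import numpy as np

def initial_position(model_pts, scene_pts, p_render):
    """t = (scene point closest to the scene centroid) - (model point closest to the model centroid);
    the initial position is p_render + t."""
    m_bar = model_pts.mean(axis=0)
    s_bar = scene_pts.mean(axis=0)
    m_star = model_pts[np.argmin(np.linalg.norm(model_pts - m_bar, axis=1))]
    s_star = scene_pts[np.argmin(np.linalg.norm(scene_pts - s_bar, axis=1))]
    t = s_star - m_star                            # scene-model corresponding points
    return np.asarray(p_render) + t
```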
Step four, pose correction of the target: the pose is corrected using the ICP algorithm, which comprises the following two steps:
and step I, point cloud preprocessing is carried out, and the point cloud preprocessing comprises three processes of point cloud denoising, point cloud smoothing and point cloud downsampling.
(1) Denoising: and removing discrete isolated points in the point cloud by adopting Gaussian filtering.
(2) Smoothing: point cloud smoothing is performed using a moving least squares Method (MLS).
(3) Down-sampling: the point cloud is downsampled using a voxel grid.
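An illustrative preprocessing sketch using Open3D (assumed available; PCL would be the usual C++ choice). Statistical outlier removal stands in for the Gaussian filter described above, and MLS smoothing, normally done with PCL's MovingLeastSquares, is omitted; the parameter values are placeholders.

```python
import numpy as np
import open3d as o3d   # assumed available for the example

def preprocess_point_cloud(points_xyz, voxel=0.005):
    """Denoise and downsample a raw scene point cloud (sketch only)."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(np.asarray(points_xyz, dtype=float))
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)  # drop isolated points
    pcd = pcd.voxel_down_sample(voxel_size=voxel)                            # voxel-grid downsampling
    return np.asarray(pcd.points)
```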
Step II, the preprocessed point cloud is registered using the ICP (Iterative Closest Point) algorithm to realize pose correction. In the ICP algorithm, a given three-dimensional reference point set S = {s_1, s_2, ..., s_n} is taken as the target of the point cloud registration algorithm; a given three-dimensional point set M = {m_1, m_2, ..., m_n} is continuously rigidly transformed during the iteration, gradually approaching the point set S. The objective function is:
E(R, t) = Σ_{i=1}^{n} || s_i − (R·m_i + t) ||^2
where R denotes the rotation transformation matrix and t denotes the translation vector. The objective function expresses finding a rigid transformation such that the sum of squared position errors between the transformed point set M and the point set S is minimal; the registration comprises the following processes:
(1) initialization: the set of points M must be in a position close to the set of points S.
(2) Finding the closest points: the solution of the rigid transformation matrix is built on the correspondence between the point set S and the point set M. ICP determines the correspondence by the shortest distance, i.e. for each point m_i in the point set M, the closest point s_j in S is searched for, and a set of corresponding points (m_i, s_j) is constructed.
(3) Removing mismatched pairs: in general, part of the point set S is not covered by M, so some of the corresponding point pairs (m_i, s_j) obtained in the previous step are false matches. A distance threshold d_t is used for the mismatch judgment: if the distance between m_i and s_j is greater than d_t, that pair of corresponding points is excluded.
(4) Solving the optimal rigid transformation: the corresponding rotation matrix R and translation vector t are solved from the objective function. The obtained R and t are used to update the point set M; the iteration stops when the error E(R, t) is smaller than a set threshold or the number of iterations exceeds a set limit, otherwise the procedure returns to step (2) to find the closest points again.
Finally, the rotation matrix R and the translation vector t are used to update the initial pose (φ_init, p_init) of the detection result and obtain the final pose (φ_final, p_final).
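A self-contained NumPy/SciPy sketch of the loop described above: nearest-neighbour correspondences through a KD-tree, rejection of pairs farther than d_t, a closed-form (R, t) by SVD, and iteration until the error or iteration limit is reached. The default threshold, the iteration limit and the final composition onto (φ_init, p_init) are assumptions about conventions, not values given in the patent.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(model_pts, scene_pts, d_t=0.02, max_iter=50, tol=1e-6):
    """Point-to-point ICP: match, reject by distance threshold d_t,
    solve the rigid transform in closed form, repeat."""
    M = np.asarray(model_pts, float).copy()
    S = np.asarray(scene_pts, float)
    tree = cKDTree(S)
    R_total, t_total = np.eye(3), np.zeros(3)

    for _ in range(max_iter):
        dist, idx = tree.query(M)                  # closest scene point for every model point
        keep = dist < d_t                          # remove mismatched pairs
        if keep.sum() < 3:
            break
        P, Q = M[keep], S[idx[keep]]

        # closed-form rigid transform (SVD) minimizing sum ||Q - (R P + t)||^2
        p_bar, q_bar = P.mean(axis=0), Q.mean(axis=0)
        H = (P - p_bar).T @ (Q - q_bar)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        t = q_bar - R @ p_bar

        M = (R @ M.T).T + t                        # update the moving point set
        R_total, t_total = R @ R_total, R @ t_total + t
        if np.mean(dist[keep] ** 2) < tol:         # residual error small enough
            break
    return R_total, t_total

# one possible composition onto the initial estimate, assuming camera-frame poses:
# phi_final = R_total @ phi_init;  p_final = R_total @ p_init + t_total
```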
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (8)

1. A free floating target pose estimation method based on a depth camera is characterized by comprising the following steps:
step 1, target template acquisition, namely importing a three-dimensional model of a target into OpenGL, selecting a viewpoint and acquiring a template;
step 2, detecting the target, and detecting the target by utilizing the acquired template, the RGB image and the depth image and a template matching-based method to obtain a group of matching templates;
step 3, calculating the initial pose of the target, namely calculating the initial attitude φ_init and the initial position p_init of the target using the training pose information (R, t) contained in the matching templates from the previous step;
Step 4, correcting the pose of the target: after point cloud preprocessing, the pose is corrected based on an ICP (Iterative Closest Point) algorithm to obtain the final pose result (φ_final, p_final).
2. The free-floating target pose estimation method based on the depth camera as claimed in claim 1, wherein the template acquisition process comprises two processes of viewpoint sampling and redundancy reduction; the specific operation process of viewpoint sampling is as follows: after the three-dimensional model of the target is imported into OpenGL, an upper hemisphere with 162 vertices is adopted, and the sampling angle step of the azimuth angle is 15 degrees; the radius of the enclosing sphere is varied in steps of 10 cm; meanwhile, in order to acquire templates rotated in the image plane, the camera rotates around its own Z axis at each viewpoint, and the rotation angle step is set to 10 degrees.
3. The method for estimating the pose of a free-floating target based on a depth camera as claimed in claim 1, wherein in the detection of the target a template matching based method is adopted, taking the gradient direction in the RGB image and the normal vector direction in the depth image as features, and a template is defined as T = ({O_m}_{m∈M}, P);
wherein O is a template feature representing a gradient direction or a normal vector direction; m is a modality, representing either the RGB image or the depth image; P is a set of pairs (r, m), r being the location of the feature in the template image;
the similarity between each template and the input image I at location c is calculated in a sliding-window manner:
ε_s(I, T, c) = Σ_{(r,m)∈P} max_{t∈W(c+r)} | cos( ori(O_m, r) − ori(I_m, t) ) |
wherein W denotes a window area centered at c + r, and t denotes a position in that window; when the template similarity ε_s is above the threshold τ_s, the template is matched.
4. The method for estimating the pose of a free-floating target based on a depth camera according to claim 1, wherein in the calculation of the initial pose of the target, since the detection result of the previous step comprises a set of templates, each template comprises a training pose information (R, t), the training pose information is used for calculating the initial pose of the target; the calculation of the attitude is performed first, and then the calculation of the position is performed.
5. The free-floating target pose estimation method based on the depth camera as claimed in claim 1, wherein in the attitude calculation, outliers in the detected templates are eliminated based on the channel chromaticity of the RGB image to obtain the initial attitude;
each detected template contains a rough estimate of the target pose; based on this pose estimate, the pixels located on the object projection are considered and the number of them that have the expected color is calculated; if the difference of each channel grey value between a projected pixel and the corresponding image pixel is less than a specified threshold, that pixel is judged to have the expected color;
if the percentage of pixels with the expected color is less than seventy percent, the detection result is determined to be invalid and the template is rejected; during this operation, pixels at the projection boundary are removed by eroding the projection beforehand;
after the outliers are eliminated, the remaining templates are considered to provide sufficient confidence in the pose of the detection result, and their attitudes are averaged to obtain the initial attitude φ_init of the detection result.
6. The method for estimating the pose of a free-floating target based on a depth camera as claimed in claim 1, wherein in the position calculation, the three-dimensional model of the object is first rendered based on the initial attitude φ_init and the training distance d to obtain the model point cloud and the mask image of the object at that viewing angle, and the object position is recorded as p_render = [0, 0, d]; the mask image is then projected into the scene depth image and the region of interest corresponding to the mask is segmented; the region of interest is then converted, using the camera intrinsic parameters, into a three-dimensional point cloud called the scene point cloud;
the translation vector t from the model point cloud to the scene point cloud is calculated: the geometric centers of the model point cloud and of the scene point cloud are computed; the points closest to these two centers are then searched for in the model point cloud and the scene point cloud respectively and taken as a pair of scene-model corresponding points; finally, the translation vector t is obtained by subtracting the two points;
the initial position of the object is then: p_init = p_render + t.
7. The method for estimating the pose of a free-floating target based on a depth camera as claimed in claim 1, wherein the point cloud is registered using the ICP (Iterative Closest Point) method to realize pose correction and obtain the final pose (φ_final, p_final); a given three-dimensional reference point set S = {s_1, s_2, ..., s_n} is taken as the target of the point cloud registration algorithm; a given three-dimensional point set M = {m_1, m_2, ..., m_n} is continuously rigidly transformed during the iteration so as to gradually approach S; the objective function is:
E(R, t) = Σ_{i=1}^{n} || s_i − (R·m_i + t) ||^2
wherein R denotes the rotation transformation matrix and t denotes the translation vector; the objective function expresses finding a rigid transformation such that the sum of squared position errors between the transformed M and S is minimal; the rotation matrix R and the translation vector t obtained after the ICP pose correction are then used to update the initial pose (φ_init, p_init) of the detection result and obtain the final pose (φ_final, p_final); the method specifically comprises two processes, point cloud preprocessing and point cloud registration; the point cloud preprocessing comprises three steps: denoising, point cloud smoothing and point cloud downsampling:
the denoising removes discrete isolated points by Gaussian filtering; the point cloud smoothing is performed using the moving least squares method (MLS); the downsampling uses a voxel grid to downsample the point cloud;
the point cloud registration registers the preprocessed point cloud with the ICP algorithm to realize pose correction; in the ICP algorithm, a given three-dimensional reference point set S = {s_1, s_2, ..., s_n} is taken as the target of the point cloud registration algorithm; a given three-dimensional point set M = {m_1, m_2, ..., m_n} is continuously rigidly transformed during the iteration so as to gradually approach S; the objective function is:
E(R, t) = Σ_{i=1}^{n} || s_i − (R·m_i + t) ||^2
wherein R denotes the rotation transformation matrix and t denotes the translation vector; the objective function expresses finding a rigid transformation such that the sum of squared position errors between the transformed M and S is minimal; the rotation matrix R and the translation vector t obtained after the ICP pose correction are then used to update the initial pose (φ_init, p_init) of the detection result and obtain the final pose (φ_final, p_final).
8. The depth camera-based free-floating target pose estimation method of claim 7, wherein point cloud registration specifically comprises:
(1) initializing the point set M;
(2) for each point m_i in the point set M, searching for the closest point s_j in S and constructing a set of corresponding points (m_i, s_j);
(3) using a distance threshold d_t for mismatch rejection, i.e. if the distance between m_i and s_j is greater than d_t, deleting that pair of corresponding points;
(4) obtaining the corresponding rotation matrix R and translation vector t by minimizing the objective function; the obtained R and t are used to update M; the iteration stops when the error E(R, t) is smaller than a set threshold or the number of iterations exceeds a set limit, otherwise the procedure returns to step (2) to find the closest points again.
CN202010077687.4A 2020-01-31 2020-01-31 Free floating target pose estimation method based on depth camera Active CN111311679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010077687.4A CN111311679B (en) 2020-01-31 2020-01-31 Free floating target pose estimation method based on depth camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010077687.4A CN111311679B (en) 2020-01-31 2020-01-31 Free floating target pose estimation method based on depth camera

Publications (2)

Publication Number Publication Date
CN111311679A true CN111311679A (en) 2020-06-19
CN111311679B CN111311679B (en) 2022-04-01

Family

ID=71148278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010077687.4A Active CN111311679B (en) 2020-01-31 2020-01-31 Free floating target pose estimation method based on depth camera

Country Status (1)

Country Link
CN (1) CN111311679B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784770A (en) * 2020-06-28 2020-10-16 河北工业大学 Three-dimensional attitude estimation method in disordered grabbing based on SHOT and ICP algorithm
CN111814792A (en) * 2020-09-04 2020-10-23 之江实验室 Feature point extraction and matching method based on RGB-D image
CN112465903A (en) * 2020-12-21 2021-03-09 上海交通大学宁波人工智能研究院 6DOF object attitude estimation method based on deep learning point cloud matching
CN112651944A (en) * 2020-12-28 2021-04-13 哈尔滨工业大学(深圳) 3C component high-precision six-dimensional pose estimation method and system based on CAD model
CN112686950A (en) * 2020-12-04 2021-04-20 深圳市优必选科技股份有限公司 Pose estimation method and device, terminal equipment and computer readable storage medium
CN113239771A (en) * 2021-05-07 2021-08-10 中国科学院深圳先进技术研究院 Attitude estimation method, system and application thereof
CN113393522A (en) * 2021-05-27 2021-09-14 湖南大学 6D pose estimation method based on monocular RGB camera regression depth information
CN114049399A (en) * 2022-01-13 2022-02-15 上海景吾智能科技有限公司 Mirror positioning method combining RGBD image
CN114781056A (en) * 2022-04-13 2022-07-22 南京航空航天大学 Aircraft complete machine shape measuring method based on feature matching
CN115063483A (en) * 2022-06-14 2022-09-16 广东天太机器人有限公司 Template posture correction method and system based on 2d image recognition


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101726296A (en) * 2009-12-22 2010-06-09 哈尔滨工业大学 Vision measurement, path planning and GNC integrated simulation system for space robot
CN103777220A (en) * 2014-01-17 2014-05-07 西安交通大学 Real-time and accurate pose estimation method based on fiber-optic gyroscope, speed sensor and GPS
DE102014005181A1 (en) * 2014-04-03 2015-10-08 Astrium Gmbh Position and orientation of objects
CN105976353A (en) * 2016-04-14 2016-09-28 南京理工大学 Spatial non-cooperative target pose estimation method based on model and point cloud global matching
CN106251353A (en) * 2016-08-01 2016-12-21 上海交通大学 Weak texture workpiece and the recognition detection method and system of three-dimensional pose thereof
CN107392845A (en) * 2017-07-31 2017-11-24 芜湖微云机器人有限公司 A kind of method of 3D point cloud imaging and positioning
CN109255813A (en) * 2018-09-06 2019-01-22 大连理工大学 A kind of hand-held object pose real-time detection method towards man-machine collaboration
CN109636854A (en) * 2018-12-18 2019-04-16 重庆邮电大学 A kind of augmented reality three-dimensional Tracing Registration method based on LINE-MOD template matching
CN109885073A (en) * 2019-01-15 2019-06-14 西北工业大学 A kind of prediction technique for the free float state of space non-cooperative target
CN109801337A (en) * 2019-01-21 2019-05-24 同济大学 A kind of 6D position and orientation estimation method of Case-based Reasoning segmentation network and iteration optimization
CN110223348A (en) * 2019-02-25 2019-09-10 湖南大学 Robot scene adaptive bit orientation estimation method based on RGB-D camera
CN110340891A (en) * 2019-07-11 2019-10-18 河海大学常州校区 Mechanical arm positioning grasping system and method based on cloud template matching technique
CN110702091A (en) * 2019-07-24 2020-01-17 武汉大学 High-precision positioning method for moving robot along subway rail

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAOXIANG CAO等: "The 3D map building of the mobile robot", 《2016 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION》 *
王月 等: "基于模型的增强现实无标识三维注册追踪方法", 《上海交通大学学报》 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784770A (en) * 2020-06-28 2020-10-16 河北工业大学 Three-dimensional attitude estimation method in disordered grabbing based on SHOT and ICP algorithm
CN111784770B (en) * 2020-06-28 2022-04-01 河北工业大学 Three-dimensional attitude estimation method in disordered grabbing based on SHOT and ICP algorithm
CN111814792A (en) * 2020-09-04 2020-10-23 之江实验室 Feature point extraction and matching method based on RGB-D image
CN111814792B (en) * 2020-09-04 2020-12-29 之江实验室 Feature point extraction and matching method based on RGB-D image
CN112686950A (en) * 2020-12-04 2021-04-20 深圳市优必选科技股份有限公司 Pose estimation method and device, terminal equipment and computer readable storage medium
CN112686950B (en) * 2020-12-04 2023-12-15 深圳市优必选科技股份有限公司 Pose estimation method, pose estimation device, terminal equipment and computer readable storage medium
CN112465903A (en) * 2020-12-21 2021-03-09 上海交通大学宁波人工智能研究院 6DOF object attitude estimation method based on deep learning point cloud matching
CN112651944B (en) * 2020-12-28 2023-08-22 哈尔滨工业大学(深圳) 3C component high-precision six-dimensional pose estimation method and system based on CAD model
CN112651944A (en) * 2020-12-28 2021-04-13 哈尔滨工业大学(深圳) 3C component high-precision six-dimensional pose estimation method and system based on CAD model
CN113239771A (en) * 2021-05-07 2021-08-10 中国科学院深圳先进技术研究院 Attitude estimation method, system and application thereof
CN113393522A (en) * 2021-05-27 2021-09-14 湖南大学 6D pose estimation method based on monocular RGB camera regression depth information
CN114049399A (en) * 2022-01-13 2022-02-15 上海景吾智能科技有限公司 Mirror positioning method combining RGBD image
CN114049399B (en) * 2022-01-13 2022-04-12 上海景吾智能科技有限公司 Mirror positioning method combining RGBD image
CN114781056B (en) * 2022-04-13 2023-02-03 南京航空航天大学 Aircraft complete machine shape measuring method based on feature matching
CN114781056A (en) * 2022-04-13 2022-07-22 南京航空航天大学 Aircraft complete machine shape measuring method based on feature matching
CN115063483A (en) * 2022-06-14 2022-09-16 广东天太机器人有限公司 Template posture correction method and system based on 2d image recognition
CN115063483B (en) * 2022-06-14 2023-04-11 广东天太机器人有限公司 Template posture correction method and system based on 2d image recognition

Also Published As

Publication number Publication date
CN111311679B (en) 2022-04-01

Similar Documents

Publication Publication Date Title
CN111311679B (en) Free floating target pose estimation method based on depth camera
CN109544456B (en) Panoramic environment sensing method based on two-dimensional image and three-dimensional point cloud data fusion
CN111062990B (en) Binocular vision positioning method for underwater robot target grabbing
Pizarro et al. Toward large-area mosaicing for underwater scientific applications
CN108597009B (en) Method for detecting three-dimensional target based on direction angle information
CN111721259B (en) Underwater robot recovery positioning method based on binocular vision
CN111862201B (en) Deep learning-based spatial non-cooperative target relative pose estimation method
CN109903313B (en) Real-time pose tracking method based on target three-dimensional model
CN108225319B (en) Monocular vision rapid relative pose estimation system and method based on target characteristics
CN110569861B (en) Image matching positioning method based on point feature and contour feature fusion
CN110021029B (en) Real-time dynamic registration method and storage medium suitable for RGBD-SLAM
CN111784655B (en) Underwater robot recycling and positioning method
CN110222661B (en) Feature extraction method for moving target identification and tracking
CN110245566B (en) Infrared target remote tracking method based on background features
CN113658337A (en) Multi-mode odometer method based on rut lines
CN114549549B (en) Dynamic target modeling tracking method based on instance segmentation in dynamic environment
CN115641367A (en) Infrared and visible light image registration method based on multi-stage feature matching
CN107944350B (en) Monocular vision road identification method based on appearance and geometric information fusion
CN104156933A (en) Image registering method based on optical flow field
CN111260736A (en) In-orbit real-time calibration method for internal parameters of space camera
CN116152068A (en) Splicing method for solar panel images
CN112767481B (en) High-precision positioning and mapping method based on visual edge features
CN109242910B (en) Monocular camera self-calibration method based on any known plane shape
CN113313116A (en) Vision-based accurate detection and positioning method for underwater artificial target
CN111882589A (en) Image-based monocular vision SLAM initialization method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant