CN110969668A - Stereoscopic calibration algorithm of long-focus binocular camera - Google Patents

Stereoscopic calibration algorithm of long-focus binocular camera

Info

Publication number
CN110969668A
CN110969668A (application CN201911152607.0A; granted as CN110969668B)
Authority
CN
China
Prior art keywords
image
point
camera
pixel
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911152607.0A
Other languages
Chinese (zh)
Other versions
CN110969668B (en)
Inventor
Wei Zhong
Boqian Liu
Haojie Li
Zhihui Wang
Risheng Liu
Xin Fan
Zhongxuan Luo
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN201911152607.0A priority Critical patent/CN110969668B/en
Publication of CN110969668A publication Critical patent/CN110969668A/en
Application granted granted Critical
Publication of CN110969668B publication Critical patent/CN110969668B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85Stereo camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker
    • G06T2207/30208Marker matrix
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses a stereo calibration algorithm for a long-focus binocular camera, belonging to the field of image processing and computer vision. Since the focal length of a telephoto lens is relatively large, the calibration object needs to be placed relatively far away; due to its limited size, the calibration object then cannot cover the whole angle of view. The method extracts and matches feature points in the area that the calibration plate cannot cover, and calibrates the camera from the matched feature points together with the corner coordinates of the calibration plate. The method overcomes the limitation that the calibration plate is not large enough to cover the whole field angle, and the program is simple and easy to implement. It makes full use of both the checkerboard corners of ordinary calibration and the scene feature points of self-calibration, can correct the internal and external references of the telephoto camera simultaneously, and is simple and convenient to operate with accurate results.

Description

Stereoscopic calibration algorithm of long-focus binocular camera
Technical Field
The invention belongs to the field of image processing and computer vision, and relates to a method for extracting and matching feature points in the area that the calibration plate cannot cover, and calibrating the camera according to the matched feature points and the corner coordinates of the calibration plate, thereby solving the problem that, owing to the size of the calibration plate, the feature points cannot cover the whole field angle.
Background
Stereo vision is an important topic in the field of computer vision; its aim is to reconstruct the three-dimensional geometric information of a scene. Binocular stereo vision is an important branch of stereo vision. It uses a left camera and a right camera to simulate two eyes and computes a depth image from the difference between the binocular images; this difference is called parallax (disparity). Binocular stereo vision can be used in many fields such as three-dimensional reconstruction and target detection, has the advantages of high efficiency, adequate precision, simple system structure and low cost, and is widely applied in robot vision, vehicle navigation, medical imaging and similar fields.
Binocular stereo vision needs to match the imaging points of the same scene point in the left and right images, so the focal lengths and imaging center points of the two lenses and the positional relationship between the left and right lenses must be known. In addition, because the production process of camera lenses is imperfect, the captured images exhibit distortion, which is called camera distortion. To obtain the above data and eliminate the distortion, we need to calibrate the camera.
The calibration process yields the parameters of the two lenses and their relative position, but these parameters are not stable. With changes in temperature, humidity and the like, the internal parameters of the camera lenses change; moreover, an accidental collision of the camera can change the positional relationship between the two lenses. Therefore, after the camera has been used for a period of time, its internal and external parameters need to be corrected again; this is camera self-calibration.
When calibrating the internal and external parameters of the camera lens, the relationship between the image coordinate system and the three-dimensional world coordinate system needs to be calculated. Therefore, we need to know the physical dimensions of the calibration object. Further, it is also necessary to obtain coordinates of matching feature points on the left and right images covering the entire field angle (FOV).
A tele digital camera is a digital camera with a tele lens. The focal length of a lens is typically expressed in millimeters, such as a 35mm lens, a 50mm lens, a 135mm lens, and so forth, as we often speak. The lens may be classified into a wide-angle lens, a standard lens, a telephoto lens, and the like according to its focal length, which is actually classified according to the angle of view of the lens. The imaging law of a lens is such that the sum of the reciprocal of the object distance and the reciprocal of the image distance equals the reciprocal of the focal length, i.e.
1/u + 1/v = 1/f
where u and v represent the object distance and the image distance, respectively, and f represents the focal length of the camera. Zhejiang University proposed a long-focus camera calibration method based on a polynomial projection model (patent publication CN102622744A), which maps points of the image onto a sphere and optimizes them according to the projection properties of straight lines on the sphere. Since the focal length of a telephoto lens is relatively large, the calibration object needs to be placed relatively far away; due to its limited size, the calibration object cannot cover the entire FOV. To resolve this contradiction, the invention uses both the calibration plate and feature points in the scene to correct the internal and external parameters of the camera. It is acceptable that the calibration plate does not cover the FOV, since the internal parameters of the lens are known in advance. The image coordinates and world coordinates of the feature points on the calibration plate are used to correct the internal references, and the image coordinates of all feature points are used to correct the external references.
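The thin-lens imaging law above can be checked numerically. A minimal sketch (the numbers are illustrative only and not part of the patent):

```python
def image_distance(u_mm: float, f_mm: float) -> float:
    """Solve 1/u + 1/v = 1/f for the image distance v."""
    return 1.0 / (1.0 / f_mm - 1.0 / u_mm)

u = 10_000.0   # object distance: a calibration target 10 m away, in mm
f = 135.0      # focal length of a 135 mm telephoto lens
v = image_distance(u, f)
# The longer the focal length, the farther the calibration object must be
# placed for the plate to stay small relative to the field of view.
```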
Disclosure of Invention
The invention aims to overcome the limitation that the size of the calibration plate is not enough to cover the whole field angle. It provides a scheme for detecting and matching feature points in the area not covered by the calibration plate and correcting the original calibration result according to these feature points. The flow is shown in FIG. 1.
The technical scheme of the invention is as follows:
a stereo calibration algorithm of a tele binocular camera comprises the following steps:
1) Checkerboard corner detection: shoot checkerboard calibration plate images with the long-focus binocular camera, detect and match the checkerboard corners, and obtain the area not covered by the checkerboard.
2) Scene feature point detection: shoot several groups of scene images with the long-focus binocular camera, and detect and extract feature points for further screening.
3) Feature point matching: match the feature points extracted from the left and right images according to the feature description values within the matching window, and delete repeated matches.
4) Optimizing the matching result by parabolic fitting: the matching result is optimized by a univariate quadratic (parabolic) fit.
5) Judging the coverage area of the feature points: divide the image into m × n grids; if the feature points cover all grids, proceed to the next step; otherwise continue shooting images and repeat steps 2) to 4).
6) Correcting the internal reference calibration result: the image coordinates and world coordinates of the feature points on the calibration plate are used to correct the internal references.
7) Correcting the external reference calibration result: correct the calibration result according to the matched checkerboard corner coordinates and scene feature point coordinates.
The characteristic point extraction in the step 2) specifically comprises the following steps:
2-1) Construct a single-scale difference-of-Gaussians pyramid (DoG). A DoG pyramid is obtained from the difference of adjacent scale spaces and is often used in the Scale-Invariant Feature Transform (SIFT). The scale space of an image is defined as the convolution of a Gaussian convolution kernel with the image, and is a function of the parameter σ of the Gaussian kernel. Specifically, the scale space of the scene image I(x, y) is:
L(x,y,σ)=G(x,y,σ)*I(x,y)
wherein ,
G(x, y, σ) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
is the Gaussian kernel function, and σ is the scale factor: the size of σ determines the degree of smoothing of the image. A large scale corresponds to the overall appearance of the image and a small scale to its details; large σ values correspond to a coarse scale (low resolution), small σ values to a fine scale (high resolution). * denotes the convolution operation. L(x, y, σ) is the scale space of the image I(x, y), as shown in FIG. 2. Taking the difference between scale spaces of different scales yields one layer of the difference-of-Gaussians pyramid, multiplied by a normalization factor λ so that the maximum value of the DoG image is 255.
D(x,y,σ)=λ(L(x,y,kσ)-L(x,y,σ))
Unlike SIFT, we compute features at only one scale, for two reasons: first, computing several scale spaces is too expensive to be practical; second, the accuracy of SIFT features obtained with a multi-scale space is not sufficient for the calibration requirements.
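The single-scale DoG described above can be sketched as follows. The separable Gaussian filter and the normalization to a maximum of 255 follow the text; the kernel radius, the base σ and the ratio k = 1.5 are assumptions for illustration:

```python
import numpy as np

def gaussian_kernel1d(sigma):
    # Truncate the Gaussian at about 3 sigma (an assumed, common choice).
    radius = int(3.0 * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x * x / (2.0 * sigma * sigma))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """L(x, y, sigma) = G(sigma) * I, computed as a separable convolution."""
    k = gaussian_kernel1d(sigma)
    pad = len(k) // 2
    p = np.pad(img.astype(float), pad, mode="reflect")
    p = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, p)
    p = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, p)
    return p[pad:-pad, pad:-pad]

def single_scale_dog(img, sigma=1.6, k=1.5):
    """D = lambda * (L(k*sigma) - L(sigma)), lambda chosen so that max|D| = 255."""
    d = gaussian_blur(img, k * sigma) - gaussian_blur(img, sigma)
    peak = np.abs(d).max()
    return d * (255.0 / peak) if peak > 0 else d

rng = np.random.default_rng(0)
img = rng.uniform(0.0, 255.0, size=(32, 32))
dog = single_scale_dog(img)
```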
2-2) Compare each point of the obtained DoG with the pixel points in its neighborhood to judge whether it is a local extremum.
2-2-1) Record the DoG obtained above as D. Apply a dilation operation to D and record the result as D1. Compare each pixel point of D1 with the points in its 8-neighborhood; if the pixel point is a local maximum, add it to the candidate point set P1.
2-2-2) Invert D, apply a dilation operation, and record the result as D2. Compare each pixel point of D2 with the points in its 8-neighborhood; if the pixel point is a local minimum, add it to the candidate point set P2.
2-2-3) Take the intersection P3 = P1 ∩ P2. Take the points of P3 whose DoG gray value is greater than 15 as the feature point set {P}.
2-3) Since feature points judged only from the Gaussian response may include noise, the Gaussian feature points need to be denoised. A common filter can be used here to remove noise points and edge points.
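Steps 2-2-1) to 2-2-3) can be sketched with a max-filter (grey-scale dilation). Reading the translated P1/P2 description as collecting local maxima and local minima of D, then applying the gray-value threshold of 15, is an interpretation rather than the patent's exact wording:

```python
import numpy as np

def dilate3x3(d):
    """Grey-scale dilation: maximum over the 3x3 neighborhood of every pixel."""
    p = np.pad(d, 1, mode="edge")
    h, w = d.shape
    shifts = [p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return np.max(shifts, axis=0)

def extremum_candidates(d, gray_thresh=15.0):
    """Local maxima (D equals its dilation) and minima (same trick on -D),
    kept only where |D| exceeds the gray-value threshold."""
    is_max = d >= dilate3x3(d)        # equality with the max-filter => local maximum
    is_min = d <= -dilate3x3(-d)      # -dilate(-D) is the min-filter (erosion)
    return (is_max | is_min) & (np.abs(d) > gray_thresh)

# Synthetic DoG with one clear peak and one clear valley on a flat background.
D = np.zeros((16, 16))
D[5, 5] = 100.0
D[10, 10] = -100.0
mask = extremum_candidates(D)
```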
Matching the characteristic points, specifically comprising the following steps:
3-1) Divide the image into m × n blocks. For each feature point p_i^l of the left image, find the block B_jk^l of the left image that contains it. Record the search range in the right image corresponding to block B_jk^l as B_jk^r, as shown in FIG. 3. Find a variable that describes the degree of similarity between feature points and use it to evaluate the similarity between p_i^l and every point in B_jk^r; if the maximum similarity is greater than a threshold t1, the corresponding point is taken as the coarse matching point p_i^r.
3-2) If the largest similarity value s_first and the second largest value s_second between p_i^l and its candidates satisfy
F(s_first, s_second) ≥ t2
the match is retained, where t2 is a threshold and F(s_first, s_second) is a function describing the relationship between s_first and s_second.
After screening by this rule, find, by the same matching procedure, the feature point p_j^l of the left image corresponding to p_i^r; if p_j^l coincides with p_i^l, the match (p_i^l, p_i^r) is retained.
And optimizing a matching result by parabolic fitting, which specifically comprises the following steps:
4-1) Taking the left-image feature point p_i^l as reference, optimize the corresponding integer-pixel feature point p_i^r of the right image by parabolic fitting, obtaining the sub-pixel feature point of the right image p_i^r + (Δx_r, Δy_r), where Δx_r is the sub-pixel offset in the x direction and Δy_r is the sub-pixel offset in the y direction.
4-2) Taking the integer-pixel feature point p_i^r of the right image as reference, compute the sub-pixel feature point of the left image p_i^l + (Δx_l, Δy_l) by the method of 4-1), where Δx_l and Δy_l are the sub-pixel offsets in the x and y directions.
4-3) The final matching point pair is (p_i^l + (Δx_l, Δy_l), p_i^r + (Δx_r, Δy_r)).
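Per axis, the univariate quadratic (parabolic) fit reduces to fitting a parabola through three similarity samples around the discrete peak. This standard three-point vertex formula is a sketch consistent with, but not quoted from, the patent:

```python
def parabolic_offset(s_minus, s0, s_plus):
    """Sub-pixel offset of the extremum of the parabola through
    (-1, s_minus), (0, s0), (+1, s_plus); applied once per axis (x, then y)."""
    denom = s_minus - 2.0 * s0 + s_plus
    if denom == 0.0:
        return 0.0                      # flat profile: keep the integer-pixel location
    return 0.5 * (s_minus - s_plus) / denom

# Similarity profile sampled from s(x) = 1 - (x - 0.2)^2: the true peak is at x = 0.2.
offset = parabolic_offset(-0.44, 0.96, 0.36)
```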
Correcting the internal reference calibration result, specifically comprising the following steps:
6-1) Obtain the original calibration result: the parameters of the two lenses are read from the tele binocular camera hardware itself, including the focal lengths fx and fy (unit: pixels) in the x and y directions and the principal point position (u0, v0).
6-2) updating the internal reference result.
For each checkerboard corner point image Pi, the following steps are carried out:
(1) Obtain the rotation matrix R and translation vector T between the image coordinates and the calibration plate coordinates for the checkerboard corner image Pi. According to the camera model above, let M = [X, Y, Z, 1]^T be a point in three-dimensional world coordinates and m = [u, v, 1]^T the corresponding two-dimensional camera-plane pixel coordinates; the homography relationship from the checkerboard calibration plane to the image plane is:
sm = K[R, T]M
K = [fx γ u0; 0 fy v0; 0 0 1],   [R, T] = [r1, r2, r3, t]
where s is a scale factor, K is the camera internal parameter matrix, and R and T are the rotation matrix and translation vector between the image coordinates and the calibration plate coordinates, respectively. Stipulating the checkerboard plane as Z = 0, we have
s[u, v, 1]^T = K[r1, r2, t][X, Y, 1]^T
We call λK[r1, r2, t] the homography matrix H, i.e.
H = [h1, h2, h3] = λK[r1, r2, t]
A homography matrix H between the checkerboard coordinate system and the image coordinate system is calculated. After the calculation of the homography matrix is completed, a rotation matrix R and a translation vector T between the image coordinates and the calibration plate coordinates are calculated, and the calculation formula is as follows:
r1 = λK^(-1)h1
r2 = λK^(-1)h2
λ = 1 / ‖K^(-1)h1‖
r3 = r1 × r2
t = λK^(-1)h3
wherein λ represents a normalization coefficient;
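The formulas above translate directly into code. A sketch with synthetic illustrative values for K, R and t, checking that the decomposition recovers them:

```python
import numpy as np

def decompose_homography(H, K):
    """r1 = lam*K^-1*h1, r2 = lam*K^-1*h2, r3 = r1 x r2, t = lam*K^-1*h3,
    with lam = 1/||K^-1*h1|| as the normalization coefficient."""
    Kinv = np.linalg.inv(K)
    h1, h2, h3 = H[:, 0], H[:, 1], H[:, 2]
    lam = 1.0 / np.linalg.norm(Kinv @ h1)
    r1 = lam * (Kinv @ h1)
    r2 = lam * (Kinv @ h2)
    r3 = np.cross(r1, r2)
    t = lam * (Kinv @ h3)
    return np.column_stack([r1, r2, r3]), t

# Build H from known (illustrative) K, R, t for the plane Z = 0 and recover them.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 2.0])
H = K @ np.column_stack([R_true[:, 0], R_true[:, 1], t_true])
R_est, t_est = decompose_homography(H, K)
```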
(2) Assume that n′ images containing the checkerboard have been collected, with m′ checkerboard corners in each image. Let the projection of the corner Mij of the i-th image onto the image, under the camera matrix calculated above, be:
m̂ij = m̂(K + ΔK, Ri + ΔRi, ti + Δti, Mij)
where Ri and ti are the rotation matrix and translation vector of the i-th image, ΔRi and Δti are the amounts of change of Ri and ti, respectively, and K is the internal parameter matrix with change ΔK. The probability density function of the corner mij is:
f(mij) = (1 / (√(2π)σ)) · exp(−‖m̂(K + ΔK, Ri + ΔRi, ti + Δti, Mij) − mij‖² / (2σ²))
constructing a likelihood function:
L = ∏_{i=1}^{n′} ∏_{j=1}^{m′} f(mij)
Maximizing L is equivalent to minimizing the following expression; the Levenberg-Marquardt algorithm for multi-parameter nonlinear optimization problems is used to solve iteratively for the optimum.
∑_{i=1}^{n′} ∑_{j=1}^{m′} ‖m̂(K + ΔK, Ri + ΔRi, ti + Δti, Mij) − mij‖²
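A hedged sketch of the Levenberg-Marquardt refinement: for brevity it refines only the intrinsics (fx, fy, cx, cy) for a single synthetic view, whereas the patent also perturbs each Ri and ti; `scipy.optimize.least_squares` with `method="lm"` stands in for the solver:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, R, t, M, m):
    """Stacked reprojection residuals m_hat(K, R, t, Mij) - mij for one view."""
    fx, fy, cx, cy = params
    K = np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])
    P = (R @ M.T).T + t                 # world -> camera coordinates
    proj = (K @ P.T).T
    proj = proj[:, :2] / proj[:, 2:3]   # perspective division
    return (proj - m).ravel()

# Synthetic single view of a planar 4x4 grid (Z = 0), camera 5 units in front of it.
xs = np.linspace(-1.0, 1.0, 4)
M = np.array([[x, y, 0.0] for x in xs for y in xs])
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
K_true = np.array([800.0, 820.0, 320.0, 240.0])
m = residuals(K_true, R, t, M, np.zeros((16, 2))).reshape(-1, 2)  # exact projections
x0 = np.array([750.0, 780.0, 300.0, 230.0])   # perturbed starting intrinsics
sol = least_squares(residuals, x0, args=(R, t, M, m), method="lm")
```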
Correcting the external reference calibration result, specifically comprising the following steps:
7-1) Compute the coordinates of the matched left and right feature points (p_i^l, p_i^r) in the normal coordinate system. The pixel coordinate system takes the upper-left corner of the image as origin, with its x and y axes parallel to the x and y axes of the image coordinate system, respectively; its unit is the pixel, the basic and indivisible unit of image display. The normal coordinate system takes the optical center of the camera as origin, with the distance from the optical center to the image plane scaled to 1. The relationship between pixel coordinates and normal coordinates is:
u=KX
[u, v, 1]^T = [fx 0 cx; 0 fy cy; 0 0 1][X, Y, 1]^T
where u = [u, v, 1]^T denotes the pixel coordinates of the image; K denotes the camera internal reference matrix, in which fx and fy are the focal lengths (in pixels) in the x and y directions of the image and (cx, cy) is the position of the camera principal point; X = [X, Y, 1]^T are the coordinates in the normal coordinate system. Knowing the pixel coordinates of an image point and the camera's internal parameters, the corresponding normal coordinates can be calculated as
X = K^(-1)u
For each pair of matched feature points (p_i^l, p_i^r) of the left and right cameras, their normal coordinates are:
X_i^l = (K_l)^(-1) u_i^l
X_i^r = (K_r)^(-1) u_i^r
where u_i^l and u_i^r are the pixel coordinates of p_i^l and p_i^r, X_i^l and X_i^r are their normal coordinates, and K_l and K_r are the internal reference matrices of the left and right cameras, respectively.
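The pixel-to-normal conversion X = K^(-1)u in code; the internal reference values are illustrative, not taken from the patent:

```python
import numpy as np

# Illustrative internal reference matrix (fx, fy in pixels; (cx, cy) principal point).
K = np.array([[1200.0, 0.0, 640.0],
              [0.0, 1180.0, 360.0],
              [0.0, 0.0, 1.0]])

def to_normal(K, u, v):
    """X = K^-1 u : pixel coordinates (u, v) -> normal coordinates [X, Y, 1]^T."""
    return np.linalg.solve(K, np.array([u, v, 1.0]))

X_principal = to_normal(K, 640.0, 360.0)   # the principal point maps to (0, 0, 1)
```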
7-2) removing image distortion: the normal coordinates of the left and right image feature points after distortion removal are calculated from the normal coordinates of the left and right image feature points and the distortion coefficients of the left and right cameras.
Due to limitations of the lens production process, real lenses exhibit distortion, causing nonlinear deformation of the image; this can be roughly divided into radial distortion and tangential distortion.
Radial distortion is the positional deviation of image pixels along the radial direction from the distortion center, which deforms the image. It is roughly expressed as follows:
x_d = x(1 + k1·r² + k2·r⁴ + k3·r⁶)
y_d = y(1 + k1·r² + k2·r⁴ + k3·r⁶)
where r² = x² + y², and k1, k2, k3 are the radial distortion parameters.
Tangential distortion is due to imperfections in the camera fabrication such that the lens itself is not parallel to the image plane, and can be quantitatively described as:
x_d = x + (2·p1·x·y + p2·(r² + 2x²))
y_d = y + (p1·(r² + 2y²) + 2·p2·x·y)
where p1, p2 are the tangential distortion coefficients.
In summary, the coordinate relationship before and after distortion is as follows:
x_d = x(1 + k1·r² + k2·r⁴ + k3·r⁶) + (2·p1·x·y + p2·(r² + 2x²))
y_d = y(1 + k1·r² + k2·r⁴ + k3·r⁶) + (p1·(r² + 2y²) + 2·p2·x·y)
where (x, y) are the ideal, distortion-free normal coordinates and (x_d, y_d) are the actual normal coordinates with distortion. We take (x_d, y_d) as the initial value of (x, y) and obtain the actual (x, y) by iterating several times (for example, 20 times).
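The iterative inversion described above (initializing (x, y) with (x_d, y_d) and iterating, e.g., 20 times) can be sketched as follows; the coefficient values are illustrative:

```python
def distort(x, y, k1, k2, k3, p1, p2):
    """Forward model: ideal normal coordinates -> distorted normal coordinates."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return xd, yd

def undistort(xd, yd, k1, k2, k3, p1, p2, iterations=20):
    """Invert the model by fixed-point iteration, starting from (xd, yd)."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
        dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
        dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y

coeffs = (-0.1, 0.01, 0.0, 0.001, -0.0005)   # illustrative distortion coefficients
xd, yd = distort(0.3, -0.2, *coeffs)
xu, yu = undistort(xd, yd, *coeffs)          # round-trip recovers (0.3, -0.2)
```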
7-3) rotating the left and right images according to the original rotation relationship of the two cameras: the rotation matrix R and translation vector t between the original left and right cameras are known, so that
Xr=RXl+t
where X_l denotes the normal coordinates of the left camera and X_r those of the right camera. Rotate the left image by half the angle of R in the positive direction and the right image by half the angle of R in the negative direction.
For each pair of distortion-removed left and right feature points (p_i^l, p_i^r) obtained in the previous step, denote their normal coordinates after this rotation by (X_i^l, X_i^r).
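Rotating each image by "half of R" amounts to the matrix square root of R: same rotation axis, half the angle. A sketch via the Rodrigues formula (the axis-angle route is an implementation choice, not specified by the patent):

```python
import numpy as np

def rodrigues(axis, theta):
    """Rotation matrix for a rotation of angle theta about the given axis."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    Kx = np.array([[0.0, -a[2], a[1]],
                   [a[2], 0.0, -a[0]],
                   [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

def half_rotation(R):
    """R^(1/2): same rotation axis as R, half the rotation angle."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.eye(3)
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return rodrigues(axis / (2.0 * np.sin(theta)), theta / 2.0)

R = rodrigues([1.0, 2.0, 3.0], 0.8)   # illustrative inter-camera rotation
R_half = half_rotation(R)             # rotate the left image by this, the right by its inverse
```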
7-4) Restore the distortion-removed, rotated images to the pixel coordinate system according to the formula u = KX. From the left and right feature-point normal coordinates (X_i^l, X_i^r) updated in the previous step, calculate the distortion-removed, rectified image coordinates:
u_i^l = K_l X_i^l
u_i^r = K_r X_i^r
7-5) Solve the fundamental matrix F and essential matrix E from the distortion-removed, rectified feature point pairs of the left and right images and the internal reference matrices of the left and right cameras. The relationship between corresponding left and right pixel points u_l, u_r and the fundamental matrix F is:
u_r^T F u_l = 0
and (4) further screening the point pairs by using random sample consensus (RANSAC), and then substituting the coordinates of the corresponding points into the formula to construct a homogeneous linear equation set to solve F.
The relationship between the fundamental matrix and the essential matrix is:
E = K_r^T F K_l
where K_l and K_r are the internal reference matrices of the left and right cameras, respectively.
7-6) decomposing the left and right camera rotation and translation relationships after correction from the essence matrix: the relationship of the essential matrix E to the rotation R and translation t is as follows:
E=[t]×R
where [t]× denotes the skew-symmetric cross-product matrix of t.
Performing singular value decomposition on E gives
E = UΣV^T
Define two matrices
Z = [0 1 0; −1 0 0; 0 0 0] and W = [0 −1 0; 1 0 0; 0 0 1], with ZW = Σ
so E can be written in the following two forms:
(1) E = UZU^T · UWV^T, whence [t]× = UZU^T, R = UWV^T;
(2) E = −UZU^T · UW^T V^T, whence [t]× = −UZU^T, R = UW^T V^T.
Four pairs of R and t are obtained; the solution with three-dimensional significance (i.e. for which reconstructed points lie in front of both cameras) is selected.
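The SVD-based decomposition above in code; enforcing det(U) = det(V) = +1 is a standard implementation detail the patent does not spell out:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x of a 3-vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def decompose_essential(E):
    """Four candidate (R, t) pairs from E = [t]x R, via SVD and the matrix W above."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:            # keep U and V proper rotations
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    t = U[:, 2]
    return [(U @ W @ Vt, t), (U @ W @ Vt, -t),
            (U @ W.T @ Vt, t), (U @ W.T @ Vt, -t)]

# Synthetic check: the true (R, t) must appear among the four candidates.
c, s = np.cos(0.2), np.sin(0.2)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.6, 0.0, 0.8])      # unit-length baseline direction
cands = decompose_essential(skew(t_true) @ R_true)
found = any(np.allclose(Rc, R_true, atol=1e-9) and np.allclose(tc, t_true, atol=1e-9)
            for Rc, tc in cands)
```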
7-7) the decomposed rotation and translation relation is superposed into the original external reference.
The rotation matrix before correction is recorded as R and the translation vector as t = (t_x, t_y, t_z)^T; the rotation matrix decomposed in step 7-6) is R′ and its translation vector is t′ = (t′_x, t′_y, t′_z)^T. The new R_new and t_new are then as follows:
R_new = R^(1/2) R′ R^(1/2)
with t_new obtained analogously from t′ transformed by R^(1/2) and rescaled to preserve the original baseline length ‖t‖.
The invention has the beneficial effects that it overcomes the limitation that the size of the calibration plate is not enough to cover the whole field angle, by adding steps that detect and match feature points in the area not covered by the calibration plate and correcting the original calibration result according to these feature points. The method makes full use of both the ordinarily calibrated checkerboard corners and the self-calibrated scene feature points, can correct the internal and external references of the telephoto camera simultaneously, and is simple and convenient to operate with accurate results.
Drawings
FIG. 1 is a schematic flow chart.
Fig. 2 shows a gaussian differential pyramid (DoG).
Fig. 3 is a schematic diagram of block matching. Wherein, (a) is a left figure, and (b) is a right figure.
Detailed Description
The invention provides a method for extracting calibration points covering the full field angle, described in detail below in combination with the accompanying drawings and embodiments:
1) and (3) detecting angular points of the checkerboard: shooting the image of the checkerboard calibration plate, detecting and matching the checkerboard angular points, and acquiring the area which is not covered by the checkerboard.
2) Scene feature point detection: and shooting a plurality of groups of outdoor scene images, and detecting and matching the characteristic points for further screening.
2-1) Construct a single-scale difference-of-Gaussians pyramid (DoG). A DoG pyramid is obtained from the difference of adjacent scale spaces and is often used in the Scale-Invariant Feature Transform (SIFT). The scale space of an image is defined as the convolution of a Gaussian convolution kernel with the image, and is a function of the parameter σ of the Gaussian kernel. Specifically, the scale space of the image I(x, y) is:
L(x,y,σ)=G(x,y,σ)*I(x,y)
wherein ,
G(x, y, σ) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
is the Gaussian kernel function, and σ is the scale factor: the size of σ determines the degree of smoothing of the image. A large scale corresponds to the overall appearance of the image and a small scale to its details; large σ values correspond to a coarse scale (low resolution), small σ values to a fine scale (high resolution). * denotes the convolution operation. L(x, y, σ) is the scale space of the image I(x, y). Taking the difference between scale spaces of different scales yields one layer of the difference-of-Gaussians pyramid, multiplied by a normalization factor λ so that the maximum value of the DoG image is 255.
D(x,y,σ)=λ(L(x,y,kσ)-L(x,y,σ))
To increase the computation speed, we compute the DoG at just one scale, as shown in FIG. 2.
2-2) comparing each point in the obtained DoG with 8 pixel points in a 3 multiplied by 3 neighborhood to judge whether the point is a local extreme point.
2-2-1) Record the DoG obtained above as D. Apply a dilation operation to D and record the result as D1. Compare each pixel point of D1 with the points in its 8-neighborhood; if the pixel point is a local maximum, add it to the candidate point set P1.
2-2-2) Invert D, apply a dilation operation, and record the result as D2. Compare each pixel point of D2 with the points in its 8-neighborhood; if the pixel point is a local minimum, add it to the candidate point set P2.
2-2-3) Take the intersection P3 = P1 ∩ P2. Take the points of P3 whose DoG gray value is greater than 15 as the feature point set {P}.
2-3) Since feature points judged only from the Gaussian response may include noise, the Gaussian feature points need to be denoised. A common filter can be used here to remove noise points and edge points.
3) Feature point matching: match the obtained feature points of the left and right images according to the feature description values within the matching window, and delete repeated matches.
3-1) Divide the image into m × n blocks. For each feature point p_i^l of the left image, find the block B_jk^l of the left image that contains it. Record the search range in the right image corresponding to block B_jk^l as B_jk^r, as shown in FIG. 3. Find a variable that describes the degree of similarity between feature points and use it to evaluate the similarity between p_i^l and every point in B_jk^r; if the maximum similarity is greater than a threshold t1, the corresponding point is taken as the coarse matching point p_i^r.
3-2) If the largest similarity value s_first and the second largest value s_second between p_i^l and its candidates satisfy
F(s_first, s_second) ≥ t2
the match is retained, where t2 is a threshold and F(s_first, s_second) is a function describing the relationship between s_first and s_second.
After screening by this rule, find, by the same matching procedure, the feature point p_j^l of the left image corresponding to p_i^r; if p_j^l coincides with p_i^l, the match (p_i^l, p_i^r) is retained.
4) Optimizing the matching result by parabolic fitting: the matching result is optimized by a univariate quadratic (parabolic) fit.
And optimizing a matching result by parabolic fitting, which specifically comprises the following steps:
4-1) feature points of left graph
Figure BDA00022839505300001214
Is a baseQuasi, parabolic fitting optimization corresponds to integer pixel characteristic point of right image
Figure BDA00022839505300001215
The obtained sub-pixel characteristic points corresponding to the right image
Figure BDA00022839505300001216
4-2) to correspond to the integer pixel feature points of the right image
Figure BDA00022839505300001217
As a reference, calculating the sub-pixel characteristic point of the corresponding left image according to the method of 4-1)
Figure BDA0002283950530000131
4-3) the final matching point pair is
Figure BDA0002283950530000132
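The one-dimensional parabolic fit of step 4) can be sketched as follows: given the similarity score at the integer-pixel peak and at its two neighbors, the vertex of the interpolating parabola gives the sub-pixel offset, applied independently in the x and y directions.

```python
def subpixel_offset(s_minus, s0, s_plus):
    """Vertex of the parabola through (-1, s_minus), (0, s0), (1, s_plus).
    Returns the sub-pixel offset, in [-0.5, 0.5] when s0 is the discrete peak."""
    denom = s_minus - 2.0 * s0 + s_plus
    return 0.0 if denom == 0 else 0.5 * (s_minus - s_plus) / denom
```

For a score profile sampled from s(x) = 1 − (x − 0.2)², the recovered offset is exactly 0.2.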
5) Judging the coverage area of the feature points: if the corner points and scene feature points extracted in the previous steps completely cover the area that the calibration plate cannot cover, camera calibration can be carried out using the checkerboard corner coordinates and the scene feature-point coordinates.
6) Correcting the internal parameter calibration result: the image coordinates and world coordinates of the feature points on the calibration plate are used to correct the internal parameters.
6-1) Obtaining the original calibration result: the parameters of the two lenses are acquired from the camera hardware itself, including the focal lengths f_x and f_y (unit: pixel) in the x and y directions, the lens principal point position u_0 and v_0, and so on.
6-2) Updating the internal parameter result.
For each checkerboard corner image Pi, the following steps are carried out:
(1) Obtain the rotation matrix R and translation vector T between the image coordinates and the calibration-plate coordinates under the checkerboard corner image Pi. According to the camera model described above, let a point in three-dimensional world coordinates be M = [X, Y, Z, 1]^T and the corresponding two-dimensional camera-plane pixel coordinate be m = [u, v, 1]^T. The homography relationship from the checkerboard plane used for calibration to the image plane is:
sm = K[R, T]M
with
K = [f_x 0 u_0; 0 f_y v_0; 0 0 1]
where s is a scale factor, K is the camera intrinsic parameter matrix, and R and T are the rotation matrix and translation vector between the image coordinates and the calibration-plate coordinates, respectively. We stipulate the checkerboard plane as Z = 0; then
sm = K[r1, r2, t][X, Y, 1]^T
where r1 and r2 are the first two columns of R. We call K[r1, r2, t] the homography matrix H, i.e.
H = [h1, h2, h3] = λK[r1, r2, t]
The homography matrix H between the checkerboard coordinate system and the image coordinate system is calculated. After the calculation of the homography matrix is completed, the rotation matrix R and translation vector T between the image coordinates and the calibration-plate coordinates are calculated as follows:
r1 = λK⁻¹h1
r2 = λK⁻¹h2
λ = 1/‖K⁻¹h1‖
r3 = r1 × r2
t = λK⁻¹h3
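The closed-form extraction of R and t from the homography above can be sketched in numpy. The intrinsic matrix and pose used in the check are hypothetical values, not from the patent.

```python
import numpy as np

def pose_from_homography(K, H):
    """Recover R and t from H = lambda * K * [r1 r2 t]:
    r1 = lam*K^-1*h1, r2 = lam*K^-1*h2, r3 = r1 x r2, t = lam*K^-1*h3,
    with lam = 1/||K^-1*h1||."""
    Kinv = np.linalg.inv(K)
    h1, h2, h3 = H.T
    lam = 1.0 / np.linalg.norm(Kinv @ h1)
    r1 = lam * Kinv @ h1
    r2 = lam * Kinv @ h2
    r3 = np.cross(r1, r2)
    t = lam * Kinv @ h3
    return np.column_stack([r1, r2, r3]), t
```

Because r1 has unit norm by construction, λ absorbs the arbitrary overall scale of H.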
(2) It is assumed here that n' groups of images containing the checkerboard have been collected, with m' checkerboard corners in each image. The projection of corner M_ij on the ith image under the camera matrix obtained above is:
m̂_ij = m̂(K + ΔK, R_i + ΔR_i, t_i + Δt_i, M_ij)
where R_i and t_i are the rotation matrix and translation vector of the ith image, ΔR_i and Δt_i are the variations of R_i and t_i, respectively, and K is the intrinsic parameter matrix with variation ΔK. Assuming independent Gaussian noise on the corner positions, the probability density function of the corner m_ij is then:
f(m_ij) = (1/(2πσ²)) exp(−‖m_ij − m̂_ij‖²/(2σ²))
Constructing a likelihood function:
L = ∏_{i=1}^{n'} ∏_{j=1}^{m'} f(m_ij)
Let L take its maximum value, i.e., let the following expression be minimal. The Levenberg-Marquardt algorithm for multi-parameter nonlinear system optimization problems is used to iteratively solve for the optimal solution:
∑_{i=1}^{n'} ∑_{j=1}^{m'} ‖m_ij − m̂(K + ΔK, R_i + ΔR_i, t_i + Δt_i, M_ij)‖²
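The Levenberg-Marquardt refinement above can be sketched with scipy.optimize.least_squares (method='lm'). This is an illustrative reduction, not the patent's implementation: it jointly refines shared intrinsics [f_x, f_y, c_x, c_y] and per-view poses on synthetic noiseless data, omitting the distortion terms and the ΔK/ΔR_i/Δt_i bookkeeping; all numeric values are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(intr, rvec, t, M):
    """Project board points M (N, 3) with intrinsics [fx, fy, cx, cy] and pose (rvec, t)."""
    fx, fy, cx, cy = intr
    Xc = M @ Rotation.from_rotvec(rvec).as_matrix().T + t
    return np.column_stack([fx * Xc[:, 0] / Xc[:, 2] + cx,
                            fy * Xc[:, 1] / Xc[:, 2] + cy])

def refine_intrinsics(intr0, poses0, boards, corners):
    """LM refinement of shared intrinsics and per-view poses by minimizing
    the total squared reprojection error over all views and corners."""
    p0 = np.concatenate([intr0] + [np.concatenate([r, t]) for r, t in poses0])

    def residuals(p):
        res = []
        for i, (M, m_obs) in enumerate(zip(boards, corners)):
            s = 4 + 6 * i
            res.append(project(p[:4], p[s:s + 3], p[s + 3:s + 6], M) - m_obs)
        return np.concatenate(res).ravel()

    return least_squares(residuals, p0, method='lm').x[:4]
```

At least two planar views with distinct orientations are needed for the intrinsics to be identifiable; a single planar view leaves the problem under-determined.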
7) Correcting the external parameter calibration result: the image coordinates of all the feature points are used to correct the external parameters.
7-1) Calculate the coordinates of the matched left and right feature points (p_i^l, p_i^r) in the corresponding normal coordinate system. The pixel coordinate system takes the upper-left corner of the picture as the origin, and its x and y axes are parallel to the x and y axes of the image coordinate system, respectively. The unit of the pixel coordinate system is the pixel, the basic and indivisible unit of image display. The normal coordinate system takes the optical center of the camera as the origin of the image coordinate system, and the distance from the optical center to the image plane is scaled to 1. The relationship of pixel coordinates to normal coordinates is as follows:
u = KX
where u = [u, v, 1]^T denotes the pixel coordinates of the image;
K = [f_x 0 c_x; 0 f_y c_y; 0 0 1]
denotes the intrinsic parameter matrix of the camera, f_x and f_y are the focal lengths (unit: pixel) in the x and y directions of the image, respectively, and (c_x, c_y) denotes the position of the camera principal point; X = [x, y, 1]^T are the coordinates in the normal coordinate system. Knowing the pixel coordinates of the image and the camera's intrinsic parameters, the normal coordinates corresponding to a pixel point can be calculated, i.e.
X = K⁻¹u
For each pair of matched feature points (p_i^l, p_i^r) of the left and right cameras, their normal coordinates are:
X_i^l = (K_l)⁻¹u_i^l
X_i^r = (K_r)⁻¹u_i^r
where u_i^l and u_i^r are the pixel coordinates of p_i^l and p_i^r, respectively, X_i^l and X_i^r are the normal coordinates of p_i^l and p_i^r, respectively, and K_l and K_r are the intrinsic parameter matrices of the left and right cameras, respectively.
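The pixel-to-normal conversion X = K⁻¹u above is a one-liner; the intrinsic matrix K_l below is a hypothetical example, not a value from the patent.

```python
import numpy as np

def to_normal(K, u):
    """Map a pixel coordinate (u, v) to normal coordinates X = K^-1 [u, v, 1]^T."""
    return np.linalg.solve(K, np.array([u[0], u[1], 1.0]))

K_l = np.array([[800.0, 0.0, 320.0],   # hypothetical left-camera intrinsics
                [0.0, 800.0, 240.0],
                [0.0, 0.0, 1.0]])
X = to_normal(K_l, (400.0, 200.0))     # x = (400-320)/800, y = (200-240)/800
```

Using `np.linalg.solve` instead of explicitly inverting K is the standard numerically preferable form.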
7-2) Removing image distortion: the undistorted normal coordinates of the left and right image feature points are calculated from their normal coordinates and the distortion coefficients of the left and right cameras.
Owing to limitations of the lens production process, real lenses exhibit some distortion, causing nonlinear deformation that can be roughly divided into radial distortion and tangential distortion.
Radial distortion is a positional deviation of image pixel points along the radial direction with the distortion center as the central point, which deforms the image. The radial distortion is roughly expressed as follows:
x_d = x(1 + k_1 r² + k_2 r⁴ + k_3 r⁶)
y_d = y(1 + k_1 r² + k_2 r⁴ + k_3 r⁶)
where r² = x² + y², and k_1, k_2, k_3 are the radial distortion parameters.
Tangential distortion is due to imperfections in camera fabrication such that the lens itself is not parallel to the image plane, and can be quantitatively described as:
x_d = x + (2p_1 xy + p_2(r² + 2x²))
y_d = y + (p_1(r² + 2y²) + 2p_2 xy)
where p_1, p_2 are the tangential distortion coefficients.
In summary, the coordinate relationship before and after distortion is as follows:
x_d = x(1 + k_1 r² + k_2 r⁴ + k_3 r⁶) + (2p_1 xy + p_2(r² + 2x²))
y_d = y(1 + k_1 r² + k_2 r⁴ + k_3 r⁶) + (p_1(r² + 2y²) + 2p_2 xy)
where (x, y) are the normal coordinates in the ideal (undistorted) state and (x_d, y_d) are the actual normal coordinates with distortion. Taking (x_d, y_d) as the initial value of (x, y), the actual (x, y) is obtained by iterating several times (for example, 20 times).
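The fixed-point iteration described above can be sketched directly from the distortion equations; the coefficient values in the check are illustrative assumptions.

```python
def distort(x, y, k, p):
    """Apply the radial + tangential model: ideal (x, y) -> distorted (x_d, y_d)."""
    r2 = x * x + y * y
    radial = 1 + k[0] * r2 + k[1] * r2 ** 2 + k[2] * r2 ** 3
    xd = x * radial + 2 * p[0] * x * y + p[1] * (r2 + 2 * x * x)
    yd = y * radial + p[0] * (r2 + 2 * y * y) + 2 * p[1] * x * y
    return xd, yd

def undistort(xd, yd, k, p, iters=20):
    """Invert the model by fixed-point iteration, starting from (x, y) = (x_d, y_d)."""
    x, y = xd, yd
    for _ in range(iters):
        r2 = x * x + y * y
        radial = 1 + k[0] * r2 + k[1] * r2 ** 2 + k[2] * r2 ** 3
        dx = 2 * p[0] * x * y + p[1] * (r2 + 2 * x * x)
        dy = p[0] * (r2 + 2 * y * y) + 2 * p[1] * x * y
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y
```

For moderate distortion coefficients the iteration contracts quickly, which is why the 20 iterations suggested in the text suffice.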
7-3) Rotating the left and right images according to the original rotation relationship of the two cameras: the rotation matrix R and translation vector t between the original left and right cameras are known, so that
X_r = RX_l + t
where X_l denotes the normal coordinates of the left camera and X_r denotes the normal coordinates of the right camera. The left image is rotated by half the angle of R in the positive direction, and the right image is rotated by half the angle of R in the negative direction.
For each pair of undistorted left and right feature points obtained in the previous step, the rotated normal coordinates are denoted X'_i^l and X'_i^r.
7-4) Restore the image after the distortion-removal rotation to the pixel coordinate system according to the formula u = KX. From the normal coordinates X'_i^l and X'_i^r of the left and right feature points updated in the previous step, the undistorted, rectified image coordinates are calculated:
u'_i^l = K_l X'_i^l
u'_i^r = K_r X'_i^r
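The half-angle rotation of steps 7-3) and 7-4) can be sketched with scipy's Rotation: convert R to a rotation vector, halve it, apply it to the normal coordinates, renormalize z to 1, and map back to pixels with u = KX. The intrinsics in the check are hypothetical.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def half_rotation(R):
    """Rotation by half the angle of R about the same axis, used to split the
    stereo rotation evenly between the left (+) and right (-) images."""
    return Rotation.from_rotvec(Rotation.from_matrix(R).as_rotvec() / 2.0).as_matrix()

def rectify_point(K, R_half, X):
    """Rotate a normal-coordinate point X = [x, y, 1], renormalize z, return pixels u = KX."""
    Xr = R_half @ X
    u = K @ (Xr / Xr[2])
    return u[:2]
```

By construction half_rotation(R) composed with itself reproduces R, so rotating the left image forward and the right image backward by this half rotation realizes the full relative rotation.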
7-5) Solving the fundamental matrix F and essential matrix E from the undistortion-corrected feature point pair coordinates of the left and right images and the intrinsic matrices of the left and right cameras: the left and right corresponding pixel point pairs u_l, u_r are related to the fundamental matrix F by:
u_r^T F u_l = 0
The point pairs are further screened using random sample consensus (RANSAC), and the coordinates of the corresponding points are then substituted into the above formula to construct a homogeneous linear equation system from which F is solved.
The relationship between the fundamental matrix and the essential matrix is:
E = K_r^T F K_l
where K_l and K_r are the intrinsic parameter matrices of the left and right cameras, respectively.
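The homogeneous linear solve for F can be sketched as the classical eight-point method. The RANSAC screening named in the text is omitted here; Hartley point normalization is added for numerical stability (an addition to, not part of, the patent's description).

```python
import numpy as np

def _normalize(pts):
    """Translate points to zero mean and scale mean distance to sqrt(2)."""
    pts = np.asarray(pts, float)
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
    return (pts - c) * s, T

def fundamental_from_points(ul, ur):
    """Linear (eight-point) estimate of F from N >= 8 correspondences,
    solving u_r^T F u_l = 0 by SVD and enforcing the rank-2 constraint."""
    pl, Tl = _normalize(ul)
    pr, Tr = _normalize(ur)
    A = np.array([[xr * xl, xr * yl, xr, yr * xl, yr * yl, yr, xl, yl, 1.0]
                  for (xl, yl), (xr, yr) in zip(pl, pr)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt2 = np.linalg.svd(F)
    S[2] = 0.0                      # rank-2 constraint
    F = Tr.T @ (U @ np.diag(S) @ Vt2) @ Tl  # undo normalization
    return F / np.linalg.norm(F)
```

In practice one would run this inside a RANSAC loop (or use an off-the-shelf robust estimator) exactly as the text prescribes, refitting on the inlier set.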
7-6) Decomposing the corrected rotation and translation relationship of the left and right cameras from the essential matrix: the relationship of the essential matrix E to the rotation R and translation t is as follows:
E = [t]×R
where [t]× denotes the cross-product (skew-symmetric) matrix of t.
Performing singular value decomposition on E gives
E = UΣV^T
Define the two matrices
Z = [0 1 0; −1 0 0; 0 0 0] and W = [0 −1 0; 1 0 0; 0 0 1]
so that
ZW = Σ
E can therefore be written in the following two forms:
(1) E = UZU^T UWV^T, letting [t]× = UZU^T and R = UWV^T;
(2) E = −UZU^T UW^T V^T, letting [t]× = −UZU^T and R = UW^T V^T.
Four pairs of R and t are obtained, and the solution with three-dimensional physical significance (the one for which reconstructed points lie in front of both cameras) is selected.
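The SVD decomposition above can be sketched as follows; it returns the four (R, t) candidates, with t recovered only up to scale, and leaves the front-of-camera (cheirality) check to the caller.

```python
import numpy as np

def decompose_essential(E):
    """Return the four (R, t) candidates of E = [t]x R via SVD (t up to scale)."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:       # force proper rotations
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]                    # left null vector of E, i.e. the epipole direction
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```

Since t^T E = t^T [t]× R = 0, the translation direction is the third left singular vector, which is why t = U[:, 2] regardless of the SVD's internal sign choices.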
7-7) The decomposed rotation and translation relationship is superposed onto the original external parameters.
The rotation matrix before distortion removal is recorded as R, and the translation vector as t = (t_x, t_y, t_z)^T; the rotation matrix calculated in step 7-6) is R', and the translation vector is t' = (t'_x, t'_y, t'_z)^T. The new R_new and t_new are then as follows:
R_new = R^{1/2} R' R^{1/2}
t_new = R^{1/2} t' · ‖t‖/‖t'‖
(the translation recovered from the essential matrix is determined only up to scale, so t' is rescaled to the original baseline length ‖t‖).

Claims (10)

1. The stereo calibration algorithm of the tele binocular camera is characterized by comprising the following steps:
1) checkerboard corner detection: shooting images of the checkerboard calibration board with the tele binocular camera, detecting and matching checkerboard corners, and acquiring the area not covered by the checkerboard;
2) scene feature point detection: shooting scene images with the tele binocular camera, and detecting and extracting feature points;
3) feature point matching: matching the feature points extracted from the left and right images according to the feature description values in the matching window, and deleting repeated matches;
4) optimizing the matching result by parabolic fitting: optimizing the matching result by a one-dimensional quadratic (parabolic) fit;
5) judging the coverage area of the feature points: dividing the image into m × n grids; when the feature points cover all the grids, performing the next step; otherwise, continuing to shoot scene images and repeating steps 2) to 4);
6) correcting the internal parameter calibration result: correcting the internal parameters using the image coordinates and world coordinates of the feature points on the calibration plate;
7) correcting the external parameter calibration result: correcting the calibration result according to the matched checkerboard corner coordinates and scene feature point coordinates.
2. The stereo calibration algorithm of the tele binocular camera according to claim 1, wherein the feature point extraction in the step 2) specifically comprises the following steps:
2-1) constructing a single-scale difference-of-Gaussians (DoG) pyramid, wherein the scale space of a scene image I(x, y) is:
L(x, y, σ) = G(x, y, σ) * I(x, y)
where G(x, y, σ) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²)) is the Gaussian kernel, σ is the scale factor, and * represents the convolution operation; L(x, y, σ) is the scale space of the image I(x, y); the difference between the scale spaces of different scales gives one layer of the difference-of-Gaussians pyramid, multiplied by a normalized scale factor λ to make the maximum value of the DoG image 255:
D(x, y, σ) = λ(L(x, y, kσ) − L(x, y, σ));
2-2) comparing each point in the obtained DoG with the pixel points in its neighborhood to judge whether the point is a local extreme point;
2-2-1) recording the obtained DoG as D, and the result of performing a dilation operation on D as D1; comparing each pixel point of D1 with the points in its 8-neighborhood, and if the pixel point is a local maximum, adding it into the candidate point set P1;
2-2-2) inverting D and then performing the dilation operation, recording the result as D2; comparing each pixel point of D2 with the points in its 8-neighborhood, and if the pixel point is a local minimum, adding it into the candidate point set P2;
2-2-3) taking the intersection of P1 and P2 to obtain P3 = P1 ∩ P2; taking the points of P3 whose DoG gray value is greater than 15 as the feature point set {P};
2-3) denoising the Gaussian feature points with a filter, filtering out noise points and edge points.
3. The stereo calibration algorithm of the tele binocular camera according to claim 1 or 2, wherein the feature point matching in step 3) specifically comprises the following steps:
3-1) dividing the image into m × n blocks; for each feature point p_i^l of the left image, finding its corresponding block B^l in the left image; denoting the right-image search range corresponding to block B^l as B^r; finding a variable describing the degree of similarity between feature points to evaluate the similarity between p_i^l and any point in B^r; when the maximum similarity is greater than the threshold t_1, regarding the corresponding point as the coarse matching point p_i^r;
3-2) when the maximum similarity s_first and the second-largest similarity s_second between p_i^l and B^r satisfy:
F(s_first, s_second) ≥ t_2
the match is retained, where t_2 is a threshold and F(s_first, s_second) describes the relationship between s_first and s_second;
after screening, matching the feature point p_j^l on the left image corresponding to p_i^r according to the methods of steps 3-1) and 3-2); when p_j^l = p_i^l is satisfied, the match (p_i^l, p_i^r) is retained.
4. The stereo calibration algorithm of the tele binocular camera according to claim 1 or 2, wherein the parabolic fitting in the step 4) optimizes the matching result, specifically comprising the steps of:
4-1) taking the feature point p_i^l of the left image as reference, optimizing the corresponding integer-pixel feature point p_i^r of the right image by parabolic fitting to obtain the corresponding sub-pixel feature point p*_i^r of the right image:
p*_i^r = p_i^r + (Δx^r, Δy^r)
where Δx^r is the sub-pixel offset in the x direction and Δy^r is the sub-pixel offset in the y direction;
4-2) taking the integer-pixel feature point p_i^r of the right image as reference, calculating the sub-pixel feature point p*_i^l = p_i^l + (Δx^l, Δy^l) of the corresponding left image according to the method of 4-1), where Δx^l is the sub-pixel offset in the x direction and Δy^l is the sub-pixel offset in the y direction;
4-3) the final matching point pair is (p*_i^l, p*_i^r).
5. The stereo calibration algorithm of the tele binocular camera according to claim 3, wherein the parabolic fitting in the step 4) optimizes the matching result, specifically comprising the steps of:
4-1) taking the feature point p_i^l of the left image as reference, optimizing the corresponding integer-pixel feature point p_i^r of the right image by parabolic fitting to obtain the corresponding sub-pixel feature point p*_i^r of the right image:
p*_i^r = p_i^r + (Δx^r, Δy^r)
where Δx^r is the sub-pixel offset in the x direction and Δy^r is the sub-pixel offset in the y direction;
4-2) taking the integer-pixel feature point p_i^r of the right image as reference, calculating the sub-pixel feature point p*_i^l = p_i^l + (Δx^l, Δy^l) of the corresponding left image according to the method of 4-1), where Δx^l is the sub-pixel offset in the x direction and Δy^l is the sub-pixel offset in the y direction;
4-3) the final matching point pair is (p*_i^l, p*_i^r).
6. The stereoscopic calibration algorithm for the tele binocular camera according to claim 1, 2 or 5, wherein the step 6) of correcting the internal parameter calibration result specifically comprises the following steps:
6-1) obtaining the original calibration result: the parameters of the two lenses are acquired from the tele binocular camera hardware itself, including the focal lengths f_x and f_y (unit: pixel) in the x and y directions and the lens principal point position u_0 and v_0;
6-2) for each checkerboard corner image Pi, carrying out the following steps:
(1) acquiring the rotation matrix R and translation vector T between the image coordinates and the calibration-plate coordinates under the checkerboard corner image Pi; according to the camera model, a point in three-dimensional world coordinates is set as M = [X, Y, Z, 1]^T and the two-dimensional camera-plane pixel coordinate as m = [u, v, 1]^T; then the homography relationship from the checkerboard plane used for calibration to the image plane is:
sm = K[R, T]M
where s is a scale factor and K is the camera intrinsic parameter matrix; setting the checkerboard plane to Z = 0, then
sm = K[r1, r2, t][X, Y, 1]^T
K[r1, r2, t] is the homography matrix H, i.e.
H = [h1, h2, h3] = λK[r1, r2, t]
calculating the homography matrix H between the checkerboard coordinate system and the image coordinate system; after the calculation of the homography matrix is completed, the rotation matrix R and translation vector T between the image coordinates and the calibration-plate coordinates are calculated as follows:
r1 = λK⁻¹h1
r2 = λK⁻¹h2
λ = 1/‖K⁻¹h1‖
r3 = r1 × r2
t = λK⁻¹h3
wherein λ represents a normalization coefficient;
(2) setting and collecting n' groups of images containing the checkerboard, each image having m' checkerboard corners; letting the projection of corner M_ij on the ith image under the obtained camera matrix be:
m̂_ij = m̂(K + ΔK, R_i + ΔR_i, t_i + Δt_i, M_ij)
where R_i and t_i are the rotation matrix and translation vector of the ith image, ΔR_i and Δt_i are the variations of R_i and t_i, respectively, K is the intrinsic parameter matrix with variation ΔK, and m̂_ij represents the projection point of the corner M_ij on the obtained image under the camera matrix; then the probability density function of the corner m_ij is:
f(m_ij) = (1/(2πσ²)) exp(−‖m_ij − m̂_ij‖²/(2σ²))
constructing a likelihood function:
L = ∏_{i=1}^{n'} ∏_{j=1}^{m'} f(m_ij)
letting L take the maximum value, i.e., letting the following expression be minimal, solved iteratively with the Levenberg-Marquardt algorithm for multi-parameter nonlinear system optimization problems:
∑_{i=1}^{n'} ∑_{j=1}^{m'} ‖m_ij − m̂(K + ΔK, R_i + ΔR_i, t_i + Δt_i, M_ij)‖²
7. The stereo calibration algorithm of the tele binocular camera according to claim 3, wherein the step 6) of correcting the internal parameter calibration result specifically comprises the following steps:
6-1) obtaining the original calibration result: the parameters of the two lenses are acquired from the tele binocular camera hardware itself, including the focal lengths f_x and f_y (unit: pixel) in the x and y directions and the lens principal point position u_0 and v_0;
6-2) for each checkerboard corner image Pi, carrying out the following steps:
(1) acquiring the rotation matrix R and translation vector T between the image coordinates and the calibration-plate coordinates under the checkerboard corner image Pi; according to the camera model, a point in three-dimensional world coordinates is set as M = [X, Y, Z, 1]^T and the two-dimensional camera-plane pixel coordinate as m = [u, v, 1]^T; then the homography relationship from the checkerboard plane used for calibration to the image plane is:
sm = K[R, T]M
where s is a scale factor and K is the camera intrinsic parameter matrix; setting the checkerboard plane to Z = 0, then
sm = K[r1, r2, t][X, Y, 1]^T
K[r1, r2, t] is the homography matrix H, i.e.
H = [h1, h2, h3] = λK[r1, r2, t]
calculating the homography matrix H between the checkerboard coordinate system and the image coordinate system; after the calculation of the homography matrix is completed, the rotation matrix R and translation vector T between the image coordinates and the calibration-plate coordinates are calculated as follows:
r1 = λK⁻¹h1
r2 = λK⁻¹h2
λ = 1/‖K⁻¹h1‖
r3 = r1 × r2
t = λK⁻¹h3
wherein λ represents a normalization coefficient;
(2) setting and collecting n' groups of images containing the checkerboard, each image having m' checkerboard corners; letting the projection of corner M_ij on the ith image under the obtained camera matrix be:
m̂_ij = m̂(K + ΔK, R_i + ΔR_i, t_i + Δt_i, M_ij)
where R_i and t_i are the rotation matrix and translation vector of the ith image, ΔR_i and Δt_i are the variations of R_i and t_i, respectively, K is the intrinsic parameter matrix with variation ΔK, and m̂_ij represents the projection point of the corner M_ij on the obtained image under the camera matrix; then the probability density function of the corner m_ij is:
f(m_ij) = (1/(2πσ²)) exp(−‖m_ij − m̂_ij‖²/(2σ²))
constructing a likelihood function:
L = ∏_{i=1}^{n'} ∏_{j=1}^{m'} f(m_ij)
letting L take the maximum value, i.e., letting the following expression be minimal, solved iteratively with the Levenberg-Marquardt algorithm for multi-parameter nonlinear system optimization problems:
∑_{i=1}^{n'} ∑_{j=1}^{m'} ‖m_ij − m̂(K + ΔK, R_i + ΔR_i, t_i + Δt_i, M_ij)‖²
8. The stereo calibration algorithm of the tele binocular camera according to claim 4, wherein the step 6) of correcting the internal parameter calibration result specifically comprises the following steps:
6-1) obtaining the original calibration result: the parameters of the two lenses are acquired from the tele binocular camera hardware itself, including the focal lengths f_x and f_y (unit: pixel) in the x and y directions and the lens principal point position u_0 and v_0;
6-2) for each checkerboard corner image Pi, carrying out the following steps:
(1) acquiring the rotation matrix R and translation vector T between the image coordinates and the calibration-plate coordinates under the checkerboard corner image Pi; according to the camera model, a point in three-dimensional world coordinates is set as M = [X, Y, Z, 1]^T and the two-dimensional camera-plane pixel coordinate as m = [u, v, 1]^T; then the homography relationship from the checkerboard plane used for calibration to the image plane is:
sm = K[R, T]M
where s is a scale factor and K is the camera intrinsic parameter matrix; setting the checkerboard plane to Z = 0, then
sm = K[r1, r2, t][X, Y, 1]^T
K[r1, r2, t] is the homography matrix H, i.e.
H = [h1, h2, h3] = λK[r1, r2, t]
calculating the homography matrix H between the checkerboard coordinate system and the image coordinate system; after the calculation of the homography matrix is completed, the rotation matrix R and translation vector T between the image coordinates and the calibration-plate coordinates are calculated as follows:
r1 = λK⁻¹h1
r2 = λK⁻¹h2
λ = 1/‖K⁻¹h1‖
r3 = r1 × r2
t = λK⁻¹h3
wherein λ represents a normalization coefficient;
(2) setting and collecting n' groups of images containing the checkerboard, each image having m' checkerboard corners; letting the projection of corner M_ij on the ith image under the obtained camera matrix be:
m̂_ij = m̂(K + ΔK, R_i + ΔR_i, t_i + Δt_i, M_ij)
where R_i and t_i are the rotation matrix and translation vector of the ith image, ΔR_i and Δt_i are the variations of R_i and t_i, respectively, K is the intrinsic parameter matrix with variation ΔK, and m̂_ij represents the projection point of the corner M_ij on the obtained image under the camera matrix; then the probability density function of the corner m_ij is:
f(m_ij) = (1/(2πσ²)) exp(−‖m_ij − m̂_ij‖²/(2σ²))
constructing a likelihood function:
L = ∏_{i=1}^{n'} ∏_{j=1}^{m'} f(m_ij)
letting L take the maximum value, i.e., letting the following expression be minimal, solved iteratively with the Levenberg-Marquardt algorithm for multi-parameter nonlinear system optimization problems:
∑_{i=1}^{n'} ∑_{j=1}^{m'} ‖m_ij − m̂(K + ΔK, R_i + ΔR_i, t_i + Δt_i, M_ij)‖²
9. the stereoscopic calibration algorithm of the tele binocular camera according to claim 1, 2, 5, 7 or 8, wherein the step of correcting the external reference calibration result comprises the following steps:
7-1) calculating matched left and right characteristic points
Figure FDA0002283950520000094
Coordinates in a corresponding normal coordinate system
The pixel coordinate system takes the upper left corner of the picture as an origin, and the x axis and the y axis of the pixel coordinate system are respectively parallel to the x axis and the y axis of the image coordinate system; the unit of the pixel coordinate system is a pixel; the normal coordinate system takes the optical center of the camera as the origin of the image coordinate system, and the distance from the optical center to the image plane is scaled to 1; the relationship of pixel coordinates to normal coordinates is as follows:
u=KX
Figure FDA0002283950520000095
wherein ,
Figure FDA0002283950520000096
pixel coordinates representing an image;
Figure FDA0002283950520000097
representing the internal reference matrix of the camera, fx and fyRepresenting the focal lengths of the image in the x-and y-directions, respectively, in pixels, (c)x,cy) Representing a position of a camera stagnation point;
Figure FDA0002283950520000098
is a coordinate in a normal coordinate system; the pixel coordinate system of the known image and the normal coordinate system corresponding to the pixel points calculated by the camera's internal parameters, i.e.
X=K-1u
Matching feature points for each pair of left and right cameras
Figure FDA0002283950520000101
Their normal coordinate system is:
Figure FDA0002283950520000102
Figure FDA0002283950520000103
wherein ,
Figure FDA0002283950520000104
and
Figure FDA0002283950520000105
are respectively
Figure FDA0002283950520000106
And
Figure FDA0002283950520000107
the coordinates of the pixels of (a) and (b),
Figure FDA0002283950520000108
and
Figure FDA0002283950520000109
are respectively
Figure FDA00022839505200001010
And
Figure FDA00022839505200001011
normal coordinate of, Kl and KrThe internal reference matrixes of the left camera and the right camera respectively;
7-2) removing image distortion
Calculating the undistorted normal coordinates of the left and right image feature points according to the normal coordinates of the left and right image feature points and respective distortion coefficients of the left and right cameras;
the image radial distortion is the position deviation of image pixel points along the radial direction by taking a distortion center as a central point, so that the image formed in the image is deformed; the radial distortion is expressed as follows:
xd=x(1+k1r2+k2r4+k3r6)
yd=y(1+k1r2+k2r4+k3r6)
wherein ,r2=x2+y2,k1、k2、k3Is a radial distortion parameter;
tangential distortion is due to imperfections in the camera fabrication, causing the lens itself to be non-parallel to the image plane, quantitatively described as:
xd=x+(2p1xy+p2(r2+2x2))
yd=y+(p1(r2+2y2)+2p2xy)
wherein ,p1、p2Is a tangential distortion coefficient;
in summary, the coordinate relationship before and after distortion is as follows:
xd=x(1+k1r2+k2r4+k3r6)+(2p1xy+p2(r2+2x2))
yd=y(1+k1r2+k2r4+k3r6)+(p1(r2+2y2)+2p2xy)
wherein (x, y) is a normal coordinate in an ideal state, (x)d,yd) Is the true coordinate with distortion in reality; with (x)d,yd) As the initial value of (x, y), the actual (x, y) is obtained through iterative calculation;
7-3) rotating the left and right images according to the original rotation relationship of the two cameras: the rotation matrix R and translation vector t between the original left and right cameras are known, so that
Xr=RXl+t
wherein ,XlNormal coordinates, X, representing the left camerarNormal coordinates representing the right camera; rotating the left image by a half angle in the positive direction of R, and rotating the right image by a half angle in the negative direction of R;
for each pair of left and right characteristic points obtained in the previous step after distortion removal
Figure FDA0002283950520000111
Normal coordinates of
Figure FDA0002283950520000112
7-4) reducing the image after distortion removal rotation to a pixel coordinate system according to a formula u-KX; according to the updated left and right characteristic points of the last step
Figure FDA0002283950520000113
Normal coordinates of
Figure FDA0002283950520000114
Calculating distortion-removed corrected image coordinates:
Figure FDA0002283950520000115
Figure FDA0002283950520000116
7-5) solving a basic matrix F and an essential matrix E according to the characteristic point pair coordinates after the distortion removal correction of the left and right images and the internal reference matrixes of the left and right cameras: left and right corresponding pixel point pairs ul、urThe relationship to the basis matrix F is:
Figure FDA0002283950520000117
further screening the point pairs by using random sampling consistency, substituting corresponding point coordinates into the formula, and constructing a homogeneous linear equation set to solve F;
the relationship between the base matrix and the essence matrix is:
Figure FDA0002283950520000118
wherein ,Kl、KrRespectively are internal reference matrixes of the left camera and the right camera;
7-6) decomposing the left and right camera rotation and translation relationships after correction from the essence matrix: the relationship of the essential matrix E to the rotation R and translation t is as follows:
E=[t]×R
wherein [t]×A cross-product matrix representing t;
and D, performing singular value decomposition on the E to obtain:
Figure FDA0002283950520000119
two matrices are defined:
Figure FDA0002283950520000121
and
Figure FDA0002283950520000122
ZW=Σ
so E is written in the following two forms
(1)E=UZUTUWVT
Let [ t)]×=UZUT,R=UWVT
(2)E=-UZUTUWTVT
Let [ t)]×=-UZUT,R=UWTVT
Obtaining four pairs of R and t, and selecting a solution with three-dimensional significance;
7-7) Superimpose the decomposed rotation and translation onto the original extrinsic parameters.
Denote the rotation matrix before rectification as R and the translation vector as t = (t_x, t_y, t_z)^T; denote the rotation matrix calculated in step 6-2) as R' and the translation vector as t' = (t'_x, t'_y, t'_z)^T. The new R_new and t_new are then:
R_new = R^{1/2} R' R^{1/2}
t_new = R^{1/2} t'
10. The stereo calibration algorithm of the long-focus binocular camera according to claim 6, wherein correcting the extrinsic calibration result specifically comprises the following steps:
7-1) Calculate the coordinates of the matched left and right feature points (p_l^i, p_r^i) in the corresponding normal coordinate system.
The pixel coordinate system takes the upper-left corner of the image as its origin, with its x and y axes parallel to the x and y axes of the image coordinate system; its unit is the pixel. The normal coordinate system takes the optical center of the camera as the origin of the image coordinate system and scales the distance from the optical center to the image plane to 1. Pixel coordinates and normal coordinates are related as follows:
u = KX
where
u = (u, v, 1)^T
is the homogeneous pixel coordinate of the image;
K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]
is the intrinsic matrix of the camera, with f_x and f_y the focal lengths of the image in the x and y directions in pixels, and (c_x, c_y) the position of the principal point;
X = (x, y, 1)^T
is the coordinate in the normal coordinate system. Given the pixel coordinates of the image and the intrinsic parameters of the camera, the normal coordinates of the corresponding pixel points are calculated as
X = K^{-1} u
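Outside the claim language, the conversion between pixel and normal coordinates can be sketched as below; the intrinsic values are illustrative placeholders, not parameters from the patent.

```python
import numpy as np

# Illustrative intrinsic matrix (f_x = f_y = 800 px, principal point (320, 240));
# these values are made up for the example, not taken from the patent.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def pixel_to_normal(u, K):
    """X = K^-1 u: homogeneous pixel coordinates -> normal coordinates."""
    return np.linalg.inv(K) @ u

def normal_to_pixel(X, K):
    """u = K X: normal coordinates -> homogeneous pixel coordinates."""
    return K @ X

u = np.array([480.0, 400.0, 1.0])
X = pixel_to_normal(u, K)                      # -> (0.2, 0.2, 1.0)
assert np.allclose(normal_to_pixel(X, K), u)   # round trip recovers the pixel
```

The round trip holds because the normal plane is simply the image plane rescaled so the optical-center distance is 1, as the claim states.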
For each pair of matched feature points of the left and right cameras (p_l^i, p_r^i), the normal coordinates are:
x_l^i = K_l^{-1} u_l^i
x_r^i = K_r^{-1} u_r^i
where u_l^i and u_r^i are the pixel coordinates of p_l^i and p_r^i respectively, x_l^i and x_r^i are their normal coordinates, and K_l and K_r are the intrinsic matrices of the left and right cameras respectively;
7-2) removing image distortion
Calculate the undistorted normal coordinates of the left and right image feature points from their normal coordinates and the respective distortion coefficients of the two cameras;
Radial distortion is a positional deviation of image pixels along the radial direction centered on the distortion center, which deforms the imaged content. It is expressed as follows:
x_d = x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
y_d = y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
where r^2 = x^2 + y^2 and k_1, k_2, k_3 are the radial distortion parameters;
Tangential distortion arises from manufacturing imperfections that leave the lens not exactly parallel to the image plane; it is described quantitatively as:
x_d = x + (2 p_1 x y + p_2 (r^2 + 2 x^2))
y_d = y + (p_1 (r^2 + 2 y^2) + 2 p_2 x y)
where p_1 and p_2 are the tangential distortion coefficients;
Combining the two, the coordinates before and after distortion are related as follows:
x_d = x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + (2 p_1 x y + p_2 (r^2 + 2 x^2))
y_d = y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + (p_1 (r^2 + 2 y^2) + 2 p_2 x y)
where (x, y) is the ideal undistorted normal coordinate and (x_d, y_d) is the real coordinate with distortion. Taking (x_d, y_d) as the initial value of (x, y), the true (x, y) is obtained by iterative calculation;
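The iteration described above can be sketched as a fixed-point loop: start from (x_d, y_d) and repeatedly re-solve for (x, y). The distortion coefficients in the check are made-up small values for illustration.

```python
import numpy as np

def undistort_normal(xd, yd, k1, k2, k3, p1, p2, iters=20):
    """Invert the distortion model by fixed-point iteration, starting from (xd, yd)."""
    x, y = xd, yd                                   # initial value per the claim
    for _ in range(iters):
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
        # tangential terms evaluated at the current estimate of (x, y)
        dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
        dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
        x = (xd - dx) / radial                      # solve x from x_d = x*radial + dx
        y = (yd - dy) / radial
    return x, y

def distort_normal(x, y, k1, k2, k3, p1, p2):
    """Forward model: ideal (x, y) -> distorted (x_d, y_d)."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return xd, yd

# round trip with small, made-up coefficients
coeffs = (0.1, 0.01, 0.0, 1e-3, 1e-3)
xd, yd = distort_normal(0.2, -0.1, *coeffs)
x, y = undistort_normal(xd, yd, *coeffs)
assert abs(x - 0.2) < 1e-9 and abs(y + 0.1) < 1e-9
```

For the small distortions typical of normal coordinates near the principal point, the loop converges in a handful of iterations.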
7-3) Rotate the left and right images according to the original rotation relationship between the two cameras. The rotation matrix R and translation vector t between the original left and right cameras are known, such that
X_r = R X_l + t
where X_l and X_r are the normal coordinates of the left and right cameras respectively. Rotate the left image by half the rotation angle in the positive direction of R, and the right image by half the angle in the negative direction;
Apply these rotations to each pair of undistorted left and right feature points (p_l^i, p_r^i) obtained in the previous step, updating their normal coordinates x_l^i and x_r^i;
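The half-angle rotation used here can be computed through the axis-angle representation; a minimal numpy sketch, assuming the rotation angle is below π (the usual case for a roughly rectified stereo rig):

```python
import numpy as np

def rotvec_from_matrix(R):
    """Rotation matrix -> axis-angle vector (assumes rotation angle < pi)."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(angle, 0.0):
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return angle * axis / np.linalg.norm(axis)

def matrix_from_rotvec(v):
    """Axis-angle vector -> rotation matrix (Rodrigues' formula)."""
    angle = np.linalg.norm(v)
    if np.isclose(angle, 0.0):
        return np.eye(3)
    k = v / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def half_rotation(R):
    """R^(1/2): rotate about the same axis by half the angle."""
    return matrix_from_rotvec(0.5 * rotvec_from_matrix(R))

# sanity check: the two half rotations compose back to R
R = matrix_from_rotvec(np.array([0.1, -0.2, 0.3]))
Rh = half_rotation(R)
assert np.allclose(Rh @ Rh, R)
```

Rotating the left points by R^{1/2} and the right points by its transpose (R^{-1/2}) realizes the "half angle in the positive/negative direction" of this step.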
7-4) Restore the undistorted, rotated image to the pixel coordinate system according to the formula u = KX. From the left and right feature points (p_l^i, p_r^i) updated in the last step and their normal coordinates x_l^i and x_r^i, calculate the undistorted, rectified image coordinates:
u_l^i = K_l x_l^i
u_r^i = K_r x_r^i
7-5) Solve the fundamental matrix F and the essential matrix E from the undistorted, rectified feature point pairs of the left and right images and the intrinsic matrices of the left and right cameras. The corresponding left and right pixel points u_l and u_r are related to the fundamental matrix F by:
u_r^T F u_l = 0
Further screen the point pairs with random sample consensus (RANSAC), substitute the corresponding point coordinates into the above formula, and solve F from the resulting homogeneous linear system;
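Omitting the RANSAC loop, the homogeneous system built from u_r^T F u_l = 0 can be solved by SVD (the classical eight-point scheme); a sketch with synthetic, noise-free correspondences. `solve_F` is an illustrative helper, not a routine named in the patent; since normal coordinates are fed in below, the recovered matrix plays the role of the essential matrix.

```python
import numpy as np

def solve_F(ul, ur):
    """Least-squares F from N >= 8 correspondences via u_r^T F u_l = 0.
    ul, ur: (N, 2) arrays of matched point coordinates."""
    rows = []
    for (xl, yl), (xr, yr) in zip(ul, ur):
        rows.append([xr * xl, xr * yl, xr, yr * xl, yr * yl, yr, xl, yl, 1.0])
    A = np.asarray(rows)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)        # null-space vector reshaped to 3x3
    U, S, Vt = np.linalg.svd(F)     # enforce the rank-2 constraint
    S[2] = 0.0
    return U @ np.diag(S) @ Vt

# synthetic noise-free check: project random 3D points into two views
rng = np.random.default_rng(0)
a = 0.1
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0,        0.0,       1.0]])
t = np.array([1.0, 0.2, 0.1])
P = rng.uniform([-1, -1, 4], [1, 1, 8], size=(12, 3))
ul = P[:, :2] / P[:, 2:]            # left normal coordinates
Q = P @ R.T + t
ur = Q[:, :2] / Q[:, 2:]            # right normal coordinates
F = solve_F(ul, ur)
res = [abs(np.array([xr, yr, 1.0]) @ F @ np.array([xl, yl, 1.0]))
       for (xl, yl), (xr, yr) in zip(ul, ur)]
assert max(res) < 1e-8              # epipolar constraint holds for all pairs
```

In practice the RANSAC step of the claim would wrap `solve_F` over random 8-point subsets and keep the model with the most inliers.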
The fundamental matrix and the essential matrix are related by:
E = K_r^T F K_l
where K_l and K_r are the intrinsic matrices of the left and right cameras respectively;
7-6) Decompose the rectified rotation and translation between the left and right cameras from the essential matrix. The essential matrix E is related to the rotation R and translation t by:
E = [t]_× R
where [t]_× is the cross-product (skew-symmetric) matrix of t;
Performing singular value decomposition on E gives:
E = U Σ V^T,  Σ = diag(1, 1, 0)
Define two matrices
W = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]  and  Z = [[0, 1, 0], [-1, 0, 0], [0, 0, 0]]
such that ZW = Σ;
so E can be written in the following two forms:
(1) E = U Z U^T U W V^T, letting [t]_× = U Z U^T and R = U W V^T;
(2) E = -U Z U^T U W^T V^T, letting [t]_× = -U Z U^T and R = U W^T V^T;
This yields four pairs of R and t; select the solution that is physically meaningful in three dimensions, i.e. the one for which reconstructed points lie in front of both cameras;
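The four (R, t) candidates of this factorization can be enumerated as below; choosing among them then requires the cheirality test (triangulated points in front of both cameras), which is omitted here. This is the standard SVD decomposition of an essential matrix, not code from the patent.

```python
import numpy as np

W = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])

def skew(t):
    """Cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def decompose_E(E):
    """Four candidate (R, t) pairs; t is recovered only up to sign and scale."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:        # keep U and V proper rotations
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    t = U[:, 2]                     # left null direction of E
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

# sanity check: the true (R, t) is among the candidates
a = 0.2
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0,       1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([1.0, 0.0, 0.0])
E = skew(t_true) @ R_true
found = any(np.allclose(Rc, R_true, atol=1e-9) and
            (np.allclose(tc, t_true, atol=1e-9) or np.allclose(tc, -t_true, atol=1e-9))
            for Rc, tc in decompose_E(E))
assert found
```

Flipping the sign of U or Vt only changes E by a global sign, which leaves the rotation candidates valid and both signs of t already in the candidate set.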
7-7) Superimpose the decomposed rotation and translation onto the original extrinsic parameters.
Denote the rotation matrix before rectification as R and the translation vector as t = (t_x, t_y, t_z)^T; denote the rotation matrix calculated in step 6-2) as R' and the translation vector as t' = (t'_x, t'_y, t'_z)^T. The new R_new and t_new are then:
R_new = R^{1/2} R' R^{1/2}
t_new = R^{1/2} t'
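The update can be sketched as follows. R^{1/2} is taken through axis-angle halving; note that the t_new formula in the original claim is an image placeholder, so t_new = R^{1/2} t' below is an interpretation consistent with the half-rotation rectification, not a verbatim formula from the patent.

```python
import numpy as np

def half_rotation(R):
    """R^(1/2) via axis-angle: same axis, half the angle (angle < pi assumed)."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(angle, 0.0):
        return np.eye(3)
    k = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    h = 0.5 * angle
    return np.eye(3) + np.sin(h) * K + (1.0 - np.cos(h)) * (K @ K)

def update_extrinsics(R, t, R_prime, t_prime):
    """Fold the residual (R', t') measured after rectification back into (R, t).
    t_new = R^(1/2) t' is an assumed reading of the claim's image formula."""
    Rh = half_rotation(R)
    return Rh @ R_prime @ Rh, Rh @ t_prime

# consistency check: if the measured residual is exactly what the original
# extrinsics predict (R' = I, t' = R^(-1/2) t), the update reproduces (R, t)
a = 0.3
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0,        0.0,       1.0]])
t = np.array([0.5, 0.1, 0.0])
Rh = half_rotation(R)
R_new, t_new = update_extrinsics(R, t, np.eye(3), Rh.T @ t)
assert np.allclose(R_new, R) and np.allclose(t_new, t)
```

The check mirrors the derivation of R_new = R^{1/2} R' R^{1/2}: since the left and right views were each rotated by half of R, the residual must be conjugated by R^{1/2} to return to the original frame.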
CN201911152607.0A 2019-11-22 2019-11-22 Stereo calibration algorithm of long-focus binocular camera Active CN110969668B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911152607.0A CN110969668B (en) 2019-11-22 2019-11-22 Stereo calibration algorithm of long-focus binocular camera


Publications (2)

Publication Number Publication Date
CN110969668A true CN110969668A (en) 2020-04-07
CN110969668B CN110969668B (en) 2023-05-02

Family

ID=70031189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911152607.0A Active CN110969668B (en) 2019-11-22 2019-11-22 Stereo calibration algorithm of long-focus binocular camera

Country Status (1)

Country Link
CN (1) CN110969668B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150093042A1 (en) * 2012-06-08 2015-04-02 Huawei Technologies Co., Ltd. Parameter calibration method and apparatus
CN104574419A (en) * 2015-01-28 2015-04-29 深圳市安健科技有限公司 Lens distortion parameter calibration method and system
CN105118055A (en) * 2015-08-11 2015-12-02 北京电影学院 Camera positioning correction calibration method and system
CN108876749A (en) * 2018-07-02 2018-11-23 南京汇川工业视觉技术开发有限公司 A kind of lens distortion calibration method of robust


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU Fengcai et al., "Stereo Calibration Method for Binocular Vision" *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113676696A (en) * 2020-05-14 2021-11-19 杭州萤石软件有限公司 Target area monitoring method and system
CN111862229A (en) * 2020-06-05 2020-10-30 北京中科慧眼科技有限公司 Binocular camera adjusting method and device
CN113450416B (en) * 2020-06-15 2024-03-15 天津工业大学 TCSC method applied to three-dimensional calibration of three-dimensional camera
CN113450416A (en) * 2020-06-15 2021-09-28 天津工业大学 TCSC (thyristor controlled series) method applied to three-dimensional calibration of three-view camera
CN112066876A (en) * 2020-08-27 2020-12-11 武汉大学 Method for rapidly measuring object size by using mobile phone
CN112465913A (en) * 2020-11-18 2021-03-09 广东博智林机器人有限公司 Binocular camera-based correction method and device
CN112465916A (en) * 2020-11-27 2021-03-09 浙江光珀智能科技有限公司 RGBD binocular calibration method and system based on full-view-field plane calibration plate
CN113077519A (en) * 2021-03-18 2021-07-06 中国电子科技集团公司第五十四研究所 Multi-phase external parameter automatic calibration method based on human skeleton extraction
CN113012238B (en) * 2021-04-09 2024-04-16 南京星顿医疗科技有限公司 Method for quick calibration and data fusion of multi-depth camera
CN113012238A (en) * 2021-04-09 2021-06-22 南京星顿医疗科技有限公司 Method for rapid calibration and data fusion of multi-depth camera
CN113409399A (en) * 2021-06-10 2021-09-17 武汉库柏特科技有限公司 Dual-camera combined calibration method, system and device
CN113884017A (en) * 2021-08-19 2022-01-04 中国电力科学研究院有限公司 Insulator non-contact deformation detection method and system based on trinocular vision
CN113884017B (en) * 2021-08-19 2024-04-05 中国电力科学研究院有限公司 Non-contact deformation detection method and system for insulator based on three-eye vision
WO2023045147A1 (en) * 2021-09-24 2023-03-30 上海闻泰电子科技有限公司 Method and system for calibrating binocular camera, and electronic device and storage medium
CN113706635B (en) * 2021-10-28 2022-04-01 中国测绘科学研究院 Long-focus camera calibration method based on point feature and line feature fusion
CN113706635A (en) * 2021-10-28 2021-11-26 中国测绘科学研究院 Long-focus camera calibration method based on point feature and line feature fusion
CN113983934B (en) * 2021-11-15 2022-11-01 西安交通大学 Copper-clad plate online high-speed dimension measurement method and device based on double-line-array camera
CN113983934A (en) * 2021-11-15 2022-01-28 西安交通大学 Copper-clad plate online high-speed dimension measurement method and device based on double-line-array camera
CN114862969A (en) * 2022-05-27 2022-08-05 国网江苏省电力有限公司电力科学研究院 Onboard holder camera angle self-adaptive adjusting method and device of intelligent inspection robot
WO2024002291A1 (en) * 2022-07-01 2024-01-04 华为技术有限公司 Vein image collection apparatus and thinning design method therefor, and terminal device
CN116704047A (en) * 2023-08-01 2023-09-05 安徽云森物联网科技有限公司 Pedestrian ReID-based calibration method for monitoring camera equipment position
CN116704047B (en) * 2023-08-01 2023-10-27 安徽云森物联网科技有限公司 Pedestrian ReID-based calibration method for monitoring camera equipment position
CN117036506A (en) * 2023-08-25 2023-11-10 浙江大学海南研究院 Binocular camera calibration method
CN117036506B (en) * 2023-08-25 2024-05-10 浙江大学海南研究院 Binocular camera calibration method

Also Published As

Publication number Publication date
CN110969668B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN110969668B (en) Stereo calibration algorithm of long-focus binocular camera
CN110969670B (en) Multispectral camera dynamic three-dimensional calibration method based on significant features
CN110969669B (en) Visible light and infrared camera combined calibration method based on mutual information registration
CN110969667B (en) Multispectral camera external parameter self-correction algorithm based on edge characteristics
CN112802124B (en) Calibration method and device for multiple stereo cameras, electronic equipment and storage medium
CN111429533B (en) Camera lens distortion parameter estimation device and method
CN109118544B (en) Synthetic aperture imaging method based on perspective transformation
CN111080709B (en) Multispectral stereo camera self-calibration algorithm based on track feature registration
CN109727278B (en) Automatic registration method for airborne LiDAR point cloud data and aerial image
CN110992409B (en) Multispectral stereo camera dynamic registration method based on Fourier transform registration
CN110910456B (en) Three-dimensional camera dynamic calibration method based on Harris angular point mutual information matching
CN113920205B (en) Calibration method of non-coaxial camera
CN110956661A (en) Method for calculating dynamic pose of visible light and infrared camera based on bidirectional homography matrix
CN113012234B (en) High-precision camera calibration method based on plane transformation
CN109974618B (en) Global calibration method of multi-sensor vision measurement system
CN116433737A (en) Method and device for registering laser radar point cloud and image and intelligent terminal
CN110136048B (en) Image registration method and system, storage medium and terminal
CN110880191B (en) Infrared stereo camera dynamic external parameter calculation method based on histogram equalization
Perdigoto et al. Calibration of mirror position and extrinsic parameters in axial non-central catadioptric systems
CN113793266A (en) Multi-view machine vision image splicing method, system and storage medium
CN112929626A (en) Three-dimensional information extraction method based on smartphone image
CN114998448A (en) Method for calibrating multi-constraint binocular fisheye camera and positioning space point
CN110910457B (en) Multispectral three-dimensional camera external parameter calculation method based on angular point characteristics
CN109754435B (en) Camera online calibration method based on small target fuzzy image
CN115409898A (en) High-precision camera calibration method and device based on special annular calibration plate

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant