CN114399532A - Camera position and posture determining method and device - Google Patents


Info

Publication number
CN114399532A
CN114399532A
Authority
CN
China
Prior art keywords
camera
determining
key frames
points
pose
Prior art date
Legal status
Pending
Application number
CN202210012434.8A
Other languages
Chinese (zh)
Inventor
赵德力
彭登
陶永康
傅志刚
曾阳露
Current Assignee
Guangdong Huitian Aerospace Technology Co Ltd
Original Assignee
Guangdong Huitian Aerospace Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Huitian Aerospace Technology Co Ltd
Priority to CN202210012434.8A
Publication of CN114399532A
Priority to PCT/CN2022/132927 (published as WO2023130842A1)
Legal status: Pending

Classifications

    • G06T 7/246: Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06F 18/23: Pattern recognition; Clustering techniques
    • G06T 5/20: Image enhancement or restoration by the use of local operators
    • G06T 7/11: Image analysis; Segmentation; Region-based segmentation
    • G06T 2207/20021: Indexing scheme for image analysis; Dividing image into blocks, subimages or windows

Abstract

The embodiment of the invention provides a camera pose determination method and device. The method comprises: acquiring a plurality of image frames acquired by a camera; dividing the non-key frames into a plurality of image blocks, extracting corner points meeting a preset feature point condition from the image blocks as feature points, and determining a first camera relative pose by an optical flow method according to the matched feature points between adjacent non-key frames; extracting feature points from the key frames, and determining a second camera relative pose by a feature point matching method according to the matched feature points between adjacent key frames; and determining an optimized camera pose corresponding to the current key frame according to the first camera relative poses between a plurality of adjacent non-key frames and the second camera relative pose between adjacent key frames. The embodiment of the invention improves pose accuracy and strikes a balance between pose accuracy and processing efficiency.

Description

Camera position and posture determining method and device
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a camera pose determination method and a camera pose determination device.
Background
A visual odometer estimates ego-motion from visual images and is widely used for positioning mobile devices in unknown environments. A visual odometry architecture is generally divided into a front end and a back end: the front end is responsible for constructing multi-view geometric constraints, and the back end performs nonlinear least-squares optimization of the reprojection error produced by the front end.
Commonly used front ends fall into two visual methods: the feature point matching method and the optical flow method. The feature point matching method uses matched feature points to build co-visibility constraints across multiple image frames; it is accurate, but because it involves feature point extraction as well as descriptor computation and matching, it has a large computational cost and is prone to matching failure. The optical flow method skips descriptor computation and matching and tracks feature points directly through optical flow, greatly reducing the computational cost, but its accuracy is relatively low and tracking is easily lost. In essence, both the feature point method and the optical flow method rely on the stability of feature points in the scene, so in weakly textured scenes with sparse feature points both front-end constraint methods suffer from stability problems.
Disclosure of Invention
In view of the above problems, embodiments of the present invention are proposed to provide a camera pose determination method and a corresponding camera pose determination apparatus that overcome or at least partially solve the above problems.
In order to solve the above problem, an embodiment of the present invention discloses a camera pose determining method, including:
acquiring a plurality of image frames acquired by a camera; the plurality of image frames comprise key frames and non-key frames;
dividing the non-key frames into a plurality of image blocks, extracting angular points meeting preset feature point conditions from the image blocks as feature points, and determining the relative pose of a first camera by adopting an optical flow method according to matched feature points between adjacent non-key frames;
extracting feature points from the key frames, and determining the relative pose of the second camera by using a feature point matching method according to the matched feature points between adjacent key frames;
and determining an optimized camera pose corresponding to the current key frame according to a first camera relative pose between a plurality of adjacent non-key frames and a second camera relative pose between the adjacent key frames.
Optionally, the method further comprises:
extracting line segments meeting preset characteristic line conditions from the key frames to serve as characteristic lines;
determining a third camera pose between the adjacent key frames according to the matched feature lines between the adjacent key frames;
determining an optimized camera pose corresponding to a current keyframe from a first camera relative pose between a plurality of the neighboring non-keyframes and a second camera relative pose between the neighboring keyframes, comprising:
and determining the optimized camera pose corresponding to the current key frame according to the first camera relative pose among a plurality of adjacent non-key frames, the second camera relative pose among the adjacent key frames and the third camera pose among the adjacent key frames.
Optionally, the determining an optimized camera pose corresponding to the current key frame according to a first camera relative pose between a plurality of adjacent non-key frames, a second camera relative pose between the adjacent key frames, and a third camera pose between the adjacent key frames includes:
performing Kalman filtering fusion according to a first camera relative pose between a plurality of adjacent non-key frames and a second camera relative pose between the adjacent key frames, and determining an estimated camera pose corresponding to the current key frame;
determining a feature point re-projection error for the current key frame according to the estimated camera pose corresponding to the current key frame;
determining a feature line reprojection error for the current keyframe from a third camera pose between the adjacent keyframes;
and determining the optimized camera pose corresponding to the current key frame based on the feature point re-projection error and the feature line re-projection error aiming at the current key frame.
Optionally, the extracting, as the feature point, an angular point that satisfies a preset feature point condition from the plurality of image blocks includes:
performing low-pass filtering processing on the plurality of image blocks;
respectively calculating angular response values of pixel points in each image block subjected to low-pass filtering;
sorting the angular response values corresponding to the pixel points in each image block respectively, and selecting a preset number of pixel points as initial angular points according to a sorting result;
determining the dispersion degree of the initial corner points in each image block;
setting a corresponding screening proportion for each image block according to the dispersion degree and the angular response value of the initial angular points in each image block;
screening candidate angular points from the corresponding initial angular points according to the screening proportion of each image block;
and a random sampling consistency algorithm is used for screening the target corner points from the candidate corner points.
Optionally, the determining the degree of dispersion of the initial corner points in each image block includes:
clustering the initial angular points of the image blocks to obtain a clustering center;
and determining the pixel distance from each initial corner point to the clustering center, and determining the dispersion degree according to the pixel distance from each initial corner point to the clustering center.
Optionally, the dispersion degree is the sum of the pixel distances of the initial corner points in the image block, and the setting of a corresponding screening proportion for each image block according to the dispersion degree and the corner response values of the initial corner points in each image block includes:
computing the mean-square-error sum of the corner response values of the initial corner points of the image block;
calculating an evaluation parameter according to the pixel-distance sum of the initial corner points in the image block and the mean-square-error sum of the corner response values;
and determining the screening ratio of each image block according to the evaluation parameters of each image block.
Optionally, the extracting feature points from the key frames, and determining a relative pose of the second camera according to the matched feature points between adjacent key frames includes:
setting a window with a preset pixel size by taking the feature point in the key frame as a center;
determining the absolute value of the difference value between the gray value of the characteristic point and the gray values of other pixel points in the window;
generating an adjacency matrix according to the absolute value of the gray difference between the characteristic point and other pixel points in the window;
generating a description vector as a descriptor of the feature point according to the adjacency matrix;
determining matched feature points between adjacent key frames according to the position information and the descriptors of the feature points of the adjacent key frames;
and determining the relative pose of a second camera between the adjacent key frames according to the matched feature points between the adjacent key frames.
Optionally, the extracting, from the key frame, a line segment that meets a preset feature line condition as a feature line includes:
detecting line segments in the key frame, and taking two non-parallel line segments as a line segment pair;
screening out, from the line segment pairs, those meeting a preset feature line condition as feature lines; the preset feature line condition comprises: the length of each line segment is greater than or equal to a preset length threshold, the distance between the intersection point of the line segment pair and the line segment pair is less than or equal to a preset distance threshold, and the intersection point of the line segment pair lies inside the image.
Optionally, the determining a third camera pose between adjacent key frames according to the matched feature lines between the adjacent key frames includes:
taking two nonparallel characteristic lines in the key frame as a characteristic line segment pair;
determining the descriptor of the feature line segment pair by taking the acute angle bisector of the intersection point of the feature line segment pair as the direction quantity of the descriptor and taking the pixel size of the center pixel block of the intersection point as the scale quantity of the descriptor;
determining matched characteristic lines between adjacent key frames according to the position information and the descriptors of the characteristic lines of the adjacent key frames;
and determining a third camera pose between the adjacent key frames according to the matched feature lines between the adjacent key frames.
The embodiment of the invention also discloses a camera position and posture determining device, which comprises:
the image acquisition module is used for acquiring a plurality of image frames acquired by the camera; the plurality of image frames comprise key frames and non-key frames;
the first pose determining module is used for dividing the non-key frames into a plurality of image blocks, extracting angular points meeting preset characteristic point conditions from the image blocks as characteristic points, and determining the relative pose of a first camera by adopting an optical flow method according to the characteristic points matched between the adjacent non-key frames;
the second pose determining module is used for extracting feature points from the key frames and determining relative poses of a second camera by adopting a feature point matching method according to the matched feature points between adjacent key frames;
and the optimization pose determining module is used for determining the optimization camera pose corresponding to the current key frame according to the first camera relative pose between a plurality of adjacent non-key frames and the second camera relative pose between the adjacent key frames.
The embodiment of the invention also discloses an electronic device, which comprises: a processor, a memory and a computer program stored on the memory and capable of running on the processor, which computer program, when executed by the processor, implements the steps of the camera pose determination method as described above.
The embodiment of the invention also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when being executed by a processor, the computer program realizes the steps of the camera pose determination method.
The embodiment of the invention has the following advantages:
according to the embodiment of the invention, a plurality of image frames collected by a camera can be obtained, wherein the image frames comprise key frames and non-key frames; dividing the non-key frames into a plurality of image blocks, extracting angular points meeting preset characteristic point conditions from the image blocks as characteristic points, and determining the relative pose of a first camera by adopting an optical flow method according to the matched characteristic points between adjacent non-key frames; extracting feature points from the key frames, and determining the relative pose of the second camera by using a feature point matching method according to the matched feature points between the adjacent key frames; and determining an optimized camera pose corresponding to the current key frame according to a first camera relative pose between a plurality of adjacent non-key frames and a second camera relative pose between adjacent key frames. The embodiment of the invention integrates the advantages of an optical flow tracking algorithm and a feature point matching algorithm, and can achieve balance on attitude precision and processing efficiency; and the non-key frames are processed by determining the uniformly distributed characteristic points by adopting an optical flow method, so that the relative pose of the first camera between the adjacent non-key frames is determined, the calculation precision of the relative pose of the first camera can be provided, and the calculation efficiency and the stability of the optical flow method are improved.
Drawings
Fig. 1 is a flowchart illustrating steps of a method for determining a pose of a camera according to an embodiment of the present invention;
FIG. 2 is a flow chart of Kalman filtering in combination with a feature point matching method and an optical flow method according to an embodiment of the present invention;
FIG. 3 is a flow chart of steps of another method for determining camera pose provided by an embodiment of the present invention;
FIG. 4 is a diagram illustrating uniform feature point screening from each image block according to an embodiment of the present invention;
FIG. 5 is a schematic illustration of determining a feature line reprojection error;
FIG. 6 is a flow chart of a visual odometer optimization method based on camera pose optimization in an embodiment of the invention;
fig. 7 is a block diagram of a camera pose determination apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
A visual-inertial odometer (VIO, Visual Inertial Odometry) combines the complementary strengths of images acquired by a camera and motion information (acceleration, angular velocity, and the like) measured by an inertial measurement unit (IMU). The IMU measures motion accurately over short periods; when the front-end tracking quality of the camera between adjacent frames is poor, the IMU measurements can still recover the motion between frames and provide inter-frame constraints, keeping the system running, while fusing vision with the IMU also makes the pose estimation more accurate.
For the visual part, the core idea of the embodiment of the invention is to combine the advantages of the optical flow tracking algorithm and the feature point matching algorithm, achieving a balance between pose accuracy and processing efficiency; moreover, the non-key frames are processed by determining uniformly distributed feature points and applying the optical flow method, so that the first camera relative pose between adjacent non-key frames is determined, the calculation accuracy of the first camera relative pose is improved, and the calculation efficiency and stability of the optical flow method are improved.
Referring to fig. 1, a flowchart illustrating steps of a camera pose determination method provided in an embodiment of the present invention is shown, where the method may specifically include the following steps:
step 101, acquiring a plurality of image frames acquired by a camera; the plurality of image frames includes key frames and non-key frames.
In a practical application scenario, a camera may be disposed on a mobile device such as a vehicle or an aircraft, and the camera captures a sequence of images of the surrounding environment. The image sequence may include a plurality of image frames, which may be key frames or non-key frames, key frames being representative image frames. Illustratively, the key frame selection rules may be: 1. if the number of feature points tracked in the current image frame is less than 20, the current image frame is taken as a key frame; 2. if the average parallax between the current image frame and the N-th and (N+1)-th previous historical key frames (the value of N may be set as desired, for example N = 10) is greater than a certain threshold (for example, 0.02), the current image frame is taken as a key frame. The specific key frame selection method is not limited in the embodiments of the present invention.
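As an illustration only, the two selection rules above might be sketched as follows; the function name, the way the parallax is obtained, and the default thresholds are assumptions rather than the patent's implementation.

```python
def is_key_frame(num_tracked_points, avg_parallax_to_prev_keyframes,
                 min_tracked=20, parallax_thresh=0.02):
    """Hedged sketch of the key-frame selection rules described above."""
    # Rule 1: too few tracked feature points in the current frame.
    if num_tracked_points < min_tracked:
        return True
    # Rule 2: average parallax w.r.t. the N-th / (N+1)-th previous key frames
    # exceeds the threshold.
    if avg_parallax_to_prev_keyframes > parallax_thresh:
        return True
    return False
```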
Referring to fig. 2, a flow chart of kalman filtering by combining the feature point matching method and the optical flow method according to an embodiment of the present invention is shown. And performing Kalman filtering fusion on the camera pose obtained by adopting the feature point matching algorithm and the camera pose obtained by adopting the optical flow method to obtain the optimized camera pose.
Step 102, dividing the non-key frames into a plurality of image blocks, extracting angular points meeting preset feature point conditions from the image blocks as feature points, and determining the relative pose of the first camera by adopting an optical flow method according to the matched feature points between adjacent non-key frames.
When an object moves in a real scene, besides moving the corresponding point in the image, it also changes the brightness pattern of the corresponding pixels. Optical flow is the velocity of this brightness-pattern motion in the image, i.e., the apparent change of pixel brightness caused by projecting the motion of the object onto the image plane. The optical flow method may determine the relative camera pose between adjacent image frames based on optical flow constraints on the pixels between those frames.
The optical flow method utilizes angular points to track optical flow, wherein the angular points refer to points with large gray value change in the image. The corner point detection method may include: harris corner detection algorithm, ORB corner detection algorithm, FAST detection algorithm and the like.
In the embodiment of the invention, the feature points in the non-key frames are tracked by an optical flow method, and optical flow tracking is prone to the problem that the tracked points cluster in a small image region. To address this, the embodiment of the invention may adopt an improved corner detection method to uniformly screen and extract feature points in an image frame, and then use a random sample consensus (RANSAC) algorithm to further process the homogenized corner points and reduce possible pairing errors.
In the embodiment of the invention, the non-key frame can be divided into a plurality of image blocks, the size of each image block is the same, and the feature points of the preset number are respectively extracted from each image block, so that the feature points in the whole image can be ensured to be uniformly distributed. For example, the image frame may be divided into 9 image blocks with the same size, and each image block is individually extracted with an angular point satisfying a preset characteristic point condition as a characteristic point. The preset feature point condition can be used for uniformly screening out angular points from the image block as feature points.
Processing the uniformly distributed feature points between adjacent non-key frames with the optical flow method determines the first camera relative pose between adjacent non-key frames, improves the calculation accuracy of the first camera relative pose, and improves the calculation efficiency and stability of the optical flow method.
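A minimal sketch of this step, assuming OpenCV's pyramidal Lucas-Kanade tracker and an essential-matrix decomposition for the relative pose between two non-key frames; the patent text does not name these particular routines, so they are illustrative choices.

```python
import cv2
import numpy as np

def track_and_estimate_pose(prev_gray, cur_gray, prev_pts, K):
    """prev_pts: (N, 1, 2) float32 corners extracted uniformly per image block;
    K: 3x3 camera intrinsic matrix (assumed known)."""
    # Track the uniformly distributed corners into the next non-key frame.
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, prev_pts, None)
    ok = status.ravel() == 1
    good_prev, good_cur = prev_pts[ok], cur_pts[ok]
    # Relative pose (up to scale) from the essential matrix of the tracked pairs.
    E, mask = cv2.findEssentialMat(good_prev, good_cur, K,
                                   method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, good_prev, good_cur, K, mask=mask)
    return R, t, good_cur
```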
And 103, extracting feature points from the key frames, and determining the relative pose of the second camera by using a feature point matching method according to the matched feature points between the adjacent key frames.
The feature point matching method needs to extract feature points and determine descriptors, the matched feature points are determined based on the descriptors of the feature points, and the feature point matching method is only adopted for key frames due to the fact that the time consumption of matching is large. And determining the relative pose of the second camera between the two key frames by adopting the feature points in the two key frames based on a feature point matching method.
And 104, determining an optimized camera pose corresponding to the current key frame according to a first camera relative pose between a plurality of adjacent non-key frames and a second camera relative pose between the adjacent key frames.
An optimized camera pose corresponding to a current keyframe may be determined from a first camera relative pose between a plurality of adjacent non-keyframes between adjacent keyframes and a second camera relative pose between adjacent keyframes.
According to the embodiment of the invention, a plurality of image frames collected by a camera can be obtained, wherein the image frames comprise key frames and non-key frames; the non-key frames are divided into a plurality of image blocks, corner points meeting a preset feature point condition are extracted from the image blocks as feature points, and a first camera relative pose is determined by an optical flow method according to the matched feature points between adjacent non-key frames; feature points are extracted from the key frames, and a second camera relative pose is determined by a feature point matching method according to the matched feature points between adjacent key frames; and an optimized camera pose corresponding to the current key frame is determined according to the first camera relative poses between a plurality of adjacent non-key frames and the second camera relative pose between adjacent key frames. The embodiment of the invention combines the advantages of the optical flow tracking algorithm and the feature point matching algorithm, and can achieve a balance between pose accuracy and processing efficiency; moreover, the non-key frames are processed by determining uniformly distributed feature points and applying the optical flow method, so that the first camera relative pose between adjacent non-key frames is determined, the calculation accuracy of the first camera relative pose is improved, and the calculation efficiency and stability of the optical flow method are improved.
Referring to fig. 3, a flowchart illustrating steps of another method for determining a pose of a camera according to an embodiment of the present invention is shown, where the method specifically includes the following steps:
step 301, acquiring a plurality of image frames acquired by a camera; the plurality of image frames includes key frames and non-key frames.
Step 302, dividing the non-key frames into a plurality of image blocks, extracting angular points meeting preset feature point conditions from the plurality of image blocks as feature points, and determining the relative pose of the first camera by adopting an optical flow method according to the matched feature points between adjacent non-key frames.
In the embodiment of the invention, an improved Harris corner detection algorithm can be adopted to divide the non-key frame into a plurality of image blocks, and the corner points meeting the preset characteristic point condition are extracted from the plurality of image blocks to serve as the characteristic points.
In this embodiment of the present invention, the step of extracting corner points satisfying the preset feature point condition from the plurality of image blocks as feature points may include the following sub-steps:
and a substep S11 of performing low-pass filtering processing on the plurality of image blocks.
The traditional Harris corner detection algorithm first performs Gaussian smoothing on the image, i.e., filters the image with a Gaussian kernel. However, Gaussian smoothing strongly attenuates the high-frequency content (object edges): edges become flatter and the image histogram appears compressed, which leads to corner loss when non-maximum suppression is performed.
In contrast, the Harris corner detection algorithm improved by the invention can perform low-pass filtering processing on a plurality of image blocks. Illustratively, a cubic B-spline function having a low-pass characteristic may be employed instead of the gaussian function for smoothing filtering.
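For illustration, cubic B-spline smoothing is often approximated by the separable kernel [1, 4, 6, 4, 1]/16; the patent does not state the exact filter coefficients, so this kernel is an assumption.

```python
import cv2
import numpy as np

def bspline_smooth(block):
    """Low-pass filter one image block with a separable cubic B-spline kernel."""
    k = np.array([1, 4, 6, 4, 1], dtype=np.float32) / 16.0
    return cv2.sepFilter2D(block, ddepth=-1, kernelX=k, kernelY=k)
```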
And a substep S12 of calculating angular response values for the pixels in each image block subjected to the low-pass filtering.
Assume that the gray-level change caused by a small shift (u, v) of the image window is E(u, v) = [u, v] M [u, v]^T.
This local variation approximates the local autocorrelation function, where M is the matrix of the autocorrelation function, and λ1, λ2 are the two eigenvalues of M, corresponding to the principal (first-order) curvatures of the autocorrelation function.
At pixel (x, y), the corner response function is C(x, y) = λ1 λ2 - α(λ1 + λ2)^2, where α is an empirical value, typically set to 0.04-0.06.
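A sketch of this response computed per pixel from image gradients, using the equivalent form det(M) - α trace(M)^2; the gradient operator, window size, and α value are illustrative assumptions.

```python
import cv2
import numpy as np

def harris_response(block, alpha=0.04, win=3):
    """Corner response C = det(M) - alpha * trace(M)^2 for each pixel of a block."""
    f = block.astype(np.float32)
    gx = cv2.Sobel(f, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(f, cv2.CV_32F, 0, 1, ksize=3)
    # Elements of the autocorrelation matrix M, summed over a local window.
    ixx = cv2.boxFilter(gx * gx, -1, (win, win))
    iyy = cv2.boxFilter(gy * gy, -1, (win, win))
    ixy = cv2.boxFilter(gx * gy, -1, (win, win))
    det = ixx * iyy - ixy * ixy
    trace = ixx + iyy
    return det - alpha * trace * trace
```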
And a substep S13 of sorting the angular response values corresponding to the pixel points in each image block, and selecting a preset number of pixel points as initial angular points according to the sorting result.
For example, the N corner response values in each image block may be sorted, and the top B = k × N corner points with the largest response values are selected as the initial corner points to be examined (k takes a value between 0 and 1); k may take a different value in different image blocks, and its value is chosen so that each image block retains a comparable number of initial corner points.
And a sub-step S14 of determining the degree of dispersion of the initial corner points in each image block.
The degree of scatter may represent the initial corner scatter in the image block. In one embodiment, the sub-step S14 may further include:
the substep S141, clustering the initial corner points of the image blocks to obtain a clustering center;
and a substep S142 of determining the pixel distance from each initial corner point to the clustering center and determining the dispersion degree according to the pixel distance from each initial corner point to the clustering center.
In one example, the dispersion degree of the initial corner points in the image block may be determined by summing the pixel distances of the initial corner points to the cluster center. In another example, the dispersion degree of the initial corner points in the image block may be determined according to an average value of pixel distances from each initial corner point to the center of the cluster.
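A minimal sketch of this measure, assuming a single cluster center (the centroid of the block's initial corners) and the sum-of-distances variant.

```python
import numpy as np

def dispersion(corners_xy):
    """corners_xy: (M, 2) array of initial corner coordinates in one image block."""
    center = corners_xy.mean(axis=0)                     # cluster center
    dists = np.linalg.norm(corners_xy - center, axis=1)  # pixel distance per corner
    return dists.sum(), dists                            # sum-of-distances variant
```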
And a substep S15 of setting a corresponding screening proportion for each image block according to the dispersion degree and the angular response value of the initial angular points in each image block.
The screening proportion is the proportion of corner points retained from the initial corner points. To ensure that every image block retains corner points for later use, more corner points should be kept in image blocks whose initial corner points are more dispersed and have higher response values. A corresponding screening proportion is therefore set for each image block according to the dispersion degree and the corner response values of the initial corner points in that block.
In one embodiment, the sub-step S15 may further include:
and a substep S151 of calculating a sum of mean square errors for the angular response values of the initial corners of the image blocks.
And a substep S152, calculating evaluation parameters according to the pixel distance and value of each initial angular point in the image block and the mean square error and value of the angular response value.
Illustratively, an evaluation parameter W is computed for each image block; the formula itself is rendered as an image in the original, but per sub-steps S151-S152 it combines the pixel-distance sum of the initial corner points with the mean-square-error sum of their corner response values. In the notation of that formula, one symbol indexes the image block, i indexes the initial corner points, another symbol denotes the mean of the initial corner response values within the block, and d_i denotes the pixel distance from initial corner i to the cluster center.
And a substep S153 of determining the screening ratio of each image block according to the evaluation parameters of each image block.
For example, the image blocks may be sorted according to the evaluation parameters of the respective image blocks, and the screening ratio of the respective image blocks may be determined according to the sorting result. Wherein the more top the ranking, the more the screening proportion.
For example, a screening proportion η_W is determined for each image block according to its rank under the evaluation parameter W, where η_W ∈ (0, 1); finally, η_W · B candidate corner points are extracted from a single image block.
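A hedged sketch of assigning per-block screening proportions: the original formula for the evaluation parameter W is not reproduced here, so a stand-in that grows with the pixel-distance sum and the response-value spread is assumed, and the ratios η_W are spread linearly over the ranking.

```python
import numpy as np

def screening_ratios(dist_sums, response_mse_sums, eta_min=0.3, eta_max=0.9):
    """dist_sums / response_mse_sums: one value per image block (see sub-steps S151-S152)."""
    w = np.asarray(dist_sums, float) + np.asarray(response_mse_sums, float)  # assumed W
    order = np.argsort(-w)                        # blocks ranked by W, highest first
    ratios = np.empty_like(w)
    ratios[order] = np.linspace(eta_max, eta_min, len(w))  # best-ranked block keeps most
    return ratios                                 # eta_W per block; keep eta_W * B corners
```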
Fig. 4 is a schematic diagram illustrating uniform feature point screening from each image block according to an embodiment of the present invention. For image blocks in which the initial corner points are relatively concentrated, more initial corner points are deleted; for image blocks in which the initial corner points are scattered, more initial corner points are retained, so that after screening each image block keeps a comparable number of corner points.
And a substep S16 of screening candidate corner points from the corresponding initial corner points according to the screening proportion of each image block.
And a substep S17, selecting a target corner point from the candidate corner points by using a random sampling consistency algorithm.
Outlier points are deleted from the candidate corner points using a random sample consensus algorithm, and the retained candidate corner points are taken as the target corner points.
Given a set P of N data points, the random sample consensus algorithm assumes that most points in the set can be generated by one model, that at least n points (n < N) suffice to fit the parameters of that model, and that points which do not conform to the fitted model are outliers.
Illustratively, fitting can be performed iteratively, specifically as follows: a. randomly select n data points from P; b. fit a model M using these n data points; c. for each of the remaining data points in P, calculate its distance to the model M; if the distance exceeds a threshold the point is treated as an outlier, otherwise as an inlier, and the number m of inliers for this model is recorded; d. after k iterations, select the model M with the largest inlier count m as the fitting result.
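A generic sketch of the loop in steps a-d; the callbacks fit_model and point_dist, as well as the iteration count and distance threshold, are illustrative assumptions.

```python
import numpy as np

def ransac(points, fit_model, point_dist, n, k=100, dist_thresh=1.0):
    """Keep the model with the largest inlier count m after k random trials."""
    best_model, best_m = None, 0
    rng = np.random.default_rng()
    for _ in range(k):
        sample = points[rng.choice(len(points), size=n, replace=False)]  # step a
        model = fit_model(sample)                                        # step b
        dists = point_dist(model, points)                                # step c
        m = int((dists <= dist_thresh).sum())                            # inlier count
        if m > best_m:                                                   # step d
            best_model, best_m = model, m
    return best_model, best_m
```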
Step 303, extracting feature points from the key frames, and determining the relative pose of the second camera by using a feature point matching method according to the feature points matched between the adjacent key frames.
In an embodiment of the present invention, the step 303 may include the following sub-steps:
and a sub-step S21 of setting a window of a preset pixel size centered on the feature point in the key frame.
And a substep S22 of determining an absolute value of a difference between the gray value of the feature point and the gray values of other pixel points in the window.
And a substep S23 of generating an adjacency matrix according to the absolute value of the gray difference between the feature point and other pixel points in the window.
And a substep S24 of generating a description vector as a descriptor of the feature point based on the adjacency matrix.
And a sub-step S25 of determining matched feature points between adjacent key frames according to the location information and descriptors of the feature points of the adjacent key frames.
For example, to reduce the probability of mismatching during feature point matching, a 3 × 3 pixel window may be set with the feature point at its center, and the absolute gray-value difference between the window center and each of the 8 neighboring pixels is computed. With I_p denoting the gray value at the center point (the feature point) and I_xi the gray values of the 8-neighborhood pixels:

I_i = |I_p - I_xi|, i = 1, 2, ..., 8.

An adjacency matrix F is generated from these eight differences (its exact arrangement is rendered as an image in the original). The feature point p is then represented by a description vector H built from F, where λ_i(p) denotes the eigenvalues obtained by decomposing the adjacency matrix F and ‖λ_p‖ denotes the second norm of those eigenvalues. The description vector G of the candidate matching point q of feature point p is obtained in the same way. A similarity v is then computed from H and G (the formula is rendered as an image in the original), where |H| and |G| are the moduli of the two vectors; a threshold t is set, and when v > t the feature point pair is retained, otherwise it is deleted.
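A heavily hedged sketch of the window descriptor and the matching test: the exact arrangement of the eight gray differences into the adjacency matrix F, the construction of H from the eigenvalues, and the similarity measure v are only partially recoverable from the text, so the choices below (a 3x3 difference matrix, eigenvalue magnitudes normalized by their Euclidean norm, and a normalized dot product for v) are assumptions.

```python
import numpy as np

def point_descriptor(gray, x, y):
    """Description vector H for the feature point at (x, y); assumes a 3x3 window."""
    patch = gray[y - 1:y + 2, x - 1:x + 2].astype(np.float32)
    diffs = np.abs(patch - patch[1, 1])   # I_i = |I_p - I_xi| over the 8-neighborhood
    F = diffs                             # assumed adjacency matrix
    lam = np.linalg.eigvals(F)            # eigenvalues lambda_i(p) of F
    return np.abs(lam) / (np.linalg.norm(lam) + 1e-12)

def keep_pair(H, G, t=0.9):
    """Keep the feature point pair when the similarity v exceeds the threshold t."""
    v = float(np.dot(H, G)) / (np.linalg.norm(H) * np.linalg.norm(G) + 1e-12)
    return v > t
```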
And a sub-step S26 of determining a second camera relative pose between the adjacent key frames according to the matched feature points between the adjacent key frames.
And 304, performing Kalman filtering fusion according to the relative poses of the first camera between the adjacent non-key frames and the relative poses of the second camera between the adjacent key frames, and determining the estimated camera pose corresponding to the current key frame.
The optical flow tracking result and the optimized feature point matching result are fused through a Kalman filtering algorithm. Kalman filtering is performed at each key frame, so the accumulated error of non-key-frame optical flow tracking between two key frames can be corrected with the high-precision feature point matching result of those two key frames. By adopting the idea of Kalman filtering, the respective advantages of the optical flow tracking algorithm and the feature point matching algorithm are combined, achieving a balance between precision and efficiency.
The Kalman filtering comprises two steps of prediction and updating:
in the prediction stage, the relative poses among 2 key frames are accumulated by an optical flow tracking method and used as estimation, and the relative pose of a camera obtained by optimization matching of characteristic points is used as observation.
In standard Kalman form, the prediction-stage equations (rendered as images in the original) are:

x_k = A_k · x_{k-1} + ε_k (motion equation)
P_k^- = A_k · P_{k-1} · A_k^T + R (covariance prediction)
z_k = C · x_k + δ_k (observation equation)

where A_k is the state transition matrix, representing the relative camera pose transformation accumulated by the optical flow method from the previous key frame to the current key frame; ε_k ~ N(0, R) is the Gaussian noise of the motion equation; x̂_{k-1} and P_{k-1} are the corrected state estimate and covariance of the previous key frame; R is the noise covariance, set as a constant; x̂_k^- and P_k^- are the predicted state estimate and covariance of the current key frame; z_k is the camera pose obtained by the feature point method at the current time; C is set to the identity matrix; and δ_k ~ N(0, Q) is the observation noise. Since the error of the state equation (the pose estimated by the optical flow method) is larger than that of the feature point method, the Q value is generally set smaller than the R value.
In the updating stage, the Kalman gain K_k is first calculated, and then the state estimate and covariance of the current key frame are corrected to obtain the fused estimated camera pose x̂_k and covariance P_k:

K_k = P_k^- · C^T · (C · P_k^- · C^T + Q)^{-1}
x̂_k = x̂_k^- + K_k · (z_k - C · x̂_k^-)
P_k = (I - K_k · C) · P_k^-
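A compact sketch of the fusion cycle above, with the optical-flow accumulation A_k acting as the motion model and the feature-point pose z_k as the observation; state dimensions and noise matrices are illustrative assumptions.

```python
import numpy as np

def kalman_fuse(x_prev, P_prev, A_k, z_k, R, Q):
    """One predict/update step per key frame; C is the identity as stated above."""
    n = len(x_prev)
    C = np.eye(n)
    # Prediction: accumulate the optical-flow relative pose from the last key frame.
    x_pred = A_k @ x_prev
    P_pred = A_k @ P_prev @ A_k.T + R
    # Update: correct with the feature-point-matching pose observation z_k.
    K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + Q)
    x_est = x_pred + K @ (z_k - C @ x_pred)
    P_est = (np.eye(n) - K @ C) @ P_pred
    return x_est, P_est
```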
And 305, determining a feature point re-projection error aiming at the current key frame according to the estimated camera pose corresponding to the current key frame.
The position of the corresponding feature point P1' at the time of the current key frame can be calculated from the position of feature point P1 in the previous key frame and the estimated camera pose; the position of the feature point P2 matched with P1 is then determined from the current key frame, and the feature point reprojection error can be calculated from the positions of P1' and P2. In practice, the feature point reprojection error may be computed over a plurality of matched feature points between adjacent key frames.
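A sketch of the point reprojection error, assuming the depth of P1 in the previous key frame and the camera intrinsics K are available so that P1 can be transferred into the current key frame with the estimated pose (R, t).

```python
import numpy as np

def point_reprojection_error(p1_uv, depth1, p2_uv, R, t, K):
    """Distance between the transferred point P1' and the matched observation P2."""
    P1_cam = depth1 * (np.linalg.inv(K) @ np.array([p1_uv[0], p1_uv[1], 1.0]))
    P1_cur = R @ P1_cam + t              # P1 expressed in the current key frame
    proj = K @ P1_cur
    p1_proj = proj[:2] / proj[2]         # pixel position of P1'
    return np.linalg.norm(p1_proj - np.asarray(p2_uv, float))
```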
Step 306, extracting line segments meeting the preset characteristic line condition from the key frame as characteristic lines.
In weakly textured aerial views of urban areas and in scenes with severe lighting changes, a camera visual odometer constrained only by point features is prone to positioning failure and unstable output. In an air-to-ground view of an urban area, ground buildings exhibit strong structural regularity, and the line features formed by the geometric boundaries of buildings can provide direction information that feature points cannot. Therefore, introducing straight-line features improves the robustness of the system in urban scenes; fusing feature points with optical flow and adding straight-line feature constraints alleviates the problems of positioning failure and unstable output.
In an embodiment of the present invention, the step 306 may include the following sub-steps:
and a sub-step S31 of detecting a line segment in the key frame, wherein two non-parallel line segments are used as a line segment pair.
A substep S32 of screening, from the line segment pairs, those meeting a preset feature line condition as feature lines; the preset feature line condition comprises: the length of each line segment is greater than or equal to a preset length threshold, the distance between the intersection point of the line segment pair and the line segment pair is less than or equal to a preset distance threshold, and the intersection point of the line segment pair lies inside the image.
Specifically, non-parallel line segments in the image (or their extensions) always intersect, and the number of elements in such a set of segment pairs is usually very large. To better extract feature straight lines, the following screening is performed: segment pairs whose intersection point is farther from the pair than a preset distance threshold are removed, and pairs whose intersection point is within the threshold are kept; segment pairs whose intersection point lies outside the image are removed, and pairs whose intersection point lies inside the image are kept; segment pairs containing a segment shorter than a preset length threshold are removed, and pairs whose segments are at least the preset length threshold are kept. The segment pairs remaining after this screening are the valid feature line segments.
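A sketch of the screening rules for one candidate segment pair; the thresholds are illustrative, and the distance of the intersection to the pair is approximated here by the distance to the nearest segment endpoint, which is an assumption.

```python
import numpy as np

def line_intersection(p1, p2, q1, q2):
    """Intersection of the two supporting lines, via homogeneous coordinates."""
    l1 = np.cross([p1[0], p1[1], 1.0], [p2[0], p2[1], 1.0])
    l2 = np.cross([q1[0], q1[1], 1.0], [q2[0], q2[1], 1.0])
    x = np.cross(l1, l2)
    return None if abs(x[2]) < 1e-9 else x[:2] / x[2]   # None means parallel

def valid_pair(p1, p2, q1, q2, img_w, img_h, min_len=30.0, max_dist=20.0):
    if np.linalg.norm(np.subtract(p2, p1)) < min_len:    # both segments long enough
        return False
    if np.linalg.norm(np.subtract(q2, q1)) < min_len:
        return False
    inter = line_intersection(p1, p2, q1, q2)
    if inter is None or not (0 <= inter[0] < img_w and 0 <= inter[1] < img_h):
        return False                                     # intersection must lie in the image
    nearest = min(np.linalg.norm(inter - np.asarray(e, float)) for e in (p1, p2, q1, q2))
    return nearest <= max_dist                           # intersection close to the pair
```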
And 307, determining a third camera pose between the adjacent key frames according to the matched feature lines between the adjacent key frames.
In an embodiment of the present invention, the step 307 may include the following sub-steps:
in sub-step S41, two feature lines that are not parallel in the key frame are used as a pair of feature line segments.
And a substep S42, determining the descriptor of the characteristic line segment pair by taking the acute angle bisector of the intersection point of the characteristic line segment pair as the direction quantity of the descriptor and taking the pixel size of the pixel block at the center of the intersection point as the scale quantity of the descriptor.
In the embodiment of the present invention, the line segment feature can be described based on the intersection point of the feature line segment pair. A multi-scale rotated BRIEF descriptor can be computed at the intersection point: the orientation of the descriptor is the acute-angle bisector at the intersection of the feature line segment pair, and the scale of the descriptor is the pixel size of the pixel block centered at the intersection; the descriptor of the feature line segment pair is thereby determined.
And a sub-step S43 of determining matching feature lines between adjacent key frames according to the location information and descriptors of the feature lines of the adjacent key frames.
And a substep S44 of determining a third camera pose between adjacent keyframes according to the matched feature lines between the adjacent keyframes.
Referring to fig. 5, a schematic diagram of determining the feature line reprojection error is shown. Suppose the line segment l observed in the image frame at time i matches the line segment L observed in the image at time j. The line segment l at time i is projected into the image at time j using the camera pose (rotation matrix R and translation matrix T) to obtain the line segment ab. The line segment reprojection error is represented by the distances from the two endpoints a and b of the projected segment to the observation line L (Ax + By + C = 0).
The two formulas (rendered as images in the original) express the segment reprojection error e through the point-to-line distances of the projected endpoints, where d(a, L) = |A·u_a + B·v_a + C| / sqrt(A² + B²) is the distance from endpoint a of the projected segment to the observation line L (and likewise for endpoint b). The camera pose (R, T) can then be optimized by minimizing the reprojection error of the line segments.
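A short sketch of this error, assuming the combination of the two endpoint distances is their sum (the exact combination in the original formula is rendered as an image).

```python
import numpy as np

def line_reprojection_error(a_uv, b_uv, A, B, C):
    """Distances of the projected endpoints a, b to the observed line A*x + B*y + C = 0."""
    denom = np.hypot(A, B)
    d_a = abs(A * a_uv[0] + B * a_uv[1] + C) / denom
    d_b = abs(A * b_uv[0] + B * b_uv[1] + C) / denom
    return d_a + d_b          # assumed combination: sum of endpoint distances
```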
Step 308, determining a feature line reprojection error for the current keyframe according to a third camera pose between the adjacent keyframes.
Step 309, determining an optimized camera pose corresponding to the current keyframe based on the feature point reprojection error and the feature line reprojection error for the current keyframe.
And performing nonlinear least square optimization on the feature point reprojection error and the feature line reprojection error so as to determine the optimized camera pose corresponding to the current key frame.
In the embodiment of the invention, a plurality of image frames collected by a camera can be acquired, wherein the image frames comprise key frames and non-key frames; the non-key frames are divided into a plurality of image blocks, corner points meeting a preset feature point condition are extracted from the image blocks as feature points, and a first camera relative pose is determined by an optical flow method according to the matched feature points between adjacent non-key frames; feature points are extracted from the key frames, and a second camera relative pose is determined by a feature point matching method according to the matched feature points between adjacent key frames; Kalman filtering fusion is performed according to the first camera relative poses between a plurality of adjacent non-key frames and the second camera relative pose between adjacent key frames to determine the estimated camera pose corresponding to the current key frame; a feature point reprojection error for the current key frame is determined according to the estimated camera pose corresponding to the current key frame; line segments meeting a preset feature line condition are extracted from the key frames as feature lines; a third camera pose between adjacent key frames is determined according to the matched feature lines between the adjacent key frames; a feature line reprojection error for the current key frame is determined according to the third camera pose between adjacent key frames; and the optimized camera pose corresponding to the key frame is determined based on the feature point reprojection error and the feature line reprojection error for the key frame. According to the embodiment of the invention, processing the uniformly distributed feature points between adjacent non-key frames with the optical flow method determines the first camera relative pose between adjacent non-key frames, improves the calculation accuracy of the first camera relative pose, and improves the calculation efficiency and stability of the optical flow method. By adopting the idea of Kalman filtering, the advantages of the optical flow tracking algorithm and the feature point matching algorithm are combined, achieving a balance between precision and efficiency. Introducing straight-line features improves the robustness of the system in urban scenes, and fusing feature points with optical flow while adding straight-line feature constraints improves the stability of the pose estimation.
In order to enable a person skilled in the art to better understand the embodiments of the present invention, the following description is given by way of an example: fig. 6 is a flowchart of a visual odometer optimization method based on camera pose optimization according to an embodiment of the present invention.
In this example, a binocular camera is provided at the vehicle to capture images, and motion information is captured by an inertial navigation unit. The camera and the inertial navigation unit may be rigidly connected.
For the inertial navigation unit part, motion information (including acceleration and angular velocity) is measured by the inertial navigation unit; a pre-integration method combines the multiple motion measurements between image frames into a single observation, yielding the accumulated camera motion; a pre-integration error is determined from the accumulated camera motion; finally, the pre-integration error is sent to the back end for optimization.
For the visual part, the system is first initialized and an image frame is acquired; it is then judged whether the image frame is a key frame or a non-key frame. If the frame is a non-key frame, corners are detected with the improved corner detection method of the embodiment of the invention and used as feature points, and the feature points in adjacent non-key frames are tracked by an optical flow method to obtain the accumulated camera pose change. For a key frame, on one hand, feature points are extracted from the key frame and their descriptors are calculated; the relative camera pose is determined by a feature point matching method according to the matched feature points between adjacent key frames; Kalman filtering is then performed on the accumulated camera pose change determined by the optical flow method and the relative camera pose determined by the feature point matching method to obtain the camera pose corresponding to the current key frame. On the other hand, feature lines are detected from the key frames, their descriptors are calculated, matching is performed according to the matched feature lines between adjacent key frames, and the camera pose corresponding to the current key frame is solved.
The feature point reprojection error is determined by taking the camera pose solved from the feature points and the IMU motion accumulation between two key frames measured by the inertial navigation unit as camera constraints. The feature line reprojection error is determined in the same way, taking the camera pose solved from the feature lines and the IMU motion accumulation between two key frames as camera constraints. The back end then optimizes over the feature point reprojection error, the feature line reprojection error, and the pre-integration error to finally obtain the optimized camera pose corresponding to the current key frame.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 7, a block diagram of a structure of a camera pose determination apparatus according to an embodiment of the present invention is shown, which may specifically include the following modules:
an image obtaining module 701, configured to obtain a plurality of image frames acquired by a camera; the plurality of image frames comprise key frames and non-key frames;
a first pose determining module 702, configured to divide the non-key frame into a plurality of image blocks, extract, from the plurality of image blocks, corner points that meet a preset feature point condition as feature points, and determine, according to feature points matched between adjacent non-key frames, a relative pose of a first camera by using an optical flow method;
a second pose determining module 703, configured to extract feature points from the key frames, and determine a relative pose of the second camera by using a feature point matching method according to feature points matched between adjacent key frames;
an optimized pose determination module 704, configured to determine an optimized camera pose corresponding to the current key frame according to a first camera relative pose between a plurality of the adjacent non-key frames and a second camera relative pose between the adjacent key frames.
According to the embodiment of the invention, a plurality of image frames collected by a camera can be obtained, wherein the image frames comprise key frames and non-key frames; the non-key frames are divided into a plurality of image blocks, corner points meeting a preset feature point condition are extracted from the image blocks as feature points, and a first camera relative pose is determined by an optical flow method according to the matched feature points between adjacent non-key frames; feature points are extracted from the key frames, and a second camera relative pose is determined by a feature point matching method according to the matched feature points between adjacent key frames; and an optimized camera pose corresponding to the current key frame is determined according to the first camera relative poses between a plurality of adjacent non-key frames and the second camera relative pose between adjacent key frames. The embodiment of the invention combines the advantages of the optical flow tracking algorithm and the feature point matching algorithm, and can achieve a balance between pose accuracy and processing efficiency; moreover, the non-key frames are processed by determining uniformly distributed feature points and applying the optical flow method, so that the first camera relative pose between adjacent non-key frames is determined, the calculation accuracy of the first camera relative pose is improved, and the calculation efficiency and stability of the optical flow method are improved.
In the embodiment of the present invention, the apparatus may further include:
the feature line extraction module is used for extracting, from the key frames, line segments meeting a preset feature line condition as feature lines;
the third pose determining module is used for determining a third camera pose between the adjacent key frames according to the matched feature lines between the adjacent key frames;
the optimized pose determination module comprises:
the optimized pose determination submodule is used for determining the optimized camera pose corresponding to the current key frame according to the first camera relative pose among the plurality of adjacent non-key frames, the second camera relative pose between the adjacent key frames and the third camera pose between the adjacent key frames.
In an embodiment of the present invention, the optimized pose determination submodule may include:
the estimated pose determining unit is used for performing Kalman filtering fusion according to a first camera relative pose between a plurality of adjacent non-key frames and a second camera relative pose between the adjacent key frames, to determine an estimated camera pose corresponding to the current key frame;
the first error determining unit is used for determining a feature point re-projection error for the current key frame according to the estimated camera pose corresponding to the current key frame;
the second error determining unit is used for determining a feature line re-projection error for the current key frame according to a third camera pose between the adjacent key frames;
and the optimized pose determining unit is used for determining the optimized camera pose corresponding to the current key frame based on the feature point re-projection error and the feature line re-projection error for the current key frame.
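For the re-projection error units described above, the following hedged sketch shows one way the point and line residuals could be evaluated for the current key frame; the exact weighting and the optimizer used by the embodiment are not specified here, so only the residual terms are illustrated, and the line parametrization (a, b, c) is an assumption of the sketch.

```python
import numpy as np

def project(K, R, t, X):
    # Pinhole projection of 3D points X (Nx3) into pixel coordinates (Nx2).
    Xc = (R @ X.T + t.reshape(3, 1)).T
    uv = (K @ Xc.T).T
    return uv[:, :2] / uv[:, 2:3]

def point_reprojection_error(K, R, t, X, observed_uv):
    # Feature point term: pixel distance between projected map points and their
    # observed positions in the current key frame.
    return np.linalg.norm(project(K, R, t, X) - observed_uv, axis=1)

def line_reprojection_error(K, R, t, P1, P2, observed_line):
    # Feature line term: distance of the two projected 3D endpoints to the observed
    # 2D line, given in normalized homogeneous form (a, b, c) with a^2 + b^2 = 1.
    a, b, c = observed_line
    errors = []
    for P in (P1, P2):
        u, v = project(K, R, t, P.reshape(1, 3))[0]
        errors.append(abs(a * u + b * v + c))
    return np.array(errors)
```

The optimized key-frame pose would then be the minimizer of a weighted sum of the squared point and line residuals, for example obtained with a nonlinear least-squares solver; the weighting is left open here.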
In an embodiment of the present invention, the first pose determining module may include:
the filtering submodule is used for carrying out low-pass filtering processing on the plurality of image blocks;
the response value calculation submodule is used for calculating corner response values of pixel points in each image block subjected to low-pass filtering;
the initial corner point selection submodule is used for respectively sorting the corner response values corresponding to the pixel points in each image block, and selecting a preset number of pixel points as initial corner points according to the sorting result;
the dispersion degree determining submodule is used for determining the dispersion degree of the initial corner points in each image block;
the screening proportion determining submodule is used for setting a corresponding screening proportion for each image block according to the dispersion degree and the corner response values of the initial corner points in each image block;
the candidate corner point determining submodule is used for screening candidate corner points from the corresponding initial corner points according to the screening proportion of each image block;
and the target corner point screening submodule is used for screening target corner points from the candidate corner points by using a random sample consensus (RANSAC) algorithm.
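A minimal sketch of the block-wise initial corner extraction handled by the submodules above is given below, assuming a Gaussian filter for the low-pass step and the Shi-Tomasi minimum-eigenvalue response as the corner response value; the grid size and per-block corner count are illustrative assumptions, and the dispersion-based screening and RANSAC stages are omitted.

```python
import cv2
import numpy as np

def initial_corners_per_block(gray, grid=(4, 4), per_block=20):
    # Low-pass filter the frame, compute a corner response map (Shi-Tomasi minimum
    # eigenvalue used here as the corner response value), then keep, per image block,
    # the per_block pixels with the strongest responses as initial corner points.
    smoothed = cv2.GaussianBlur(gray, (5, 5), 0)
    response = cv2.cornerMinEigenVal(smoothed, blockSize=3)
    h, w = gray.shape
    bh, bw = h // grid[0], w // grid[1]
    corners = []
    for by in range(grid[0]):
        for bx in range(grid[1]):
            block = response[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw]
            top = np.argsort(block.ravel())[::-1][:per_block]   # sort responses, keep strongest
            ys, xs = np.unravel_index(top, block.shape)
            for x, y in zip(xs, ys):
                corners.append((bx * bw + int(x), by * bh + int(y), float(block[y, x])))
    return corners  # (u, v, corner response) triples in full-image coordinates
```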
In an embodiment of the present invention, the dispersion degree determining sub-module may include:
the clustering unit is used for clustering the initial corner points of each image block to obtain a cluster center;
and the dispersion degree determining unit is used for determining the pixel distance from each initial corner point to the cluster center, and determining the dispersion degree according to the pixel distances from the initial corner points to the cluster center.
In this embodiment of the present invention, the dispersion degree is the sum of the pixel distances from the initial corner points in the image block to the cluster center, and the screening proportion determining submodule may include:
the response value processing unit is used for calculating a mean-square-deviation sum of the corner response values of the initial corner points of the image block;
the evaluation parameter calculation unit is used for calculating an evaluation parameter according to the pixel distance sum of the initial corner points in the image block and the mean-square-deviation sum of the corner response values;
and the screening proportion determining unit is used for determining the screening proportion of each image block according to the evaluation parameter of each image block.
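The exact evaluation-parameter formula is not spelled out in this passage, so the sketch below is only one plausible reading: the dispersion degree is the summed pixel distance of a block's initial corner points to their centroid (standing in for the cluster center), the response statistic is the mean-square deviation of the corner response values, and the screening proportion is a normalized weighted combination of the two; the weights are assumptions.

```python
import numpy as np

def screening_proportions(blocks, w_spread=0.5, w_resp=0.5):
    # blocks: list of (pts, resp) per image block, where pts is an Nx2 array of
    # initial corner coordinates and resp the corresponding corner response values.
    evaluations = []
    for pts, resp in blocks:
        centre = pts.mean(axis=0)                                 # cluster center (centroid)
        spread = np.linalg.norm(pts - centre, axis=1).sum()       # dispersion degree
        msd = ((resp - resp.mean()) ** 2).sum()                   # response mean-square deviation
        evaluations.append(w_spread * spread + w_resp * msd)      # assumed evaluation parameter
    evaluations = np.asarray(evaluations, dtype=float)
    # Normalize so the per-block screening proportions sum to one.
    return evaluations / max(evaluations.sum(), 1e-12)
```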
In an embodiment of the present invention, the second pose determining module may include:
the window setting submodule is used for setting a window with a preset pixel size by taking the feature point in the key frame as a center;
the difference value determining submodule is used for determining the absolute value of the difference between the gray value of the feature point and the gray values of the other pixel points in the window;
the matrix generation submodule is used for generating an adjacency matrix according to the absolute values of the gray differences between the feature point and the other pixel points in the window;
the first descriptor determining submodule is used for generating a description vector as the descriptor of the feature point according to the adjacency matrix;
the feature point matching submodule is used for determining matched feature points between adjacent key frames according to the position information and the descriptors of the feature points of the adjacent key frames;
and the second pose determining submodule is used for determining a second camera relative pose between the adjacent key frames according to the matched feature points between the adjacent key frames.
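A possible form of the gray-difference descriptor described by the submodules above is sketched below: a fixed window around the feature point, absolute gray differences against the center pixel as the adjacency matrix, and the flattened matrix as the description vector. The window size and the final normalization are assumptions added for illustration, and points too close to the image border are not handled.

```python
import numpy as np

def gray_difference_descriptor(gray, u, v, win=7):
    # Build a descriptor for the feature point at pixel (u, v): take a win x win
    # window centered on the point, compute the absolute gray-value difference
    # between the center pixel and every pixel in the window (the adjacency
    # matrix), and flatten it into a description vector.
    r = win // 2
    patch = gray[v - r:v + r + 1, u - r:u + r + 1].astype(np.float32)
    adjacency = np.abs(patch - patch[r, r])
    desc = adjacency.ravel()
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc   # normalization is an added assumption
```

Matching between adjacent key frames would then combine the distance between such description vectors with the position information of the feature points.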
In an embodiment of the present invention, the feature line extraction module may include:
the line segment pair selection submodule is used for detecting line segments in the key frame and taking two non-parallel line segments as a line segment pair;
the line segment pair screening submodule is used for screening, from the line segment pairs, the line segment pairs meeting a preset feature line condition as feature lines; the preset feature line condition comprises: the length of each line segment is greater than or equal to a preset length threshold, the distance from the intersection point of the line segment pair to each line segment of the pair is less than or equal to a preset distance threshold, and the intersection point of the line segment pair lies within the image.
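The sketch below shows one way the preset feature line condition could be checked for a candidate line segment pair; the length and distance thresholds are illustrative assumptions, and the line segment detection itself (for example Hough-based) is left outside the snippet.

```python
import numpy as np

def intersection(seg1, seg2):
    # Intersection point of the infinite lines through two segments (x1, y1, x2, y2).
    x1, y1, x2, y2 = seg1
    x3, y3, x4, y4 = seg2
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-9:          # parallel segments have no usable intersection
        return None
    px = ((x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4)) / d
    py = ((x1 * y2 - y1 * x2) * (y3 - y4) - (y1 - y2) * (x3 * y4 - y3 * x4)) / d
    return np.array([px, py])

def is_feature_line_pair(seg1, seg2, img_shape, min_len=30.0, max_dist=20.0):
    # Preset feature line condition (thresholds are assumptions): both segments are
    # long enough, the pair is non-parallel, the intersection lies inside the image,
    # and the intersection is close enough to both segments.
    def length(s):
        return np.hypot(s[2] - s[0], s[3] - s[1])
    def dist_to_segment(p, s):
        a, b = np.array(s[:2], float), np.array(s[2:], float)
        t = np.clip(np.dot(p - a, b - a) / max(np.dot(b - a, b - a), 1e-9), 0.0, 1.0)
        return np.linalg.norm(p - (a + t * (b - a)))
    if length(seg1) < min_len or length(seg2) < min_len:
        return False
    p = intersection(seg1, seg2)
    if p is None:
        return False
    h, w = img_shape[:2]
    inside = 0 <= p[0] < w and 0 <= p[1] < h
    return inside and dist_to_segment(p, seg1) <= max_dist and dist_to_segment(p, seg2) <= max_dist
```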
In an embodiment of the present invention, the third posture determining module may include:
the feature line segment pair determining submodule is used for taking two non-parallel feature lines in the key frame as a feature line segment pair;
the second descriptor determining submodule is used for determining the descriptor of the feature line segment pair by taking the acute-angle bisector at the intersection point of the feature line segment pair as the orientation of the descriptor and taking the pixel size of the pixel block centered on the intersection point as the scale of the descriptor;
the feature line matching submodule is used for determining the matched feature lines between adjacent key frames according to the position information and the descriptors of the feature lines of the adjacent key frames;
and the third pose determining submodule is used for determining a third camera pose between the adjacent key frames according to the matched feature lines between the adjacent key frames.
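For the feature line segment pair descriptor, a minimal sketch of assigning its orientation (the acute-angle bisector at the intersection) and scale (the pixel size of the block centered on the intersection) is given below; the block size and the returned fields are assumptions, and the sampling of the actual description vector from the oriented block is omitted.

```python
import numpy as np

def line_pair_descriptor_frame(seg1, seg2, intersection_pt, patch_size=16):
    # Orientation: the acute-angle bisector of the two segment directions at the
    # intersection point. Scale: the pixel size of a block centered on the
    # intersection (patch_size is an illustrative assumption).
    d1 = np.array([seg1[2] - seg1[0], seg1[3] - seg1[1]], float)
    d2 = np.array([seg2[2] - seg2[0], seg2[3] - seg2[1]], float)
    d1 /= np.linalg.norm(d1)
    d2 /= np.linalg.norm(d2)
    if np.dot(d1, d2) < 0:        # flip one direction so the bisector spans the acute angle
        d2 = -d2
    bisector = d1 + d2
    bisector /= np.linalg.norm(bisector)
    angle = np.arctan2(bisector[1], bisector[0])   # descriptor orientation
    return {"center": np.asarray(intersection_pt, float), "angle": angle, "scale": patch_size}
```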
As for the device embodiment, since it is substantially similar to the method embodiment, the description is relatively brief; for relevant details, reference may be made to the corresponding description of the method embodiment.
An embodiment of the present invention further provides an electronic device, including:
the camera pose determination method comprises a processor, a memory and a computer program which is stored in the memory and can run on the processor, wherein when the computer program is executed by the processor, each process of the camera pose determination method embodiment is realized, the same technical effect can be achieved, and the details are not repeated here to avoid repetition.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements each process of the above camera pose determination method embodiment and can achieve the same technical effects; to avoid repetition, details are not repeated here.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The camera pose determination method and apparatus provided by the present invention are described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (12)

1. A camera pose determination method is characterized by comprising the following steps:
acquiring a plurality of image frames acquired by a camera; the plurality of image frames comprise key frames and non-key frames;
dividing the non-key frames into a plurality of image blocks, extracting corner points meeting a preset feature point condition from the image blocks as feature points, and determining the relative pose of a first camera by adopting an optical flow method according to matched feature points between adjacent non-key frames;
extracting feature points from the key frames, and determining the relative pose of the second camera by using a feature point matching method according to the matched feature points between adjacent key frames;
and determining an optimized camera pose corresponding to the current key frame according to a first camera relative pose between a plurality of adjacent non-key frames and a second camera relative pose between the adjacent key frames.
2. The method of claim 1, further comprising:
extracting line segments meeting preset characteristic line conditions from the key frames to serve as characteristic lines;
determining a third camera pose between the adjacent key frames according to the matched feature lines between the adjacent key frames;
determining an optimized camera pose corresponding to a current keyframe from a first camera relative pose between a plurality of the neighboring non-keyframes and a second camera relative pose between the neighboring keyframes, comprising:
and determining the optimized camera pose corresponding to the current key frame according to the first camera relative pose among a plurality of adjacent non-key frames, the second camera relative pose among the adjacent key frames and the third camera pose among the adjacent key frames.
3. The method of claim 2, wherein determining an optimized camera pose for the current keyframe from a first camera relative pose between a plurality of the neighboring non-keyframes, a second camera relative pose between the neighboring keyframes, and a third camera pose between the neighboring keyframes comprises:
performing Kalman filtering fusion according to a first camera relative pose between a plurality of adjacent non-key frames and a second camera relative pose between the adjacent key frames, and determining an estimated camera pose corresponding to the current key frame;
determining a feature point re-projection error for the current key frame according to the estimated camera pose corresponding to the current key frame;
determining a feature line reprojection error for the current keyframe from a third camera pose between the adjacent keyframes;
and determining the optimized camera pose corresponding to the current key frame based on the feature point re-projection error and the feature line re-projection error aiming at the current key frame.
4. The method according to claim 1, wherein the extracting corner points satisfying a preset feature point condition from the plurality of image blocks as feature points comprises:
performing low-pass filtering processing on the plurality of image blocks;
respectively calculating corner response values of pixel points in each image block subjected to low-pass filtering;
sorting the corner response values corresponding to the pixel points in each image block respectively, and selecting a preset number of pixel points as initial corner points according to a sorting result;
determining the dispersion degree of the initial corner points in each image block;
setting a corresponding screening proportion for each image block according to the dispersion degree and the corner response values of the initial corner points in each image block;
screening candidate corner points from the corresponding initial corner points according to the screening proportion of each image block;
and screening target corner points from the candidate corner points by using a random sample consensus (RANSAC) algorithm.
5. The method according to claim 4, wherein the determining the degree of dispersion of the initial corner points in each image block comprises:
clustering the initial corner points of the image block to obtain a cluster center;
and determining the pixel distance from each initial corner point to the cluster center, and determining the dispersion degree according to the pixel distances from the initial corner points to the cluster center.
6. The method according to claim 5, wherein the dispersion degree is the sum of the pixel distances from the initial corner points in the image block to the cluster center, and the setting of the corresponding screening proportion for each image block according to the dispersion degree and the corner response values of the initial corner points in each image block comprises:
calculating a mean-square-deviation sum of the corner response values of the initial corner points of the image block;
calculating an evaluation parameter according to the pixel distance sum of the initial corner points in the image block and the mean-square-deviation sum of the corner response values;
and determining the screening proportion of each image block according to the evaluation parameter of each image block.
7. The method of claim 1, wherein extracting feature points from the keyframes and determining second camera relative poses from matched feature points between neighboring keyframes comprises:
setting a window with a preset pixel size by taking the feature point in the key frame as a center;
determining the absolute value of the difference between the gray value of the feature point and the gray values of other pixel points in the window;
generating an adjacency matrix according to the absolute values of the gray differences between the feature point and the other pixel points in the window;
generating a description vector as a descriptor of the feature point according to the adjacency matrix;
determining matched feature points between adjacent key frames according to the position information and the descriptors of the feature points of the adjacent key frames;
and determining the relative pose of a second camera between the adjacent key frames according to the matched feature points between the adjacent key frames.
8. The method according to claim 2, wherein the extracting, from the key frame, line segments satisfying a preset feature line condition as feature lines comprises:
detecting line segments in the key frame, and taking two non-parallel line segments as a line segment pair;
screening out the line segment pairs meeting the preset feature line condition as feature lines; the preset feature line condition comprises: the length of each line segment is greater than or equal to a preset length threshold, the distance from the intersection point of the line segment pair to each line segment of the pair is less than or equal to a preset distance threshold, and the intersection point of the line segment pair lies within the image.
9. The method of claim 8, wherein determining a third camera pose between adjacent keyframes from the matched feature lines between adjacent keyframes comprises:
taking two non-parallel feature lines in the key frame as a feature line segment pair;
determining the descriptor of the feature line segment pair by taking the acute-angle bisector at the intersection point of the feature line segment pair as the orientation of the descriptor and taking the pixel size of the pixel block centered on the intersection point as the scale of the descriptor;
determining matched feature lines between adjacent key frames according to the position information and the descriptors of the feature lines of the adjacent key frames;
and determining a third camera pose between the adjacent key frames according to the matched feature lines between the adjacent key frames.
10. A camera pose determination device, comprising:
the image acquisition module is used for acquiring a plurality of image frames acquired by the camera; the plurality of image frames comprise key frames and non-key frames;
the first pose determining module is used for dividing the non-key frames into a plurality of image blocks, extracting corner points meeting a preset feature point condition from the image blocks as feature points, and determining the relative pose of a first camera by adopting an optical flow method according to the feature points matched between the adjacent non-key frames;
the second pose determining module is used for extracting feature points from the key frames and determining relative poses of a second camera by adopting a feature point matching method according to the matched feature points between adjacent key frames;
and the optimized pose determining module is used for determining the optimized camera pose corresponding to the current key frame according to the first camera relative pose between a plurality of adjacent non-key frames and the second camera relative pose between the adjacent key frames.
11. An electronic device, comprising: a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the camera pose determination method of any one of claims 1-9.
12. A computer-readable storage medium, characterized in that a computer program is stored thereon, which when executed by a processor implements the steps of the camera pose determination method according to any one of claims 1 to 9.
CN202210012434.8A 2022-01-06 2022-01-06 Camera position and posture determining method and device Pending CN114399532A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210012434.8A CN114399532A (en) 2022-01-06 2022-01-06 Camera position and posture determining method and device
PCT/CN2022/132927 WO2023130842A1 (en) 2022-01-06 2022-11-18 Camera pose determining method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210012434.8A CN114399532A (en) 2022-01-06 2022-01-06 Camera position and posture determining method and device

Publications (1)

Publication Number Publication Date
CN114399532A true CN114399532A (en) 2022-04-26

Family

ID=81228722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210012434.8A Pending CN114399532A (en) 2022-01-06 2022-01-06 Camera position and posture determining method and device

Country Status (2)

Country Link
CN (1) CN114399532A (en)
WO (1) WO2023130842A1 (en)


Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11232583B2 (en) * 2016-03-25 2022-01-25 Samsung Electronics Co., Ltd. Device for and method of determining a pose of a camera
CN109558879A (en) * 2017-09-22 2019-04-02 华为技术有限公司 A kind of vision SLAM method and apparatus based on dotted line feature
CN110555882B (en) * 2018-04-27 2022-11-15 腾讯科技(深圳)有限公司 Interface display method, device and storage medium
CN108648215B (en) * 2018-06-22 2022-04-15 南京邮电大学 SLAM motion blur pose tracking algorithm based on IMU
CN112097742B (en) * 2019-06-17 2022-08-30 北京地平线机器人技术研发有限公司 Pose determination method and device
CN112734797A (en) * 2019-10-29 2021-04-30 浙江商汤科技开发有限公司 Image feature tracking method and device and electronic equipment
CN113112542A (en) * 2021-03-25 2021-07-13 北京达佳互联信息技术有限公司 Visual positioning method and device, electronic equipment and storage medium
CN113744315B (en) * 2021-09-07 2024-02-06 北京航空航天大学 Semi-direct vision odometer based on binocular vision
CN114399532A (en) * 2022-01-06 2022-04-26 广东汇天航空航天科技有限公司 Camera position and posture determining method and device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023130842A1 (en) * 2022-01-06 2023-07-13 广东汇天航空航天科技有限公司 Camera pose determining method and apparatus

Also Published As

Publication number Publication date
WO2023130842A1 (en) 2023-07-13

Similar Documents

Publication Publication Date Title
EP2858008B1 (en) Target detecting method and system
CN110555901B (en) Method, device, equipment and storage medium for positioning and mapping dynamic and static scenes
Taneja et al. City-scale change detection in cadastral 3d models using images
US8385630B2 (en) System and method of processing stereo images
EP3373248A1 (en) Method, control device, and system for tracking and photographing target
CN109961506A (en) A kind of fusion improves the local scene three-dimensional reconstruction method of Census figure
CN111340922A (en) Positioning and mapping method and electronic equipment
CN112115980A (en) Binocular vision odometer design method based on optical flow tracking and point line feature matching
CN112233177A (en) Unmanned aerial vehicle pose estimation method and system
US11651581B2 (en) System and method for correspondence map determination
CN109063549B (en) High-resolution aerial video moving target detection method based on deep neural network
CN115131420A (en) Visual SLAM method and device based on key frame optimization
CN112785705A (en) Pose acquisition method and device and mobile equipment
Jung et al. Object detection and tracking-based camera calibration for normalized human height estimation
CN112270748B (en) Three-dimensional reconstruction method and device based on image
CN111402429B (en) Scale reduction and three-dimensional reconstruction method, system, storage medium and equipment
WO2023130842A1 (en) Camera pose determining method and apparatus
JP4836065B2 (en) Edge tracking method and computer program therefor
CN110009683B (en) Real-time on-plane object detection method based on MaskRCNN
CN113450457B (en) Road reconstruction method, apparatus, computer device and storage medium
CN113763468B (en) Positioning method, device, system and storage medium
CN110060343B (en) Map construction method and system, server and computer readable medium
CN113837243A (en) RGB-D camera dynamic visual odometer method based on edge information
KR20100009451A (en) Method for determining ground line
Ghita et al. Epipolar line extraction using feature matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination