CN108921899B - Indoor visual positioning method for solving fundamental matrix based on pixel threshold

Info

Publication number: CN108921899B
Authority: CN (China)
Prior art keywords: pixel, image, feature point, matching, points
Prior art date: 2018-07-13
Legal status: Active
Application number: CN201810772251.XA
Other languages: Chinese (zh)
Other versions: CN108921899A (en)
Inventors: 马琳 (Ma Lin), 谭竞扬 (Tan Jingyang), 秦丹阳 (Qin Danyang)
Current Assignee: Harbin Institute of Technology
Original Assignee: Harbin Institute of Technology
Priority date: 2018-07-13
Filing date: 2018-07-13
Publication date: 2021-05-28
Application filed by Harbin Institute of Technology
Priority to CN201810772251.XA
Publication of CN108921899A
Application granted
Publication of CN108921899B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an indoor visual positioning method for solving a fundamental matrix based on a pixel threshold, and belongs to the technical field of image processing. First, feature points are extracted with the SURF algorithm from a user image cropped to the same size as the images in a database, and the feature points are matched. The matched feature point pairs are then sorted in ascending order of Euclidean distance, and their pixel distances are calculated in turn: a matched pair is stored for later use if its pixel distance is smaller than a pixel threshold, and rejected if it is larger, until eight matched pairs whose pixel distances satisfy the threshold condition are obtained. Finally, the fundamental matrix is solved using these eight matched pairs, and indoor visual positioning is then carried out. The invention solves the problem of large error in existing indoor visual positioning methods based on solving the fundamental matrix, and can be used for indoor visual positioning.

Description

Indoor visual positioning method for solving fundamental matrix based on pixel threshold
Technical Field
The invention relates to an indoor visual positioning method, and belongs to the technical field of image processing.
Background
The geographic position, angle and internal parameters of the camera cause a matched feature point pair to have different pixel coordinates in different images. The pixel distance is the distance between the pixel coordinates of the feature point pair on the two images. Existing fundamental-matrix solving algorithms generally adopt the eight-point method. Because the only criterion by which the traditional eight-point method selects its eight pairs of matched feature points is the Euclidean distance of the matched pairs, when the indoor environment is changed by human factors the matched pairs suffer pixel drift; solving the fundamental matrix with such pixel-coordinate-distorted matched pairs causes a large error, and since the fundamental matrix reflects the rotation and translation relation between the cameras, the subsequent positioning error increases. Therefore, a method with strong robustness is needed to refine the matched feature point pairs and thereby improve the positioning accuracy.
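With p_u = (u_u, v_u) denoting the pixel coordinates of a feature point on one image and p_d = (u_d, v_d) those of its match on the other, the pixel distance described above is the Euclidean distance between the two pixel coordinates (the patent states this in words only; the explicit formula is an editorial illustration):

d_pixel = √((u_u - u_d)² + (v_u - v_d)²)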
Disclosure of Invention
The invention provides an indoor visual positioning method for solving a fundamental matrix based on a pixel threshold, aiming at solving the problem of large error of the existing indoor visual positioning method based on solving the fundamental matrix.
The indoor visual positioning method for solving a fundamental matrix based on a pixel threshold according to the invention is realized by the following technical scheme:
Step 1: cropping the user image to the same size as the images in a database;
Step 2: extracting feature points from the user image and from the database images using the SURF algorithm;
Step 3: matching the feature points of the user image with those of the database images, and obtaining the Euclidean distance of each matched feature point pair; the database image sharing the most matched feature points with the user image is the image matched to the user;
Step 4: sorting the matched feature point pairs between the user image and its matched image in ascending order of Euclidean distance;
Step 5: calculating the pixel distance of each matched feature point pair between the user image and its matched image in turn, and judging whether the pixel distance is smaller than a pixel threshold: if so, storing the matched pair for later use, and if it is larger than the pixel threshold, rejecting the pair, until eight matched pairs whose pixel distances satisfy the threshold condition are obtained;
Step 6: solving the fundamental matrix using the eight matched pairs obtained in Step 5, and then performing indoor visual positioning.
As a further elaboration of the above technical solution:
Further, the specific process of extracting the feature points in Step 2 includes:
Step 2.1: feature point detection:
The image is convolved with a box filter; by changing the size of the box filter, the image is convolved with box filters of different sizes and a scale-space pyramid is constructed, forming the multi-scale space functions D_xx, D_yy, D_xy, where D_xx represents the result of convolving a point on the image with the second-order Gaussian partial derivative ∂²g(σ)/∂x², D_yy represents the result of convolving it with ∂²g(σ)/∂y², and D_xy represents the result of convolving it with ∂²g(σ)/∂x∂y; x represents the abscissa of a pixel point on the image, y represents the ordinate of a pixel point on the image, and g(σ) represents the Gaussian kernel function; σ represents the scale of the Gaussian kernel function;
After the scale-space pyramid is constructed, the local extremum det H at a given scale is obtained through the following formula:

det H = D_xx × D_yy - (0.9 × D_xy)²   (1)
Non-maximum suppression is carried out on the points of the image in a 3 × 3 × 3 scale-space neighborhood, the points satisfying the condition are screened as feature points, and the position and scale of each feature point are saved;
Step 2.2: feature point description:
After the positions of the feature points are determined, the dominant orientation of each feature point is determined using Haar wavelets, thereby ensuring the rotation and scale invariance of the feature points.
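For illustration only (the patent prescribes the SURF algorithm but no particular implementation), the following sketch shows Steps 2.1 and 2.2 using the SURF implementation of opencv-contrib-python; the Hessian threshold value 400 is an assumed, illustrative parameter:

import cv2

def extract_surf(img):
    # Sketch of Step 2: SURF feature detection and description, assuming
    # opencv-contrib-python built with the non-free xfeatures2d module.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # assumed value
    # detectAndCompute runs the box-filter scale-space detection with
    # non-maximum suppression and the Haar-wavelet orientation assignment
    # described above, returning keypoints (with pixel coordinates) and
    # their descriptors.
    keypoints, descriptors = surf.detectAndCompute(gray, None)
    return keypoints, descriptors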
Further, the specific process of matching the feature points in Step 3 includes:
The Euclidean distances between a feature point in the user image and all feature points of a database image are computed; the nearest-neighbor Euclidean distance Ed_min1 and the next-nearest-neighbor Euclidean distance Ed_min2 are selected and their ratio is computed. A feature point whose ratio is less than or equal to a first threshold T_Ed is regarded as correctly matched; otherwise it is regarded as incorrectly matched. The correctly matched feature points form matched feature point pairs. The above is repeated for all feature points in the user image to complete the matching of the feature points of the user image and the database image;
The feature point matching criterion is shown in formula (2):

correct match if Ed_min1 / Ed_min2 ≤ T_Ed, mismatch otherwise   (2)
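As a non-authoritative sketch of this ratio test (formula (2)), assuming NumPy and the keypoints and descriptors returned by the SURF helper sketched above:

import numpy as np

def match_ratio(kp_u, des_u, kp_d, des_d, t_ed=0.7):
    # For each user-image feature, find the nearest and next-nearest
    # database features by Euclidean descriptor distance and apply
    # Ed_min1 / Ed_min2 <= T_Ed (formula (2)).
    matches = []
    for i, d in enumerate(des_u):
        dists = np.linalg.norm(des_d - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]          # nearest, next nearest
        if dists[j1] / dists[j2] <= t_ed:
            # keep (user point, database point, Ed_min1)
            matches.append((kp_u[i].pt, kp_d[j1].pt, dists[j1]))
    return matches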
further, the specific process of the step five includes:
fifthly, selecting a group of matching characteristic point pairs with the minimum Euclidean distance from the matching characteristic point pairs which are arranged in the fourth step according to the ascending order of the Euclidean distances from small to large;
step two, calculating the pixel distance between the matched feature point pairs: calculating to obtain the pixel distance between the matched feature point pairs by using the pixel coordinates of the feature points obtained in the step two;
step three, comparing the pixel distance obtained in the step two with a preset pixel threshold phi, if the pixel distance is smaller than the pixel threshold, reserving and storing the matching feature point pair, and if the pixel distance is larger than the pixel threshold, rejecting the matching feature point pair and selecting the next group of matching feature point pairs with the minimum Euclidean distance;
and fifthly, repeating the fifth step, the second step, the fifth step and the fifth step until a matching characteristic point pair with the distance of eight pairs of pixels meeting the threshold condition is obtained.
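A minimal sketch of Steps 5.1 to 5.4, assuming the match list produced by the ratio-test helper above (each entry holding the two pixel coordinates and the descriptor distance):

import math

def select_eight(matches, phi):
    # Step 4: sort by ascending Euclidean (descriptor) distance.
    matches = sorted(matches, key=lambda m: m[2])
    kept = []
    for pu, pd, _ in matches:                    # Step 5.1: smallest first
        # Step 5.2: pixel distance between the matched pair.
        pixel_dist = math.hypot(pu[0] - pd[0], pu[1] - pd[1])
        if pixel_dist < phi:                     # Step 5.3: keep or reject
            kept.append((pu, pd))
        if len(kept) == 8:                       # Step 5.4: stop at eight pairs
            break
    return kept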
Further, the specific process of solving the fundamental matrix in Step 6 using the eight matched pairs obtained in Step 5 includes:
The fundamental matrix F is a 3 × 3 matrix with rank 2 and 7 degrees of freedom; F is written as:

F = [ F11  F12  F13
      F21  F22  F23
      F31  F32  F33 ]   (4)

where F11, F12, F13, F21, F22, F23, F31, F32, F33 are the elements of F;
The pixel coordinates of the i-th matched feature point pair are written as:

p_d^i = (u_d^i, v_d^i, 1)^T   (5)
p_u^i = (u_u^i, v_u^i, 1)^T   (6)

where p_d^i represents the pixel coordinates of the point, in the i-th matched pair, on the image matched to the user, and p_u^i represents the pixel coordinates of the point on the user image; u_d^i and v_d^i respectively represent the abscissa and the ordinate of the point on the image matched to the user, and u_u^i and v_u^i respectively represent the abscissa and the ordinate of the point on the user image; i ∈ {1, …, 8};
Substituting formulas (5) and (6) into the epipolar geometric constraint relation gives:

(p_d^i)^T F p_u^i = 0   (7)

where the superscript T represents transposition;
Formula (7) is expanded as the following homogeneous linear equation:

w_i^T f = 0   (8)

where w_i and f are respectively expressed as:

w_i = (u_d^i·u_u^i, u_d^i·v_u^i, u_d^i, v_d^i·u_u^i, v_d^i·v_u^i, v_d^i, u_u^i, v_u^i, 1)^T   (9)

f = (F11, F12, F13, F21, F22, F23, F31, F32, F33)^T   (10)
The pixel coordinates of the eight matched pairs whose pixel distances satisfy the threshold condition, obtained in Step 5, are substituted into formula (8) and expressed in the following form:

Wf = 0   (11)

where W = (w_1, w_2, …, w_8)^T;
Under noise interference W is a full-rank matrix, i.e. the only exact solution of formula (11) is the zero vector, so solving the fundamental matrix is converted into the optimization problem:

min ||Wf||  subject to  ||f|| = 1   (12)
Singular value decomposition is performed on the matrix W:

W = U D V^T   (13)

where U is formed by the eigenvectors of W·W^T arranged by descending eigenvalue, D is the diagonal matrix formed by the square roots of the eigenvalues of W·W^T (or W^T·W) arranged in descending order, and V is formed by the eigenvectors of W^T·W arranged by descending eigenvalue; in order to estimate the most accurate fundamental matrix, ||Wf|| must be as close to 0 as possible; therefore, the last column vector v_9 of the matrix V is selected and rearranged to construct the fundamental matrix F.
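The constrained least-squares solution of formulas (11) to (13) can be sketched with NumPy as follows (an editorial illustration of the eight-point solve, not the patent's code; the pair layout matches the Step 5 helper above, user-image point first):

import numpy as np

def solve_fundamental(pairs):
    # Build the 8 x 9 matrix W of formula (11); each row is w_i of
    # formula (9), obtained by expanding p_d^T F p_u = 0.
    W = []
    for (u_u, v_u), (u_d, v_d) in pairs:
        W.append([u_d * u_u, u_d * v_u, u_d,
                  v_d * u_u, v_d * v_u, v_d,
                  u_u, v_u, 1.0])
    W = np.asarray(W)
    # W = U D V^T (formula (13)); the right singular vector belonging to
    # the smallest singular value (v_9) minimizes ||Wf|| under ||f|| = 1.
    _, _, Vt = np.linalg.svd(W)
    f = Vt[-1]
    return f.reshape(3, 3)      # rearrange v_9 into the 3 x 3 matrix F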
Further, in Step 5.3, the pixel threshold Φ = (image pixel length + image pixel width) × 0.08.
Further, in Step 3, the first threshold T_Ed = 0.7.
The most prominent characteristics and significant beneficial effects of the invention are as follows:
The invention, an indoor visual positioning method that solves the fundamental matrix based on a pixel threshold, solves the problem that, when the environment changes, the eight pairs of matching points selected by the traditional eight-point method suffer large pixel drift and poor robustness, causing a large positioning error. The concept of a pixel threshold is introduced to eliminate the matched feature point pairs with large pixel drift, so that the algorithm is suitable for changeable indoor environments. Using the method of the invention for indoor visual positioning reduces the positioning error by about 40% compared with the existing method.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic structural diagram of a multimedia mobile acquisition platform in an embodiment;
FIG. 3 is a diagram of the eight pairs of matching points extracted by the conventional eight-point method in the embodiment;
FIG. 4 is a diagram of the eight pairs of matching points extracted by the method of the present invention in the embodiment;
FIG. 5 is a comparison graph of the error in solving the fundamental matrix using the conventional method and the method of the present invention;
FIG. 6 is a CDF (cumulative distribution function) graph of indoor visual positioning using the conventional method and the method of the present invention;
Reference numerals:
1. pulley; 2. base plate; 3. upright rod; 4. camera mount; 5. drawer.
Detailed Description
Embodiment 1: This embodiment is described with reference to FIG. 1. The indoor visual positioning method for solving the fundamental matrix based on a pixel threshold provided by this embodiment specifically includes the following steps:
Step 1: cropping the user image to the same size as the images in a database;
Step 2: extracting feature points from the user image and from the database images using the SURF algorithm;
Step 3: matching the feature points of the user image with those of the database images using the SURF descriptors, and obtaining the Euclidean distance of each matched feature point pair; the database image sharing the most matched feature points with the user image is the image matched to the user;
Step 4: sorting the matched feature point pairs between the user image and its matched image in ascending order of Euclidean distance;
Step 5: starting from the matched pair with the smallest Euclidean distance, calculating the pixel distance of each matched feature point pair between the user image and its matched image in turn, and judging whether the pixel distance is smaller than a pixel threshold: if so, storing the matched pair for later use, and if it is larger than the pixel threshold, rejecting the pair, until eight matched pairs whose pixel distances satisfy the threshold condition are obtained;
Step 6: solving the fundamental matrix using the eight matched pairs obtained in Step 5, and then performing indoor visual positioning.
For the procedure of indoor visual positioning using the fundamental matrix, reference may be made to the following document:
Xue Hao. Research on Visual Localization Algorithms Based on Epipolar Geometry Theory [D]. Harbin: Harbin Institute of Technology, 2016.
Embodiment 2: This embodiment differs from Embodiment 1 in that the specific process of extracting the feature points in Step 2 includes:
Step 2.1: feature point detection:
The first step of feature point extraction with the SURF algorithm is feature point detection. The image is convolved with a box filter; by changing the size of the box filter, box filters of different sizes are used to convolve the image in the x, y and xy directions and a scale-space pyramid is constructed, forming the multi-scale space functions D_xx, D_yy, D_xy, where D_xx represents the result of convolving a point on the image with the second-order Gaussian partial derivative ∂²g(σ)/∂x², D_yy represents the result of convolving it with ∂²g(σ)/∂y², and D_xy represents the result of convolving it with ∂²g(σ)/∂x∂y; x represents the abscissa of a pixel point on the image, y represents the ordinate of a pixel point on the image, and g(σ) represents the Gaussian kernel function; σ represents the scale of the Gaussian kernel function;
After the scale-space pyramid is constructed, the local extremum det H at a given scale is obtained through the following formula:

det H = D_xx × D_yy - (0.9 × D_xy)²   (1)
After the local extrema are obtained, non-maximum suppression is applied to the points of the image in a 3 × 3 × 3 scale-space neighborhood, the points satisfying the condition are screened as feature points, and the position and scale of each feature point are saved, completing the feature point detection;
Step 2.2: feature point description:
After the positions of the feature points are determined, the dominant orientation of each feature point is determined using Haar wavelets, thereby ensuring the rotation and scale invariance of the feature points.
Other steps and parameters are the same as those in the first embodiment.
Embodiment 3: This embodiment differs from Embodiment 2 in that the specific process of matching the feature points in Step 3 includes:
Feature point matching means finding the most similar feature vectors in a high-dimensional vector space; the similarity of feature points is measured by the Euclidean distance between their feature vectors.
The following is carried out in turn for all feature points in the user image:
The Euclidean distances between a feature point in the user image and all feature points of a database image are computed; the nearest-neighbor Euclidean distance Ed_min1 and the next-nearest-neighbor Euclidean distance Ed_min2 are selected and their ratio is computed. A feature point whose ratio is less than or equal to a first threshold T_Ed is regarded as correctly matched; otherwise it is regarded as incorrectly matched. The correctly matched feature points are connected to form matched feature point pairs;
The feature point matching criterion is shown in formula (2):

correct match if Ed_min1 / Ed_min2 ≤ T_Ed, mismatch otherwise   (2)
other steps and parameters are the same as those in the second embodiment.
Embodiment 4: This embodiment differs from Embodiment 3 in that the specific process of Step 5 includes:
Step 5.1: from the matched feature point pairs sorted in Step 4 in ascending order of Euclidean distance, selecting the pair with the smallest Euclidean distance;
Step 5.2: calculating the pixel distance between the matched feature points, using the pixel coordinates of the feature points obtained in Step 2;
The geographic position, angle and internal parameters of the camera cause a matched feature point pair to have different pixel coordinates in different images; the pixel distance is the distance between the pixel coordinates of the feature point pair on the two images. Usually, the pixel coordinates of eight pairs of matching points can be used to calculate the rotation and translation matrix between different cameras; however, when the environment changes, or the feature points on the image are artificially moved, such matched feature points produce a large error when used to calculate the fundamental matrix, and since the fundamental matrix reflects the rotation and translation relation between the cameras, the subsequent positioning error increases.
Step 5.3: comparing the pixel distance obtained in Step 5.2 with a preset pixel threshold Φ; if the pixel distance is smaller than the pixel threshold, the matched pair is retained and stored, and if it is larger than the pixel threshold, the pair is rejected and the next pair with the smallest Euclidean distance is selected;
Step 5.4: repeating Steps 5.1 to 5.3 until eight matched pairs whose pixel distances satisfy the threshold condition are obtained. These eight matched pairs are the matched feature point pairs with pixel-distance distortion removed, which ensures that the error of the subsequently solved fundamental matrix is small.
Other steps and parameters are the same as those in the third embodiment.
Embodiment 5: This embodiment differs from Embodiment 1, 2, 3 or 4 in that the specific process of solving the fundamental matrix in Step 6 using the eight matched pairs obtained in Step 5 includes:
The epipolar geometric constraint relation is:

p_d^T F p_u = 0   (3)

where p_d represents the pixel coordinates of a feature point of the database image matched to the user, and p_u represents the pixel coordinates of the corresponding feature point of the user image. The fundamental matrix F can be obtained from the matched feature point pairs on the photographs taken under the two camera systems; it is a 3 × 3 matrix with rank 2 and 7 degrees of freedom. Therefore, the pixel coordinates of the first eight pairs of matched feature points with the smallest Euclidean distances can be selected to recover the fundamental matrix.
F is written as:

F = [ F11  F12  F13
      F21  F22  F23
      F31  F32  F33 ]   (4)

where F11, F12, F13, F21, F22, F23, F31, F32, F33 are the elements of F;
The pixel coordinates of the i-th matched feature point pair are written as:

p_d^i = (u_d^i, v_d^i, 1)^T   (5)
p_u^i = (u_u^i, v_u^i, 1)^T   (6)

where p_d^i represents the pixel coordinates of the point, in the i-th matched pair, on the image matched to the user, and p_u^i represents the pixel coordinates of the point on the user image; u_d^i and v_d^i respectively represent the abscissa and the ordinate of the point on the image matched to the user, and u_u^i and v_u^i respectively represent the abscissa and the ordinate of the point on the user image; i ∈ {1, …, 8}; the third coordinate 1 is appended for computational convenience.
Substituting formulas (5) and (6) into the epipolar geometric constraint relation (3) yields:

(p_d^i)^T F p_u^i = 0   (7)

where the superscript T represents transposition;
Formula (7) is expanded as the following homogeneous linear equation:

w_i^T f = 0   (8)

where w_i and f are respectively expressed as:

w_i = (u_d^i·u_u^i, u_d^i·v_u^i, u_d^i, v_d^i·u_u^i, v_d^i·v_u^i, v_d^i, u_u^i, v_u^i, 1)^T   (9)

f = (F11, F12, F13, F21, F22, F23, F31, F32, F33)^T   (10)
The pixel coordinates of the eight matched pairs whose pixel distances satisfy the threshold condition, obtained in Step 5, are substituted into formula (8) and expressed in the following form:

Wf = 0   (11)

where W = (w_1, w_2, …, w_8)^T;
The fundamental matrix F has rank 2, i.e. formula (11) admits a non-zero solution; ideally the rank of W is 8, and the linear system (11) can be solved directly. In practical situations, however, image noise arises when the camera captures the signal, and under noise interference W is a full-rank matrix, i.e. the only exact solution of formula (11) is the zero vector, so solving the fundamental matrix is converted into the optimization problem:

min ||Wf||  subject to  ||f|| = 1   (12)

Here "subject to" introduces the constraint of the optimization problem: the objective function is minimized over variables that satisfy the constraint;
Singular value decomposition is performed on the matrix W:

W = U D V^T   (13)

where U is formed by the eigenvectors of W·W^T arranged by descending eigenvalue, D is the diagonal matrix formed by the square roots of the eigenvalues of W·W^T (or W^T·W) arranged in descending order, and V is formed by the eigenvectors of W^T·W arranged by descending eigenvalue. In order to estimate the most accurate fundamental matrix, ||Wf|| must be as close to 0 as possible; therefore, the last column (column 9) vector v_9 of the matrix V is selected and rearranged to construct the fundamental matrix F.
Other steps and parameters are the same as those in the first, second, third or fourth embodiments.
Embodiment 6: This embodiment differs from Embodiment 4 in that the pixel threshold Φ in Step 5.3 is (image pixel length + image pixel width) × 0.08.
Other steps and parameters are the same as those in the fourth embodiment.
Embodiment 7: This embodiment differs from Embodiment 5 in that the pixel threshold Φ in Step 5.3 is (image pixel length + image pixel width) × 0.08.
The other steps and parameters are the same as those in the fifth embodiment.
Embodiment 8: This embodiment differs from Embodiment 7 in that the first threshold T_Ed in Step 3 is 0.7; extensive experiments prove that T_Ed = 0.7 is the optimal choice.
Other steps and parameters are the same as those in the sixth embodiment.
Examples
The following example demonstrates the beneficial effects of the invention:
The indoor visual positioning method for solving the fundamental matrix based on a pixel threshold is carried out according to the following steps:
Step 1: A multimedia mobile acquisition platform (shown in FIG. 2) carrying a rechargeable battery, a notebook computer and an industrial camera is pushed along the corridor; a MATLAB program running on the notebook computer calls the industrial camera and acquires images of the corridor environment. The original images are 1280 × 640 pixels. The feature points of the images are extracted with the SURF algorithm, and the pixel coordinates of each feature point are recorded and stored in a database.
Step 2: The user image input by the user is cropped to 1280 × 640 pixels, its feature points are extracted with the SURF algorithm and matched with the images in the database; the image with the most matched feature points is the image matched to the user.
Step 3: The Euclidean distances between the feature point pairs are obtained during feature point matching, and the matched feature point pairs between the user image and its matched image are sorted in ascending order of Euclidean distance.
Step 4: Starting from the matched feature point pair with the smallest Euclidean distance, the pixel distance of each matched pair between the user image and its matched image is calculated and compared with the preset pixel threshold Φ = (1280 + 640) × 0.08 = 153.6. If the pixel distance is greater than the pixel threshold, the pair is rejected; if it is smaller than the pixel threshold, the pair is stored for later use, until eight matched pairs whose pixel distances satisfy the threshold condition are selected. FIG. 3 shows the eight pairs of matching feature points extracted by the traditional eight-point method, and FIG. 4 shows the eight pairs extracted by the method of the present invention; it can be seen that the eight pairs selected by the method of the present invention exclude the matched pairs with pixel-distance distortion, which ensures a small error in the subsequently solved fundamental matrix.
Step 5: The fundamental matrix is solved with the eight obtained pairs of matched feature points, and indoor visual positioning is carried out. FIG. 5 compares the error in solving the fundamental matrix by the conventional method (the indoor visual positioning method solving the fundamental matrix based on the traditional eight-point method) and by the method of the present invention (the indoor visual positioning method solving the fundamental matrix based on a pixel threshold); the CDF (cumulative distribution function) curves of the final positioning accuracy are compared in FIG. 6. It can be seen that using the method of the present invention for indoor visual positioning reduces the positioning error by about 40% compared with the existing method.
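Tying the sketches above together on the 1280 × 640 images of this example (user_img and db_img are assumed to have been loaded, e.g. with cv2.imread, and db_img to be the best-matching database image):

phi = (1280 + 640) * 0.08                  # pixel threshold, = 153.6

kp_u, des_u = extract_surf(user_img)       # user image cropped to 1280 x 640
kp_d, des_d = extract_surf(db_img)         # matched database image
matches = match_ratio(kp_u, des_u, kp_d, des_d, t_ed=0.7)
pairs = select_eight(matches, phi)         # eight pairs passing the threshold
F = solve_fundamental(pairs)               # fundamental matrix for positioning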
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.

Claims (8)

1. An indoor visual positioning method for solving a fundamental matrix based on a pixel threshold, characterized in that the method specifically comprises the following steps:
Step 1: cropping the user image to the same size as the images in a database;
Step 2: extracting feature points from the user image and from the database images using the SURF algorithm;
Step 3: matching the feature points of the user image with those of the database images, and obtaining the Euclidean distance of each matched feature point pair, wherein the database image sharing the most matched feature points with the user image is the image matched to the user;
Step 4: sorting the matched feature point pairs between the user image and its matched image in ascending order of Euclidean distance;
Step 5: calculating the pixel distance of each matched feature point pair between the user image and its matched image in turn, and judging whether the pixel distance is smaller than a pixel threshold: if so, storing the matched pair for later use, and if it is larger than the pixel threshold, rejecting the pair, until eight matched pairs whose pixel distances satisfy the threshold condition are obtained;
Step 6: solving the fundamental matrix using the eight matched pairs obtained in Step 5, and then performing indoor visual positioning.
2. The indoor visual positioning method for solving a fundamental matrix based on a pixel threshold according to claim 1, wherein the specific process of extracting feature points in Step 2 comprises:
Step 2.1: feature point detection:
convolving the image with a box filter; by changing the size of the box filter, convolving the image with box filters of different sizes and constructing a scale-space pyramid, forming the multi-scale space functions D_xx, D_yy, D_xy, where D_xx represents the result of convolving a point on the image with the second-order Gaussian partial derivative ∂²g(σ)/∂x², D_yy represents the result of convolving it with ∂²g(σ)/∂y², and D_xy represents the result of convolving it with ∂²g(σ)/∂x∂y; x represents the abscissa of a pixel point on the image, y represents the ordinate of a pixel point on the image, and g(σ) represents the Gaussian kernel function; σ represents the scale of the Gaussian kernel function;
after the scale-space pyramid is constructed, obtaining the local extremum det H at a given scale through the following formula:

det H = D_xx × D_yy - (0.9 × D_xy)²   (1)
carrying out non-maximum suppression on the points of the image in a 3 × 3 × 3 scale-space neighborhood, screening the points satisfying the condition as feature points, and saving the position and scale of each feature point;
Step 2.2: feature point description:
after the positions of the feature points are determined, determining the dominant orientation of each feature point using Haar wavelets, thereby ensuring the rotation and scale invariance of the feature points.
3. The indoor visual positioning method for solving a fundamental matrix based on a pixel threshold according to claim 2, wherein the specific process of feature point matching in Step 3 comprises:
computing the Euclidean distances between a feature point in the user image and all feature points of a database image; selecting the nearest-neighbor Euclidean distance Ed_min1 and the next-nearest-neighbor Euclidean distance Ed_min2 and computing their ratio; regarding a feature point whose ratio is less than or equal to a first threshold T_Ed as correctly matched, and otherwise as incorrectly matched; forming matched feature point pairs from the correctly matched feature points; repeating the above for all feature points in the user image to complete the matching of the feature points of the user image and the database image;
the feature point matching criterion is shown in formula (2):

correct match if Ed_min1 / Ed_min2 ≤ T_Ed, mismatch otherwise   (2)
4. The indoor visual positioning method for solving a fundamental matrix based on a pixel threshold according to claim 3, wherein the specific process of Step 5 comprises:
Step 5.1: from the matched feature point pairs sorted in Step 4 in ascending order of Euclidean distance, selecting the pair with the smallest Euclidean distance;
Step 5.2: calculating the pixel distance between the matched feature points, using the pixel coordinates of the feature points obtained in Step 2;
Step 5.3: comparing the pixel distance obtained in Step 5.2 with a preset pixel threshold Φ; if the pixel distance is smaller than the pixel threshold, retaining and storing the matched pair, and if it is larger than the pixel threshold, rejecting the pair and selecting the next pair with the smallest Euclidean distance;
Step 5.4: repeating Steps 5.1 to 5.3 until eight matched pairs whose pixel distances satisfy the threshold condition are obtained.
5. The indoor visual positioning method for solving a fundamental matrix based on a pixel threshold according to claim 1, 2, 3 or 4, wherein the specific process of solving the fundamental matrix in Step 6 using the eight matched pairs obtained in Step 5 comprises:
the fundamental matrix F is a 3 × 3 matrix with rank 2 and 7 degrees of freedom; F is written as:

F = [ F11  F12  F13
      F21  F22  F23
      F31  F32  F33 ]   (4)

where F11, F12, F13, F21, F22, F23, F31, F32, F33 are the elements of F;
the pixel coordinates of the i-th matched feature point pair are written as:

p_d^i = (u_d^i, v_d^i, 1)^T   (5)
p_u^i = (u_u^i, v_u^i, 1)^T   (6)

where p_d^i represents the pixel coordinates of the point, in the i-th matched pair, on the image matched to the user, and p_u^i represents the pixel coordinates of the point on the user image; u_d^i and v_d^i respectively represent the abscissa and the ordinate of the point on the image matched to the user, and u_u^i and v_u^i respectively represent the abscissa and the ordinate of the point on the user image; i ∈ {1, …, 8};
substituting formulas (5) and (6) into the epipolar geometric constraint relation gives:

(p_d^i)^T F p_u^i = 0   (7)

where the superscript T represents transposition;
formula (7) is expanded as the following homogeneous linear equation:

w_i^T f = 0   (8)

where w_i and f are respectively expressed as:

w_i = (u_d^i·u_u^i, u_d^i·v_u^i, u_d^i, v_d^i·u_u^i, v_d^i·v_u^i, v_d^i, u_u^i, v_u^i, 1)^T   (9)

f = (F11, F12, F13, F21, F22, F23, F31, F32, F33)^T   (10)
substituting into formula (8) the pixel coordinates of the eight matched pairs whose pixel distances satisfy the threshold condition, obtained in Step 5, and expressing the result in the following form:

Wf = 0   (11)

where W = (w_1, w_2, …, w_8)^T;
under noise interference W is a full-rank matrix, i.e. the only exact solution of formula (11) is the zero vector, so solving the fundamental matrix is converted into the optimization problem:

min ||Wf||  subject to  ||f|| = 1   (12)
performing singular value decomposition on the matrix W:

W = U D V^T   (13)

where U is formed by the eigenvectors of W·W^T arranged by descending eigenvalue, D is the diagonal matrix formed by the square roots of the eigenvalues of W·W^T (or W^T·W) arranged in descending order, and V is formed by the eigenvectors of W^T·W arranged by descending eigenvalue; in order to estimate the most accurate fundamental matrix, ||Wf|| must be as close to 0 as possible; therefore, the last column vector v_9 of the matrix V is selected and rearranged to construct the fundamental matrix F.
6. The indoor visual positioning method for solving a fundamental matrix based on a pixel threshold according to claim 4, wherein the pixel threshold Φ in Step 5.3 is (image pixel length + image pixel width) × 0.08.
7. The indoor visual positioning method for solving a fundamental matrix based on a pixel threshold according to claim 5, wherein the pixel threshold Φ in Step 5.3 is (image pixel length + image pixel width) × 0.08.
8. The indoor visual positioning method for solving a fundamental matrix based on a pixel threshold according to claim 3, wherein the first threshold T_Ed in Step 3 is 0.7.

Priority Applications (1)

Application Number: CN201810772251.XA
Priority Date: 2018-07-13
Filing Date: 2018-07-13
Title: Indoor visual positioning method for solving fundamental matrix based on pixel threshold


Publications (2)

Publication Number Publication Date
CN108921899A CN108921899A (en) 2018-11-30
CN108921899B (en) 2021-05-28

Family

ID=64410189


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948700B (en) * 2019-03-19 2020-07-24 北京字节跳动网络技术有限公司 Method and device for generating feature map
CN110969594A (en) * 2019-11-29 2020-04-07 广东优世联合控股集团股份有限公司 Image splicing method
CN113781559B (en) * 2021-08-31 2023-10-13 南京邮电大学 Robust abnormal matching point eliminating method and image indoor positioning method


Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10455216B2 (en) * 2015-08-19 2019-10-22 Faro Technologies, Inc. Three-dimensional imager

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN102005048A (en) * 2010-11-16 2011-04-06 哈尔滨工业大学 Field calibration method for wide-field camera based on fixed feature points on plane
CN104484881A (en) * 2014-12-23 2015-04-01 哈尔滨工业大学 Image capture-based Visual Map database construction method and indoor positioning method using database
CN107704867A (en) * 2017-08-24 2018-02-16 哈尔滨工业大学 Based on the image characteristic point error hiding elimination method for weighing the factor in a kind of vision positioning

Non-Patent Citations (2)

Title
Yinqiang Zheng et al. A Practical Rank-Constrained Eight-Point Algorithm for Fundamental Matrix Estimation. Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013-06-30. *
Xue Hao. Research on Visual Localization Algorithms Based on Epipolar Geometry Theory. China Master's Theses Full-text Database, Information Science and Technology, 2017-02-15. *

Also Published As

Publication number Publication date
CN108921899A (en) 2018-11-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant