CN110889349A - VSLAM-based visual positioning method for sparse three-dimensional point cloud map - Google Patents

VSLAM-based visual positioning method for sparse three-dimensional point cloud map

Info

Publication number
CN110889349A
Authority
CN
China
Prior art keywords
image
coordinate system
point cloud
matrix
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911127519.5A
Other languages
Chinese (zh)
Inventor
马琳
姜晗
谭学治
王彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN201911127519.5A
Publication of CN110889349A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a VSLAM-based visual positioning method using a sparse three-dimensional point cloud map. By modifying the output of the open-source ORB-SLAM system, the method obtains the camera trajectory, the transformation matrix between the camera coordinate system and the global coordinate system, and a sparse three-dimensional point cloud map in the global coordinate system, where the point cloud consists of the three-dimensional coordinates of key landmark points. An initial image database is built from this information, and user positioning is then performed against the established database. The method also allows the image database to be further compressed, reducing its capacity.

Description

VSLAM-based visual positioning method for sparse three-dimensional point cloud map
Technical Field
The invention belongs to the technical field of image processing. It relates to an indoor visual positioning method based on Visual Simultaneous Localization and Mapping (VSLAM) and multi-view geometry from the field of computer vision, and in particular to a visual positioning method using a VSLAM-based sparse three-dimensional point cloud map.
Background
With the development of science and technology, people frequently enter large, unfamiliar indoor environments such as superstores, art exhibitions and airports, so location-based services are receiving increasing attention. In complex indoor environments, conventional GPS positioning fails because of signal attenuation, and various indoor positioning technologies have therefore been proposed in recent years. Compared with indoor positioning methods that require additional equipment to be deployed, vision-based indoor positioning needs no extra facilities and offers advantages in both cost and accuracy. At the same time, indoor visual information is rich, image-based feature extraction techniques are increasingly mature, and mobile devices are developing rapidly, with every smartphone now integrating at least one camera, so vision-based indoor positioning has gradually become a research hotspot.
A vision-based indoor positioning system is mainly divided into an offline stage and an online stage: in the offline stage, database images of the indoor area to be positioned are acquired together with the positions from which they were taken; in the online stage, the user takes a picture and is positioned. In visual positioning, the user first captures an image of the indoor scene with the camera of a smart mobile terminal; visual features are then extracted from the user image; finally, the user image is matched against the database images and the camera position is solved on that basis. The advantage of visual positioning is that no positioning infrastructure needs to be deployed: an image database only has to be built in advance for the indoor scene, and in the positioning stage the user's position can be estimated from the image the user captures. Moreover, because the visual positioning algorithm solves for the camera position at pixel level, it can in theory achieve high-accuracy position estimation.
The key technical points of visual positioning are the construction of the image database and the image retrieval and positioning methods, with the image database as the technical foundation. Compared with the traditional approach of acquiring image information point by point and computing the position of each collection point, building the image database with VSLAM is more efficient. SLAM originated in robotics and aims to solve the problems of localization and mapping of a robot in an unknown environment; VSLAM refers to the case where the input signal is visual. A VSLAM algorithm can be broadly divided into four parts: the front-end visual odometry, back-end optimization, loop closure detection and mapping. First, image information is obtained from the visual sensor and the visual odometry estimates the camera motion between adjacent frames; loop closure detection then judges whether the camera has returned to a previously visited position; the outputs of the visual odometry and loop closure detection are sent to the back end for optimization; finally, the three-dimensional scene is reconstructed from the estimated camera trajectory and poses.
Location services generally require real-time performance, and at present the factor that most limits the speed of visual positioning is that the image database must be traversed during image matching. Most current methods improve the image search algorithm or organize the image database hierarchically to speed up matching. However, there is often large redundancy between adjacent database images, and the same features are matched repeatedly across multiple images when the database is traversed. When the environment is large and the database capacity is high, positioning speed is seriously affected. The present method instead matches the user input image against the constructed sparse three-dimensional point cloud map, which reduces the number of repeated matches and improves positioning speed. The image database can also be further compressed on this basis, reducing its capacity.
Disclosure of Invention
The invention aims to solve the above problems in the prior art and provides a visual positioning method using a VSLAM-based sparse three-dimensional point cloud map. By modifying the output of the open-source ORB-SLAM system, the invention obtains the camera trajectory, the transformation matrix between the camera coordinate system and the global coordinate system, and a sparse three-dimensional point cloud map in the global coordinate system, where the point cloud consists of the three-dimensional coordinates of key landmark points. An initial image database is established from this information.
The invention is realized by the following technical scheme: a visual positioning method using a VSLAM-based sparse three-dimensional point cloud map, comprising the following steps:
step one, establishing an image database based on VSLAM;
step two, extracting SURF characteristic points from the user input image;
step three, roughly matching the user input image with the representative image in the image database by using the SURF descriptor to find an image with the highest matching degree;
step four, counting the distribution, along the v-axis of the pixel coordinate system, of the SURF feature points matched between the image obtained in step three and the user input image, and determining the three-dimensional point cloud range to be searched according to this pixel distribution and the IndexLists of the images;
step five, performing fine matching of the user input image and the three-dimensional point cloud in the search range obtained in the step four to obtain a well-matched 2D-3D matching pair;
and step six, calculating the position coordinates of the user by utilizing an EPnP algorithm according to the 2D-3D matching pair obtained in the step five, and completing indoor positioning.
Further, the first step specifically comprises:
Step 1.1: selecting a suitable coordinate origin p_{W,0}(x_0, y_0, z_0) in the indoor environment to be positioned and establishing a three-dimensional rectangular coordinate system; this coordinate system is the global coordinate system;
Step 1.2: starting from the coordinate origin selected in Step 1.1, walking steadily through the environment to be positioned with a platform carrying a Kinect V2 device, and collecting color images and depth images to form RGB-D data;
Step 1.3: taking the RGB-D data acquired in Step 1.2 as input, modifying the output of the open-source ORB-SLAM system, and using it to obtain the camera trajectory, the transformation matrix between the camera coordinate system and the global coordinate system, and the sparse three-dimensional point cloud map in the global coordinate system;
Step 1.4: for each image, using the transformation matrix and the global three-dimensional point cloud obtained in Step 1.3, obtaining the pixels corresponding to the three-dimensional points through the projection relation and extracting SURF feature descriptors at those pixels; ordering the sparse three-dimensional point cloud according to a fixed rule;
Step 1.5: extracting a subset of representative images with an image key-frame extraction strategy to form the image database, establishing IndexLists for the selected representative images, and assigning feature descriptors to the three-dimensional point cloud, thereby completing the construction of the image database.
Further, in Step 1.3 the sparse three-dimensional point cloud map is the set of the global three-dimensional coordinates of the ORB feature points extracted from every image, denoted

$$P_W = \{\, p_{W,i} = (x_i, y_i, z_i),\ i = 1, 2, \ldots, N \,\}$$

where N is the total number of point cloud points. The transformation matrix T is a 4×4 matrix composed of a rotation matrix R and a translation vector t, as shown in equation (1):

$$T = \begin{bmatrix} R & t \\ \mathbf{0}^T & 1 \end{bmatrix} \tag{1}$$

where R denotes the rotation from the camera coordinate system of the current frame to the selected global coordinate system.
Further, the ordering rule stores each frame image I_i and its corresponding point cloud in the order of the image timestamps; within each frame, the three-dimensional points are sorted by their pixel coordinates in the image, first by increasing u and then by increasing v.
Further, for each image, the pixels corresponding to the three-dimensional point cloud are obtained through the projection relation using the transformation matrix and the global three-dimensional point cloud obtained in Step 1.3, specifically:
let the coordinates of a point on the image be p_I = [u, v]^T, the camera intrinsic matrix be K, and the point in the global coordinate system be p_w = [x, y, z]^T; they satisfy relation (2), and the values computed from (2) are rounded to the nearest integers to obtain the pixel of the feature point corresponding to the point cloud:

$$z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} I_{3\times 3} & \mathbf{0} \end{bmatrix} T^{-1} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \tag{2}$$

where z_C is the depth of the point in the camera coordinate system.
further, the first step and the fifth step are specifically as follows:
step one, five or one, extracting SURF characteristic points from all images, wherein the descriptor of the characteristic is s, and each image is expressed as a set of characteristic point pixels and the descriptor thereof
Figure BDA0002277314370000034
Step one, five or two, calculating any two images I with sequences of a and baAnd IbSimilarity between S (I)a,Ib) The similarity is described by the matching degree of the feature points between the two images, and the matching degree of the feature points is measured by the Euclidean distance between the corresponding descriptors of the feature points;
suppose the Euclidean distance Ed between any two feature points between two imagesijComprises the following steps:
Edij=||si-sj||2si∈Ia,sj∈Ib(5)
in the formula, siRepresenting the image sequence as the ith feature point, s, in the a imagejRepresenting that the image sequence is the jth characteristic point in the b image, and if the two characteristic points satisfy the formula (6), determining that the two characteristic points are mutually matched;
Figure BDA0002277314370000041
in the formula, Edmin1Representing nearest neighbor feature point Euclidean distance, Edmin2Representing the Euclidean distance of the next neighbor feature points, wherein epsilon is a correct matching judgment threshold value;
comparing the characteristic points in the two images one by one, and counting two images IaAnd IbNumber of matching points between, noted as N(a,b)(ii) a Image similarity is defined as follows:
Figure BDA0002277314370000042
step one, five and three, similarity S (I)a,Ib) Images smaller than a preset threshold value are gathered into one class, and a first image is reserved in each class to serve as a representative image so as to form an image database; assigning a feature descriptor to the three-dimensional point cloud, wherein the descriptor is an image SURF feature descriptor of the three-dimensional point cloud appearing in the image for the first time; and for IndexLists, taking the minimum value and the maximum value of the corresponding three-dimensional point cloud serial number in the class.
Further, the sixth step is specifically:
Step 6.1: from the n_E global space points obtained in step five, with three-dimensional position coordinates {p_{W,i} = (x_i, y_i, z_i), i = 1, 2, ..., n_E}, select 4 virtual control points p_{WV,i}, i = 1, 2, 3, 4.
The centroid of the n_E space points is taken as the first virtual control point:

$$p_{WV,1} = \frac{1}{n_E} \sum_{i=1}^{n_E} p_{W,i} \tag{8}$$

and the matrix A is formed:

$$A = \begin{bmatrix} p_{W,1}^T - p_{WV,1}^T \\ \vdots \\ p_{W,n_E}^T - p_{WV,1}^T \end{bmatrix} \tag{9}$$

With λ_i the eigenvalues of A^T A and v_i the corresponding unit eigenvectors, the other three virtual control points are unit points along the three principal directions:

$$p_{WV,j+1} = p_{WV,1} + v_j, \quad j = 1, 2, 3 \tag{10}$$

The space points in the global coordinate system can then be expressed in terms of the solved virtual control points:

$$p_{W,i} = \sum_{j=1}^{4} w_{ij}\, p_{WV,j} \tag{11}$$

where w_{ij} is the weight of the i-th space point with respect to the virtual control point p_{WV,j}; the weights of the i-th space point must satisfy:

$$\sum_{j=1}^{4} w_{ij} = 1 \tag{12}$$
Step 6.2: solving for the coordinates of the virtual control points in the camera coordinate system, p_{CV,i}, i = 1, 2, 3, 4.
Once the coordinates of the virtual control points in the camera coordinate system are known, the position of any space point in the camera coordinate system can be expressed as the same weighted sum of the virtual control points:

$$p_{C,i} = \sum_{j=1}^{4} w_{ij}\, p_{CV,j} \tag{13}$$

where the weights w_{ij} are the same as in equation (11).
Let the homogeneous coordinates of the image point of a space point on the image plane be p_{I,i} = [u_i, v_i, 1]^T; from the camera model, the relationship between p_{C,i} and p_{I,i} is:

$$\alpha_s\, p_{I,i} = K\, p_{C,i} = K \sum_{j=1}^{4} w_{ij}\, p_{CV,j} \tag{14}$$

where α_s is a scale factor and K is the intrinsic parameter matrix of the camera. Writing the virtual control point coordinates in the camera coordinate system as p_{CV,j} = [x_{CV,j}, y_{CV,j}, z_{CV,j}]^T, the scale factor α_s is:

$$\alpha_s = \sum_{j=1}^{4} w_{ij}\, z_{CV,j} \tag{15}$$

Expanding equation (14) with the camera parameters gives:

$$\alpha_s \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = \begin{bmatrix} f_c & 0 & u_0 \\ 0 & f_c & v_0 \\ 0 & 0 & 1 \end{bmatrix} \sum_{j=1}^{4} w_{ij} \begin{bmatrix} x_{CV,j} \\ y_{CV,j} \\ z_{CV,j} \end{bmatrix}$$

where f_c is the focal length of the camera and (u_0, v_0) are the coordinates of the intersection of the optical axis with the imaging plane.
Substituting equation (15) into equation (14), the relationship between the space point coordinates and the corresponding image point coordinates is expressed as linear equations:

$$\sum_{j=1}^{4} \left( w_{ij} f_c\, x_{CV,j} + w_{ij}(u_0 - u_i)\, z_{CV,j} \right) = 0, \qquad \sum_{j=1}^{4} \left( w_{ij} f_c\, y_{CV,j} + w_{ij}(v_0 - v_i)\, z_{CV,j} \right) = 0 \tag{16}$$

Setting

$$z_P = \left[ p_{CV,1}^T,\ p_{CV,2}^T,\ p_{CV,3}^T,\ p_{CV,4}^T \right]^T,$$

equation (16) is organized into the linear system

$$M_E\, z_P = 0 \tag{17}$$

where the matrix M_E is formed by arranging the coefficients in equation (16) and has dimension 2n_E × 12; solving this system yields the position coordinates p_{CV,j} of the virtual control points in the camera coordinate system.
Step 6.3: computing the final rotation matrix R and translation vector t from the results of Steps 6.1 and 6.2.
The centroid of the space points in the camera coordinate system and the matrix B are computed:

$$\bar{p}_C = \frac{1}{n_E} \sum_{i=1}^{n_E} p_{C,i} \tag{18}$$

$$B = \begin{bmatrix} p_{C,1}^T - \bar{p}_C^T \\ \vdots \\ p_{C,n_E}^T - \bar{p}_C^T \end{bmatrix} \tag{19}$$

The matrix H = B^T A is computed and its singular value decomposition H = U D V^T is taken; the rotation matrix and translation vector are then:

$$R = U V^T, \qquad t = \bar{p}_C - R\, p_{WV,1} \tag{20}$$

where, if det(R) < 0, then R(2,:) = -R(2,:).
On the premise of acceptable positioning accuracy, the invention reduces the capacity of the image database as much as possible and improves positioning speed. To this end, an indoor visual positioning method based on VSLAM is proposed, focusing on two aspects: constructing the image database and positioning against the constructed database. In the offline stage, a Kinect V2 device collects color and depth images, SLAM is used to obtain the camera trajectory, the transformation matrix between the camera coordinate system and the global coordinate system, and the sparse three-dimensional point cloud map in the global coordinate system, and the image database is built on this basis. In the online stage, a two-step matching strategy finds well-matched 2D-3D pairs, and the relative pose is solved with EPnP (Efficient Perspective-n-Point), completing user positioning.
Drawings
FIG. 1 is a block diagram of the visual positioning method using a VSLAM-based sparse three-dimensional point cloud map according to the present invention;
FIG. 2 is a schematic diagram of selecting a coordinate origin on an indoor map and establishing a coordinate system;
FIG. 3 is a schematic diagram of a pixel coordinate system corresponding to each image;
FIG. 4 is a schematic diagram of an image database format;
fig. 5 is a schematic diagram of the EPnP algorithm.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In practical applications the indoor environment changes dynamically, with factors such as non-rigid scene changes and lighting changes between morning and evening. To improve indoor positioning accuracy, image information would have to be collected many times at each reference point to cover the scenes a user might photograph. To reduce the workload of building the database, the invention provides an improved image database construction method based on VSLAM. At the same time, considering that redundant operations in image matching slow down positioning, the invention provides a positioning method based on the improved image database: a two-step matching method obtains 2D-3D matching pairs between the feature points of the user input image and the global three-dimensional point cloud, from which the absolute position of the user is solved.
With reference to fig. 1, the present invention provides a visual positioning method for a VSLAM-based sparse three-dimensional point cloud map, the method includes the following steps:
step one, establishing an image database based on VSLAM;
the first step is specifically as follows:
Step 1.1: selecting a suitable coordinate origin p_{W,0}(x_0, y_0, z_0) in the indoor environment to be positioned and establishing a three-dimensional rectangular coordinate system, as shown in fig. 2; this coordinate system is the global coordinate system, and a point in it is denoted p_{W,i};
Step 1.2: starting from the coordinate origin selected in Step 1.1, walking steadily through the environment to be positioned with a platform carrying a Kinect V2 device, and collecting color images and depth images to form RGB-D data;
Step 1.3: taking the RGB-D data acquired in Step 1.2 as input, modifying the output of the open-source ORB-SLAM system, and using it to obtain the camera trajectory, the transformation matrix between the camera coordinate system and the global coordinate system, and the sparse three-dimensional point cloud map in the global coordinate system;
In Step 1.3, the sparse three-dimensional point cloud map is the set of the global three-dimensional coordinates of the ORB feature points extracted from every image, denoted

$$P_W = \{\, p_{W,i} = (x_i, y_i, z_i),\ i = 1, 2, \ldots, N \,\}$$

where N is the total number of point cloud points.
The transformation matrix T is a 4×4 matrix composed of a rotation matrix R and a translation vector t, as shown in equation (1):

$$T = \begin{bmatrix} R & t \\ \mathbf{0}^T & 1 \end{bmatrix} \tag{1}$$
where R denotes a rotation matrix from the camera coordinate system of the current frame to the selected global coordinate system.
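As an illustration of how T in equation (1) is used in the following steps, the sketch below assembles the matrix from R and t and inverts it to map global coordinates back into the camera frame; the function names are illustrative, not taken from the patent.

```python
import numpy as np

def make_transfer_matrix(R, t):
    """Assemble the 4x4 matrix T of equation (1) from the 3x3 rotation R
    (current camera frame -> global frame) and the translation vector t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert_transfer_matrix(T):
    """Closed-form inverse of T (global frame -> camera frame)."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv
```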
Step 1.4: for each image, using the transformation matrix and the global three-dimensional point cloud obtained in Step 1.3, obtaining the pixels corresponding to the three-dimensional points through the projection relation and extracting SURF feature descriptors at those pixels; the sparse three-dimensional point cloud is ordered according to a fixed rule and stored in a sequential structure.
The ordering rule stores each frame image I_i and its corresponding point cloud in the order of the image timestamps; within each frame, the three-dimensional points are sorted by their pixel coordinates in the image, first by increasing u and then by increasing v, as shown in fig. 3.
For each image, the pixels corresponding to the three-dimensional point cloud are obtained through the projection relation using the transformation matrix and the global three-dimensional point cloud obtained in Step 1.3, specifically:
let the coordinates of a point on the image be p_I = [u, v]^T, the camera intrinsic matrix be K, and the point in the global coordinate system be p_w = [x, y, z]^T; they satisfy relation (2), and the values computed from (2) are rounded to the nearest integers to obtain the pixel of the feature point corresponding to the point cloud:

$$z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} I_{3\times 3} & \mathbf{0} \end{bmatrix} T^{-1} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \tag{2}$$

where z_C is the depth of the point in the camera coordinate system.
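A sketch of relation (2) as reconstructed above, assuming T maps camera coordinates to global coordinates so that its inverse is applied before projecting with K:

```python
import numpy as np

def project_to_pixel(p_w, K, T):
    """Project a global 3-D point p_w = [x, y, z] into integer pixel coordinates
    using the intrinsic matrix K and the camera-to-global matrix T (relation (2))."""
    p_c = np.linalg.inv(T) @ np.append(p_w, 1.0)   # global -> camera, homogeneous
    uvw = K @ p_c[:3]                              # pinhole projection
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
    return int(round(u)), int(round(v))            # nearest-neighbour integer pixel
```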
step five, extracting partial representative images through an image key frame extraction strategy, forming an image database, establishing IndexLists for the selected representative images, and giving a three-dimensional point cloud feature descriptor to complete the establishment of the image database, as shown in FIG. 4.
Step two, extracting SURF characteristic points from the user input image;
step three, roughly matching the user input image with the representative image in the image database by using the SURF descriptor to find an image with the highest matching degree;
Step four: counting the distribution P(a ≤ x ≤ b) of the v-coordinates, in the pixel coordinate system, of the SURF feature points matched between the image obtained in step three and the user input image,

$$P(a \le x \le b) = \frac{1}{N} \sum_{i=1}^{N} f(v_i), \qquad f(v_i) = \begin{cases} 1, & a \le v_i \le b \\ 0, & \text{otherwise} \end{cases}$$

where N is the total number of matched feature points and p_I = [u, v]^T is the pixel coordinate of each matched feature point.
The three-dimensional point cloud range to be searched is then determined from this distribution and the IndexLists of the images. Suppose images I_{i-1}, I_i, I_{i+1} have IndexLists values [index1, index2], [index3, index4], [index5, index6], and the image resolution is u_max × v_max. When P(x ≤ v_max/2) ≥ 0.5, the search range is [index1, index4]; when P(x ≤ v_max/2) < 0.5, the search range is [index3, index6].
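A sketch of this step-four decision rule; the variable names are illustrative and the 0.5 split simply restates the rule on the v-axis distribution given above.

```python
def choose_search_range(matched_v, v_max, prev_list, curr_list, next_list):
    """Pick the point-cloud index range to search.

    matched_v : v pixel coordinates of the matched SURF feature points
    v_max     : image height (resolution of the v axis)
    *_list    : IndexLists [min_index, max_index] of the previous / matched / next
                representative image, i.e. [index1, index2], [index3, index4], [index5, index6]
    """
    p_upper = sum(1 for v in matched_v if v <= v_max / 2) / len(matched_v)
    if p_upper >= 0.5:
        return [prev_list[0], curr_list[1]]   # [index1, index4]
    return [curr_list[0], next_list[1]]       # [index3, index6]
```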
Step five, performing fine matching of the user input image and the three-dimensional point cloud in the search range obtained in the step four to obtain a well-matched 2D-3D matching pair;
and step six, calculating the position coordinates of the user by utilizing an EPnP algorithm according to the 2D-3D matching pair obtained in the step five, and completing indoor positioning.
Step 1.5 is specifically as follows:
Step 1.5.1: extracting SURF feature points from all images, with feature descriptor s; each image is represented as the set of its feature point pixels and their descriptors, I_i = { (p_{I,k}, s_k), k = 1, 2, ..., N_i };
Step 1.5.2: calculating the similarity S(I_a, I_b) between any two images I_a and I_b with indices a and b; the similarity is described by the degree of feature point matching between the two images, and the matching degree is measured by the Euclidean distance between the corresponding feature descriptors.
Suppose the Euclidean distance Ed_{ij} between any two feature points of the two images is

$$Ed_{ij} = \| s_i - s_j \|_2, \quad s_i \in I_a,\ s_j \in I_b \tag{5}$$

where s_i denotes the i-th feature point of image a and s_j the j-th feature point of image b; if two feature points satisfy equation (6) they are considered a mutual match:

$$\frac{Ed_{min1}}{Ed_{min2}} < \varepsilon \tag{6}$$

where Ed_{min1} is the Euclidean distance to the nearest-neighbor feature point, Ed_{min2} is the distance to the second-nearest neighbor, and ε is the threshold for judging a correct match.
The feature points of the two images are compared one by one and the number of matching points between I_a and I_b is counted, denoted N_{(a,b)}; the image similarity S(I_a, I_b) is then defined from N_{(a,b)} by equation (7).
Step 1.5.3: images whose similarity S(I_a, I_b) is smaller than a preset threshold are grouped into one class, and the first image of each class is kept as the representative image, forming the image database; a feature descriptor is assigned to each three-dimensional point, namely the SURF descriptor from the image in which the point first appears; the IndexLists entry of a representative image is the minimum and maximum serial number of the three-dimensional points corresponding to its class.
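The match count N_{(a,b)} of equations (5)-(6) can be computed, for example, with OpenCV's SURF implementation (available in opencv-contrib builds with the non-free modules enabled); the ratio threshold below is only an illustrative value, since the patent does not fix ε.

```python
import cv2

def count_matches(img_a, img_b, eps=0.7):
    """Count ratio-test matches N_(a,b) between two 8-bit grayscale images (equations (5)-(6))."""
    surf = cv2.xfeatures2d.SURF_create()
    _, desc_a = surf.detectAndCompute(img_a, None)
    _, desc_b = surf.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(desc_a, desc_b, k=2)   # nearest and second-nearest neighbour
    return sum(1 for pair in knn
               if len(pair) == 2 and pair[0].distance / pair[1].distance < eps)
```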
With reference to fig. 5, the sixth step specifically is:
Step 6.1: from the n_E (n_E ≥ 4) global space points obtained in step five, with three-dimensional position coordinates {p_{W,i} = (x_i, y_i, z_i), i = 1, 2, ..., n_E}, select 4 virtual control points p_{WV,i}, i = 1, 2, 3, 4.
The centroid of the n_E space points is taken as the first virtual control point:

$$p_{WV,1} = \frac{1}{n_E} \sum_{i=1}^{n_E} p_{W,i} \tag{8}$$

and the matrix A is formed:

$$A = \begin{bmatrix} p_{W,1}^T - p_{WV,1}^T \\ \vdots \\ p_{W,n_E}^T - p_{WV,1}^T \end{bmatrix} \tag{9}$$

With λ_i the eigenvalues of A^T A and v_i the corresponding unit eigenvectors, the other three virtual control points are unit points along the three principal directions:

$$p_{WV,j+1} = p_{WV,1} + v_j, \quad j = 1, 2, 3 \tag{10}$$
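A sketch of equations (8)-(10) in NumPy. The unit-length offsets along the principal directions follow the wording "unit points in three main directions"; other EPnP formulations scale these offsets by the eigenvalues, so the exact scaling should be treated as an assumption.

```python
import numpy as np

def choose_control_points(P_w):
    """Select the 4 virtual control points from an (n_E, 3) array of global space points."""
    c0 = P_w.mean(axis=0)                       # equation (8): centroid
    A = P_w - c0                                # equation (9)
    _, eigvecs = np.linalg.eigh(A.T @ A)        # unit eigenvectors = principal directions
    return np.vstack([c0,
                      c0 + eigvecs[:, 0],       # equation (10): one control point
                      c0 + eigvecs[:, 1],       # per principal direction
                      c0 + eigvecs[:, 2]])
```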
The space points in the global coordinate system can then be expressed in terms of the solved virtual control points:

$$p_{W,i} = \sum_{j=1}^{4} w_{ij}\, p_{WV,j} \tag{11}$$

where w_{ij} is the weight of the i-th space point with respect to the virtual control point p_{WV,j}; the weights of the i-th space point must satisfy:

$$\sum_{j=1}^{4} w_{ij} = 1 \tag{12}$$
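Given the four control points, the weights w_{ij} of equations (11)-(12) follow from one 4×4 linear solve (the control points must not be coplanar); a sketch:

```python
import numpy as np

def barycentric_weights(P_w, C_w):
    """Solve p_W,i = sum_j w_ij * c_j with sum_j w_ij = 1 for every row of P_w (n_E x 3),
    where C_w (4 x 3) holds the virtual control points."""
    n = P_w.shape[0]
    M = np.vstack([C_w.T, np.ones((1, 4))])     # [c_1 .. c_4; 1 1 1 1], 4x4 system
    rhs = np.vstack([P_w.T, np.ones((1, n))])   # [p_i; 1] stacked column-wise
    return np.linalg.solve(M, rhs).T            # (n_E, 4) matrix of weights w_ij
```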
Step 6.2: solving for the coordinates of the virtual control points in the camera coordinate system, p_{CV,i}, i = 1, 2, 3, 4.
Once the coordinates of the virtual control points in the camera coordinate system are known, the position of any space point in the camera coordinate system can be expressed as the same weighted sum of the virtual control points:

$$p_{C,i} = \sum_{j=1}^{4} w_{ij}\, p_{CV,j} \tag{13}$$

where the weights w_{ij} are the same as in equation (11).
Let the homogeneous coordinates of the image point of a space point on the image plane be p_{I,i} = [u_i, v_i, 1]^T; from the camera model, the relationship between p_{C,i} and p_{I,i} is:

$$\alpha_s\, p_{I,i} = K\, p_{C,i} = K \sum_{j=1}^{4} w_{ij}\, p_{CV,j} \tag{14}$$

where α_s is a scale factor and K is the intrinsic parameter matrix of the camera. Writing the virtual control point coordinates in the camera coordinate system as p_{CV,j} = [x_{CV,j}, y_{CV,j}, z_{CV,j}]^T, the scale factor α_s is:

$$\alpha_s = \sum_{j=1}^{4} w_{ij}\, z_{CV,j} \tag{15}$$

Expanding equation (14) with the camera parameters gives:

$$\alpha_s \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = \begin{bmatrix} f_c & 0 & u_0 \\ 0 & f_c & v_0 \\ 0 & 0 & 1 \end{bmatrix} \sum_{j=1}^{4} w_{ij} \begin{bmatrix} x_{CV,j} \\ y_{CV,j} \\ z_{CV,j} \end{bmatrix}$$

where f_c is the focal length of the camera and (u_0, v_0) are the coordinates of the intersection of the optical axis with the imaging plane.
Substituting equation (15) into equation (14), the relationship between the space point coordinates and the corresponding image point coordinates is expressed as linear equations:

$$\sum_{j=1}^{4} \left( w_{ij} f_c\, x_{CV,j} + w_{ij}(u_0 - u_i)\, z_{CV,j} \right) = 0, \qquad \sum_{j=1}^{4} \left( w_{ij} f_c\, y_{CV,j} + w_{ij}(v_0 - v_i)\, z_{CV,j} \right) = 0 \tag{16}$$

Setting

$$z_P = \left[ p_{CV,1}^T,\ p_{CV,2}^T,\ p_{CV,3}^T,\ p_{CV,4}^T \right]^T,$$

equation (16) is organized into the linear system

$$M_E\, z_P = 0 \tag{17}$$

where the matrix M_E is formed by arranging the coefficients in equation (16) and has dimension 2n_E × 12; solving this system yields the position coordinates p_{CV,j} of the virtual control points in the camera coordinate system.
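A simplified sketch of equations (16)-(17): it builds M_E from the weights, the pixel coordinates and the intrinsics, and takes the right singular vector of the smallest singular value as z_P. The full EPnP algorithm additionally handles null spaces of dimension greater than one and resolves the overall scale and sign, which is omitted here.

```python
import numpy as np

def solve_control_points_camera(W, pixels, fc, u0, v0):
    """Build the 2*n_E x 12 matrix M_E of equation (17) and recover the stacked
    camera-frame control points z_P up to scale and sign."""
    n = W.shape[0]
    M = np.zeros((2 * n, 12))
    for i in range(n):
        u_i, v_i = pixels[i]
        for j in range(4):
            M[2 * i,     3 * j:3 * j + 3] = [W[i, j] * fc, 0.0, W[i, j] * (u0 - u_i)]
            M[2 * i + 1, 3 * j:3 * j + 3] = [0.0, W[i, j] * fc, W[i, j] * (v0 - v_i)]
    _, _, Vt = np.linalg.svd(M)
    return Vt[-1].reshape(4, 3)                 # rows are p_CV,1..4 (up to scale/sign)
```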
Step 6.3: computing the final rotation matrix R and translation vector t from the results of Steps 6.1 and 6.2.
The centroid of the space points in the camera coordinate system and the matrix B are computed:

$$\bar{p}_C = \frac{1}{n_E} \sum_{i=1}^{n_E} p_{C,i} \tag{18}$$

$$B = \begin{bmatrix} p_{C,1}^T - \bar{p}_C^T \\ \vdots \\ p_{C,n_E}^T - \bar{p}_C^T \end{bmatrix} \tag{19}$$

The matrix H = B^T A is computed and its singular value decomposition H = U D V^T is taken; the rotation matrix and translation vector are then:

$$R = U V^T, \qquad t = \bar{p}_C - R\, p_{WV,1} \tag{20}$$

where, if det(R) < 0, then R(2,:) = -R(2,:).
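Step 6.3 is the standard SVD-based alignment of the two centred point sets; the sketch below applies the sign rule stated in the text when det(R) < 0. In practice, the whole of step six can also be delegated to OpenCV's cv2.solvePnP with flags=cv2.SOLVEPNP_EPNP.

```python
import numpy as np

def recover_pose(P_w, P_c):
    """Recover R, t from matched global points P_w (n_E x 3) and their camera-frame
    coordinates P_c (n_E x 3), so that p_C is approximately R @ p_W + t."""
    c_w = P_w.mean(axis=0)                      # centroid of the global points (p_WV,1)
    c_c = P_c.mean(axis=0)                      # equation (18): camera-frame centroid
    A = P_w - c_w                               # equation (9)
    B = P_c - c_c                               # equation (19)
    H = B.T @ A                                 # 3x3 matrix of step 6.3
    U, _, Vt = np.linalg.svd(H)                 # H = U D V^T
    R = U @ Vt                                  # equation (20)
    if np.linalg.det(R) < 0:                    # sign rule from the text
        R[1, :] = -R[1, :]
    t = c_c - R @ c_w
    return R, t
```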
The meaning of each parameter in the present invention is shown in table 1:
TABLE 1 meanings of the parameters
The VSLAM-based visual positioning method using a sparse three-dimensional point cloud map has been described in detail above. Specific examples have been used to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help understand the method and its core idea. A person skilled in the art may vary the specific embodiments and the application scope according to the idea of the invention; in summary, the contents of this specification should not be construed as limiting the invention.

Claims (7)

1. A VSLAM-based visual positioning method using a sparse three-dimensional point cloud map, characterized by comprising the following steps:
step one, establishing an image database based on VSLAM;
step two, extracting SURF characteristic points from the user input image;
step three, roughly matching the user input image with the representative image in the image database by using the SURF descriptor to find an image with the highest matching degree;
step four, counting the distribution, along the v-axis of the pixel coordinate system, of the SURF feature points matched between the image obtained in step three and the user input image, and determining the three-dimensional point cloud range to be searched according to this pixel distribution and the IndexLists of the images;
step five, performing fine matching of the user input image and the three-dimensional point cloud in the search range obtained in the step four to obtain a well-matched 2D-3D matching pair;
and step six, calculating the position coordinates of the user by utilizing an EPnP algorithm according to the 2D-3D matching pair obtained in the step five, and completing indoor positioning.
2. The method of claim 1, wherein: the first step is specifically as follows:
Step 1.1: selecting a suitable coordinate origin p_{W,0}(x_0, y_0, z_0) in the indoor environment to be positioned and establishing a three-dimensional rectangular coordinate system; this coordinate system is the global coordinate system;
Step 1.2: starting from the coordinate origin selected in Step 1.1, walking steadily through the environment to be positioned with a platform carrying a Kinect V2 device, and collecting color images and depth images to form RGB-D data;
Step 1.3: taking the RGB-D data acquired in Step 1.2 as input, modifying the output of the open-source ORB-SLAM system, and using it to obtain the camera trajectory, the transformation matrix between the camera coordinate system and the global coordinate system, and the sparse three-dimensional point cloud map in the global coordinate system;
Step 1.4: for each image, using the transformation matrix and the global three-dimensional point cloud obtained in Step 1.3, obtaining the pixels corresponding to the three-dimensional points through the projection relation and extracting SURF feature descriptors at those pixels; ordering the sparse three-dimensional point cloud according to a fixed rule;
Step 1.5: extracting a subset of representative images with an image key-frame extraction strategy to form the image database, establishing IndexLists for the selected representative images, and assigning feature descriptors to the three-dimensional point cloud, thereby completing the construction of the image database.
3. The method of claim 2, wherein: in Step 1.3 the sparse three-dimensional point cloud map is the set of the global three-dimensional coordinates of the ORB feature points extracted from every image, denoted

$$P_W = \{\, p_{W,i} = (x_i, y_i, z_i),\ i = 1, 2, \ldots, N \,\}$$

where N is the total number of point cloud points; the transformation matrix T is a 4×4 matrix composed of a rotation matrix R and a translation vector t, as shown in equation (1):

$$T = \begin{bmatrix} R & t \\ \mathbf{0}^T & 1 \end{bmatrix} \tag{1}$$

where R denotes the rotation from the camera coordinate system of the current frame to the selected global coordinate system.
4. The method of claim 2, wherein: the ordering rule stores each frame image I_i and its corresponding point cloud in the order of the image timestamps; within each frame, the three-dimensional points are sorted by their pixel coordinates in the image, first by increasing u and then by increasing v.
5. The method of claim 2, wherein: for each image, the pixels corresponding to the three-dimensional point cloud are obtained through the projection relation using the transformation matrix and the global three-dimensional point cloud obtained in Step 1.3, specifically:
let the coordinates of a point on the image be p_I = [u, v]^T, the camera intrinsic matrix be K, and the point in the global coordinate system be p_w = [x, y, z]^T; they satisfy relation (2), and the values computed from (2) are rounded to the nearest integers to obtain the pixel of the feature point corresponding to the point cloud:

$$z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} I_{3\times 3} & \mathbf{0} \end{bmatrix} T^{-1} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \tag{2}$$

where z_C is the depth of the point in the camera coordinate system.
6. The method of claim 2, wherein: Step 1.5 is specifically as follows:
Step 1.5.1: extracting SURF feature points from all images, with feature descriptor s; each image is represented as the set of its feature point pixels and their descriptors, I_i = { (p_{I,k}, s_k), k = 1, 2, ..., N_i };
Step 1.5.2: calculating the similarity S(I_a, I_b) between any two images I_a and I_b with indices a and b; the similarity is described by the degree of feature point matching between the two images, and the matching degree is measured by the Euclidean distance between the corresponding feature descriptors;
suppose the Euclidean distance Ed_{ij} between any two feature points of the two images is

$$Ed_{ij} = \| s_i - s_j \|_2, \quad s_i \in I_a,\ s_j \in I_b \tag{5}$$

where s_i denotes the i-th feature point of image a and s_j the j-th feature point of image b; if two feature points satisfy equation (6) they are considered a mutual match:

$$\frac{Ed_{min1}}{Ed_{min2}} < \varepsilon \tag{6}$$

where Ed_{min1} is the Euclidean distance to the nearest-neighbor feature point, Ed_{min2} is the distance to the second-nearest neighbor, and ε is the threshold for judging a correct match;
the feature points of the two images are compared one by one and the number of matching points between I_a and I_b is counted, denoted N_{(a,b)}; the image similarity S(I_a, I_b) is then defined from N_{(a,b)} by equation (7);
Step 1.5.3: images whose similarity S(I_a, I_b) is smaller than a preset threshold are grouped into one class, and the first image of each class is kept as the representative image, forming the image database; a feature descriptor is assigned to each three-dimensional point, namely the SURF descriptor from the image in which the point first appears; the IndexLists entry of a representative image is the minimum and maximum serial number of the three-dimensional points corresponding to its class.
7. The method of claim 1, wherein: the sixth step is specifically as follows:
Step 6.1: from the n_E global space points obtained in step five, with three-dimensional position coordinates {p_{W,i} = (x_i, y_i, z_i), i = 1, 2, ..., n_E}, select 4 virtual control points p_{WV,i}, i = 1, 2, 3, 4;
the centroid of the n_E space points is taken as the first virtual control point:

$$p_{WV,1} = \frac{1}{n_E} \sum_{i=1}^{n_E} p_{W,i} \tag{8}$$

and the matrix A is formed:

$$A = \begin{bmatrix} p_{W,1}^T - p_{WV,1}^T \\ \vdots \\ p_{W,n_E}^T - p_{WV,1}^T \end{bmatrix} \tag{9}$$

with λ_i the eigenvalues of A^T A and v_i the corresponding unit eigenvectors, the other three virtual control points are unit points along the three principal directions:

$$p_{WV,j+1} = p_{WV,1} + v_j, \quad j = 1, 2, 3 \tag{10}$$

the space points in the global coordinate system can then be expressed in terms of the solved virtual control points:

$$p_{W,i} = \sum_{j=1}^{4} w_{ij}\, p_{WV,j} \tag{11}$$

where w_{ij} is the weight of the i-th space point with respect to the virtual control point p_{WV,j}, and the weights of the i-th space point must satisfy:

$$\sum_{j=1}^{4} w_{ij} = 1 \tag{12}$$
Step 6.2: solving for the coordinates of the virtual control points in the camera coordinate system, p_{CV,i}, i = 1, 2, 3, 4;
once the coordinates of the virtual control points in the camera coordinate system are known, the position of any space point in the camera coordinate system can be expressed as the same weighted sum of the virtual control points:

$$p_{C,i} = \sum_{j=1}^{4} w_{ij}\, p_{CV,j} \tag{13}$$

where the weights w_{ij} are the same as in equation (11);
let the homogeneous coordinates of the image point of a space point on the image plane be p_{I,i} = [u_i, v_i, 1]^T; from the camera model, the relationship between p_{C,i} and p_{I,i} is:

$$\alpha_s\, p_{I,i} = K\, p_{C,i} = K \sum_{j=1}^{4} w_{ij}\, p_{CV,j} \tag{14}$$

where α_s is a scale factor and K is the intrinsic parameter matrix of the camera; writing the virtual control point coordinates in the camera coordinate system as p_{CV,j} = [x_{CV,j}, y_{CV,j}, z_{CV,j}]^T, the scale factor α_s is:

$$\alpha_s = \sum_{j=1}^{4} w_{ij}\, z_{CV,j} \tag{15}$$

expanding equation (14) with the camera parameters gives:

$$\alpha_s \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = \begin{bmatrix} f_c & 0 & u_0 \\ 0 & f_c & v_0 \\ 0 & 0 & 1 \end{bmatrix} \sum_{j=1}^{4} w_{ij} \begin{bmatrix} x_{CV,j} \\ y_{CV,j} \\ z_{CV,j} \end{bmatrix}$$

where f_c is the focal length of the camera and (u_0, v_0) are the coordinates of the intersection of the optical axis with the imaging plane;
substituting equation (15) into equation (14), the relationship between the space point coordinates and the corresponding image point coordinates is expressed as linear equations:

$$\sum_{j=1}^{4} \left( w_{ij} f_c\, x_{CV,j} + w_{ij}(u_0 - u_i)\, z_{CV,j} \right) = 0, \qquad \sum_{j=1}^{4} \left( w_{ij} f_c\, y_{CV,j} + w_{ij}(v_0 - v_i)\, z_{CV,j} \right) = 0 \tag{16}$$

setting

$$z_P = \left[ p_{CV,1}^T,\ p_{CV,2}^T,\ p_{CV,3}^T,\ p_{CV,4}^T \right]^T,$$

equation (16) is organized into the linear system

$$M_E\, z_P = 0 \tag{17}$$

where the matrix M_E is formed by arranging the coefficients in equation (16) and has dimension 2n_E × 12; solving this system yields the position coordinates p_{CV,j} of the virtual control points in the camera coordinate system;
Step 6.3: computing the final rotation matrix R and translation vector t from the results of Steps 6.1 and 6.2;
the centroid of the space points in the camera coordinate system and the matrix B are computed:

$$\bar{p}_C = \frac{1}{n_E} \sum_{i=1}^{n_E} p_{C,i} \tag{18}$$

$$B = \begin{bmatrix} p_{C,1}^T - \bar{p}_C^T \\ \vdots \\ p_{C,n_E}^T - \bar{p}_C^T \end{bmatrix} \tag{19}$$

the matrix H = B^T A is computed and its singular value decomposition H = U D V^T is taken; the rotation matrix and translation vector are then:

$$R = U V^T, \qquad t = \bar{p}_C - R\, p_{WV,1} \tag{20}$$

where, if det(R) < 0, then R(2,:) = -R(2,:).
CN201911127519.5A 2019-11-18 2019-11-18 VSLAM-based visual positioning method for sparse three-dimensional point cloud chart Pending CN110889349A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911127519.5A CN110889349A (en) 2019-11-18 2019-11-18 VSLAM-based visual positioning method for sparse three-dimensional point cloud chart

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911127519.5A CN110889349A (en) 2019-11-18 2019-11-18 VSLAM-based visual positioning method for sparse three-dimensional point cloud chart

Publications (1)

Publication Number Publication Date
CN110889349A true CN110889349A (en) 2020-03-17

Family

ID=69747849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911127519.5A Pending CN110889349A (en) 2019-11-18 2019-11-18 VSLAM-based visual positioning method for sparse three-dimensional point cloud chart

Country Status (1)

Country Link
CN (1) CN110889349A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111624997A (en) * 2020-05-12 2020-09-04 珠海市一微半导体有限公司 Robot control method and system based on TOF camera module and robot
CN111882590A (en) * 2020-06-24 2020-11-03 广州万维创新科技有限公司 AR scene application method based on single picture positioning
CN112907745A (en) * 2021-03-23 2021-06-04 北京三快在线科技有限公司 Method and device for generating digital orthophoto map
CN113034600A (en) * 2021-04-23 2021-06-25 上海交通大学 Non-texture planar structure industrial part identification and 6D pose estimation method based on template matching
CN113643422A (en) * 2021-07-09 2021-11-12 北京三快在线科技有限公司 Information display method and device
CN113808273A (en) * 2021-09-14 2021-12-17 大连海事大学 Disordered incremental sparse point cloud reconstruction method for ship traveling wave numerical simulation

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150070470A1 (en) * 2013-09-10 2015-03-12 Board Of Regents, The University Of Texas System Apparatus, System, and Method for Mobile, Low-Cost Headset for 3D Point of Gaze Estimation
CN106228538A (en) * 2016-07-12 2016-12-14 哈尔滨工业大学 Binocular vision indoor orientation method based on logo
JP2017053795A (en) * 2015-09-11 2017-03-16 株式会社リコー Information processing apparatus, position attitude measurement method, and position attitude measurement program
CN106826815A (en) * 2016-12-21 2017-06-13 江苏物联网研究发展中心 Target object method of the identification with positioning based on coloured image and depth image
CN107103056A (en) * 2017-04-13 2017-08-29 哈尔滨工业大学 A kind of binocular vision indoor positioning database building method and localization method based on local identities
US20180061126A1 (en) * 2016-08-26 2018-03-01 Osense Technology Co., Ltd. Method and system for indoor positioning and device for creating indoor maps thereof
WO2018049581A1 (en) * 2016-09-14 2018-03-22 浙江大学 Method for simultaneous localization and mapping
CN107830854A (en) * 2017-11-06 2018-03-23 深圳精智机器有限公司 Vision positioning method based on sparse cloud of ORB and Quick Response Code
CN109960402A (en) * 2018-12-18 2019-07-02 重庆邮电大学 A kind of actual situation register method merged based on cloud and visual signature
CN110097553A (en) * 2019-04-10 2019-08-06 东南大学 The semanteme for building figure and three-dimensional semantic segmentation based on instant positioning builds drawing system
CN110097599A (en) * 2019-04-19 2019-08-06 电子科技大学 A kind of workpiece position and orientation estimation method based on partial model expression
CN110360999A (en) * 2018-03-26 2019-10-22 京东方科技集团股份有限公司 Indoor orientation method, indoor locating system and computer-readable medium
CN110443840A (en) * 2019-08-07 2019-11-12 山东理工大学 The optimization method of sampling point set initial registration in surface in kind

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150070470A1 (en) * 2013-09-10 2015-03-12 Board Of Regents, The University Of Texas System Apparatus, System, and Method for Mobile, Low-Cost Headset for 3D Point of Gaze Estimation
JP2017053795A (en) * 2015-09-11 2017-03-16 株式会社リコー Information processing apparatus, position attitude measurement method, and position attitude measurement program
CN106228538A (en) * 2016-07-12 2016-12-14 哈尔滨工业大学 Binocular vision indoor orientation method based on logo
US20180061126A1 (en) * 2016-08-26 2018-03-01 Osense Technology Co., Ltd. Method and system for indoor positioning and device for creating indoor maps thereof
US20190234746A1 (en) * 2016-09-14 2019-08-01 Zhejiang University Method for simultaneous localization and mapping
WO2018049581A1 (en) * 2016-09-14 2018-03-22 浙江大学 Method for simultaneous localization and mapping
CN106826815A (en) * 2016-12-21 2017-06-13 江苏物联网研究发展中心 Target object method of the identification with positioning based on coloured image and depth image
CN107103056A (en) * 2017-04-13 2017-08-29 哈尔滨工业大学 A kind of binocular vision indoor positioning database building method and localization method based on local identities
CN107830854A (en) * 2017-11-06 2018-03-23 深圳精智机器有限公司 Vision positioning method based on sparse cloud of ORB and Quick Response Code
CN110360999A (en) * 2018-03-26 2019-10-22 京东方科技集团股份有限公司 Indoor orientation method, indoor locating system and computer-readable medium
CN109960402A (en) * 2018-12-18 2019-07-02 重庆邮电大学 A kind of actual situation register method merged based on cloud and visual signature
CN110097553A (en) * 2019-04-10 2019-08-06 东南大学 The semanteme for building figure and three-dimensional semantic segmentation based on instant positioning builds drawing system
CN110097599A (en) * 2019-04-19 2019-08-06 电子科技大学 A kind of workpiece position and orientation estimation method based on partial model expression
CN110443840A (en) * 2019-08-07 2019-11-12 山东理工大学 The optimization method of sampling point set initial registration in surface in kind

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GAO QIAN: "Monocular Vision Based Object Recognition and Tracking for Intelligent Robot" *
王龙辉; 杨光; 尹芳; 丑武胜: "Three-dimensional visual simultaneous localization and mapping based on Kinect 2.0" *
马琳; 杨浩; 谭学治; 冯冠元: "A Visual-Depth Map construction method based on image key frames" *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111624997A (en) * 2020-05-12 2020-09-04 珠海市一微半导体有限公司 Robot control method and system based on TOF camera module and robot
CN111882590A (en) * 2020-06-24 2020-11-03 广州万维创新科技有限公司 AR scene application method based on single picture positioning
CN112907745A (en) * 2021-03-23 2021-06-04 北京三快在线科技有限公司 Method and device for generating digital orthophoto map
CN112907745B (en) * 2021-03-23 2022-04-01 北京三快在线科技有限公司 Method and device for generating digital orthophoto map
CN113034600A (en) * 2021-04-23 2021-06-25 上海交通大学 Non-texture planar structure industrial part identification and 6D pose estimation method based on template matching
CN113034600B (en) * 2021-04-23 2023-08-01 上海交通大学 Template matching-based texture-free planar structure industrial part identification and 6D pose estimation method
CN113643422A (en) * 2021-07-09 2021-11-12 北京三快在线科技有限公司 Information display method and device
CN113643422B (en) * 2021-07-09 2023-02-03 北京三快在线科技有限公司 Information display method and device
CN113808273A (en) * 2021-09-14 2021-12-17 大连海事大学 Disordered incremental sparse point cloud reconstruction method for ship traveling wave numerical simulation
CN113808273B (en) * 2021-09-14 2023-09-12 大连海事大学 Disordered incremental sparse point cloud reconstruction method for ship traveling wave numerical simulation

Similar Documents

Publication Publication Date Title
CN110889349A (en) VSLAM-based visual positioning method for sparse three-dimensional point cloud chart
CN111968129B (en) Instant positioning and map construction system and method with semantic perception
CN110728717B (en) Positioning method and device, equipment and storage medium
CN110738143B (en) Positioning method and device, equipment and storage medium
CN109658445A (en) Network training method, increment build drawing method, localization method, device and equipment
CN113393522B (en) 6D pose estimation method based on monocular RGB camera regression depth information
CN110675457B (en) Positioning method and device, equipment and storage medium
CN109579825B (en) Robot positioning system and method based on binocular vision and convolutional neural network
CN108648240A (en) Based on a non-overlapping visual field camera posture scaling method for cloud characteristics map registration
CN108519102B (en) Binocular vision mileage calculation method based on secondary projection
CN106940186A (en) A kind of robot autonomous localization and air navigation aid and system
CN110070598B (en) Mobile terminal for 3D scanning reconstruction and 3D scanning reconstruction method thereof
CN110276768B (en) Image segmentation method, image segmentation device, image segmentation apparatus, and medium
CN111862213A (en) Positioning method and device, electronic equipment and computer readable storage medium
CN115205489A (en) Three-dimensional reconstruction method, system and device in large scene
CN111323024B (en) Positioning method and device, equipment and storage medium
CN111127524A (en) Method, system and device for tracking trajectory and reconstructing three-dimensional image
CN111860651B (en) Monocular vision-based semi-dense map construction method for mobile robot
CN102263957B (en) Search-window adaptive parallax estimation method
CN111899280A (en) Monocular vision odometer method adopting deep learning and mixed pose estimation
CN111709317B (en) Pedestrian re-identification method based on multi-scale features under saliency model
CN116772820A (en) Local refinement mapping system and method based on SLAM and semantic segmentation
CN116843754A (en) Visual positioning method and system based on multi-feature fusion
CN112615993A (en) Depth information acquisition method, binocular camera module, storage medium and electronic equipment
WO2024032101A1 (en) Feature map generation method and apparatus, storage medium, and computer device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200317