CN113610001A - Indoor mobile terminal positioning method based on depth camera and IMU combination - Google Patents

Indoor mobile terminal positioning method based on depth camera and IMU combination

Info

Publication number
CN113610001A
Authority
CN
China
Prior art keywords
frame
image
depth camera
gray level
gray
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110907214.7A
Other languages
Chinese (zh)
Other versions
CN113610001B (en)
Inventor
张军禹
仇原鹰
王宇纬
李静
张剑宇
程培涛
段学超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202110907214.7A priority Critical patent/CN113610001B/en
Publication of CN113610001A publication Critical patent/CN113610001A/en
Application granted granted Critical
Publication of CN113610001B publication Critical patent/CN113610001B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • G01C21/1656 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments with passive imaging devices, e.g. cameras
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The invention discloses an indoor mobile terminal positioning method based on the combination of a depth camera and an IMU, comprising the following implementation steps: determining the pose of the depth camera in the camera coordinate system at the moment each frame of grayscale image in a sliding window is collected, through pure visual three-dimensional reconstruction and depth image data; determining the initial pose of the depth camera in the world coordinate system through the gravity vector; calculating state increments through tightly coupled nonlinear optimization to determine the real-time pose of the depth camera; optimizing the poses of the depth camera when a loop closure is detected; and obtaining the real-time pose of the mobile terminal through the relative pose between the mobile terminal and the depth camera. The invention achieves real-time positioning in indoor environments without a prior map, using only the sensors carried by the positioning system itself, which improves the portability of the positioning system and saves the storage space a prior map would occupy.

Description

Indoor mobile terminal positioning method based on depth camera and IMU combination
Technical Field
The invention belongs to the technical field of communication, and further relates to an indoor mobile terminal positioning method based on a depth camera and Inertial Measurement Unit (IMU) combination in the technical field of synchronous positioning and mapping. The invention can be used for positioning and tracking the autonomous mobile terminal in an indoor environment and displaying the pose and the motion trail of the terminal in real time.
Background
Simultaneous Localization and Mapping (SLAM) technology refers to technology in which a mobile terminal acquires real-time pose information relying only on its own sensors. It is widely applied in emerging fields closely related to daily life, such as virtual reality, autonomous driving and location-based services. Positioning technologies for outdoor mobile terminals (such as GPS positioning) are now mature, and the focus of market demand is gradually shifting to indoor environments, where GPS cannot be used reliably. Statistics show that the number of indoor autonomous mobile terminals has grown explosively in recent years, so the prospects for positioning services in indoor environments are very broad. However, the current state of indoor positioning technology cannot yet meet this demand: indoor environments generally suffer from narrow movement space and complex surroundings, and the sensors carried by mobile terminals are constrained by price, volume and quality, which to some extent hinders the further development of indoor positioning technology.
The patent document "Logo-based binocular vision indoor positioning method" (application number: 201610546034.X, application publication number: CN106228538A), filed by Harbin Institute of Technology, proposes a logo-based binocular vision indoor positioning method for positioning indoor moving objects. The specific process is as follows. First, a visual map database is established from collected logo images, and the intrinsic and extrinsic parameters of the left and right cameras of a binocular camera are calibrated. Then, a logo image is captured with the binocular camera, its visual features are matched against the visual features of the visual map database images, correct matches are retained while mismatches are eliminated, and the vertex coordinates are calculated. Finally, the coordinates of the camera in the world coordinate system are solved to realize the positioning function. Although this method can achieve positioning, it still has the drawback that a visual map database of the surrounding environment must be built before positioning, so indoor moving objects cannot be positioned in unknown environments.
The patent document "WiFi-based indoor positioning method" (application number: 202110274155.4, application publication number: CN112689237A), filed by a Sichuan-based Internet of Things technology company, proposes an indoor positioning method based on WiFi signals. The specific process is as follows. First, several reference points are selected indoors, and the WiFi signal strength of each WiFi signal source is collected at each reference point. Then, noise reduction is applied to the signal of each source to obtain a signal strength feature value for each reference point position. Finally, the signal strength feature value at the current position of the mobile receiving device is compared with the feature value of each reference point, the feature-value similarity is calculated, and the reference point with the highest similarity is taken as the position of the mobile receiving device. Although this method can achieve positioning, it still has the drawbacks that multiple auxiliary devices providing WiFi signal sources must be installed indoors, the mobile receiving device cannot be freed from its dependence on external equipment, and the deployment is complex and costly.
Disclosure of Invention
The purpose of the invention is to provide an indoor mobile terminal positioning method based on the combination of a depth camera and an IMU that overcomes the shortcomings of the prior art, in order to solve the problem that indoor positioning cannot be performed without a prior map, as well as the problems of complex deployment and high cost caused by relying on external equipment for positioning.
The idea for achieving the purpose of the invention is as follows. A mobile terminal carrying a depth camera and an IMU moves freely in an unknown environment. The poses of the depth camera at all frames in a sliding window are determined in the camera coordinate system through pure visual three-dimensional reconstruction and depth image data. The gravity vector of the depth camera at the moment the pivot frame grayscale image was collected is solved with a linear equation, the pose of the depth camera is adjusted to the world coordinate system through the gravity vector, and the initial pose of the depth camera is determined. The real-time pose of the depth camera is then determined by calculating its state increments. When a loop closure is detected, the poses of the depth camera corresponding to all grayscale images between the two loop-closure frames are optimized. The real-time pose of the mobile terminal is obtained from the relative pose between the mobile terminal and the depth camera, realizing real-time positioning of the mobile terminal and solving the problem that positioning cannot be performed without a prior map. The invention collects data in real time only through the sensors carried by the mobile terminal and feeds them into the mobile terminal, which processes the data to calculate its own real-time pose, thereby solving the problems of complex deployment and high cost caused by relying on external equipment for indoor positioning.
The specific steps for realizing the purpose of the invention are as follows:
step 1, data collected by a sensor is transmitted into a mobile terminal:
(1a) collecting color images and depth images in real time by using a depth camera mounted on the mobile terminal, wherein each color image and depth image is 640 × 480 pixels and the collection frequency is 30 frames per second, and transmitting the color images and depth images to the mobile terminal in time order;
(1b) collecting 200 frames of acceleration data and 63 frames of angular velocity data per second by using an Inertial Measurement Unit (IMU) mounted on the mobile terminal, and transmitting the acceleration and angular velocity data to the mobile terminal in time order;
step 2, performing equalization processing on the image by using the cv::createCLAHE() function of OpenCV;
and 3, selecting stable characteristic points:
(3a) establishing a two-dimensional pixel coordinate system by taking an upper-left-corner pixel point of each frame of gray level image as a coordinate origin, wherein the pixel coordinate of the origin is (0,0), the horizontal right direction of the gray level image is the positive direction of an x coordinate axis, the vertical downward direction of the gray level image is the positive direction of a y coordinate axis, and the pixel coordinate of the upper-right corner in the gray level image is (639, 0);
(3b) selecting four pixel points from each frame of grayscale image, with the upper left pixel point at pixel coordinates (20,15), the upper right pixel point at (620,15), the lower right pixel point at (620,465) and the lower left pixel point at (20,465), and connecting the four pixel points in order around the region to obtain the rectangular region in which the grid is to be divided;
(3c) selecting the two pixel coordinates (20,165) and (20,315) on the line between the upper left pixel point and the lower left pixel point, dividing that line equally into 3 parts, selecting the three pixel coordinates (170,15), (320,15) and (470,15) on the line between the upper left pixel point and the upper right pixel point, dividing that line equally into 4 parts, taking the selected pixel coordinates as reference points for dividing the rectangular grid, and evenly dividing the whole rectangular region into 3 × 4 grids of the same size;
(3d) extracting all feature points in the first frame grayscale image by using the cv::goodFeaturesToTrack() function;
(3e) determining the position of each feature point of the first frame grayscale image in the second frame grayscale image through optical flow tracking, ranking the feature points in each grid in descending order of the number of times their corresponding positions have been found in the remaining image frames, ranking feature points with the same count in the order in which they were extracted, taking the first 7 feature points in each grid as stable feature points, and selecting 7 × 12 stable feature points in total;
(3f) determining the position of each feature point in the previous frame of gray level image in the next frame of gray level image by adopting the same method as the step (3e), determining the sequence of the feature points in each grid, if the tracking loss of the feature points occurs, extracting a new feature point from the grid to which the feature point belongs, and complementing the number of the feature points in the grid to 7;
step 4, integral preprocessing is carried out on IMU data:
step 5, determining the relative pose of the depth camera with the scale factor in the sliding window:
(5a) sequentially selecting 11 frames of gray images from the gray images containing the stable characteristic points according to the time sequence, establishing a sliding window, and performing pure visual three-dimensional reconstruction on the 11 frames of images in the sliding window;
(5b) finding out a frame of gray level image from the first 10 frames of gray level images in the sliding window as a pivot frame image, wherein the parallax between the pivot frame image and the 11 th frame of gray level image is larger than 20 pixels, and at least 25 pairs of common viewpoints exist between the two frames of images, wherein the common viewpoints represent characteristic points existing in at least two frames of gray level images, and the parallax represents the average pixel difference of all the common viewpoints in the two frames of gray level images;
(5c) calculating the relative pose of the depth camera between the moment the pivot frame image is collected and the moments the other frames of grayscale images are collected in the sliding window, by using the cv::triangulatePoints() and cv::solvePnP() functions;
and 6, calculating the average value of the scale factors:
(6a) finding all the common viewpoints shared by the pivot frame image and the first grayscale image after the pivot frame in the sliding window, and obtaining the distance d_i between the position coordinate of each common viewpoint and the position of the depth camera at the moment the pivot frame image was collected;
(6b) obtaining the depth value D_i of the center pixel of each common viewpoint from the depth image corresponding to the pivot frame image;
(6c) calculating the average value s of the depth camera scale factors within the sliding window as follows:
s = (1/n) Σ_{i=1}^{n} ( D_i / d_i )
wherein n represents the total number of common viewpoints shared by the pivot frame grayscale image and the first grayscale image after the pivot frame, Σ represents the summation operation, and i represents the sequence number of a common viewpoint in the pivot frame grayscale image;
step 7, solving the speed vector and the gravity vector of the depth camera:
calculating the velocity vector of the depth camera when each two adjacent frames of grayscale images are collected in the sliding window and the gravity vector of the depth camera when the pivot frame grayscale image is collected by using a linear equation, and adjusting the pose of the depth camera to the world coordinate system;
step 8, calculating the state increment of the depth camera:
(8a) calculating variables to be optimized in the visual re-projection residual error and the IMU measurement residual error through a tightly coupled nonlinear optimization formula to obtain the optimal pose of the depth camera when each frame of gray level image is collected in the sliding window;
(8b) obtaining the state increment of the depth camera by utilizing the optimized position and posture of the depth camera in the sliding window, and determining the real-time position and posture of the depth camera;
step 9, optimizing the pose of the depth camera:
(9a) extracting at least 300 image feature points from each frame of grayscale image collected by the depth camera by using the FAST corner detection algorithm, and storing the descriptor information of all feature points of each frame in a database;
(9b) calculating a similarity score between the current image frame and each frame of grayscale image in the database by using a DBoW2-based loop detection algorithm, and judging from the similarity score whether the current image frame closes a loop: if the score is greater than or equal to 0.6, executing step (9c), otherwise executing step 10;
(9c) carrying out nonlinear optimization on the position and posture of the depth camera corresponding to each frame of image between two frames of gray level images with loop, outputting the position and posture of the depth camera after each frame of optimization, and obtaining the position and posture of the mobile terminal after each frame of optimization by using the relative position and posture between the mobile terminal and the depth camera;
and step 10, outputting the pose of the current depth camera, and obtaining the pose of the current mobile terminal by using the relative pose between the mobile terminal and the depth camera.
Compared with the prior art, the invention has the following advantages:
Firstly, the mobile terminal moves freely in an unknown environment. The poses of the depth camera at all frames in the sliding window are obtained in the camera coordinate system through pure visual three-dimensional reconstruction and depth image data, the poses are adjusted to the world coordinate system through the gravity vector to determine the initial pose of the depth camera, the state increments of the depth camera are obtained through tightly coupled nonlinear optimization to determine its real-time pose, and the real-time pose of the mobile terminal is determined through the relative pose between the depth camera and the mobile terminal. This overcomes the shortcoming of the prior art that a prior map is needed for indoor positioning, and saves the storage space that a visual map database built before positioning would occupy.
Secondly, the mobile terminal can calculate its real-time pose merely by processing the data collected by the depth camera and the IMU it carries. This overcomes the shortcomings of the prior art, in which external equipment had to be installed indoors, making the positioning system complex to deploy and costly; positioning is achieved using only the terminal's own sensors, which improves the portability of the positioning system.
Drawings
FIG. 1 is a flow chart of the present invention;
fig. 2 is a schematic diagram of extracting stable feature points according to the present invention.
Detailed description of the invention
The invention is further described below with reference to the figures and examples.
The implementation steps of the present invention are further described with reference to fig. 1.
Step 1, data collected by a sensor is transmitted into a mobile terminal.
A depth camera mounted on the mobile terminal is used to collect color images and depth images in real time; each color image and depth image is 640 × 480 pixels, the collection frequency is 30 frames per second, and the color images and depth images are transmitted to the mobile terminal in time order.
An IMU mounted on the mobile terminal is used to collect 200 frames of acceleration data and 63 frames of angular velocity data per second, and the acceleration and angular velocity data are transmitted to the mobile terminal in time order.
And 2, equalizing the image by using the cv::createCLAHE() function of OpenCV.
The equalization processing means that each frame of color image is first converted into a grayscale image format recognizable by OpenCV, and each frame of grayscale image is then equalized with the createCLAHE() function.
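As an illustration of this step, the following sketch uses OpenCV's Python bindings; the clip limit and tile grid size are assumptions, since the description does not specify CLAHE parameters.

```python
# Minimal sketch of step 2 (assumed CLAHE parameters, not taken from the patent).
import cv2

def equalize_frame(color_bgr):
    gray = cv2.cvtColor(color_bgr, cv2.COLOR_BGR2GRAY)            # color -> grayscale
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))   # parameter values assumed
    return clahe.apply(gray)                                      # contrast-limited equalization
```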
And 3, selecting stable characteristic points.
Step 1, a two-dimensional pixel coordinate system is established with the upper left corner pixel of each frame of grayscale image as the coordinate origin: the pixel coordinate of the origin A is (0,0), the horizontal rightward direction of the grayscale image is the positive x-axis, the vertical downward direction is the positive y-axis, the pixel coordinate of the upper right corner pixel B of the grayscale image is (639,0), and the pixel coordinate of the lower left corner pixel C is (0,479).
And 2, four pixel points are selected from each frame of grayscale image: the upper left pixel point D at pixel coordinates (20,15), the upper right pixel point E at (620,15), the lower right pixel point F at (620,465) and the lower left pixel point G at (20,465); the four pixel points are connected in order around the region to obtain the rectangular region in which the grid is to be divided.
And 3, the two pixel points K (20,165) and L (20,315) are selected on the line between the upper left pixel point D and the lower left pixel point G, dividing that line equally into 3 parts; the three pixel points H (170,15), I (320,15) and J (470,15) are selected on the line between the upper left pixel point D and the upper right pixel point E, dividing that line equally into 4 parts; the selected pixel points are taken as reference points for dividing the rectangular grid, and the whole rectangular region is evenly divided into 3 × 4 grids of the same size.
And 4, all feature points in the first frame grayscale image are extracted by using the cv::goodFeaturesToTrack() function.
And 5, the position of each feature point of the first frame grayscale image in the second frame grayscale image is determined through optical flow tracking; within each grid, the feature points are ranked in descending order of the number of times their corresponding positions have been found in the remaining image frames, feature points with the same count are ranked in the order in which they were extracted, the first 7 feature points in each grid are taken as stable feature points, and 7 × 12 stable feature points are selected in total.
And 6, determining the position of each feature point in the previous frame of gray level image in the next frame of gray level image by adopting the same method as the step 5, determining the sequence of the feature points in each grid, extracting a new feature point from the grid to which the feature point belongs if the tracking of the feature point is lost, and complementing the number of the feature points in the grid to 7.
Within a circular region of radius 10 pixels around each retained feature point, the center pixel of no other feature point may lie.
Referring to fig. 2, the selected stable characteristic points of the present invention are further described:
in fig. 2, feature points 1, 2, 3, 4, 5, 6, and 7 are top-ranked 7 feature points in the same grid, and are stable feature points.
As shown in fig. 2, feature points 5 and 6, the center pixel points of the two feature points are not in the circular area of the opposite side, and the two feature points are sorted in the front, so that both feature points are reserved.
The feature points 8 in the grid are sorted to 8 th order, and the feature points 8 are removed after the 8 th order.
Like the feature points 7 and 9 in fig. 2, the central pixel point of the feature point 9 is in the circular area of the feature point 7, and the feature points 9 are sorted backward, so that the feature point 9 is removed and the feature point 7 is retained.
Like the feature point 10 in fig. 2, the feature point center pixel point is not in the grid area, so the feature point 10 is eliminated.
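A rough sketch of the grid-based selection is given below, using OpenCV's Python bindings. The grid layout, the 7-points-per-grid limit and the 10-pixel exclusion radius follow the description above; the data layout (tracking counts, point arrays) and parameter handling are assumptions, not the patent's implementation.

```python
# Sketch of steps (3a)-(3f): keep at most 7 well-tracked, well-separated points per grid cell.
import cv2
import numpy as np

ROI = (20, 15, 620, 465)        # x_min, y_min, x_max, y_max of the rectangular region
ROWS, COLS, PER_CELL = 3, 4, 7  # 3 x 4 grid, 7 stable points per cell
MIN_DIST = 10                   # exclusion radius around a kept point, in pixels

def cell_index(x, y):
    col = min(int((x - ROI[0]) * COLS / (ROI[2] - ROI[0])), COLS - 1)
    row = min(int((y - ROI[1]) * ROWS / (ROI[3] - ROI[1])), ROWS - 1)
    return row * COLS + col

def select_stable(prev_gray, cur_gray, prev_pts, track_count):
    """prev_pts: Nx2 float32 array; track_count[i]: frames over which point i has been tracked."""
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray,
                                              prev_pts.reshape(-1, 1, 2), None)
    nxt = nxt.reshape(-1, 2)
    cells = [[] for _ in range(ROWS * COLS)]
    for i in np.argsort(-np.asarray(track_count)):   # longest-tracked points first
        if status[i][0] == 0:
            continue                                 # tracking lost for this point
        x, y = float(nxt[i][0]), float(nxt[i][1])
        if not (ROI[0] <= x < ROI[2] and ROI[1] <= y < ROI[3]):
            continue                                 # outside the rectangular region
        cell = cells[cell_index(x, y)]
        if len(cell) >= PER_CELL:
            continue                                 # this grid cell already holds 7 points
        if all((x - px) ** 2 + (y - py) ** 2 > MIN_DIST ** 2 for px, py in cell):
            cell.append((x, y))                      # keep only well-separated points
    return [p for cell in cells for p in cell]       # depleted cells would be refilled with new corners
```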
And 4, performing integral preprocessing on the IMU data as follows.
And establishing a three-dimensional world coordinate system by taking the initial position on the mobile terminal as an origin, the forward extending direction of the mobile terminal as the positive direction of an x axis, the leftward extending direction of the mobile terminal as the positive direction of a y axis and the upward extending direction of the mobile terminal as the positive direction of a z axis.
And under a world coordinate system, acquiring the position, the speed and the posture of the IMU of the next frame according to the acceleration and angular velocity data of the current frame and the next frame of the IMU and the position, the speed and the posture of the IMU of the current frame by using a median integration-based method.
And obtaining the position increment, the speed increment and the attitude increment of the IMU between two adjacent frames according to the acceleration and angular velocity data of the current frame and the next frame of the IMU by utilizing an IMU pre-integration formula based on a median method.
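The median-method propagation of step 4 can be sketched as follows. This is a simplified illustration that ignores bias and noise terms; the [w, x, y, z] quaternion convention and the gravity value are assumptions rather than values taken from the patent.

```python
# Sketch of one median-method integration step (biases and noise omitted).
import numpy as np

def quat_mul(q, r):
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_to_rot(q):
    w, x, y, z = q
    return np.array([[1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
                     [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
                     [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def propagate(p, v, q, acc0, gyr0, acc1, gyr1, dt, g=np.array([0.0, 0.0, -9.81])):
    """Propagate world-frame position p, velocity v and attitude q from IMU sample 0 to sample 1."""
    gyr_mid = 0.5 * (gyr0 + gyr1)                      # median (averaged) angular rate
    dq = np.concatenate(([1.0], 0.5 * gyr_mid * dt))   # small-angle rotation quaternion
    q_new = quat_mul(q, dq)
    q_new /= np.linalg.norm(q_new)
    a0 = quat_to_rot(q).dot(acc0) + g                  # world-frame acceleration at old attitude
    a1 = quat_to_rot(q_new).dot(acc1) + g              # world-frame acceleration at new attitude
    a_mid = 0.5 * (a0 + a1)                            # median (averaged) acceleration
    p_new = p + v * dt + 0.5 * a_mid * dt * dt
    v_new = v + a_mid * dt
    return p_new, v_new, q_new
```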
And 5, determining the relative pose of the depth camera with the scale factor in the sliding window.
And step 1, sequentially selecting 11 frames of gray images from the gray images containing the stable characteristic points according to the time sequence, establishing a sliding window, and performing pure visual three-dimensional reconstruction on the 11 frames of images in the sliding window.
And 2, finding out one frame of gray level image from the first 10 frames of gray level images in the sliding window as a pivot frame image, wherein the parallax between the pivot frame image and the 11 th frame of gray level image is larger than 20 pixels, and at least 25 pairs of common viewpoints exist between the two frames of images, wherein the common viewpoints represent characteristic points existing in at least two frames of gray level images, and the parallax represents the average pixel difference of all the common viewpoints in the two frames of gray level images.
And 3, a three-dimensional camera coordinate system is established with the position of the depth camera at the moment the pivot frame grayscale image was collected as the origin, the forward direction of the depth camera as the positive x-axis, the leftward direction as the positive y-axis, and the upward direction as the positive z-axis.
And 4, determining the relative pose of the depth camera when the gray scale images of the pivot frame and the 11 th frame are acquired according to at least 5 pairs of co-viewpoint central pixel coordinates in the gray scale images of the pivot frame and the 11 th frame by using a five-point method under a camera coordinate system, wherein the relative pose consists of a relative rotation matrix and a relative displacement vector with a scale factor.
And 5, the pivot frame grayscale image and the 11th frame grayscale image are triangulated by using the cv::triangulatePoints() function of OpenCV to obtain the position coordinates of each common viewpoint between the two frames.
And 6, PnP solving is performed in turn for each frame from the (L+1)-th frame to the 10th frame grayscale image by using the cv::solvePnP() function of OpenCV, to obtain the relative pose of the depth camera between the moment the L-th frame grayscale image was collected and the moment each of these frames was collected, wherein the L-th frame grayscale image is the pivot frame grayscale image and the (L+1)-th frame grayscale image is the first grayscale image after the pivot frame.
And 7, the relative pose of the depth camera between the moments the L-th frame and the (L-1)-th frame grayscale images were collected is obtained by the same method as in step 6, and the position coordinates of each common viewpoint between the L-th frame and the (L-1)-th frame grayscale images are obtained by the same method as in step 5, wherein the (L-1)-th frame grayscale image is the first grayscale image before the pivot frame.
And 8, the relative pose of the depth camera between the moment the pivot frame grayscale image was collected and the moment each grayscale image before the pivot frame was collected is obtained by the same method as in step 7.
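The pure visual reconstruction of step 5 can be sketched with OpenCV's Python bindings as follows. The camera intrinsic matrix K and the RANSAC settings are assumptions; the patent itself only names the five-point method, cv::triangulatePoints() and cv::solvePnP().

```python
# Sketch of step 5: relative pose from the five-point method, triangulation, then PnP.
import cv2
import numpy as np

K = np.array([[525.0,   0.0, 319.5],
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])       # assumed pinhole intrinsics for a 640x480 camera

def pivot_to_last_pose(pts_pivot, pts_last):
    """pts_*: Nx2 float64 pixel coordinates of common viewpoints (N >= 5)."""
    E, _ = cv2.findEssentialMat(pts_pivot, pts_last, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_pivot, pts_last, K)   # translation is up to scale
    return R, t

def triangulate(R, t, pts_pivot, pts_last):
    P0 = K.dot(np.hstack([np.eye(3), np.zeros((3, 1))]))      # pivot frame taken as origin
    P1 = K.dot(np.hstack([R, t]))
    Xh = cv2.triangulatePoints(P0, P1, pts_pivot.T, pts_last.T)
    return (Xh[:3] / Xh[3]).T                                 # Nx3 points in the pivot camera frame

def pnp_pose(obj_pts, img_pts):
    """Pose of another frame from triangulated 3D points and their 2D observations."""
    _, rvec, tvec = cv2.solvePnP(obj_pts.astype(np.float64), img_pts.astype(np.float64), K, None)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec
```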
And 6, calculating the average value of the scale factors.
Finding all the common viewpoints shared by the pivot frame image and the first grayscale image after the pivot frame in the sliding window, and obtaining the distance d_i between the position coordinate of each common viewpoint and the position of the depth camera at the moment the pivot frame image was collected.
Obtaining the depth value D_i of the center pixel of each common viewpoint from the depth image corresponding to the pivot frame image.
The average value s of the depth camera scale factors within the sliding window is calculated as follows:
s = (1/n) Σ_{i=1}^{n} ( D_i / d_i )
where n denotes the total number of common viewpoints shared by the pivot frame grayscale image and the first grayscale image after the pivot frame, Σ denotes the summation operation, and i denotes the sequence number of a common viewpoint in the pivot frame grayscale image.
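A minimal sketch of this averaging, assuming the per-point ratio form of the formula above, is:

```python
# Sketch of step 6: average the metric-depth / triangulated-distance ratios.
import numpy as np

def average_scale(d, D):
    """d[i]: triangulated distance of common viewpoint i from the pivot-frame camera (up to scale);
    D[i]: depth of its center pixel read from the pivot frame's depth image (metric)."""
    d = np.asarray(d, dtype=float)
    D = np.asarray(D, dtype=float)
    valid = (d > 0) & (D > 0)                   # guard against missing depth readings
    return float(np.mean(D[valid] / d[valid]))  # s = (1/n) * sum(D_i / d_i)
```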
And 7, solving the speed vector and the gravity vector of the depth camera.
The rotational and translational external parameters between the depth camera and the IMU are calibrated by a Kalibr toolkit.
And calculating the velocity vector of the depth camera at the moments each pair of adjacent grayscale images is collected in the sliding window and the gravity vector of the depth camera at the moment the pivot frame grayscale image is collected by using a linear equation, and adjusting the pose of the depth camera to the world coordinate system.
The linear equation relates, for each pair of adjacent frames k and k+1 in the sliding window, the IMU pre-integrated position increment α and velocity increment β to the unknown camera velocities and the gravity vector, where:
I represents the 3 × 3 identity matrix, and Δt represents the time interval between collecting the two adjacent grayscale images;
R_k^L represents the rotation matrix of the relative pose between the depth camera at the moment the pivot frame grayscale image was collected and at the moment the k-th frame grayscale image was collected;
v_k and v_{k+1} represent the velocity vectors of the depth camera at the moments the k-th and (k+1)-th frame grayscale images were collected;
α represents the position increment obtained by pre-integrating all IMU acceleration data over the interval between the two adjacent grayscale images, and p represents the translational extrinsic parameter between the depth camera and the IMU;
R_k^{k+1} represents the rotation matrix of the relative pose between the depth camera at the moment the (k+1)-th frame grayscale image was collected and at the moment the k-th frame grayscale image was collected;
p_k and p_{k+1} represent the positions of the depth camera at the moments the k-th and (k+1)-th frame grayscale images were collected, and g_L represents the gravity vector of the depth camera at the moment the pivot frame grayscale image was collected;
β represents the velocity increment obtained by pre-integrating all IMU acceleration data over the same interval.
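Schematically, each adjacent frame pair contributes one block of linear constraints, and the stacked system can be solved for all velocities and the gravity vector in the least-squares sense. The sketch below only shows that stacking; building each coefficient block from the pre-integrated increments, relative rotations and extrinsic translation listed above is left as a hypothetical helper, since the exact equation layout is not reproduced here.

```python
# Sketch of step 7: stack per-frame-pair constraint blocks and solve by least squares.
import numpy as np

def solve_velocity_gravity(blocks):
    """blocks: list of (A_k, b_k) pairs, one per adjacent frame pair, written against the
    stacked unknown vector x = [v_0, v_1, ..., v_N, g_L] (each entry a 3-vector)."""
    A = np.vstack([A_k for A_k, _ in blocks])
    b = np.concatenate([b_k for _, b_k in blocks])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    velocities, g_L = x[:-3].reshape(-1, 3), x[-3:]   # last three entries are the gravity vector
    return velocities, g_L
```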
The step of adjusting the pose of the depth camera to a world coordinate system is as follows:
and under a camera coordinate system, rotating the gravity vector of the depth camera to a z axis in a world coordinate system when the hinge frame gray level image is acquired, wherein the direction of the gravity vector is consistent with the negative direction of the z axis, and obtaining a rotation matrix from the camera coordinate system to the world coordinate system.
And adjusting the pose of the depth camera in the sliding window to the world coordinate system with this rotation matrix, to obtain the initial pose of the depth camera in the world coordinate system.
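A sketch of this alignment, built from the axis-angle rotation between the estimated gravity direction and the world -z axis, is given below; the degenerate-case handling is an assumption of the sketch, not part of the patent.

```python
# Sketch of step 7 (world alignment): rotate the estimated gravity onto the world -z axis.
import cv2
import numpy as np

def camera_to_world_rotation(g_L):
    g = np.asarray(g_L, dtype=float)
    g = g / np.linalg.norm(g)
    target = np.array([0.0, 0.0, -1.0])           # gravity should point along -z in the world frame
    axis = np.cross(g, target)
    s, c = np.linalg.norm(axis), float(np.dot(g, target))
    if s < 1e-8:                                  # already aligned, or exactly opposite
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    rvec = axis / s * np.arctan2(s, c)            # axis-angle rotating g onto the target direction
    R, _ = cv2.Rodrigues(rvec)
    return R                                      # rotation from camera frame to world frame
```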
Step 8, calculating the state increment of the depth camera:
calculating variables to be optimized in the visual re-projection residual error and the IMU measurement residual error through a tightly coupled nonlinear optimization formula to obtain the optimal pose of the depth camera when each frame of gray level image is collected in the sliding window;
obtaining the state increment of the depth camera by utilizing the optimized position and posture of the depth camera in the sliding window, and determining the real-time position and posture of the depth camera;
the close-coupled nonlinear optimization formula is as follows:
Figure BDA0003202173660000101
wherein min {. is equal to } represents the minimum operation, X represents all variables to be optimized in the sliding window, B represents the set of two norms of all IMU measurement residual error matrixes in the sliding window, | | · |2Which means that the operation of taking the two norms,
Figure BDA0003202173660000102
representing a rotation matrix between the pose of the IMU and the coordinate axes of the world coordinate system when acquiring the k-th frame of grayscale images,
Figure BDA0003202173660000103
respectively representing the position of the IMU in a world coordinate system when the k frame gray level image is collected and the position of the IMU in a world coordinate system when the k +1 frame gray level image is collected,
Figure BDA0003202173660000104
representing the velocity, g, of the IMU in the world coordinate system at the time of acquiring the k-th frame gray imagewRepresenting the gravity vector in the world coordinate system,
Figure BDA0003202173660000105
representing a quaternion form of attitude increment obtained by pre-integrating all IMU acceleration data in the time interval of collecting two adjacent frames of gray level images,
Figure BDA0003202173660000106
a quaternion form representing the attitude of IMU in world coordinate system when collecting the k frame gray image and when collecting the k +1 frame gray image respectively [ ·]eThe operation of taking the imaginary part of a quaternion is represented,
Figure BDA0003202173660000107
represents the velocity of the IMU in a world coordinate system when acquiring a (k + 1) th frame gray scale image,
Figure BDA0003202173660000108
representing the velocity increment obtained by pre-integrating all IMU acceleration data in the time interval of collecting the two adjacent frames of gray level images,
Figure BDA0003202173660000109
respectively representing the acceleration bias when acquiring the k frame gray level image and the acceleration bias when acquiring the k +1 frame gray level image,
Figure BDA00032021736600001010
respectively representing the angular velocity bias when acquiring the k frame gray level image and the k +1 frame gray level image, C representing the set of two norms of all visual re-projection residual error matrixes in the sliding window,
Figure BDA00032021736600001011
a first value and a second value representing the position coordinates of the ith common viewpoint relative to the depth camera when acquiring the gray image of the jth frameA third value, a fourth value,
Figure BDA00032021736600001012
and the first value and the second value of the pixel coordinate corresponding to the common viewpoint are found in the j frame gray level image by optical flow tracking.
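For illustration, a common way to evaluate the IMU part of such a cost for one frame pair, consistent with the quantities listed above, is sketched below. The attitude term and the weighting by the pre-integration covariance are omitted for brevity, and the exact term layout is an assumption rather than the patent's formula; a real implementation would minimize the stacked residuals with a nonlinear least-squares solver.

```python
# Simplified sketch of one IMU measurement residual (attitude term and covariance weighting omitted).
import numpy as np

def imu_residual(p_k, v_k, ba_k, bg_k, p_k1, v_k1, ba_k1, bg_k1,
                 R_wk, alpha, beta, g_w, dt):
    """R_wk: rotation from the world frame into IMU frame k; alpha, beta: pre-integrated
    position and velocity increments; g_w: gravity vector in the world frame."""
    r_p = R_wk.dot(p_k1 - p_k - v_k * dt + 0.5 * g_w * dt * dt) - alpha   # position term
    r_v = R_wk.dot(v_k1 - v_k + g_w * dt) - beta                          # velocity term
    r_ba = ba_k1 - ba_k                    # acceleration bias should change slowly
    r_bg = bg_k1 - bg_k                    # angular velocity bias should change slowly
    return np.concatenate([r_p, r_v, r_ba, r_bg])
```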
And 9, optimizing the pose of the depth camera.
Step 1, extract at least 300 image feature points from each frame of grayscale image collected by the depth camera by using the FAST corner detection algorithm, and store the descriptor information of all feature points of each frame in a database.
And 2, calculate a similarity score between the current image frame and each frame of grayscale image in the database by using a DBoW2-based loop detection algorithm, and judge from the similarity score whether the current image frame closes a loop: if the score is greater than or equal to 0.6, execute step 3 of this step; otherwise, execute step 10.
And 3, carrying out nonlinear optimization on the poses of the depth cameras corresponding to all the gray level images between the two frames of images with the loop, outputting all the optimized poses of the depth cameras, and obtaining all the optimized poses of the mobile terminal by using the relative poses between the mobile terminal and the depth cameras.
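The bookkeeping of this step can be sketched as follows with OpenCV's Python bindings. DBoW2 is a C++ library without a standard Python binding, so dbow_score() below is a hypothetical placeholder for the bag-of-words similarity query, and ORB is used here only as a convenient binary descriptor; only the FAST detector, the 300-point minimum and the 0.6 threshold come from the description.

```python
# Sketch of step 9: FAST corners + binary descriptors per frame, loop flagged at score >= 0.6.
import cv2

fast = cv2.FastFeatureDetector_create(threshold=20)    # detector threshold is assumed
orb = cv2.ORB_create(nfeatures=300)                    # descriptor extractor (assumption)
database = []                                          # one descriptor matrix per past frame

def add_frame(gray):
    kps = fast.detect(gray, None)
    kps = sorted(kps, key=lambda k: -k.response)[:300] # keep the strongest corners
    kps, desc = orb.compute(gray, kps)
    database.append(desc)
    return desc

def detect_loop(desc, dbow_score):
    """dbow_score(desc_a, desc_b) -> similarity in [0, 1]; hypothetical DBoW2-style query."""
    best_score, best_idx = 0.0, None
    for idx, old in enumerate(database[:-1]):          # skip the frame that was just added
        score = dbow_score(desc, old)
        if score > best_score:
            best_score, best_idx = score, idx
    return best_idx if best_score >= 0.6 else None     # loop closure only above the threshold
```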
And step 10, outputting the pose of the current depth camera, and obtaining the pose of the current mobile terminal by using the relative pose between the mobile terminal and the depth camera.
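Finally, the terminal pose of steps 9 and 10 follows from composing the camera pose with the fixed camera-to-terminal transform; a minimal sketch with 4 × 4 homogeneous matrices (the transform itself is assumed known from calibration):

```python
# Sketch of step 10: terminal pose = camera pose composed with the camera-to-terminal transform.
import numpy as np

def terminal_pose(T_world_cam, T_cam_term):
    """T_world_cam: 4x4 pose of the depth camera in the world frame;
    T_cam_term: 4x4 relative pose of the mobile terminal with respect to the camera."""
    return np.dot(T_world_cam, T_cam_term)   # 4x4 pose of the mobile terminal in the world frame
```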

Claims (7)

1. An indoor mobile terminal positioning method based on a depth camera and IMU combination is characterized in that a mobile terminal respectively collects data by using the depth camera and the IMU, extracts feature points in each gray level image by using a specific rule, and solves a scale factor required by camera pose initialization by using a depth image; the positioning method comprises the following steps:
step 1, data collected by a sensor is transmitted into a mobile terminal:
(1a) collecting color images and depth images in real time by using a depth camera mounted on the mobile terminal, wherein each color image and depth image is 640 × 480 pixels and the collection frequency is 30 frames per second, and transmitting the color images and depth images to the mobile terminal in time order;
(1b) collecting 200 frames of acceleration data and 63 frames of angular velocity data per second by using an Inertial Measurement Unit (IMU) mounted on the mobile terminal, and transmitting the acceleration and angular velocity data to the mobile terminal in time order;
step 2, performing equalization processing on the image by using the cv::createCLAHE() function of OpenCV;
and 3, selecting stable characteristic points:
(3a) establishing a two-dimensional pixel coordinate system by taking an upper-left-corner pixel point of each frame of gray level image as a coordinate origin, wherein the pixel coordinate of the origin is (0,0), the horizontal right direction of the gray level image is the positive direction of an x coordinate axis, the vertical downward direction of the gray level image is the positive direction of a y coordinate axis, and the pixel coordinate of the upper-right corner in the gray level image is (639, 0);
(3b) selecting four pixel points from each frame of grayscale image, with the upper left pixel point at pixel coordinates (20,15), the upper right pixel point at (620,15), the lower right pixel point at (620,465) and the lower left pixel point at (20,465), and connecting the four pixel points in order around the region to obtain the rectangular region in which the grid is to be divided;
(3c) selecting (20,165) and (20,315) two pixel coordinates on a connecting line between an upper left pixel point and a lower left pixel point, equally dividing the connecting line into 3 parts, selecting (170,15), (320,15) and (470,15) three pixel coordinates on the connecting line between the upper left pixel point and the upper right pixel point, equally dividing the connecting line into 4 parts, taking the selected pixel coordinates as reference points for dividing a rectangular grid, and evenly dividing the whole rectangular area into 3 multiplied by 4 grids with the same size;
(3d) extracting all feature points in the first frame grayscale image by using the cv::goodFeaturesToTrack() function;
(3e) determining the position of each feature point of the first frame grayscale image in the second frame grayscale image through optical flow tracking, ranking the feature points in each grid in descending order of the number of times their corresponding positions have been found in the remaining image frames, ranking feature points with the same count in the order in which they were extracted, taking the first 7 feature points in each grid as stable feature points, and selecting 7 × 12 stable feature points in total;
(3f) determining the position of each feature point in the previous frame of gray level image in the next frame of gray level image by adopting the same method as the step (3e), determining the sequence of the feature points in each grid, if the tracking loss of the feature points occurs, extracting a new feature point from the grid to which the feature point belongs, and complementing the number of the feature points in the grid to 7;
step 4, integral preprocessing is carried out on IMU data:
step 5, determining the relative pose of the depth camera with the scale factor in the sliding window:
(5a) sequentially selecting 11 frames of gray images from the gray images containing the stable characteristic points according to the time sequence, establishing a sliding window, and performing pure visual three-dimensional reconstruction on the 11 frames of images in the sliding window;
(5b) finding out a frame of gray level image from the first 10 frames of gray level images in the sliding window as a pivot frame image, wherein the parallax between the pivot frame image and the 11 th frame of gray level image is larger than 20 pixels, and at least 25 pairs of common viewpoints exist between the two frames of images, wherein the common viewpoints represent characteristic points existing in at least two frames of gray level images, and the parallax represents the average pixel difference of all the common viewpoints in the two frames of gray level images;
(5c) calculating the relative pose of the depth camera between the moment the pivot frame image is collected and the moments the other frames of grayscale images are collected in the sliding window, by using the cv::triangulatePoints() and cv::solvePnP() functions;
and 6, calculating the average value of the scale factors:
(6a) finding all the common viewpoints shared by the pivot frame image and the first grayscale image after the pivot frame in the sliding window, and obtaining the distance d_i between the position coordinate of each common viewpoint and the position of the depth camera at the moment the pivot frame image was collected;
(6b) obtaining the depth value D_i of the center pixel of each common viewpoint from the depth image corresponding to the pivot frame image;
(6c) calculating the average value s of the depth camera scale factors within the sliding window as follows:
s = (1/n) Σ_{i=1}^{n} ( D_i / d_i )
wherein n represents the total number of common viewpoints shared by the pivot frame grayscale image and the first grayscale image after the pivot frame, Σ represents the summation operation, and i represents the sequence number of a common viewpoint in the pivot frame grayscale image;
step 7, solving the speed vector and the gravity vector of the depth camera:
calculating the velocity vector of the depth camera when each two adjacent frames of grayscale images are collected in the sliding window and the gravity vector of the depth camera when the pivot frame grayscale image is collected by using a linear equation, and adjusting the pose of the depth camera to the world coordinate system;
step 8, calculating the state increment of the depth camera:
(8a) calculating variables to be optimized in the visual re-projection residual error and the IMU measurement residual error through a tightly coupled nonlinear optimization formula to obtain the optimal pose of the depth camera when each frame of gray level image is collected in the sliding window;
(8b) obtaining the state increment of the depth camera by utilizing the optimized position and posture of the depth camera in the sliding window, and determining the real-time position and posture of the depth camera;
step 9, optimizing the pose of the depth camera:
(9a) extracting at least 300 image feature points from each frame of grayscale image collected by the depth camera by using the FAST corner detection algorithm, and storing the descriptor information of all feature points of each frame in a database;
(9b) calculating a similarity score between the current image frame and each frame of grayscale image in the database by using a DBoW2-based loop detection algorithm, and judging from the similarity score whether the current image frame closes a loop: if the score is greater than or equal to 0.6, executing step (9c), otherwise executing step 10;
(9c) carrying out nonlinear optimization on the position and posture of the depth camera corresponding to each frame of image between two frames of gray level images with loop, outputting the position and posture of the depth camera after each frame of optimization, and obtaining the position and posture of the mobile terminal after each frame of optimization by using the relative position and posture between the mobile terminal and the depth camera;
and step 10, outputting the pose of the current depth camera, and obtaining the pose of the current mobile terminal by using the relative pose between the mobile terminal and the depth camera.
2. The method as claimed in claim 1, wherein the equalization processing in step (2) is to convert each frame of color image into a grayscale image format that can be recognized by OpenCV, and then perform equalization processing on each frame of grayscale image by using the createCLAHE() function.
3. The method of claim 1, wherein the step of performing integral preprocessing on the IMU data in step (4) comprises:
the method comprises the steps that firstly, a three-dimensional world coordinate system is established by taking an initial position on a mobile terminal as an original point, the forward extending direction of the mobile terminal is the positive direction of an x axis, the leftward extending direction of the mobile terminal is the positive direction of a y axis, and the upward extending direction of the mobile terminal is the positive direction of a z axis;
secondly, acquiring the position, the speed and the posture of a next frame of IMU by utilizing a median integration-based method under a world coordinate system according to the acceleration and angular velocity data of a current frame and a next frame of the IMU and the position, the speed and the posture of the current frame of the IMU;
and thirdly, obtaining the position increment, the speed increment and the attitude increment of the IMU between two adjacent frames according to the acceleration and angular velocity data of the current frame and the next frame of the IMU by utilizing an IMU pre-integration formula based on a median method.
4. The method of claim 1, wherein the step (5c) of calculating the relative pose of the depth camera between the moment the pivot frame image is collected and the moments the grayscale images of the other frames are collected by using the cv::triangulatePoints() and cv::solvePnP() functions comprises the steps of:
firstly, a three-dimensional camera coordinate system is established with the position of the depth camera at the moment the pivot frame grayscale image was collected as the origin, the forward direction of the depth camera as the positive x-axis, the leftward direction as the positive y-axis, and the upward direction as the positive z-axis;
secondly, under the camera coordinate system, the relative pose of the depth camera between the moments the pivot frame and the 11th frame grayscale images were collected is determined by the five-point method from at least 5 pairs of common-viewpoint center pixel coordinates in the pivot frame and 11th frame grayscale images, the relative pose consisting of a relative rotation matrix and a relative displacement vector with a scale factor;
thirdly, the pivot frame grayscale image and the 11th frame grayscale image are triangulated by using the cv::triangulatePoints() function of OpenCV to obtain the position coordinate of each common viewpoint between the two frames;
fourthly, PnP solving is performed in turn for each frame from the (L+1)-th frame to the 10th frame grayscale image by using the cv::solvePnP() function of OpenCV, to obtain the relative pose of the depth camera between the moment the L-th frame grayscale image was collected and the moment each of these frames was collected, wherein the L-th frame grayscale image is the pivot frame grayscale image and the (L+1)-th frame grayscale image is the first grayscale image after the pivot frame;
fifthly, the relative pose of the depth camera between the moments the L-th frame and the (L-1)-th frame grayscale images were collected is obtained by the same method as in the fourth step, and the position coordinate of each common viewpoint between the L-th frame and the (L-1)-th frame grayscale images is obtained by the same method as in the third step, wherein the (L-1)-th frame grayscale image is the first grayscale image before the pivot frame;
and sixthly, the relative pose of the depth camera between the moment the pivot frame grayscale image was collected and the moment each grayscale image before the pivot frame was collected is obtained by the same method as in the fifth step.
5. The depth camera and IMU combination based indoor mobile terminal positioning method of claim 1, wherein the linear equation in step (7) relates, for each pair of adjacent frames k and k+1 in the sliding window, the IMU pre-integrated position increment α and velocity increment β to the unknown camera velocities and the gravity vector, where:
I represents the 3 × 3 identity matrix, and Δt represents the time interval between collecting the two adjacent grayscale images;
R_k^L represents the rotation matrix of the relative pose between the depth camera at the moment the pivot frame grayscale image was collected and at the moment the k-th frame grayscale image was collected;
v_k and v_{k+1} represent the velocity vectors of the depth camera at the moments the k-th and (k+1)-th frame grayscale images were collected;
α represents the position increment obtained by pre-integrating all IMU acceleration data over the interval between the two adjacent grayscale images, and p represents the translational extrinsic parameter between the depth camera and the IMU;
R_k^{k+1} represents the rotation matrix of the relative pose between the depth camera at the moment the (k+1)-th frame grayscale image was collected and at the moment the k-th frame grayscale image was collected;
p_k and p_{k+1} represent the positions of the depth camera at the moments the k-th and (k+1)-th frame grayscale images were collected, and g_L represents the gravity vector of the depth camera at the moment the pivot frame grayscale image was collected;
β represents the velocity increment obtained by pre-integrating all IMU acceleration data over the same interval.
6. The combined depth camera and IMU based indoor mobile terminal positioning method of claim 1, wherein the step of adjusting the pose of the depth camera to the world coordinate system in step (7) is as follows:
firstly, under the camera coordinate system, rotating the gravity vector of the depth camera at the moment the pivot frame grayscale image was collected onto the z-axis of the world coordinate system, with its direction coinciding with the negative z direction, to obtain the rotation matrix from the camera coordinate system to the world coordinate system;
and secondly, adjusting the pose of the depth camera in the sliding window to the world coordinate system with the rotation matrix, to obtain the initial pose of the depth camera in the world coordinate system.
7. The indoor mobile terminal positioning method based on the combination of a depth camera and an IMU according to claim 1, wherein the tightly-coupled nonlinear optimization formula in step (8a) is as follows:

$$\min_{X}\Big\{\sum_{k\in B}\big\|r_{B}^{k}\big\|_{2}+\sum_{(i,j)\in C}\big\|r_{C}^{\,i,j}\big\|_{2}\Big\}$$

wherein $\min\{\cdot\}$ denotes the minimization operation, $X$ denotes all variables to be optimized in the sliding window, $B$ denotes the set of all IMU measurement residual matrices $r_{B}^{k}$ in the sliding window, each entering the cost through its two-norm, $\|\cdot\|_{2}$ denotes the two-norm operation, $R_{w}^{b_k}$ denotes the rotation matrix between the attitude of the IMU when the k-th frame grayscale image is acquired and the coordinate axes of the world coordinate system, $p_{b_k}^{w}$ and $p_{b_{k+1}}^{w}$ denote the positions of the IMU in the world coordinate system when the k-th and (k+1)-th frame grayscale images are acquired, respectively, $v_{b_k}^{w}$ denotes the velocity of the IMU in the world coordinate system when the k-th frame grayscale image is acquired, $g^{w}$ denotes the gravity vector in the world coordinate system, $\gamma_{b_{k+1}}^{b_k}$ denotes the quaternion form of the attitude increment obtained by pre-integrating all IMU angular velocity data over the time interval between the two adjacent grayscale frames, $q_{b_k}^{w}$ and $q_{b_{k+1}}^{w}$ denote the quaternion forms of the attitude of the IMU in the world coordinate system when the k-th and (k+1)-th frame grayscale images are acquired, respectively, $[\cdot]_{e}$ denotes taking the imaginary (vector) part of a quaternion, $v_{b_{k+1}}^{w}$ denotes the velocity of the IMU in the world coordinate system when the (k+1)-th frame grayscale image is acquired, $\beta_{b_{k+1}}^{b_k}$ denotes the velocity increment obtained by pre-integrating all IMU acceleration data over the same time interval, $b_{a_k}$ and $b_{a_{k+1}}$ denote the acceleration biases when the k-th and (k+1)-th frame grayscale images are acquired, respectively, $b_{\omega_k}$ and $b_{\omega_{k+1}}$ denote the angular velocity biases when the k-th and (k+1)-th frame grayscale images are acquired, respectively, $C$ denotes the set of all visual reprojection residual matrices $r_{C}^{\,i,j}$ in the sliding window, each entering the cost through its two-norm, $(X_{i}^{c_j},\,Y_{i}^{c_j},\,Z_{i}^{c_j})$ denote the first, second and third components of the position coordinates of the i-th co-visible feature point relative to the depth camera when the j-th frame grayscale image is acquired, and $(u_{i}^{c_j},\,v_{i}^{c_j})$ denote the first and second components of the pixel coordinates of the same co-visible feature point found in the j-th frame grayscale image by optical-flow tracking.
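To make the structure of the two residual classes concrete, the sketch below evaluates one IMU measurement residual and one visual reprojection residual in the style of standard tightly-coupled visual-inertial optimization (e.g. VINS-Mono). All function names, the quaternion convention [w, x, y, z] and the sign conventions are illustrative assumptions; the exact residual matrices used in the claim may differ:

import numpy as np

def quat_mul(a, b):
    # Hamilton product of two quaternions stored as [w, x, y, z].
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_conj(q):
    # Conjugate (inverse for unit quaternions).
    return np.array([q[0], -q[1], -q[2], -q[3]])

def imu_residual(R_bk_w, p_w_k, p_w_k1, v_w_k, v_w_k1, q_w_k, q_w_k1,
                 g_w, dt, alpha, beta, gamma, ba_k, ba_k1, bw_k, bw_k1):
    # IMU measurement residual between frames k and k+1: position, attitude,
    # velocity and bias terms. Signs assume g_w is the gravity vector of the world frame.
    r_p = R_bk_w @ (p_w_k1 - p_w_k - v_w_k * dt + 0.5 * g_w * dt**2) - alpha
    r_v = R_bk_w @ (v_w_k1 - v_w_k + g_w * dt) - beta
    # Attitude residual: twice the imaginary part of gamma^-1 * (q_w_k)^-1 * q_w_k1.
    q_err = quat_mul(quat_mul(quat_conj(gamma), quat_conj(q_w_k)), q_w_k1)
    r_q = 2.0 * q_err[1:4]
    r_ba = ba_k1 - ba_k          # accelerometer-bias random walk
    r_bw = bw_k1 - bw_k          # gyroscope-bias random walk
    return np.concatenate([r_p, r_q, r_v, r_ba, r_bw])

def reprojection_residual(P_cj, uv_cj, fx, fy, cx, cy):
    # Visual reprojection residual of one co-visible point in frame j:
    # P_cj  = (X, Y, Z) of the point in the camera frame of image j,
    # uv_cj = pixel coordinates of the same point found by optical-flow tracking.
    X, Y, Z = P_cj
    u_pred = fx * X / Z + cx
    v_pred = fy * Y / Z + cy
    return np.array([uv_cj[0] - u_pred, uv_cj[1] - v_pred])

In practice each residual is usually weighted by the information matrix of its measurement before the norms are summed, and the minimization is carried out with a nonlinear least-squares solver over all states in the sliding window.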
CN202110907214.7A 2021-08-09 2021-08-09 Indoor mobile terminal positioning method based on combination of depth camera and IMU Active CN113610001B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110907214.7A CN113610001B (en) 2021-08-09 2021-08-09 Indoor mobile terminal positioning method based on combination of depth camera and IMU

Publications (2)

Publication Number Publication Date
CN113610001A true CN113610001A (en) 2021-11-05
CN113610001B CN113610001B (en) 2024-02-09

Family

ID=78339959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110907214.7A Active CN113610001B (en) 2021-08-09 2021-08-09 Indoor mobile terminal positioning method based on combination of depth camera and IMU

Country Status (1)

Country Link
CN (1) CN113610001B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050086648A1 (en) * 2003-10-15 2005-04-21 Microsoft Corporation Object-based systematic state space exploration of software
US20140368534A1 (en) * 2013-06-18 2014-12-18 Tom G. Salter Concurrent optimal viewing of virtual objects
CN105698765A (en) * 2016-02-22 2016-06-22 天津大学 Method using combination of double IMUs (inertial measurement units) and monocular vision to measure pose of target object under non-inertial system
CN107869989A (en) * 2017-11-06 2018-04-03 东北大学 A kind of localization method and system of the fusion of view-based access control model inertial navigation information
US20180364731A1 (en) * 2017-06-14 2018-12-20 PerceptIn, Inc. Monocular Modes for Autonomous Platform Guidance Systems with Auxiliary Sensors
CN109993113A (en) * 2019-03-29 2019-07-09 东北大学 A kind of position and orientation estimation method based on the fusion of RGB-D and IMU information
CN110044354A (en) * 2019-03-28 2019-07-23 东南大学 A kind of binocular vision indoor positioning and build drawing method and device
CN110375738A (en) * 2019-06-21 2019-10-25 西安电子科技大学 A kind of monocular merging Inertial Measurement Unit is synchronous to be positioned and builds figure pose calculation method
CN111105460A (en) * 2019-12-26 2020-05-05 电子科技大学 RGB-D camera pose estimation method for indoor scene three-dimensional reconstruction
CN111126126A (en) * 2019-10-21 2020-05-08 武汉大学 Intelligent video strip splitting method based on graph convolution neural network
US20200219267A1 (en) * 2017-09-04 2020-07-09 Universität Zürich Visual-inertial odometry with an event camera
CN111462231A (en) * 2020-03-11 2020-07-28 华南理工大学 Positioning method based on RGBD sensor and IMU sensor
CN111795686A (en) * 2020-06-08 2020-10-20 南京大学 Method for positioning and mapping mobile robot
US20210063159A1 (en) * 2018-04-09 2021-03-04 Boe Technology Group Co., Ltd. Positioning Method, Positioning Server and Positioning System
KR20210058686A (en) * 2019-11-14 2021-05-24 삼성전자주식회사 Device and method of implementing simultaneous localization and mapping
CN113223045A (en) * 2021-05-19 2021-08-06 北京数研科技发展有限公司 Vision and IMU sensor fusion positioning system based on dynamic object semantic segmentation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邓文尧 (DENG Wenyao) et al.: "Real-time dynamic target tracking based on ArUco markers and sparse optical flow", 电机机械工程, vol. 37, no. 1, pages 59-64 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114529585A (en) * 2022-02-23 2022-05-24 北京航空航天大学 Mobile equipment autonomous positioning method based on depth vision and inertial measurement

Also Published As

Publication number Publication date
CN113610001B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
US11475626B2 (en) Damage detection from multi-view visual data
CN103886107B (en) Robot localization and map structuring system based on ceiling image information
US9928656B2 (en) Markerless multi-user, multi-object augmented reality on mobile devices
CN107665505B (en) Method and device for realizing augmented reality based on plane detection
CN109308718B (en) Space personnel positioning device and method based on multiple depth cameras
JP2019536170A (en) Virtually extended visual simultaneous localization and mapping system and method
US11783443B2 (en) Extraction of standardized images from a single view or multi-view capture
WO2020154096A1 (en) Damage detection from multi-view visual data
CN112801074B (en) Depth map estimation method based on traffic camera
CN107665508B (en) Method and system for realizing augmented reality
US20210225038A1 (en) Visual object history
CN108519102A (en) A kind of binocular vision speedometer calculation method based on reprojection
CN112818925A (en) Urban building and crown identification method
US20230298344A1 (en) Method and device for determining an environment map by a server using motion and orientation data
WO2023056789A1 (en) Obstacle identification method and system for automatic driving of agricultural machine, device, and storage medium
CN113610001A (en) Indoor mobile terminal positioning method based on depth camera and IMU combination
CN114882106A (en) Pose determination method and device, equipment and medium
CN113807435A (en) Remote sensing image characteristic point elevation acquisition method based on multiple sensors
Yamaguchi Three dimensional measurement using fisheye stereo vision
WO2021146450A1 (en) Mobile multi-camera multi-view capture
CN113624218B (en) Automatic astronomical orientation method based on image processing
CN115170648A (en) Carriage pose determining method and device
CN113822936A (en) Data processing method and device, computer equipment and storage medium
CN117405039A (en) Device and method for measuring pose of mine self-moving tail and crossheading belt conveyor
CN116899202A (en) Golf ball hitting data acquisition system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant