CN113610001B - Indoor mobile terminal positioning method based on combination of depth camera and IMU - Google Patents

Indoor mobile terminal positioning method based on combination of depth camera and IMU

Info

Publication number
CN113610001B
CN113610001B (application CN202110907214.7A)
Authority
CN
China
Prior art keywords
frame
gray level
depth camera
image
level image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110907214.7A
Other languages
Chinese (zh)
Other versions
CN113610001A (en)
Inventor
张军禹
仇原鹰
王宇纬
李静
张剑宇
程培涛
段学超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202110907214.7A priority Critical patent/CN113610001B/en
Publication of CN113610001A publication Critical patent/CN113610001A/en
Application granted granted Critical
Publication of CN113610001B publication Critical patent/CN113610001B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • G01C21/1656 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments with passive imaging devices, e.g. cameras
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The invention discloses an indoor mobile terminal positioning method based on the combination of a depth camera and an IMU, which comprises the following implementation steps: determining the pose of the depth camera in the camera coordinate system at the time each frame of gray level image in a sliding window is acquired, through pure-vision three-dimensional reconstruction and depth image data; determining the initial pose of the depth camera in the world coordinate system through the gravity vector; calculating the state increment through tightly coupled nonlinear optimization and determining the real-time pose of the depth camera; optimizing the pose of the depth camera when a loop is detected; and obtaining the real-time pose of the mobile terminal through the relative pose between the mobile terminal and the depth camera. According to the invention, in an indoor environment without a prior map, real-time positioning is realized only by means of the sensors carried by the mobile terminal itself, which improves the portability of the positioning system and saves the storage space a prior map would occupy.

Description

Indoor mobile terminal positioning method based on combination of depth camera and IMU
Technical Field
The invention belongs to the technical field of communication, and further relates to an indoor mobile terminal positioning method based on the combination of a depth camera and an inertial measurement unit (IMU) in the technical field of simultaneous localization and mapping. The invention can be used for positioning and tracking an autonomous mobile terminal in an indoor environment and displaying the pose and motion trail of the terminal in real time.
Background
Simultaneous localization and mapping (SLAM) refers to a technique in which a mobile terminal obtains real-time pose information only by means of its own sensors. It is widely applied in emerging fields closely related to human life, such as virtual reality, autonomous driving and location-based services. At present, positioning technologies for outdoor mobile terminals (such as GPS positioning) are very mature, and the center of gravity of market demand is gradually shifting to indoor environments where GPS cannot be used normally. Statistics show that the number of indoor autonomous mobile terminals has grown explosively in recent years, so positioning services for indoor environments have a very broad prospect. However, as far as the current state of indoor positioning technology is concerned, the narrow indoor space, the complex surrounding environment, and constraints on the sensors carried by the mobile terminal such as price, volume and quality mean that people's requirements for indoor positioning cannot yet be met, which to a certain extent hinders the further development of indoor positioning technology.
A logo-based binocular vision indoor positioning method is proposed by Harbin Institute of Technology in its patent document "Logo-based binocular vision indoor positioning method" (application number: 2016610546034.X, application publication number: CN 106228538A) for positioning an indoor moving target. The specific process is as follows: first, a visual map database of logo images is established, and the intrinsic and extrinsic parameters of the left and right cameras of the binocular camera are calibrated. Then, logo images are captured with the binocular camera, the visual information features of the logo images are matched with those of the Visual Map database images, correct matching points are retained, mismatched points are removed, and the vertex coordinates are calculated. Finally, the coordinates of the camera in the world coordinate system are obtained, realizing the positioning function. Although this method can realize positioning, it still has the drawback that a visual map database of the surrounding environment must be established before positioning, so an indoor moving target cannot be positioned in an unknown environment.
An indoor positioning method based on WiFi signals is provided in a patent document (application number: 202110274155.4, application publication number: CN 112689237A) filed by Sichuan North Control and Hui Internet of Things Science and Technology Co., Ltd. The specific process is as follows: first, a plurality of reference points are selected indoors, and the WiFi signal strength of each WiFi signal source is collected at each reference point. Then, noise reduction is performed on each signal source to obtain a signal strength characteristic value for each reference point. Finally, the signal strength characteristic value at the current position of the mobile receiving device is compared with that of each reference point, the similarity of the characteristic values is calculated, and the reference point with the highest similarity is taken as the position of the mobile receiving device. Although this method can realize positioning, it still has the drawbacks that multiple auxiliary devices providing WiFi signal sources must be installed indoors, the dependence of the mobile receiving device on external equipment cannot be eliminated, the deployment is complex, and the cost is high.
Disclosure of Invention
The invention aims to solve the problems that an indoor mobile terminal cannot be positioned in the absence of a prior map, and that positioning which relies on external equipment is complicated to deploy and costly.
In the method, a mobile terminal carrying a depth camera and an IMU moves freely in an unknown environment. The poses of the depth camera in the camera coordinate system for all frames in a sliding window are determined through pure-vision three-dimensional reconstruction and depth image data. The gravity vector of the depth camera at the time the pivot frame gray level image is acquired is solved with a linear equation; using this gravity vector, the pose of the depth camera is adjusted to the world coordinate system and the initial pose of the depth camera is determined. The real-time pose of the depth camera is determined by calculating its state increment. When a loop is detected, the poses of the depth camera corresponding to all gray level images between the two looped frames are optimized. The real-time pose of the mobile terminal is then obtained from the relative pose between the mobile terminal and the depth camera, realizing real-time positioning of the mobile terminal and solving the problem that the mobile terminal cannot be positioned without a prior map. According to the invention, data are collected in real time only by the sensors carried by the mobile terminal and transmitted into the mobile terminal, which processes the incoming data and calculates its own real-time pose, thereby solving the problems of complex deployment and high cost caused by the dependence of indoor positioning on external equipment.
The specific steps for achieving the purpose of the invention are as follows:
step 1, transmitting data acquired by a sensor into a mobile terminal:
(1a) Acquiring color images and depth images in real time by using a depth camera arranged on the mobile terminal, wherein each frame of the color image and the depth image is 640×480 pixels in size and the acquisition frequency is 30 frames per second, and transmitting the color images and the depth images into the mobile terminal in time sequence;
(1b) Acquiring 200 frames of acceleration data per second by using an IMU (inertial measurement unit) arranged on the mobile terminal, acquiring 63 frames of angular velocity data per second, and transmitting the acceleration data and the angular velocity data into the mobile terminal according to time sequence;
step 2, performing equalization processing on the image by using the cv::createCLAHE() function of OpenCV;
step 3, selecting stable characteristic points:
(3a) Establishing a two-dimensional pixel coordinate system by taking an upper left corner pixel point of each frame of gray image as a coordinate origin, wherein the pixel coordinate of the origin is (0, 0), the horizontal right direction of the gray image is the positive direction of an x coordinate axis, the vertical downward direction of the gray image is the positive direction of a y coordinate axis, and the pixel coordinate of the upper right corner in the gray image is (639,0);
(3b) Selecting four pixels from each frame of gray level image, with pixel coordinates (20,15) for the upper left pixel, (620,15) for the upper right pixel, (20,465) for the lower left pixel and (620,465) for the lower right pixel, and sequentially connecting the four pixels to obtain the rectangular region to be grid-divided;
(3c) Selecting the two pixel coordinates (20,165) and (20,315) on the connecting line between the upper left pixel point and the lower left pixel point, equally dividing that line into 3 parts; selecting the three pixel coordinates (170,15), (320,15) and (470,15) on the connecting line between the upper left pixel point and the upper right pixel point, equally dividing that line into 4 parts; and taking the selected pixel coordinates as reference points for dividing rectangular grids, uniformly dividing the whole rectangular region into 3×4 grids of the same size;
(3d) Extracting all characteristic points in the first frame gray level image by using the cv::goodFeaturesToTrack() function;
(3e) Determining the position of each characteristic point of the first frame gray level image in the second frame gray level image through optical flow tracking, and sorting the characteristic points by the number of times each has been re-found in the subsequent image frames, from large to small; if characteristic points in the same grid have been re-found the same number of times, sorting them in the order of extraction; taking the first 7 characteristic points in each grid as stable characteristic points, for 7×12 stable characteristic points in total;
(3f) Determining the position of each characteristic point in the gray level image of the previous frame in the gray level image of the next frame by adopting the same method as the step (3 e), determining the sequence of the characteristic points in each grid, and if the condition that the tracking of the characteristic points is lost occurs, extracting new characteristic points from the grid to which the characteristic points belong, and complementing the number of the characteristic points in the grid to 7;
step 4, carrying out integral pretreatment on IMU data:
step 5, determining the relative pose of the depth camera with scale factors in the sliding window:
(5a) Sequentially selecting 11 frames of gray images from the gray images containing the stable characteristic points according to time sequence, establishing a sliding window, and performing pure vision three-dimensional reconstruction on the 11 frames of images in the sliding window;
(5b) Finding out a frame of gray level image from the first 10 frames of gray level images in the sliding window to serve as a pivot frame image, wherein the parallax between the pivot frame image and the 11 th frame of gray level image is larger than 20 pixels, and at least 25 pairs of common view points exist between the two frames of images, wherein the common view points represent characteristic points existing in at least two frames of gray level images, and the parallax represents average pixel differences of all common view points in the two frames of gray level images;
(5c) Calculating the relative pose of the depth camera between the time the pivot frame image is acquired and the times the other frames of gray level images in the sliding window are acquired, by using the cv::triangulatePoints() and cv::solvePnP() functions;
step 6, calculating the average value of the scale factors:
(6a) In the sliding window, finding out all common view points between the pivot frame image and the gray level image of the frame after the pivot frame, and obtaining the distance d_i between the position coordinate of each common view point and the position coordinate of the depth camera when the pivot frame image is acquired;
(6b) Obtaining the depth value D_i of the center pixel of each common view point from the depth image corresponding to the pivot frame image;
(6c) The average value s of the depth camera scale factors within the sliding window is calculated as follows:
s = (1/n) · Σ_{i=1..n} (D_i / d_i)
wherein n represents the total number of common view points between the pivot frame gray level image and the gray level image of the frame after the pivot frame, Σ represents the summation operation, and i represents the serial number of a common view point in the pivot frame gray level image;
step 7, solving a speed vector and a gravity vector of the depth camera:
calculating the velocity vector of the depth camera at the times two adjacent frames of gray level images are acquired in the sliding window and the gravity vector of the depth camera at the time the pivot frame gray level image is acquired by using a linear equation, and adjusting the pose of the depth camera to the world coordinate system;
step 8, calculating state increment of the depth camera:
(8a) Calculating variables to be optimized in the visual re-projection residual error and the IMU measurement residual error through a tightly coupled nonlinear optimization formula, and obtaining the optimal pose of the depth camera when each frame of gray level image is acquired in the sliding window;
(8b) Obtaining state increment of the depth camera by utilizing the optimized pose of the depth camera in the sliding window, and determining real-time pose of the depth camera;
step 9, optimizing the pose of the depth camera:
(9a) Extracting at least 300 image feature points from each frame of gray level image acquired by a depth camera by using a Fast corner detection algorithm, and storing descriptor information of all feature points in each frame of gray level image in a database;
(9b) Calculating the similarity score between the current image frame and each frame of gray level image in the database by using a DBoW2-based loop detection algorithm, and judging whether a loop exists for the current frame according to the similarity score; if the score is greater than or equal to 0.6, executing step (9c), otherwise executing step 10;
(9c) Performing nonlinear optimization on the pose of the depth camera corresponding to each frame of image between two frames of gray level images with loops, outputting the pose of the depth camera after each frame of optimization, and obtaining the pose of the mobile terminal after each frame of optimization by using the relative pose between the mobile terminal and the depth camera;
and 10, outputting the pose of the current depth camera, and obtaining the pose of the current mobile terminal by using the relative pose between the mobile terminal and the depth camera.
Compared with the prior art, the invention has the following advantages:
firstly, the mobile terminal freely moves in an unknown environment, the pose of all depth cameras in a sliding window under a camera coordinate system is obtained through pure vision three-dimensional reconstruction and depth image data, the pose of the depth cameras is adjusted to the world coordinate system through a gravity vector, the initial pose of the depth cameras is determined, the state increment of the depth cameras is obtained through tight coupling nonlinear optimization, the real-time pose of the depth cameras is determined, and the real-time pose of the mobile terminal is determined through the relative pose of the depth cameras and the mobile terminal, so that the defect that a priori map is needed for indoor positioning in the prior art is overcome, and the storage space occupied by a vision map database is saved before positioning.
Secondly, the mobile terminal can calculate the real-time pose of the mobile terminal only by processing the data acquired by the depth camera and the IMU carried by the mobile terminal, so that the defects of complex deployment and high cost of a positioning system caused by the need of installing external equipment indoors in the prior art are overcome.
Drawings
FIG. 1 is a flow chart of the present invention;
fig. 2 is a schematic diagram of the extraction of stable feature points according to the present invention.
Detailed description of the preferred embodiments
The invention is further described below with reference to the drawings and examples.
The steps of implementing the invention are further described with reference to fig. 1.
And step 1, transmitting data acquired by the sensor into the mobile terminal.
And acquiring color images and depth images in real time by using a depth camera arranged on the mobile terminal, wherein each frame of the color image and the depth image is 640×480 pixels in size and the acquisition frequency is 30 frames per second, and transmitting the color images and the depth images into the mobile terminal in time sequence.
And acquiring 200 frames of acceleration data per second by using an IMU arranged on the mobile terminal, acquiring 63 frames of angular velocity data per second, and transmitting the acceleration data and the angular velocity data into the mobile terminal in time sequence.
And step 2, performing equalization processing on the image by using the cv::createCLAHE() function of OpenCV.
The equalization processing means that each frame of color image is first converted into a gray image format that OpenCV can process, and then each frame of gray image is equalized with the cv::createCLAHE() function.
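For concreteness, a minimal Python/OpenCV sketch of this step is given below; the clip limit and tile grid size are illustrative choices and not values specified by the patent, which only names the createCLAHE() function.

```python
import cv2

def equalize_frame(color_bgr):
    """Convert one color frame to grayscale and apply CLAHE equalization (step 2)."""
    gray = cv2.cvtColor(color_bgr, cv2.COLOR_BGR2GRAY)
    # clipLimit and tileGridSize are illustrative defaults, not values from the patent.
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
    return clahe.apply(gray)
```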
And 3, selecting stable characteristic points.
Step 1, a two-dimensional pixel coordinate system is established by taking the upper left corner pixel point of each frame of gray level image as the origin of coordinates, the pixel coordinate of the origin A is (0, 0), the horizontal right direction of the gray level image is the positive direction of the x coordinate axis, the vertical downward direction of the gray level image is the positive direction of the y coordinate axis, the pixel coordinate of the upper right corner pixel point B in the gray level image is (639,0), and the pixel coordinate of the lower left corner pixel point C in the gray level image is (0,479).
And 2, selecting four pixel points from each frame of gray level image, with the pixel coordinates of the upper left pixel point D being (20,15), the pixel coordinates of the upper right pixel point E being (620,15), the pixel coordinates of the lower left pixel point F being (20,465) and the pixel coordinates of the lower right pixel point G being (620,465), and sequentially connecting the four pixel points to obtain the rectangular region to be grid-divided.
And 3, selecting the two pixel points K (20,165) and L (20,315) on the connecting line between the upper left pixel point D and the lower left pixel point F, equally dividing that line into 3 parts; selecting the three pixel points H (170,15), I (320,15) and J (470,15) on the connecting line between the upper left pixel point D and the upper right pixel point E, equally dividing that line into 4 parts; and taking the selected pixel points as reference points for dividing rectangular grids, uniformly dividing the whole rectangular region into 3×4 grids of the same size.
And 4, extracting all characteristic points in the first frame gray level image by using the cv::goodFeaturesToTrack() function.
And 5, determining the position of each characteristic point of the first frame gray level image in the second frame gray level image through optical flow tracking, and sorting the characteristic points by the number of times each has been re-found in the subsequent image frames, from large to small; if characteristic points in the same grid have been re-found the same number of times, sorting them in the order of extraction; the first 7 characteristic points in each grid are taken as stable characteristic points, giving 7×12 stable characteristic points in total.
And step 6, determining the position of each characteristic point in the gray level image of the previous frame in the gray level image of the next frame by adopting the same method as the step 5, determining the sequence of the characteristic points in each grid, and if the condition that the tracking of the characteristic points is lost occurs, extracting new characteristic points from the grid to which the characteristic points belong, and complementing the number of the characteristic points in the grid to 7.
In addition, the circular region around each retained feature point must not contain the center pixel of any other feature point; the radius of this circular region is 10 pixels.
The selection of stable feature points according to the present invention will be further described with reference to fig. 2:
in fig. 2, feature point 1, feature point 2, feature point 3, feature point 4, feature point 5, feature point 6, and feature point 7 are the top 7 feature points in the same grid, and are stable feature points.
As with feature point 5 and feature point 6 in fig. 2, the center pixel of neither point lies in the circular region of the other, and both rank near the front, so both are retained.
Feature point 8 in fig. 2 ranks 8th in its grid; because its rank is too low, it is removed.
For feature point 7 and feature point 9 in fig. 2, the center pixel of feature point 9 lies in the circular region of feature point 7 and feature point 9 ranks behind it, so feature point 9 is removed and feature point 7 is retained.
The center pixel of feature point 10 in fig. 2 is not within the grid region, so feature point 10 is eliminated.
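The grid-based selection described above can be sketched in Python as follows; the grid geometry (a 3×4 grid over the rectangle (20,15)–(620,465), 7 points per cell, a 10-pixel minimum spacing) follows the description, while the data layout and the stable-sort tie-breaking are illustrative assumptions.

```python
import numpy as np

GRID_X0, GRID_Y0, GRID_X1, GRID_Y1 = 20, 15, 620, 465  # rectangle from step (3b)
ROWS, COLS, PER_CELL, MIN_DIST = 3, 4, 7, 10            # 3x4 grid, 7 points per cell, 10 px spacing

def select_stable_features(points, track_counts):
    """points: list of (x, y) pixel coords; track_counts: how many frames each point was re-found in."""
    cell_w = (GRID_X1 - GRID_X0) / COLS
    cell_h = (GRID_Y1 - GRID_Y0) / ROWS
    kept = []
    counts = np.zeros((ROWS, COLS), dtype=int)
    # Visit points from most-tracked to least-tracked; ties keep the original extraction order.
    for idx in np.argsort(-np.asarray(track_counts), kind="stable"):
        x, y = points[idx]
        if not (GRID_X0 <= x < GRID_X1 and GRID_Y0 <= y < GRID_Y1):
            continue  # outside the rectangle, like feature point 10 in Fig. 2
        r, c = int((y - GRID_Y0) // cell_h), int((x - GRID_X0) // cell_w)
        if counts[r, c] >= PER_CELL:
            continue  # this grid cell already holds 7 stable points, like feature point 8
        if any((x - kx) ** 2 + (y - ky) ** 2 < MIN_DIST ** 2 for kx, ky in kept):
            continue  # too close to an already-kept point, like feature point 9
        kept.append((x, y))
        counts[r, c] += 1
    return kept
```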
And 4, performing integral preprocessing on the IMU data.
And taking the initial position on the mobile terminal as an original point, taking the forward extending direction of the mobile terminal as an x-axis positive direction, taking the leftward extending direction of the mobile terminal as a y-axis positive direction, and taking the upward extending direction of the mobile terminal as a z-axis positive direction to establish a three-dimensional world coordinate system.
And under the world coordinate system, obtaining the position, velocity and attitude of the IMU at the next frame from the acceleration and angular velocity data of the current and next IMU frames and the position, velocity and attitude of the IMU at the current frame, by using the median integration method.
And obtaining the position increment, the speed increment and the attitude increment of the IMU between two adjacent frames according to the acceleration and angular velocity data of the current frame and the next frame of the IMU by using an IMU pre-integration formula based on a median method.
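As a hedged illustration of the median-method propagation, the sketch below averages consecutive accelerometer and gyroscope samples to advance position, velocity and attitude over one IMU interval; bias terms are omitted and a small-angle quaternion update is used, so it is a simplification of the pre-integration actually used, and all names are illustrative.

```python
import numpy as np

def quat_mult(q, r):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w0, x0, y0, z0 = q
    w1, x1, y1, z1 = r
    return np.array([
        w0*w1 - x0*x1 - y0*y1 - z0*z1,
        w0*x1 + x0*w1 + y0*z1 - z0*y1,
        w0*y1 - x0*z1 + y0*w1 + z0*x1,
        w0*z1 + x0*y1 - y0*x1 + z0*w1,
    ])

def quat_to_rot(q):
    """Rotation matrix (body to world) of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def median_propagate(p, v, q, acc0, gyr0, acc1, gyr1, dt, g=np.array([0.0, 0.0, -9.81])):
    """Propagate position p, velocity v and attitude quaternion q over one IMU interval
    using the averages of consecutive accelerometer and gyroscope samples (median method)."""
    gyr_mid = 0.5 * (gyr0 + gyr1)
    # Small-angle quaternion update with the averaged body-frame angular rate.
    dq = np.concatenate(([1.0], 0.5 * gyr_mid * dt))
    q_new = quat_mult(q, dq)
    q_new /= np.linalg.norm(q_new)
    # Average the world-frame accelerations at the two endpoints of the interval.
    a0 = quat_to_rot(q) @ acc0 + g
    a1 = quat_to_rot(q_new) @ acc1 + g
    a_mid = 0.5 * (a0 + a1)
    p_new = p + v * dt + 0.5 * a_mid * dt * dt
    v_new = v + a_mid * dt
    return p_new, v_new, q_new
```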
And 5, determining the relative pose of the depth camera with the scale factor in the sliding window.
Step 1, sequentially selecting 11 frames of gray images from the gray images containing stable characteristic points according to time sequence, establishing a sliding window, and performing pure vision three-dimensional reconstruction on the 11 frames of images in the sliding window.
And 2, finding out a frame of gray level image from the first 10 frames of gray level images in the sliding window to serve as a pivot frame image, wherein the parallax between the pivot frame image and the 11 th frame of gray level image is larger than 20 pixels, and at least 25 pairs of common view points exist between the two frames of images, wherein the common view points represent characteristic points existing in at least two frames of gray level images, and the parallax represents average pixel differences of all common view points in the two frames of gray level images.
And 3, taking the position of the depth camera when the gray level image of the pivot frame is acquired as an origin, taking the forward extending direction of the depth camera as the positive x-axis direction, taking the leftward extending direction of the depth camera as the positive y-axis direction, and establishing a three-dimensional camera coordinate system taking the upward extending direction of the depth camera as the positive z-axis direction.
And 4, in the camera coordinate system, determining the relative pose of the depth camera between the time the pivot frame gray level image is acquired and the time the 11th frame gray level image is acquired, according to the pixel coordinates of at least 5 pairs of common view point centers in the two images, by using the five-point method; the relative pose consists of a relative rotation matrix and a relative displacement vector containing a scale factor.
And step 5, performing triangulation on the pivot frame gray level image and the 11th frame gray level image by using the cv::triangulatePoints() function of OpenCV to obtain the position coordinate of each common view point between the two frames of images.
And 6, using the cv::solvePnP() function of OpenCV, sequentially performing PnP solving on each frame from the (L+1)-th frame to the 10th frame gray level image to obtain the relative pose of the depth camera between the time the L-th frame gray level image is acquired and the times the (L+1)-th to 10th frame gray level images are acquired, wherein the L-th frame gray level image is the pivot frame gray level image and the (L+1)-th frame gray level image is the first frame gray level image after the pivot frame.
And 7, obtaining the relative pose of the depth camera between the times the L-th frame and the (L-1)-th frame gray level images are acquired by the same method as step 6, and obtaining the position coordinate of each common view point between the L-th frame and the (L-1)-th frame gray level images by the same method as step 5, wherein the (L-1)-th frame gray level image is the first frame gray level image before the pivot frame.
And 8, obtaining the relative pose of the depth camera between the time the pivot frame gray level image is acquired and the time each frame of gray level image before the pivot frame is acquired, by the same method as step 7.
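The triangulation/PnP chain of this step can be sketched as below, assuming the camera intrinsic matrix K, the pivot-to-11th-frame relative pose from the five-point step, and matched pixel coordinates are already available; all variable names are illustrative, not taken from the patent.

```python
import cv2
import numpy as np

def triangulate_and_pnp(K, R_rel, t_rel, pts_pivot, pts_11th, pts_next):
    """Triangulate common view points between the pivot frame and the 11th frame,
    then recover a further frame's pose w.r.t. the pivot frame with PnP."""
    pts_pivot = np.asarray(pts_pivot, dtype=np.float64)
    pts_11th = np.asarray(pts_11th, dtype=np.float64)
    # Projection matrices: the pivot frame is the origin of the camera coordinate system.
    P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P1 = K @ np.hstack([R_rel, t_rel.reshape(3, 1)])
    pts4d = cv2.triangulatePoints(P0, P1, pts_pivot.T, pts_11th.T)
    pts3d = (pts4d[:3] / pts4d[3]).T  # homogeneous -> Euclidean, in the pivot camera frame
    # The same 3D points re-observed in another frame give that frame's pose via PnP.
    ok, rvec, tvec = cv2.solvePnP(pts3d, np.asarray(pts_next, dtype=np.float64), K, None)
    R_next, _ = cv2.Rodrigues(rvec)
    return pts3d, R_next, tvec
```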
And 6, calculating the average value of the scale factors.
In the sliding window, finding out all common view points between the pivot frame image and the gray level image of the frame after the pivot frame, and obtaining the distance d_i between the position coordinate of each common view point and the position coordinate of the depth camera when the pivot frame image is acquired.
Obtaining the depth value D_i of the center pixel of each common view point from the depth image corresponding to the pivot frame image.
The average value s of the depth camera scale factors within the sliding window is calculated as follows:
s = (1/n) · Σ_{i=1..n} (D_i / d_i)
wherein n represents the total number of common view points between the pivot frame gray level image and the gray level image of the frame after the pivot frame, Σ represents the summation operation, and i represents the serial number of a common view point in the pivot frame gray level image.
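Under the assumption used in the reconstructed formula above (a per-point scale factor of D_i/d_i, i.e. metric depth over unscaled visual distance), the averaging reduces to a few lines:

```python
import numpy as np

def scale_factor(d, D):
    """d: unscaled visual distances d_i; D: metric depths D_i from the depth image."""
    d, D = np.asarray(d, dtype=float), np.asarray(D, dtype=float)
    return float(np.mean(D / d))
```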
And 7, solving a speed vector and a gravity vector of the depth camera.
The rotation and translation parameters between the depth camera and the IMU are calibrated by a Kalibr toolbox.
And calculating the velocity vector of the depth camera at the times two adjacent frames of gray level images are acquired in the sliding window and the gravity vector of the depth camera at the time the pivot frame gray level image is acquired by using a linear equation, and adjusting the pose of the depth camera to the world coordinate system.
The linear equation is as follows:
wherein I represents the 3×3 identity matrix; Δt represents the time interval between the acquisition of two adjacent frames of gray level images; one coefficient is the rotation matrix of the relative pose between the depth camera at the time the pivot frame gray level image is acquired and the depth camera at the time the k-th frame gray level image is acquired; v_k and v_{k+1} represent the velocity vectors of the depth camera at the times the k-th and (k+1)-th frame gray level images are acquired; the position increment term is obtained by pre-integrating all IMU acceleration data over the interval between the two adjacent frames of gray level images; p represents the translation extrinsic parameter between the depth camera and the IMU; another coefficient is the rotation matrix of the relative pose between the depth camera at the (k+1)-th frame and the depth camera at the k-th frame; p_{k+1} and p_k represent the positions of the depth camera at the times the (k+1)-th and k-th frame gray level images are acquired; g_L represents the gravity vector of the depth camera at the time the pivot frame gray level image is acquired; and the velocity increment term is obtained by pre-integrating all IMU acceleration data over the interval between the two adjacent frames of gray level images.
The step of adjusting the pose of the depth camera to the world coordinate system is as follows:
and under the camera coordinate system, rotating the gravity vector of the depth camera to the z-axis in the world coordinate system when acquiring the gray level image of the pivot frame, wherein the direction of the gravity vector is consistent with the negative direction of the z-axis, and obtaining a rotation matrix from the camera coordinate system to the world coordinate system.
And adjusting the pose of the depth camera in the sliding window to the world coordinate system by the rotation matrix to obtain the initial pose of the depth camera in the world coordinate system.
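One standard way to realize this alignment is the axis-angle (Rodrigues) construction below, which maps the gravity direction estimated in the pivot-frame camera coordinate system onto the world −z axis; the patent does not prescribe this particular construction, so it is an illustrative sketch.

```python
import numpy as np

def gravity_alignment_rotation(g_L):
    """Rotation matrix that maps the gravity vector estimated in the pivot camera frame
    onto the world -z axis (gravity direction in the world coordinate system)."""
    a = g_L / np.linalg.norm(g_L)            # measured gravity direction
    b = np.array([0.0, 0.0, -1.0])           # world gravity points along -z
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.isclose(c, 1.0):                   # already aligned
        return np.eye(3)
    if np.isclose(c, -1.0):                  # exactly opposite: rotate 180 degrees about x
        return np.diag([1.0, -1.0, -1.0])
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    # Rodrigues' formula for the rotation taking direction a onto direction b.
    return np.eye(3) + vx + vx @ vx * (1.0 / (1.0 + c))
```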
Step 8, calculating state increment of the depth camera:
calculating variables to be optimized in the visual re-projection residual error and the IMU measurement residual error through a tightly coupled nonlinear optimization formula, and obtaining the optimal pose of the depth camera when each frame of gray level image is acquired in the sliding window;
obtaining state increment of the depth camera by utilizing the optimized pose of the depth camera in the sliding window, and determining real-time pose of the depth camera;
the nonlinear optimization formula of the tight coupling is as follows:
wherein min{·} denotes taking the minimum value; X denotes all variables to be optimized in the sliding window; B denotes the set of IMU measurement residuals in the sliding window whose squared two-norms ||·||_2 are summed; the residual terms involve the rotation matrix between the attitude of the IMU at the time the k-th frame gray level image is acquired and the axes of the world coordinate system; the positions of the IMU in the world coordinate system at the times the k-th and (k+1)-th frame gray level images are acquired; the velocities of the IMU in the world coordinate system at the times the k-th and (k+1)-th frame gray level images are acquired; the gravity vector g_w in the world coordinate system; the quaternion form of the attitude increment obtained by pre-integration over the interval between two adjacent frames of gray level images and the quaternion forms of the IMU attitude in the world coordinate system at the k-th and (k+1)-th frames, with [·] denoting the operation of taking the imaginary part of a quaternion; the velocity increment obtained by pre-integrating all IMU acceleration data over the interval between two adjacent frames of gray level images; and the acceleration biases and angular velocity biases at the times the k-th and (k+1)-th frame gray level images are acquired; C denotes the set of visual re-projection residuals in the sliding window whose squared two-norms are summed, built from the three components of the position coordinate of the l-th common view point relative to the depth camera at the time the j-th frame gray level image is acquired and the two components of the pixel coordinate at which the l-th common view point is found in the j-th frame gray level image by optical flow tracking.
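Written in the usual visual-inertial notation, the tightly coupled cost has the structure below, i.e. a sum of squared IMU pre-integration residuals over consecutive frame pairs B and squared visual re-projection residuals over feature observations C; this is a hedged reconstruction of the structure implied by the definitions above, with any marginalization prior omitted since none is mentioned.

```latex
\min_{\mathcal{X}} \left\{
  \sum_{k \in \mathcal{B}} \left\| r_{\mathcal{B}}\big(\hat{z}^{\,k}_{k+1}, \mathcal{X}\big) \right\|_2^{2}
  + \sum_{(l,j) \in \mathcal{C}} \left\| r_{\mathcal{C}}\big(\hat{z}^{\,l}_{j}, \mathcal{X}\big) \right\|_2^{2}
\right\}
```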
And 9, optimizing the pose of the depth camera.
Step 1, extracting at least 300 image feature points from each frame of gray level image acquired by a depth camera by using a Fast corner detection algorithm, and storing descriptor information of all feature points in each frame of gray level image in a database.
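A per-frame database entry for this step can be sketched as follows; the FAST threshold and the use of ORB to compute the descriptors are assumptions (the patent only requires FAST corners plus "descriptor information"), and DBoW2 itself, being a separate C++ library, is not shown.

```python
import cv2

# The FAST threshold and the ORB descriptor choice are illustrative assumptions.
fast = cv2.FastFeatureDetector_create(threshold=20)
orb = cv2.ORB_create()

def describe_frame(gray):
    """Detect FAST corners and compute descriptors to be stored in the loop-closure database."""
    keypoints = fast.detect(gray, None)
    keypoints, descriptors = orb.compute(gray, keypoints)
    return keypoints, descriptors
```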
Step 2, calculating the similarity score between the current image frame and each frame of gray level image in the database by using a DBoW2-based loop detection algorithm, and judging whether a loop exists for the current frame according to the similarity score; if the score is greater than or equal to 0.6, executing step 3 of this step, otherwise executing step 10.
And 3, carrying out nonlinear optimization on the poses of the depth camera corresponding to all gray level images between the two frames of images in which the loop is detected, outputting all the optimized poses of the depth camera, and obtaining all the optimized poses of the mobile terminal by utilizing the relative pose between the mobile terminal and the depth camera.
And 10, outputting the pose of the current depth camera, and obtaining the pose of the current mobile terminal by using the relative pose between the mobile terminal and the depth camera.

Claims (7)

1. The indoor mobile terminal positioning method based on the combination of the depth camera and the IMU is characterized in that the mobile terminal respectively collects data by using the depth camera and the IMU, extracts characteristic points in each gray level image by using a specific rule, and solves scale factors required by camera pose initialization by using the depth image; the positioning method comprises the following steps:
step 1, transmitting data acquired by a sensor into a mobile terminal:
(1a) Acquiring color images and depth images in real time by using a depth camera arranged on the mobile terminal, wherein each frame of the color image and the depth image is 640×480 pixels in size and the acquisition frequency is 30 frames per second, and transmitting the color images and the depth images into the mobile terminal in time sequence;
(1b) Acquiring 200 frames of acceleration data per second by using an IMU (inertial measurement unit) arranged on the mobile terminal, acquiring 63 frames of angular velocity data per second, and transmitting the acceleration data and the angular velocity data into the mobile terminal according to time sequence;
step 2, performing equalization processing on the image by using the cv::createCLAHE() function of OpenCV;
step 3, selecting stable characteristic points:
(3a) Establishing a two-dimensional pixel coordinate system by taking an upper left corner pixel point of each frame of gray image as a coordinate origin, wherein the pixel coordinate of the origin is (0, 0), the horizontal right direction of the gray image is the positive direction of an x coordinate axis, the vertical downward direction of the gray image is the positive direction of a y coordinate axis, and the pixel coordinate of the upper right corner in the gray image is (639,0);
(3b) Selecting four pixels from each frame of gray level image, with pixel coordinates (20,15) for the upper left pixel, (620,15) for the upper right pixel, (20,465) for the lower left pixel and (620,465) for the lower right pixel, and sequentially connecting the four pixels to obtain the rectangular region to be grid-divided;
(3c) Selecting the two pixel coordinates (20,165) and (20,315) on the connecting line between the upper left pixel point and the lower left pixel point, equally dividing that line into 3 parts; selecting the three pixel coordinates (170,15), (320,15) and (470,15) on the connecting line between the upper left pixel point and the upper right pixel point, equally dividing that line into 4 parts; and taking the selected pixel coordinates as reference points for dividing rectangular grids, uniformly dividing the whole rectangular region into 3×4 grids of the same size;
(3d) Extracting all characteristic points in the first frame gray level image by using the cv::goodFeaturesToTrack() function;
(3e) Determining the position of each characteristic point of the first frame gray level image in the second frame gray level image through optical flow tracking, and sorting the characteristic points by the number of times each has been re-found in the subsequent image frames, from large to small; if characteristic points in the same grid have been re-found the same number of times, sorting them in the order of extraction; taking the first 7 characteristic points in each grid as stable characteristic points, for 7×12 stable characteristic points in total;
(3f) Determining the position of each characteristic point in the gray level image of the previous frame in the gray level image of the next frame by adopting the same method as the step (3 e), determining the sequence of the characteristic points in each grid, and if the condition that the tracking of the characteristic points is lost occurs, extracting new characteristic points from the grid to which the characteristic points belong, and complementing the number of the characteristic points in the grid to 7;
step 4, carrying out integral pretreatment on IMU data:
step 5, determining the relative pose of the depth camera with scale factors in the sliding window:
(5a) Sequentially selecting 11 frames of gray images from the gray images containing the stable characteristic points according to time sequence, establishing a sliding window, and performing pure vision three-dimensional reconstruction on the 11 frames of images in the sliding window;
(5b) Finding out a frame of gray level image from the first 10 frames of gray level images in the sliding window to serve as a pivot frame image, wherein the parallax between the pivot frame image and the 11 th frame of gray level image is larger than 20 pixels, and at least 25 pairs of common view points exist between the two frames of images, wherein the common view points represent characteristic points existing in at least two frames of gray level images, and the parallax represents average pixel differences of all common view points in the two frames of gray level images;
(5c) Calculating the relative pose of the depth camera between the time the pivot frame image is acquired and the times the other frames of gray level images in the sliding window are acquired, by using the cv::triangulatePoints() and cv::solvePnP() functions;
step 6, calculating the average value of the scale factors:
(6a) In the sliding window, finding out all common view points between the pivot frame image and the gray level image of the frame after the pivot frame, and obtaining the distance d_i between the position coordinate of each common view point and the position coordinate of the depth camera when the pivot frame image is acquired;
(6b) Obtaining the depth value D_i of the center pixel of each common view point from the depth image corresponding to the pivot frame image;
(6c) The average value s of the depth camera scale factors within the sliding window is calculated as follows:
s = (1/n) · Σ_{i=1..n} (D_i / d_i)
wherein n represents the total number of common view points between the pivot frame gray level image and the gray level image of the frame after the pivot frame, Σ represents the summation operation, and i represents the serial number of a common view point in the pivot frame gray level image;
step 7, solving a speed vector and a gravity vector of the depth camera:
calculating the velocity vector of the depth camera at the times two adjacent frames of gray level images are acquired in the sliding window and the gravity vector of the depth camera at the time the pivot frame gray level image is acquired by using a linear equation, and adjusting the pose of the depth camera to the world coordinate system;
step 8, calculating state increment of the depth camera:
(8a) Calculating variables to be optimized in the visual re-projection residual error and the IMU measurement residual error through a tightly coupled nonlinear optimization formula, and obtaining the optimal pose of the depth camera when each frame of gray level image is acquired in the sliding window;
(8b) Obtaining state increment of the depth camera by utilizing the optimized pose of the depth camera in the sliding window, and determining real-time pose of the depth camera;
step 9, optimizing the pose of the depth camera:
(9a) Extracting at least 300 image feature points from each frame of gray level image acquired by a depth camera by using a Fast corner detection algorithm, and storing descriptor information of all feature points in each frame of gray level image in a database;
(9b) Calculating the similarity score between the current image frame and each frame of gray level image in the database by using a DBoW2-based loop detection algorithm, and judging whether a loop exists for the current frame according to the similarity score; if the score is greater than or equal to 0.6, executing step (9c), otherwise executing step 10;
(9c) Performing nonlinear optimization on the pose of the depth camera corresponding to each frame of image between two frames of gray level images with loops, outputting the pose of the depth camera after each frame of optimization, and obtaining the pose of the mobile terminal after each frame of optimization by using the relative pose between the mobile terminal and the depth camera;
and 10, outputting the pose of the current depth camera, and obtaining the pose of the current mobile terminal by using the relative pose between the mobile terminal and the depth camera.
2. The indoor mobile terminal positioning method based on the combination of the depth camera and the IMU according to claim 1, wherein the equalization processing in step (2) means that each frame of color image is first converted into a gray image format that OpenCV can process, and then each frame of gray image is equalized with the cv::createCLAHE() function.
3. The method for positioning an indoor mobile terminal based on a combination of a depth camera and an IMU according to claim 1, wherein the step of performing integral preprocessing on IMU data in step (4) is as follows:
the method comprises the steps that a three-dimensional world coordinate system is built by taking an initial position on a mobile terminal as an origin, taking a forward extending direction of the mobile terminal as an x-axis positive direction, taking a leftward extending direction of the mobile terminal as a y-axis positive direction, and taking an upward extending direction of the mobile terminal as a z-axis positive direction;
secondly, under the world coordinate system, obtaining the position, velocity and attitude of the IMU at the next frame from the acceleration and angular velocity data of the current and next IMU frames and the position, velocity and attitude of the IMU at the current frame, by using the median integration method;
and thirdly, obtaining the position increment, the speed increment and the attitude increment of the IMU between two adjacent frames according to the acceleration and angular velocity data of the current frame and the next frame of the IMU by utilizing an IMU pre-integration formula based on a median method.
4. The method for positioning an indoor mobile terminal based on a combination of a depth camera and an IMU according to claim 1, wherein the step of calculating, in step (5c), the relative pose of the depth camera between the time the pivot frame image is acquired and the times the other frames of gray level images in the sliding window are acquired by using the cv::triangulatePoints() and cv::solvePnP() functions is as follows:
the method comprises the steps that firstly, a three-dimensional camera coordinate system is established by taking the position of the depth camera at the time the pivot frame gray level image is acquired as the origin, the forward extending direction of the depth camera as the positive x-axis direction, the leftward extending direction of the depth camera as the positive y-axis direction, and the upward extending direction of the depth camera as the positive z-axis direction;
secondly, in the camera coordinate system, determining the relative pose of the depth camera between the time the pivot frame gray level image is acquired and the time the 11th frame gray level image is acquired, according to the pixel coordinates of at least 5 pairs of common view point centers in the two images, by using the five-point method, wherein the relative pose consists of a relative rotation matrix and a relative displacement vector containing a scale factor;
thirdly, triangulating the pivot frame gray level image and the 11th frame gray level image by using the cv::triangulatePoints() function of OpenCV to obtain the position coordinate of each common view point between the two frames of images;
fourthly, using the cv::solvePnP() function of OpenCV, sequentially performing PnP solving on each frame from the (L+1)-th frame to the 10th frame gray level image to obtain the relative pose of the depth camera between the time the L-th frame gray level image is acquired and the times the (L+1)-th to 10th frame gray level images are acquired, wherein the L-th frame gray level image is the pivot frame gray level image and the (L+1)-th frame gray level image is the first frame gray level image after the pivot frame;
fifthly, obtaining the relative pose of the depth camera between the times the L-th frame and the (L-1)-th frame gray level images are acquired by the same method as the fourth step, and obtaining the position coordinate of each common view point between the L-th frame and the (L-1)-th frame gray level images by the same method as the third step, wherein the (L-1)-th frame gray level image is the first frame gray level image before the pivot frame;
and sixthly, obtaining the relative pose of the depth camera between the time the pivot frame gray level image is acquired and the time each frame of gray level image before the pivot frame is acquired, by the same method as the fifth step.
5. The method for locating an indoor mobile terminal based on a combination of a depth camera and an IMU according to claim 1, wherein the linear equation in step (7) is as follows:
wherein I represents the 3×3 identity matrix; Δt represents the time interval between the acquisition of two adjacent frames of gray level images; one coefficient is the rotation matrix of the relative pose between the depth camera at the time the pivot frame gray level image is acquired and the depth camera at the time the k-th frame gray level image is acquired; v_k and v_{k+1} represent the velocity vectors of the depth camera at the times the k-th and (k+1)-th frame gray level images are acquired; the position increment term is obtained by pre-integrating all IMU acceleration data over the interval between the two adjacent frames of gray level images; p represents the translation extrinsic parameter between the depth camera and the IMU; another coefficient is the rotation matrix of the relative pose between the depth camera at the (k+1)-th frame and the depth camera at the k-th frame; p_{k+1} and p_k represent the positions of the depth camera at the times the (k+1)-th and k-th frame gray level images are acquired; g_L represents the gravity vector of the depth camera at the time the pivot frame gray level image is acquired; and the velocity increment term is obtained by pre-integrating all IMU acceleration data over the interval between the two adjacent frames of gray level images.
6. The method for positioning an indoor mobile terminal based on a combination of a depth camera and an IMU according to claim 1, wherein the step of adjusting the pose of the depth camera to the world coordinate system in step (7) is as follows:
the method comprises the steps that firstly, under a camera coordinate system, a gravity vector of a depth camera is rotated to a z-axis in a world coordinate system when a gray level image of a pivot frame is acquired, and the direction of the gravity vector is consistent with the negative direction of the z-axis, so that a rotation matrix from the camera coordinate system to the world coordinate system is obtained;
and secondly, adjusting the pose of the depth camera in the sliding window to the world coordinate system by the rotation matrix to obtain the initial pose of the depth camera in the world coordinate system.
7. The method for locating an indoor mobile terminal based on a combination of a depth camera and an IMU according to claim 1, wherein the tightly coupled nonlinear optimization formula in step (8 a) is as follows:
wherein min{·} denotes taking the minimum value; X denotes all variables to be optimized in the sliding window; B denotes the set of IMU measurement residuals in the sliding window whose squared two-norms ||·||_2 are summed; the residual terms involve the rotation matrix between the attitude of the IMU at the time the k-th frame gray level image is acquired and the axes of the world coordinate system; the positions of the IMU in the world coordinate system at the times the k-th and (k+1)-th frame gray level images are acquired; the velocities of the IMU in the world coordinate system at the times the k-th and (k+1)-th frame gray level images are acquired; the gravity vector g_w in the world coordinate system; the quaternion form of the attitude increment obtained by pre-integration over the interval between two adjacent frames of gray level images and the quaternion forms of the IMU attitude in the world coordinate system at the k-th and (k+1)-th frames, with [·] denoting the operation of taking the imaginary part of a quaternion; the velocity increment obtained by pre-integrating all IMU acceleration data over the interval between two adjacent frames of gray level images; and the acceleration biases and angular velocity biases at the times the k-th and (k+1)-th frame gray level images are acquired; C denotes the set of visual re-projection residuals in the sliding window whose squared two-norms are summed, built from the three components of the position coordinate of the l-th common view point relative to the depth camera at the time the j-th frame gray level image is acquired and the two components of the pixel coordinate at which the l-th common view point is found in the j-th frame gray level image by optical flow tracking.
CN202110907214.7A 2021-08-09 2021-08-09 Indoor mobile terminal positioning method based on combination of depth camera and IMU Active CN113610001B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110907214.7A CN113610001B (en) 2021-08-09 2021-08-09 Indoor mobile terminal positioning method based on combination of depth camera and IMU

Publications (2)

Publication Number Publication Date
CN113610001A CN113610001A (en) 2021-11-05
CN113610001B true CN113610001B (en) 2024-02-09

Family

ID=78339959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110907214.7A Active CN113610001B (en) 2021-08-09 2021-08-09 Indoor mobile terminal positioning method based on combination of depth camera and IMU

Country Status (1)

Country Link
CN (1) CN113610001B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114529585A (en) * 2022-02-23 2022-05-24 北京航空航天大学 Mobile equipment autonomous positioning method based on depth vision and inertial measurement

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105698765A (en) * 2016-02-22 2016-06-22 天津大学 Method using combination of double IMUs (inertial measurement units) and monocular vision to measure pose of target object under non-inertial system
CN107869989A (en) * 2017-11-06 2018-04-03 Localization method and system based on the fusion of visual and inertial navigation information
CN109993113A (en) * 2019-03-29 2019-07-09 Position and orientation estimation method based on the fusion of RGB-D and IMU information
CN110044354A (en) * 2019-03-28 2019-07-23 Binocular vision indoor positioning and mapping method and device
CN110375738A (en) * 2019-06-21 2019-10-25 Simultaneous localization and mapping pose calculation method for a monocular camera fused with an inertial measurement unit
CN111105460A (en) * 2019-12-26 2020-05-05 电子科技大学 RGB-D camera pose estimation method for indoor scene three-dimensional reconstruction
CN111126126A (en) * 2019-10-21 2020-05-08 武汉大学 Intelligent video strip splitting method based on graph convolution neural network
CN111462231A (en) * 2020-03-11 2020-07-28 华南理工大学 Positioning method based on RGBD sensor and IMU sensor
CN111795686A (en) * 2020-06-08 2020-10-20 南京大学 Method for positioning and mapping mobile robot
KR20210058686A (en) * 2019-11-14 2021-05-24 삼성전자주식회사 Device and method of implementing simultaneous localization and mapping
CN113223045A (en) * 2021-05-19 2021-08-06 北京数研科技发展有限公司 Vision and IMU sensor fusion positioning system based on dynamic object semantic segmentation

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7526750B2 (en) * 2003-10-15 2009-04-28 Microsoft Corporation Object-based systematic state space exploration of software
US10955665B2 (en) * 2013-06-18 2021-03-23 Microsoft Technology Licensing, Llc Concurrent optimal viewing of virtual objects
US10444761B2 (en) * 2017-06-14 2019-10-15 Trifo, Inc. Monocular modes for autonomous platform guidance systems with auxiliary sensors
EP3451288A1 (en) * 2017-09-04 2019-03-06 Universität Zürich Visual-inertial odometry with an event camera
CN108692720B (en) * 2018-04-09 2021-01-22 京东方科技集团股份有限公司 Positioning method, positioning server and positioning system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Real-time tracking of dynamic targets based on ArUco markers and sparse optical flow; Deng Wenyao et al.; 电机机械工程 (Electromechanical Engineering); Vol. 37, No. 1; pp. 59-64 *

Also Published As

Publication number Publication date
CN113610001A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
CN112894832B (en) Three-dimensional modeling method, three-dimensional modeling device, electronic equipment and storage medium
CN110070615B (en) Multi-camera cooperation-based panoramic vision SLAM method
CN110097553B (en) Semantic mapping system based on instant positioning mapping and three-dimensional semantic segmentation
CN103886107B (en) Robot localization and map structuring system based on ceiling image information
CN109029433A (en) Join outside the calibration of view-based access control model and inertial navigation fusion SLAM on a kind of mobile platform and the method for timing
CN109308718B (en) Space personnel positioning device and method based on multiple depth cameras
CN108520559B (en) Unmanned aerial vehicle positioning and navigation method based on binocular vision
CN112085003B (en) Automatic recognition method and device for abnormal behaviors in public places and camera equipment
CN113359782B (en) Unmanned aerial vehicle autonomous addressing landing method integrating LIDAR point cloud and image data
CN110675453B (en) Self-positioning method for moving target in known scene
CN108519102A (en) A kind of binocular vision speedometer calculation method based on reprojection
CN110006444B (en) Anti-interference visual odometer construction method based on optimized Gaussian mixture model
Momeni-k et al. Height estimation from a single camera view
CN112818925A (en) Urban building and crown identification method
CN108759826A (en) A kind of unmanned plane motion tracking method based on mobile phone and the more parameter sensing fusions of unmanned plane
US11568598B2 (en) Method and device for determining an environment map by a server using motion and orientation data
CN113848931B (en) Agricultural machinery automatic driving obstacle recognition method, system, equipment and storage medium
CN106370160A (en) Robot indoor positioning system and method
CN114004977A (en) Aerial photography data target positioning method and system based on deep learning
CN114766042A (en) Target detection method, device, terminal equipment and medium
CN111307146A (en) Virtual reality wears display device positioning system based on binocular camera and IMU
CN113610001B (en) Indoor mobile terminal positioning method based on combination of depth camera and IMU
CN117523461B (en) Moving target tracking and positioning method based on airborne monocular camera
CN111273701A (en) Visual control system and control method for holder
CN114882106A (en) Pose determination method and device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant