CN112465907A - Indoor visual navigation method and system - Google Patents

Indoor visual navigation method and system

Info

Publication number
CN112465907A
Authority
CN
China
Prior art keywords
coordinate system
camera
monitoring camera
acquiring
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011183247.3A
Other languages
Chinese (zh)
Inventor
杨铮
张佳麟
徐京傲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202011183247.3A
Publication of CN112465907A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/20: Instruments for performing navigational calculations
    • G01C 21/206: Instruments for performing navigational calculations specially adapted for indoor navigation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10016: Video; Image sequence
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/30: Subject of image; Context of image processing
    • G06T 2207/30241: Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the invention provide an indoor visual navigation method and system, wherein the method comprises the following steps: acquiring an optimal monitoring camera; acquiring a conversion ratio according to the relative positions and the real coordinates of the optimal monitoring camera and an information point; acquiring a conversion formula between the mobile camera coordinate system and the plan view coordinate system based on the conversion ratio, a preset projection matrix, and the relative pose; and acquiring the motion track of the target in the plan view coordinate system according to the conversion formula and the motion track of the target in the mobile camera coordinate system. The embodiments use widely deployed monitoring cameras as aids and, combined with a plan view of the area where the target is located, fully utilize the position information of information points and monitoring cameras in the environment, add real distance information and semantic information to the visual navigation system, and combine this information to realize high-precision user positioning and navigation.

Description

Indoor visual navigation method and system
Technical Field
The invention relates to the technical field of indoor positioning, in particular to an indoor visual navigation method and system.
Background
Indoor location services have laid the foundation for intelligent life and spaces. Over the past decade, researchers have proposed solutions based on technologies including Wi-Fi, RFID, inertial sensors, and cameras. Among these systems, vision-based navigation has become one of the most effective solutions in practice. By utilizing visual odometry (VO) and simultaneous localization and mapping (SLAM) techniques, vision-based systems are able to locate users at fine granularity and build maps of the surrounding environment. Furthermore, with the constant development of AR/MR technology, vision-based solutions have the potential to render user-friendly interactions and instructions over real objects on a user interface.
However, vision-based navigation systems have two limitations. First, the construction of indoor maps incurs high overhead. In particular, all existing visual-SLAM-based solutions involve labor-intensive and time-consuming site surveys that collect images (or keyframes) at every location in the environment. Furthermore, such laborious surveys need to be repeated over time, because line-of-sight (LOS) paths are often blocked by crowds or other dynamic factors in the environment. Second, the positioning result obtained from images is independent of the indoor floor plan and thus lacks the semantic information required for navigation, such as a specific destination name. Specifically, the map generated from images is simply a set of keyframes and map points whose positions are expressed in camera coordinates, not in plan view coordinates. To navigate a user to a destination on a plan view, the plan view coordinates and the camera coordinates must be linked. Nowadays, surveillance cameras are widely deployed in public places such as shopping centers, museums, and art galleries. On the one hand, surveillance cameras have the potential to construct and update maps automatically in real time, which may reduce or even eliminate the manual effort of site surveys. On the other hand, with prior information about the camera positions, the camera coordinate system and the plan view coordinate system can be associated by using the surveillance cameras as anchor points, thereby providing the generated map with corresponding absolute positions on the plan view.
Introducing surveillance cameras as aids to a visual navigation system faces three major challenges. The first is the lack of true scale: a navigation service must provide the user with absolute position, direction, and distance on the overall plan view. However, commercial smartphones typically carry monocular cameras, and although some recent smartphones are equipped with two or more cameras, their properties and parameters are not identical. Positioning based on a monocular camera recovers only relative pixel scale, not absolute distance in the real world, so a monocular camera lacking true scale cannot meet the requirements of navigation. The second is the difference in camera viewing angles: although the mobile camera and a surveillance camera can capture overlapping areas, the views and contents they obtain differ greatly. The surveillance camera is stationary and monitors the area from a top-down perspective, which differs from the view of the moving camera, and it is difficult to match their visual features directly with current computer vision techniques. As a result, the conversion between the camera coordinate system and the plan view coordinate system is nontrivial. The third is the lack of semantic information: in a typical application scenario of a navigation system, the user inputs the name of a store or selects a location on a map as the destination. From the user's perspective, the destination refers to semantic information on the floor plan.
However, the maps constructed by visual SLAM are typically a set of keyframes and map points that lack semantic information, and without a site survey such maps cannot be applied to navigation.
Therefore, a new method is needed that solves these three challenges efficiently without adding extra overhead.
Disclosure of Invention
The embodiments of the invention provide an indoor visual navigation method and system to overcome the defect in the prior art that extra overhead is required during navigation, realizing high-precision indoor user positioning and navigation without additional hardware overhead.
The embodiment of the invention provides an indoor visual navigation method, which comprises the following steps:
acquiring an optimal monitoring camera, wherein the optimal monitoring camera is the monitoring camera with the highest video frame similarity with the mobile camera;
calculating the relative pose between the mobile camera and the optimal monitoring camera to obtain the relative position between the optimal monitoring camera and the information point, and acquiring the conversion ratio between the coordinate system of the mobile camera and the coordinate system of the plane graph according to the relative position and the real position information;
acquiring a conversion formula between the coordinate system of the mobile camera and the coordinate system of the plane graph based on the conversion proportion, a preset projection matrix and the relative pose, wherein the conversion formula comprises a rotation matrix and a two-dimensional translation vector;
and acquiring the motion track of the target in the plane graph coordinate system according to the conversion formula and the motion track of the target in the mobile camera coordinate system.
According to an embodiment of the indoor visual navigation method, the rotation matrix is obtained by:
detecting information points of the optimal monitoring camera;
and acquiring the rotation matrix according to the conversion ratio and the difference between a projection vector and a real vector, wherein the projection vector is determined according to the positions of the optimal monitoring camera and the information point in the mobile camera coordinate system, and the real vector is determined according to the positions of the optimal monitoring camera and the information point in the real-world three-dimensional coordinate system.
According to the indoor visual navigation method of one embodiment of the present invention, the rotation matrix is obtained according to the conversion ratio and the difference between the projection vector and the real vector; the specific calculation formula is:

R_f = \arg\min_{R} \left\lVert \overrightarrow{BC} - r\, R\, \overrightarrow{O_s' P'} \right\rVert

wherein R_f denotes the rotation matrix, B the position of the optimal monitoring camera in the plan view coordinate system, C the position of the information point in the plan view coordinate system, r the conversion ratio, \overrightarrow{O_s' P'} the projection vector, O_s' the position of the optimal monitoring camera in the mobile camera coordinate system, and P' the position of the information point in the mobile camera coordinate system.
According to an embodiment of the indoor visual navigation method, the two-dimensional translation vector is obtained by the following formula:

t_{AB} = r\, R_f\, M_p\, t_{cs}

wherein t_{AB} denotes the two-dimensional translation vector, r the conversion ratio, R_f the rotation matrix, M_p the preset projection matrix, and t_{cs} the translation vector of the relative pose.
According to an embodiment of the indoor visual navigation method, the obtaining of the optimal monitoring camera specifically includes:
acquiring BRISK feature points of a target video in the mobile camera and BRISK feature points of monitoring videos in each monitoring camera;
acquiring word vectors of the target video and of the reference video of each monitoring camera through a bag-of-words algorithm, based on the BRISK feature points of the target video and the BRISK feature points of the reference video in each monitoring camera;
and selecting the optimal monitoring camera from all the monitoring cameras based on the word vector of the target video and the word vector of the reference video of each monitoring camera.
According to an embodiment of the indoor visual navigation method, the preset projection matrix is

M_p = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
The indoor visual navigation method according to one embodiment of the invention further comprises the following steps:
and acquiring the real-time position of the target in the plane graph coordinate system according to the real-time position of the target in the mobile camera coordinate system and the conversion formula.
An embodiment of the present invention further provides an indoor visual navigation system, including:
the selection module is used for acquiring an optimal monitoring camera, the optimal monitoring camera being the monitoring camera with the highest video frame similarity to the mobile camera;
the conversion ratio acquisition module is used for calculating the relative pose between the mobile camera and the optimal monitoring camera to obtain the relative position between the optimal monitoring camera and the information point, and acquiring the conversion ratio between the coordinate system of the mobile camera and the coordinate system of the plane image according to the relative position and the real position information;
a conversion formula obtaining module, configured to obtain a conversion formula between the coordinate system of the mobile camera and the coordinate system of the plan view based on the conversion ratio, a preset projection matrix, and the relative pose, where the conversion formula includes a rotation matrix and a two-dimensional translation vector;
and the motion track module is used for acquiring the motion track of the target in the plan view coordinate system according to the conversion formula and the motion track of the target in the mobile camera coordinate system.
An embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any one of the above-mentioned indoor visual navigation methods when executing the program.
Embodiments of the present invention further provide a non-transitory computer readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the indoor visual navigation method as described in any one of the above.
According to the indoor visual navigation method and system provided by the embodiments of the present invention, widely deployed monitoring cameras are used as aids; combined with the plan view of the area where the target is located, the position information of information points and monitoring cameras in the environment is fully utilized, real distance information and semantic information are added to the visual navigation system, and this information is combined to realize high-precision user positioning and navigation.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an indoor visual navigation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an application scenario in an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an indoor visual navigation system according to an embodiment of the present invention;
fig. 4 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As described above, current visual positioning systems mainly face three challenges: lack of scale information, differences in camera viewing angle, and lack of semantic information. To solve these challenges, the embodiment of the present invention provides an indoor visual navigation system that uses the surveillance cameras in the environment as indoor positioning anchor points, playing a role similar to that of satellites in an outdoor GPS navigation system.
Fig. 1 is a flowchart of an indoor visual navigation method according to an embodiment of the present invention, as shown in fig. 1, the method includes:
Fig. 2 is a schematic diagram of an application scenario in an embodiment of the present invention. As shown in fig. 2, the camera coordinate system refers to the mobile camera coordinate system. In the embodiment of the present invention, monitoring cameras are installed at different positions in the target area and the mobile camera is carried by the user; the monitoring cameras are used to locate the absolute position of the user on the plan view of the target area, to provide the user with semantic information about the surrounding environment, and to give the user instructions such as the distance to the next waypoint, where semantic information refers to the positions and names of stores, restaurants, and the like in the environment.
S1, acquiring an optimal monitoring camera, wherein the optimal monitoring camera is the monitoring camera with the highest video frame similarity with the mobile camera;
While the target moves forward holding the mobile camera, the monitoring camera whose captured video is most similar to the video frames captured by the mobile camera is first selected from all the monitoring cameras and taken as the optimal monitoring camera.
S2, calculating the relative pose between the mobile camera and the optimal monitoring camera to obtain the relative position between the optimal monitoring camera and the information point, and acquiring the conversion ratio between the mobile camera coordinate system and the plane image coordinate system according to the relative position and the real position information;
Then, the relative position between the optimal monitoring camera and an information point is calculated according to the relative pose between the mobile camera and the optimal monitoring camera, where information points are known landmarks in the environment.
A conversion ratio between the mobile camera coordinate system and the plan view coordinate system is then calculated according to the computed relative position, the real position of the optimal monitoring camera, and the real position of the information point, where the mobile camera coordinate system is a three-dimensional coordinate system based on the mobile camera and the plan view coordinate system is a two-dimensional coordinate system established on the plane where the target area is located; a concrete sketch of this ratio computation is given below.
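As a concrete illustration, the conversion ratio can be computed as the ratio between the real plan-view distance of the two anchors (the optimal monitoring camera and the information point) and their projected distance in the mobile camera frame. The following minimal Python/NumPy sketch rests on assumptions: the positions B, C, O_s', P' are taken as already estimated, the function and variable names are illustrative, and M_p is the drop-the-vertical-axis projection introduced later in this text.

```python
import numpy as np

# Assumed 3D-to-2D projection: drop the y (vertical) axis of the camera
# frame, keeping the horizontal x-z plane (the preset matrix M_p below).
M_p = np.array([[1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0]])

def conversion_ratio(B, C, Os_prime, P_prime):
    """Ratio of real plan-view distance to projected camera-frame distance.

    B, C      : 2D plan-view positions of the optimal monitoring camera
                and the information point (known from the plan view).
    Os_prime,
    P_prime   : 3D positions of the same two anchors in the mobile
                camera coordinate system (from the relative pose).
    """
    real_vec = np.asarray(C, float) - np.asarray(B, float)  # real vector BC
    proj_vec = M_p @ (np.asarray(P_prime, float) - np.asarray(Os_prime, float))
    return np.linalg.norm(real_vec) / np.linalg.norm(proj_vec)

# Illustrative coordinates (metres on the plan view, camera units in 3D):
r = conversion_ratio(B=[10.0, 4.0], C=[14.0, 7.0],
                     Os_prime=[0.2, 1.5, 2.0], P_prime=[1.0, 1.4, 3.1])
```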
S3, acquiring a conversion formula between the coordinate system of the mobile camera and the coordinate system of the plane map based on the conversion proportion, a preset projection matrix and the relative pose, wherein the conversion formula comprises a rotation matrix and a two-dimensional translation vector;
and calculating a conversion formula between the coordinate system of the mobile camera and the coordinate system of the plane graph according to the conversion proportion, the preset projection matrix and the translation vector in the relative pose, wherein the conversion formula represents the conversion relation between the three-dimensional coordinate system of the target in the mobile camera and the two-dimensional coordinate system of the plane graph, and the conversion formula comprises a rotation matrix and a two-dimensional translation vector, but is not limited to the rotation matrix and the two-dimensional translation vector.
And S4, acquiring the motion track of the target in the plane graph coordinate system according to the conversion formula and the motion track of the target in the moving camera coordinate system.
The motion track of the target in the plan view coordinate system is obtained by applying the previously calculated conversion formula to the motion track of the target in the mobile camera coordinate system, as in the sketch below.
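A minimal sketch of this step, continuing the NumPy notation above (the translation argument t is assumed to be the plan-view position of the camera-frame origin, derived from B and t_AB; the exact anchoring convention is an assumption):

```python
def to_plan_view(traj_cam, r, R_f, t):
    """Map a 3D camera-frame trajectory to 2D plan-view coordinates.

    traj_cam : (N, 3) array of positions in the mobile camera frame.
    r        : conversion ratio between the two coordinate systems.
    R_f      : (2, 2) rotation matrix of the conversion formula.
    t        : 2D translation (assumed: plan-view position of the
               camera-frame origin, derived from B and t_AB).
    """
    pts2d = (M_p @ np.asarray(traj_cam, float).T).T    # project 3D -> 2D
    return (r * pts2d) @ R_f.T + np.asarray(t, float)  # rotate, scale, shift

# Example: a short straight-line walk in the camera frame.
traj = to_plan_view([[0, 1.5, 0], [0, 1.5, 1], [0, 1.5, 2]],
                    r=1.3, R_f=np.eye(2), t=[10.0, 4.0])
```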
On the basis of the above embodiment, preferably, the rotation matrix is obtained by:
detecting information points of the optimal monitoring camera;
and acquiring the rotation matrix according to the rotation proportion and the difference value between a projection vector and a real vector, wherein the projection vector is determined according to the positions of the optimal monitoring camera and the information point in the coordinate system of the mobile camera, and the real vector is determined according to the positions of the optimal monitoring camera and the information point in the three-dimensional coordinate system of the real world.
Specifically, the information points are first detected in the video frames of the optimal monitoring camera. To obtain the real position of the user on the plan view, the coordinates of the user must be converted from the mobile camera coordinate system into the plan view coordinate system. As shown in fig. 2, in the embodiment of the present invention, the initial position of the target on the plan view is denoted A, and the known positions of the optimal monitoring camera and the information point (point of interest, POI) are denoted B and C, respectively; the real vector is determined according to the position of the optimal monitoring camera and the position of the information point.
And in a mobile camera coordinate system, determining a projection vector according to the position of the optimal monitoring camera and the position of the information point.
A rotation matrix is then calculated according to the conversion ratio and the difference between the projection vector and the real vector.
On the basis of the foregoing embodiment, preferably, the rotation matrix is obtained according to the conversion ratio and the difference between the projection vector and the real vector; the specific calculation formula is:

R_f = \arg\min_{R} \left\lVert \overrightarrow{BC} - r\, R\, \overrightarrow{O_s' P'} \right\rVert

wherein R_f denotes the rotation matrix, B the position of the optimal monitoring camera in the plan view coordinate system, C the position of the information point in the plan view coordinate system, r the conversion ratio, \overrightarrow{O_s' P'} the projection vector, O_s' the position of the optimal monitoring camera in the mobile camera coordinate system, and P' the position of the information point in the mobile camera coordinate system.
Specifically, the rotation matrix can be calculated by the above formula, whose meaning is to find the R that minimizes the expression on the right-hand side; that minimizer is taken as R_f.
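For intuition, with a single camera-POI pair the minimizer has a closed form: R_f is the 2D rotation that aligns the direction of the projected vector with that of the real vector. A minimal NumPy sketch under that assumption (with several POI pairs one would instead solve a small least-squares / Procrustes problem); all names are illustrative:

```python
def fit_rotation(real_vec, proj_vec):
    """2D rotation R_f aligning proj_vec's direction with real_vec's.

    real_vec : vector BC on the plan view.
    proj_vec : vector O_s'P' after projection into 2D.
    """
    a = np.asarray(proj_vec, float)
    b = np.asarray(real_vec, float)
    # Rotating a by theta maps its direction onto b's direction.
    theta = np.arctan2(b[1], b[0]) - np.arctan2(a[1], a[0])
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])
```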
On the basis of the above embodiment, preferably, the two-dimensional translation vector is obtained by the following formula:

t_{AB} = r\, R_f\, M_p\, t_{cs}

wherein t_{AB} denotes the two-dimensional translation vector, r the conversion ratio, R_f the rotation matrix, M_p the preset projection matrix, and t_{cs} the translation vector of the relative pose.

Specifically, the above formula yields the true relative position t_{AB} between the user and the optimal monitoring camera, expressed as a two-dimensional translation vector.
On the basis of the foregoing embodiment, preferably, the acquiring the optimal monitoring camera specifically includes:
acquiring BRISK feature points of a target video in the mobile camera and BRISK feature points of monitoring videos in each monitoring camera;
and for the monitoring videos shot by the monitoring cameras, the BRISK feature points of each monitoring video are calculated.
acquiring word vectors of the target video and of the reference video of each monitoring camera through a bag-of-words algorithm, based on the BRISK feature points of the target video and the BRISK feature points of the reference video in each monitoring camera;
That is, a word vector is calculated for each frame with a bag-of-words algorithm, according to the BRISK feature points of the target video and of each reference video.
And selecting the optimal monitoring camera from all the monitoring cameras based on the word vector of the target video and the word vector of the reference video of each monitoring camera.
By comparing the word vectors of the mobile camera and of each monitoring camera, the optimal monitoring camera, namely the one that captures the scene most similar to the mobile camera's, is selected; a simplified sketch of this step follows.
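The sketch below uses OpenCV's BRISK detector and a plain k-means vocabulary with cosine similarity; a production system would more likely use a binary-descriptor vocabulary tree (DBoW-style) rather than k-means on float-cast descriptors, and all names here are illustrative:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def brisk_descriptors(frame_gray):
    """Detect BRISK feature points and compute their binary descriptors."""
    brisk = cv2.BRISK_create()
    _, desc = brisk.detectAndCompute(frame_gray, None)
    return desc  # (N, 64) uint8 array, or None if nothing was detected

def build_vocabulary(descriptor_sets, k=64):
    """Cluster training descriptors into k visual words (the vocabulary)."""
    data = np.vstack(descriptor_sets).astype(np.float32)
    return KMeans(n_clusters=k, n_init=4, random_state=0).fit(data)

def word_vector(desc, vocab):
    """L2-normalized histogram of visual-word occurrences for one frame."""
    words = vocab.predict(desc.astype(np.float32))
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)

def best_camera(mobile_frame, surveillance_frames, vocab):
    """Index of the surveillance camera most similar to the mobile view."""
    v_m = word_vector(brisk_descriptors(mobile_frame), vocab)
    sims = [float(v_m @ word_vector(brisk_descriptors(f), vocab))
            for f in surveillance_frames]
    return int(np.argmax(sims))
```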
On the basis of the above embodiment, preferably, the preset projection matrix is

M_p = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}

Specifically, most users are accustomed to holding the smartphone with its y-axis perpendicular to the ground, so this matrix can be used to represent the 3D-to-2D projection matrix M_p, i.e., the preset projection matrix: it discards the vertical y component and keeps the horizontal x and z components.
On the basis of the above embodiment, preferably, the real-time position of the target in the plane-view coordinate system is obtained according to the real-time position of the target in the moving camera coordinate system and the conversion formula.
After the conversion formula between the mobile camera coordinate system and the plan view coordinate system has been obtained, the true relative position between the user and the monitoring camera is known, and thus the position of the user in the real plan view coordinate system is obtained.
Alternatively, using visual odometry (VO), the motion track of the user in the camera coordinate system is estimated in real time from the frames input by the user's camera in time order, and the obtained conversion formula is then applied to convert it into the user track in the plan view coordinate system.
For the navigation function, a connected graph G = (V, E) is established by reading the semantic information of the plan view, where V is the set of landmark points on the plan view and E is the set of real distances between landmark points that are mutually reachable. Path planning on this graph is sketched below.
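The text does not name a path-planning algorithm; one standard choice on such a weighted landmark graph is Dijkstra's algorithm. A self-contained sketch with an illustrative graph (all landmark names and distances are made up):

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra's algorithm over the landmark graph G = (V, E).

    graph : dict mapping a landmark to a list of (neighbour, distance) pairs.
    Returns the landmark sequence from src to dst, or None if unreachable.
    """
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]

G = {"entrance": [("atrium", 12.0)],
     "atrium": [("entrance", 12.0), ("store_A", 8.5), ("cafe", 15.0)],
     "store_A": [("atrium", 8.5)],
     "cafe": [("atrium", 15.0)]}
print(shortest_path(G, "entrance", "store_A"))  # ['entrance', 'atrium', 'store_A']
```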
During navigation, the real-time position of the user is continuously monitored, and the user is guided to the next landmark on the planned path until the destination is reached. In addition, a relocalization function is introduced, which significantly improves the navigation success rate over long distances: once the mobile camera successfully recognizes a POI during navigation, the system automatically performs surveillance-camera-assisted positioning again to correct the current position of the user.
In the mapping part, the visual odometry module also continuously tracks the user during navigation, creating hundreds of new 3D map points through triangulation.
These map points are converted from the camera coordinate system into the plan view coordinate system, and their semantics are determined according to the functional area onto which they project, as sketched below; this process completes the construction of the visual semantic map of the navigation area.
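A hedged sketch of this projection-and-labelling step, continuing the notation and conversion parameters of the earlier sketches; representing functional areas as named 2D polygons is an assumption made for illustration:

```python
from matplotlib.path import Path

def label_map_points(points_cam, r, R_f, t, areas):
    """Project 3D map points to the plan view and attach area semantics.

    points_cam : (N, 3) triangulated map points in the camera frame.
    r, R_f, t  : parameters of the conversion formula (see above).
    areas      : dict mapping a semantic label (e.g. "store_A") to the
                 (M, 2) vertex array of that functional area's polygon.
    """
    plan_pts = to_plan_view(points_cam, r, R_f, t)
    labels = []
    for p in plan_pts:
        label = None
        for name, polygon in areas.items():
            if Path(polygon).contains_point(p):  # point-in-polygon test
                label = name
                break
        labels.append(label)
    return plan_pts, labels
```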
Fig. 3 is a schematic structural diagram of an indoor visual navigation system according to an embodiment of the present invention, and as shown in fig. 3, the system includes a selection module 301, a conversion ratio obtaining module 302, a conversion formula obtaining module 303, and a motion trajectory module 304, where:
the selection module 301 is configured to obtain an optimal surveillance camera, where the optimal surveillance camera has the highest similarity with the mobile camera;
the conversion ratio obtaining module 302 is configured to calculate a relative pose between the mobile camera and the optimal monitoring camera, obtain a relative position between the optimal monitoring camera and an information point, and obtain a conversion ratio between a coordinate system of the mobile camera and a coordinate system of a plane map according to the relative position and real position information;
the conversion formula obtaining module 303 is configured to obtain a conversion formula between the coordinate system of the mobile camera and the coordinate system of the plan view based on the conversion ratio, a preset projection matrix, and the relative pose, where the conversion formula includes a rotation matrix and a two-dimensional translation vector;
the motion trail module 304 is configured to obtain a motion trail of the target in a plane graph coordinate system according to the conversion formula and the motion trail of the target between the moving cameras.
The present embodiment is a system embodiment corresponding to the above method embodiment, and please refer to the above method embodiment for details, which is not described herein again.
Fig. 4 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 4, the electronic device may include: a processor 410, a communication interface 420, a memory 430, and a communication bus 440, wherein the processor 410, the communication interface 420, and the memory 430 communicate with one another via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform an indoor visual navigation method comprising:
acquiring an optimal monitoring camera, wherein the optimal monitoring camera is the monitoring camera with the highest video frame similarity with the mobile camera;
calculating the relative pose between the mobile camera and the optimal monitoring camera to obtain the relative position between the optimal monitoring camera and the information point, and acquiring the conversion ratio between the coordinate system of the mobile camera and the coordinate system of the plane graph according to the relative position and the real position information;
acquiring a conversion formula between the coordinate system of the mobile camera and the coordinate system of the plane graph based on the conversion proportion, a preset projection matrix and the relative pose, wherein the conversion formula comprises a rotation matrix and a two-dimensional translation vector;
and acquiring the motion track of the target in the plane graph coordinate system according to the conversion formula and the motion track of the target in the mobile camera coordinate system.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is capable of executing an indoor visual navigation method provided by the above-mentioned method embodiments, where the method includes:
acquiring an optimal monitoring camera, wherein the optimal monitoring camera is the monitoring camera with the highest video frame similarity with the mobile camera;
calculating the relative pose between the mobile camera and the optimal monitoring camera to obtain the relative position between the optimal monitoring camera and the information point, and acquiring the conversion ratio between the coordinate system of the mobile camera and the coordinate system of the plane graph according to the relative position and the real position information;
acquiring a conversion formula between the coordinate system of the mobile camera and the coordinate system of the plane graph based on the conversion proportion, a preset projection matrix and the relative pose, wherein the conversion formula comprises a rotation matrix and a two-dimensional translation vector;
and acquiring the motion track of the target in the plane graph coordinate system according to the conversion formula and the motion track of the target in the mobile camera coordinate system.
In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program is implemented by a processor to perform the indoor visual navigation method provided in the foregoing embodiments, and the method includes:
acquiring an optimal monitoring camera, wherein the optimal monitoring camera is the monitoring camera with the highest video frame similarity with the mobile camera;
calculating the relative pose between the mobile camera and the optimal monitoring camera to obtain the relative position between the optimal monitoring camera and the information point, and acquiring the conversion ratio between the coordinate system of the mobile camera and the coordinate system of the plane graph according to the relative position and the real position information;
acquiring a conversion formula between the coordinate system of the mobile camera and the coordinate system of the plane graph based on the conversion proportion, a preset projection matrix and the relative pose, wherein the conversion formula comprises a rotation matrix and a two-dimensional translation vector;
and acquiring the motion track of the target in the plane graph coordinate system according to the conversion formula and the motion track of the target in the mobile camera coordinate system.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An indoor visual navigation method, comprising:
acquiring an optimal monitoring camera, wherein the optimal monitoring camera is the monitoring camera with the highest video frame similarity with the mobile camera;
calculating the relative pose between the mobile camera and the optimal monitoring camera to obtain the relative position between the optimal monitoring camera and the information point, and acquiring the conversion ratio between the coordinate system of the mobile camera and the coordinate system of the plane graph according to the relative position and the real position information;
acquiring a conversion formula between the coordinate system of the mobile camera and the coordinate system of the plane graph based on the conversion proportion, a preset projection matrix and the relative pose, wherein the conversion formula comprises a rotation matrix and a two-dimensional translation vector;
and acquiring the motion track of the target in the plane graph coordinate system according to the conversion formula and the motion track of the target in the mobile camera coordinate system.
2. The indoor visual navigation method of claim 1, wherein the rotation matrix is obtained by:
detecting information points of the optimal monitoring camera;
and acquiring the rotation matrix according to the conversion ratio and the difference between a projection vector and a real vector, wherein the projection vector is determined according to the positions of the optimal monitoring camera and the information point in the mobile camera coordinate system and the preset projection matrix, and the real vector is determined according to the positions of the optimal monitoring camera and the information point in the real-world three-dimensional coordinate system.
3. The indoor visual navigation method according to claim 2, wherein the rotation matrix is obtained according to the conversion ratio and the difference between the projection vector and the real vector; the specific calculation formula is:

R_f = \arg\min_{R} \left\lVert \overrightarrow{BC} - r\, R\, \overrightarrow{O_s' P'} \right\rVert

wherein R_f denotes the rotation matrix, B the position of the optimal monitoring camera in the plan view coordinate system, C the position of the information point in the plan view coordinate system, r the conversion ratio, \overrightarrow{O_s' P'} the projection vector, O_s' the position of the optimal monitoring camera in the mobile camera coordinate system, and P' the position of the information point in the mobile camera coordinate system.
4. The indoor visual navigation method according to any one of claims 1 to 3, wherein the two-dimensional translation vector is obtained by the following formula:

t_{AB} = r\, R_f\, M_p\, t_{cs}

wherein t_{AB} denotes the two-dimensional translation vector, r the conversion ratio, R_f the rotation matrix, M_p the preset projection matrix, and t_{cs} the translation vector of the relative pose.
5. The indoor visual navigation method according to claim 4, wherein the obtaining of the optimal monitoring camera specifically includes:
acquiring BRISK feature points of a target video in the mobile camera and BRISK feature points of monitoring videos in each monitoring camera;
acquiring word vectors of the target video and of the reference video of each monitoring camera through a bag-of-words algorithm, based on the BRISK feature points of the target video and the BRISK feature points of the reference video in each monitoring camera;
and selecting the optimal monitoring camera from all the monitoring cameras based on the word vector of the target video and the word vector of the reference video of each monitoring camera.
6. The indoor visual navigation method of claim 1, wherein the preset projection matrix is

M_p = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
7. The indoor visual navigation method of claim 1, further comprising:
and acquiring the real-time position of the target in the plane graph coordinate system according to the real-time position of the target in the mobile camera coordinate system and the conversion formula.
8. An indoor visual navigation system, comprising:
the selection module is used for acquiring an optimal monitoring camera, the optimal monitoring camera being the monitoring camera with the highest video frame similarity to the mobile camera;
the conversion ratio acquisition module is used for calculating the relative pose between the mobile camera and the optimal monitoring camera to obtain the relative position between the optimal monitoring camera and the information point, and acquiring the conversion ratio between the coordinate system of the mobile camera and the coordinate system of the plane image according to the relative position and the real position information;
a conversion formula obtaining module, configured to obtain a conversion formula between the coordinate system of the mobile camera and the coordinate system of the plan view based on the conversion ratio, a preset projection matrix, and the relative pose, where the conversion formula includes a rotation matrix and a two-dimensional translation vector;
and the motion track module is used for acquiring the motion track of the target in the plan view coordinate system according to the conversion formula and the motion track of the target in the mobile camera coordinate system.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method for indoor visual navigation according to any of claims 1 to 7.
10. A non-transitory computer readable storage medium having a computer program stored thereon, wherein the computer program when executed by a processor implements the steps of the indoor visual navigation method of any one of claims 1 to 7.
CN202011183247.3A 2020-10-29 2020-10-29 Indoor visual navigation method and system Pending CN112465907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011183247.3A CN112465907A (en) 2020-10-29 2020-10-29 Indoor visual navigation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011183247.3A CN112465907A (en) 2020-10-29 2020-10-29 Indoor visual navigation method and system

Publications (1)

Publication Number Publication Date
CN112465907A true CN112465907A (en) 2021-03-09

Family

ID=74835627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011183247.3A Pending CN112465907A (en) 2020-10-29 2020-10-29 Indoor visual navigation method and system

Country Status (1)

Country Link
CN (1) CN112465907A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113538578A (en) * 2021-06-22 2021-10-22 恒睿(重庆)人工智能技术研究院有限公司 Target positioning method and device, computer equipment and storage medium
CN114460942A (en) * 2022-02-09 2022-05-10 中国农业银行股份有限公司 Indoor robot navigation control method, device, equipment and medium
WO2022248076A1 (en) * 2021-05-25 2022-12-01 Proceq Sa Method for recording inspection data
WO2022250605A1 (en) * 2021-05-24 2022-12-01 Hitachi, Ltd. Navigation guidance methods and navigation guidance devices

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106679648A (en) * 2016-12-08 2017-05-17 东南大学 Vision-inertia integrated SLAM (Simultaneous Localization and Mapping) method based on genetic algorithm
CN110288656A (en) * 2019-07-01 2019-09-27 太原科技大学 A kind of object localization method based on monocular cam
US20200198149A1 (en) * 2018-12-24 2020-06-25 Ubtech Robotics Corp Ltd Robot vision image feature extraction method and apparatus and robot using the same
CN111783849A (en) * 2020-06-15 2020-10-16 清华大学 Indoor positioning method and device, electronic equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106679648A (en) * 2016-12-08 2017-05-17 东南大学 Vision-inertia integrated SLAM (Simultaneous Localization and Mapping) method based on genetic algorithm
US20200198149A1 (en) * 2018-12-24 2020-06-25 Ubtech Robotics Corp Ltd Robot vision image feature extraction method and apparatus and robot using the same
CN110288656A (en) * 2019-07-01 2019-09-27 太原科技大学 A kind of object localization method based on monocular cam
CN111783849A (en) * 2020-06-15 2020-10-16 清华大学 Indoor positioning method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨铮 et al.: "Indoor Positioning: Challenges and Opportunities" (室内定位:挑战与机遇), Journal of Northwest University (Natural Science Edition) (西北大学学报(自然科学版)), 31 December 2018 (2018-12-31) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022250605A1 (en) * 2021-05-24 2022-12-01 Hitachi, Ltd. Navigation guidance methods and navigation guidance devices
WO2022248076A1 (en) * 2021-05-25 2022-12-01 Proceq Sa Method for recording inspection data
WO2022248022A1 (en) * 2021-05-25 2022-12-01 Proceq Sa Method for recording inspection data
CN113538578A (en) * 2021-06-22 2021-10-22 恒睿(重庆)人工智能技术研究院有限公司 Target positioning method and device, computer equipment and storage medium
CN114460942A (en) * 2022-02-09 2022-05-10 中国农业银行股份有限公司 Indoor robot navigation control method, device, equipment and medium

Similar Documents

Publication Publication Date Title
US10740975B2 (en) Mobile augmented reality system
US11734846B2 (en) System and method for concurrent odometry and mapping
CN112465907A (en) Indoor visual navigation method and system
US10937214B2 (en) System and method for merging maps
US9875579B2 (en) Techniques for enhanced accurate pose estimation
US9699375B2 (en) Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system
CN109300143B (en) Method, device and equipment for determining motion vector field, storage medium and vehicle
US11676303B2 (en) Method and apparatus for improved location decisions based on surroundings
US9342927B2 (en) Augmented reality system for position identification
CN109461208B (en) Three-dimensional map processing method, device, medium and computing equipment
CN110111388B (en) Three-dimensional object pose parameter estimation method and visual equipment
KR20150144729A (en) Apparatus for recognizing location mobile robot using key point based on gradient and method thereof
TW201715476A (en) Navigation system based on augmented reality technique analyzes direction of users' moving by analyzing optical flow through the planar images captured by the image unit
CN110553648A (en) method and system for indoor navigation
CN112422653A (en) Scene information pushing method, system, storage medium and equipment based on location service
CN110730934A (en) Method and device for switching track
CN113610702B (en) Picture construction method and device, electronic equipment and storage medium
Deigmoeller et al. Stereo visual odometry without temporal filtering
Liu et al. LSFB: A low-cost and scalable framework for building large-scale localization benchmark
CN110880187B (en) Camera position information determining method and device, electronic equipment and storage medium
CA3102860C (en) Photography-based 3d modeling system and method, and automatic 3d modeling apparatus and method
Hensel et al. Experimental setup for neural networks and camera-based navigation of mobile systems
LU et al. Scene Visual Perception and AR Navigation Applications
JP2023094344A (en) Augmented reality display device, method, and program
Wangsiripitak In-building navigation system using a camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination