CN114119737A - Visual positioning method for indoor navigation and related equipment - Google Patents

Visual positioning method for indoor navigation and related equipment

Info

Publication number
CN114119737A
CN114119737A
Authority
CN
China
Prior art keywords
query image
image
key
feature points
indoor map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110002614.3A
Other languages
Chinese (zh)
Inventor
Che Guangfu (车广富)
Guo Jinghao (郭景昊)
An Shan (安山)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202110002614.3A
Publication of CN114119737A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74 - Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 - Geographical information databases
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 - Querying
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/587 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using geographical or spatial information, e.g. location
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 - Geographic models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30244 - Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Navigation (AREA)

Abstract

The embodiments of the disclosure provide a visual positioning method and apparatus for indoor navigation, a computer-readable storage medium, and an electronic device, belonging to the technical field of computers and communication. The method comprises the following steps: acquiring an indoor query image; acquiring feature points of the query image; obtaining descriptors of the feature points of the query image; querying an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map; comparing each feature point of the query image with the feature points in each key image in the candidate key image set, and screening the feature points of the query image according to the ratio of the best to the second-best match to obtain the screened feature points of the query image; and determining the position and the posture when the query image is shot according to the screened feature points of the query image. The disclosed method can realize indoor visual positioning.

Description

Visual positioning method for indoor navigation and related equipment
Technical Field
The present disclosure relates to the field of computer and communication technologies, and in particular, to a visual positioning method and apparatus for indoor navigation, a computer-readable storage medium, and an electronic device.
Background
With the progress of Virtual Reality (VR) and Augmented Reality (AR) technologies, spatial interactive experiences in venues such as shopping malls, including AR navigation, AR scenery, AR red envelopes, and AR games, can be achieved by applying AR technology to the smartphone camera. When people arrive at an unfamiliar scene, particularly an indoor place where Global Positioning System (GPS) signals are inaccurate, it is difficult to know one's specific bearings inside the building. To achieve seamless fusion of the real and virtual worlds, the position and orientation of the mobile device (e.g., a smartphone) must be determined accurately.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The embodiment of the disclosure provides a visual positioning method and device for indoor navigation, a computer readable storage medium and an electronic device, which can realize indoor visual positioning.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, there is provided a visual positioning method for indoor navigation, including:
acquiring an indoor query image;
acquiring feature points of the query image;
obtaining descriptors of the feature points of the query image;
querying an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map;
comparing each feature point of the query image with the feature points in each key image in the candidate key image set, and screening the feature points of the query image according to the ratio of the best to the second-best match to obtain the screened feature points of the query image;
determining the position and the posture when the query image is shot according to the screened feature points of the query image;
the indoor map comprises key image information for constructing the indoor map, feature point information in the key image for constructing the indoor map and three-dimensional map point information for constructing the indoor map.
In one embodiment, querying an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map comprises:
acquiring all key images in the indoor map that share feature points with the query image.
In one embodiment, querying an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map comprises:
acquiring, from all key images of the indoor map that share feature points with the query image, a specific proportion of key images ranked in descending order by the number of feature points shared with the query image.
In one embodiment, querying an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map comprises:
acquiring, for each key image in the specific proportion of key images, a group consisting of a specific number of adjacent key images ranked in descending order by the number of co-visible feature points among the key images of the indoor map.
In one embodiment, querying an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map comprises:
acquiring a specific proportion of the key image groups ranked in descending order by the sum, over each key image group, of the number of feature points its key images share with the query image, and taking all key images in that specific proportion of key image groups as the candidate key image set.
In one embodiment, the step of screening the feature points of the query image according to the ratio of the best to the second-best match to obtain the screened feature points of the query image comprises:
when one feature point of the query image is compared with the feature points in one key image in the candidate key image set, if the ratio of the similarity between that feature point and its best-matching feature point in the key image to the similarity between that feature point and its second-best-matching feature point in the key image is greater than or equal to a specific ratio, that feature point of the query image passes the screening.
In one embodiment, the step of screening the feature points of the query image according to the ratio of the best to the second-best match to obtain the screened feature points of the query image comprises:
finally retaining a feature point of the query image only if it passes the screening in the comparison with each key image in the candidate key image set.
According to an aspect of the present disclosure, there is provided a visual positioning device for indoor navigation, comprising:
an acquisition module configured to acquire an indoor query image, acquire feature points of the query image, and obtain descriptors of the feature points of the query image;
a query module configured to query an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map;
a comparison and screening module configured to compare each feature point of the query image with the feature points in each key image in the candidate key image set, and to screen the feature points of the query image according to the ratio of the best to the second-best match to obtain the screened feature points of the query image; and
a positioning module configured to determine the position and the posture when the query image is shot according to the screened feature points of the query image;
the indoor map comprises key image information for constructing the indoor map, feature point information in the key image for constructing the indoor map and three-dimensional map point information for constructing the indoor map.
According to an aspect of the present disclosure, there is provided an electronic device including:
one or more processors;
a storage device configured to store one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method of any of the above embodiments.
According to an aspect of the present disclosure, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of the above embodiments.
The technical solutions provided by some embodiments of the present disclosure adopt a low-cost visual positioning technique to improve the positioning accuracy for the user.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The following figures depict certain illustrative embodiments of the invention in which like reference numerals refer to like elements. These described embodiments are to be considered as exemplary embodiments of the disclosure and not limiting in any way.
Fig. 1 shows a schematic diagram of an exemplary system architecture of a visual positioning method for indoor navigation or a visual positioning apparatus for indoor navigation to which embodiments of the present disclosure may be applied;
FIG. 2 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device implementing embodiments of the present disclosure;
FIG. 3 schematically illustrates the formation of three-dimensional points during indoor map construction in visual positioning for indoor navigation according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a visual positioning method of indoor navigation according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates indoor pose estimation in visual localization for indoor navigation according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a block diagram of a visual positioning apparatus for indoor navigation according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a visual positioning apparatus for indoor navigation according to another embodiment of the present disclosure;
Fig. 8 schematically shows a block diagram of a visual positioning apparatus for indoor navigation according to another embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture 100 of a visual positioning method for indoor navigation or a visual positioning apparatus for indoor navigation to which embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
A staff member or a client may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may be various electronic devices having display screens including, but not limited to, smart phones, tablets, portable and desktop computers, digital cinema projectors, and the like.
The server 105 may be a server that provides various services. For example, a staff member sends a visual positioning request for indoor navigation to the server 105 using the terminal device 103 (or terminal device 101 or 102). The server 105 may acquire an indoor query image; acquire feature points of the query image; obtain descriptors of the feature points of the query image; query an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map; compare each feature point of the query image with the feature points in each key image in the candidate key image set, and screen the feature points of the query image according to the ratio of the best to the second-best match to obtain the screened feature points of the query image; and determine the position and the posture when the query image is shot according to the screened feature points. The indoor map comprises the key image information used to construct the indoor map, the feature point information in those key images, and the three-dimensional map point information. The server 105 may transmit the position and the posture when the query image is shot to the terminal device 103 for display, and the staff member may perform indoor positioning or navigation based on the content displayed on the terminal device 103.
As another example, the terminal device 103 (or terminal device 101 or 102) may be a smart TV, a VR (Virtual Reality)/AR (Augmented Reality) head-mounted display, or a mobile terminal such as a smartphone or tablet computer on which navigation, ride-hailing, instant messaging, or video applications (APPs) are installed. A user may send a visual positioning request for indoor navigation to the server 105 through the smart TV, the VR/AR head-mounted display, or such an APP. Based on this request, the server 105 can obtain the position and the posture when the query image is shot and return them to the smart TV, the VR/AR head-mounted display, or the APP, which then displays them.
FIG. 2 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.
It should be noted that the computer system 200 of the electronic device shown in fig. 2 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments of the present disclosure.
As shown in fig. 2, the computer system 200 includes a Central Processing Unit (CPU) 201 that can perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 202 or a program loaded from a storage section 208 into a Random Access Memory (RAM) 203. In the RAM 203, various programs and data necessary for system operation are also stored. The CPU 201, ROM 202, and RAM 203 are connected to each other via a bus 204. An input/output (I/O) interface 205 is also connected to the bus 204.
The following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output section 207 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 209 performs communication processing via a network such as the Internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 210 as necessary, so that a computer program read out therefrom is installed into the storage section 208 as necessary.
In particular, the processes described below with reference to the flowcharts may be implemented as computer software programs according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 209 and/or installed from the removable medium 211. The computer program, when executed by the Central Processing Unit (CPU) 201, performs the various functions defined in the methods and/or apparatus of the present application.
It should be noted that the computer readable storage medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF (Radio Frequency), etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods, apparatus, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules and/or units and/or sub-units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described modules and/or units and/or sub-units may also be disposed in a processor. Wherein the names of such modules and/or units and/or sub-units in some cases do not constitute a limitation on the modules and/or units and/or sub-units themselves.
As another aspect, the present application also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiment; or may exist separately without being assembled into the electronic device. The computer-readable storage medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the embodiments below. For example, the electronic device may implement the steps of fig. 4.
In the related art, visual positioning for indoor navigation may be performed using, for example, machine learning methods or deep learning methods; different methods have different ranges of application.
Fig. 3 schematically shows the formation of three-dimensional points during indoor map construction in visual positioning for indoor navigation according to an embodiment of the present disclosure.
Indoor map construction
Three-dimensional reconstruction solves the problem of indoor map construction. It generally relies on Structure from Motion (SfM), defined as the ability, in an unknown environment, for a robot to construct a map of the environment while estimating its own motion. In popular terms, the three-dimensional structure of the entire scene is recovered from images or video captured by a camera. The input of the system is a set of images or a video stream, and the output is the three-dimensional structure of the scene and the shooting pose of each image. A camera pose has 6 degrees of freedom: 3 degrees of freedom represent position, and 3 represent attitude (camera orientation). The position can be understood as a point in three-dimensional space, and the attitude as the orientation of the camera.
The technical framework of SfM consists of four main steps: (1) data acquisition; (2) feature extraction; (3) data association; (4) structure recovery. When recovering the three-dimensional structure of a scene, not all pixels in the images are used; only stable, salient points, namely feature points, are extracted from each image. Data association determines which feature points correspond to each other in two images with a common viewing area; the corresponding image points are then recovered into three-dimensional points in space by geometric methods and optimization techniques (Bundle Adjustment). The whole problem is ultimately converted into a large nonlinear optimization problem, whose objective is to minimize the difference between the position at which a reconstructed three-dimensional point re-projects into an image and the position of the observed point in that image, i.e., to minimize the re-projection error.
For the indoor positioning scene, the monocular visual mapping scheme adopted by this application is as follows: abundant image samples are captured with a camera in the indoor scene, and the reconstruction of the three-dimensional structure is then completed with the SfM technique. In the SfM computation over the images, the collected image samples are divided, according to changes in viewing angle and movement distance, into a small number of key frames (key images) and many non-key frames. These key frames exhibit clear observation parallax and are few in number, so the map constructed from the feature points they contain is compact and refined. There are many mature feature and descriptor algorithms, such as SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), and ORB (Oriented FAST and Rotated BRIEF, an algorithm for fast feature point extraction and description). As shown in FIG. 3, the three-dimensional map points generated during the SfM computation form a huge and complex web of feature matches and associations between key frames. Thus, each key frame has several "neighbor" key frames, with differing numbers of co-visible points between them.
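As an illustration of the feature extraction described above, the following Python sketch (presuming OpenCV; an assumption of this description, not part of the disclosure) detects ORB feature points in a key frame and computes their descriptors; SIFT or SURF could be substituted the same way.
```python
# Sketch: extracting feature points and descriptors from one key frame.
import cv2

def extract_features(image_path, n_features=2000):
    """Detect ORB key points and compute their binary descriptors."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=n_features)
    keypoints, descriptors = orb.detectAndCompute(img, None)
    # keypoints: 2D image positions; descriptors: one 32-byte row per point
    return keypoints, descriptors
```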
The three-dimensional reconstruction algorithm is computationally heavy and slow, so it can be completed offline. When map construction is complete, the map is saved as a single file whose elements comprise:
(1) the three-dimensional map point coordinates and ID (identity) identifiers;
(2) the feature point coordinates, descriptors, ID identifiers, the IDs of the corresponding three-dimensional map points, and the IDs of the key frames that contain them;
(3) the positions, postures, and ID identifiers of the key frames, the ID sets of the feature points they contain, and their neighbor key frames with IDs sorted by weight (the number of co-visible points).
The key image information of the indoor map may be item (3), the feature point information in the key images of the indoor map may be item (2), and the three-dimensional map point information of the indoor map may be item (1).
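For illustration only, the three map elements listed above could be organized as in the following sketch; all class and field names are hypothetical, not taken from the disclosure.
```python
# Sketch of the saved map elements (1)-(3) as plain data records.
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class MapPoint:                        # element (1)
    point_id: int
    xyz: np.ndarray                    # 3D coordinates

@dataclass
class FeaturePoint:                    # element (2)
    feature_id: int
    uv: np.ndarray                     # 2D coordinates in the key frame
    descriptor: np.ndarray
    map_point_id: int                  # corresponding 3D map point ID
    keyframe_id: int                   # key frame that contains it

@dataclass
class KeyFrame:                        # element (3)
    keyframe_id: int
    pose: np.ndarray                   # position and posture as 4x4 [R | t]
    feature_ids: List[int]             # IDs of contained feature points
    neighbor_ids: List[int] = field(default_factory=list)  # sorted by weight
```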
The optimization objective of bundle adjustment can be written as
$$\min_{C_1,\dots,C_N,\;X_1,\dots,X_N}\ \sum_{i,j}\left\lVert \pi(X_i, C_j) - x_{ij} \right\rVert^2 \tag{1}$$
In equation (1):
$C_1 \sim C_N$ represent the N camera poses;
$X_1 \sim X_N$ represent the N three-dimensional spatial points;
$\pi(X_i, C_j)$ is the projective transformation: the projected coordinates of spatial point $X_i$ in camera $C_j$;
$x_{ij}$ represents the actual two-dimensional observed coordinates of spatial point $X_i$ in camera $C_j$;
$\lVert\cdot\rVert$ denotes the 2-norm of a vector, i.e., the Euclidean distance;
the squared Euclidean distance serves as the optimization target because it is convenient for mathematical operations such as differentiation.
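For concreteness, the following sketch evaluates the objective of equation (1) under a simple pinhole camera model; the projection function and all identifiers are illustrative assumptions. A bundle adjuster would minimize this quantity over all camera poses and points with a nonlinear least-squares solver.
```python
# Sketch: the re-projection error of equation (1), pinhole camera model.
import numpy as np

def project(X, R, t, K):
    """pi(X, C): project 3D point X into camera C = (R, t) with intrinsics K."""
    Xc = R @ X + t                    # world -> camera coordinates
    x = K @ (Xc / Xc[2])              # perspective division + intrinsics
    return x[:2]

def reprojection_error(points3d, poses, K, observations):
    """Sum over (i, j) of ||pi(X_i, C_j) - x_ij||^2, as in equation (1).
    observations: {(i, j): observed 2D coordinates x_ij}"""
    total = 0.0
    for (i, j), x_ij in observations.items():
        R, t = poses[j]
        total += np.sum((project(points3d[i], R, t, K) - x_ij) ** 2)
    return total
```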
Fig. 4 schematically shows a flow chart of a visual positioning method for indoor navigation according to an embodiment of the present disclosure. The method steps of the embodiment of the present disclosure may be executed by the terminal device, the server, or the terminal device and the server interactively, for example, the server 105 in fig. 1 described above, but the present disclosure is not limited thereto.
In step S410, a query image in a room is acquired.
In this step, the terminal device or the server acquires an indoor query image. In practical applications, this may be realized by a user shooting an image of the environment with a mobile terminal or other equipment.
In the embodiments of the present disclosure, the terminal device may be implemented in various forms. For example, the terminal described in the present disclosure may include a mobile terminal such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), a visual positioning device for indoor navigation, a wearable device, a smart band, a pedometer, a robot, an unmanned vehicle, and the like, and a fixed terminal such as a digital TV (television), a desktop computer, and the like.
In step S420, feature points of the query image are acquired.
In this step, the terminal device or the server acquires the feature points of the query image.
In step S430, descriptors of feature points of the query image are acquired.
In this step, the terminal device or the server acquires a descriptor of the feature point of the query image. In one embodiment, steps 420 and 430 may be performed simultaneously.
In step S440, an indoor map is queried according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map.
In this step, the terminal device or the server queries an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map. In one embodiment, the indoor map includes the key image information used to construct the indoor map, the feature point information in those key images, and the three-dimensional map point information. In one embodiment, all key images in the indoor map that share feature points with the query image are acquired. In one embodiment, a specific proportion of those key images is selected, ranked in descending order by the number of feature points shared with the query image, for example the top 20% by shared-feature-point count. In one embodiment, for each key image in that specific proportion, a group of a specific number of neighboring key images is acquired, ranked in descending order by the number of co-visible feature points in the indoor map; the specific number is, for example, 5. In one embodiment, the key image groups are ranked in descending order by the sum, over each group, of the number of feature points its key images share with the query image; a specific proportion of the groups, for example the top 20%, is selected, and all key images in those groups are taken as the candidate key image set.
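As an illustrative sketch of the candidate lookup (an assumption of this description, not the disclosed implementation), key images sharing feature points with the query image can be counted with an inverted index from quantized descriptors (visual words, in the bag-of-words sense used later) to key image IDs:
```python
# Sketch: inverted-index lookup of key images sharing descriptors.
from collections import Counter, defaultdict

def build_inverted_index(keyframe_words):
    """keyframe_words: {keyframe_id: iterable of visual-word ids}"""
    index = defaultdict(set)
    for kf_id, words in keyframe_words.items():
        for w in words:
            index[w].add(kf_id)
    return index

def shared_counts(index, query_words):
    """Count, per key image, how many of the query's words it shares."""
    counts = Counter()
    for w in query_words:
        for kf_id in index.get(w, ()):
            counts[kf_id] += 1
    return counts   # e.g. keep the top 20% by count as candidates
```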
In step S450, each feature point of the query image is compared with the feature points in each key image in the candidate key image set, and the feature points of the query image are screened according to the ratio of the best to the second-best match to obtain the screened feature points of the query image.
In this step, the terminal device or the server compares each feature point of the query image with the feature points in each key image in the candidate key image set and screens the feature points of the query image according to the ratio of the best to the second-best match. In one embodiment, when one feature point of the query image is compared with the feature points in one key image of the candidate key image set, if the ratio of its similarity to the best-matching feature point in that key image over its similarity to the second-best-matching feature point is greater than or equal to a specific ratio, that feature point of the query image passes the screening. The specific ratio is, for example, a 4:1 ratio between the similarities of the two feature points. In one embodiment, a feature point of the query image is finally retained only if it passes the screening in the comparison with each key image in the candidate key image set.
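The disclosure states the screening criterion as a similarity ratio (for example, 4:1). The sketch below uses the equivalent distance-based formulation customary with OpenCV matchers, where a smaller distance means a more similar descriptor; the 0.75 threshold is an illustrative assumption.
```python
# Sketch of the best/second-best screening, assuming ORB descriptors.
import cv2

def ratio_screen(query_desc, key_desc, ratio=0.75):
    """Keep query feature points whose best match in one key image is
    clearly better than the second-best match."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)   # Hamming norm suits ORB
    kept = []
    for pair in matcher.knnMatch(query_desc, key_desc, k=2):
        if len(pair) < 2:          # not enough candidates to compare
            continue
        best, second = pair
        if best.distance < ratio * second.distance:
            kept.append(best)      # unambiguous match: point passes
    return kept
```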
In step S460, the position and the posture when the query image is shot are determined according to the screened feature points of the query image.
In this step, the terminal device or the server determines the position and the posture when the query image is shot according to the screened feature points of the query image. In one implementation, the screened feature points of the query image are matched with the three-dimensional map points, and the position and the posture of the query image are finally solved using RANSAC (RANdom SAmple Consensus) + PnP (Perspective-n-Point).
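A minimal sketch of this final step, assuming OpenCV and assuming the screened 2D feature points have already been paired with their 3D map points (all identifiers are illustrative):
```python
# Sketch: solving the query camera pose with RANSAC + PnP via OpenCV.
# object_points: Nx3 array of 3D map points matched to the screened
# 2D feature points (image_points, Nx2); K is the intrinsic matrix.
import cv2
import numpy as np

def solve_pose(object_points, image_points, K):
    """Return the rotation matrix R and translation vector t."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(object_points, dtype=np.float64),
        np.asarray(image_points, dtype=np.float64),
        K, None)                      # None: no lens distortion assumed
    if not ok:
        raise RuntimeError("PnP failed: too few consistent matches")
    R, _ = cv2.Rodrigues(rvec)        # rotation vector -> 3x3 matrix
    return R, tvec                    # together they form [R | t]
```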
With the visual positioning method for indoor navigation described above, the position and the posture when the query image is shot can be determined by acquiring an indoor query image and querying it against the indoor map.
By determining the position and the posture when the query image is shot, the visual positioning method for indoor navigation can be combined with augmented reality technology to display guidance information on the user's mobile phone, providing a what-you-see-is-what-you-get interactive experience.
Fig. 5 schematically shows indoor pose estimation in visual localization for indoor navigation according to an embodiment of the present disclosure.
Monocular vision positioning
In the usage scenario of AR (Augmented Reality) navigation, the purpose of positioning is to determine the shooting position and orientation of the camera. After a query image is obtained, its features and descriptors are extracted with the same algorithm used in the indoor map construction stage. Given the large number of 2D key points and descriptors obtained from the image, one could simply convert the image into a bag-of-words vector and retrieve the key frame most likely to serve as the relocalization match; however, because the score of a single frame is unstable, a more robust candidate key frame retrieval scheme is adopted here:
First, the key frames in the indoor map that contain descriptors in common with the query image, together with their weights (the numbers of common descriptors), are found; their collection is denoted as
$$S_0 = \{(\mathrm{KeyFrameID}_i,\ \mathrm{CommonDescNum}_i) \mid i\ \text{has common descriptors with}\ F\} \tag{2}$$
In formula (2), the part after the bar states the condition (key frame i shares descriptors with the query image F), and the part before it records the identifier of the key frame and the number of common descriptors.
The threshold is set to 0.8 times the maximum number of common descriptors:
$$\mathrm{NumThresh} = \max(\{x.\mathrm{CommonDescNum} \mid x \in S_0\}) \times 0.8 \tag{3}$$
In equation (3), the part after the bar states the condition (membership in $S_0$), and $x.\mathrm{CommonDescNum}$ is the number of descriptors key frame x shares with the query image.
According to this threshold, $S_0$ is screened again to obtain the set $S_1$ of key frames whose common-descriptor count reaches at least 0.8 times the maximum:
$$S_1 = \{x.\mathrm{KeyFrameID} \mid x.\mathrm{CommonDescNum} > \mathrm{NumThresh},\ x \in S_0\} \tag{4}$$
In equation (4), the part after the bar states the condition (exceeding the threshold), and the part before it is the identifier of the key frame.
To fully mine and exploit the contextual relationships in the map, each key frame y in the set $S_1$ is considered together with the M key frames (M is set to 5 in practice) connected to it in the co-visibility graph with the largest weights, and the matching score of each of these frames with the query frame (query image) is computed:
$$g_y = \{y\} \cup N_M(y) \tag{5}$$
$$G = \{g_y \mid y \in S_1\} \tag{6}$$
$$\mathrm{AccScore}_y = \sum_{k \in g_y} \mathrm{Score}(k, F) \tag{7}$$
Here $N_M(y)$ denotes the M key frames with the largest co-visibility weights connected to y, and $\mathrm{Score}(k, F)$ is the matching score of key frame k with the query frame F; equations (5) to (7) describe obtaining and accumulating the scores of the M most strongly connected key frames of each member of $S_1$.
The results are then taken as a whole to obtain an overall matching score per group. Finally, the frames in groups whose score is greater than a certain value (0.8 times the maximum group score) are returned:
$$\mathrm{ScoreThresh} = \max(\{\mathrm{AccScore}_y \mid y \in S_1\}) \times 0.8 \tag{8}$$
$$S_2 = \{g_y.\mathrm{KeyFrameID} \mid \mathrm{AccScore}_y > \mathrm{ScoreThresh},\ y \in S_1\} \tag{9}$$
Equations (8) and (9) yield the set $S_2$ of key frames belonging to groups whose accumulated score is greater than the threshold (0.8 times the maximum).
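For illustration, the following sketch strings equations (2) through (9) together into a single retrieval routine. It assumes the per-frame score of equation (7) is simply the shared-descriptor count of equation (2); all identifiers are hypothetical, not from the disclosure.
```python
# Sketch of candidate key frame retrieval per equations (2)-(9).
def retrieve_candidates(matches, neighbors, M=5, alpha=0.8):
    """matches:   {keyframe_id: number of descriptors shared with F}   (2)
    neighbors: {keyframe_id: neighbor ids sorted by co-view weight}"""
    # (3)-(4): keep frames with more than alpha * max shared descriptors
    num_thresh = max(matches.values()) * alpha
    s1 = [k for k, n in matches.items() if n > num_thresh]
    # (5)-(7): group each frame with its M most co-visible neighbors and
    # accumulate the group's matching score against the query frame
    groups = {y: [y] + list(neighbors.get(y, []))[:M] for y in s1}
    acc = {y: sum(matches.get(k, 0) for k in g) for y, g in groups.items()}
    # (8)-(9): return all frames of groups clearing the score threshold
    score_thresh = max(acc.values()) * alpha
    return {k for y, g in groups.items() if acc[y] > score_thresh for k in g}
```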
Having found the candidate key frame set $S_2$, each candidate frame is matched against the query frame by feature matching, and the best-to-second-best ratio is used to screen the matched feature points. The 2D feature points retained after filtering are matched with the 3D points in the model, and the solution is finally obtained with RANSAC (RANdom SAmple Consensus) + PnP (Perspective-n-Point), yielding the position and posture [R | t] of the query image, as shown in Fig. 5.
"6 DoF" is an english abbreviation of 6degrees of freedom, the chinese name is six-degree-of-freedom motion, and the six-degree-of-freedom motion is 6 basic motion names applied by movement of 3D space. Generally, in the motion of 3D space, the motion can be roughly classified into 2 categories, one category is translation (i.e. parallel line movement), and the other category is rotation. However, the vector in 3D volume is X, Y and the Z axis, each with translational and rotational motion, so there are a total of 6 fundamental motion motions in 3D volume, so called 6 DOF.
Fig. 6 schematically shows a block diagram of a visual positioning apparatus for indoor navigation according to an embodiment of the present disclosure. The visual positioning apparatus 600 for indoor navigation provided in the embodiments of the present disclosure may be disposed on a terminal device, or may be disposed on a server side, or may be partially disposed on a terminal device and partially disposed on a server side, for example, may be disposed on the server 105 in fig. 1, but the present disclosure is not limited thereto.
The visual positioning device 600 for indoor navigation provided by the embodiment of the present disclosure may include an obtaining module 610, a query module 620, a comparison screening module 630, and a positioning module 640.
The acquisition module is configured to acquire an indoor query image, acquire feature points of the query image, and obtain descriptors of the feature points of the query image; the query module is configured to query an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map; the comparison screening module is configured to compare each feature point of the query image with the feature points in each key image in the candidate key image set, and to screen the feature points of the query image according to the ratio of the best to the second-best match to obtain the screened feature points of the query image; the positioning module determines the position and the posture when the query image is shot according to the screened feature points of the query image; and the indoor map comprises the key image information used to construct the indoor map, the feature point information in those key images, and the three-dimensional map point information.
According to the embodiment of the present disclosure, the above-mentioned visual positioning apparatus 600 for indoor navigation may be used to implement the visual positioning method for indoor navigation described in the embodiment of fig. 4.
Fig. 7 schematically shows a block diagram of a visual positioning apparatus 700 for indoor navigation according to another embodiment of the present invention.
As shown in fig. 7, the visual positioning apparatus 700 for indoor navigation further includes a display module 710 in addition to the acquisition module 610, the query module 620, the comparison screening module 630, and the positioning module 640 of the embodiment of fig. 6.
Specifically, the display module 710 is configured to display the position and the posture when the query image is captured according to the filtered feature points of the query image.
In the visual positioning apparatus 700 for indoor navigation, the display module 710 may visually display the position and the posture when the query image is shot, as determined from the screened feature points of the query image.
Fig. 8 schematically shows a block diagram of a visual positioning apparatus 800 for indoor navigation according to another embodiment of the present invention.
As shown in fig. 8, the visual positioning device 800 for indoor navigation further includes a storage module 810 in addition to the acquisition module 610, the query module 620, the comparison screening module 630, and the positioning module 640 of the embodiment of fig. 6.
In particular, the storage module 810 is configured to store, in a database, the various data accessed during the user's use, for repeated queries and checks.
It is understood that the obtaining module 610, the querying module 620, the comparing and screening module 630, the positioning module 640, the displaying module 710, and the storing module 810 may be combined into one module for implementation, or any one of them may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present invention, at least one of the obtaining module 610, the querying module 620, the comparing and screening module 630, the positioning module 640, the displaying module 710, and the storing module 810 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or any other reasonable manner of integrating or packaging a circuit, or implemented as a suitable combination of three implementations of software, hardware, and firmware. Alternatively, at least one of the obtaining module 610, the querying module 620, the comparison screening module 630, and the locating module 640, the displaying module 710, and the storing module 810 may be at least partially implemented as a computer program module, which when executed by a computer, may perform the functions of the respective modules.
Since each module of the visual positioning apparatus for indoor navigation can be used to implement the steps of the exemplary embodiment of the visual positioning method for indoor navigation described above with reference to Fig. 4, for details not disclosed in the apparatus embodiments of the present disclosure, please refer to the embodiments of the visual positioning method for indoor navigation described above.
The specific implementation of each module, unit and subunit in the visual positioning apparatus for indoor navigation provided by the embodiments of the present disclosure may refer to the content in the visual positioning method for indoor navigation, and will not be described herein again.
It should be noted that although several modules, units and sub-units of the apparatus for action execution are mentioned in the above detailed description, such division is not mandatory. Indeed, the features and functionality of two or more modules, units and sub-units described above may be embodied in one module, unit and sub-unit, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module, unit and sub-unit described above may be further divided into embodiments by a plurality of modules, units and sub-units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A visual positioning method for indoor navigation is characterized by comprising the following steps:
acquiring an indoor query image;
acquiring feature points of the query image;
obtaining descriptors of the feature points of the query image;
querying an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map;
comparing each feature point of the query image with the feature points in each key image in the candidate key image set, and screening the feature points of the query image according to the ratio of the best to the second-best match to obtain the screened feature points of the query image;
determining the position and the posture when the query image is shot according to the screened feature points of the query image;
the indoor map comprises key image information for constructing the indoor map, feature point information in the key image for constructing the indoor map and three-dimensional map point information for constructing the indoor map.
2. The method of claim 1, wherein querying an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map comprises:
acquiring all key images in the indoor map that share feature points with the query image.
3. The method of claim 2, wherein querying an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map comprises:
acquiring, from all key images of the indoor map that share feature points with the query image, a specific proportion of key images ranked in descending order by the number of feature points shared with the query image.
4. The method of claim 3, wherein querying an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map comprises:
acquiring, for each key image in the specific proportion of key images, a group consisting of a specific number of adjacent key images ranked in descending order by the number of co-visible feature points among the key images of the indoor map.
5. The method of claim 4, wherein querying an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map comprises:
acquiring a specific proportion of the key image groups ranked in descending order by the sum, over each key image group, of the number of feature points its key images share with the query image, and taking all key images in that specific proportion of key image groups as the candidate key image set.
6. The method of claim 1, wherein screening the feature points of the query image according to the ratio of the best to the second-best match to obtain the screened feature points of the query image comprises:
when one feature point of the query image is compared with the feature points in one key image in the candidate key image set, if the ratio of the similarity between that feature point and its best-matching feature point in the key image to the similarity between that feature point and its second-best-matching feature point in the key image is greater than or equal to a specific ratio, that feature point of the query image passes the screening.
7. The method of claim 6, wherein screening the feature points of the query image according to the ratio of the best to the second-best match to obtain the screened feature points of the query image comprises:
finally retaining a feature point of the query image only if it passes the screening in the comparison with each key image in the candidate key image set.
8. A visual positioning device for indoor navigation, comprising:
an acquisition module configured to acquire an indoor query image, acquire feature points of the query image, and obtain descriptors of the feature points of the query image;
a query module configured to query an indoor map according to the descriptors of the query image to obtain a candidate key image set of the query image in the indoor map;
a comparison and screening module configured to compare each feature point of the query image with the feature points in each key image in the candidate key image set, and to screen the feature points of the query image according to the ratio of the best to the second-best match to obtain the screened feature points of the query image; and
a positioning module configured to determine the position and the posture when the query image is shot according to the screened feature points of the query image;
the indoor map comprises key image information for constructing the indoor map, feature point information in the key image for constructing the indoor map and three-dimensional map point information for constructing the indoor map.
9. An electronic device, comprising:
one or more processors;
a storage device configured to store one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-8.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
CN202110002614.3A 2021-01-04 2021-01-04 Visual positioning method for indoor navigation and related equipment Pending CN114119737A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110002614.3A CN114119737A (en) 2021-01-04 2021-01-04 Visual positioning method for indoor navigation and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110002614.3A CN114119737A (en) 2021-01-04 2021-01-04 Visual positioning method for indoor navigation and related equipment

Publications (1)

Publication Number Publication Date
CN114119737A (en) 2022-03-01

Family

ID=80359190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110002614.3A Pending CN114119737A (en) 2021-01-04 2021-01-04 Visual positioning method for indoor navigation and related equipment

Country Status (1)

Country Link
CN (1) CN114119737A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination