NZ743271B2

NZ743271B2 - Relocalization systems and methods

Info

Publication number: NZ743271B2
Application number: NZ743271A
Authority: NZ
Inventors: Tomasz J Malisiewicz; Andrew Rabinovich; Brigit Schroeder
Original assignee: Magic Leap Inc
Priority date: 2015-12-04
Filing date: 2016-12-05
Publication date: 2021-09-28

Abstract

The present disclosure relates to the localisation of pose-sensitive systems. One disclosed embodiment relates to a method of determining a pose of an image capture device. The method includes capturing an image using an image capture device. The method also includes generating a data structure corresponding to the captured image. The method further includes comparing the data structure with a plurality of known data structures to identify a most similar known data structure. The method further includes reading metadata corresponding to the most similar known data structure to determine a pose of the image capture device. The method further includes training a neural network by mapping a plurality of known images to the plurality of known data structures. The data structure is an N dimensional vector. The step of generating the data structure corresponding to the captured image comprises using a neural network to map the captured image to the N dimensional vector. Each known image of the plurality has respective metadata including pose data. The step of training the neural network comprises accessing a database of the known images annotated with the respective metadata, decreasing a first Euclidean distance between first and second known N dimensional vectors respectively corresponding to matching first and second known images in an N dimensional space, and increasing a second Euclidean distance between first and third known N dimensional vectors respectively corresponding to non-matching first and third known images in the N dimensional space.

Description

RELOCALIZATION SYSTEMS AND METHODS

Cross-Reference to Related Application

This application claims priority to U.S. Provisional Application Serial Number 62/263,529 filed on December 4, 2015 entitled “RELOCALIZATION SYSTEMS AND METHODS,” under attorney docket number 9512-30061.00 US.

The present application includes subject matter similar to that described in U.S. Utility Patent Application Serial No. ,042 filed on May 9, 2016 entitled “DEVICES, METHODS AND SYSTEMS FOR BIOMETRIC USER RECOGNITION UTILIZING NEURAL NETWORKS,” under attorney docket number ML.20028.00.

The contents of the aforementioned patent application are hereby expressly and fully incorporated by reference in their entirety, as though set forth in full.

The subject matter herein may be employed and/or utilized with various systems, such as those wearable computing systems and components thereof designed by organizations such as Magic Leap, Inc. of Fort Lauderdale, Florida. The following documents are hereby expressly and fully incorporated by reference in their entirety, as though set forth in full: U.S. Patent Application Serial Number 14/641,376; U.S. Patent Application Serial Number 14/555,585; U.S. Patent Application Serial Number 14/205,126; U.S. Patent Application Serial Number 14/212,961; U.S. Patent Application Serial Number 14/690,401; U.S. Patent Application Serial Number 13/663,466; and U.S. Patent Application Serial Number 13/684,489.

Field of the Invention

The present disclosure relates to devices, methods and systems for localization of pose sensitive systems. In particular, the present disclosure relates to devices, methods and systems for relocalization of pose sensitive systems that have either lost and/or have yet to establish a system pose.

Background

An increasing number of systems require pose information for the systems to function optimally. Examples of systems that require pose information for optimal performance include, but are not limited to, robotic and mixed reality (MR) systems (i.e., virtual reality (VR) and/or augmented reality (AR) systems). Such systems can be collectively referred to as “pose sensitive” systems. One example of pose information is spatial information along six degrees of freedom that locates and orients the pose sensitive system in three-dimensional space.

Pose sensitive systems may become “lost” (i.e., lose track of the system pose) after various events. Some of these events include:
1. Rapid camera motion (e.g., in an AR system worn by a sports participant);
2. Occlusion (e.g., by a person walking into a field of view);
3. Motion-blur (e.g., with rapid head rotation by an AR system user);
4. Poor lighting (e.g., blinking lights);
5. Sporadic system failures (e.g., power failures); and
6. Featureless environments (e.g., rooms with plain walls).
Any of these events and many others can drastically affect feature-based tracking such as that employed by current simultaneous localization and mapping (“SLAM”) systems with robust tracking front-ends, thereby causing these systems to become lost.

Accordingly, relocalization (i.e., finding a system’s pose in a map when the system is “lost” in a space that has been mapped) is a challenging and key aspect of real-time visual tracking. Tracking failure is a critical problem in SLAM systems, and a system’s ability to recover (or relocalize) relies upon its ability to accurately recognize a location that it has previously visited.

The problem of image based localization in robotics is commonly referred to as the Lost Robot problem (or the Kidnapped Robot problem). The Lost Robot problem is also related to both the Wake-up Robot problem and Loop Closure detection. The Wake-up Robot problem involves a system being turned on for the first time. Loop Closure detection involves a system that is tracking successfully, revisiting a previously visited location. In Loop Closure detection, the image localization system must recognize that the system has visited the location before.

Such Loop Closure detections help prevent localization drift and are important when building 3D maps of large environments. Accordingly, pose sensitive system localization is useful in situations other than lost system scenarios.

MR systems (e.g., AR systems) have even higher localization requirements than typical robotic systems. The devices, methods and systems for localizing pose sensitive systems described and claimed herein can facilitate optimal function of all pose sensitive systems.

Summary

In one embodiment directed to a method of determining a pose of an image capture device, the method includes capturing an image using an image capture device. The method also includes generating a data structure corresponding to the captured image. The method further includes comparing the data structure with a plurality of known data structures to identify a most similar known data structure. Moreover, the method includes reading metadata corresponding to the most similar known data structure to determine a pose of the image capture device.

In one or more embodiments, the data structure is a compact representation of the captured image. The data structure may be an N dimensional vector. The data structure may be a 128 dimensional vector.

In one or more embodiments, generating the data structure corresponding to the captured image includes using a neural network to map the captured image to the N dimensional vector. The neural network may be a convolutional neural network.

[0012] In one or more embodiments, each of the plurality of known data structures is a respective known N dimensional vector in an N dimensional space.

Each of the plurality of known data structures may be a respective known 128 dimensional vector in a 128 dimensional space.

The data structure may be an N dimensional vector. Comparing the data structure with the plurality of known data structures to identify the most similar known data structure may include determining respective Euclidean distances between the N dimensional vector and each respective known N dimensional vector.

Comparing the data structure with the plurality of known data structures to identify the most similar known data structure may also include identifying a known N dimensional vector having a smallest distance to the N dimensional vector as the most similar known data structure.

In one or more embodiments, the method also includes training a neural network by mapping a plurality of known images to the plurality of known data structures. The neural network may be a convolutional neural network. Training the neural network may include modifying the neural network based on comparing a pair of known images of the plurality.

In one or more embodiments, training the neural network comprises modifying the neural network based on comparing a triplet of known images of the plurality. Each known image of the plurality may have respective metadata, including pose data. Training the neural network may include accessing a database of the known images annotated with the respective metadata. The pose data may encode a translation and a rotation of a camera corresponding to a known image.

In one or more embodiments, each known data structure of the plurality is a respective known N dimensional vector in an N dimensional space. A first known image of the triplet may be a matching image for a second known image of the triplet. A third known image of the triplet may be a non-matching image for the first known image of the triplet. A first Euclidean distance between respective first and second pose data corresponding to the matching first and second known images may be less than a predefined threshold. A second Euclidean distance between respective first and third pose data corresponding to the non-matching first and third known images may be more than the predefined threshold.

In one or more embodiments, training the neural network includes decreasing a first Euclidean distance between first and second known N dimensional vectors respectively corresponding to the matching first and second known images in an N dimensional space. Training the neural network may also include increasing a second Euclidean distance between first and third known N dimensional vectors respectively corresponding to the non-matching first and third known images in the N dimensional space.

In one or more embodiments, the method also includes comparing the data structure with the plurality of known data structures to identify the most similar known data structure in real time. The metadata corresponding to the most similar known data structure may include pose data corresponding to the most similar known data structure. The method may also include determining a pose of the image capture device from the pose data in the metadata of the most similar known data structure.

In another embodiment, there is provided a method of determining a pose of an image capture device. The method comprises capturing an image using an image capture device, generating a data structure corresponding to the captured image, comparing the data structure with a plurality of known data structures to identify a most similar known data structure, reading metadata corresponding to the most similar known data structure to determine a pose of the image capture device, and training a neural network by mapping a plurality of known images to the plurality of known data structures. The data structure is an N dimensional vector. The step of generating the data structure corresponding to the captured image comprises using a neural network to map the captured image to the N dimensional vector. Each known image of the plurality has respective metadata including pose data. The step of training the neural network comprises accessing a database of the known images annotated with the respective metadata, decreasing a first Euclidean distance between first and second known N dimensional vectors respectively corresponding to matching first and second known images in an N dimensional space, and increasing a second Euclidean distance between first and third known N dimensional vectors respectively corresponding to non-matching first and third known images in the N dimensional space.

Brief Description of the Drawings

[0020] The drawings illustrate the design and utility of various embodiments of the present invention. It should be noted that the figures are not drawn to scale and that elements of similar structures or functions are represented by like reference numerals throughout the figures. In order to better appreciate how to obtain the recited and other advantages and objects of various embodiments of the invention, a more detailed description of the present inventions briefly described above will be rendered by reference to specific embodiments thereof, which are illustrated in the accompanying drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

Figure 1 is a schematic view of a query image and six known images for a localization/relocalization system, according to one embodiment;

Figure 2 is a schematic view of an embedding of an image to a data structure, according to one embodiment;

[0023] Figure 3 is a schematic view of a method for training a neural network, according to one embodiment;

Figure 4 is a schematic view of data flow in a method for localizing/relocalizing a pose sensitive system, according to one embodiment;

Figure 5 is a flow chart depicting a method for localizing/relocalizing a pose sensitive system, according to one embodiment.

Detailed Description

Various embodiments of the invention are directed to methods, systems, and articles of manufacture for localizing or relocalizing a pose sensitive system (e.g., an augmented reality (AR) system) in a single embodiment or in multiple embodiments. Other objects, features, and advantages of the invention are described in the detailed description, figures, and claims.

Various embodiments will now be described in detail with reference to the drawings, which are provided as illustrative examples of the invention so as to enable those skilled in the art to practice the invention. Notably, the figures and the examples below are not meant to limit the scope of the present invention. Where certain elements of the present invention may be partially or fully implemented using known components (or methods or processes), only those portions of such known components (or methods or processes) that are necessary for an understanding of the present invention will be described, and the detailed descriptions of other portions of such known components (or methods or processes) will be omitted so as not to obscure the invention. Further, various embodiments encompass present and future known equivalents to the components referred to herein by way of illustration.

Localization/Relocalization Systems and Methods

Various embodiments of augmented reality display systems have been discussed in co-owned U.S. Utility Patent Application Serial Number 14/555,585 filed on November 27, 2014 under attorney docket number MLUS and entitled “VIRTUAL AND AUGMENTED REALITY SYSTEMS AND METHODS,” and co-owned U.S. Prov. Patent Application Serial Number 62/005,834 filed on May 30, 2014 under attorney docket number ML 30017.00 and entitled “METHODS AND SYSTEM FOR CREATING FOCAL PLANES IN VIRTUAL AND AUGMENTED REALITY.” The contents of the aforementioned U.S. patent applications are hereby expressly and fully incorporated herein by reference as though set forth in full.

Localization/relocalization systems may be implemented independently of AR systems, but many embodiments below are described in relation to AR systems for illustrative purposes only.

Disclosed are devices, methods and systems for localizing/relocalizing pose sensitive systems. In one embodiment, the pose sensitive system may be a head-mounted AR display system. In other embodiments, the pose sensitive system may be a robot. Various embodiments will be described below with respect to localization/relocalization of a head-mounted AR system, but it should be appreciated that the embodiments disclosed herein may be used independently of any existing and/or known AR system.

For instance, when an AR system “loses” its pose tracking after it experiences one of the disruptive events described above (e.g., rapid camera motion, occlusion, motion-blur, poor lighting, sporadic system failures, featureless environments, and the like), the AR system performs a relocalization procedure according to one embodiment to reestablish the pose of the system, which is needed for optimal system performance. The AR system begins the relocalization procedure by capturing one or more images using one or more cameras coupled thereto. Next, the AR system compares a captured image with a plurality of known images to identify a known image that is the closest match to the captured image. Then, the AR system accesses metadata for the closest match known image including pose data, and reestablishes the pose of the system using the pose data of the closest match known image.

[0031] Figure 1 depicts a query image 110, which represents an image captured by the lost AR system. Figure 1 also depicts a plurality (e.g., six) of known images 112a-112f, against which the query image 110 is compared. The known images 112a-112f may have been recently captured by the lost AR system. In the embodiment depicted in Figure 1, known image 112a is the closest match known image to the query image 110. Accordingly, the AR system will reestablish its pose using the pose data associated with known image 112a. The pose data may encode a translation and a rotation of a camera corresponding to the closest match known image 112a. However, comparing a large number (e.g., more than 10,000) of image pairs on a pixel-by-pixel basis is computationally intensive. This limitation renders a pixel-by-pixel comparison prohibitively inefficient for real time (e.g., 60 or more frames per second) pose sensitive system relocalization. Accordingly, Figure 1 only schematically depicts the image comparison for system relocalization.

According to one embodiment, the query image (e.g., query image 110) and the plurality of known images (e.g., known images 112a-112f) are transformed into data structures that are both easier to process and compare, and easier to store and organize. In particular, each image is “embedded” by projecting the image into a lower dimensional manifold where triangle inequality is preserved.

Triangle inequality is the geometric property wherein for any three points not on a line, the sum of any two sides is greater than the third side.
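Stated as a formula for the embedding space, the property can be written as follows. The symbols x, y, z (three embedded vectors) and d (Euclidean distance) are notation introduced here for illustration and do not appear in the disclosure; equality holds only when the three points lie on a line.

```latex
d(x, z) \le d(x, y) + d(y, z),
\qquad
d(x, y) = \lVert x - y \rVert_2 = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}
```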

In one embodiment, the lower dimensional manifold is a data structure in the form of an N dimensional vector. In particular, the N dimensional vector may be a 128 dimensional vector. Such a 128 dimensional vector strikes an effective balance between size of the data structure and ability to analyze images represented by the data structure. Varying the number of dimensions of N dimensional vectors for an image based localization/relocalization method can affect the speed of similarity metric computation and end-to-end training (described below). All other factors being equal, the lowest dimensional representation is preferred. Using 128 dimensional vectors results in a lean, yet robust embedding for image based localization/relocalization methods. Such vectors can be used with convolutional neural networks, rendering the localization/relocalization system improvable with new data, and efficiently functional on new data sets.

Figure 2 schematically depicts the embedding of an image 210 through a series of analysis/simplification/reduction steps 212. The image 210 may be a 120 pixel x 160 pixel image. The result of the operations in step 212 on the image 210 is an N dimensional vector 214 (e.g., 128 dimensional) representing the image 210.

While the embodiments described herein utilize a 128 dimensional vector as a data structure, any other data structure, including vectors with a different number of dimensions, can represent the images to be analyzed in localization/relocalization systems according to the embodiments herein.

For localization/relocalization, this compact representation of an image (i.e., an embedding) may be used to compare the similarity of one location to another by comparing the Euclidean distance between the N dimensional vectors. A network of known N dimensional vectors corresponding to known training images, trained with both indoor and outdoor location based datasets (described below), may be configured to learn visual similarity (positive images) and dissimilarity (negative images). Based upon this learning process, the embedding is able to successfully encode a large degree of appearance change for a specific location or area in a relatively small data structure, making it an efficient representation of locality in a localization/relocalization system.
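The comparison of two embeddings by Euclidean distance can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the function name and the toy 4 dimensional vectors are hypothetical (embeddings in the described embodiment would be 128 dimensional and produced by the trained network).

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy embeddings; real ones would come from the network.
query = [0.1, 0.9, 0.0, 0.4]
same_place = [0.2, 0.8, 0.1, 0.4]    # assumed view of the same location
other_place = [0.9, 0.1, 0.7, 0.0]   # assumed view of a different location

d_same = euclidean_distance(query, same_place)
d_other = euclidean_distance(query, other_place)

# A smaller distance is read as greater visual similarity.
print(d_same < d_other)
```

Because each comparison touches only N numbers rather than every pixel, the cost per comparison is independent of image resolution.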

Network Training

Networks must be trained before they can be used to efficiently embed images into data structures. Figure 3 schematically depicts a method for training a network 300 using image triplets 310, 312, 314, according to one embodiment. The network 300 may be a convolutional neural network 300. The network training system uses a query image 312, a positive (matching) image 310, and a negative (non-matching) image 314 for one cycle of training. The query and positive images 312, 310 in Figure 3 each depict the same object (i.e., a person), perhaps from different points of view. The query and negative images 312, 314 in Figure 3 depict different objects (i.e., people). The same network 300 learns all of the images 310, 312, 314, but is trained to make the scores 320, 322 of the two matching images 310, 312 as close as possible and the score 324 of the non-matching image 314 as different as possible from the scores 320, 322 of the two matching images 310, 312.

This training process is repeated with a large set of images.

[0038] When training is complete, the network 300 maps different views of the same image close together and different images far apart. This network can then be used to encode images into a nearest neighbor space. When a newly captured image is analyzed (as described above), it is encoded (e.g., into an N dimensional vector). Then the localization/relocalization system can determine the distance to the captured image’s nearest other encoded images. If it is near to some encoded image(s), it is considered to be a match for that image(s). If it is far from some encoded image, it is considered to be a non-match for that image. As used in this application, “near” and “far” include, but are not limited to, relative Euclidean distances between two poses and/or N dimensional vectors.

[0039] Learning the weights of the neural network (i.e., the training algorithm) includes comparing a triplet of known data structures of a plurality of known data structures. The triplet consists of a query image, positive image, and negative image. A first Euclidean distance between respective first and second pose data corresponding to the query and positive images is less than a predefined threshold, and a second Euclidean distance between respective first and third pose data corresponding to the query and negative images is more than the predefined threshold. The network produces a 128 dimensional vector for each image in the triplet, and an error term is non-zero if the negative image is closer (in terms of Euclidean distance) to the query image than the positive. The error is propagated through the network using a neural network backpropagation algorithm. The network can be trained by decreasing a first Euclidean distance between first and second 128 dimensional vectors corresponding to the query and positive images in an N dimensional space, and increasing a second Euclidean distance between first and third 128 dimensional vectors respectively corresponding to the query and negative images in the N dimensional space. The final configuration of the network is achieved after passing a large number of triplets through the network.
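The error term for a single triplet can be sketched as a hinge on the difference between the two embedding distances. This is an illustrative assumption rather than the disclosed formula: the text only states that the error is non-zero when the negative is closer to the query than the positive. The function names and the optional `margin` parameter (zero by default) are hypothetical, and the backpropagation step that would update the network weights is not shown.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_error(query, positive, negative, margin=0.0):
    """Hinge-style triplet error (assumed form).

    With margin == 0 the error is non-zero exactly when the negative
    embedding is closer to the query than the positive embedding.
    """
    d_pos = euclidean_distance(query, positive)
    d_neg = euclidean_distance(query, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2 dimensional embeddings for illustration.
query = [0.0, 0.0]
positive = [1.0, 0.0]        # distance 1 from the query
negative_far = [3.0, 4.0]    # distance 5 from the query
negative_near = [0.5, 0.0]   # distance 0.5 from the query

error_ok = triplet_error(query, positive, negative_far)
error_bad = triplet_error(query, positive, negative_near)
print(error_ok, error_bad)
```

Minimizing this error over many triplets pulls the query and positive embeddings together and pushes the query and negative embeddings apart, which is the behavior the paragraph above describes.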

It is desirable for an appearance based relocalization system generally to be invariant to changes in viewpoint, illumination, and scale. The deep metric learning network described above is suited to solving the problem of appearance-invariant relocalization. In one embodiment, the triplet convolutional neural network model embeds an image into a lower dimensional space where the system can measure meaningful distances between images. Through the careful selection of triplets, consisting of three images that form an anchor-positive pair of similar images and an anchor-negative pair of dissimilar images, the convolutional neural network can be trained for a variety of locations, including changing locations.
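One way triplets might be selected from pose-annotated images is sketched below, using the pose-distance threshold described for matching and non-matching images. This is a simplified assumption: only camera translation is used as pose data (rotation is ignored), and the function names, the exhaustive search, and the sample positions are hypothetical.

```python
import math


def pose_distance(p, q):
    """Euclidean distance between two camera positions (translation only)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def select_triplets(positions, threshold):
    """Return (query, positive, negative) index triplets.

    A positive lies closer to the query than the threshold in pose space;
    a negative lies farther from the query than the threshold.
    """
    triplets = []
    n = len(positions)
    for q in range(n):
        for p in range(n):
            if p == q or pose_distance(positions[q], positions[p]) >= threshold:
                continue
            for neg in range(n):
                if neg != q and pose_distance(positions[q], positions[neg]) > threshold:
                    triplets.append((q, p, neg))
    return triplets


# Hypothetical camera positions for three known images.
positions = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
triplets = select_triplets(positions, threshold=1.0)
print(triplets)
```

The exhaustive triple loop is only practical for small sets; a larger database would likely need sampling, which is outside what the text specifies.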

While the training embodiment described above uses triplets of images, network training according to other embodiments may utilize other pluralities of images (e.g., pairs and quadruplets). For image pair training, a query image may be sequentially paired with positive and negative images. For image quadruplet training, a quadruplet should include at least a query image, a positive image, and a negative image. The remaining image may be an additional positive or negative image based on the intended application for which the network is being trained. For localization/relocalization, which typically involves more non-matches than matches, the fourth image in quadruplet training may be a negative image.

While the training embodiment described above uses a single convolutional neural network, other training embodiments may utilize multiple operatively coupled networks. In still other embodiments, the network(s) may be other types of neural networks with backpropagation.

Exemplary Network Architecture

An exemplary neural network for use with localization/relocalization systems according to one embodiment has 3x3 convolutions and a single fully connected layer. This architecture allows the system to take advantage of emerging hardware acceleration for popular architectures and the ability to initialize from ImageNet pre-trained weights. This 3x3 convolutions architecture is sufficient for solving a wide array of problems with the same network architecture.

This exemplary neural network architecture includes 8 convolutional layers and 4 max pooling layers, followed by a single fully connected layer of size 128. A max pooling layer is disposed after every two convolutional blocks, ReLU is used for the non-linearity, and BatchNorm layers are disposed after every convolution. The final fully connected layer maps a blob of size [8x10x128] to a 128x1 vector, and a custom differentiable normalization provides the final embedding.
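The [8x10x128] blob size is consistent with a 120 x 160 input passed through four pooling stages, as the arithmetic sketch below shows. Several points are assumptions not stated in the text: 2x2 pooling with stride 2, rounding up at odd sizes, convolutions padded so they preserve spatial size, and the use of L2 normalization for the final step. The function names are hypothetical.

```python
import math


def pooled_size(size, num_pools, stride=2):
    """Spatial size after repeated stride-2 pooling, rounding up (assumed)."""
    for _ in range(num_pools):
        size = math.ceil(size / stride)
    return size


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vector]


height, width = 120, 160   # input image size mentioned for Figure 2
channels = 128             # channel count of the final blob
num_pools = 4              # one pooling layer after every two convolutions

out_h = pooled_size(height, num_pools)   # 120 -> 60 -> 30 -> 15 -> 8
out_w = pooled_size(width, num_pools)    # 160 -> 80 -> 40 -> 20 -> 10
flattened = out_h * out_w * channels     # inputs to the fully connected layer
fc_weights = flattened * 128             # weights mapping the blob to 128 values

print(out_h, out_w, flattened, fc_weights)
print(l2_normalize([3.0, 4.0]))
```

Under these assumptions the fully connected layer holds most of the spatially dependent parameters, since it maps 10,240 inputs to 128 outputs.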

Localization/Relocalization Systems and Methods

[0045] Now that the training of the convolutional neural network according to one embodiment has been described, Figures 4 and 5 depict two similar methods 400, 500 of localizing/relocalizing a pose sensitive system according to two embodiments.

Figure 4 schematically depicts a query image 410, which is embedded by a neural network 412 into a corresponding query data structure 414. The query image 410 may have been acquired by a lost pose sensitive system for use in relocalization. The neural network 412 may be a trained convolutional neural network (see 300 in Figure 3). The query data structure 414 may be a 128 dimensional vector.

[0047] The query data structure 414 corresponding to the query image 410 is compared to a database 416 of known data structures 418a-418e. Each known data structure 418a-418e is associated in the database 416 with corresponding metadata 420a-420e, which includes pose data for the system which captured the known image corresponding to the known data structure 418. The result of the comparison is identification of the nearest neighbor (i.e., best match) to the query data structure 414 corresponding to the query image 410. The nearest neighbor is the known data structure (e.g., the known data structure 418b) having the shortest relative Euclidean distance to the query data structure 414.

After the nearest neighbor known data structure, 418b in this embodiment, has been identified, the associated metadata 420b is transferred to the system. The system can then use the pose data in the metadata 420b to localize/relocalize the previously lost pose sensitive system.
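The lookup of Figure 4 can be sketched as a linear nearest neighbor search over a small in-memory database. This is a minimal sketch under stated assumptions: the `KnownEntry` class, the `relocalize` function, the dictionary layout of the pose metadata (translation plus rotation angles in degrees), and the 3 dimensional toy vectors are all hypothetical stand-ins for the known data structures 418 and metadata 420.

```python
import math
from dataclasses import dataclass


@dataclass
class KnownEntry:
    vector: list   # known data structure (embedding of a known image)
    pose: dict     # metadata: pose of the system that captured the image


def euclidean_distance(a, b):
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relocalize(query_vector, database):
    """Return the pose metadata of the entry nearest to the query vector."""
    if not database:
        raise ValueError("database of known data structures is empty")
    nearest = min(database, key=lambda e: euclidean_distance(query_vector, e.vector))
    return nearest.pose


# Hypothetical database with toy embeddings and pose metadata.
database = [
    KnownEntry([0.0, 0.0, 1.0], {"translation": (0.0, 0.0, 0.0), "rotation": (0.0, 0.0, 0.0)}),
    KnownEntry([1.0, 0.0, 0.0], {"translation": (2.0, 0.5, 0.0), "rotation": (0.0, 90.0, 0.0)}),
    KnownEntry([0.0, 1.0, 0.0], {"translation": (5.0, 0.0, 1.0), "rotation": (0.0, 180.0, 0.0)}),
]

query_vector = [0.9, 0.1, 0.0]   # embedding of the query image (assumed)
pose = relocalize(query_vector, database)
print(pose)
```

A linear scan is used here for clarity; the text does not say how the database is indexed, so any faster search structure would be a separate design choice.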

Figure 5 is a flow chart depicting a method 500 of image based localization/relocalization. At step 502, a pose sensitive system without pose information captures an image. At step 504, the system compares the captured image with a plurality of known images. At step 506, the system identifies the known image that is the closest match to the captured image. At step 508, the system accesses pose metadata for the closest match known image. At step 510, the system generates pose information for itself from the pose metadata for the closest match known image.

Relocalization using a triplet convolutional neural network outperforms current relocalization methods in both accuracy and efficiency.

Image Based Mapping

When a localizing/relocalizing system is used to form a map based on encoded images, the system obtains images of a location, encodes those pictures using the triplet network, and locates the system on the map based on a location corresponding to the closest image(s) to the obtained images.

Various exemplary embodiments of the invention are described herein.

Reference is made to these examples in a non-limiting sense. They are provided to illustrate more broadly applicable aspects of the ion. Various s may be made to the invention bed and equivalents may be substituted without departing from the true spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation, material, composition of , s, process act(s) or step(s) to the objective(s), spirit or scope of the present invention. Further, as will be appreciated by those with skill in the art that each of the individual ions described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present inventions. All such modifications are intended to be within the scope of claims associated with this disclosure.

The invention es methods that may be performed using the subject devices. The methods may comprise the act of providing such a suitable device. Such provision may be performed by the end user. In other words, the “providing” act merely requires the end user obtain, access, approach, position, set- up, activate, power-up or otherwise act to provide the requisite device in the subject method. Methods recited herein may be carried out in any order of the recited events which is logically possible, as well as in the d order of events.

Exemplary aspects of the invention, er with details ing material selection and manufacture have been set forth above. As for other details of the present invention, these may be appreciated in connection with the abovereferenced s and publications as well as generally known or appreciated by those with skill in the art. The same may hold true with respect to method-based aspects of the invention in terms of additional acts as commonly or logically employed.

In addition, though the invention has been described in reference to several examples optionally incorporating various features, the invention is not to be limited to that which is described or indicated as contemplated with respect to each variation of the invention. Various changes may be made to the invention described and equivalents (whether recited herein or not included for the sake of some brevity) may be substituted without departing from the true spirit and scope of the invention.

In addition, where a range of values is provided, it is understood that every intervening value, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention.

Also, it is contemplated that any optional feature of the inventive variations described may be set forth and claimed independently, or in combination with any one or more of the features described herein. Reference to a singular item includes the possibility that there are plural of the same items present. More specifically, as used herein and in claims associated hereto, the singular forms “a,” “an,” “said,” and “the” include plural referents unless specifically stated otherwise.

In other words, use of the articles allows for “at least one” of the subject item in the description above as well as claims associated with this disclosure. It is further noted that such claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.

Without the use of such exclusive terminology, the term “comprising” in claims associated with this disclosure shall allow for the inclusion of any additional element, irrespective of whether a given number of elements are enumerated in such claims, or the addition of a feature could be regarded as transforming the nature of an element set forth in such claims. Except as specifically defined herein, all technical and scientific terms used herein are to be given as broad a commonly understood meaning as possible while maintaining claim validity.

The breadth of the present invention is not to be limited to the examples provided and/or the subject specification, but rather only by the scope of claim language associated with this disclosure.

In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. For example, the above-described process flows are described with reference to a particular ordering of process actions. However, the ordering of many of the described process actions may be changed without affecting the scope or operation of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense.

Claims

1. A method of determining a pose of an image capture device, comprising:
capturing an image using an image capture device;
generating a data structure corresponding to the captured image;
comparing the data structure with a plurality of known data structures to identify a most similar known data structure;
reading metadata corresponding to the most similar known data structure to determine a pose of the image capture device; and
training a neural network by mapping a plurality of known images to the plurality of known data structures,
wherein the data structure is an N dimensional vector,
wherein generating the data structure corresponding to the captured image comprises using a neural network to map the captured image to the N dimensional vector,
wherein each known image of the plurality has respective metadata including pose data, and
wherein training the neural network comprises:
accessing a database of the known images annotated with the respective metadata;
decreasing a first Euclidean distance between first and second known N dimensional vectors respectively corresponding to matching first and second known images in an N dimensional space; and
increasing a second Euclidean distance between first and third known N dimensional vectors respectively corresponding to non-matching first and third known images in the N dimensional space.

2. The method of claim 1, wherein the data structure is a compact representation of the captured image.

3. The method of claim 1 or 2, wherein the neural network is a convolutional neural network.

4. The method of any one of claims 1 to 3, wherein the data structure is a 128 dimensional vector.

5. The method of claim 1, wherein each of the plurality of known data structures is a respective known N dimensional vector in an N dimensional space.

6. The method of claim 5, wherein comparing the data structure with the plurality of known data structures to identify the most similar known data structure comprises:
determining respective Euclidean distances between the N dimensional vector and each respective known N dimensional vector, and
identifying a known N dimensional vector having a smallest distance to the N dimensional vector as the most similar known data structure.

7. The method of claim 1, wherein each of the plurality of known data structures is a respective known 128 dimensional vector in a 128 dimensional space.

8. The method of claim 1, wherein training the neural network comprises modifying the neural network based on comparing a triplet of known images of the plurality.

9. The method of claim 8, wherein each known data structure of the plurality is a respective known N dimensional vector in an N dimensional space,
wherein a first known image of the triplet is a matching image for a second known image of the triplet, and
wherein a third known image of the triplet is a non-matching image for the first known image of the triplet.

10. The method of any one of claims 1 to 9, wherein the pose data comprises a translation and a rotation of a camera corresponding to a known image.

11. The method of claim 1, wherein a first Euclidean distance between respective first and second pose data corresponding to the matching first and second known images is less than a predefined threshold, and
wherein a second Euclidean distance between respective first and third pose data corresponding to the non-matching first and third known images is more than the predefined threshold.

12. The method of any one of claims 1 to 11, further comprising comparing the data structure with the plurality of known data structures to identify the most similar known data structure in real time.

[FIG. 5: flowchart. Capture image with system; compare captured image with known images; identify known image closest match to captured image; access pose metadata for closest match known image; generate system pose from pose metadata.]
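For illustration only, the training relationships recited in claims 1 and 11 can be sketched as follows. The claims state only that the distance between matching embeddings is decreased and the distance between non-matching embeddings is increased; the hinge-style loss, the `margin` value, the use of unsquared distances, and the treatment of pose data as a single flat vector are assumptions made for this sketch, and all function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_matching(pose_a: Sequence[float], pose_b: Sequence[float],
                threshold: float) -> bool:
    """Label two known images as matching when their pose data lie closer
    together than a predefined threshold (pose as a flat vector: assumption)."""
    return euclidean(pose_a, pose_b) < threshold


def triplet_loss(first: Sequence[float], second: Sequence[float],
                 third: Sequence[float], margin: float = 0.2) -> float:
    """Hinge-style triplet loss (assumed form).

    `first` and `second` are embeddings of matching images; `third` is the
    embedding of a non-matching image. Minimizing this value decreases the
    first-to-second distance and increases the first-to-third distance until
    the two differ by at least `margin`.
    """
    d_match = euclidean(first, second)
    d_non_match = euclidean(first, third)
    return max(0.0, d_match - d_non_match + margin)


if __name__ == "__main__":
    # Already well separated: the loss is zero, so no update is needed.
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
    # Non-matching image is closer than the matching one: positive loss.
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
```

In a training loop, the gradient of such a loss with respect to the network parameters would drive the update; that step depends on the chosen framework and is omitted here.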