GB2532461A - System and method for position tracking in a head mounted display - Google Patents

System and method for position tracking in a head mounted display

Info

Publication number
GB2532461A
GB2532461A
Authority
GB
United Kingdom
Prior art keywords
environment
user
processor
image capture
image data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1420567.8A
Other versions
GB201420567D0 (en)
Inventor
Julian David Wright
Nicholas Giacomo Robert Colosimo
Christopher James Whiteford
Heather Jean Page
Mark Robert Goodall
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BAE Systems PLC
Original Assignee
BAE Systems PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BAE Systems PLC filed Critical BAE Systems PLC
Priority to GB1420567.8A
Publication of GB201420567D0
Priority to PCT/GB2015/053393
Publication of GB2532461A
Status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012 Head tracking input arrangements
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/0093 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00 with means for monitoring data relating to the user, e.g. head-tracking, eye-tracking
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01 Head-up displays
    • G02B27/017 Head mounted
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01 Head-up displays
    • G02B27/017 Head mounted
    • G02B2027/0178 Eyeglass type

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Optics & Photonics (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

A position tracking system for a head mounted display comprises at least one image capture device (106, Figure 2) mounted on the head mounted display, and a processor configured to perform a calibration process in which it receives image data from the image capture device(s) in respect of a user's environment 201, identifies discernible features 200 therein, such as corners and objects, and stores a three-dimensional representation of the environment in terms of those discernible features. After calibration, the processor receives image data from the image capture device(s) in respect of the user's environment, identifies one or more transformations between the discernible features, compares those transformations with the three-dimensional representation of the user's environment, and thereby tracks the position and/or orientation of the head mounted display (100) within the environment. The system may comprise a pair of image capture devices that capture respective images of the environment, from which the processor defines a depth map.

Description

SYSTEM AND METHOD FOR POSITION TRACKING IN A HEAD MOUNTED DISPLAY
This invention relates generally to a system and method for position tracking and, more particularly, to a system and method for position, orientation and/or motion tracking in respect of a head mounted display.
Head mounted displays are becoming increasingly common in various technological fields, including augmented and mixed reality systems in which images of a user's real world environment are captured, rendered and placed within a three-dimensional (3D) virtual reality environment displayed on a screen within the head mounted display. In alternative "augmented reality" systems, known as see-through imaging systems, the head mounted display screen may be transparent or translucent, such that the user's view of the external environment (through the screen) is organically incorporated into the 3D virtual reality environment displayed on the screen.
In both cases, in order to ensure that the virtual images or symbology displayed on the screen are congruent with the outside world, it is necessary to provide a positioning system that tracks the position and angle of the user's head. Of course, for many applications, the congruence between the virtual reality and real world images or views must be very precise in order to be effective, and this requires the absolute position and angle of the user's head to be accurately tracked in real time, so that the 3D virtual reality images can be updated and manipulated in real time. Prior art motion tracking systems have been proposed for this purpose, which comprise a plurality of external cameras, mounted at various fixed locations within the user's environment, that track infrared dots on the headset. Such systems, whilst effective in achieving the required motion tracking and absolute position determination, require a significant amount of additional hardware that needs to be fixed at the required positions within the user's environment. The cameras required for such systems may be fixedly mounted on a large rig or may, alternatively, be provided separately to be mounted in or on elements of the environment infrastructure but, either way, the additional hardware required is large, bulky and costly, and takes significant time to mount and configure correctly. This also leads to a lack of flexibility in terms of the environments in which the head mounted display can be used.
Aspects of the present invention seek to address at least some of these issues.
In accordance with an aspect of the present invention, there is provided a position tracking system for a head mounted display, the system comprising at least one image capture device mounted or mountable in or on said head mounted display, and a processor configured, in use, to: a) perform a calibration process in which it receives image data from said at least one image capture device in respect of a user's environment, identifies discernible features therein and stores a three-dimensional representation of said environment; and b) after said calibration process, receive image data from said at least one image capture device in respect of said user's environment, identify in said image data one or more transformations between said discernible features, compare said one or more transformations with said three-dimensional representation of said user's environment, and thereby track the position and/or orientation of said head mounted display within said environment.
In one exemplary embodiment, the system may comprise a pair of spatially separated image capture devices for capturing respective images of the real world environment in the vicinity of the user, said processor being configured to define a depth map using respective image frame pairs to produce three dimensional image data.
The processor may be configured to automatically identify predetermined discernible features within said captured images, and store data representative thereof, together with data representative of their relative locations within said user's environment.
The processor may be further configured to analyse said three dimensional representation of said user's environment in terms of said predetermined discernible features, determine if sufficient predetermined discernible features are included and, if not, provide an output indicative thereof.
The processor may, additionally or alternatively, be configured to identify predefined markers located within said user's environment and store data representative thereof, together with data representative of their relative locations within said environment.
In one exemplary embodiment, the system may further comprise a motion sensor, wherein said processor is configured to receive signals from said motion sensor, compare said signals with said identified transformations, and determine their correspondence.
Another aspect of the present invention extends to a mixed reality apparatus comprising a headset for placing over a user's eyes, in use, said headset including a screen, image capture means for capturing images of the real world environment in the vicinity of a user, and a processor configured to generate a selected three-dimensional virtual reality environment and blend images of said real world environment into said three-dimensional virtual reality environment to create a mixed reality environment and display said mixed reality environment on said screen, the processor being further configured, in use, to: a) perform a calibration process in which it receives image data from said at least one image capture device in respect of a user's environment, identifies discernible features therein and stores a three-dimensional representation of said environment in terms of said discernible features; and b) after said calibration process, receive image data from said at least one image capture device in respect of said user's environment, identify in said image data one or more transformations between said discernible features, compare said one or more transformations with said three-dimensional representation of said user's environment, and thereby track the position and/or orientation of said head mounted display within said environment.
The apparatus may further comprise a motion sensor mounted in or on said headset, wherein said processor is further configured to receive signals from said motion sensor, compare said signals with said identified transformations, and determine their correspondence.
The apparatus may comprise a pair of spatially separated image capture devices for capturing respective images of the real world environment in the vicinity of the user, said processor being configured to define a depth map using respective image frame pairs to produce three dimensional image data.
These and other aspects will become apparent from the following specific description in which embodiments of the present invention will be described, by way of examples only, and with reference to the accompanying drawings, in which: Figure 1 is a front perspective view of a headset for use in a mixed reality system with which a position tracking system according to an exemplary embodiment of the present invention may be used; Figure 2 is a schematic block diagram of a mixed reality system with which a position tracking system according to an exemplary embodiment of the present invention may be used; and Figure 3 is a schematic view of a real world environment and the markers used by a position tracking system according to an exemplary embodiment of the present invention, during a calibration process, to generate a three dimensional model of the environment.
Referring to Figure 1 of the drawings, a head mounted display, in respect of which a position tracking system according to an exemplary embodiment of the present invention may be used, comprises a headset comprising a visor 10 having a pair of arms 12 hingedly attached at opposing sides thereof in order to allow the visor to be secured onto a user's head, over their eyes, in use, by placing the curved ends of the arms 12 over and behind the user's ears, in a manner similar to conventional spectacles. It will be appreciated that, whilst the headset is illustrated herein in the form of a visor, it may alternatively comprise a helmet for placing over a user's head, or even a pair of contact lenses or the like, for placing within a user's eyes, and the present invention is not intended to be in any way limited in this regard. Also provided on the headset is a pair of image capture devices 14 for capturing images of the environment, such image capture devices being mounted so as to be aligned as closely as possible with the user's eyes, in use.
A mixed reality system, within which a position tracking system according to an exemplary embodiment of the invention may be employed, comprises a processor, which is communicably connected in some way to a screen which is provided inside the visor 10. Such communicable connection may be a hard wired electrical connection, in which case the processor and associated circuitry will also be mounted on the headset. However, in an alternative exemplary embodiment, the processor may be configured to wirelessly communicate with the visor, for example, by means of Bluetooth or similar wireless communication protocol, in which case, the processor need not be mounted on the headset but can instead be located remotely from the headset, with the relative allowable distance between them being dictated and limited only by the wireless communication protocol being employed. For example, the processor could be mounted on, or formed integrally with, the user's clothing, or instead located remotely from the user, either as a stand-alone unit or as an integral part of a larger control unit, for example.
Thus, referring to Figure 2 of the drawings, a mixed reality system of the type described above comprises, generally, a headset 100, incorporating a screen 102, a processor 104, and a pair of external digital image capture devices (only one shown) 106. The processor 104 is configured to generate a three dimensional virtual environment. The processor is further configured to receive image data, representative of the user's real world environment, from the image capture devices, and render and blend such image data into the three dimensional virtual environment, either selectively or in its entirety, as required by the application.
Digital video image frames of the user's real world environment 42, 50 are captured by the image capture devices provided on the headset 10, and two image capture devices are used in this exemplary embodiment to capture respective images such that the data representative thereof can be blended to produce a stereoscopic depth map which enables the processor to determine depth within the captured images without any additional infrastructure being required. The user can select, in some exemplary embodiments, portions or objects from the images to be blended into the virtual environment being displayed on the screen. In other embodiments, the user may initially be presented with a rendered and blended image of the entire real world environment and may, optionally, be able to select portions thereof to be removed from the displayed image. In all cases, the resultant displayed image is continuously updated as the user's field of view changes, either due to their own movement or movement of or within their environment.
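To make the stereoscopic depth map concrete, the following is a minimal sketch of how one could be computed from a captured left/right frame pair. This is an illustrative assumption rather than the patent's implementation: the use of OpenCV, the file names, and the focal length and baseline values (the baseline roughly matching the eye separation of the headset-mounted cameras) are all invented for the example.

    # Illustrative sketch (not the patent's implementation): a stereoscopic
    # depth map from a left/right frame pair using OpenCV's semi-global
    # block matcher. File names, intrinsics and baseline are assumptions.
    import cv2
    import numpy as np

    left = cv2.imread("left_frame.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right_frame.png", cv2.IMREAD_GRAYSCALE)

    # numDisparities must be a multiple of 16; blockSize must be odd.
    stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=96, blockSize=7)
    disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed point -> pixels

    # Depth from disparity: Z = f * B / d, with an assumed focal length f
    # (pixels) and baseline B (metres, roughly the eye separation).
    f, B = 700.0, 0.064
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = f * B / disparity[valid]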
The general concept of real time image blending for augmented reality is known, and several different techniques have been proposed. The present invention is not necessarily intended to be in any way limited in this regard. However, for completeness, one exemplary method for image blending will be briefly described. Thus, once an object has been selected from a real world image to be blended into the virtual environment, a threshold function may be applied in order to extract that object from the background image. Its relative location and orientation may also be extracted and preserved by means of marker data. Next, the image and marker data is converted to a binary image, possibly by means of adaptive thresholding (although other methods are known). The marker data and binary image are then transformed into a set of coordinates which match the location within the virtual environment in which they will be blended. Such blending is usually performed using black and white image data. Thus, if necessary, colour data sampled from the source image can be backward warped, using homography, to each pixel in the resultant virtual scene. All of these computational steps require minimal processing and time and can, therefore, be performed quickly and in real (or near real) time. Thus, if the selected object is moving, for example, a person, the corresponding image data within the virtual environment can be updated in real time.
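For illustration only, the sketch below walks through the blending steps just described: an adaptive threshold extracts a binary mask for the selected object, and a homography backward-warps the colour data into the virtual scene. The corner correspondences, file names and frame sizes are assumptions; a real system would derive the homography from the preserved marker data.

    # Hypothetical sketch of the blending steps described above.
    import cv2
    import numpy as np

    frame = cv2.imread("camera_frame.png")          # captured real world image
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Adaptive thresholding yields a binary mask separating the selected
    # object from the background image.
    mask = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)

    # Homography mapping the object's corners in the source image to its
    # target location in the virtual scene (coordinates are assumptions).
    src = np.float32([[100, 100], [400, 100], [400, 400], [100, 400]])
    dst = np.float32([[250, 180], [520, 200], [500, 470], [230, 450]])
    H, _ = cv2.findHomography(src, dst)

    # Backward-warp the colour data and the mask into the virtual scene,
    # then composite only the masked (object) pixels over the rendered frame.
    virtual = np.zeros_like(frame)                  # stand-in for the rendered virtual frame
    h, w = virtual.shape[:2]
    warped = cv2.warpPerspective(frame, H, (w, h))
    warped_mask = cv2.warpPerspective(mask, H, (w, h))
    blended = np.where(warped_mask[..., None] > 0, warped, virtual)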
In addition, a position tracking system is provided, to track the absolute position and orientation of the user's head relative to their real world environment, in order to update the displayed mixed reality environment in real time, as the user moves around, whilst ensuring accurate image coordination between the real world images and the virtual environment. Thus, in other words, the position tracking system is configured to enable absolute positioning coordination between the real and virtual environments.
As explained above, in prior art systems, the position tracking functionality is provided by large external rigs, which incorporate cameras for tracking infrared dots on the headset. In contrast, a position tracking system according to the present invention, once calibrated, uses discernible features or markers within the real world environment to track movement of the head mounted display therein.
In an exemplary embodiment of the invention, the position tracking system is integrated within the mixed reality system itself, and the processor 104 is configured accordingly. However, it will be appreciated that, in alternative exemplary embodiments, the position tracking system may be provided as a bolt-on or stand-alone application and the present invention is not necessarily intended to be limited in this regard.
Referring to Figure 3 of the drawings, a processor is configured to, initially, perform a calibration process in order to generate a "model" of the real world environment 201 in which the mixed reality system is to be used. Image data is required for this process and, in an exemplary embodiment, where a pair of image capture devices is already provided on the headset, the image data captured thereby can also be used for this purpose. However, it will be appreciated that dedicated image capture devices may be provided for this purpose, if necessary.
The calibration process may require the user, wearing the headset, to perform a 360° turn within the real world environment 201, or otherwise to perform an action that allows the image capture devices to capture sequential images of the entire space 201 to be monitored. The processor may be configured to automatically identify, using any suitable image recognition technique, discernible features 200 within the space 201, such as corners, objects, etc., and their relative locations therein (due to the sequential movement of the user during the calibration process), thereby enabling a three dimensional model of the space 201 to be generated and defined in terms of the detected discernible features 200 and their relative positions.
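A minimal sketch of such a calibration pass is given below, assuming OpenCV's ORB detector stands in for the unspecified "suitable image recognition technique" and that a depth map from the stereo pair is available; the camera intrinsics are invented. A full implementation would also chain the headset pose across frames (as in SLAM) so that features from the whole 360° sweep land in one common coordinate frame.

    # Hypothetical calibration sketch: detect corner-like features per
    # frame and back-project them into 3D using the stereo depth map.
    import cv2
    import numpy as np

    orb = cv2.ORB_create(nfeatures=500)
    fx = fy = 700.0
    cx, cy = 320.0, 240.0          # assumed camera intrinsics

    model_points = []              # 3D positions of discernible features
    model_descriptors = []         # their appearance descriptors

    def calibrate_frame(frame_gray, depth):
        """Detect discernible features in one calibration frame and record
        them, with their 3D positions, in the environment model."""
        keypoints, descriptors = orb.detectAndCompute(frame_gray, None)
        if descriptors is None:
            return
        for kp, desc in zip(keypoints, descriptors):
            u, v = int(kp.pt[0]), int(kp.pt[1])
            z = float(depth[v, u])             # metres, from the stereo pair
            if z > 0:
                # Pinhole back-projection of pixel (u, v) at depth z.
                model_points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
                model_descriptors.append(desc)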
Following the calibration process, as the user moves around the defined space 201, the processor is configured to detect the transformation in the position of the above-mentioned discernible features within the captured images, and thereby to determine any changes in orientation and position of the headset. Since the cameras used in this exemplary embodiment of the invention provide stereoscopic image data, as described above, the system may thus be configured to determine not only the vertical and horizontal translation of the discernible features 200, but also their depth translation.
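Continuing the hypothetical calibration sketch above, one assumed way to turn the observed feature transformations into a headset pose is a robust perspective-n-point solve against the stored model; the patent does not prescribe this particular method.

    # Continues the calibration sketch: orb, model_points and
    # model_descriptors are defined there; K holds the assumed intrinsics.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    K = np.array([[700.0, 0.0, 320.0],
                  [0.0, 700.0, 240.0],
                  [0.0, 0.0, 1.0]])

    def track_frame(frame_gray):
        """Match live features against the calibrated model and recover the
        headset's rotation and translation within the environment."""
        keypoints, descriptors = orb.detectAndCompute(frame_gray, None)
        if descriptors is None:
            return None
        matches = matcher.match(np.array(model_descriptors), descriptors)
        if len(matches) < 6:
            return None
        obj = np.float32([model_points[m.queryIdx] for m in matches])
        img = np.float32([keypoints[m.trainIdx].pt for m in matches])
        # RANSAC-robust PnP: camera (and hence headset) pose in model frame.
        ok, rvec, tvec, _inliers = cv2.solvePnPRansac(obj, img, K, None)
        return (rvec, tvec) if ok else None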
In some cases, particularly within a real world environment that does not contain enough discernible features to enable an accurate model of the environment to be generated during the calibration process, it may be necessary to provide additional measures to ensure that there is no reduction in tracking quality. Since tracking fidelity is directly related to virtual reality sickness and disorientation, such measures may, at least in some cases, be essential, and the processor may be configured to assess the quality of the model generated during an initial calibration process and, if necessary, provide an indication that further features are required to achieve the required tracking fidelity.
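One plausible form for such a quality assessment, sketched below under the same assumptions as the calibration example, is to require both a minimum feature count and a reasonable angular spread of features around the user; both thresholds are invented for illustration.

    MIN_FEATURES = 200              # assumed count threshold
    MIN_BEARING_COVERAGE = 0.75     # assumed fraction of bearings covered

    def model_quality_ok(points_3d):
        """Return True if the calibrated model should support reliable
        tracking; otherwise the system would prompt for extra markers."""
        pts = np.asarray(points_3d, dtype=np.float32)
        if len(pts) < MIN_FEATURES:
            return False
        # Horizontal bearing of each feature: features should surround the
        # user so that some are always in view as the head turns.
        bearings = np.arctan2(pts[:, 0], pts[:, 2])
        occupied = np.histogram(bearings, bins=16, range=(-np.pi, np.pi))[0] > 0
        return occupied.mean() >= MIN_BEARING_COVERAGE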
Improved tracking fidelity may be achieved in a number of different ways.
For example, additional markers 202 may be provided within the space 201 which may, for example, take the form of self-adhesive labels or the like that the user can apply to selected areas of the space 201, thereby enabling the system to track them in the same way as, and in addition to, the discernible features integral to the environment.
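Such labels could, for example, be printed fiducial markers. The sketch below uses OpenCV's ArUco module purely as an illustrative possibility: the patent does not specify a marker type, the module lives in the opencv-contrib package, and the API shown is the pre-4.7 form (newer releases use cv2.aruco.ArucoDetector).

    import cv2

    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

    def detect_markers(frame_gray):
        """Detect printed fiducial markers applied to the environment; each
        gives a stable ID plus corner locations that can be tracked alongside
        the environment's own discernible features."""
        corners, ids, _rejected = cv2.aruco.detectMarkers(frame_gray, dictionary)
        if ids is None:
            return {}
        return {int(i): c.reshape(4, 2) for i, c in zip(ids.flatten(), corners)}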
Accuracy can be further increased by combining the above-described visual tracking method with the output from an internal accelerometer and/or gyroscope provided in or on the headset. This would have the additional benefit of allowing the system to be configured to filter apparent errors between the accelerometer/gyroscope readings and the output from the visual tracking system, without significant adverse effect on the responsiveness of the system.
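A common way to realise this combination, shown below as an assumption rather than the patent's stated method, is a complementary filter: the gyroscope is trusted over short intervals (responsive but drifting), the visual tracker over long intervals (slower but drift-free), and large disagreements are flagged as apparent errors.

    ALPHA = 0.98                   # assumed gyro weight per update
    MAX_DISAGREEMENT = 0.1         # assumed error threshold (radians)

    def fuse_orientation(prev_angle, gyro_rate, dt, visual_angle):
        """Blend the integrated gyroscope rate with the absolute visual
        estimate for one axis of head orientation."""
        predicted = prev_angle + gyro_rate * dt            # fast, drifts
        if abs(predicted - visual_angle) > MAX_DISAGREEMENT:
            # Apparent error between the sensors: e.g. discard the visual
            # sample, or re-localise, depending on system policy.
            return predicted
        return ALPHA * predicted + (1.0 - ALPHA) * visual_angle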
It will be apparent to a person skilled in the art from the foregoing that modifications and variations can be made to the described embodiments without departing from the scope of the invention as claimed.

Claims (9)

  1. A position tracking system for a head mounted display, the system comprising at least one image capture device mounted or mountable in or on said head mounted display, and a processor configured, in use, to: a) perform a calibration process in which it receives image data from said at least one image capture device in respect of a user's environment, identifies discernible features therein and stores a three-dimensional representation of said environment in terms of said discernible features; and b) after said calibration process, receive image data from said at least one image capture device in respect of said user's environment, identify in said image data one or more transformations between said discernible features, compare said one or more transformations with said three-dimensional representation of said user's environment, and thereby track the position and/or orientation of said head mounted display within said environment.
  2. A system according to claim 1, comprising a pair of spatially separated image capture devices for capturing respective images of the real world environment in the vicinity of the user, said processor being configured to define a depth map using respective image frame pairs to produce three dimensional image data.
  3. A system according to claim 1 or claim 2, wherein said processor is configured to automatically identify predetermined discernible features within said captured images, and store data representative thereof, together with data representative of their relative locations within said user's environment.
  4. A system according to claim 3, wherein said processor is further configured to analyse said three dimensional representation of said user's environment in terms of said predetermined discernible features, determine if sufficient predetermined discernible features are included and, if not, provide an output indicative thereof.
  5. A system according to any of the preceding claims, wherein said processor is configured to identify predefined markers located within said user's environment and store data representative thereof, together with data representative of their relative locations within said environment.
  6. A system according to any of the preceding claims, further comprising a motion sensor, wherein said processor is configured to receive signals from said motion sensor, compare said signals with said identified transformations, and determine their correspondence.
  7. A mixed reality apparatus comprising a headset for placing over a user's eyes, in use, said headset including a screen, image capture means for capturing images of the real world environment in the vicinity of a user, and a processor configured to generate a selected three-dimensional virtual reality environment and blend images of said real world environment into said three-dimensional virtual reality environment to create a mixed reality environment and display said mixed reality environment on said screen, the processor being further configured, in use, to: a) perform a calibration process in which it receives image data from said at least one image capture device in respect of a user's environment, identifies discernible features therein and stores a three-dimensional representation of said environment in terms of said discernible features; and b) after said calibration process, receive image data from said at least one image capture device in respect of said user's environment, identify in said image data one or more transformations between said discernible features, compare said one or more transformations with said three-dimensional representation of said user's environment, and thereby track the position and/or orientation of said head mounted display within said environment.
  8. Apparatus according to claim 7, further comprising a motion sensor mounted in or on said headset, wherein said processor is further configured to receive signals from said motion sensor, compare said signals with said identified transformations, and determine their correspondence.
  9. Apparatus according to claim 7 or claim 8, comprising a pair of spatially separated image capture devices for capturing respective images of the real world environment in the vicinity of the user, said processor being configured to define a depth map using respective image frame pairs to produce three dimensional image data.
GB1420567.8A 2014-11-19 2014-11-19 System and method for position tracking in a head mounted display Withdrawn GB2532461A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB1420567.8A GB2532461A (en) 2014-11-19 2014-11-19 System and method for position tracking in a head mounted display
PCT/GB2015/053393 WO2016079471A1 (en) 2014-11-19 2015-11-09 System and method for position tracking in a head mounted display

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1420567.8A GB2532461A (en) 2014-11-19 2014-11-19 System and method for position tracking in a head mounted display

Publications (2)

Publication Number Publication Date
GB201420567D0 GB201420567D0 (en) 2014-12-31
GB2532461A (en) 2016-05-25

Family

ID=52248608

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1420567.8A Withdrawn GB2532461A (en) 2014-11-19 2014-11-19 System and method for position tracking in a head mounted display

Country Status (2)

Country Link
GB (1) GB2532461A (en)
WO (1) WO2016079471A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997011386A1 (en) * 1995-09-21 1997-03-27 Omniplanar, Inc. Method and apparatus for determining position and orientation
US20040104935A1 (en) * 2001-01-26 2004-06-03 Todd Williamson Virtual reality immersion system
US20050203380A1 (en) * 2004-02-17 2005-09-15 Frank Sauer System and method for augmented reality navigation in a medical intervention procedure
US20070273610A1 (en) * 2006-05-26 2007-11-29 Itt Manufacturing Enterprises, Inc. System and method to display maintenance and operational instructions of an apparatus using augmented reality
US20130201291A1 (en) * 2012-02-08 2013-08-08 Microsoft Corporation Head pose tracking using a depth camera

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100208033A1 (en) * 2009-02-13 2010-08-19 Microsoft Corporation Personal Media Landscapes in Mixed Reality


Also Published As

Publication number Publication date
WO2016079471A1 (en) 2016-05-26
GB201420567D0 (en) 2014-12-31

Similar Documents

Publication Publication Date Title
US11796309B2 (en) Information processing apparatus, information processing method, and recording medium
US10269177B2 (en) Headset removal in virtual, augmented, and mixed reality using an eye gaze database
KR102417177B1 (en) Head-mounted display for virtual and mixed reality with inside-out positional, user body and environment tracking
US11481982B2 (en) In situ creation of planar natural feature targets
JP6258953B2 (en) Fast initialization for monocular visual SLAM
CN107507243A (en) A kind of camera parameters method of adjustment, instructor in broadcasting's video camera and system
IL308285B1 (en) System and method for augmented and virtual reality
US10838515B1 (en) Tracking using controller cameras
CN106168855B (en) Portable MR glasses, mobile phone and MR glasses system
US11212501B2 (en) Portable device and operation method for tracking user's viewpoint and adjusting viewport
CN112655202A (en) Reduced bandwidth stereo distortion correction for fisheye lens of head-mounted display
CN110895676A (en) Dynamic object tracking
US11627303B2 (en) System and method for corrected video-see-through for head mounted displays
US20190028690A1 (en) Detection system
US20230122636A1 (en) Apparatus and method for localisation and mapping
EP3051386A1 (en) Eye tracking system and method
JP6467039B2 (en) Information processing device
KR20160099981A (en) Virtual reality device based on two-way recognition including a three-dimensional marker with a patten for multiple users
WO2016079471A1 (en) System and method for position tracking in a head mounted display
Lin Lightweight and Sufficient Two Viewpoint Connections for Augmented Reality
KR101680882B1 (en) Camera arrangement for recording super multi-view image array
Nakamura et al. A Mutual Motion Capture System for Face-to-face Collaboration.

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)