CA3058447A1 - A system for generating and displaying interactive mixed-reality video on mobile devices - Google Patents

A system for generating and displaying interactive mixed-reality video on mobile devices

Info

Publication number
CA3058447A1
Authority
CA
Canada
Prior art keywords
mixed reality
user
reality system
optical instrument
electronic device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA3058447A
Inventor
Milan Baic
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spacecard Inc
Original Assignee
Spacecard Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spacecard Inc filed Critical Spacecard Inc
Priority to CA3058447A priority Critical patent/CA3058447A1/en
Priority to PCT/CA2020/000115 priority patent/WO2021068052A1/en
Publication of CA3058447A1 publication Critical patent/CA3058447A1/en
Abandoned legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01Head-up displays
    • G02B27/017Head mounted
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/16Constructional details or arrangements
    • G06F1/1613Constructional details or arrangements for portable computers
    • G06F1/163Wearable computers, e.g. on a belt
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01Head-up displays
    • G02B27/0179Display position adjusting means not related to the information to be displayed
    • G02B2027/0187Display position adjusting means not related to the information to be displayed slaved to motion of at least a part of the body of the user, e.g. head, eye

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Hardware Design (AREA)
  • Optics & Photonics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

ABSTRACT
A mixed reality system for generating and displaying mixed-reality video allowing a user to view and interact with the real world through the use of target(s) and a wearable device.
The wearable device includes a personal electronic device with a camera configured to capture real video in order to determine the path of the target(s) based on the real video.
Implementations can enable users to perform gestures with the target(s), each gesture corresponding to a function. The wearable device also includes an optical instrument which uniformly redirects light traversing it, allowing the mixed reality environment to be displayed to the user perpendicular to the user's field of vision while allowing the camera of the personal electronic device to capture the user's hands in a natural position.

Description

A SYSTEM FOR GENERATING AND DISPLAYING INTERACTIVE
MIXED-REALITY VIDEO ON MOBILE DEVICES
FIELD OF THE INVENTION
The present invention relates generally to a system for generating and displaying mixed-reality video, and more particularly to systems for generating and displaying mixed-reality video using a personal electronic device and low cost hardware to enable a user to ergonomically perform gestures to manipulate objects in a mixed reality environment.
BACKGROUND OF THE INVENTION
Mixed reality is the merging of real and virtual worlds to produce new environments and visualizations where physical and digital objects co-exist and interact in real time. Mixed reality is a hybrid of reality and virtual reality, encompassing both augmented reality and augmented virtuality via immersive technology. Mixed reality systems often include a headset, which incorporates a display, and optionally other interaction technology, into the framework of the headset to present a mixed reality environment directly in front of the user's eyes.
Mixed reality systems are used in a variety of different applications. Common to all of these applications is that an environment with some level of virtualization is presented to the user. This environment may extend from nearly real (e.g. a live video stream overlaid with virtual objects) to almost entirely virtual (e.g. computer animated but influenced by real-world objects). Mixed reality not only overlays virtual objects on the real world, but anchors them to real-world objects and allows the user to interact with combined virtual and real objects. The specific environment presented by a given mixed reality system may vary depending on the application.
In low cost mixed reality systems, the display and interaction technology is implemented using commercial off-the-shelf technology, such as tablet computers, smart phones or other personal electronic devices. These devices commonly include one or more cameras and provide sufficient computing power to adequately implement the mixed reality environment. The personal electronic device is often attached to or integrated into the headset of the mixed reality system. This allows the display of the personal electronic device to align with the user's eyes and the camera to align directly in front, such that images and video are recorded at the user's eye level (i.e. towards the horizon).

In order to blend virtuality with reality, physical elements (e.g., physical objects or people) are dynamically integrated into the mixed reality space. These elements are presented to the user by way of the personal electronic device such that the user can interact with physical elements in real time. Real-world sensor information (e.g., video, gyroscopes, etc.) is collected by the mixed reality system and is used to provide external inputs to control a mixed reality environment and provide context for the mixed reality view.
To enable interaction, mixed reality systems generally require controllers.
These controllers are operated by users and sensed by the mixed reality system.
Mixed reality systems often include sensors that measure position, motion, orientation and various other environmental conditions of certain tracked objects. In low cost mixed reality systems that use personal electronic devices, the primary sensor is typically the device's camera. Although cost effective, using the camera to measure environmental conditions, by way of the camera's video, creates ergonomic challenges. Assuming that the display of the personal electronic device is perpendicular to the user's line of sight, and by consequence the field of view of the camera is in line with the user's line of sight, the user is either: 1) forced to operate the controllers in a raised position in order to keep the controller(s) in the field of view of the camera; or 2) required to tilt their head downwards and operate the controllers in a natural position. Both cases create ergonomic issues when using the controllers.
SUMMARY OF THE INVENTION
The invention is a mixed reality system for a user to wear, to view and interact with mixed reality video. The system includes a headset, a plurality of gestures and a personal electronic device.
The headset is configured to be worn by the user, on the user's head, and includes a harness and optical instrument. The headset is further configured to receive the personal electronic device and maintain the display of the personal electronic device in the field of view of the optical instrument at an angled position. The headset's harness is configured to hold the personal electronic device proximate to the user's eyes. The harness may include an enclosure, having an open and closed position, to hold the personal electronic device.
The harness may further include a magnet that, when the enclosure is in the closed position, magnetically locks the enclosure such that the personal electronic device is held in the enclosure. The optical
instrument is located proximate to the user's eyes, between the user's eyes and the personal electronic device, and has a field of view. The optical instrument is configured to uniformly redirect light traversing the optical instrument to angle the field of view presented to the user.
The angle of redirection may be within a range of 10 to 50 degrees or a narrower range of 20 to 40 degrees. The optical instrument may include a prism or a mirror. The optical instrument may even be a combination lens with a Fresnel prism in combination with a Fresnel lens. The harness may include an elastic material which forms a loop around the user's head creating tension in the elastic material to attach the headset to the individual's head.
Each of the plurality of predefined gestures corresponds with a function. A
user can manipulate the targets in the field of view of the optical instrument to execute predefined gestures. The gestures may correspond to virtual object manipulation. The plurality of targets may include a physical controller with recognizable patterns. The physical controller may attach magnetically to the headset. The recognizable patterns may:
• include curved two-dimensional shapes;
• be oriented in a triangle pattern on the physical controller; and
• be of a plurality of predefined colors within a predefined color space.
The plurality of predefined colors are base colors within the predefined color space.
The predefined color space is preferably a hue-saturation-value, hue-saturation-intensity or hue-saturation-luminosity color space.
The personal electronic device has a camera, display, and a computer processor. The camera is configured to capture real video. The display is located in the field of view of the optical instrument and is configured to present to the user mixed reality video which includes real video captured from the camera and virtual objects.
The computer processor is configured to perform the following functions.
First, the computer processor is configured to process real video captured by the camera to identify one or more targets within the real video. Second, the computer processor is configured to analyze the movement of the one or more identified targets. Third, the computer processor is configured to determine whether the movement of the one or more targets matches any of the predefined gestures. Fourth, when the movement of the one or more targets matches one of the
predefined gestures, the computer processor is configured to modify the mixed reality video according to the function corresponding to the matched gesture.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a perspective view of an embodiment of the headset with the personal electronic device pocket in the closed position.
Figure 2 is a perspective view of the headset thereof with the personal electronic device pocket in the open position.
Figure 3 is a front elevation view of the headset thereof with the personal electronic device pocket in the closed position.
Figure 4 is a top view of the headset thereof with the personal electronic device pocket in the closed position.
Figure 5 is a back elevation view of the headset thereof with the personal electronic device pocket in the closed position.
Figure 6 is a side elevation view of the headset thereof with the personal electronic device pocket in the closed position, the side panels removed and the right controller in place.
Figure 7 is a side elevation view of the headset thereof with the right controller offset from its place on the side paneling.
Figure 8 is a front view of an embodiment of controllers.
Figure 9 is a perspective view of the headset in use. The controllers are held by the user. The stippled lines indicate the field of view of the user's mixed reality.
Figure 10 is a side view of the headset with the angles of the lens and personal electronic device relative to the facial plane indicated.
DETAILED DESCRIPTION OF THE INVENTION
The present disclosure describes systems and methods for implementing interactive mixed reality spaces.
The invention is a system and method for providing a low cost, ergonomic solution for providing interactive mixed reality spaces. Through connecting a personal electronic device,
the user may interact with the mixed reality space as it is implemented on the device. The headset worn by the user allows for the mixed reality space to be presented to the user. The headset is designed to allow the display of the personal electronic device to be visible immediately in front of the user's eyes. The headset is also configured to include an optical instrument. This instrument allows the camera of the personal electronic device to obtain real video of the controllers being held in a natural position while displaying the mixed reality environment perpendicular to the user's line of sight. The real video is then analyzed by the personal electronic device to detect gestures. Each gesture corresponds to a function. Once a gesture is detected, the associated function is performed by the personal electronic device and the mixed reality space is updated.
The personal electronic device includes a front mounted display, one or more rear facing cameras and one or more programmable processors. The user is presented with the mixed reality space on the display of the personal electronic device. The display is held in front of the user's eyes, immersing the user in the mixed reality space.
To keep the display in place, the headset holds the display of the personal electronic device in a fixed position relative to the user's eyes. In effect, the headset attaches the personal electronic device to the user's head.
The headset includes two key components:
1) a harness configured to hold a personal electronic device proximate to the user's eyes; and 2) an optical instrument.
The headset is preferably constructed primarily of polypropylene.
Polypropylene is preferred due to its low cost and flexibility. However, the headset may be constructed out of any suitable material including wood, metal, plastics, fabrics, animal hide or any combination of materials. The headset may optionally include additional features and components. For instance, the headset may include detachable controllers.
The harness comprises an attachment mechanism that allows the harness to be worn by the user. In one embodiment, a headband suspension system is used in combination with a
pocket for receiving the personal electronic device. However, any suitable attachment mechanism may be used.
The headband suspension system may be a loop formed from an elastic material which allows the suspension system to be worn around the user's head. When worn by the user, the loop operates to fix the position of the pocket, the personal electronic device and the optical instrument in front of the user's eyes through tension. The loop stretches to encircle the user's head, and is easily adjustable should the user need to position the headset prior to or during engagement with the virtual space.
The pocket is an enclosure that may be formed from a movable outer panel attached to two of the headset's surfaces. The movable outer panel may be fixed to one surface and removably attached through three magnetic closures to another. The pocket is used to hold a personal electronic device and is not necessarily fully enclosed. There are openings exposing the display and camera. The display openings found beneath the flap of the enclosure allow the user to immerse himself or herself in the virtual space when the headband suspension system is worn and the mixed reality system is in use. The camera opening allows for the camera of the personal electronic device to sense the environment and capture video in front of the user.
The mixed reality system makes use of an optical instrument. The optical instrument is located between the user's eyes and the display of the personal electronic device.
Alternatively, the optical instrument may be located in front of the camera of the personal electronic device such that the camera is located in the field of view of the optical instrument.
The purpose of the optical instrument, when used in the system, is to perform a non-dispersive adjustment to light traversing the instrument such that the field of view presented by the optical instrument appears to be displaced vertically. The optical instrument in a preferred embodiment is a combination lens with a Fresnel prism in combination with a Fresnel lens.
However, other optical instruments that perform non-dispersive adjustments to light traversing the instrument may be used. In alternative embodiments, the optical instrument may be a mirror or a lens with non-dispersive prism correction. The optical instrument preferably adjusts the angle of transmission of light traversing the optical instrument within the range of 10 to 50 degrees, and more preferably within the range of 20 to 40 degrees.
The headset may optionally include additional components such as detachable controllers. In one embodiment, these controllers attach magnetically to the headset and are removable by the user. The controllers, when detached and in use by the user, may be tracked by the system allowing the user to perform gestures. The controllers have three patterns, representing keypoints, printed on their faces. Keypoints are points in an image that define what is interesting or what stands out in the image. These patterns, when captured through video, are tracked by software executing computer vision algorithms on the personal electronic device. The patterns are tracked by the software to detect and recognize gestures, with each gesture translating to a change or function in the mixed reality system. By executing a predetermined set of hand gestures with the controllers, the user is able to interact with the mixed reality space.
Figure 1 provides an illustration of a preferred embodiment of the headset from a side-angled, closed pocket perspective. The user's personal electronic device in the embodiment is secured in a pocket 100 at the front of the headset. The pocket can be easily opened and closed by the user to insert and remove their personal electronic device by lifting a flap 101. The flap 101 is attached to the outer face 102 of the pocket 100 which contains a camera hole 103. The camera hole 103 is positioned such that it will align with a commonly sized personal electronic device's camera when the device is placed in the pocket 100.
The side of the pocket 104 is not enclosed. The side panels of the headset include protrusions 105 which extend from the two panels 106 on either side, to form a barrier on either side of the pocket 100. The protrusions 105 extend across the space between the inner pocket face 104, and the outer pocket face 102 to form a barrier which assist in holding the personal electronic device in place. The side panels 106 are connected to a top panel 107, which is situated perpendicular to the pocket 100, and the side panels 106. The top panel 107 attaches magnetically to the flap 101 in order to close the pocket 100. Each side panel 106 is connected to a visor panel 108. Where each of the side panels 106 meet with the corresponding visor panels 108, there is a seam 109. The seam 109 is flexible and allows each of the visor panels 108 to angle outwards when worn. The visor panel 108 includes a magnetic backing which allows each controller 110 to easily attach and detach.
The headband suspension system 111 is attached to each of the visor panels 108. The suspension system operates by creating elastic tension in the headband, such that the headset is held in place by the two attachment points on the visor panels. This tension also works to hold the display of the personal electronic device, which is attached to the headset in a fixed position relative to the user's eyes.
Figure 2 is a depiction of the preferred embodiment of the headset depicted by Figure 1 from a slight side angle with the pocket in an open position. Three magnets 200 are included as part of the interior of the flap 101. These magnets allow the flap to attach magnetically to the top panel 107. The flap 101 is connected to an outer panel 102 of the pocket 100. The flap 101 attaches to three magnets 201 of opposite polarity on the top panel 107 of the headset. This magnetic mechanism operates to allow the user to open and close the pocket 100 by gently lifting the flap 101, or placing it on the top panel 107.
The interior face of the outer panel 107 has a thin gel-like polymer 202. The gel-like polymer 202 is also found on the center of a narrow panel 203 between the inner 204 and outer panels 107 of the pocket 100. This gel-like polymer 202 provides a sticky surface to hold a personal electronic device in the pocket 100. The inner panel 204 of the pocket 100 has two rectangular-shaped holes 205. These holes correspond to the position of the user's eyes and allow the user to see the display on the personal electronic device.
Figure 3 provides a front view of the preferred embodiment of the headset depicted by Figure 1 with the pocket 100 in its closed position. A cutout 300 in the outer panel 107 allows the camera of the personal electronic device to capture video of the surroundings. The protrusions 105 of the side panel are also visible and extend at an angle in front of the device.
Figure 4 provides a top view of the preferred embodiment of the headset depicted by Figure 1 with the pocket 100 in its closed position. Facial padding 400 extends across the top of a pair of eyeholes 401 on the facial panel 402. Between the eyeholes 401 and the front pocket is the optical instrument 403. The facial padding 400 curves around the bottom outer edges of the eyeholes 401 along the sides of the facial panel 402. The facial padding 400 also lines the interior of each visor panel 108.
The facial padding 400 extends beyond the base of the facial panel 402 towards the center of a triangular cutout 404 along the center of the base of the facial panel 402. The
triangular cutout 404 is oriented such that the tip is pointing upwards towards the top panel 107. The facial padding 400 within the triangular cutout 404 (the "nose padding") is shaped in two semi-circular extensions. The extensions rest against the sides of the user's nose, allowing for a more comfortable experience. The extensions are flexible and protect the bridge of the nose through the use of a small strip of padding.
Figure 5 is a rear view of the preferred embodiment of the headset depicted by Figure 1 with the personal electronic device pocket 100 in a closed position. The suspension system is constructed of an elastic material connected to the visor panels 108. The visor panels 501 are approximately one inch away from, and parallel to, the facial panel 402.
The suspension system 502 is positioned such that the bottom edge of the suspension system aligns with the top of the eyeholes 505 on the facial panel. The suspension system 502 is sized to comfortably fit a user's head. The suspension system forms a loop such that it can wrap behind the user's head. The suspension system, when worn by the user, maintains tension on the visor panels 108. This allows the headset to remain securely on the user's face.
Figure 6 is a side cross-sectional view of the preferred embodiment of the headset depicted by Figure 1 with the pocket 100 in the closed position. The optical instrument 403 is slightly angled away from eyeholes 401 in the facial panel 402 towards the pocket 100. The optical instrument 403 is the same shape as the facial panel 403 situated in line with the eyeholes 401. The optical instrument 403 attaches to the inside of the headset through the use of two mechanisms. First, the optical frame 600 attaches to the interior of the top panel 107.
The magnets 200, which also function to close the flap 101, are positioned about one inch from an inner pocket panel 204 of the pocket 100 in the center of the top panel 107. Second, the optical instrument 403 is attached using two screws 601 positioned towards the bottom of the optical instrument 403.
Figure 7 is an elevated side view of the preferred embodiment of the headset depicted by Figure 1 with the right controller 110 offset from its place on the right visor panel. There is a controller on each side of the headset. The controllers 110 attach to the magnetic visor panels 108. A triangular panel 700, connected to the side panel, is attached to the visor panel. The triangular panel attaches to the visor panel using a hook-and-loop fastener (also commonly referred to as velcro). One side of the hook-and-loop fastener is attached to the visor panel 108
and the other corresponding piece of the hook-and-loop fastener is attached to the triangular panel 700.
Figure 8 depicts the inner face of the controllers for the preferred embodiment of the headset depicted by Figure 1. Both controllers 110 have a white face surrounding three keypoints 801. The keypoints are arranged in a triangular pattern shaped like an arrow head pointing towards the top of the controllers 110. The right and left controllers have alternating blue and red keypoints to facilitate low resolution processing of the mixed reality world. The left controller has a red keypoint at the top of the triangular pattern and a blue keypoint on each of the sides of the triangular pattern. Similarly, the right controller has a blue keypoint at the top of the triangular pattern and a red keypoint on each of the sides of the pattern. The inner face of both the left and right controllers also includes a raised texture 801.
Operation of the System
The system may operate in a single user mode and a group presentation mode. In single user mode, the user inserts their personal electronic device into the pocket of the headset, and wears the headset on their head to view and interact with mixed reality video.
This mode immerses the user in the mixed reality environment by converting the display of the mixed-reality device into a stereoscopic dual display that renders a mixed reality environment for each eye. The accelerometers and gyroscopes of the smartphone are used to track the user's head position. The user interacts with the mixed reality video by executing predefined hand gestures with the controllers. These gestures represent actions or functions to be performed in the virtual space.
In a group presentation mode, the system operates similarly to the single user mode except for two key differences. First, in group presentation mode, the display of the personal electronic device is broadcast to an external monitor. Second, the user may wear the headset in a location other than on the user's head (such as around the user's neck) while still using the rear-facing camera of the phone to track the user's hands in real-time. Using the system in this manner allows the user to broadcast their mixed reality environment, allowing others to view and participate in the experience.
In both the single and group presentation modes, the mixed reality system can broadly be described to operate by:

• presenting a virtual space to the user;
• operating controllers;
• sensing the environment;
• tracking the user controllers;
• detecting a gesture; and
• updating the virtual space with the function corresponding to the gesture.
Presenting a virtual space to the user
The virtual space is implemented in software, executing on the one or more programmable processors of the personal electronic device. In one embodiment, the virtual space is implemented using the Unity game engine, a suite of visual development tools and reusable software components. The Unity game engine facilitates the creation of virtual spaces in both 2D and 3D. For 3D spaces, the Unity 3D game engine allows specification of texture compression, mipmaps, and resolution settings for each platform that the game engine supports. The Unity game engine also provides support for bump mapping, reflection mapping, parallax mapping, screen space ambient occlusion, dynamic shadows using shadow maps, render-to-texture and full-screen post-processing effects.
The software uses computer vision processing algorithms that make use of the camera and flash light or torch of the personal electronic device to process and track the left and right controllers in real-time and in varied lighting conditions. Controllers in sight of the camera (i.e.
within the interaction area) and detected by the software are represented by virtual "hands"
with reticles that the user can then use to interact with the virtual space in real-time. This technology converts the camera into a virtual reality sensor.
Once the controllers are in sight of the camera or within the interaction area, the "virtual hands" are displayed in the virtual reality space on the personal electronic device. The user is then able to interact with virtual objects through the use of their virtual hands. These hands are recreated virtually in a Cartesian plane as two 'pointers' positioned in 3 dimensional space and offer the user the following 6 degrees of freedom:
1. moving forward and backward on the X-axis;

2. moving left and right on the Y-axis;
3. moving up and down on the Z-axis;
4. tilting side to side on the X-axis;
5. tilting forward and backward on the Y-axis; and
6. turning left and right on the Z-axis.
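For illustration only, the tracked pose of each virtual hand could be held in a simple structure with three translational and three rotational components. This is a sketch, not part of the described implementation; the axis naming follows the list above and the field names are assumptions.

    from dataclasses import dataclass

    @dataclass
    class VirtualHandPose:
        """Six-degree-of-freedom pose of one virtual hand pointer."""
        x: float = 0.0       # forward/backward translation
        y: float = 0.0       # left/right translation
        z: float = 0.0       # up/down translation
        tilt_x: float = 0.0  # side-to-side tilt, in degrees
        tilt_y: float = 0.0  # forward/backward tilt, in degrees
        turn_z: float = 0.0  # left/right turn, in degrees

    # Example: the right pointer moved slightly forward and turned to the right.
    left_hand = VirtualHandPose()
    right_hand = VirtualHandPose(x=0.2, turn_z=15.0)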
In group presentation mode, the display of the personal electronic device is broadcast to an external monitor. In a preferred embodiment, this is done using a wireless protocol such as Wi-Fi, 5G or AirPlay.
Operating Controllers
The user operates the controllers in response to information provided by the virtual environment. The user may perform a set of gestures, each corresponding to a function within the virtual space. These gestures are performed by the user using one or more controllers. In a preferred embodiment, these controllers are physical with blue and red keypoints. However, other types of controllers may be used with other embodiments of the invention. A controller may be any target with recognizable patterns and may even be the user's hand.
The system is designed such that the interaction zone, the zone where the personal electronic device may detect the user controllers, allows the user to operate the controllers in a natural position. The natural position is the posture the body assumes when it is at rest. The natural position is characterized by the following:
• The user's controllers (or targets) are predominantly held below the shoulders in a relaxed position;
• The user's head should be pulled back so that the user's ears are in line with the user's shoulders;
• The user's chin should be slightly tucked in so that the top of the head points upwards;
• The user's shoulders should be pulled back and down;
• The user's arms should be relaxed by the sides; and
• The user's hands should be rotated to a relaxed position.
Operating the controllers in a natural position is associated with reduced strain on the user's muscles, tendons, ligaments and joints. This serves to improve user comfort in operating the system.
Figure 9 illustrates the user operating the controllers in the preferred embodiment depicted by Figure 1. The user 900 is operating two controllers 110 in the interaction zone 901 to perform gestures. The personal electronic device is capturing real video of the interaction zone, through the camera hole 103, to track the controllers and detect gestures.
The relative angle of the interaction zone with respect to the user's facial plane enables users to perform gestures in a natural position. The user is not required to perform gestures above their shoulders or otherwise contort their body to perform gestures.
Sensing the environment
The mixed reality system makes use of various sensors provided by the personal electronic device. These sensors include cameras, accelerometers and gyroscopes.
The front facing camera is used to detect whether the personal electronic device has been placed in the pocket. The user, when first initializing the system, must load the app on the personal electronic device and provide certain information. Once the user has provided the necessary information to initialize or initiate the app, the app directs the user to place the personal electronic device in the pocket of the headset. The software then captures video from the front camera and, when the software detects low light conditions indicating that the device is securely in the pocket, the app completes the initialization process and presents the virtual space to the user.
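As an illustrative sketch of this pocket-insertion check only, the low-light test could be implemented by averaging the brightness of frames from the front camera; the brightness threshold, frame count and the OpenCV-style capture interface are assumptions, not taken from the description above.

    import cv2
    import numpy as np

    LOW_LIGHT_THRESHOLD = 12   # mean 8-bit brightness; assumed value
    REQUIRED_DARK_FRAMES = 30  # consecutive dark frames before confirming insertion

    def device_in_pocket(front_camera, threshold=LOW_LIGHT_THRESHOLD,
                         required=REQUIRED_DARK_FRAMES):
        """Return True once the front camera sees sustained low light."""
        dark_frames = 0
        while dark_frames < required:
            ok, frame = front_camera.read()
            if not ok:
                return False
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if np.mean(gray) < threshold:
                dark_frames += 1
            else:
                dark_frames = 0   # light detected; device not yet seated in the pocket
        return True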
The rear facing camera of the personal electronic device captures real video from the environment and makes the video available for the processor of the personal electronic device to identify and track the user's controllers. The rear facing camera must capture video in a variety of different lighting environments and ensure that each frame of the video is exposed properly. In order to ensure adequate video quality, the system may optionally manually control the personal electronic device's native camera sensor settings. In a preferred embodiment, an ultra-high shutter speed (i.e. 1/800th of a second or faster) with high ISO ranges (i.e. ISO 1000 or higher) and manual focus is used.
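For illustration, the same intent can be expressed through OpenCV capture properties; on an actual phone these settings would be made through the platform camera API (e.g. Camera2 or AVFoundation), and whether a given OpenCV backend honors each property, and in what units, varies by device. The numeric values below simply restate the preferred embodiment or are placeholders.

    import cv2

    cap = cv2.VideoCapture(0)                # camera index is device-specific
    cap.set(cv2.CAP_PROP_AUTO_EXPOSURE, 1)   # request manual exposure; value meaning is backend-dependent
    cap.set(cv2.CAP_PROP_EXPOSURE, 1.0 / 800.0)  # ~1/800 s shutter; units are backend-dependent
    cap.set(cv2.CAP_PROP_ISO_SPEED, 1000)    # ISO 1000 or higher
    cap.set(cv2.CAP_PROP_AUTOFOCUS, 0)       # disable autofocus
    cap.set(cv2.CAP_PROP_FOCUS, 50)          # fixed manual focus; placeholder value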
Specifically, the rear facing camera captures video of the interaction zone enabling the software to detect and track the location of the controllers. Figure 10 illustrates the angle of the interaction zone relative to the user's facial plane in the preferred embodiment depicted by Figure 1. Figure 10 depicts the interaction zone 900, which is offset from the user's field of view by an angle 1000. The pocket 100 is angled downwards so that the camera of the personal electronic device, when operating from within the pocket, can capture video of the user manipulating the controllers in a natural hand position. However, the optical instrument 403 adjusts the incident angle of the display so that the display appears to be oriented along the user's facial plane 1001.
The gyroscope and accelerometer of the personal electronic device are used to measure the orientation and any rotation of the personal electronic device. The gyroscope can monitor and adjust the display position, orientation, direction, angular motion and rotation. The accelerometer senses acceleration, tilt, tilt angle, incline, rotation, vibration and collision. This information is collected and used by the software to help determine the position and orientation of the personal electronic device and to make any necessary adjustments in the mixed reality space. Changes to the position and orientation of the personal electronic device represent changes to the user's perspective. For instance, the rotation of the personal electronic device may represent the user turning their head in a particular direction.
The position and orientation of objects within the mixed reality space may be adjusted differently depending on their frame of reference. In the mixed reality space, objects may be stationary, move with the user, maintain a fixed position in the user's field of view, or may be held in the user's virtual hands. When head tracking with the gyroscope and accelerometer, the system has to correct for the angle that the camera of the personal electronic device is offset from a horizontal plane so that even if the personal electronic device is naturally angled below the horizontal plane, the system adjusts the orientation of the mixed reality space so that it appears that the user is looking straight ahead. Without this correction the user would view the mixed reality space as if they were looking downwards.
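A minimal sketch of that offset correction, assuming the device orientation is available as pitch/yaw/roll in degrees and that the pocket tilts the camera downward by a fixed, known angle. The 30-degree figure and the sign convention are assumptions (the description only gives a 10 to 50 degree range for the optical instrument).

    CAMERA_TILT_DEG = 30.0  # assumed fixed downward tilt of the device in the pocket

    def corrected_head_orientation(pitch_deg, yaw_deg, roll_deg,
                                   tilt_deg=CAMERA_TILT_DEG):
        """Remove the headset's built-in downward tilt from the measured pitch,
        so that a user looking straight ahead sees the mixed reality space level
        rather than angled downwards."""
        return pitch_deg + tilt_deg, yaw_deg, roll_deg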

Tracking the user controllers
The invention tracks the controller(s) using real video from the environment.
An algorithm executes on the processor of the personal electronic device to process the real video and identify the controllers. The algorithm roughly operates by: 1) capturing a frame of real video; 2) locating keypoints in the frame of real video; and 3) matching keypoints to identify controllers from keypoint pairs.
The algorithm first captures a frame of real video from the camera and converts the frame to the hue, saturation and lightness (HSL) color space. Use of the HSL
color space is preferred to the use of the red, green, blue color space for identifying single color keypoints since the image lightness is separated from the color information (i.e. hue and saturation). The separation of lightness from the color information allows for simplified thresholding rules that use only the saturation and hue.
Next, the algorithm locates keypoints in the frame of real video by thresholding to create a binary image. Initially a simple binary thresholding algorithm is used with preset ranges. These ranges define certain values within the color space that corresponds to the keypoints (i.e. red and blue keypoints). One binary image is created for red keypoints and another is created for blue keypoints.
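For illustration, this stage could be sketched with OpenCV (which names the same color space HLS). The numeric hue, lightness and saturation ranges below are placeholders, since only the use of preset ranges is stated above.

    import cv2
    import numpy as np

    # Placeholder threshold ranges in OpenCV's HLS space (H: 0-179, L: 0-255, S: 0-255).
    BLUE_LO, BLUE_HI = np.array([100, 40, 80]), np.array([130, 220, 255])
    RED1_LO, RED1_HI = np.array([0,   40, 80]), np.array([10,  220, 255])
    RED2_LO, RED2_HI = np.array([170, 40, 80]), np.array([179, 220, 255])

    def keypoint_masks(frame_bgr):
        """Return (red_mask, blue_mask) binary images for one video frame."""
        hls = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HLS)
        blue = cv2.inRange(hls, BLUE_LO, BLUE_HI)
        # Red hue wraps around zero, so two ranges are combined into one mask.
        red = cv2.bitwise_or(cv2.inRange(hls, RED1_LO, RED1_HI),
                             cv2.inRange(hls, RED2_LO, RED2_HI))
        return red, blue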
Finally, a "match search" is performed to identify controllers from keypoint pairs. The binary output of each threshold is used to search for the red and blue keypoints. Secondary keypoints are then grouped into potential pairs if they are within a certain distance apart.
Finally, primary points that are within a threshold distance from matched secondary pairs are grouped to form a complete match. Once a search finds a controller, subsequent searches will search only the area where the keypoints are expected to be, to improve performance and reduce noise.
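An illustrative sketch of the match search, treating red blobs as primary keypoints and blue blobs as secondary keypoints; that assignment, the minimum blob area and the pixel distance thresholds are assumptions.

    import cv2
    import numpy as np

    PAIR_DIST = 120     # max pixel distance between two secondary keypoints; assumed
    PRIMARY_DIST = 120  # max pixel distance from a primary keypoint to the pair; assumed

    def blob_centroids(mask, min_area=30):
        """Centroids of connected components above a minimum area."""
        n, _, stats, centroids = cv2.connectedComponentsWithStats(mask)
        return [tuple(centroids[i]) for i in range(1, n)
                if stats[i, cv2.CC_STAT_AREA] >= min_area]

    def match_controllers(red_mask, blue_mask):
        """Group secondary (blue) keypoints into pairs, then attach a nearby
        primary (red) keypoint to form one controller match per triple."""
        reds, blues = blob_centroids(red_mask), blob_centroids(blue_mask)
        matches = []
        for i in range(len(blues)):
            for j in range(i + 1, len(blues)):
                if np.hypot(blues[i][0] - blues[j][0],
                            blues[i][1] - blues[j][1]) > PAIR_DIST:
                    continue
                mid = ((blues[i][0] + blues[j][0]) / 2,
                       (blues[i][1] + blues[j][1]) / 2)
                for r in reds:
                    if np.hypot(r[0] - mid[0], r[1] - mid[1]) < PRIMARY_DIST:
                        matches.append((r, blues[i], blues[j]))
        return matches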
Should the match search algorithm fail to detect keypoints, the software relies on a structure from motion computer vision algorithm to estimate the keypoint locations and identify the controller position. Structure from motion is a photogrammetric range imaging technique for estimating three-dimensional structures from two-dimensional image sequences that may be coupled with local motion signals. In a preferred embodiment, the algorithms are implemented using the OpenCV™ computer vision and machine learning software library.

Once the controllers are detected, the tracking algorithm uses the shape and size of the keypoints to determine the position and rotation of the controllers using geometry. The position is tracked across the video frames to determine the velocity and acceleration of the controllers. In a preferred embodiment, a smoothing operation is performed to reduce jitter.
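A minimal sketch of the per-frame velocity estimate and smoothing step; the exponential smoothing factor is an assumption, since only the fact that a smoothing operation is performed is stated above.

    import numpy as np

    class ControllerTrack:
        """Tracks a controller position across frames with exponential smoothing."""

        def __init__(self, alpha=0.5):
            self.alpha = alpha           # smoothing factor; assumed value
            self.position = None         # smoothed 3D position
            self.velocity = np.zeros(3)  # finite-difference velocity estimate

        def update(self, raw_position, dt):
            raw = np.asarray(raw_position, dtype=float)
            if self.position is None:
                self.position = raw
                return self.position, self.velocity
            smoothed = self.alpha * raw + (1.0 - self.alpha) * self.position
            self.velocity = (smoothed - self.position) / dt
            self.position = smoothed
            return self.position, self.velocity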
Detecting Gestures
Gesture recognition occurs from within the Unity game engine based on the 3D
position determined by the software. The tracking algorithm detects and tracks the following gestures via the rear-facing camera(s): tap (or click) intensity; twist to grab; rotate/orbit;
scale/zoom; and swipe from edge.
The tap (or click) intensity gesture is detected by measuring the positional delta along the Z axis. This enables the algorithm to detect a fast movement along the Z
axis in real time.
Z movements that accelerate in the positive direction and suddenly slow down represent a click or 'tap'. If the virtual hand reticle is hovering over a button or object during this state, the object is tapped or clicked. How fast the user taps, i.e. how much the Z delta increases or decreases in a given time frame, changes the intensity of the tap. For example, when hitting a virtual button or drum, the user can change the intensity of their drum tap with higher acceleration. This increase in intensity could correspond to the volume of a drum tap sound effect.
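An illustrative sketch of how the tap could be detected from the per-frame Z velocity; the arming and release thresholds, and the use of peak velocity as the intensity value, are assumptions.

    ACCEL_THRESHOLD = 0.8  # minimum positive Z velocity to arm a tap; assumed units/s
    DECEL_THRESHOLD = 0.2  # velocity below which an armed motion completes as a tap

    class TapDetector:
        def __init__(self):
            self.armed = False
            self.peak_velocity = 0.0

        def update(self, z_velocity):
            """Feed the latest Z velocity; returns tap intensity when a tap completes."""
            if z_velocity > ACCEL_THRESHOLD:
                self.armed = True
                self.peak_velocity = max(self.peak_velocity, z_velocity)
                return None
            if self.armed and z_velocity < DECEL_THRESHOLD:
                intensity = self.peak_velocity  # faster push -> stronger tap
                self.armed = False
                self.peak_velocity = 0.0
                return intensity
            return None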
The twist to grab gesture is detected by measuring the rotation of a virtual hand reticle around the Z axis. This twist motion places a virtual object within the vicinity of the virtual hand in a 'drag' state that enables the user to pick up and move the object.
The twist to grab gesture is equivalent to a drag-and-drop with a mouse cursor; the algorithm has thresholds in the 'roll' axis that trigger the drag state once the rotation threshold is reached on top of an object (e.g. 35 degrees or more).
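A sketch of the twist-to-grab state transition; the 35-degree threshold comes from the example above, while the hover test and the release condition (twisting back below the threshold) are assumptions.

    GRAB_ROLL_THRESHOLD_DEG = 35.0  # roll threshold from the example above

    def update_drag_state(roll_deg, hovering_over_object, currently_dragging):
        """Enter the drag state when the hand reticle twists past the threshold
        while over an object; release it when the reticle twists back."""
        if not currently_dragging:
            return hovering_over_object and abs(roll_deg) >= GRAB_ROLL_THRESHOLD_DEG
        return abs(roll_deg) >= GRAB_ROLL_THRESHOLD_DEG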
The rotate and orbit gestures are detected by dragging a virtual hand reticle over an object. The drag gesture can be used to rotate the object in 3D space. The drag gesture can also be used to orbit around a space by being performed on top of virtual orbit 'anchors' similar to orbiting around an object in 3D modelling software with one hand.
The scale and zoom gestures are detected by dragging two virtual hands simultaneously towards or away from each other over a virtual object. When the distance between the two
'drag' state hands changes over the virtual object, the change in distance of the 'drag' state hands can change the size or scale of that object in 3D space similar to pinching and zooming on a multi-touch display.
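A sketch of the two-handed scale computation, assuming both hands are in the drag state over the same object and that scale is applied as a ratio to the separation recorded when the gesture began.

    import numpy as np

    def scale_factor(left_pos, right_pos, initial_distance):
        """Ratio of the current hand separation to the separation when the
        two-handed drag began; multiply the object's scale by this factor."""
        current = np.linalg.norm(np.asarray(left_pos) - np.asarray(right_pos))
        return current / initial_distance if initial_distance > 0 else 1.0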
The swipe from edge gesture can be detected by dragging a virtual hand over the edge of the interaction area. All 4 edges of the interaction area can be used to summon various programmable gestures similar to 'hot spots' or 'hot corners' on a desktop or laptop computer.
For example, if the user moves their hand up in the interaction area such that either hand moves off the top of the interaction area, and then back down, a menu can be summoned with this gesture, similar to a notifications centre swipe down on an iOS operating system device. As another example, if the user moves their hands out of view from the top right corner of the interaction area, a 'back' gesture can be summoned to initiate a back sequence to the previous screen, similar to the back button in a mobile app.
Group Presentation Mode
The system may include a group presentation mode enabling the user to operate the system while allowing others to view and participate in the mixed reality experience. Instead of being worn over the face and the user's eyes, the visor may be worn around or hanging from the user's neck to avoid obstructing the user's vision. In this configuration, the personal electronic device remains in the headset's pocket and should be positioned around or above the user's chest area to place the headset in an optimal position for tracking gestures.
The mixed reality space is rendered on the personal electronic device using virtual cameras. Both the single-user and the group presentation mode's virtual cameras operate simultaneously in the virtual environment with only one virtual camera being rendered at a time.
The first mode, single-user mode, renders a stereoscopic image through a viewport rendered on the display of the personal electronic device. When the user views the image in this mode through the headset's optical instrument, the stereoscopic image converges into a single image. In this mode, there are dedicated virtual cameras, one for each eye, that move in accordance with the micro-electro-mechanical systems (MEMS) sensors (e.g.
accelerometer, gyroscope, etc.) of the personal electronic device to mimic head movement to recreate a sense of virtual presence within a virtual environment. This mode is best suited for single-user mixed reality experiences.
The second mode, the group presentation mode, renders the same environment as the first mode but does not split the display of the personal electronic device into a stereoscopic image. Rather, the mode displays a full screen singular image that spans the display. A group presentation mode is useful when the mixed-reality environment is to be viewed through a traditional display, and not as a stereoscopic image. However, the user's ability to navigate the mixed-reality environment is limited when the headset is worn around the neck (or otherwise as the case may be) since there is no way for the system to register the user looking up or down.
Thus, in a group presentation mode, the camera moves in accordance with the smartphone's MEMS sensors but only along the horizontal axis to mimic a torso 'swivel' to recreate a sense of virtual presence within a virtual environment as seen through an external display.
Gesture tracking uses the full field of view of the personal electronic device's camera as the interaction zone for real time target tracking. Typically a camera captures images at a horizontally wide aspect ratio of either 4:3 or 16:9. This aspect ratio may be translated to ensure that the 4 corners of the interaction zone correlate with the 4 corners of the rendering mode.
Otherwise, the interaction zone will not match the user's line of sight.
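For illustration, a minimal sketch of that corner-to-corner translation, mapping a pixel coordinate in the camera frame to a normalized coordinate in the rendered interaction zone; addressing the camera in pixels and the viewport in a unit square are both assumptions.

    def camera_to_viewport(px, py, cam_width, cam_height):
        """Map a pixel in the camera's interaction zone to normalized viewport
        coordinates in [0, 1] x [0, 1], so the four corners of the interaction
        zone coincide with the four corners of the rendered view."""
        return px / float(cam_width), py / float(cam_height)

    # Example: a keypoint at (1600, 900) in a 1920x1080 (16:9) frame maps to
    # roughly (0.83, 0.83) of the viewport, regardless of the display's own aspect ratio.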
In a group presentation mode, when the user wears the headset around their neck or otherwise, they cannot view the display of the personal electronic device but instead rely on a mirrored image of the display transmitted to an external display. To mirror the mixed-reality environment, group presentation mode can use various protocols including "Screen Mirroring via Wireless Protocol", "Screen Mirroring via Wired Protocol" and "Controller Host and Client via Wireless Protocol."
Screen Mirroring via Wireless Protocol is used to mirror the image from the personal electronic device's display in real time using: 1) Wi-Fi or a Bluetooth network; 2) a Chromecast or AirPlay receiver; and 3) an external display to render the image. Likewise, Screen Mirroring via Wired Protocol is used to mirror the image from the personal electronic device's display in real time using 1) a cable to connect the personal electronic device to an external computing device; and 2) an external display to render the image.
Controller Host and Client via Wireless Protocol is another way the system can extend the mixed reality environment to an external device. Controller Host and Client via Wireless Protocol does not mirror the actual smartphone display but rather broadcasts gestures performed on the smartphone to a host application via a Wi-Fi or Bluetooth® network. This mode is akin to wireless controller protocols that broadcast coordinates and gesture states via a network to an external device such as a laptop or tablet computer.
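A minimal sketch of the client side of such a protocol, sending gesture states as JSON datagrams over the local network; the message format, host address and port are assumptions, not part of the described protocol.

    import json
    import socket

    HOST_ADDRESS = ("192.168.1.50", 9999)  # hypothetical host running the receiving application
    _sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def broadcast_gesture(gesture_name, left_pose, right_pose):
        """Send the latest gesture and hand coordinates to the host application."""
        message = {
            "gesture": gesture_name,  # e.g. "tap", "twist_to_grab", "swipe_from_edge"
            "left": left_pose,        # hand pose dictionary, e.g. {"x": 0.1, "y": 0.4, "z": 0.0}
            "right": right_pose,
        }
        _sock.sendto(json.dumps(message).encode("utf-8"), HOST_ADDRESS)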
At least some of the elements of the systems described herein may be implemented by software, or a combination of software and hardware. For example, all the described processing functions may be implemented in software that runs on one or more programmable processors.
Elements of the system that are implemented via software may be written in a high-level procedural or object-oriented programming language, or a scripting language.
Accordingly, the program code may be written in C, C++, J++, or any other suitable programming language and may comprise modules or classes, as is known to those skilled in object oriented programming. At least some of the elements of the system that are implemented via software may be implemented using prewritten software libraries or written in assembly language, machine language or firmware as needed. In any case, the program code can be stored on storage media or on a computer readable medium that is readable by a general or special purpose programmable computing device having a processor, an operating system and the associated hardware and software that is necessary to implement the functionality of at least one of the embodiments described herein. The program code, when read by the computing device, configures the computing device to operate in a new, specific and predefined manner in order to perform at least one of the methods described herein.
While the teachings described herein are in conjunction with various embodiments for illustrative purposes, it is not intended that the teachings be limited to such embodiments. On the contrary, the teachings described and illustrated herein encompass various alternatives, modifications, and equivalents, without departing from the described embodiments, the general scope of which is defined in the appended claims. Except to the extent necessary or inherent in the processes themselves, no particular order to steps or stages of methods or processes described in this disclosure is intended or implied. In many cases the order of process steps may be varied without changing the purpose, effect, or import of the methods described.

Where, in this document, a list of one or more items is prefaced by the expression "such as" or "including", is followed by the abbreviation "etc.", or is prefaced or followed by the expression "for example", or "e.g.", this is done to expressly convey and emphasize that the list is not exhaustive, irrespective of the length of the list. The absence of such an expression, or another similar expression, is in no way intended to imply that a list is exhaustive. Unless otherwise expressly stated or clearly implied, such lists shall be read to include all comparable or equivalent variations of the listed item(s), and alternatives to the item(s), in the list that a skilled person would understand would be suitable for the purpose that the one or more items are listed.
The words "comprises" and "comprising", when used in this specification and the claims, are to used to specify the presence of stated features, elements, integers, steps or components, and do not preclude, nor imply the necessity for, the presence or addition of one or more other features, elements, integers, steps, components or groups thereof As used herein, the term "exemplary" or "example" means "serving as an example, instance, or illustration," and should not be construed as preferred or advantageous over other configurations disclosed herein.
As used herein, the terms "about", "approximately", and "substantially" are meant to cover variations that may exist in the upper and lower limits of the ranges of values, such as variations in properties, parameters, and dimensions. In one non-limiting example, the terms "about", "approximately", and "substantially" may be understood to mean plus or minus 10 percent or less.
Unless defined otherwise, all technical and scientific terms used herein are intended to have the same meaning as commonly understood by one of ordinary skill in the art.
The scope of the claims that follow is not limited by the embodiments set forth in the description. The claims should be given the broadest purposive construction consistent with the description and figures as a whole.

Claims (35)

What is claimed is:
1. A mixed reality system for a user to wear and to view and interact with mixed reality video, the system comprising:
a headset configured to be worn by the user on the user's head, the headset comprising:
a harness configured to hold a personal electronic device proximate to the user's eyes; and an optical instrument proximate to the user's eyes, between the user's eyes and the personal electronic device, the optical instrument having a field of view;
a plurality of predefined gestures, each of the gestures corresponding to a function;
the personal electronic device comprising:
a camera configured to capture real video;
a display located in the field of view of the optical instrument, the display being configured to present to the user mixed reality video comprising real video captured by the camera and virtual objects;
a computer processor configured to:
process the real video captured by the camera to identify one or more targets within the real video;
analyze movement of the one or more targets;
determine whether the movement matches any of the predefined gestures; and when the movement matches one of the predefined gestures, modify the mixed reality video according to the function corresponding to the matched gesture, wherein the headset is configured to receive the personal electronic device and maintain the display of the personal electronic device in the field of view of the optical instrument in an angled position, and wherein the optical instrument is configured to uniformly redirect light traversing the optical instrument to angle the field of view presented to the user.
2. The mixed reality system of claim 1, wherein the optical instrument comprises a prism.

3. The mixed reality system of claim 2, wherein the optical instrument is a combination lens with a Fresnel prism in combination with a Fresnel lens.
4. The mixed reality system of claim 1, wherein the optical instrument comprises a mirror.
5. The mixed reality system of claim 1, wherein the optical instrument is configured to uniformly redirect light traversing the optical instrument at an angle of more than 10 degrees but less than 50 degrees.
6. The mixed reality system of claim 5, wherein the optical instrument is configured to uniformly redirect light traversing the optical instrument at an angle of more than 20 degrees but less than 40 degrees.
7. The mixed reality system of claim 1, wherein the target is manipulable by the user in the field of view of the optical instrument to execute the predefined gestures.
8. The mixed reality system of claim 1, wherein the target comprises a physical controller with recognizable patterns.
9. The mixed reality system of claim 8, wherein the physical controller is the user's hand.
10. The mixed reality system of claim 8, wherein the recognizable patterns comprise curved two-dimensional shapes.
11. The mixed reality system of claim 8, wherein the recognizable patterns are oriented in a triangle pattern on the physical controller.
12. The mixed reality system of claim 8, wherein the recognizable patterns are of a plurality of predefined colors within a predefined color space.
13. The mixed reality system of claim 8, wherein the physical controller attaches magnetically to the headset.
14. The mixed reality system of claim 1, wherein at least one of the plurality of predefined gestures corresponds to virtual object manipulation.
15. The mixed reality system of claim 1, wherein the harness comprises an elastic material, wherein the elastic material forms a loop around the individual's head creating tension in the elastic material to attach the headset to the individual's head.
16. The mixed reality system of claim 1, wherein the harness comprises an enclosure, having an open position and a closed position, wherein the enclosure is sized to fit an electronic device.
17. The mixed reality system of claim 16, wherein the harness comprises a magnet that, when the enclosure is in the closed position, magnetically locks the enclosure such that the personal electronic device is held in the enclosure.
18. The mixed reality system of claim 1, wherein the headset is attached to a headband suspension system comprising an elastic material.
19. A mixed reality system for a user to wear and to view and interact with mixed reality video, the system comprising:
a headset configured to be worn by the user on the user's head, the headset comprising:
an optical instrument having a field of view; and a harness configured to hold a personal electronic device proximate to and between the user's eyes and the optical instrument; and a plurality of predefined gestures, each of the gestures corresponding to a function;
the personal electronic device comprising:

a camera located in the field of view of the optical instrument, the camera being configured to capture real video;
a display configured to present to the user mixed reality video comprising real video captured by the camera and virtual objects;
a computer processor configured to:
process the real video captured by the camera to identify one or more targets within the real video;
analyze movement of the one or more targets;
determine whether the movement matches any of the predefined gestures; and
when the movement matches one of the predefined gestures, modify the mixed reality video according to the function corresponding to the matched gesture,
wherein the headset is configured to receive the personal electronic device and maintain the camera of the personal electronic device in the field of view of the optical instrument, and
wherein the optical instrument is configured to uniformly redirect light traversing the optical instrument to angle the field of view presented to the camera of the personal electronic device.
20. The mixed reality system of claim 19, wherein the optical instrument comprises a mirror.
21. The mixed reality system of claim 19, wherein the optical instrument comprises a prism.
22. The mixed reality system of claim 19, wherein the optical instrument is configured to uniformly redirect light traversing the optical instrument at an angle of more than 10 degrees but less than 50 degrees.

23. The mixed reality system of claim 22, wherein the optical instrument is configured to uniformly redirect light traversing the optical instrument at an angle of more than 20 degrees but less than 40 degrees.
24. The mixed reality system of claim 19, wherein the target is manipulable by the user in the field of view of the optical instrument to execute the predefined gestures.
25. The mixed reality system of claim 19, wherein the target comprises a physical controller with recognizable patterns.
26. The mixed reality system of claim 25, wherein the physical controller is the user's hand.
27. The mixed reality system of claim 25, wherein the recognizable patterns comprise curved two-dimensional shapes.
28. The mixed reality system of claim 25, wherein the recognizable patterns are oriented in a triangle pattern on the physical controller.
29. The mixed reality system of claim 25, wherein the recognizable patterns are of a plurality of predefined colors within a predefined color space.
30. The mixed reality system of claim 25, wherein the physical controller attaches magnetically to the headset.
31. The mixed reality system of claim 19, wherein at least one of the plurality of predefined gestures corresponds to virtual object manipulation.
32. The mixed reality system of claim 19, wherein the harness comprises an elastic material, wherein the elastic material forms a loop around the user's head, creating tension in the elastic material to attach the headset to the user's head.

33. The mixed reality system of claim 19, wherein the harness comprises an enclosure, having an open position and a closed position, wherein the enclosure is sized to fit an electronic device.
34. The mixed reality system of claim 33, wherein the harness comprises a magnet that, when the enclosure is in the closed position, magnetically locks the enclosure such that the personal electronic device is held in the enclosure.
35. The mixed reality system of claim 19, wherein the headset is attached to a headband suspension system comprising an elastic material.
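
For orientation only, the target-tracking and gesture-matching steps recited in claims 1 and 19 can be sketched in a few lines of Python. This is a minimal illustration under assumptions not stated in the application: the gesture templates, the tolerance value, and the names PREDEFINED_GESTURES, match_gesture, and process_track are hypothetical, and NumPy is used purely for convenience; it is not the claimed implementation.

    import numpy as np

    # A predefined gesture is modelled here as a template motion path: an (N, 2)
    # array of normalized (x, y) target positions, keyed by the gesture name.
    # The templates and the tolerance below are placeholders, not values from
    # the application.
    PREDEFINED_GESTURES = {
        "swipe_left": np.linspace([0.9, 0.5], [0.1, 0.5], 20),
        "swipe_up": np.linspace([0.5, 0.9], [0.5, 0.1], 20),
    }

    def match_gesture(track, gestures=PREDEFINED_GESTURES, tol=0.15):
        """Return the name of the predefined gesture whose template path lies
        closest to the observed target track, or None if no template is within
        tolerance (i.e. the movement matches no predefined gesture)."""
        if len(track) < 2:
            return None
        track = np.asarray(track, dtype=float)
        best_name, best_dist = None, float("inf")
        for name, template in gestures.items():
            # Resample the observed track to the template length so the two
            # paths can be compared point by point.
            idx = np.linspace(0, len(track) - 1, len(template)).astype(int)
            dist = np.mean(np.linalg.norm(track[idx] - template, axis=1))
            if dist < best_dist:
                best_name, best_dist = name, dist
        return best_name if best_dist <= tol else None

    def process_track(track, gesture_functions):
        """When the accumulated target movement matches a predefined gesture,
        invoke the function corresponding to that gesture."""
        name = match_gesture(track)
        if name is not None and name in gesture_functions:
            gesture_functions[name]()  # e.g. modify the mixed-reality video

In a full pipeline the observed track would come from per-frame target detection, such as the marker detection sketched below, and the gesture functions would add, move, or remove virtual objects in the composited video.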
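
Similarly, the "recognizable patterns" of claims 8-12 and 25-29 (curved two-dimensional shapes of predefined colors, arranged in a triangle on a physical controller) suggest a straightforward color-segmentation approach. The sketch below uses OpenCV and NumPy; the HSV color ranges, the area threshold, and the function names are assumptions made for illustration and do not appear in the application.

    import cv2
    import numpy as np

    # Placeholder HSV ranges for three differently colored markers; real values
    # would be chosen within the predefined color space recited in the claims.
    MARKER_RANGES = [
        ((40, 80, 80), (80, 255, 255)),    # greenish marker
        ((100, 80, 80), (130, 255, 255)),  # bluish marker
        ((140, 80, 80), (170, 255, 255)),  # magenta-ish marker
    ]

    def find_marker_centroids(frame_bgr):
        """Return one (x, y) centroid per marker color detected in the frame."""
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        centroids = []
        for lo, hi in MARKER_RANGES:
            mask = cv2.inRange(hsv, np.array(lo), np.array(hi))
            m = cv2.moments(mask)
            if m["m00"] > 0:
                centroids.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
        return centroids

    def controller_detected(centroids, min_area=500.0):
        """Treat the physical controller as present when all three marker
        centroids are found and form a non-degenerate triangle."""
        if len(centroids) != 3:
            return False
        (x1, y1), (x2, y2), (x3, y3) = centroids
        area = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0
        return area >= min_area

The centroid positions returned per frame would then be accumulated into the movement track consumed by the gesture-matching sketch above.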
CA3058447A 2019-10-10 2019-10-10 A system for generating and displaying interactive mixed-reality video on mobile devices Abandoned CA3058447A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CA3058447A CA3058447A1 (en) 2019-10-10 2019-10-10 A system for generating and displaying interactive mixed-reality video on mobile devices
PCT/CA2020/000115 WO2021068052A1 (en) 2019-10-10 2020-10-09 A system for generating and displaying interactive mixed-reality video on mobile devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CA3058447A CA3058447A1 (en) 2019-10-10 2019-10-10 A system for generating and displaying interactive mixed-reality video on mobile devices

Publications (1)

Publication Number Publication Date
CA3058447A1 true CA3058447A1 (en) 2021-04-10

Family

ID=75381974

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3058447A Abandoned CA3058447A1 (en) 2019-10-10 2019-10-10 A system for generating and displaying interactive mixed-reality video on mobile devices

Country Status (2)

Country Link
CA (1) CA3058447A1 (en)
WO (1) WO2021068052A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10019057B2 (en) * 2013-06-07 2018-07-10 Sony Interactive Entertainment Inc. Switching mode of operation in a head mounted display
US20160349509A1 (en) * 2015-05-26 2016-12-01 Microsoft Technology Licensing, Llc Mixed-reality headset
US20170103440A1 (en) * 2015-08-01 2017-04-13 Zhou Tian Xing Wearable augmented reality eyeglass communication device including mobile phone and mobile computing via virtual touch screen gesture control and neuron command
CN108351685B (en) * 2015-08-15 2022-07-08 谷歌有限责任公司 System and method for biomechanically based eye signals for interacting with real and virtual objects
US10467814B2 (en) * 2016-06-10 2019-11-05 Dirtt Environmental Solutions, Ltd. Mixed-reality architectural design environment
US10338400B2 (en) * 2017-07-03 2019-07-02 Holovisions LLC Augmented reality eyewear with VAPE or wear technology

Also Published As

Publication number Publication date
WO2021068052A1 (en) 2021-04-15

Similar Documents

Publication Publication Date Title
US11546505B2 (en) Touchless photo capture in response to detected hand gestures
US20220334649A1 (en) Hand gestures for animating and controlling virtual and graphical elements
US12008153B2 (en) Interactive augmented reality experiences using positional tracking
US10922889B2 (en) Directing user attention
KR101171660B1 (en) Pointing device of augmented reality
US8007110B2 (en) Projector system employing depth perception to detect speaker position and gestures
US11854147B2 (en) Augmented reality guidance that generates guidance markers
JP2013258614A (en) Image generation device and image generation method
US11741679B2 (en) Augmented reality environment enhancement
WO2020080107A1 (en) Information processing device, information processing method, and program
WO2018146922A1 (en) Information processing device, information processing method, and program
US20220351442A1 (en) Animation production system
CN110895433A (en) Method and apparatus for user interaction in augmented reality
WO2021068052A1 (en) A system for generating and displaying interactive mixed-reality video on mobile devices
US20220351448A1 (en) Animation production system
US12013985B1 (en) Single-handed gestures for reviewing virtual content
JP7470347B2 (en) Animation Production System
WO2024049578A1 (en) Scissor hand gesture for a collaborative object
JP2022025466A (en) Animation creation method

Legal Events

Date Code Title Description
FZDE Discontinued

Effective date: 20230228
