WO2018112643A1 - System and method for providing virtual reality interface - Google Patents

System and method for providing virtual reality interface

Info

Publication number
WO2018112643A1
WO2018112643A1 (PCT/CA2017/051568)
Authority
WO
WIPO (PCT)
Prior art keywords
user
environment
gaze
hotspot
target
Prior art date
Application number
PCT/CA2017/051568
Other languages
French (fr)
Inventor
Junquan ZUO
Peng Wang
Pan Pan
Chunlong Yang
Yiting LONG
Original Assignee
Eyexpo Technology Corp.
Priority date
Filing date
Publication date
Application filed by Eyexpo Technology Corp. filed Critical Eyexpo Technology Corp.
Priority to CA3047844A priority Critical patent/CA3047844A1/en
Publication of WO2018112643A1 publication Critical patent/WO2018112643A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g 3D video
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/0093Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00 with means for monitoring data relating to the user, e.g. head-tracking, eye-tracking
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01Head-up displays
    • G02B27/017Head mounted
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/21805Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/414Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
    • H04N21/41407Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • H04N21/42206User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor characterized by hardware details
    • H04N21/42222Additional components integrated in the remote control device, e.g. timer, speaker, sensors for detecting position, direction or movement of the remote control, microphone or battery charging device
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223Cameras
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4722End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting additional data associated with the content
    • H04N21/4725End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting additional data associated with the content using interactive regions of the image, e.g. hot spots
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/65Transmission of management data between client and server
    • H04N21/658Transmission by the client directed to the server
    • H04N21/6587Control parameters, e.g. trick play commands, viewpoint selection
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/858Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot
    • H04N21/8586Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot by using a URL
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01PMEASURING LINEAR OR ANGULAR SPEED, ACCELERATION, DECELERATION, OR SHOCK; INDICATING PRESENCE, ABSENCE, OR DIRECTION, OF MOVEMENT
    • G01P13/00Indicating or recording presence, absence, or direction, of movement
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01Head-up displays
    • G02B27/0101Head-up displays characterised by optical features
    • G02B2027/0138Head-up displays characterised by optical features comprising image capture systems, e.g. camera
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01Head-up displays
    • G02B27/0179Display position adjusting means not related to the information to be displayed
    • G02B2027/0187Display position adjusting means not related to the information to be displayed slaved to motion of at least a part of the body of the user, e.g. head, eye

Definitions

  • the present disclosure relates to Virtual Reality (VR) interface and more particularly, to systems, methods, and computer-readable media for providing an interface for a VR environment.
  • VR Virtual Reality
  • UI/UX user interface/user experience
  • a method of interfacing a user with a Virtual Reality (VR) environment comprises detecting, by a motion detection sensor, a gaze of the user directed towards a target in the VR environment, the motion detection sensor being coupled to a mobile phone mounted to the head of the user; triggering a voice control mode to receive a voice command from the user when the gaze of the user is detected to be directed towards the target in the VR environment; and processing the voice command received from the user.
  • a method of interfacing a user with a Virtual Reality (VR) environment comprises obtaining a VR environment; rendering the VR environment on a display associated with a computing device; receiving a user input to define a hotspot in the VR environment, detecting and determining if a position of a focus of the user is at the hotspot; and when the position of the focus of the user is determined at the hotspot, activating a function associated with the hotspot.
  • a non-transitory computer readable memory containing instructions for execution by a processor, the instructions when executed by the processor perform a method of interfacing a user with a Virtual Reality (VR) environment.
  • the method comprises: detecting, by a motion detection sensor, a gaze of the user directed towards a target in the VR environment, the motion detection sensor being coupled to a mobile phone mounted to the head of the user; triggering a speech recognition mode to receive a voice command from the user when the gaze of the user is detected to be directed towards the target in the VR environment; and processing the voice command received from the user.
  • FIG. 1 is a schematic depiction of a system for interfacing a user with a VR environment
  • FIG. 2 is an architecture for implementing a speech recognition controller, according to one embodiment
  • FIG. 3A is an example of the user triggering the speech recognition controller
  • FIG. 3B is an example of the user giving a voice command after the speech recognition controller is triggered
  • FIG. 3C is an example of the result of the user giving the voice command
  • FIG. 4 is an example of a user interface providing a virtual tour of a real estate property
  • FIG. 5 is an example of the user interface showing the inside of the real estate property
  • FIG. 6 illustrates a method of interfacing a user with a VR environment according to one embodiment of the description
  • FIG. 7 illustrates a method of interfacing a user with a VR environment according to another embodiment of the description
  • a general aspect of several embodiments of the description relates to providing a Virtual Reality (VR) navigation system to provide a user interface/user experience (UI/UX) that enhances and improves user interaction with the VR environment.
  • the VR navigation system provides a user with a speech recognition controller that can be triggered by a particular user behavior.
  • when the user gazes into the top area of a VR environment (e.g., an area equal to or above 60 or 75 degrees above the horizon in the 360 degree space), the speech recognition controller can be activated and voice commands can be received and processed to control the VR environment.
  • a static Field-of-View (FoV) of the VR environment can be enhanced by providing at least one interactive hotspot that can be edited, animated or linked to other views.
  • the user is allowed to edit and customize the VR environment.
  • the VR navigation system may provide preset templates which can be used easily by the user.
  • the generated and/or edited VR environment can be uploaded or shared through a network to other user(s).
  • a user may use a computing device for purposes of interfacing with a computer-simulated, VR environment.
  • a computing device can be, but is not limited to, a personal computer (PC), such as a laptop, desktop, etc., or a mobile device, such as a Smartphone, tablet, etc.
  • PC personal computer
  • a Smartphone may be, but is not limited to, an iPhone running iOS, an Android phone running the Android operating system, or a Windows phone running the Windows operating system.
  • the mobile device can be used in connection with a VR headset or other head-mounted VR device for the user to view the VR environment.
  • the mobile device and the VR headset each have a screen and, when used in combination, provide the user with a dual-screen system for viewing the VR environment.
  • the VR headset can be designed so as to allow the user to couple and decouple the mobile device with the VR headset.
  • the mobile device can be inserted in the VR headset and held in place by a fastening device.
  • the image displayed by the mobile device may be split into two, one for each eye. The result can be a stereoscopic ("3D") image with a wide FoV.
  • a non-limiting example of such a VR headset is the Google Cardboard™ headset or similar head mounts built out of simple, low-cost components.
  • the VR environment can be generated based on stitching a sequence of images into a composite image.
  • the composite image provides the user with a larger (e.g., panoramic) FoV than each individual image alone.
  • the sequence of images can be captured by an image capturing unit coupled to the mobile device, retrieved locally from a memory coupled to the computing device, or remotely via the network.
  • the computer-generated VR environment can be a view of a real-world environment including, but not limited to, a virtual tour of a geographic location or site (such as an exhibition site, a mining site, a theme park, etc.), a real estate property, or a real life experience such as a shopping experience, a medical procedure, etc. A minimal sketch of rendering such a panoramic environment in a browser is given below.
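A minimal sketch of presenting a stitched equirectangular composite image as a 360 degree VR environment in a browser. The three.js WebGL library and the image file name are assumptions for illustration; the patent does not prescribe a particular rendering library.

```typescript
import * as THREE from 'three';

// Assumed: 'composite-panorama.jpg' is a stitched equirectangular image (hypothetical name).
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(75, window.innerWidth / window.innerHeight, 0.1, 1100);
const renderer = new THREE.WebGLRenderer();
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

// Map the composite image onto the inside of a large sphere surrounding the camera.
const geometry = new THREE.SphereGeometry(500, 60, 40);
geometry.scale(-1, 1, 1); // flip the geometry so the texture faces inward
const texture = new THREE.TextureLoader().load('composite-panorama.jpg');
scene.add(new THREE.Mesh(geometry, new THREE.MeshBasicMaterial({ map: texture })));

function animate(): void {
  requestAnimationFrame(animate);
  // Camera orientation would be driven by the motion detection sensor (see the later sketches).
  renderer.render(scene, camera);
}
animate();
```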
  • the computing device may be operably coupled to a server via a network, such as the Internet, LAN, WAN, and/or cellular network, which retrieves data from a content repository connected to the server.
  • the method may be performed online where data is retrieved or updated in real time through the server from the content repository.
  • the method may be performed offline without connection to the network, and data is retrieved locally from the memory.
  • the computing device may be operably coupled via the network, such as a wireless, Bluetooth and/or cellular network, to other computing devices, such as a personal computer (PC), e.g., a laptop, desktop, etc., or a mobile device, e.g., a Smartphone, tablet, etc.
  • the generated and/or edited VR environment can be shared through the network to other users.
  • the VR environment can be viewed within a browser environment hosted by the computing device in the form of a WEB page.
  • a WEB page may contain a Hypertext Markup Language (HTML) document and/or an Internet Programming Language, for example, JAVA™ Scripts.
  • the computing device can execute a VR application implementing the UI/UX.
  • the computing device includes a display for displaying the VR environment.
  • the computing device is coupled to a sound capturing unit for capturing or receiving at least one voice command from the user and an image capturing unit for image capturing.
  • One or more audio outputs can be coupled to the computing device for assisting user interaction with the VR environment and/or providing additional information about the VR environment.
  • the computing device further includes a processing unit or processor connected to the display, audio output, the sound capturing unit and the image capturing unit.
  • the position of the focus of the user can be represented by a cursor shown on the display.
  • the cursor can be moved by the user moving a pointing device (such as a hand-held mouse, trackball, stylus, touchpad, etc.) coupled to the personal computer or mobile device.
  • the position of the focus of the user can be tracked by tracking the gaze of the user.
  • One or more motion detection sensors coupled to the mobile device can provide input to the VR environment for tracking the gaze of the user.
  • the one or more motion detection sensors coupled to the mobile device can detect or measure the linear acceleration (e.g., the tilt) of the mobile device and can take the form of, but are not limited to, a gyroscope and/or an accelerometer, such as a G sensor.
  • Other motion sensing devices or elements such as a magnetometer, an orientation sensor (e.g., a theodolite), etc., may also be coupled to the mobile device and used for tracking the gaze of the user.
  • any movement of the head of the user can be captured by the motion detection sensor coupled to the mobile device.
  • the movement of the head of the user can be translated into movement of the gaze of the user and used to determine and analyze the focus of the user.
  • the VR headset may include another mobile computing device and/or functionalities of the mobile device, such as a touch screen, a motion detection sensor, etc.
  • the VR headset can connect to the network using one or more wired and/or wireless communications protocols.
  • FIG. 1 illustrates an example of an environment 100 for implementing aspects in accordance with an embodiment of the present invention.
  • the environment 100 can include a mobile device 102, which can be a mobile communication device or similar portable or handheld device, Smartphone or tablet type device.
  • the mobile device 102 comprises at least a processing unit or processor 104, a memory 106 coupled to the processing unit 104, and a networking interface 114.
  • a microphone or sound capturing unit 108 is provided for capturing or receiving at least one voice command from the user.
  • the mobile device 102 can also be coupled to an image capturing unit 118 for capturing one or more images.
  • the image capturing unit 118 can be a built-in camera, or an external lens, such as a fisheye lens, which can be removably attached to the mobile device 102.
  • At least one motion detection sensor 110 is provided for detecting or measuring the motion of the mobile device.
  • the processing unit 104 renders a VR environment on a display 112 providing the user with a VR experience.
  • the mobile device 102 may be equipped with other input device(s) (not shown) to enable user input, such as a physical keyboard, a virtual keyboard (e.g. touchscreen), etc.
  • the processing unit 104 can provide specific information and/or interaction tailored for the user, based on the user's input captured by the input devices including the sound capturing unit 108 and motion detection sensor 110.
  • One or more speakers or audio outputs 116 are provided for assisting user interaction with the VR environment and/or providing additional information about the VR environment.
  • the mobile device 102 can be used in connection with a VR headset or head-mounted VR device 103.
  • the VR headset or head-mounted VR device 103 includes a screen which when used in connection with the display 112 of the mobile device provides a dual-screen system for viewing of the VR environment.
  • the mobile device 102 can be operatively coupled to the VR headset 103, e.g., by inserting the mobile device 102 into the VR headset 103, where it is held in place by a fastening device.
  • the environment 100 can also include a personal computer 122, which can be a laptop, desktop or any other computing device capable of hosting a browser or executing a software application for VR environment.
  • the personal computer 122 comprises at least a processing unit or processor 124 and a memory 126 coupled to the processing unit 124 and a networking interface 134.
  • a microphone or sound capturing unit 128 is provided for capturing or receiving at least one voice command from the user.
  • a pointing device 130 (such as a hand-held mouse, trackball, stylus, touchpad, etc.) is coupled to the personal computer 122 to enable the user to move the cursor shown on a display 132.
  • the personal computer 122 may be equipped with other input device(s) (not shown) to enable user input, such as a physical keyboard, a virtual keyboard (e.g. touchscreen), etc.
  • the processing unit 124 renders a VR environment on the display 132 providing the user with a VR experience.
  • the processing unit 124 can also provide specific information or interaction tailored for the user, based on the user's input captured by the input devices including the sound capturing unit 128 and the pointing device 130.
  • One or more speakers or audio outputs 136 are provided for assisting user interaction with the VR environment and/or providing additional information about the VR environment.
  • the mobile device 102 and/or the personal computer 122 may be operably coupled to a server 142 and a content repository 140 via a network 138.
  • the network 138 can include one or more networks for facilitating communication between the mobile device 102, the personal computer 122, and the server 142, such as but not limited to wireless networks, cellular networks, intranet, Internet, local area networks, or any other such network or combination thereof.
  • the content repository 140 stores data for use of rendering the VR environment to the user and also stores the UI/UX for use with the VR environment.
  • the content repository 140 may provide an application or software for execution on the mobile device 102 or the personal computer 122. In some embodiments, the method may be performed online where data are retrieved or updated real-time from the content repository 140 through the server 142.
  • the mobile device 102 and/or the personal computer 122 may also be operably coupled to another one or more computing devices 144, such as a PC, e.g., a laptop, desktop, etc., or a mobile device, e.g., a Smartphone, tablet, etc., through the network 138 for sharing of the computer-generated, VR environment with other users through the network 138.
  • Speech is a natural and easy method of interaction that people utilize every day. As well, speech can provide rich commands and context information to enhance communication.
  • speech recognition technology is embedded in the VR navigation system. Compared to most existing VR interaction solutions, the VR navigation system according to the embodiments of the description does not require the use of extra devices, such as a trackpad or vision sensor mounted to a VR headset. Instead, the VR navigation system provides enriched user interaction by way of receiving and processing speech commands from the user. According to the embodiments of the description, the user is able to give one or more voice commands to the VR navigation system to, for example, switch between different views and/or communicate with the system to obtain additional information or content.
  • if the speech recognition controller is activated throughout the time the VR navigation system is being used, the accuracy of the system may drop. This is particularly concerning as speech recognition performance on a mobile device can be easily affected by noise or an otherwise non-ideal recording environment.
  • the speech recognition systems in mobile browsers have limited functions. As well, when the mobile device is mounted to the head, no controller or buttons can be pressed without the use of an external device, which makes it difficult to improve the functionality of the user interface.
  • a particular user behavior can be used to trigger the speech recognition controller in order to inform the VR navigation system when the speech recognition controller should be activated.
  • the top area in the VR environment is the area least gazed at by the user.
  • the top area refers to an area equal to or above 60 or 75 degrees above the horizon in the 360 degree space in a VR environment and can be used as a switch to activate the speech recognition controller. It should however be understood that the top area can refer to an area of different angles above or below the horizon in the 360 degree space. As well, other areas that the user gazes at less can be used to trigger the speech recognition controller.
  • FIG. 2 illustrates an architecture for implementing the gaze behavior triggered speech recognition controller, according to one embodiment of the description.
  • the user can insert the mobile device in the VR headset and wear it on the head.
  • the VR environment can be generated based on for example, a composite image stitched from images captured by an image capturing unit, or retrieved from a memory either locally or remotely. Any movement of the head of the user can then be captured by the motion detection sensor 110 coupled to the mobile device.
  • a motion detection module 154 receives input from the motion detection sensor 110 and analyses the movement of the head of the user.
  • a gaze analyzer 156 translates the movement of the head of the user into movement of the gaze of the user and determines whether the gaze of the user is directed towards a target in the VR environment.
  • the start position of the user's head can be projected to a default location in the 360 space, and the movement of the gaze can be obtained based on, for example, the amount of tilt of the head of the user measured by the motion detection sensor 110 from the start position. When the amount of tilt of the user's head is above a certain threshold, it can be determined that the gaze of the user is directed towards the target in the VR environment.
  • the position of the gaze of the user may be shown as a cursor on the display 112 to provide visual cues to the user. When the user moves the head, the cursor is moved correspondingly. A minimal sketch of this tilt-based gaze tracking is given below.
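A minimal sketch of tilt-based gaze tracking in a mobile browser, using the standard DeviceOrientationEvent as the motion detection input. The 60 degree threshold, the 'gaze-top-area' event name, and the simple treatment of the beta angle as head pitch are assumptions for illustration; a phone mounted in a headset would generally need axis remapping depending on its orientation.

```typescript
// Assumed threshold: the gaze counts as "directed towards the target (top area)"
// when the head pitch is 60 degrees or more above the horizon.
const TOP_AREA_THRESHOLD_DEG = 60;

let gazeInTopArea = false;

// DeviceOrientationEvent.beta is the front-to-back tilt in degrees; treating it
// directly as head pitch is a simplification for this sketch.
window.addEventListener('deviceorientation', (event: DeviceOrientationEvent) => {
  const pitch = event.beta ?? 0;
  const nowInTopArea = pitch >= TOP_AREA_THRESHOLD_DEG;

  if (nowInTopArea !== gazeInTopArea) {
    gazeInTopArea = nowInTopArea;
    // Notify the rest of the system (e.g., the speech recognition controller sketch below).
    document.dispatchEvent(new CustomEvent('gaze-top-area', { detail: { active: gazeInTopArea } }));
  }

  // The same pitch (together with the alpha/gamma angles) can also drive the on-screen cursor.
});
```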
  • the user can move the head which is detected by the motion detection sensor 110 and in turn moves the cursor to an interactive spot (e.g., a door of a room).
  • the gaze of the user can be received by the system and trigger an action associated with the interactive spot.
  • the user can be presented with a view of the scene the user would like to see.
  • in such a case, however, the user may not be given any option to navigate the VR environment in a customized manner.
  • when the gaze of the user is detected to be directed towards the target in the VR environment, the gaze analyzer 156 enables a speech recognition controller 150 (i.e., a voice control mode) to receive one or more voice commands through the sound capturing unit 108.
  • the speech recognition controller may be triggered as soon as the gaze is detected to come within the target area in the VR environment to facilitate fast processing of the user command.
  • the VR navigation system can enable the user to send a quick signal by looking up to indicate that the user would like to trigger the voice control mode.
  • the speech recognition controller may be triggered when the gaze is detected to stay within the target area for a default amount of time (e.g., a couple seconds) to provide a longer response time, for example, based on the user behavior learned by the system.
  • the voice command processor 152 then performs speech/voice recognition of the sound received from the sound capturing unit 108.
  • when the gaze of the user is no longer directed towards the target in the VR environment, the gaze analyzer 156 disables or turns off the speech recognition controller 150 (i.e., a voice control off mode or a gaze-only mode) and no voice command will be input through the sound capturing unit 108.
  • the user can give a voice command (e.g., 3-4 seconds long), and then the user may look down to turn off the speech recognition input. A minimal sketch of this gaze-triggered voice control is given below.
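A minimal sketch of switching the voice control mode on and off from the gaze signal, using the browser speech recognition API referred to later in this description (often exposed as webkitSpeechRecognition). Availability varies by browser; the 'gaze-top-area' event and the optional dwell delay are assumptions carried over from the previous sketch.

```typescript
// Assumed: the 'gaze-top-area' event comes from the tilt-tracking sketch above.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

const recognizer = SpeechRecognitionImpl ? new SpeechRecognitionImpl() : null;
let triggerTimer: number | undefined;
const DWELL_MS = 0; // could be e.g. 2000 ms if a dwell-time trigger is preferred

if (recognizer) {
  recognizer.lang = 'en-US';
  recognizer.interimResults = false;
  recognizer.onresult = (event: any) => {
    const transcript: string = event.results[0][0].transcript;
    // Hand the recognized text to the voice command processor (see the command sketch further below).
    document.dispatchEvent(new CustomEvent('voice-command', { detail: { transcript } }));
  };
}

document.addEventListener('gaze-top-area', (e: Event) => {
  const active = (e as CustomEvent).detail.active as boolean;
  if (!recognizer) return;
  if (active) {
    // Looking up triggers the voice control mode, immediately or after a dwell time.
    triggerTimer = window.setTimeout(() => recognizer.start(), DWELL_MS);
  } else {
    // Looking away (e.g., looking down) turns the speech recognition input off.
    window.clearTimeout(triggerTimer);
    recognizer.stop();
  }
});
```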
  • the VR navigation system can continue to provide an output associated with the user's voice command, for example, to provide information or explanation associated with the user's question.
  • Computer program 158 receives inputs from the gaze analyzer 156 and voice command processor 152.
  • the computer program 158 customizes the output of the VR environment based on the user input received from the gaze analyzer 156 and the voice command processor 152.
  • the visual aspect of the VR environment is output to the display 112, and the audio aspect of the VR environment is output to the audio output 116
  • FIG. 3A is an example of the user triggering a speech recognition controller.
  • when the user looks up to the top area of the 360 degree space (e.g., more than 60 degrees above the horizon), the speech recognition controller will be activated and the user will be provided with cues to give voice command(s).
  • FIG. 3B is an example of the user giving a voice command after the speech recognition controller is triggered.
  • the user provides a voice command "Go to the kitchen"; the system processes the voice command and follows the order that the user has given.
  • FIG. 3C illustrates the result of the system processing the voice command, e.g., the view is changed to the kitchen environment.
  • by using a particular user behavior (e.g., looking up) to trigger the speech recognition controller, the overall VR experience can be more convenient and fast. The voice recognition is not kept on the entire time, but only when the user needs it. This greatly increases the accuracy of the voice recognition and helps the user interact with the VR environment more easily.
  • the system can provide customized view(s) to the user and the user can communicate with the VR navigation system. For example, the user can ask questions related to the VR environment and be provided with relevant information and/or content.
  • otherwise, the system can take noise or conversation as communication with the system, and the noise or conversation can interfere with the processing of the VR navigation system. It is discovered that the user in the VR environment gazes the least into the top area of the 360 space (partially because users may assume that the top area is the least likely area to contain useful information). By capturing the user's behavior of looking up, the system can assume the user's deliberate intention to trigger voice recognition and the accuracy of system performance can be increased. As well, a less used area of the 360 space is put to use in the VR navigation system.
  • the functionalities provided according to various embodiments of the description can be provided by many units already included in the existing devices, such as the motion detection sensor 110, the sound capturing unit 108, 128, etc., as described. Therefore, the VR navigation system can save the user the cost of external devices and provide a UI/UX that enhances and improves user interaction with the VR environment.
  • the VR speech recognition technology can be implemented on the Android operating system, the iOS operating system, or the like, or on a PC platform. According to some embodiments of the description, the described functions are added to the HTML5 speech recognition Application Program Interface (API).
  • Interactive and Animated hotspot in static panorama environment
  • Static panoramic images provided in the traditional UI/UX are limited in their content and do not provide any user interaction.
  • a static panoramic FoV of the VR environment can be enhanced by providing at least one interactive hotspot that can be changed and/or animated for interaction with the user.
  • a panorama image can be provided with at least one hotspot with different associated states.
  • animation and/or different state(s) of the hotspot can be triggered and/or displayed.
  • the VR navigation system can configure the fridge in a room as the hotspot in a real estate property VR environment, and when the user gazes at the fridge, the door of the fridge can open and the content of the fridge can be shown.
  • the different states in this case can include the fridge being fully closed, the fridge being half open, the fridge being fully open, etc.
  • a simple animation of the fridge being opened can be displayed.
  • Other examples of the different states can include a real estate property during the day or during the night, the garage being open or closed, etc.
  • the user can be provided with more information and content.
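A minimal sketch of a hotspot with several associated states (using the fridge example above) that advances while the user's focus stays on the hotspot. The state names, timing, and the onStateChange callback are illustrative assumptions.

```typescript
// States for the fridge hotspot example: fully closed -> half open -> fully open.
type FridgeState = 'closed' | 'half-open' | 'open';

interface StatefulHotspot {
  id: string;
  states: FridgeState[];
  current: number;         // index into states
  dwellMsPerState: number;  // how long the user must keep focus to advance one state
  onStateChange: (state: FridgeState) => void; // e.g., swap the displayed image or play a short animation
}

const fridge: StatefulHotspot = {
  id: 'fridge',
  states: ['closed', 'half-open', 'open'],
  current: 0,
  dwellMsPerState: 800,
  onStateChange: (state) => console.log(`fridge is now ${state}`),
};

let stateTimer: number | undefined;

// Call this whenever the focus/gaze enters or leaves the hotspot region.
function setFocusOnHotspot(hotspot: StatefulHotspot, focused: boolean): void {
  window.clearInterval(stateTimer);
  if (!focused) return;
  stateTimer = window.setInterval(() => {
    if (hotspot.current < hotspot.states.length - 1) {
      hotspot.current += 1;
      hotspot.onStateChange(hotspot.states[hotspot.current]);
    } else {
      window.clearInterval(stateTimer); // fully open: nothing further to animate
    }
  }, hotspot.dwellMsPerState);
}
```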
  • the VR environment interface is calibrated when inserting at least one hotspot into the VR environment, to ensure that the different states and/or animation are displayed seamlessly. For example, when it is intended to show different states of a certain object, if the different states of the object are not aligned, the user can observe a shift of the various objects in the scene. To avoid user misperception, a number of reference points can be used to align the images and/or animation. In accordance with one embodiment, quintuple reference points are used, which provide a superior result in terms of system performance and accuracy. Interactive and/or animated hotspots inserted into the 360 degree VR environment improve and increase the hierarchy of the VR environment. Users can easily find out which part of the environment is the highlight or important part.
  • the VR navigation system allows the user to edit the VR environment, such as adding a hotspot, connecting different views, embedding multimedia contents, embedding 3D models, embedding Google™ maps, or the like.
  • the VR navigation system provides preset templates which can be used easily by the user.
  • the user can simply drag and drop a selected template into the VR environment view. For instance, a hotspot can be generated when the user clicks on a hotspot button, and the user can drag it to adjust its position in the 360 space of the VR environment to create a corresponding hotspot.
  • the VR navigation system enables the user to define at least one region in the VR environment to associate with a hotspot.
  • when the position of the focus of the user is determined to be at the defined region, a corresponding added function can be activated, such as connecting to a different view, playing multimedia contents such as an audio or video content, displaying a picture, displaying a 3D model, or displaying a Google™ map. A sketch of one way to represent and activate such hotspots is given below.
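A minimal sketch of how a user-defined hotspot region and its associated function could be represented and activated. The angular hit test and the action names are assumptions; the list of actions mirrors the examples given above.

```typescript
type HotspotAction =
  | { kind: 'connectView'; viewId: string }
  | { kind: 'playMedia'; url: string }
  | { kind: 'showPicture'; url: string }
  | { kind: 'show3DModel'; url: string }
  | { kind: 'showMap'; embedUrl: string };

interface Hotspot {
  id: string;
  yawDeg: number;    // horizontal position in the 360 space
  pitchDeg: number;  // vertical position
  radiusDeg: number; // angular size of the region associated with the hotspot
  action: HotspotAction;
}

// Returns true when the focus (cursor or gaze) falls within the hotspot's region.
function focusOnHotspot(focusYaw: number, focusPitch: number, h: Hotspot): boolean {
  const dYaw = Math.abs(((focusYaw - h.yawDeg + 540) % 360) - 180); // wrap-around difference
  const dPitch = Math.abs(focusPitch - h.pitchDeg);
  return Math.hypot(dYaw, dPitch) <= h.radiusDeg;
}

function activate(h: Hotspot): void {
  switch (h.action.kind) {
    case 'connectView':  /* switch the rendered panorama to another view */            break;
    case 'playMedia':    /* play the associated audio or video content */              break;
    case 'showPicture':  /* display the picture */                                     break;
    case 'show3DModel':  /* load and display the 3D model (see the worker sketch) */   break;
    case 'showMap':      /* display the embedded map */                                break;
  }
}
```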
  • when the computing device is connected via the network (e.g., via Bluetooth or other wireless, cellular or other networks) to other computing devices, the generated VR environment can be shared with and displayed on the other computing devices.
  • the computing device can send a link, e.g., a WEB link, representing the generated VR environment to the other computing devices and the VR environment can be viewed by the users in the form of a WEB page.
  • the VR environment interface can be optimized for the mobile environment, where the VR environment view created can be shared through a WEB link and other users can view it using a web browser on their own devices.
  • because the graphics processing unit (GPU) of an average Smartphone would have difficulty rendering such high-resolution images along with all of the possible embedded multimedia UI/UX, the generated VR environment view can have a mobile version with a reduced resolution and optimized data types for the UI/UX. This also reduces the amount of loading time and data usage, which significantly improves the overall user experience. A minimal sketch of selecting a reduced-resolution panorama for mobile clients is given below.
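A minimal sketch of choosing a reduced-resolution panorama variant for mobile clients. The breakpoints, file names, and the coarse user-agent check are assumptions; a production system might instead key off measured GPU capability or bandwidth.

```typescript
// Hypothetical pre-rendered variants of the same stitched panorama.
const PANORAMA_VARIANTS = [
  { maxWidthPx: 2048, url: 'panorama-2k.jpg' },  // mobile version: reduced resolution
  { maxWidthPx: 4096, url: 'panorama-4k.jpg' },
  { maxWidthPx: 8192, url: 'panorama-8k.jpg' },  // desktop with a capable GPU
];

function pickPanoramaUrl(): string {
  const isMobile = /Android|iPhone|iPad/i.test(navigator.userAgent); // coarse check, for illustration
  const budget = isMobile ? 2048 : window.devicePixelRatio > 1 ? 8192 : 4096;
  const variant = PANORAMA_VARIANTS.filter(v => v.maxWidthPx <= budget).pop()
    ?? PANORAMA_VARIANTS[0];
  return variant.url;
}
```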
  • the VR navigation system includes learning abilities to analyze popular interactive information which is used most frequently by the user(s).
  • the UI/UX can then move the most frequently used information closer to where a user starts the navigation of the VR environment, to facilitate user interaction with the popular elements.
  • the VR navigation system can analyze the user's behavior and/or operation (e.g., how quickly the user looks up to trigger the voice recognition mode, or how fast the user speaks, etc.) and adjust itself accordingly to suit the user's needs. For example, if a user is slow in looking up, the system can provide a longer response time to allow sufficient time for the user to complete the operation.
  • the UI/UX can be adjusted to provide information that is more suitable for the user, and to provide it faster. A minimal sketch of such adaptive timing is given below.
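A minimal sketch of adapting the response time to the user's observed behavior, keeping an exponential moving average of how long the user takes to complete the look-up gesture. The smoothing factor and padding are illustrative assumptions.

```typescript
// Exponential moving average of the time (ms) the user needs to complete the look-up gesture.
let avgLookUpMs = 1000;   // initial guess
const SMOOTHING = 0.2;    // weight given to the newest observation
const PADDING_MS = 300;   // extra slack so slower users are not cut off

// Call with the measured duration each time the user performs the look-up gesture.
function recordLookUpDuration(measuredMs: number): void {
  avgLookUpMs = SMOOTHING * measuredMs + (1 - SMOOTHING) * avgLookUpMs;
}

// The UI/UX uses this value as its response window (e.g., how long to wait before
// treating a partial look-up as abandoned).
function responseWindowMs(): number {
  return avgLookUpMs + PADDING_MS;
}
```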
  • the system can create a number of threads including a main thread and a local thread.
  • the local thread can be used to load and render the 3D model of the VR environment, and the main thread can be used for user interaction.
  • when the local thread completes the 3D model rendering, it reports the completion to the main thread, and the main thread can output the 3D model to the display.
  • in this way, the 3D model can be loaded faster and in better quality.
  • the user interaction is not interrupted or disturbed by the loading of the 3D model. A minimal sketch of this pattern, using a Web Worker as the "local thread", is given below.
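A minimal sketch of the main-thread/local-thread split using a Web Worker as the "local thread". The worker file name, message shape, model URL, and the use of fetch to load the model data are assumptions; the point is that the main thread stays free for user interaction while the model loads.

```typescript
// main thread -----------------------------------------------------------------
const modelLoader = new Worker('model-loader.js'); // hypothetical worker script

modelLoader.onmessage = (event: MessageEvent) => {
  // The worker reports completion; the main thread can now output the 3D model to the display.
  const modelData: ArrayBuffer = event.data.buffer;
  console.log(`model loaded: ${modelData.byteLength} bytes`);
  // ...hand modelData to the renderer without having blocked user interaction...
};

modelLoader.postMessage({ url: 'https://example.com/model.glb' }); // hypothetical URL

// model-loader.js (worker / "local thread") ------------------------------------
// self.onmessage = async (event) => {
//   const response = await fetch(event.data.url);
//   const buffer = await response.arrayBuffer(); // load (and, if needed, pre-process) the model
//   self.postMessage({ buffer }, [buffer]);      // transfer the result back to the main thread
// };
```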
  • FIG. 4 is a user interface providing a virtual tour of a real estate property in accordance with one example of the description
  • FIG. 5 is an example of the user interface showing the inside of the real estate property in accordance with one example of the description.
  • FIG. 6 illustrates a method 600 of interfacing a user with a VR environment according to one embodiment of the description.
  • the motion detection sensor mounted to the head of the user detects (602) a gaze of the user directed towards a target in the VR environment.
  • the gaze of the user can be detected by detecting (603) a tilt of the head of the user and analyzing the tilt of the head of the user to determine a position of the gaze of the user.
  • when the gaze of the user is determined to be directed towards the target, a voice control mode is triggered (606) and a voice command can be received (608) from the user. For example, when the tilt of the head of the user is determined to be above a certain threshold (605), it can be determined that the gaze of the user is directed towards the target.
  • the voice command is recognized (610) and converted to a text command.
  • the command can then be processed (612), for example by mapping the recognized text to a navigation action, as sketched below.
  • when the gaze of the user moves away from the target, the voice control mode can be turned off (614) and no voice command will be input to the system.
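A minimal sketch of processing a recognized voice command (steps 610 and 612) by mapping the text to a navigation action, following the "Go to the kitchen" example of FIGs. 3B and 3C. The phrase matching and the view names are illustrative assumptions.

```typescript
// Assumed: view identifiers available in the current VR environment (hypothetical names).
const VIEWS = ['kitchen', 'living room', 'garage', 'bedroom'];

// Step 610: the recognized speech has already been converted to a text command.
// Step 612: process the text command.
function processVoiceCommand(transcript: string): void {
  const text = transcript.toLowerCase().trim();

  const goTo = text.match(/^go to (the )?(.+)$/);
  if (goTo) {
    const requested = goTo[2];
    const view = VIEWS.find(v => requested.includes(v));
    if (view) {
      // e.g., switch the panorama to the kitchen view, as in FIG. 3C.
      document.dispatchEvent(new CustomEvent('connect-view', { detail: { view } }));
      return;
    }
  }

  // Other commands (questions about the environment, requests for information, ...)
  // would be handled here, possibly by a remote service.
  console.log(`unrecognized command: "${transcript}"`);
}

// Wire it to the recognizer sketch shown earlier.
document.addEventListener('voice-command', (e: Event) => {
  processVoiceCommand((e as CustomEvent).detail.transcript);
});
```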
  • FIG. 7 illustrates a method 700 of interfacing a user with a VR environment according to another embodiment of the description.
  • a VR environment is obtained (702) and rendered (704) on the display associated with a computing device.
  • the VR environment can be obtained based on stitching a sequence of images into a composite image.
  • a user input is received (706) to define a hotspot in the VR environment. This can include receiving a user input to define at least one region associated with the hotspot.
  • a position of a focus of the user is detected and determined (708), as described above. When the position of the focus of the user is at the hotspot, a function associated with the hotspot can be activated (710) to enhance user interaction with the VR environment.
  • the function associated with the hotspot can include connecting (710a) to a different view associated with the VR environment, or playing (710b) multimedia contents such as an audio or video content associated with the hotspot.
  • Other functions can be added and activated, such as displaying a picture, displaying a 3D model, displaying a Google™ map, etc.
  • the generated and/or edited VR environment can be shared (712) by way of a WEB link with other users through the network.
  • the method, acts or operations may be programmed or coded as computer-readable instructions and recorded electronically, magnetically or optically on a fixed or non-transitory computer-readable medium, computer-readable memory, machine-readable memory or computer program product.
  • the computer-readable memory or computer-readable medium comprises instructions in code which when loaded into a memory and executed on a processor of a computing device cause the computing device to perform one or more of the foregoing method(s).
  • a computer-readable medium can be any means that contain, store, communicate, propagate or transport the program for use by or in connection with the instruction execution system, apparatus or device.
  • the computer-readable medium may be electronic, magnetic, optical, electromagnetic, infrared or any semiconductor system or device.
  • computer executable code to perform the methods disclosed herein may be tangibly recorded on a computer-readable medium including, but not limited to, a CD-ROM, a DVD, RAM, ROM, EPROM, Flash Memory or any suitable memory card, etc.
  • the method may also be implemented in hardware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Optics & Photonics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A system, method, and computer-readable memory for interfacing a user with a Virtual Reality (VR) environment are described. The method comprises detecting, by a motion detection sensor, a gaze of the user directed towards a target in the VR environment, the motion detection sensor being coupled to a mobile phone mounted to the head of the user; triggering a voice control mode to receive a voice command from the user when the gaze of the user is detected to be directed towards the target in the VR environment; and processing the voice command received from the user.

Description

SYSTEM AND METHOD FOR PROVIDING VIRTUAL REALITY
INTERFACE
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of priority based on United States Application No. 62/438,646, filed on December 23, 2016, entitled, "SYSTEM AND METHOD FOR VIRTUAL REALITY INTERFACE", United States Application No. 62/565,251, filed on September 29, 2017, entitled, "SYSTEM AND METHOD FOR CREATING A VIRTUAL REALITY ENVIRONMENT", and United States Application No. 62/565,217, filed on September 29, 2017, entitled, "MOBILE DEVICE-ASSISTED CREATION OF VIRTUAL REALITY ENVIRONMENT", the disclosure of all of which is hereby incorporated by reference herein.
TECHNICAL FIELD
[0002] The present disclosure relates to Virtual Reality (VR) interface and more particularly, to systems, methods, and computer-readable media for providing an interface for a VR environment.
BACKGROUND
[0003] Virtual Reality (VR) technologies have been gaining interest in an increasing number of areas. There has also been increasing interest in providing a user with a VR environment through the use of equipment that the user may already possess for other reasons, such as through the user's own personal computer (e.g., laptop, desktop, etc.) or mobile device (e.g., Smartphone, tablet, etc.). However, there are many limitations to traditional user interface/user experience (UI/UX) designs for these VR environments. Therefore, there exists a need for an improved way of providing a user with an interface for a VR environment.
SUMMARY
[0004] The following presents a summary of some aspects or embodiments of the disclosure in order to provide a basic understanding of the disclosure. This summary is not an extensive overview of the disclosure. It is not intended to identify key or critical elements of the disclosure or to delineate the scope of the disclosure. Its sole purpose is to present some embodiments of the disclosure in a simplified form as a prelude to the more detailed description that is presented later.
[0005] In accordance with an aspect of the present disclosure there is provided a method of interfacing a user with a Virtual Reality (VR) environment. The method comprises detecting, by a motion detection sensor, a gaze of the user directed towards a target in the VR environment, the motion detection sensor being coupled to a mobile phone mounted to the head of the user; triggering a voice control mode to receive a voice command from the user when the gaze of the user is detected to be directed towards the target in the VR environment; and processing the voice command received from the user.
[0006] In accordance with another aspect of the present disclosure there is provided a method of interfacing a user with a Virtual Reality (VR) environment. The method comprises obtaining a VR environment; rendering the VR environment on a display associated with a computing device; receiving a user input to define a hotspot in the VR environment, detecting and determining if a position of a focus of the user is at the hotspot; and when the position of the focus of the user is determined at the hotspot, activating a function associated with the hotspot.
[0007] In accordance with another aspect of the present disclosure there is provided a non-transitory computer readable memory containing instructions for execution by a processor, the instructions when executed by the processor perform a method of interfacing a user with a Virtual Reality (VR) environment. The method comprises: detecting, by a motion detection sensor, a gaze of the user directed towards a target in the VR environment, the motion detection sensor being coupled to a mobile phone mounted to the head of the user; triggering a speech recognition mode to receive a voice command from the user when the gaze of the user is detected to be directed towards the target in the VR environment; and processing the voice command received from the user.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] These and other features of the disclosure will become more apparent from the description in which reference is made to the following appended drawings.
[0009] FIG. 1 is a schematic depiction of a system for interfacing a user with a VR environment;
[0010] FIG. 2 is an architecture for implementing a speech recognition controller, according to one embodiment;
[0011] FIG. 3A is an example of the user triggering the speech recognition controller;
[0012] FIG. 3B is an example of the user giving a voice command after the speech recognition controller is triggered;
[0013] FIG. 3C is an example of the result of the user giving the voice command;
[0014] FIG. 4 is an example of a user interface providing a virtual tour of a real estate property;
[0015] FIG. 5 is an example of the user interface showing the inside of the real estate property;
[0016] FIG. 6 illustrates a method of interfacing a user with a VR environment according to one embodiment of the description;
[0017] FIG. 7 illustrates a method of interfacing a user with a VR environment according to another embodiment of the description;
DETAILED DESCRIPTION
[0018] A general aspect of several embodiments of the description relates to providing a Virtual Reality (VR) navigation system to provide a user interface/user experience (UI/UX) that enhances and improves user interaction with the VR environment.
[0019] In some embodiments, the VR navigation system provides a user with a speech recognition controller that can be triggered by a particular user behavior. When the user gazes into the top area of a VR environment (e.g., an area equal to or above 60 or 75 degrees above the horizon in the 360 degree space), the speech recognition controller can be activated and voice commands can be received and processed to control the VR environment.
[0020] In some embodiments, a static Field-of-View (FoV) of the VR environment can be enhanced by providing at least one interactive hotspot that can be edited, animated or linked to other views. In some embodiments, the user is allowed to edit and customize the VR environment. The VR navigation system may provide preset templates which can be used easily by the user. The generated and/or edited VR environment can be uploaded or shared through a network to other user(s).
[0021] It will be apparent that the present embodiments may be practiced without some or all of these specific details. In other instances, well-known process operations have not been described in detail in order not to unnecessarily obscure the present embodiments.
[0022] In some implementations, a user may use a computing device for purposes of interfacing with a computer-simulated, VR environment. Such a computing device can be, but is not limited to, a personal computer (PC), such as a laptop, desktop, etc., or a mobile device, such as a Smartphone, tablet, etc. A Smartphone may be, but is not limited to, an iPhone running iOS, an Android phone running the Android operating system, or a Windows phone running the Windows operating system.
[0023] In some implementations, the mobile device can be used in connection with a VR headset or other head-mounted VR device for the user to view the VR environment. In these implementations, the mobile device and the VR headset each have a screen and, when used in combination, provide the user with a dual-screen system for viewing of the VR environment. The VR headset can be designed so as to allow the user to couple and decouple the mobile device with the VR headset. The mobile device can be inserted in the VR headset and held in place by a fastening device. The image displayed by the mobile device may be split into two, one for each eye. The result can be a stereoscopic ("3D") image with a wide FoV. A non-limiting example of such a VR headset is the Google Cardboard™ headset or similar head mounts built out of simple, low-cost components.
[0024] In some implementations, the VR environment can be generated based on stitching a sequence of images into a composite image. The composite image provides the user with a larger (e.g., panoramic) FoV than each individual image alone. The sequence of images can be captured by an image capturing unit coupled to the mobile device, retrieved locally from a memory coupled to the computing device, or remotely via the network. The computer-generated VR environment can be a view of a real-world environment including, but not limited to, a virtual tour of a geographic location or site (such as an exhibition site, a mining site, a theme park, etc.), a real estate property, or a real life experience such as a shopping experience, a medical procedure, etc.
[0025] In some implementations, the computing device may be operably coupled to a server via a network, such as the Internet, LAN, WAN, and/or cellular network, which retrieves data from a content repository connected to the server. In some embodiments, the method may be performed online where data is retrieved or updated in real time through the server from the content repository. In some embodiments, the method may be performed offline without connection to the network, and data is retrieved locally from the memory.
[0026] The computing device may be operably coupled via the network, such as a wireless, Bluetooth and/or cellular network, to other computing devices, such as a personal computer (PC), e.g., a laptop, desktop, etc., or a mobile device, e.g., a Smartphone, tablet, etc. The generated and/or edited VR environment can be shared through the network with other users. In some implementations, the VR environment can be viewed within a browser environment hosted by the computing device in the form of a WEB page. A WEB page may contain a Hypertext Markup Language (HTML) document and/or an Internet Programming Language, for example, JAVA™ Scripts. In some implementations, the computing device can execute a VR application implementing the UI/UX.
[0027] In some embodiments, the computing device includes a display for displaying the VR environment. In some embodiments, the computing device is coupled to a sound capturing unit for capturing or receiving at least one voice command from the user and an image capturing unit for image capturing. One or more audio outputs (e.g., speaker) can be coupled to the computing device for assisting user interaction with the VR environment and/or providing additional information about the VR environment. The computing device further includes a processing unit or processor connected to the display, audio output, the sound capturing unit and the image capturing unit.
[0028] In the VR environment, the position of the focus of the user can be represented by a cursor shown on the display. In some implementations, the cursor can be moved by the user moving a pointing device (such as a hand-held mouse, trackball, stylus, touchpad, etc.) coupled to the personal computer or mobile device.
[0029] When the user wears the VR headset with the mobile device inserted, the position of the focus of the user can be tracked by tracking the gaze of the user. One or more motion detection sensors coupled to the mobile device can provide input to the VR environment for tracking the gaze of the user.
[0030] The one or more motion detection sensors coupled to the mobile device can detect or measure the linear acceleration (e.g., the tilt) of the mobile device and can include, but are not limited to, a gyroscope and/or an accelerometer, such as a G sensor. Other motion sensing devices or elements, such as a magnetometer, an orientation sensor (e.g., a theodolite), etc., may also be coupled to the mobile device and used for tracking the gaze of the user.
[0031] Once the user inserts the mobile device in the VR headset and wears it on the head, any movement of the head of the user can be captured by the motion detection sensor coupled to the mobile device. The movement of the head of the user can be translated into movement of the gaze of the user and used to determine and analyze the focus of the user.
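By way of non-limiting illustration, the sketch below (not part of the described embodiments) shows one way a browser-based implementation might read head tilt from the mobile device's motion sensors using the standard DeviceOrientationEvent; the beta angle only approximates front-to-back head tilt when the device is mounted in a headset, and the function names are illustrative assumptions.

```typescript
// Non-limiting sketch: read front-to-back head tilt from the mobile device's
// motion sensors via the browser DeviceOrientationEvent. The beta angle is
// only an approximation of head pitch when the phone sits in a headset, and
// the function names here are illustrative assumptions.
type PitchListener = (pitchDegrees: number) => void;

function trackHeadPitch(onHeadPitch: PitchListener): () => void {
  const handler = (event: DeviceOrientationEvent) => {
    // event.beta: rotation about the device's x-axis in degrees (null if unsupported).
    if (event.beta !== null) {
      onHeadPitch(event.beta);
    }
  };
  window.addEventListener("deviceorientation", handler);
  // Return a cleanup function so the caller can stop tracking.
  return () => window.removeEventListener("deviceorientation", handler);
}

// Example use: log the pitch so it could be fed to a gaze analyzer.
const stopTracking = trackHeadPitch(pitch => console.log("head pitch:", pitch));
// stopTracking() can be called when the VR page is closed.
```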
[0032] In some implementations, the VR headset may include another mobile computing device and/or functionalities of the mobile device, such as a touch screen, a motion detection sensor, etc. In these implementations, the VR headset can connect to the network using one or more wired and/or wireless communications protocols.
[0033] FIG. 1 illustrates an example of an environment 100 for implementing aspects in accordance with an embodiment of the present invention.
[0034] The environment 100 can include a mobile device 102, which can be a mobile communication device or similar portable or handheld device, such as a Smartphone or tablet type device. The mobile device 102 comprises at least a processing unit or processor 104, a memory 106 coupled to the processing unit 104, and a networking interface
114. A microphone or sound capturing unit 108 is provided for capturing or receiving at least one voice command from the user. The mobile device 102 can also be coupled to an image capturing unit 118 for capturing one or more images. The image capturing unit 118 can be a built-in camera or an external lens, such as a fisheye lens, which can be removably attached to the mobile device 102. At least one motion detection sensor 110 is provided for detecting or measuring the motion of the mobile device. The processing unit 104 renders a VR environment on a display 112, providing the user with a VR experience. The mobile device 102 may be equipped with other input device(s) (not shown) to enable user input, such as a physical keyboard, a virtual keyboard (e.g., touchscreen), etc. The processing unit 104 can provide specific information and/or interaction tailored for the user, based on the user's input captured by the input devices, including the sound capturing unit 108 and the motion detection sensor 110. One or more speakers or audio outputs 116 are provided for assisting user interaction with the VR environment and/or providing additional information about the VR environment.
[0035] The mobile device 102 can be used in connection with a VR headset or head-mounted VR device 103. The VR headset or head-mounted VR device 103 includes a screen which, when used in connection with the display 112 of the mobile device, provides a dual-screen system for viewing the VR environment. When in use, the mobile device 102 can be operatively coupled to the VR headset 103, e.g., by inserting the mobile device 102 into the VR headset 103, where it is held in place by a fastening device.
[0036] The environment 100 can also include a personal computer 122, which can be a laptop, desktop or any other computing device capable of hosting a browser or executing a software application for the VR environment. The personal computer 122 comprises at least a processing unit or processor 124, a memory 126 coupled to the processing unit 124, and a networking interface 134. A microphone or sound capturing unit 128 is provided for capturing or receiving at least one voice command from the user. A pointing device 130 (such as a hand-held mouse, trackball, stylus, touchpad, etc.) is coupled to the personal computer 122 to enable the user to move the cursor shown on a display 132. The personal computer 122 may be equipped with other input device(s) (not shown) to enable user input, such as a physical keyboard, a virtual keyboard (e.g., touchscreen), etc. The processing unit 124 renders a VR environment on the display 132, providing the user with a VR experience. The processing unit 124 can also provide specific information or interaction tailored for the user, based on the user's input captured by the input devices, including the sound capturing unit 128 and the pointing device 130. One or more speakers or audio outputs 136 are provided for assisting user interaction with the VR environment and/or providing additional information about the VR environment.
[0037] The mobile device 102 and/or the personal computer 122 may be operably coupled to a server 142 and a content repository 140 via a network 138. The network 138 can include one or more networks for facilitating communication between the mobile device 102, the personal computer 122, and the server 142, such as but not limited to wireless networks, cellular networks, intranets, the Internet, local area networks, or any other such network or combination thereof. The content repository 140 stores data for use in rendering the VR environment to the user and also stores the UI/UX for use with the VR environment. The content repository 140 may provide an application or software for execution on the mobile device 102 or the personal computer 122. In some embodiments, the method may be performed online, where data are retrieved or updated in real time from the content repository 140 through the server 142.
[0038] The mobile device 102 and/or the personal computer 122 may also be operably coupled to one or more other computing devices 144, such as a PC, e.g., a laptop, desktop, etc., or a mobile device, e.g., a Smartphone, tablet, etc., through the network 138 for sharing the computer-generated VR environment with other users through the network 138.
Speech Recognition Controller based on User Behavior
[0039] Speech is a natural and easy method of interaction that people use every day. As well, speech can provide rich commands and context information to enhance communication. According to some embodiments of the description, speech recognition technology is embedded in the VR navigation system. Compared to most existing VR interaction solutions, the VR navigation system according to the embodiments of the description does not require extra devices, such as a trackpad or vision sensor mounted to a VR headset. Instead, the VR navigation system provides enriched user interaction by receiving and processing speech commands from the user. According to the embodiments of the description, the user is able to give one or more voice commands to the VR navigation system to, for example, switch between different views and/or communicate with the system to obtain additional information or content.
[0040] However, if the speech recognition controller is kept active throughout the time the VR navigation system is being used, the accuracy of the system may drop. This is particularly concerning because speech recognition performance on a mobile device can easily be affected by noise or other non-ideal recording conditions. Currently, the speech recognition systems in mobile browsers have limited functionality. As well, when the mobile device is mounted to the head, no controller or buttons can be pressed without the use of an external device, which makes it difficult to improve the functionality of the user interface.
[0041] According to some embodiments of the description, a particular user behavior can be used to trigger the speech recognition controller in order to inform the VR navigation system when the speech recognition controller should be activated.

[0042] It is discovered that the top area of the VR environment is the area least gazed at by the user. According to some implementations of the description, the top area refers to an area equal to or above 60 or 75 degrees above the horizon in the 360 degree space of a VR environment and can be used as a switch to activate the speech recognition controller. It should however be understood that the top area can refer to an area at different angles above or below the horizon in the 360 degree space. As well, other areas that are gazed at less by the user can be used to trigger the speech recognition controller.
[0043] FIG. 2 illustrates an architecture for implementing the gaze behavior triggered speech recognition controller, according to one embodiment of the description.

[0044] Once the user opens a VR environment (e.g., a WEB page) on the mobile device, the user can insert the mobile device in the VR headset and wear it on the head. The VR environment can be generated based on, for example, a composite image stitched from images captured by an image capturing unit, or retrieved from a memory either locally or remotely. Any movement of the head of the user can then be captured by the motion detection sensor 110 coupled to the mobile device. A motion detection module 154 receives input from the motion detection sensor 110 and analyzes the movement of the head of the user. A gaze analyzer 156 translates the movement of the head of the user into movement of the gaze of the user and determines whether the gaze of the user is directed towards a target in the VR environment. The start position of the user's head can be projected to a default location in the 360 degree space, and the movement of the gaze can be obtained based on, for example, the amount of tilt of the head of the user measured by the motion detection sensor 110 from the start position. When the amount of tilt of the user's head is above a certain threshold, it can be determined that the gaze of the user is directed towards the target in the VR environment.

[0045] The position of the gaze of the user may be shown as a cursor on the display 112 to provide visual cues to the user. When the user moves the head, the cursor is moved correspondingly.
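By way of non-limiting illustration, the following sketch shows one possible form of the gaze analysis of paragraph [0044]: the tilt measured from a start position is compared against a threshold (60 degrees here, mirroring the top-area example) to decide whether the gaze is directed towards the target. The class and method names are illustrative assumptions, not the described system's API.

```typescript
// Non-limiting sketch of the gaze analysis described in paragraph [0044]:
// head tilt is measured from a calibrated start position and compared to a
// threshold (60 degrees here, mirroring the top-area example).
class GazeAnalyzer {
  private startPitchDeg = 0;

  constructor(private readonly thresholdDeg: number = 60) {}

  // Record the head pitch at the moment the VR environment is opened.
  calibrate(currentPitchDeg: number): void {
    this.startPitchDeg = currentPitchDeg;
  }

  // True when the tilt away from the start position exceeds the threshold,
  // i.e., the gaze is taken to be directed towards the top-area target.
  isGazeOnTarget(currentPitchDeg: number): boolean {
    return currentPitchDeg - this.startPitchDeg >= this.thresholdDeg;
  }
}

const gaze = new GazeAnalyzer(60);
gaze.calibrate(5);                    // user starts roughly level with the horizon
console.log(gaze.isGazeOnTarget(70)); // true: head tilted up past the threshold
console.log(gaze.isGazeOnTarget(20)); // false: gaze stays in the main scene
```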
[0046] If the user would like to enter a different scene in the VR environment, for example, the user can move the head, which is detected by the motion detection sensor 110 and in turn moves the cursor to an interactive spot (e.g., a door of a room). The gaze of the user can be received by the system and trigger an action associated with the interactive spot. As a result, the user can be presented with a view of the scene the user would like to see. However, in a conventional design, the user may not be given any option to navigate the VR environment in a customized manner.

[0047] According to some embodiments of the present description, when the gaze of the user is detected to be directed towards the target in the VR environment, the gaze analyzer 156 enables a speech recognition controller 150 (i.e., a voice control mode) to receive one or more voice commands through the sound capturing unit 108. In some implementations, the speech recognition controller may be triggered as soon as the gaze is detected to come within the target area in the VR environment, to facilitate fast processing of the user command. For example, the VR navigation system can enable the user to send a quick signal by looking up to indicate that the user would like to trigger the voice control mode. In other implementations, the speech recognition controller may be triggered when the gaze is detected to stay within the target area for a default amount of time (e.g., a couple of seconds) to provide a longer response time, for example, based on the user behavior learned by the system. The voice command processor 152 then performs speech/voice recognition of the sound received from the sound capturing unit 108.
[0048] When the gaze of the user is detected to be directed away from the target in the VR environment, the gaze analyzer 156 disables or turns off the speech recognition controller 150 (i.e., a voice control off mode or a gaze-only mode) and no voice command will be input through the sound capturing unit 108. For example, after looking up, the user can give a voice command (e.g., 3-4 seconds long), and then the user may look down to turn off the speech recognition input. The VR navigation system can nonetheless continue to provide an output associated with the user's voice command, for example, to provide information or an explanation associated with the user's question.
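By way of non-limiting illustration, the sketch below shows one way paragraphs [0047]-[0048] might be realized in a browser using the Web Speech API (exposed as webkitSpeechRecognition in some browsers, and related to the HTML5 speech recognition API mentioned later in this description): the recognizer is started when the gaze enters the target area, optionally after a dwell time, and stopped when the gaze leaves. The dwell time value and function names are assumptions.

```typescript
// Non-limiting sketch: enable the speech recognizer when the gaze enters the
// target area (optionally after a dwell time) and disable it when the gaze
// leaves, using the browser Web Speech API where available.
const SpeechRecognitionCtor =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

const recognizer = new SpeechRecognitionCtor();
recognizer.lang = "en-US";
recognizer.interimResults = false;

let dwellTimer: number | null = null;
let voiceModeOn = false;

// onGazeOnTarget / onGazeOffTarget would be wired to the gaze analyzer's output.
function onGazeOnTarget(dwellMs = 0): void {
  if (voiceModeOn || dwellTimer !== null) return;
  // Optionally require the gaze to stay on the target before enabling voice input.
  dwellTimer = window.setTimeout(() => {
    dwellTimer = null;
    voiceModeOn = true;
    recognizer.start(); // voice control mode on
  }, dwellMs);
}

function onGazeOffTarget(): void {
  if (dwellTimer !== null) {
    window.clearTimeout(dwellTimer);
    dwellTimer = null;
  }
  if (voiceModeOn) {
    voiceModeOn = false;
    recognizer.stop(); // back to gaze-only mode: no voice input accepted
  }
}

recognizer.onresult = (event: any) => {
  const transcript = event.results[event.results.length - 1][0].transcript;
  console.log("Voice command received:", transcript);
};
```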
[0049] A computer program 158 receives inputs from the gaze analyzer 156 and the voice command processor 152. The computer program 158 customizes the output of the VR environment based on the user input received from the gaze analyzer 156 and the voice command processor 152. The visual aspect of the VR environment is output to the display 112, and the audio aspect of the VR environment is output to the audio output 116.
[0050] FIG. 3A is an example of the user triggering the speech recognition controller. When the user looks up to the top area of the 360 degree space, e.g., more than 60 degrees above the horizon, the speech recognition controller will be activated and the user will be provided with cues to give voice command(s).
[0051] FIG. 3B is an example of the user giving a voice command after the speech recognition controller is triggered. The user provides the voice command "Go to the kitchen"; the system processes the voice command and follows the order the user has given. FIG. 3C illustrates the result of the system processing the voice command, e.g., the view is changed to the kitchen environment.
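By way of non-limiting illustration, the sketch below shows how a recognized transcript such as "Go to the kitchen" might be matched against a small table of scene names to switch the view; the scene identifiers and the loadScene function are hypothetical placeholders.

```typescript
// Non-limiting sketch: map a recognized transcript such as "Go to the kitchen"
// to a scene change. The scene table and loadScene function are hypothetical.
const scenes: Record<string, string> = {
  kitchen: "scenes/kitchen-panorama.jpg",
  "living room": "scenes/living-room-panorama.jpg",
  garage: "scenes/garage-panorama.jpg",
};

function loadScene(panoramaUrl: string): void {
  // Placeholder for rendering the selected panorama in the VR view.
  console.log(`Rendering panorama: ${panoramaUrl}`);
}

function handleVoiceCommand(transcript: string): void {
  const text = transcript.toLowerCase();
  for (const [name, url] of Object.entries(scenes)) {
    if (text.includes(name)) {
      loadScene(url); // e.g. "Go to the kitchen" switches to the kitchen view
      return;
    }
  }
  console.log("No matching scene for command:", transcript);
}

handleVoiceCommand("Go to the kitchen");
```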
[0052] By embedding voice recognition in the VR navigation system, the overall VR experience can be made more convenient and fast. A particular user behavior (e.g., looking up) can be used as an activator to turn on the speech or voice recognition function. As a result, voice recognition is not kept on the entire time, but only when the user needs it. This greatly increases the accuracy of the voice recognition and helps the user interact with the VR environment in an easier way.
[0053] In the conventional UI/UX of the VR environment, going from one view to another view requires the user to gaze at certain locations in a sequential way programmed into the VR environment. Such a user experience can be tedious for the user because processing needs to happen sequentially and the user cannot skip any steps. According to the embodiments of the description, by enabling voice recognition, the system can provide customized view(s) to the user and the user can communicate with the VR navigation system. For example, the user can ask questions related to the VR environment and be provided with relevant information and/or content.
[0054] From the system's perspective, it is difficult to know when to receive the user's voice. When sound is not intended as input but is nonetheless received by the system, the system can take the noise or conversation as communication with the system, and the noise or conversation can become interference to the processing of the VR navigation system. It is discovered that the user in the VR environment gazes the least into the top area of the 360 degree space (partially because users may assume that the top area is the least likely area to contain useful information). By capturing the user's behavior of looking up, the system can assume the user's deliberate intention to trigger voice recognition, and the accuracy of system performance can be increased. As well, the less used area of the 360 degree space is put to use in the VR navigation system.
[0055] The functionalities provided according to various embodiments of the description can be provided by many units already included in existing devices, such as the motion detection sensor 110, the sound capturing unit 108, 128, etc., as described. Therefore, the VR navigation system can save the user the cost of external devices and provide a UI/UX that enhances and improves user interaction with the VR environment.
[0056] The VR speech recognition technology can be implemented in an Android operating system, an iOS operating system, or the like, or on a PC platform. According to some embodiments of the description, the described functions are added to the HTML5 speech recognition Application Program Interface (API).

Interactive and Animated Hotspot in a Static Panorama Environment
[0057] Static panoramic images provided in the traditional UI/UX are limited in their content and do not provide any user interaction.
[0058] In some embodiments, a static panoramic FoV of the VR environment can be enhanced by providing at least one interactive hotspot that can be changed and/or animated for interaction with the user.
[0059] In some implementations, a panorama image can be provided with at least one hotspot having different associated states. When the user moves the cursor to the hotspot (either by moving the pointing device or by gazing at the hotspot), animation and/or different state(s) of the hotspot can be triggered and/or displayed.
[0060] For example, the VR navigation system can configure the fridge in a room as the hotspot in a real estate property VR environment, and when the user gazes at the fridge, the door of the fridge can open and the contents of the fridge can be shown. The different states in this case can include the fridge being fully closed, the fridge being half open, the fridge being fully open, etc. Alternatively, when the user gazes at the fridge in a room, a simple animation of the fridge being opened can be displayed.
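By way of non-limiting illustration, the sketch below models a multi-state hotspot such as the fridge example as an ordered list of states that advances each time the user's focus reaches the hotspot; the state names and image file names are illustrative assumptions.

```typescript
// Non-limiting sketch: a hotspot with an ordered list of states that advances
// when the user's cursor or gaze reaches it. States and images are illustrative.
interface HotspotState {
  name: string;
  image: string; // panorama layer or sprite shown for this state
}

class StatefulHotspot {
  private index = 0;

  constructor(private readonly states: HotspotState[]) {}

  get current(): HotspotState {
    return this.states[this.index];
  }

  // Called when the focus of the user enters the hotspot region.
  onFocus(): HotspotState {
    this.index = (this.index + 1) % this.states.length;
    return this.current;
  }
}

const fridge = new StatefulHotspot([
  { name: "closed", image: "fridge-closed.png" },
  { name: "half-open", image: "fridge-half-open.png" },
  { name: "open", image: "fridge-open.png" },
]);

console.log(fridge.onFocus().name); // "half-open"
console.log(fridge.onFocus().name); // "open"
```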
[0061] Other examples of the different states can include a real estate property during the day or during the night, the garage being open or closed, etc. By providing at least one interactive hotspot that can be changed and/or animated for interaction with the user, the user can be provided with more information and content.
[0062] In accordance with the embodiments of the description, the VR environment interface is calibrated when inserting at least one hotspot into the VR environment, to ensure that the different states and/or animation are displayed seamlessly. For example, when it is intended to show different states of a certain object, if the different states of the object are not aligned, the user can observe a shift of the various objects in the scene. To avoid user misperception, a number of reference points can be used to align the images and/or animation. In accordance with one embodiment, five reference points are used, which provide a superior result in terms of system performance and accuracy.

[0063] Interactive and/or animated hotspots inserted into the 360 degree VR environment improve and enrich the hierarchy of the VR environment. Users can easily find out which part of the environment is the highlight or important part.
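By way of non-limiting illustration, the sketch below shows one way the reference-point calibration of paragraph [0062] might work with five corresponding points: the average shift between two states is estimated and applied so the states line up and the user does not observe a jump. The data layout and values are illustrative assumptions.

```typescript
// Non-limiting sketch: align two states of a hotspot using five corresponding
// reference points. The data layout and values are illustrative assumptions.
interface Point { x: number; y: number; }

// Average translation that maps the candidate state's points onto the reference's.
function estimateShift(reference: Point[], candidate: Point[]): Point {
  const n = Math.min(reference.length, candidate.length);
  let dx = 0, dy = 0;
  for (let i = 0; i < n; i++) {
    dx += reference[i].x - candidate[i].x;
    dy += reference[i].y - candidate[i].y;
  }
  return { x: dx / n, y: dy / n };
}

// Apply the shift so the candidate state lines up with the reference state.
function alignState(candidate: Point[], shift: Point): Point[] {
  return candidate.map(p => ({ x: p.x + shift.x, y: p.y + shift.y }));
}

// Five reference points marked in the "closed" and "open" fridge images.
const closedRefs: Point[] = [
  { x: 100, y: 200 }, { x: 400, y: 210 }, { x: 250, y: 350 },
  { x: 120, y: 500 }, { x: 420, y: 510 },
];
const openRefs: Point[] = closedRefs.map(p => ({ x: p.x + 3, y: p.y - 2 }));

const shift = estimateShift(closedRefs, openRefs);   // { x: -3, y: 2 }
const alignedOpenRefs = alignState(openRefs, shift); // now matches closedRefs
console.log(shift, alignedOpenRefs[0]);              // { x: -3, y: 2 } { x: 100, y: 200 }
```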
[0064] In some embodiments, the VR navigation system allows the user to edit the VR environment, such as adding a hotspot, connecting different views, embedding multimedia contents, embedding 3D models, embedding Google™ maps, or the like.
[0065] In some embodiments, the VR navigation system provides preset templates which can be used easily by the user. To activate these functions, the user can simply drag and drop a selected template into the VR environment view. For instance, a hotspot can be generated when the user clicks on a hotspot button, and the user can then drag it to adjust its position in the 360 degree space of the VR environment.
[0066] In some embodiments, the VR navigation system enables the user to define at least one region in the VR environment to associate with a hotspot. When the user-defined region is activated (by, for example, moving the cursor into the defined region, pressing the defined region, or gazing at the hotspot), a corresponding added function can be activated, such as connecting to a different view, playing multimedia contents such as audio or video content, displaying a picture, displaying a 3D model, or displaying a Google™ map.

[0067] When the computing device is connected via the network (e.g., via Bluetooth or other wireless, cellular or other networks) to other computing devices, the generated VR environment can be shared and displayed on the other computing devices. For example, the computing device can send a link, e.g., a WEB link, representing the generated VR environment to the other computing devices, and the VR environment can be viewed by the users in the form of a WEB page.
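Returning to the user-defined regions of paragraph [0066], and by way of non-limiting illustration, the sketch below associates a region in the 360 degree space with an action that is activated when the position of the user's focus falls inside it; the yaw/pitch bounding-box representation and the action contents are assumptions.

```typescript
// Non-limiting sketch: user-defined hotspot regions with attached actions,
// activated when the user's focus falls inside a region. The yaw/pitch
// bounding-box representation and action contents are illustrative assumptions.
interface Region {
  minYaw: number; maxYaw: number;     // horizontal angles in the 360 degree space
  minPitch: number; maxPitch: number; // vertical angles relative to the horizon
}

interface Hotspot {
  region: Region;
  action: () => void; // e.g. connect to another view, play audio/video, show a map
}

const hotspots: Hotspot[] = [
  {
    region: { minYaw: 80, maxYaw: 100, minPitch: -10, maxPitch: 10 },
    action: () => console.log("Connecting to the kitchen view..."),
  },
  {
    region: { minYaw: 200, maxYaw: 220, minPitch: 0, maxPitch: 20 },
    action: () => console.log("Playing the introduction video..."),
  },
];

function onFocusMoved(yaw: number, pitch: number): void {
  for (const h of hotspots) {
    const { minYaw, maxYaw, minPitch, maxPitch } = h.region;
    if (yaw >= minYaw && yaw <= maxYaw && pitch >= minPitch && pitch <= maxPitch) {
      h.action(); // activate the function associated with the hotspot
    }
  }
}

onFocusMoved(90, 0); // focus enters the first region: "Connecting to the kitchen view..."
```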
[0068] When the computing device is a mobile device such as a Smartphone, the VR environment interface according to some embodiments of the description can be optimized for the mobile environment, where the VR environment view created can be shared through a WEB link and other users can view it using a web browser on their own devices. Considering that the graphics processing unit (GPU) of an average Smartphone would have difficulty rendering such high-resolution images along with all the possible embedded multimedia UI/UX, the generated VR environment view can have a mobile version with a reduced resolution and optimized data types for the UI/UX. This also reduces loading time and data usage, which significantly improves the overall user experience.
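By way of non-limiting illustration, one simple way to serve the reduced-resolution mobile version described above is to select a smaller panorama when a mobile browser is detected, as sketched below; the detection heuristic and file names are illustrative assumptions.

```typescript
// Non-limiting sketch: choose a reduced-resolution panorama for mobile browsers
// to cut loading time and data usage. The detection heuristic and file names
// are illustrative assumptions, not the system's actual logic.
function selectPanoramaUrl(baseName: string): string {
  const isMobile = /Android|iPhone|iPad|Mobile/i.test(navigator.userAgent);
  // Desktop GPUs get the full-resolution image; mobile devices get a smaller one.
  return isMobile ? `${baseName}-2048.jpg` : `${baseName}-8192.jpg`;
}

console.log(selectPanoramaUrl("living-room")); // e.g. "living-room-2048.jpg" on a phone
```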
[0069] In some implementations of the description, the VR navigation system includes learning abilities to analyze the popular interactive information that is used most frequently by the user(s). The UI/UX can then move the most frequently used information closer to where a user starts navigating the VR environment, to facilitate user interaction with the popular elements. As well, the VR navigation system can analyze the user's behavior and/or operation (e.g., how quickly the user looks up to trigger the voice recognition mode, how fast the user speaks, etc.) and adjust itself accordingly to suit the user's needs. For example, if a user is slow in looking up, the system can provide a longer response time to give the user sufficient time to complete the operation.
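By way of non-limiting illustration, the sketch below adapts the dwell time used to trigger the voice control mode to how quickly the user typically looks up, as one possible form of the learning behavior described above; the smoothing factor and bounds are assumptions.

```typescript
// Non-limiting sketch: adapt the dwell time for triggering voice control to how
// quickly the user typically looks up. The smoothing factor and bounds are
// illustrative assumptions, not values from the description.
class ResponseTimeAdapter {
  private avgLookUpMs = 1000; // running average of observed look-up times

  observeLookUp(durationMs: number): void {
    // Exponential moving average keeps recent behavior weighted more heavily.
    this.avgLookUpMs = 0.8 * this.avgLookUpMs + 0.2 * durationMs;
  }

  // Slower users get a longer dwell time before the voice mode triggers.
  dwellTimeMs(): number {
    return Math.min(3000, Math.max(250, 0.5 * this.avgLookUpMs));
  }
}

const adapter = new ResponseTimeAdapter();
adapter.observeLookUp(2400);        // a user who looks up slowly
console.log(adapter.dwellTimeMs()); // dwell time grows to give the user more time
```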
[0070] By embedding learning abilities in the VR navigation system, the UI/UX can be adjusted to provide information that is more suitable for the user, and to provide it faster.
[0071] According to some embodiments of the description, the system can create a number of threads, including a main thread and a local thread. The local thread can be used to load and render the 3D model of the VR environment, and the main thread can be used for user interaction. When the local thread completes the 3D model rendering, it reports the completion to the main thread, and the main thread can output the 3D model to the display. By loading a large 3D model in an asynchronous way, the 3D model can be provided with a faster loading speed and in better quality. As well, the user interaction is not interrupted or disturbed by the loading of the 3D model.
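By way of non-limiting illustration, the sketch below realizes the two-thread arrangement in a browser by using a Web Worker as the "local thread": the worker loads and prepares the 3D model and reports completion to the main thread, which then outputs the model to the display. The worker file name and message format are assumptions.

```typescript
// main.ts - non-limiting sketch of asynchronous 3D model loading with a Web
// Worker acting as the "local thread". The worker file name and message
// format are illustrative assumptions.
const modelLoader = new Worker("model-loader-worker.js");

modelLoader.onmessage = (event: MessageEvent) => {
  // The worker reports completion; the main thread outputs the model to the display.
  const { modelId, byteLength } = event.data;
  console.log(`Model ${modelId} ready (${byteLength} bytes), rendering now.`);
};

// Ask the worker to load a large model without blocking user interaction.
modelLoader.postMessage({ modelId: "showroom", url: "models/showroom.glb" });

// model-loader-worker.js (worker side, sketched):
// self.onmessage = async (event) => {
//   const data = await fetch(event.data.url).then(r => r.arrayBuffer());
//   self.postMessage({ modelId: event.data.modelId, byteLength: data.byteLength });
// };
```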
[0072] FIG. 4 is a user interface providing a virtual tour of a real estate property in accordance with one example of the description; FIG. 5 is an example of the user interface showing the inside of the real estate property in accordance with one example of the description.

[0073] FIG. 6 illustrates a method 600 of interfacing a user with a VR environment according to one embodiment of the description. The motion detection sensor mounted to the head of the user detects (602) a gaze of the user directed towards a target in the VR environment. The gaze of the user can be detected by detecting (603) a tilt of the head of the user and analyzing the tilt of the head of the user to determine a position of the gaze of the user.
[0074] When the gaze of the user is detected (604) to be directed towards the target in the VR environment, a voice control mode is triggered (606) and a voice command can be received (608) from the user. For example, when the tilt of the head of the user is determined to be above a certain threshold (605), it can be determined that the gaze of the user is directed towards the target. The voice command is recognized (610) and converted to a text command. The command can then be processed (612). When the gaze of the user is detected (604) to be directed away from the target in the VR environment, the voice control mode can be turned off (614) and no voice command will be input to the system.
[0075] FIG. 7 illustrates a method 700 of interfacing a user with a VR environment according to another embodiment of the description. A VR environment is obtained (702) and rendered (704) on the display associated with a computing device. The VR environment can be obtained based on stitching a sequence of images into a composite image. A user input is received (706) to define a hotspot in the VR environment. This can include receiving a user input to define at least one region associated with the hotspot. A position of a focus of the user is detected and determined (708), as described above. When the position of the focus of the user is at the hotspot, a function associated with the hotspot can be activated (710) to enhance user interaction with the VR environment. For example, the function associated with the hotspot can include connecting (710a) to a different view associated with the VR environment, or playing (710b) multimedia contents such as audio or video content associated with the hotspot. Other functions can be added and activated, such as displaying a picture, displaying a 3D model, displaying a Google™ map, etc. The generated and/or edited VR environment can be shared (712) by way of a WEB link with other users through the network.

[0076] Any of the methods disclosed herein may be implemented in hardware, software, firmware or any combination thereof. Where implemented as software, the method, acts or operations may be programmed or coded as computer-readable instructions and recorded electronically, magnetically or optically on a fixed or non-transitory computer-readable medium, computer-readable memory, machine-readable memory or computer program product. In other words, the computer-readable memory or computer-readable medium comprises instructions in code which, when loaded into a memory and executed on a processor of a computing device, cause the computing device to perform one or more of the foregoing methods.
[0077] A computer-readable medium can be any means that can contain, store, communicate, propagate or transport the program for use by or in connection with the instruction execution system, apparatus or device. The computer-readable medium may be electronic, magnetic, optical, electromagnetic, infrared or any semiconductor system or device. For example, computer-executable code to perform the methods disclosed herein may be tangibly recorded on a computer-readable medium including, but not limited to, a CD-ROM, a DVD, RAM, ROM, EPROM, Flash memory or any suitable memory card, etc. The method may also be implemented in hardware.
[0078] Although the present invention has been described with reference to particular means, materials and embodiments, from the foregoing description one skilled in the art can easily ascertain the essential characteristics of the present invention, and various changes and modifications can be made to adapt it to various uses and characteristics without departing from the spirit and scope of the present invention as described above and as set forth in the attached claims.

Claims

WHAT IS CLAIMED IS:
1. A method of interfacing a user with a Virtual Reality (VR) environment, comprising:
detecting, by a motion detection sensor, a gaze of the user directed towards a target in the VR environment, the motion detection sensor being coupled to a mobile phone mounted to the head of the user;
triggering a voice control mode to receive a voice command from the user when the gaze of the user is detected to be directed towards the target in the VR environment; and
processing the voice command received from the user.
2. The method according to claim 1, wherein the target is an area in the VR environment at equal to or above 60 degrees in relation to the horizon.
3. The method according to claim 1, further comprising:
detecting, by the motion detection sensor, that the gaze of the user is away from the target in the VR environment;
turning off the voice control mode when the gaze of the user is detected to be directed away from the target in the VR environment.
4. The method according to claim 1, wherein detecting the gaze of the user directed towards the target in the VR environment includes:
detecting, by the motion detection sensor, a degree of a tilt of the head of the user; and
analyzing the tilt of the head of the user to determine whether the tilt of the head of the user is above a certain threshold.
5. The method according to claim 4, wherein the mobile device is operatively coupled to a VR headset.
6. The method according to claim 5, wherein the motion detection sensor is a G sensor.
7. The method according to claim 4, wherein the position of the gaze of the user is associated with a cursor in the VR environment.
8. A non-transitory computer readable memory containing instructions for execution by a processor, the instructions when executed by the processor perform a method of interfacing a user with a Virtual Reality (VR) environment, the method comprising:
detecting, by a motion detection sensor, a gaze of the user directed towards a target in the VR environment, the motion detection sensor being coupled to a mobile phone mounted to the head of the user;
triggering a speech recognition mode to receive a voice command from the user when the gaze of the user is detected to be directed towards the target in the VR environment; and
processing the voice command received from the user.
9. A method of interfacing a user with a Virtual Reality (VR) environment, comprising:
obtaining a VR environment;
rendering the VR environment on a display associated with a computing device;
receiving a user input to define a hotspot in the VR environment; detecting and determining if a position of a focus of the user is at the hotspot; and
when the position of the focus of the user is determined at the hotspot, activating a function associated with the hotspot.
10. The method according to claim 9, wherein receiving a user input to define a hotspot in the VR environment includes receiving a user input to define a region associated with the hotspot.
11. The method according to claim 9, wherein activating a function associated with the hotspot includes connecting to a different view associated with the VR environment.
12. The method according to claim 9, wherein activating a function associated with the hotspot includes playing multimedia contents associated with the hotspot.
13. The method according to claim 9, wherein the VR environment is rendered in a WEB browser.
14. The method according to claim 12, further comprising sharing, via a network, a WEB link associated with the VR environment with another user.
15. The method according to claim 9, wherein the computing device is a mobile device.
PCT/CA2017/051568 2016-12-23 2017-12-21 System and method for providing virtual reality interface WO2018112643A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3047844A CA3047844A1 (en) 2016-12-23 2017-12-21 System and method for providing virtual reality interface

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201662438646P 2016-12-23 2016-12-23
US62/438,646 2016-12-23
US201762565217P 2017-09-29 2017-09-29
US201762565251P 2017-09-29 2017-09-29
US62/565,217 2017-09-29
US62/565,251 2017-09-29

Publications (1)

Publication Number Publication Date
WO2018112643A1 true WO2018112643A1 (en) 2018-06-28

Family

ID=62624095

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2017/051568 WO2018112643A1 (en) 2016-12-23 2017-12-21 System and method for providing virtual reality interface

Country Status (2)

Country Link
CA (1) CA3047844A1 (en)
WO (1) WO2018112643A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120295708A1 (en) * 2006-03-06 2012-11-22 Sony Computer Entertainment Inc. Interface with Gaze Detection and Voice Input
US20130127980A1 (en) * 2010-02-28 2013-05-23 Osterhout Group, Inc. Video display modification based on sensor input for a see-through near-to-eye display
US20140184550A1 (en) * 2011-09-07 2014-07-03 Tandemlaunch Technologies Inc. System and Method for Using Eye Gaze Information to Enhance Interactions
WO2014018693A1 (en) * 2012-07-27 2014-01-30 Gatan, Inc. Ion beam sample preparation apparatus and methods
US20150348327A1 (en) * 2014-05-30 2015-12-03 Sony Computer Entertainment America Llc Head Mounted Device (HMD) System Having Interface With Mobile Computing Device for Rendering Virtual Reality Content
US20160262608A1 (en) * 2014-07-08 2016-09-15 Krueger Wesley W O Systems and methods using virtual reality or augmented reality environments for the measurement and/or improvement of human vestibulo-ocular performance

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2617420A (en) * 2021-09-01 2023-10-11 Apple Inc Voice trigger based on acoustic space
GB2617420B (en) * 2021-09-01 2024-06-19 Apple Inc Voice trigger based on acoustic space

Also Published As

Publication number Publication date
CA3047844A1 (en) 2018-06-28

Similar Documents

Publication Publication Date Title
US11262835B2 (en) Human-body-gesture-based region and volume selection for HMD
US11093045B2 (en) Systems and methods to augment user interaction with the environment outside of a vehicle
US9483113B1 (en) Providing user input to a computing device with an eye closure
US9983687B1 (en) Gesture-controlled augmented reality experience using a mobile communications device
US9774780B1 (en) Cues for capturing images
AU2010366331B2 (en) User interface, apparatus and method for gesture recognition
CN107430856B (en) Information processing system and information processing method
US9395764B2 (en) Gestural motion and speech interface control method for 3d audio-video-data navigation on handheld devices
JPWO2018142756A1 (en) Information processing apparatus and information processing method
US9665249B1 (en) Approaches for controlling a computing device based on head movement
WO2018112643A1 (en) System and method for providing virtual reality interface
US10585485B1 (en) Controlling content zoom level based on user head movement
US20240112383A1 (en) Generating user interfaces in augmented reality environments
US20230384928A1 (en) Ar-based virtual keyboard
CN115338858A (en) Intelligent robot control method, device, server, robot and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17885187

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3047844

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17885187

Country of ref document: EP

Kind code of ref document: A1