WO2014111947A1 - Gesture control in augmented reality - Google Patents

Gesture control in augmented reality

Info

Publication number
WO2014111947A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
images
hand
camera
vector
Application number
PCT/IL2014/050073
Other languages
French (fr)
Inventor
Meir Morag
Assaf Gad
Eran Eilat
Original Assignee
Pointgrab Ltd.
Application filed by Pointgrab Ltd. filed Critical Pointgrab Ltd.
Publication of WO2014111947A1 publication Critical patent/WO2014111947A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language

Definitions

  • a user's hand movement may be translated to movement of a graphical object (such as in Figs. 1A, 1B and 2) based on identification of a pre-determined posture or shape of the user's hand. For example, a user may move an open hand (all fingers extended) in view of a camera to control a cursor or other functions of a device, whereas when the user closes his fingers into a fist or grab-like posture, or changes the shape of his hand to any other pre-determined posture, movement of the user's hand is translated to control of a display in games as described above.
  • hand gestures for controlling the device may include touch gestures applied to the display (e.g., display 11).
  • a user may apply a touch gesture on a location on the display to signify selecting a graphical object and initiating an event in which further touchless gestures of the user's hand are translated to other manipulations of the graphical object.
  • a touch-based pinch gesture may signify "select", while further touchless gestures may cause movement of the selected graphical object.
  • a method for controlling a device is schematically illustrated in Fig. 3.
  • the method includes obtaining a sequence of images, the images comprising a user's hand or other body part (310); determining X,Y coordinates of the user's hand at a specific location within an image from the sequence of images (320) to calculate a direction of a vector (330).
  • a Z-coordinate of the user's hand at the specific location is also determined (340) to calculate a magnitude of the vector (350) and the vector is used to control the device (360).
  • controlling the device includes setting a direction of movement of the object on the device's display based on the direction of the vector and setting the velocity of the object on the device's display based on the magnitude of the vector.
  • a user playing virtual paintball may move his hand (according to one embodiment, while in a predetermined posture) back to stretch a virtual bow or slingshot and then may apply a hand gesture or posture such as opening the palm to signify release of the virtual arrow or other ammunition.
  • the trajectory of the virtual arrow shot by the user is based on a vector calculated from the location of the user's hand on the X and Y axes and its location on the Z axis (typically relative to the camera imaging the hand) when the hand release gesture or posture is detected.
  • movement of a hand may be defined as having a beginning (when a hand changes from static to moving) and an end (when the hand changes from mobile to static).
  • a “select event” may be determined to occur at the beginning of the movement and a “release event” may be determined to occur at the end of the movement.
  • the beginning and end of hand movements may be detected based on the shape of the user's hand (e.g., using one posture of the hand to signify starting a hand movement (and a "select event") and another posture to signify ending the hand movement (and a "release event")), based on a time period that has lapsed, based on motion parameters (e.g., the speed of the detected motion, where below a predetermined speed the hand is defined as "static" and above that speed the hand is determined to be "mobile"), or by other suitable methods.
  • the X, Y, Z coordinates are determined upon detection of a change of posture of the hand or upon detection of a pre-determined posture of the hand.
  • a user may close the fingers (or one finger) of his hand into a grab-like or pinch-like posture to select and pull back a virtual arrow in a virtual bow; the locations of the hand on the X and Y axes while in the grab or pinch-like posture are then used to calculate an estimated direction of movement of the virtual arrow, whereas the location of the hand on the X, Y and Z axes once the user opens his fingers (e.g., to simulate letting go of the arrow) is used to determine the velocity (possibly the initial velocity) and/or the distance of travel of the virtual arrow from the X,Y location at which the hand's posture changed (a minimal sketch of this select and release flow appears after this list).
  • the camera used to obtain images of the user is a 2D camera and the X,Y coordinates relate to a coordinate system of the image produced by the camera while the Z coordinates relate to locations relative to the 2D camera itself.
  • Z coordinates may be relative locations (e.g., closer or further away from the camera).
  • Detecting X, Y and Z coordinates may be done, for example, by using the known dimensions of the image frames and by detecting a pitch angle of the hand (e.g., by calculating the angle between the user's arm and a transverse axis of the hand or arm, between the hand and a virtual line connecting the hand and the display, or between the hand and a virtual line connecting the hand and the camera used to obtain the images of the hand).
  • the size or shape of the user's hand may be used in calculating the angle of pitch of the hand and/or to determine a coordinate of the hand.
  • Additional methods may be used for detecting the X, Y, Z coordinates, for example, detecting a transformation of movement of selected points/pixels from within images of a hand, determining changes of scale along the X and Y axes from the transformations, and determining movement along the Z axis from the scale changes; or any other appropriate methods may be used, for example stereoscopy or 3D imagers.
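The select and release flow described in the bullets above (a grab-like posture starting the pull, an open palm releasing the virtual projectile) could be organized as a small state machine. The sketch below is illustrative only and not part of the original text: the posture labels and the assumption that X, Y, Z hand coordinates are supplied by a separate detector are placeholders.

```python
import math

class SlingshotGesture:
    """Minimal state machine for the select/release flow described above.

    A 'grab' posture starts a pull (select event); an 'open' posture ends it
    (release event). The X,Y travel during the pull sets the aim direction and
    the Z travel sets the release magnitude. Posture labels are assumptions;
    a real system would take them from a posture classifier.
    """

    def __init__(self):
        self.start = None  # (x, y, z) captured at the select event

    def update(self, posture, x, y, z):
        if posture == "grab" and self.start is None:
            self.start = (x, y, z)                  # select event
            return None
        if posture == "open" and self.start is not None:
            x0, y0, z0 = self.start
            self.start = None                       # release event
            direction = math.atan2(y - y0, x - x0)  # aim from X,Y travel
            magnitude = abs(z - z0)                 # pull depth along Z
            return direction, magnitude
        return None
```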

Abstract

A system and method for computer vision based control of a device may include for example obtaining a first sequence of images, the images comprising a user's hand, determining X and Y coordinates of the user's hand in an image within the sequence of images, the X and Y coordinates determining a direction of a vector, determining a Z coordinate of the user's hand to determine a magnitude of the vector, and controlling the device based on the vector.

Description

GESTURE CONTROL IN AUGMENTED REALITY
FIELD OF THE INVENTION
[0001] The present invention relates to the field of gesture based control of electronic devices. Specifically, the invention relates to using touchless gestures in augmented reality processing.
BACKGROUND OF THE INVENTION
[0002] In an augmented reality system, a user's view of the real world is enhanced with virtual computer-generated graphics. These graphics are spatially registered so that they appear aligned with the real world from the perspective of the viewing user. For example, the spatial registration can make a virtual object appear to be located on a real surface, such as a real-world patch of grass or a tree.
[0003] Augmented reality processing of video sequences may be performed in order to also provide real-time information about one or more objects that appear in the video sequences. With augmented reality processing, objects that appear in video sequences may be identified so that supplemental information (i.e., augmented information) can be displayed to a user about the objects in the video sequences. The supplemental information may include graphical or textual information overlaid on the frames of the video sequence so that objects are identified, defined, or otherwise described to a user.
[0004] Augmented reality systems have previously been implemented using head-mounted displays that are worn by the users. A video camera captures images of the real world in the direction of the user's gaze, and augments the images with virtual graphics before displaying the augmented images on the head-mounted display.
[0005] US publication number 2012/0154619 describes an augmented reality system which includes a video device having two different cameras: one to capture images of the world outside the user and one to capture images of the user's eyes. The images of the eyes provide information about areas of interest to the user with respect to the images captured by the first camera, and a probability map may be generated based on the images of the user's eyes to prioritize objects from the first camera regarding display of augmented reality information.
[0006] Alternative augmented reality display techniques exploit large spatially aligned optical elements, such as transparent screens, holograms, or video-projectors, to combine the virtual graphics with the real world.
[0007] A user may interact with the displayed reality using indirect interaction devices, such as a mouse or stylus that can monitor the movements of the user to control an onscreen object. However, using such interaction devices, the user may feel detached from the augmented reality environment, and the feeling of naturally interacting with the environment may be spoiled.
[0008] Interaction with a touch screen may also be used to interact with displayed reality. For example, a user may touch a touch sensitive screen of a cellular telephone or other mobile device which is displaying images obtained by a camera of the mobile device, to cause graphics to appear on the display at the location of the interaction with the touch sensitive screen.
[0009] Augmented reality can be used in a game environment. For example, the AppTag™ application enables a user to use an infra-red (IR) beam gun to target other players in an augmented reality game. The application can work with an attached smart device with a camera to obtain images of the real world, and the IR beam gun is used to "shoot" at real world objects. This game requires using a special IR beam gun and requires inconveniently attaching another device (which includes a camera) to the gun.
[0010] The existing augmented reality devices and applications do not enable simple and direct user interaction with real world images and cannot give the feeling of natural unaided interaction with the real world.
SUMMARY OF THE INVENTION
[0011] Embodiments of the invention provide an enhanced real-time experience to the user with respect to video sequences that are captured and displayed in real-time without having to interact with a touch screen or any other interaction devices.
[0012] According to one embodiment two cameras and a display may be used. A first camera may be configured to capture images of the real world, the images being displayed on the display, and a second camera configured to capture images of the user. Images of the user may be processed to identify a user's gesture and the display showing images of the real world may be controlled based on the identified gesture.
[0013] For example, images of the real world may be processed to detect a distinct object (e.g., images of a human head or body) and to determine a (possibly approximate) location of the distinct object relative to the first camera. A user's gesture may be identified in the images from the second camera and the user's gesture may be translated to movement of a graphical object such as an arrow or paintball ammunition on the display so that the graphical object coincides with the human head or body detected in the real world images. This way a user or group of users may play virtual paintball or other virtual war games or any other types of virtual games without having to physically interact with or use any special interaction device but rather by using natural gestures.
[0014] According to one embodiment processing the images of the second camera includes obtaining a sequence of images which include a user, typically a body part of the user, such as the user's hand. In a specific image the user's hand is located at a specific X, Y coordinate of the image. This X,Y coordinate is determined and is used to determine a direction of a vector (a vector may be for example a set or geometric entity endowed with magnitude and direction, e.g., a numerical component for a distance and a numerical component for an angle). The location of the user (e.g., the user's hand) on the Z axis (relative to the camera imaging the user) is detected to determine a magnitude of the vector. A device (e.g., a display of a device) or a graphical object displayed by the device may then be controlled based on the vector. In one embodiment, a Z coordinate may be a coordinate relative to the camera, imager, or device to be manipulated such as a television screen (typically but not necessarily relative to a right angle to the plane of such device) and the X and Y coordinates represent a position within a plane at a distance of the Z coordinate, the plane typically (but not necessarily) being perpendicular to the device.
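As a rough illustration of paragraph [0014] (not taken from the patent), the sketch below derives a direction from the hand's X,Y offset relative to the image centre and uses the Z estimate as the magnitude; the choice of the image centre as the reference point is an assumption.

```python
import math

def gesture_vector(hand_x, hand_y, hand_z, frame_w, frame_h):
    """Build a (direction, magnitude) vector from a detected hand position.

    hand_x, hand_y: hand location in image pixels (from the user-facing camera).
    hand_z:         estimated distance of the hand from that camera.
    The X,Y offset from the frame centre sets the direction; Z sets the magnitude.
    """
    dx = hand_x - frame_w / 2.0
    dy = hand_y - frame_h / 2.0
    direction = math.atan2(dy, dx)  # angle in radians
    magnitude = hand_z              # e.g., how far the "slingshot" was pulled back
    return direction, magnitude
```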
[0015] Embodiments of this method of constructing a vector may also be used to control a device based on a user's gestures even without real world images, for example, in a game or other application where a user may move graphical objects on a pre-programmed display.
[0016] Thus, according to one embodiment of the invention there is provided a method for computer vision based control of a device, the method including obtaining a first sequence of images, the images comprising a user's hand; determining X,Y coordinates of the user's hand in an image within the sequence of images, the X,Y coordinates determining a direction of a vector; determining a Z coordinate of the user's hand to determine a magnitude of the vector; and controlling a device based on the vector.
[0017] The sequence of images may be obtained using a 2D (two dimensional) camera or imager and the X, Y and Z coordinates may be relative to the 2D camera.
[0018] According to one embodiment determining the X,Y and Z locations of the user's hand includes detecting a first posture of the user's hand and determining the X,Y and Z coordinates of the hand in the first posture. Thus, the device may be controlled based on the detection of the first posture of the user's hand and based on the vector.
[0019] Controlling the device may include interacting with an object displayed on the device based on the vector. According to one embodiment interacting with the displayed object includes setting a direction of movement of the object on the device's display based on the direction of the vector and setting the velocity of the object on the device's display based on the magnitude of the vector.
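One possible way to apply such a vector to a displayed object, in the spirit of paragraph [0019], is sketched below; the frame interval and the speed_scale conversion factor are illustrative assumptions.

```python
import math

def step_object(x, y, direction, magnitude, dt=1 / 30, speed_scale=5.0):
    """Advance a displayed object one frame along the gesture vector.

    The vector's direction sets the object's heading and its magnitude sets
    the object's speed. speed_scale converts the unitless magnitude into
    pixels per second and is purely illustrative.
    """
    velocity = magnitude * speed_scale
    x += velocity * math.cos(direction) * dt
    y += velocity * math.sin(direction) * dt
    return x, y
```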
[0020] According to one embodiment the method includes obtaining a second sequence of images, the images comprising a target; determining a location of the target in an image within the sequence of images; and controlling the device based on the vector and on the location of the target.
[0021] The images of the first sequence may include images of the user and images of the second sequence may include real world images.
[0022] According to one embodiment the method includes detecting a shape of the user's hand and causing the object to move on the display of the device, based on the detection of the shape of the user's hand.
[0023] Further, there is provided an augmented reality system. According to one embodiment of the invention the system includes a first camera or imager for obtaining images of a real world; a second camera or imager for obtaining images of a user; a display for displaying images of the real world and for displaying a user controlled graphical object; and a processor for identifying user gestures from the images of the user and for controlling the graphical object based on the user gesture.
[0024] According to one embodiment the first and second cameras are located on a single device. According to some embodiments the first and second cameras are configured for obtaining opposing fields of view.
[0025] According to one embodiment the processor for identifying user gestures identifies a location of a gesturing hand on X,Y coordinates of the second camera and a coordinate on the Z axis relative to the second camera and controls the graphical object based on the X,Y, Z coordinates.
[0026] The processor may create a display of a trajectory based on the user gestures.
[0027] According to some embodiments the processor is configured to identify a target in the real world images and to estimate a location of the target on a set of coordinates of the first camera. The processor may identify a meeting point between the trajectory and the estimated location of the target and may issue an alert for the user based on the identification of the meeting point.
[0028] According to one embodiment the processor is for identifying a shape of a hand and the controlling of the graphical object may be based on the identification of the shape of the hand and on the user gesture.
BRIEF DESCRIPTION OF THE FIGURES
[0029] The invention will now be described in relation to certain examples and embodiments with reference to the following illustrative figures so that it may be more fully understood. In the drawings:
[0030] Figs. 1A and 1B schematically illustrate an augmented reality game using gestures according to embodiments of the invention;
[0031] Fig. 1C schematically illustrates a device operable according to embodiments of the invention, for example, a device on which the augmented reality game can be played;
[0032] Fig. 2 schematically illustrates a computer game controlled by user gestures, according to embodiments of the invention; and
[0033] Fig. 3 schematically illustrates a method for controlling a device, according to embodiments of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0034] In the following description, various aspects of the present invention will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the present invention. However, it will also be apparent to one skilled in the art that the present invention may be practiced without the specific details presented herein. Furthermore, well known features may be omitted or simplified in order not to obscure the present invention.
[0035] Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as "processing," "computing," "calculating," "determining," or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical, such as electronic, quantities within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices.
[0036] Embodiments of the invention are demonstrated through an augmented reality game schematically illustrated in Figs. 1A and 1B.
[0037] The embodiment of the game exemplified in Fig. 1A includes shooting virtual paintballs at an opponent (e.g., at the opponent's head) using a mobile device as a hand-held "sling-shot". The mobile device may include two cameras (other numbers may be used), one configured to image the real world (e.g., a back camera on a mobile phone) and one configured to image the user (e.g., a front camera on a mobile telephone, a camera which faces the user during normal use of the telephone). The device is capable of controlling simultaneous operation of the back and front cameras.
[0038] A user (160) holding the mobile device (170) may image an opponent (162) with the back camera (not shown), the image of the opponent being displayed to the user on the mobile device's display (174). The user (160) may then use his hand (161) to gesture, an action which will be imaged by the front camera (not shown). Images captured by the back camera are typically displayed to the user (160) on a plane that is the same as, or close to, the plane of the front camera (e.g., the plane of display 174). The user (160) may direct a shooting, pitching or flinging movement of his hand (161) (or other body part) at the opponent's head (164) shown in the images displayed to the user. This movement of the user's hand may control or move a graphical object (not shown), such as an arrow or paintball ammunition, on the display, in the direction of the opponent's head (164) (in the displayed image). An animated trajectory (not shown) of the graphical object may be added on the display.
[0039] According to one embodiment a successful "hit" triggers coloring or another indication on the opponent's head in the image of the opponent displayed to the user. When more than one player is involved, an indication of the hit or a report of the hit may also be sent to the hit opponent and/or other players. A still image of the hit (e.g., a picture of the opponent with an added colored mark on the opponent's head or other body part) may be generated and saved and/or sent to the opponent and other players.
[0040] According to one embodiment a networked management system (180) can control the game and may send and receive messages to and from each player in a multi-player game.
[0041] According to one embodiment users register for a game through the management system (180). The users may register and further communicate through networks such as a cellular network, social network, etc. At the time of registration the users may be required to enter identifying details such as their height, head size, etc. and/or identifying details relating to the mobile device they will be using in the game. For example, mobile telephones of a known manufacturer have known dimensions, and thus details of the manufacturer and model of the mobile phone being used by a user may be used to identify the mobile telephone, or the human holding the telephone, as a target in the images of the real world.
[0042] According to one embodiment a user may use his mobile device held up against the background of his head to enable calibration and calculation of the user's head size (compared to the known dimensions of the mobile device).
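One way the calibration of paragraph [0042] could be computed is sketched below: the phone's known physical width provides a centimetres-per-pixel scale for an image that shows both the phone and the user's head. The function name and the sample numbers are illustrative assumptions.

```python
def calibrate_head_size(phone_width_px, head_width_px, phone_width_cm):
    """Estimate the user's head width from an image showing the phone held
    up against the head, using the phone's known physical width as a
    cm-per-pixel scale reference."""
    cm_per_px = phone_width_cm / phone_width_px
    return head_width_px * cm_per_px

# Example: a 7 cm wide phone spans 140 px and the head spans 320 px in the
# same image, giving an estimated head width of 16 cm.
head_cm = calibrate_head_size(phone_width_px=140, head_width_px=320, phone_width_cm=7.0)
```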
[0043] According to one embodiment the mobile device includes a processor capable of identifying the shape of a human head (or other human body parts or other predefined shapes that may be imaged in the real world). By using the (known or predetermined) average dimensions of the identified head shape and by using the angular size of the head image, an approximate distance of the head from the back and front cameras may be calculated, hence providing an approximate three dimensional (3D) location of the head shape. The approximate location may be used for detecting "hits" and for calculating the animated trajectory of the graphical object.
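The angular-size estimate of paragraph [0043] can be approximated with a pinhole camera model, as in the sketch below; the focal length (in pixels) and the assumed average head width are illustrative values, not taken from the patent.

```python
def distance_from_angular_size(size_px, real_size_cm, focal_length_px):
    """Pinhole-camera estimate of an object's distance from the camera.

    size_px:         apparent size of the detected head in the image (pixels)
    real_size_cm:    assumed real-world size (e.g., ~15 cm average head width)
    focal_length_px: camera focal length expressed in pixels
    """
    return focal_length_px * real_size_cm / size_px

# e.g., a head detected 100 px wide with an 800 px focal length:
# distance ~= 800 * 15 / 100 = 120 cm from the camera.
```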
[0044] According to some embodiments the mobile device includes a processor capable of identifying a sticker or tag or other recognizable element worn by the opponent (e.g., an augmented reality sticker similar to stickers produced by Elipse Analysis and Design). Identifying the sticker or tag facilitates the detection of "hits" and the calculation of the animated trajectory of the graphical object. Additional parameters used for detecting "hits" and for calculating the animated trajectory of the graphical object may be parameters of the user's hand movement, which is imaged by the front camera. Examples of these parameters are further described with reference to Fig. 3.
[0045] Thus, as schematically illustrated in Fig. 1B, a method for user interaction with a real scene according to one embodiment of the invention may be described by the following steps, carried out in two camera systems.
[0046] In a first camera system a real scene is imaged and displayed to a user (110). A target in the real scene is identified (112), e.g., a human form or parts of a human body may be identified by known object recognition algorithms, by having the target wear an augmented reality sticker or tag, or by other suitable methods. The location of the target in the first camera's image coordinate system (typically the 3D location relative to the first camera) can then be estimated (114), for example by comparing the known average size of the target in real life (e.g., the circumference of an average human head is between 50 and 60 cm, or a sticker or tag may have a known size) to the angular size of the target in the image, thus estimating the distance of the target from the camera imaging it.
[0047] In a second camera system the user is imaged (120). In the images of the user, the user's hand (or other body part) may be identified (e.g., by using shape recognition algorithms) and tracked (e.g., by determining optical flow or other known tracking methods). A location of the hand in one or more images, on the second camera's image coordinate system, is calculated (122). The two image coordinate systems may be aligned or registered to create a unified coordinate system and a trajectory of a virtual object being manipulated by the user's hand (or other body part) may be calculated based on the hand location and target location in the unified coordinate system (124).
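The "unified coordinate system" of step 124 could, for roughly opposing cameras on one device, be approximated by a fixed rigid transform between the two camera frames; the rotation and translation below are placeholder values based on that assumption, not figures from the patent.

```python
import numpy as np

# Assumed fixed rigid transform from the front (user-facing) camera's frame to
# the back (world-facing) camera's frame. For opposing cameras on one device
# this is roughly a 180-degree rotation about the vertical axis plus a small
# translation given by the device geometry (placeholder values here).
R_front_to_back = np.array([[-1.0, 0.0, 0.0],
                            [ 0.0, 1.0, 0.0],
                            [ 0.0, 0.0, -1.0]])
t_front_to_back = np.array([0.0, 0.0, 0.0])  # cm; placeholder offset

def to_unified(point_front_xyz):
    """Express a 3D point measured in the front camera's frame in the back
    camera's frame, which serves here as the unified coordinate system."""
    return R_front_to_back @ np.asarray(point_front_xyz, dtype=float) + t_front_to_back
```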
[0048] According to one embodiment the images from the first and second cameras are aligned or registered (e.g., by coordinate transformation), and a meeting point between the estimated location of the target (estimated in step 114) and the trajectory of the virtual object (calculated in step 124) can then be identified. If the trajectory coincides with the location of the target (130), the user may be notified (140), e.g., by a graphic indication, message or other alert appearing on the user's display. If no meeting point was found (the trajectory does not coincide with the location of the target), no alert appears on the user's display, or the user may be alerted to the fact that he "missed" (150). Alternatively, users may be notified by how much they missed, or they may be advised of parameters relating to their throw so that they may improve their aim next time.
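A minimal sketch of the meeting-point test of paragraph [0048], assuming both the target location and a sampled trajectory have already been expressed in the unified coordinate system; the hit radius is an illustrative assumption.

```python
import numpy as np

def detect_hit(trajectory_points, target_xyz, hit_radius_cm=10.0):
    """Return True if any sampled point of the virtual object's trajectory
    passes within hit_radius_cm of the estimated target location.

    trajectory_points: iterable of (x, y, z) points in the unified frame (step 124).
    target_xyz:        estimated 3D target location (step 114), same frame.
    """
    target = np.asarray(target_xyz, dtype=float)
    for point in trajectory_points:
        if np.linalg.norm(np.asarray(point, dtype=float) - target) <= hit_radius_cm:
            return True
    return False
```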
[0049] Several users may be connected through a network (e.g. a cellular network) so that they may all receive alerts relating to each other's actions.
[0050] According to one embodiment different users may be identified based on unique user identification details (for example, which may be entered by the user when registering through a game management application) or unique characteristics of their mobile device (e.g., based on their GPS location or based on a unique identifier (ID) (e.g. an RF ID) or based on known mobile device specification or design which may be entered, for example, by the user when registering for a game) and notice of a "hit" on a specific user may be sent to that user based on this identification. Alternatively, communication between mobile devices of players (such as by IR, Bluetooth or other wireless communication techniques) may be used to identify a hit player. Other methods of identifying a hit player may be used.
[0051] The game described above may be played using any electronic device that has or that is connected to an electronic display and camera. Preferably a mobile device is used, enabling the user(s) to move easily and quickly; however, other less mobile devices may also be used.
[0052] An example of a device operable according to embodiments of the invention is schematically illustrated in Fig. 1C.
[0053] The device 10 may be a specifically designed device or may be a common device such as a mobile telephone which may run an appropriate application. The device 10 may include for example a display 11 and two cameras, 12 and 13. (Other numbers of cameras or imagers may be used.) Cameras 12 and 13 are located on the device and/or configured such that while the user holds the device with the display 11 facing him, camera 12 captures a field of view (FOV) which includes the user's body (according to one embodiment, the user's free hand 18, rather than the hand holding the device) and camera 13 captures a FOV of the world outside of the user. According to one embodiment camera 12 captures a FOV which is directly or almost directly opposite to, or facing in a substantially reverse direction from, the FOV captured by camera 13; however, the cameras may be positioned or located in other configurations. For example, the cameras may face in opposite (e.g., 180 degrees difference in the direction in which they are pointed) or substantially opposite directions.
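On a PC-class device, simultaneous capture from a user-facing and a world-facing camera might be approximated as below; the device indices are assumptions, and on a mobile phone the front/back camera pair is normally reached through the platform's native camera API rather than OpenCV.

```python
import cv2

# Device indices 0 and 1 are assumptions; adjust to the machine's camera layout.
user_cam = cv2.VideoCapture(0)   # plays the role of camera 12: faces the user
world_cam = cv2.VideoCapture(1)  # plays the role of camera 13: faces the world

ok_user, user_frame = user_cam.read()
ok_world, world_frame = world_cam.read()
if ok_user and ok_world:
    # user_frame would feed gesture detection; world_frame would feed target
    # detection and is what display 11 shows to the player.
    pass

user_cam.release()
world_cam.release()
```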
[0054] According to one embodiment the cameras 12 and 13 are 2D cameras embedded in device 10. According to other embodiments three dimensional cameras may be used to obtain images of the user and/or of the outside world. According to other embodiments a plurality of two dimensional cameras may be used to obtain the images. The plurality of 2D cameras may be positioned relative to each other to obtain a stereoscopic view of the user's hand and/or of the outside world.
[0055] Display 11 shows the user the FOV captured by camera 13 (typically including a target 14) and may show the user animated additions, such as an animated trajectory 15 of a virtual object and graphical indications of "hits" 16.
[0056] The device may include buttons 17 and switches for operation of the device such as ON/OFF, send, volume, etc.
[0057] The device 10 typically includes processors for operating the game, as described above.
[0058] For example, processor 101 may be configured for carrying out embodiments of the invention by, for example, being connected to a memory (e.g., memory 102) and carrying out instructions or executing software stored on the memory. Embodiments of the invention may include a non-transitory computer readable storage medium including or storing a computer program or computer executable instructions which, when executed by a processor in a computing system, cause the processor to perform the methods described herein.
[0059] Another game operable according to embodiments of the invention is described in Fig. 2.
[0060] In the game exemplified in Fig. 2 the environment (210) displayed to the user is typically computer generated. The user may control one or more of the computer generated objects by gesturing.
[0061] A system capable of supporting a game such as described with reference to Fig. 2 and Figs. 1A and 1B typically includes an image sensor or camera to obtain image data of a field of view (FOV). The image data is sent to a processor which performs image analysis to detect and track a user's hand in the image data and to detect postures and gestures of the user's hand. For example, a posture or shape of the user's hand can be detected or identified using an algorithm which calculates Haar-like features in a Viola-Jones object detection framework.
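Since this paragraph names Haar-like features in a Viola-Jones detection framework, a minimal OpenCV sketch of that style of detector is shown below; the cascade file name "palm.xml" is a placeholder, as OpenCV does not bundle a hand cascade and one would have to be trained or obtained separately.

```python
import cv2

# Placeholder cascade file: a Haar cascade trained on the target hand posture.
hand_cascade = cv2.CascadeClassifier("palm.xml")

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Viola-Jones style multi-scale sliding-window detection over Haar-like features.
    hands = hand_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                          minSize=(40, 40))
    for (x, y, w, h) in hands:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("hand detection", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```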
[0062] The image sensor may be associated with a storage device for storing image data. The storage device may be integrated within the image sensor or may be external to the image sensor. According to some embodiments image data may be stored in the processor, for example in a cache memory.
[0063] According to some embodiments more than one processor may be used by the system.
[0064] The game may be operated on any electronic device that has, or that is connected to, an electronic display (e.g., a television (TV), DVD player, PC, mobile phone or camera), or on an electronic device that is available with an integrated standard 2D camera. According to some embodiments the camera is an external accessory to the device. An external camera may include a processor and appropriate algorithms for gesture/posture control. According to some embodiments, more than one 2D camera is provided to enable obtaining 3D information. According to some embodiments the system includes a 3D and/or stereo camera.
[0065] Processors may be integral to the image sensor or may be in separate units. Alternatively, a processor may be integrated within the device. According to other embodiments a first processor may be integrated within the image sensor and a second processor may be integrated within the device.
[0066] Communication between the image sensor and the processor and/or between the processor and the device may be through a wired or wireless link, such as through IR communication, radio transmission, Bluetooth technology and other suitable communication routes and protocols.
[0067] According to one embodiment the image sensor may be a standard 2D camera such as a webcam or other standard video capture device, typically installed on PCs, smart phones or other electronic devices. According to some embodiments, the image sensor can be IR sensitive. According to other embodiments the system may include a stereo camera. The processor can apply image analysis algorithms, such as motion detection and shape recognition algorithms, to identify and further track the user's hand.
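A minimal sketch of the motion-detection step mentioned here, assuming simple frame differencing and treating the largest moving contour as the hand; the threshold values are illustrative, and real shape recognition (as described above) would be more involved.

```python
import cv2

cap = cv2.VideoCapture(0)
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)                 # motion = change between frames
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        # Crude stand-in for shape recognition: track the largest moving region.
        hand = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(hand)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    prev_gray = gray
    cv2.imshow("motion-based hand tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```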
[0068] Referring back to Fig. 2, a user may control a computer-generated object on a display by gesturing touchlessly (e.g., without touching a game controller or screen) within the field of view of the camera connected to the computer. Parameters of the user's hand movement (e.g., as discussed with reference to Fig. 3) are calculated and translated to movement of a selected object. According to one embodiment the object is an arrow (202) or some other type of ammunition. Other objects may be similarly controlled.
[0069] A trajectory (204) of the shot arrow may be animated on the display (200). Hits may give the user points, which may be displayed on screen (e.g., display 200) during the game.
[0070] According to one embodiment an icon (206) or other graphical representation (e.g., an icon of a hand) may appear on the screen to reassure the user that his hand is within the FOV of the camera. The icon may also indicate to the user whether the system is in a mode in which his hand movements are translated to "throwing" (e.g., based on the shape of the icon).
[0071] According to one embodiment a user's hand movement is translated to movement of a graphical object (such as in Figs. 1A and 1B and Fig. 2) based on identification of a pre-determined posture or shape of the user's hand. For example, a user may move an open hand (all fingers extended) in view of a camera to control a cursor or other functions of a device, whereas when the user closes his fingers to make a fist or grab-like posture, or changes the shape of his hand to any other pre-determined posture, movement of the user's hand will be translated to control a display in games as described above.
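One way the posture-dependent mode switch described here could be expressed in code, assuming a separate posture classifier (not shown) supplies the detected posture:

```python
from enum import Enum

class Posture(Enum):
    OPEN_HAND = 1   # all fingers extended
    FIST = 2        # grab-like / closed posture

def movement_target(posture):
    """Decide what a hand movement controls, based on the detected posture.

    The posture itself is assumed to come from a separate classifier (not shown).
    """
    if posture is Posture.OPEN_HAND:
        return "cursor"        # open hand: general device control (cursor, etc.)
    if posture is Posture.FIST:
        return "game_object"   # predetermined posture: control the game display
    return None                # unrecognized postures are ignored

assert movement_target(Posture.FIST) == "game_object"
```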
[0072] According to another embodiment hand gestures for controlling the device may include touch gestures. According to one embodiment the display (e.g., display 11) is touch sensitive and a user may apply a touch gesture at a location on the display to signify selecting a graphical object and to initiate an event in which further touchless gestures of the user's hand are translated to other manipulations of the graphical object. For example, a touch-based pinch gesture may signify "select" while further touchless gestures may cause movement of the selected graphical object.
[0073] A method for controlling a device, according to embodiments of the invention, is schematically illustrated in Fig. 3. According to one embodiment the method includes obtaining a sequence of images, the images comprising a user's hand or other body part (310), and determining X,Y coordinates of the user's hand at a specific location within an image from the sequence of images (320) to calculate a direction of a vector (330). A Z coordinate of the user's hand at the specific location is also determined (340) to calculate a magnitude of the vector (350), and the vector is used to control the device (360).
[0074] According to one embodiment controlling the device includes setting a direction of movement of the object on the device's display based on the direction of the vector and setting the velocity of the object on the device's display based on the magnitude of the vector.
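A sketch of the mapping described in the last two paragraphs, with the X,Y position setting the vector's direction and the Z coordinate setting its magnitude, which in turn set the displayed object's heading and speed; the anchor point and scaling constant are assumptions, not values from the patent.

```python
import numpy as np

def control_vector(hand_x, hand_y, hand_z, anchor=(320, 240), speed_scale=0.05):
    """Build a control vector from hand coordinates (Fig. 3, steps 320-350).

    Direction comes from the hand's X,Y position relative to an assumed anchor
    point (e.g., the image center or the point where the gesture started);
    magnitude comes from the Z coordinate (location relative to the camera).
    `speed_scale` is an arbitrary tuning constant.
    """
    dx, dy = hand_x - anchor[0], hand_y - anchor[1]
    norm = np.hypot(dx, dy)
    direction = np.array([dx, dy]) / norm if norm > 0 else np.zeros(2)
    magnitude = hand_z * speed_scale          # nearer/farther hand -> smaller/larger speed
    return direction, magnitude

def step_object(position, direction, magnitude, dt=1.0 / 30):
    """Advance the displayed object: direction sets heading, magnitude sets velocity."""
    return position + direction * magnitude * dt

pos = np.array([100.0, 100.0])
direction, speed = control_vector(hand_x=400, hand_y=180, hand_z=60)
pos = step_object(pos, direction, speed)
```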
[0075] For example, a user playing virtual paintball (as in Figs. 1A and 1B) or other computer games may move his hand (according to one embodiment, while in a predetermined posture) back to stretch a virtual bow or slingshot and may then apply a hand gesture or posture, such as opening the palm, to signify release of the virtual arrow or other ammunition. The trajectory of the virtual arrow shot by the user is based on a vector calculated from the location of the user's hand on the X and Y axes and its location on the Z axis (typically relative to the camera imaging the hand) when the hand release gesture or posture is detected.
[0076] According to one embodiment a movement of the hand may be defined as having a beginning (when the hand changes from static to moving) and an end (when the hand changes from moving to static). A "select event" may be determined to occur at the beginning of the movement and a "release event" may be determined to occur at the end of the movement. According to some embodiments the beginning and end of hand movements may be detected based on the shape of the user's hand (e.g., using one posture of the hand to signify the start of a hand movement (and a "select event") and another posture to signify the end of the hand movement (and a "release event")), based on a lapsed time period, based on motion parameters (e.g., based on the detected speed of motion, where below a predetermined speed the hand is defined as "static" and above that speed the hand is determined to be "mobile"), or by other suitable methods.
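A sketch of the speed-threshold variant described here: below an assumed speed the hand is treated as "static", above it as "mobile", and the static-to-mobile and mobile-to-static transitions generate "select" and "release" events.

```python
import numpy as np

SPEED_THRESHOLD = 15.0   # pixels per frame; assumed tuning value

def detect_events(hand_positions):
    """Yield ('select', i) at the start of a movement and ('release', i) at its end."""
    moving = False
    for i in range(1, len(hand_positions)):
        speed = np.linalg.norm(np.subtract(hand_positions[i], hand_positions[i - 1]))
        if not moving and speed > SPEED_THRESHOLD:
            moving = True
            yield ("select", i)     # hand changed from static to moving
        elif moving and speed <= SPEED_THRESHOLD:
            moving = False
            yield ("release", i)    # hand changed from moving to static

positions = [(100, 100), (101, 100), (130, 110), (160, 125), (161, 125), (161, 126)]
print(list(detect_events(positions)))   # prints [('select', 2), ('release', 4)]
```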
[0077] According to some embodiments the X, Y, Z coordinates are determined upon detection of a change of posture of the hand or upon detection of a pre-determined posture of the hand. Thus, if a user closes the fingers (or one finger) of his hand into a grab-like or pinch-like posture to select and pull back a virtual arrow in a virtual bow, the locations of the hand on the X and Y axes while in the grab-like or pinch-like posture are used to calculate an estimated direction of movement of the virtual arrow, whereas the location of the hand on the X, Y and Z axes once the user opens his fingers (e.g., to simulate letting go of the arrow) is used to determine the velocity (possibly the initial velocity) and/or the distance of travel of the virtual arrow from the X,Y location at which the hand's posture changed.
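Tying this paragraph to the vector of Fig. 3, the sketch below records the X,Y position while the hand is in a grab/pinch posture and computes a release velocity from the X,Y,Z position at the moment the hand opens; posture detection is assumed to happen elsewhere, and the launch scaling is arbitrary.

```python
import numpy as np

class VirtualBow:
    """Track aim while the hand is in a grab posture; compute release velocity on open."""

    def __init__(self, launch_scale=0.1):
        self.anchor_xy = None            # X,Y where the grab posture was first detected
        self.launch_scale = launch_scale

    def update(self, posture, x, y, z):
        """`posture` is 'grab' or 'open', as reported by an external posture detector."""
        if posture == "grab":
            if self.anchor_xy is None:
                self.anchor_xy = np.array([x, y])     # select event: arrow nocked
            return None
        if posture == "open" and self.anchor_xy is not None:
            pull = self.anchor_xy - np.array([x, y])  # direction from pulled-back hand
            speed = abs(z) * self.launch_scale        # Z at release sets the speed
            norm = np.linalg.norm(pull)
            velocity = (pull / norm) * speed if norm > 0 else np.zeros(2)
            self.anchor_xy = None                     # release event consumed
            return velocity                           # initial velocity of the virtual arrow
        return None

bow = VirtualBow()
bow.update("grab", 320, 240, 50)
print(bow.update("open", 360, 300, 80))   # velocity vector for the shot arrow
```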
[0078] According to one embodiment the camera used to obtain images of the user is a 2D camera; the X,Y coordinates relate to the coordinate system of the image produced by the camera, while the Z coordinates relate to locations relative to the 2D camera itself. Z coordinates may be relative locations (e.g., closer to or further away from the camera).
[0079] Detecting the X, Y and Z coordinates may be done, for example, by using the known dimensions of the image frames and by detecting a pitch angle of the hand (e.g., by calculating the angle between the user's arm and a transverse axis of the hand or arm, between the hand and a virtual line connecting the hand and the display, or between the hand and a virtual line connecting the hand and the camera used to obtain the images of the hand). In some cases the size or shape of the user's hand (or a change in size or shape of the user's hand between images) may be used in calculating the pitch angle of the hand and/or to determine a coordinate of the hand. Additional methods may be used for detecting the X, Y, Z coordinates, for example, detecting a transformation of movement of selected points/pixels within images of the hand, determining changes of scale along the X and Y axes from the transformations and determining movement along the Z axis from the scale changes, or any other appropriate methods, for example, using stereoscopy or 3D imagers.
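A sketch of the scale-change idea at the end of this paragraph: the apparent size of the tracked hand region is compared between consecutive images, and growth or shrinkage is read as relative movement along the Z axis; the proportionality constant is an assumption.

```python
def relative_z_change(prev_bbox, curr_bbox, z_scale=100.0):
    """Estimate relative movement along the Z axis from hand bounding-box scale change.

    Each bbox is (x, y, w, h) in image coordinates.  A growing box is read as the
    hand moving toward the camera (negative dZ), a shrinking box as moving away.
    `z_scale` is an arbitrary constant mapping scale change to relative Z units.
    """
    prev_size = (prev_bbox[2] * prev_bbox[3]) ** 0.5   # geometric mean of width/height
    curr_size = (curr_bbox[2] * curr_bbox[3]) ** 0.5
    if prev_size == 0:
        return 0.0
    scale = curr_size / prev_size
    return (1.0 - scale) * z_scale    # > 0: hand moved away, < 0: hand moved closer

print(relative_z_change((100, 100, 80, 80), (95, 95, 96, 96)))   # hand got bigger -> negative
```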

Claims

1. A method for computer vision based control of a device, the method comprising: obtaining a first sequence of images, the images comprising a user's hand; determining X and Y coordinates of the user's hand in an image within the sequence of images, the X and Y coordinates determining a direction of a vector; determining a Z coordinate of the user's hand to determine a magnitude of the vector; and controlling a device based on the vector.
2. The method of claim 1 comprising obtaining a sequence of images using a 2D camera and wherein the X,Y and Z coordinates are relative to the 2D camera.
3. The method of claim 1 wherein determining the X, Y and Z coordinates of the user's hand comprises detecting a first posture of the user's hand and determining the X, Y and Z coordinates of the hand in the first posture.
4. The method of claim 3 comprising controlling the device based on the detection of the posture of the user's hand and based on the vector.
5. The method of claim 1 wherein controlling the device comprises interacting with an object displayed on the device based on the vector.
6. The method of claim 5 wherein interacting with the displayed object comprises setting a direction of movement of the object on the device's display based on the direction of the vector and setting the velocity of the object on the device's display based on the magnitude of the vector.
7. The method of claim 6 comprising detecting a shape of the user's hand and causing the object to move on the display of the device, based on the detection of the shape of the user's hand.
8. The method of claim 5 comprising: obtaining a second sequence of images, the images comprising a target; determining a location of the target in an image within the sequence of images; and controlling the device based on the vector and on the location of the target.
9. The method of claim 8 wherein the images of the first sequence of images are images of the user and images of the second sequence of images are real world images.
10. An augmented reality system, the system comprising:
a first camera for obtaining images of a real world;
a second camera for obtaining images of a user;
a display for displaying images of the real world and for displaying a user controlled graphical object; and
a processor for identifying user gestures from the images of the user and for controlling the graphical object based on the user gesture.
11. The system of claim 10 wherein the first and second cameras are located on a single device.
12. The system of claim 11 wherein the first and second cameras are configured for obtaining opposing fields of view.
13. The system of claim 10 wherein the processor for identifying user gestures identifies a location of a gesturing hand on X,Y coordinates of the second camera and a coordinate on the Z axis relative to the second camera and controls the graphical object based on the X,Y, Z coordinates.
14. The system of claim 10 wherein the processor is to create a display of a trajectory based on the user gestures.
15. The system of claim 14 wherein the processor is for identifying a target in the real world images and for estimating a location of the target on a set of coordinates of the first camera.
16. The system of claim 15 wherein the processor is for identifying a meeting point between the trajectory and the estimated location of the target and for issuing an alert for the user based on the identification of the meeting point.
17. The system of claim 10 wherein the processor is for identifying a shape of a hand and wherein controlling the graphical object is based on the identification of the shape of the hand and on the user gesture.
PCT/IL2014/050073 2013-01-21 2014-01-21 Gesture control in augmented reality WO2014111947A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361754653P 2013-01-21 2013-01-21
US61/754,653 2013-01-21

Publications (1)

Publication Number Publication Date
WO2014111947A1 true WO2014111947A1 (en) 2014-07-24

Family

ID=51209097

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2014/050073 WO2014111947A1 (en) 2013-01-21 2014-01-21 Gesture control in augmented reality

Country Status (1)

Country Link
WO (1) WO2014111947A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110242134A1 (en) * 2010-03-30 2011-10-06 Sony Computer Entertainment Inc. Method for an augmented reality character to maintain and exhibit awareness of an observer
US20120113223A1 (en) * 2010-11-05 2012-05-10 Microsoft Corporation User Interaction in Augmented Reality
US20120119991A1 (en) * 2010-11-15 2012-05-17 Chi-Hung Tsai 3d gesture control method and apparatus

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8938124B2 (en) 2012-05-10 2015-01-20 Pointgrab Ltd. Computer vision based tracking of a hand
WO2016076951A1 (en) * 2014-11-14 2016-05-19 Qualcomm Incorporated Spatial interaction in augmented reality
US20160140763A1 (en) * 2014-11-14 2016-05-19 Qualcomm Incorporated Spatial interaction in augmented reality
CN107077169A (en) * 2014-11-14 2017-08-18 高通股份有限公司 Spatial interaction in augmented reality
US9911235B2 (en) 2014-11-14 2018-03-06 Qualcomm Incorporated Spatial interaction in augmented reality
CN107077169B (en) * 2014-11-14 2020-04-28 高通股份有限公司 Spatial interaction in augmented reality
US10996814B2 (en) 2016-11-29 2021-05-04 Real View Imaging Ltd. Tactile feedback in a display system
CN108958475A (en) * 2018-06-06 2018-12-07 阿里巴巴集团控股有限公司 virtual object control method, device and equipment
US20200028843A1 (en) * 2018-07-17 2020-01-23 International Business Machines Corporation Motion Based Authentication
US10986087B2 (en) * 2018-07-17 2021-04-20 International Business Machines Corporation Motion based authentication
WO2023093167A1 (en) * 2021-11-25 2023-06-01 荣耀终端有限公司 Photographing method and electronic device

Similar Documents

Publication Publication Date Title
US11157725B2 (en) Gesture-based casting and manipulation of virtual content in artificial-reality environments
US9928650B2 (en) Computer program for directing line of sight
TWI722280B (en) Controller tracking for multiple degrees of freedom
US9495800B2 (en) Storage medium having stored thereon image processing program, image processing apparatus, image processing system, and image processing method
US8696458B2 (en) Motion tracking system and method using camera and non-camera sensors
JP5622447B2 (en) Information processing program, information processing apparatus, information processing system, and information processing method
EP2371434B1 (en) Image generation system, image generation method, and information storage medium
WO2014111947A1 (en) Gesture control in augmented reality
JP3530772B2 (en) Mixed reality device and mixed reality space image generation method
CN107646098A (en) System for tracking portable equipment in virtual reality
JP5690135B2 (en) Information processing program, information processing system, information processing apparatus, and information processing method
US11086475B1 (en) Artificial reality systems with hand gesture-contained content window
KR20140090159A (en) Information processing apparatus, information processing method, and program
US10921879B2 (en) Artificial reality systems with personal assistant element for gating user interface elements
JP2000350859A (en) Marker arranging method and composite reality really feeling device
US11043192B2 (en) Corner-identifiying gesture-driven user interface element gating for artificial reality systems
EP3066543B1 (en) Face tracking for additional modalities in spatial interaction
US20220362667A1 (en) Image processing system, non-transitory computer-readable storage medium having stored therein image processing program, and image processing method
US10852839B1 (en) Artificial reality systems with detachable personal assistant for gating user interface elements
US11557103B2 (en) Storage medium storing information processing program, information processing apparatus, information processing system, and information processing method
JP2021060627A (en) Information processing apparatus, information processing method, and program
CN113289336A (en) Method, apparatus, device and medium for tagging items in a virtual environment
US10948978B2 (en) Virtual object operating system and virtual object operating method
US11944897B2 (en) Device including plurality of markers
CN110036359B (en) First-person role-playing interactive augmented reality

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14740709

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14740709

Country of ref document: EP

Kind code of ref document: A1