WO2014111947A1 - Gesture control in augmented reality - Google Patents

Gesture control in augmented reality

Info

Publication number
WO2014111947A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
images
hand
camera
vector
Application number
PCT/IL2014/050073
Other languages
French (fr)
Inventor
Meir Morag
Assaf Gad
Eran Eilat
Original Assignee
Pointgrab Ltd.
Application filed by Pointgrab Ltd. filed Critical Pointgrab Ltd.
Publication of WO2014111947A1 publication Critical patent/WO2014111947A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language

Definitions

  • a user's hand movement may be translated to movement of a graphical object (such as in Figs. 1A, 1B and 2) based on identification of a pre-determined posture or shape of the user's hand. For example, a user may move an open hand (all fingers extended) in view of a camera to control a cursor or other functions of a device, whereas when the user closes his fingers into a fist or grab-like posture, or changes the shape of his hand to any other pre-determined posture, movement of the user's hand is translated to control of a display in games as described above.
  • hand gestures for controlling the device may include touch gestures applied to the display (e.g., display 11).
  • a user may apply a touch gesture on a location on the display to signify selecting a graphical object and initiating an event in which further touchless gestures of the user's hand are translated to other manipulations of the graphical object.
  • a touch-based pinch gesture may signify "select", while further touchless gestures may cause movement of the selected graphical object.
  • a method for controlling a device is schematically illustrated in Fig. 3.
  • the method includes obtaining a sequence of images, the images comprising a user's hand or other body part (310); determining X,Y coordinates of the user's hand at a specific location within an image from the sequence of images (320) to calculate a direction of a vector (330).
  • a Z-coordinate of the user's hand at the specific location is also determined (340) to calculate a magnitude of the vector (350) and the vector is used to control the device (360).
  • controlling the device includes setting a direction of movement of the object on the device's display based on the direction of the vector and setting the velocity of the object on the device's display based on the magnitude of the vector.
  • a user playing virtual paintball may move his hand (according to one embodiment, while in a predetermined posture) back to stretch a virtual bow or slingshot and then may apply a hand gesture or posture such as opening the palm to signify release of the virtual arrow or other ammunition.
  • the trajectory of the virtual arrow shot by the user is based on a vector calculated from the location of the user's hand on the X and Y axes and its location on the Z axis (typically relative to the camera imaging the hand) when the hand release gesture or posture is detected.
  • movement of a hand may be defined as having a beginning (when a hand changes from static to moving) and an end (when the hand changes from mobile to static).
  • a “select event” may be determined to occur at the beginning of the movement and a “release event” may be determined to occur at the end of the movement.
  • the beginning and end of hand movements may be detected based on the shape of the user's hand (e.g., using one posture of the hand to signify starting a hand movement (and a "select event") and another posture to signify ending the hand movement (and a "release event")), based on a time period that has lapsed, based on motion parameters (e.g., the speed of the detected motion, where below a predetermined speed the hand is defined as "static" and above that speed the hand is determined to be "mobile"), or by other suitable methods.
  • the X, Y, Z coordinates are determined upon detection of a change of posture of the hand or upon detection of a pre-determined posture of the hand.
  • a user may close the fingers (or one finger) of his hand into a grab-like or pinch-like posture to select and pull back a virtual arrow in a virtual bow; the locations of the hand on the X and Y axes while in the grab or pinch-like posture are then used to calculate an estimated direction of movement of the virtual arrow, whereas the location of the hand on the X, Y and Z axes once the user opens his fingers (e.g., to simulate letting go of the arrow) is used to determine the velocity (possibly the initial velocity) and/or the distance of travel of the virtual arrow from the X,Y location at which the hand's posture changed (a minimal sketch of this select and release flow appears after this list).
  • the camera used to obtain images of the user is a 2D camera and the X,Y coordinates relate to a coordinate system of the image produced by the camera while the Z coordinates relate to locations relative to the 2D camera itself.
  • Z coordinates may be relative locations (e.g., closer or further away from the camera).
  • Detecting X, Y and Z coordinates may be done, for example, by using the known dimensions of the image frames and by detecting a pitch angle of the hand (e.g., by calculating the angle between the user's arm and a transverse axis of the hand or arm, between the hand and a virtual line connecting the hand and the display, or between the hand and a virtual line connecting the hand and the camera used to obtain the images of the hand).
  • the size or shape of the user's hand may be used in calculating the angle of pitch of the hand and/or to determine a coordinate of the hand.
  • Additional methods may be used for detecting the X, Y, Z coordinates, for example, detecting a transformation of movement of selected points/pixels from within images of a hand, determining changes of scale along the X and Y axes from the transformations, and determining movement along the Z axis from the scale changes; or any other appropriate methods may be used, for example stereoscopy or 3D imagers.
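The select and release flow described in the bullets above (a grab-like posture starting the pull, an open palm releasing the virtual projectile) could be organized as a small state machine. The sketch below is illustrative only and not part of the original text: the posture labels and the assumption that X, Y, Z hand coordinates are supplied by a separate detector are placeholders.

```python
import math

class SlingshotGesture:
    """Minimal state machine for the select/release flow described above.

    A 'grab' posture starts a pull (select event); an 'open' posture ends it
    (release event). The X,Y travel during the pull sets the aim direction and
    the Z travel sets the release magnitude. Posture labels are assumptions;
    a real system would take them from a posture classifier.
    """

    def __init__(self):
        self.start = None  # (x, y, z) captured at the select event

    def update(self, posture, x, y, z):
        if posture == "grab" and self.start is None:
            self.start = (x, y, z)                  # select event
            return None
        if posture == "open" and self.start is not None:
            x0, y0, z0 = self.start
            self.start = None                       # release event
            direction = math.atan2(y - y0, x - x0)  # aim from X,Y travel
            magnitude = abs(z - z0)                 # pull depth along Z
            return direction, magnitude
        return None
```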

Abstract

A system and method for computer vision based control of a device may include for example obtaining a first sequence of images, the images comprising a user's hand, determining X and Y coordinates of the user's hand in an image within the sequence of images, the X and Y coordinates determining a direction of a vector, determining a Z coordinate of the user's hand to determine a magnitude of the vector, and controlling the device based on the vector.

Description

GESTURE CONTROL IN AUGMENTED REALITY
FIELD OF THE INVENTION
[0001] The present invention relates to the field of gesture based control of electronic devices. Specifically, the invention relates to using touchless gestures in augmented reality processing.
BACKGROUND OF THE INVENTION
[0002] In an augmented reality system, a user's view of the real world is enhanced with virtual computer-generated graphics. These graphics are spatially registered so that they appear aligned with the real world from the perspective of the viewing user. For example, the spatial registration can make a virtual object appear to be located on a real surface, such as a real-world patch of grass or a tree.
[0003] Augmented reality processing of video sequences may be performed in order to also provide real-time information about one or more objects that appear in the video sequences. With augmented reality processing, objects that appear in video sequences may be identified so that supplemental information (i.e., augmented information) can be displayed to a user about the objects in the video sequences. The supplemental information may include graphical or textual information overlaid on the frames of the video sequence so that objects are identified, defined, or otherwise described to a user.
[0004] Augmented reality systems have previously been implemented using head-mounted displays that are worn by the users. A video camera captures images of the real world in the direction of the user's gaze, and augments the images with virtual graphics before displaying the augmented images on the head-mounted display.
[0005] US publication number 2012/0154619 describes an augmented reality system which includes a video device having two different cameras: one to capture images of the world outside the user and one to capture images of the user's eyes. The images of the eyes provide information about areas of interest to the user with respect to the images captured by the first camera, and a probability map may be generated based on the images of the user's eyes to prioritize objects from the first camera regarding display of augmented reality information.
[0006] Alternative augmented reality display techniques exploit large spatially aligned optical elements, such as transparent screens, holograms, or video-projectors, to combine the virtual graphics with the real world.
[0007] A user may interact with the displayed reality using indirect interaction devices, such as a mouse or stylus that can monitor the movements of the user to control an onscreen object. However, using such interaction devices, the user may feel detached from the augmented reality environment, and the feeling of naturally interacting with the environment may be spoiled.
[0008] Interaction with a touch screen may also be used to interact with displayed reality. For example, a user may touch a touch sensitive screen of a cellular telephone or other mobile device which is displaying images obtained by a camera of the mobile device, to cause graphics to appear on the display at the location of the interaction with the touch sensitive screen.
[0009] Augmented reality can be used in a game environment. For example, the AppTag™ application enables a user to use an infra-red (IR) beam gun to target other players in an augmented reality game. The application can work with an attached smart device with a camera to obtain images of the real world, and the IR beam gun is used to "shoot" at real world objects. This game requires using a special IR beam gun and requires inconveniently attaching another device (which includes a camera) to the gun.
[0010] The existing augmented reality devices and applications do not enable simple and direct user interaction with real world images and cannot give the feeling of natural unaided interaction with the real world.
SUMMARY OF THE INVENTION
[0011] Embodiments of the invention provide an enhanced real-time experience to the user with respect to video sequences that are captured and displayed in real-time without having to interact with a touch screen or any other interaction devices.
[0012] According to one embodiment two cameras and a display may be used. A first camera may be configured to capture images of the real world, the images being displayed on the display, and a second camera configured to capture images of the user. Images of the user may be processed to identify a user's gesture and the display showing images of the real world may be controlled based on the identified gesture.
[0013] For example, images of the real world may be processed to detect a distinct object (e.g., images of a human head or body) and to determine a (possibly approximate) location of the distinct object relative to the first camera. A user's gesture may be identified in the images from the second camera and the user's gesture may be translated to movement of a graphical object such as an arrow or paintball ammunition on the display so that the graphical object coincides with the human head or body detected in the real world images. This way a user or group of users may play virtual paintball or other virtual war games or any other types of virtual games without having to physically interact with or use any special interaction device but rather by using natural gestures.
[0014] According to one embodiment processing the images of the second camera includes obtaining a sequence of images which include a user, typically a body part of the user, such as the user's hand. In a specific image the user's hand is located at a specific X, Y coordinate of the image. This X,Y coordinate is determined and is used to determine a direction of a vector (a vector may be for example a set or geometric entity endowed with magnitude and direction, e.g., a numerical component for a distance and a numerical component for an angle). The location of the user (e.g., the user's hand) on the Z axis (relative to the camera imaging the user) is detected to determine a magnitude of the vector. A device (e.g., a display of a device) or a graphical object displayed by the device may then be controlled based on the vector. In one embodiment, a Z coordinate may be a coordinate relative to the camera, imager, or device to be manipulated such as a television screen (typically but not necessarily relative to a right angle to the plane of such device) and the X and Y coordinates represent a position within a plane at a distance of the Z coordinate, the plane typically (but not necessarily) being perpendicular to the device.
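As a rough illustration of paragraph [0014] (not taken from the patent), the sketch below derives a direction from the hand's X,Y offset relative to the image centre and uses the Z estimate as the magnitude; the choice of the image centre as the reference point is an assumption.

```python
import math

def gesture_vector(hand_x, hand_y, hand_z, frame_w, frame_h):
    """Build a (direction, magnitude) vector from a detected hand position.

    hand_x, hand_y: hand location in image pixels (from the user-facing camera).
    hand_z:         estimated distance of the hand from that camera.
    The X,Y offset from the frame centre sets the direction; Z sets the magnitude.
    """
    dx = hand_x - frame_w / 2.0
    dy = hand_y - frame_h / 2.0
    direction = math.atan2(dy, dx)  # angle in radians
    magnitude = hand_z              # e.g., how far the "slingshot" was pulled back
    return direction, magnitude
```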
[0015] Embodiments of this method of constructing a vector may also be used to control a device based on a user's gestures even without real world images, for example, in a game or other application where a user may move graphical objects on a pre-programmed display.
[0016] Thus, according to one embodiment of the invention there is provided a method for computer vision based control of a device, the method including obtaining a first sequence of images, the images comprising a user's hand; determining X,Y coordinates of the user's hand in an image within the sequence of images, the X,Y coordinates determining a direction of a vector; determining a Z coordinate of the user's hand to determine a magnitude of the vector; and controlling a device based on the vector.
[0017] The sequence of images may be obtained using a 2D (two dimensional) camera or imager and the X, Y and Z coordinates may be relative to the 2D camera.
[0018] According to one embodiment determining the X,Y and Z locations of the user's hand includes detecting a first posture of the user's hand and determining the X,Y and Z coordinates of the hand in the first posture. Thus, the device may be controlled based on the detection of the first posture of the user's hand and based on the vector.
[0019] Controlling the device may include interacting with an object displayed on the device based on the vector. According to one embodiment interacting with the displayed object includes setting a direction of movement of the object on the device's display based on the direction of the vector and setting the velocity of the object on the device's display based on the magnitude of the vector.
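One possible way to apply such a vector to a displayed object, in the spirit of paragraph [0019], is sketched below; the frame interval and the speed_scale conversion factor are illustrative assumptions.

```python
import math

def step_object(x, y, direction, magnitude, dt=1 / 30, speed_scale=5.0):
    """Advance a displayed object one frame along the gesture vector.

    The vector's direction sets the object's heading and its magnitude sets
    the object's speed. speed_scale converts the unitless magnitude into
    pixels per second and is purely illustrative.
    """
    velocity = magnitude * speed_scale
    x += velocity * math.cos(direction) * dt
    y += velocity * math.sin(direction) * dt
    return x, y
```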
[0020] According to one embodiment the method includes obtaining a second sequence of images, the images comprising a target; determining a location of the target in an image within the sequence of images; and controlling the device based on the vector and on the location of the target.
[0021] The images of the first sequence may include images of the user and images of the second sequence may include real world images.
[0022] According to one embodiment the method includes detecting a shape of the user's hand and causing the object to move on the display of the device, based on the detection of the shape of the user's hand.
[0023] Further, there is provided an augmented reality system. According to one embodiment of the invention the system includes a first camera or imager for obtaining images of a real world; a second camera or imager for obtaining images of a user; a display for displaying images of the real world and for displaying a user controlled graphical object; and a processor for identifying user gestures from the images of the user and for controlling the graphical object based on the user gesture.
[0024] According to one embodiment the first and second cameras are located on a single device. According to some embodiments the first and second cameras are configured for obtaining opposing fields of view.
[0025] According to one embodiment the processor for identifying user gestures identifies a location of a gesturing hand on X,Y coordinates of the second camera and a coordinate on the Z axis relative to the second camera and controls the graphical object based on the X,Y, Z coordinates.
[0026] The processor may create a display of a trajectory based on the user gestures.
[0027] According to some embodiments the processor is configured to identify a target in the real world images and to estimate a location of the target on a set of coordinates of the first camera. The processor may identify a meeting point between the trajectory and the estimated location of the target and may issue an alert for the user based on the identification of the meeting point.
[0028] According to one embodiment the processor is for identifying a shape of a hand and the controlling of the graphical object may be based on the identification of the shape of the hand and on the user gesture.
BRIEF DESCRIPTION OF THE FIGURES
[0029] The invention will now be described in relation to certain examples and embodiments with reference to the following illustrative figures so that it may be more fully understood. In the drawings:
[0030] Figs. 1A and 1B schematically illustrate an augmented reality game using gestures according to embodiments of the invention;
[0031] Fig. 1C schematically illustrates a device operable according to embodiments of the invention, for example, a device on which the augmented reality game can be played;
[0032] Fig. 2 schematically illustrates a computer game controlled by user gestures, according to embodiments of the invention; and
[0033] Fig. 3 schematically illustrates a method for controlling a device, according to embodiments of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0034] In the following description, various aspects of the present invention will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the present invention. However, it will also be apparent to one skilled in the art that the present invention may be practiced without the specific details presented herein. Furthermore, well known features may be omitted or simplified in order not to obscure the present invention.
[0035] Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as "processing," "computing," "calculating," "determining," or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical, such as electronic, quantities within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices.
[0036] Embodiments of the invention are demonstrated through an augmented reality game schematically illustrated in Figs. 1A and 1B.
[0037] The embodiment of the game exemplified in Fig. 1A includes shooting virtual paintballs at an opponent (e.g., at the opponent's head) using a mobile device as a hand-held "sling-shot". The mobile device may include two cameras (other numbers may be used), one configured to image the real world (e.g., a back camera on a mobile phone) and one configured to image the user (e.g., a front camera on a mobile telephone, a camera which faces the user during normal use of the telephone). The device is capable of controlling simultaneous operation of the back and front cameras.
[0038] A user (160) holding the mobile device (170) may image an opponent (162) with the back camera (not shown), the image of the opponent being displayed to the user on the mobile device's display (174). The user (160) may then use his hand (161) to gesture, an action which will be imaged by the front camera (not shown). Images captured by the back camera are typically displayed to the user (160) on a plane that is the same as, or close to, the plane of the front camera (e.g., the plane of display 174). The user (160) may direct a shooting, pitching or flinging movement of his hand (161) (or other body part) at the opponent's head (164) shown in the images displayed to the user. This movement of the user's hand may control or move a graphical object (not shown), such as an arrow or paintball ammunition, on the display, in the direction of the opponent's head (164) (in the displayed image). An animated trajectory (not shown) of the graphical object may be added on the display.
[0039] According to one embodiment a successful "hit" triggers coloring or another indication on the opponent's head in the image of the opponent displayed to the user. When more than one player is involved, an indication of the hit or a report of the hit may also be sent to the hit opponent and/or other players. A still image of the hit (e.g., a picture of the opponent with an added colored mark on the opponent's head or other body part) may be generated and saved and/or sent to the opponent and other players.
[0040] According to one embodiment a networked management system (180) can control the game and may send and receive messages to and from each player in a multi-player game.
[0041] According to one embodiment users register for a game through the management system (180). The users may register and further communicate through networks such as a cellular network, social network, etc. At the time of registration the users may be required to enter identifying details such as their height, head size, etc. and/or identifying details relating to the mobile device they will be using in the game. For example, mobile telephones of a known manufacturer have known dimensions, and thus details of the manufacturer and model of the mobile phone being used by a user may be used to identify the mobile telephone, or the human holding the telephone, as a target in the images of the real world.
[0042] According to one embodiment a user may use his mobile device held up against the background of his head to enable calibration and calculation of the user's head size (compared to the known dimensions of the mobile device).
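One way the calibration of paragraph [0042] could be computed is sketched below: the phone's known physical width provides a centimetres-per-pixel scale for an image that shows both the phone and the user's head. The function name and the sample numbers are illustrative assumptions.

```python
def calibrate_head_size(phone_width_px, head_width_px, phone_width_cm):
    """Estimate the user's head width from an image showing the phone held
    up against the head, using the phone's known physical width as a
    cm-per-pixel scale reference."""
    cm_per_px = phone_width_cm / phone_width_px
    return head_width_px * cm_per_px

# Example: a 7 cm wide phone spans 140 px and the head spans 320 px in the
# same image, giving an estimated head width of 16 cm.
head_cm = calibrate_head_size(phone_width_px=140, head_width_px=320, phone_width_cm=7.0)
```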
[0043] According to one embodiment the mobile device includes a processor capable of identifying the shape of a human head (or other human body parts or other predefined shapes that may be imaged in the real world). By using the (known or predetermined) average dimensions of the identified head shape and by using the angular size of the head image, an approximate distance of the head from the back and front cameras may be calculated, hence providing an approximate three dimensional (3D) location of the head shape. The approximate location may be used for detecting "hits" and for calculating the animated trajectory of the graphical object.
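The angular-size estimate of paragraph [0043] can be approximated with a pinhole camera model, as in the sketch below; the focal length (in pixels) and the assumed average head width are illustrative values, not taken from the patent.

```python
def distance_from_angular_size(size_px, real_size_cm, focal_length_px):
    """Pinhole-camera estimate of an object's distance from the camera.

    size_px:         apparent size of the detected head in the image (pixels)
    real_size_cm:    assumed real-world size (e.g., ~15 cm average head width)
    focal_length_px: camera focal length expressed in pixels
    """
    return focal_length_px * real_size_cm / size_px

# e.g., a head detected 100 px wide with an 800 px focal length:
# distance ~= 800 * 15 / 100 = 120 cm from the camera.
```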
[0044] According to some embodiments the mobile device includes a processor capable of identifying a sticker or tag or other recognizable element worn by the opponent (e.g., an augmented reality sticker similar to stickers produced by Elipse Analysis and Design). Identifying the sticker or tag facilitates the detection of "hits" and the calculation of the animated trajectory of the graphical object. Additional parameters used for detecting "hits" and for calculating the animated trajectory of the graphical object may be parameters of the user's hand movement, which is imaged by the front camera. Examples of these parameters are further described with reference to Fig. 3.
[0045] Thus, as schematically illustrated in Fig. 1B, a method for user interaction with a real scene according to one embodiment of the invention may be described by the following steps, carried out in two camera systems.
[0046] In a first camera system a real scene is imaged and displayed to a user (110). A target in the real scene is identified (112), e.g., a human form or parts of a human body may be identified by known object recognition algorithms, by having the target wear an augmented reality sticker or tag, or by other suitable methods. The location of the target in the first camera's image coordinate system (typically the 3D location relative to the first camera) can then be estimated (114), for example by comparing the known average size of the target in real life (e.g., the circumference of an average human head is between 50 and 60 cm, or a sticker or tag may have a known size) to the angular size of the target in the image, thus estimating the distance of the target from the camera imaging it.
[0047] In a second camera system the user is imaged (120). In the images of the user, the user's hand (or other body part) may be identified (e.g., by using shape recognition algorithms) and tracked (e.g., by determining optical flow or other known tracking methods). A location of the hand in one or more images, on the second camera's image coordinate system, is calculated (122). The two image coordinate systems may be aligned or registered to create a unified coordinate system and a trajectory of a virtual object being manipulated by the user's hand (or other body part) may be calculated based on the hand location and target location in the unified coordinate system (124).
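The "unified coordinate system" of step 124 could, for roughly opposing cameras on one device, be approximated by a fixed rigid transform between the two camera frames; the rotation and translation below are placeholder values based on that assumption, not figures from the patent.

```python
import numpy as np

# Assumed fixed rigid transform from the front (user-facing) camera's frame to
# the back (world-facing) camera's frame. For opposing cameras on one device
# this is roughly a 180-degree rotation about the vertical axis plus a small
# translation given by the device geometry (placeholder values here).
R_front_to_back = np.array([[-1.0, 0.0, 0.0],
                            [ 0.0, 1.0, 0.0],
                            [ 0.0, 0.0, -1.0]])
t_front_to_back = np.array([0.0, 0.0, 0.0])  # cm; placeholder offset

def to_unified(point_front_xyz):
    """Express a 3D point measured in the front camera's frame in the back
    camera's frame, which serves here as the unified coordinate system."""
    return R_front_to_back @ np.asarray(point_front_xyz, dtype=float) + t_front_to_back
```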
[0048] According to one embodiment the images from the first and second cameras are aligned or registered (e.g., by coordinate transformation), and a meeting point between the estimated location of the target (estimated in step 114) and the trajectory of the virtual object (calculated in step 124) can then be identified. If the trajectory coincides with the location of the target (130), the user may be notified (140), e.g., by a graphic indication, message or other alert appearing on the user's display. If no meeting point was found (the trajectory does not coincide with the location of the target), no alert appears on the user's display, or the user may be alerted to the fact that he "missed" (150). Alternatively, users may be notified by how much they missed, or they may be advised of parameters relating to their throw so that they may improve their aim next time.
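A minimal sketch of the meeting-point test of paragraph [0048], assuming both the target location and a sampled trajectory have already been expressed in the unified coordinate system; the hit radius is an illustrative assumption.

```python
import numpy as np

def detect_hit(trajectory_points, target_xyz, hit_radius_cm=10.0):
    """Return True if any sampled point of the virtual object's trajectory
    passes within hit_radius_cm of the estimated target location.

    trajectory_points: iterable of (x, y, z) points in the unified frame (step 124).
    target_xyz:        estimated 3D target location (step 114), same frame.
    """
    target = np.asarray(target_xyz, dtype=float)
    for point in trajectory_points:
        if np.linalg.norm(np.asarray(point, dtype=float) - target) <= hit_radius_cm:
            return True
    return False
```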
[0049] Several users may be connected through a network (e.g. a cellular network) so that they may all receive alerts relating to each other's actions.
[0050] According to one embodiment different users may be identified based on unique user identification details (for example, which may be entered by the user when registering through a game management application) or unique characteristics of their mobile device (e.g., based on their GPS location or based on a unique identifier (ID) (e.g. an RF ID) or based on known mobile device specification or design which may be entered, for example, by the user when registering for a game) and notice of a "hit" on a specific user may be sent to that user based on this identification. Alternatively, communication between mobile devices of players (such as by IR, Bluetooth or other wireless communication techniques) may be used to identify a hit player. Other methods of identifying a hit player may be used.
[0051] The game described above may be played using any electronic device that has or that is connected to an electronic display and camera. Preferably a mobile device is used, enabling the user(s) to move easily and quickly; however, other less mobile devices may also be used.
[0052] An example of a device operable according to embodiments of the invention is schematically illustrated in Fig. 1C.
[0053] The device 10 may be a specifically designed device or may be a common device such as a mobile telephone which may run an appropriate application. The device 10 may include for example a display 11 and two cameras, 12 and 13. (Other numbers of cameras or imagers may be used.) Cameras 12 and 13 are located on the device and/or configured such that while the user holds the device with the display 11 facing him, camera 12 captures a field of view (FOV) which includes the user's body (according to one embodiment, the user's free hand 18, rather than the hand holding the device) and camera 13 captures a FOV of the world outside of the user. According to one embodiment camera 12 captures a FOV which is directly or almost directly opposite to, or facing in a substantially reverse direction from, the FOV captured by camera 13; however, the cameras may be positioned or located in other configurations. For example, the cameras may face in opposite (e.g., 180 degrees difference in the direction in which they are pointed) or substantially opposite directions.
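On a PC-class device, simultaneous capture from a user-facing and a world-facing camera might be approximated as below; the device indices are assumptions, and on a mobile phone the front/back camera pair is normally reached through the platform's native camera API rather than OpenCV.

```python
import cv2

# Device indices 0 and 1 are assumptions; adjust to the machine's camera layout.
user_cam = cv2.VideoCapture(0)   # plays the role of camera 12: faces the user
world_cam = cv2.VideoCapture(1)  # plays the role of camera 13: faces the world

ok_user, user_frame = user_cam.read()
ok_world, world_frame = world_cam.read()
if ok_user and ok_world:
    # user_frame would feed gesture detection; world_frame would feed target
    # detection and is what display 11 shows to the player.
    pass

user_cam.release()
world_cam.release()
```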
[0054] According to one embodiment the cameras 12 and 13 are 2D cameras embedded in device 10. According to other embodiments three dimensional cameras may be used to obtain images of the user and/or of the outside world. According to other embodiments a plurality of two dimensional cameras may be used to obtain the images. The plurality of 2D cameras may be positioned relative to each other to obtain a stereoscopic view of the user's hand and/or of the outside world.
[0055] Display 11 shows the user the FOV captured by camera 13 (typically including a target 14) and may show the user animated additions, such as an animated trajectory 15 of a virtual object and graphical indications of "hits" 16.
[0056] The device may include buttons 17 and switches for operation of the device such as ON/OFF, send, volume, etc.
[0057] The device 10 typically includes processors for operating the game, as described above.
[0058] For example, processor 101 may be configured for carrying out embodiments of the invention by, for example, being connected to a memory (e.g., memory 102) and carrying out instructions or executing software stored on the memory. Embodiments of the invention may include a non-transitory computer readable storage medium including or storing a computer program or computer executable instructions which, when executed by a processor in a computing system, cause the processor to perform the methods described herein.
[0059] Another game operable according to embodiments of the invention is described in Fig. 2.
[0060] In the game exemplified in Fig. 2 the environment (210) displayed to the user is typically computer generated. The user may control one or more of the computer generated objects by gesturing.
[0061] A system capable of supporting a game such as described with reference to Fig. 2 and Figs. 1A and 1B typically includes an image sensor or camera to obtain image data of a field of view (FOV). The image data is sent to a processor which performs image analysis to detect and track a user's hand in the image data and to detect postures and gestures of the user's hand. For example, a posture or shape of the user's hand can be detected or identified using an algorithm which calculates Haar-like features in a Viola-Jones object detection framework.
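Since this paragraph names Haar-like features in a Viola-Jones detection framework, a minimal OpenCV sketch of that style of detector is shown below; the cascade file name "palm.xml" is a placeholder, as OpenCV does not bundle a hand cascade and one would have to be trained or obtained separately.

```python
import cv2

# Placeholder cascade file: a Haar cascade trained on the target hand posture.
hand_cascade = cv2.CascadeClassifier("palm.xml")

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Viola-Jones style multi-scale sliding-window detection over Haar-like features.
    hands = hand_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                          minSize=(40, 40))
    for (x, y, w, h) in hands:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("hand detection", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```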
[0062] The image sensor may be associated with a storage device for storing image data. The storage device may be integrated within the image sensor or may be external to the image sensor. According to some embodiments image data may be stored in the processor, for example in a cache memory.
[0063] According to some embodiments more than one processor may be used by the system.
[0064] The game may be operated on any electronic device that has, or that is connected to, an electronic display (e.g., a television (TV), DVD player, PC, mobile phone or camera), or on an electronic device that is available with an integrated standard 2D camera. According to some embodiments the camera is an external accessory to the device. An external camera may include a processor and appropriate algorithms for gesture/posture control. According to some embodiments, more than one 2D camera is provided to enable obtaining 3D information. According to some embodiments the system includes a 3D and/or stereo camera.
[0065] Processors may be integral to the image sensor or may be in separate units. Alternatively, a processor may be integrated within the device. According to other embodiments a first processor may be integrated within the image sensor and a second processor may be integrated within the device.
[0066] Communication between the image sensor and the processor and/or between the processor and the device may be through a wired or wireless link, such as through IR communication, radio transmission, Bluetooth technology and other suitable communication routes and protocols.
[0067] According to one embodiment the image sensor may be a standard 2D camera such as a webcam or other standard video capture device, typically installed on PCs, smart phones or other electronic devices. According to some embodiments, the image sensor can be IR sensitive. According to other embodiments the system may include a stereo camera. The processor can apply image analysis algorithms, such as motion detection and shape recognition algorithms, to identify and further track the user's hand.
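A minimal sketch of the motion-detection step mentioned here, assuming simple frame differencing and treating the largest moving contour as the hand; the threshold values are illustrative, and real shape recognition (as described above) would be more involved.

```python
import cv2

cap = cv2.VideoCapture(0)
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)                 # motion = change between frames
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        # Crude stand-in for shape recognition: track the largest moving region.
        hand = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(hand)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    prev_gray = gray
    cv2.imshow("motion-based hand tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```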
[0068] Referring back to Fig. 2, a user may control a computer-generated object on a display by gesturing touchlessly (e.g., without touching a game controller or screen) within the field of view of the camera connected to the computer. Parameters of the user's hand movement (e.g., as discussed with reference to Fig. 3) are calculated and translated to movement of a selected object. According to one embodiment the object is an arrow (202) or some other type of ammunition. Other objects may be similarly controlled.
[0069] A trajectory (204) of the shot arrow may be animated on the display (200). Hits may give the user points, which may be displayed on screen (e.g., display 200) during the game.
[0070] According to one embodiment an icon (206) or other graphical representation (e.g., an icon of a hand) may appear on the screen to reassure the user that his hand is within the FOV of the camera. The icon may also indicate to the user whether the system is in a mode in which his hand movements are translated to "throwing" (e.g., based on the shape of the icon).
[0071] According to one embodiment a user's hand movement is translated to movement of a graphical object (such as in Figs. 1A and 1B and Fig. 2) based on identification of a pre-determined posture or shape of the user's hand. For example, a user may move an open hand (all fingers extended) in view of a camera to control a cursor or other functions of a device, whereas when the user closes his fingers to make a fist or grab-like posture, or changes the shape of his hand to any other pre-determined posture, movement of the user's hand will be translated to control a display in games as described above.
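One way the posture-dependent mode switch described here could be expressed in code, assuming a separate posture classifier (not shown) supplies the detected posture:

```python
from enum import Enum

class Posture(Enum):
    OPEN_HAND = 1   # all fingers extended
    FIST = 2        # grab-like / closed posture

def movement_target(posture):
    """Decide what a hand movement controls, based on the detected posture.

    The posture itself is assumed to come from a separate classifier (not shown).
    """
    if posture is Posture.OPEN_HAND:
        return "cursor"        # open hand: general device control (cursor, etc.)
    if posture is Posture.FIST:
        return "game_object"   # predetermined posture: control the game display
    return None                # unrecognized postures are ignored

assert movement_target(Posture.FIST) == "game_object"
```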
[0072] According to another embodiment hand gestures for controlling the device may include touch gestures. According to one embodiment the display (e.g., display 11) is touch sensitive and a user may apply a touch gesture at a location on the display to signify selecting a graphical object and to initiate an event in which further touchless gestures of the user's hand are translated to other manipulations of the graphical object. For example, a touch-based pinch gesture may signify "select" while further touchless gestures may cause movement of the selected graphical object.
[0073] A method for controlling a device, according to embodiments of the invention, is schematically illustrated in Fig. 3. According to one embodiment the method includes obtaining a sequence of images, the images comprising a user's hand or other body part (310), and determining X,Y coordinates of the user's hand at a specific location within an image from the sequence of images (320) to calculate a direction of a vector (330). A Z coordinate of the user's hand at the specific location is also determined (340) to calculate a magnitude of the vector (350), and the vector is used to control the device (360).
[0074] According to one embodiment controlling the device includes setting a direction of movement of the object on the device's display based on the direction of the vector and setting the velocity of the object on the device's display based on the magnitude of the vector.
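A sketch of the mapping described in the last two paragraphs, with the X,Y position setting the vector's direction and the Z coordinate setting its magnitude, which in turn set the displayed object's heading and speed; the anchor point and scaling constant are assumptions, not values from the patent.

```python
import numpy as np

def control_vector(hand_x, hand_y, hand_z, anchor=(320, 240), speed_scale=0.05):
    """Build a control vector from hand coordinates (Fig. 3, steps 320-350).

    Direction comes from the hand's X,Y position relative to an assumed anchor
    point (e.g., the image center or the point where the gesture started);
    magnitude comes from the Z coordinate (location relative to the camera).
    `speed_scale` is an arbitrary tuning constant.
    """
    dx, dy = hand_x - anchor[0], hand_y - anchor[1]
    norm = np.hypot(dx, dy)
    direction = np.array([dx, dy]) / norm if norm > 0 else np.zeros(2)
    magnitude = hand_z * speed_scale          # nearer/farther hand -> smaller/larger speed
    return direction, magnitude

def step_object(position, direction, magnitude, dt=1.0 / 30):
    """Advance the displayed object: direction sets heading, magnitude sets velocity."""
    return position + direction * magnitude * dt

pos = np.array([100.0, 100.0])
direction, speed = control_vector(hand_x=400, hand_y=180, hand_z=60)
pos = step_object(pos, direction, speed)
```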
[0075] For example, a user playing virtual paintball (as in Figs. 1A and 1B) or other computer games may move his hand (according to one embodiment, while in a predetermined posture) back to stretch a virtual bow or slingshot and may then apply a hand gesture or posture, such as opening the palm, to signify release of the virtual arrow or other ammunition. The trajectory of the virtual arrow shot by the user is based on a vector calculated from the location of the user's hand on the X and Y axes and its location on the Z axis (typically relative to the camera imaging the hand) when the hand release gesture or posture is detected.
[0076] According to one embodiment a movement of the hand may be defined as having a beginning (when the hand changes from static to moving) and an end (when the hand changes from moving to static). A "select event" may be determined to occur at the beginning of the movement and a "release event" may be determined to occur at the end of the movement. According to some embodiments the beginning and end of hand movements may be detected based on the shape of the user's hand (e.g., using one posture of the hand to signify the start of a hand movement (and a "select event") and another posture to signify the end of the hand movement (and a "release event")), based on a lapsed time period, based on motion parameters (e.g., based on the detected speed of motion, where below a predetermined speed the hand is defined as "static" and above that speed the hand is determined to be "mobile"), or by other suitable methods.
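A sketch of the speed-threshold variant described here: below an assumed speed the hand is treated as "static", above it as "mobile", and the static-to-mobile and mobile-to-static transitions generate "select" and "release" events.

```python
import numpy as np

SPEED_THRESHOLD = 15.0   # pixels per frame; assumed tuning value

def detect_events(hand_positions):
    """Yield ('select', i) at the start of a movement and ('release', i) at its end."""
    moving = False
    for i in range(1, len(hand_positions)):
        speed = np.linalg.norm(np.subtract(hand_positions[i], hand_positions[i - 1]))
        if not moving and speed > SPEED_THRESHOLD:
            moving = True
            yield ("select", i)     # hand changed from static to moving
        elif moving and speed <= SPEED_THRESHOLD:
            moving = False
            yield ("release", i)    # hand changed from moving to static

positions = [(100, 100), (101, 100), (130, 110), (160, 125), (161, 125), (161, 126)]
print(list(detect_events(positions)))   # prints [('select', 2), ('release', 4)]
```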
[0077] According to some embodiments the X, Y, Z coordinates are determined upon detection of a change of posture of the hand or upon detection of a pre-determined posture of the hand. Thus, if a user closes the fingers (or one finger) of his hand into a grab-like or pinch-like posture to select and pull back a virtual arrow in a virtual bow, the locations of the hand on the X and Y axes while in the grab-like or pinch-like posture are used to calculate an estimated direction of movement of the virtual arrow, whereas the location of the hand on the X, Y and Z axes once the user opens his fingers (e.g., to simulate letting go of the arrow) is used to determine the velocity (possibly the initial velocity) and/or the distance of travel of the virtual arrow from the X,Y location at which the hand's posture changed.
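Tying this paragraph to the vector of Fig. 3, the sketch below records the X,Y position while the hand is in a grab/pinch posture and computes a release velocity from the X,Y,Z position at the moment the hand opens; posture detection is assumed to happen elsewhere, and the launch scaling is arbitrary.

```python
import numpy as np

class VirtualBow:
    """Track aim while the hand is in a grab posture; compute release velocity on open."""

    def __init__(self, launch_scale=0.1):
        self.anchor_xy = None            # X,Y where the grab posture was first detected
        self.launch_scale = launch_scale

    def update(self, posture, x, y, z):
        """`posture` is 'grab' or 'open', as reported by an external posture detector."""
        if posture == "grab":
            if self.anchor_xy is None:
                self.anchor_xy = np.array([x, y])     # select event: arrow nocked
            return None
        if posture == "open" and self.anchor_xy is not None:
            pull = self.anchor_xy - np.array([x, y])  # direction from pulled-back hand
            speed = abs(z) * self.launch_scale        # Z at release sets the speed
            norm = np.linalg.norm(pull)
            velocity = (pull / norm) * speed if norm > 0 else np.zeros(2)
            self.anchor_xy = None                     # release event consumed
            return velocity                           # initial velocity of the virtual arrow
        return None

bow = VirtualBow()
bow.update("grab", 320, 240, 50)
print(bow.update("open", 360, 300, 80))   # velocity vector for the shot arrow
```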
[0078] According to one embodiment the camera used to obtain images of the user is a 2D camera; the X,Y coordinates relate to the coordinate system of the image produced by the camera, while the Z coordinates relate to locations relative to the 2D camera itself. Z coordinates may be relative locations (e.g., closer to or further away from the camera).
[0079] Detecting the X, Y and Z coordinates may be done, for example, by using the known dimensions of the image frames and by detecting a pitch angle of the hand (e.g., by calculating the angle between the user's arm and a transverse axis of the hand or arm, between the hand and a virtual line connecting the hand and the display, or between the hand and a virtual line connecting the hand and the camera used to obtain the images of the hand). In some cases the size or shape of the user's hand (or a change in size or shape of the user's hand between images) may be used in calculating the pitch angle of the hand and/or to determine a coordinate of the hand. Additional methods may be used for detecting the X, Y, Z coordinates, for example, detecting a transformation of movement of selected points/pixels within images of the hand, determining changes of scale along the X and Y axes from the transformations and determining movement along the Z axis from the scale changes, or any other appropriate methods, for example, using stereoscopy or 3D imagers.
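A sketch of the scale-change idea at the end of this paragraph: the apparent size of the tracked hand region is compared between consecutive images, and growth or shrinkage is read as relative movement along the Z axis; the proportionality constant is an assumption.

```python
def relative_z_change(prev_bbox, curr_bbox, z_scale=100.0):
    """Estimate relative movement along the Z axis from hand bounding-box scale change.

    Each bbox is (x, y, w, h) in image coordinates.  A growing box is read as the
    hand moving toward the camera (negative dZ), a shrinking box as moving away.
    `z_scale` is an arbitrary constant mapping scale change to relative Z units.
    """
    prev_size = (prev_bbox[2] * prev_bbox[3]) ** 0.5   # geometric mean of width/height
    curr_size = (curr_bbox[2] * curr_bbox[3]) ** 0.5
    if prev_size == 0:
        return 0.0
    scale = curr_size / prev_size
    return (1.0 - scale) * z_scale    # > 0: hand moved away, < 0: hand moved closer

print(relative_z_change((100, 100, 80, 80), (95, 95, 96, 96)))   # hand got bigger -> negative
```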

Claims

1. A method for computer vision based control of a device, the method comprising: obtaining a first sequence of images, the images comprising a user's hand; determining X and Y coordinates of the user's hand in an image within the sequence of images, the X and Y coordinates determining a direction of a vector; determining a Z coordinate of the user's hand to determine a magnitude of the vector; and controlling a device based on the vector.
2. The method of claim 1 comprising obtaining a sequence of images using a 2D camera and wherein the X,Y and Z coordinates are relative to the 2D camera.
3. The method of claim 1 wherein determining the X, Y and Z coordinates of the user's hand comprises detecting a first posture of the user's hand and determining the X, Y and Z coordinates of the hand in the first posture.
4. The method of claim 3 comprising controlling the device based on the detection of the posture of the user's hand and based on the vector.
5. The method of claim 1 wherein controlling the device comprises interacting with an object displayed on the device based on the vector.
6. The method of claim 5 wherein interacting with the displayed object comprises setting a direction of movement of the object on the device's display based on the direction of the vector and setting the velocity of the object on the device's display based on the magnitude of the vector.
7. The method of claim 6 comprising detecting a shape of the user's hand and causing the object to move on the display of the device, based on the detection of the shape of the user's hand.
8. The method of claim 5 comprising: obtaining a second sequence of images, the images comprising a target; determining a location of the target in an image within the sequence of images; and controlling the device based on the vector and on the location of the target.
9. The method of claim 8 wherein the images of the first sequence of images are images of the user and images of the second sequence of images are real world images.
10. An augmented reality system, the system comprising:
a first camera for obtaining images of a real world;
a second camera for obtaining images of a user;
a display for displaying images of the real world and for displaying a user controlled graphical object; and
a processor for identifying user gestures from the images of the user and for controlling the graphical object based on the user gesture.
11. The system of claim 10 wherein the first and second cameras are located on a single device.
12. The system of claim 11 wherein the first and second cameras are configured for obtaining opposing fields of view.
13. The system of claim 10 wherein the processor for identifying user gestures identifies a location of a gesturing hand on X,Y coordinates of the second camera and a coordinate on the Z axis relative to the second camera and controls the graphical object based on the X,Y, Z coordinates.
14. The system of claim 10 wherein the processor is to create a display of a trajectory based on the user gestures.
15. The system of claim 14 wherein the processor is for identifying a target in the real world images and for estimating a location of the target on a set of coordinates of the first camera.
16. The system of claim 15 wherein the processor is for identifying a meeting point between the trajectory and the estimated location of the target and for issuing an alert for the user based on the identification of the meeting point.
17. The system of claim 10 wherein the processor is for identifying a shape of a hand and wherein controlling the graphical object is based on the identification of the shape of the hand and on the user gesture.
PCT/IL2014/050073 2013-01-21 2014-01-21 Gesture control in augmented reality WO2014111947A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361754653P 2013-01-21 2013-01-21
US61/754,653 2013-01-21

Publications (1)

Publication Number Publication Date
WO2014111947A1 true WO2014111947A1 (en) 2014-07-24

Family

ID=51209097

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2014/050073 WO2014111947A1 (en) 2013-01-21 2014-01-21 Gesture control in augmented reality

Country Status (1)

Country Link
WO (1) WO2014111947A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110242134A1 (en) * 2010-03-30 2011-10-06 Sony Computer Entertainment Inc. Method for an augmented reality character to maintain and exhibit awareness of an observer
US20120113223A1 (en) * 2010-11-05 2012-05-10 Microsoft Corporation User Interaction in Augmented Reality
US20120119991A1 (en) * 2010-11-15 2012-05-17 Chi-Hung Tsai 3d gesture control method and apparatus

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8938124B2 (en) 2012-05-10 2015-01-20 Pointgrab Ltd. Computer vision based tracking of a hand
WO2016076951A1 (en) * 2014-11-14 2016-05-19 Qualcomm Incorporated Spatial interaction in augmented reality
US20160140763A1 (en) * 2014-11-14 2016-05-19 Qualcomm Incorporated Spatial interaction in augmented reality
CN107077169A (en) * 2014-11-14 2017-08-18 高通股份有限公司 Spatial interaction in augmented reality
US9911235B2 (en) 2014-11-14 2018-03-06 Qualcomm Incorporated Spatial interaction in augmented reality
CN107077169B (en) * 2014-11-14 2020-04-28 高通股份有限公司 Spatial interaction in augmented reality
US10996814B2 (en) 2016-11-29 2021-05-04 Real View Imaging Ltd. Tactile feedback in a display system
CN108958475A (en) * 2018-06-06 2018-12-07 阿里巴巴集团控股有限公司 virtual object control method, device and equipment
US20200028843A1 (en) * 2018-07-17 2020-01-23 International Business Machines Corporation Motion Based Authentication
US10986087B2 (en) * 2018-07-17 2021-04-20 International Business Machines Corporation Motion based authentication
WO2023093167A1 (en) * 2021-11-25 2023-06-01 荣耀终端有限公司 Photographing method and electronic device

Similar Documents

Publication Publication Date Title
US11157725B2 (en) Gesture-based casting and manipulation of virtual content in artificial-reality environments
US9928650B2 (en) Computer program for directing line of sight
TWI722280B (en) Controller tracking for multiple degrees of freedom
US9495800B2 (en) Storage medium having stored thereon image processing program, image processing apparatus, image processing system, and image processing method
US8696458B2 (en) Motion tracking system and method using camera and non-camera sensors
JP5622447B2 (en) Information processing program, information processing apparatus, information processing system, and information processing method
EP2371434B1 (en) Image generation system, image generation method, and information storage medium
WO2014111947A1 (en) Gesture control in augmented reality
JP3530772B2 (en) Mixed reality device and mixed reality space image generation method
CN107646098A (en) System for tracking portable equipment in virtual reality
JP5690135B2 (en) Information processing program, information processing system, information processing apparatus, and information processing method
US11086475B1 (en) Artificial reality systems with hand gesture-contained content window
KR20140090159A (en) Information processing apparatus, information processing method, and program
US10921879B2 (en) Artificial reality systems with personal assistant element for gating user interface elements
JP2000350859A (en) Marker arranging method and composite reality really feeling device
US11043192B2 (en) Corner-identifiying gesture-driven user interface element gating for artificial reality systems
EP3066543B1 (en) Face tracking for additional modalities in spatial interaction
US20220362667A1 (en) Image processing system, non-transitory computer-readable storage medium having stored therein image processing program, and image processing method
US10852839B1 (en) Artificial reality systems with detachable personal assistant for gating user interface elements
US11557103B2 (en) Storage medium storing information processing program, information processing apparatus, information processing system, and information processing method
JP2021060627A (en) Information processing apparatus, information processing method, and program
CN113289336A (en) Method, apparatus, device and medium for tagging items in a virtual environment
US10948978B2 (en) Virtual object operating system and virtual object operating method
US11944897B2 (en) Device including plurality of markers
CN110036359B (en) First-person role-playing interactive augmented reality

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14740709

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14740709

Country of ref document: EP

Kind code of ref document: A1