WO2015189860A2 - Method for interaction with devices - Google Patents

Method for interaction with devices Download PDF

Info

Publication number
WO2015189860A2
Authority
WO
WIPO (PCT)
Prior art keywords
user
devices
camera
server
gesture
Prior art date
Application number
PCT/IN2015/000241
Other languages
French (fr)
Other versions
WO2015189860A3 (en)
Inventor
Pranav Mishra
Rajeswari Kannan
Ramesh Raskar
Original Assignee
Lensbricks Technology Private Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lensbricks Technology Private Limited filed Critical Lensbricks Technology Private Limited
Publication of WO2015189860A2 publication Critical patent/WO2015189860A2/en
Publication of WO2015189860A3 publication Critical patent/WO2015189860A3/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/94Hardware or software architectures specially adapted for image or video understanding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm

Definitions

  • the present invention describes a system and method of interaction with devices using gestures.
  • Gesture recognition technology helps a device to interpret human gestures using mathematical algorithms. Such gestures are mainly a combination of hand, arm, body and facial gestures. Gestures are interpreted based on a number of variables including spatial information, the path a gesture takes, symbolic information encoded in a gesture or information associated with a user's emotions. Gestures can be captured using many peripherals, including wired gloves, depth-aware cameras, stereo cameras, controllers and radars. Vision analysis and image processing form an integral part of such gesture recognition systems.
  • Vision based gesture recognition mainly depends on the static human pose, static hand pose, body activity gestures and hand activity gestures of the user with respect to the camera. Different tasks are associated with gesture recognition systems, such as hand tracking, dynamic gesture recognition, static gesture recognition, sign language recognition and pointing.
  • A vision based gesture recognition method involves capturing and storing images, and mapping the stored images to corresponding control commands to obtain the desired function output. Vision based gesture recognition works by image processing followed by video content analysis. Signal processing is performed on the images obtained through the camera to extract the required features of the gesture.
  • Video content analysis relies on the automated detection of temporal and spatial events, from motion detection against an estimated fixed background scene, where the process includes detecting the object and segmenting it from the video foreground using background subtraction or compression-based methods.
  • WO 2013090868 A1 titled "Interacting with a mobile device within a vehicle using gestures" describes a method for recognizing gestures using a mobile device within a vehicle.
  • a mobile device is used to perform operations based on the gestures made by the user.
  • the gestures made by the user within an interaction space are captured by a camera, and this information is transferred to the mobile device, instructing it to perform the desired operation.
  • the current invention is not restricted to a vehicle.
  • the current invention describes a method where gestures are used to interact in a place with many users.
  • EP 26552 A1 titled "Method and apparatus for providing a mechanism for gesture recognition" describes a mechanism for utilizing gesture recognition, which further includes down-sampling of images to generate down-sampled image blocks for a plurality of image frames. Further, the moving status of the down-sampled images is determined based on the different values of the respective features in consecutive frames. Also, based on the first border and second border of the projection histogram, the motion status of an object is determined.
  • the present invention describes a system and method for gesture based, context sensitive response in a user's interaction with their environment, using one or more cameras and sensors to detect the user-specific preferences.
  • the present invention proposes a system and method, comprised of three abstract modules including: (a) Initial Configuration, (b) Gesture Detection, and (c) Context Sensitive Response.
  • the system consists of one or more cameras that capture one or more events, a server where the event is detected, analyzed and the information is stored in the form of tables. Further, the desired action is performed based on the information stored and the context determined by the user.
  • the detailed descriptions of the modules are as follows:
  • Initial configuration: The system is initially configured for the user and the devices that are to be controlled by the user.
  • the user configures the indoor locations by moving across the room along with a hand held mobile device with an installed application in it.
  • One or more cameras mounted in the room detect the user's location, and the information on the location (GPS co-ordinate) can be obtained from the hand held mobile device.
  • This GPS co-ordinate information is further stored in a server in the form of a position co-ordinate table.
  • the table gives the real world coordinates of the user based upon their location observed on the camera.
  • a user can fetch the GPS co- ordinate by sending a unique flash pattern emitted by their hand held device. Based upon the site of the flash pattern (from the camera), the user's co-ordinates are transmitted back to the hand held device.
  • Initialization of devices in the room takes place by marking the region of interest (ROI). Each of these devices is provided with an ID, and this is stored in the server in the form of a Device ID table.
  • ROI: region of interest
  • Gesture Detection: Once the camera captures the user making a gesture, the video is streamed to the server, or the processing can be done on the camera's processor itself. The server detects and analyzes the gesture based on the direction towards which the user is making the gesture.
  • Context Sensitive Response: Upon detection of the user's gesture by the server, the direction towards which the user is making the gesture is analyzed. Further, based on the state table stored in the server, the devices are accordingly notified.
  • When there are many users, and one user waves towards a fan and another towards a TV, then both devices will change their state.
  • the gesture made by the users is detected independent of the flash design, and actions are performed independent of the flash design.
  • the user's profile stored in an application installed in the user's handheld device is shared with the server.
  • a linkage between the user's profile and the co-ordinate of the user is made through the flash pattern emitted by the user's handheld device or by face recognition. Subsequently, the waiter is notified; hence the waiter attends to the user along with their profile.
  • This information from the server can further be transmitted to a third party's mobile device, based on the direction towards which the user is making a gesture.
  • Figure 1 describes the method of interacting with one or more devices by making gestures.
  • Figure 2a describes the initial configuration of the system for a user
  • Figure 2b describes the working of the system when there are many users in a room.
  • Figure 2c describes the steps for initial configuration of the system, for one or more fixed devices in a room.
  • Figure 2d describes a method for tracking the user using multiple cameras
  • Figure 3 describes the steps to enable a user to control their device, which is not within their proximity.
  • Figure 4 describes a method where the user's profile is shared with a third party.
  • Figure 5 describes the architecture of the system of the present invention.
  • Figure 1 describes the method for interacting with one or more devices by making gestures.
  • one or more cameras mounted in the room detect the user in the scene 1.
  • the camera recognizes the site of the user, and a co-ordinate is associated with the position of the user 2.
  • the user moves around the room with an application installed on their mobile device. Every time the user moves, the camera mounted records the position of the user.
  • the indoor position (GPS co-ordinate) is obtained from the mobile device, and is tagged with the user's position. This is stored in a server as a position co-ordinate table 3. Further, one or more fixed devices in the room are registered 4.
  • the camera uses other still objects in the room to mark the depth of the devices.
  • the position of each of these devices, along with their state is stored in the server in the form of a state table 5.
  • the state of the device typically depends on the type of device.
  • the camera in the room captures the hand of the user, from the elbow to the finger 6. Additionally, the orientation of the wrist is detected from the key feature points on the finger 7. This information is utilized to identify the device at which the user is making a gesture 8.
  • the (x, y, z) co-ordinates of both the user and all the devices are known. Considering two devices in the room at (x1, y1, z1) and (x2, y2, z2), the user's position is at (x, y, z).
  • the direction of the device from the user can be determined by using dot product:
  • Figure 2a describes the initial configuration of the system by a user 20.
  • the user 20 can initially configure the indoor locations by moving across the room 21 with a mobile device 22 that has an application installed on it.
  • a camera 23 mounted in the room records the position of the user 20, and the indoor position (co-ordinate) obtained from the mobile device 22 is tagged to the user's 20 position.
  • Position-0 is the right hand corner of the Room-0.
  • the location and ID tuples are stored in a table.
  • the user 20 can press a number at each of their locations and that number gets associated with the position at which the user is present, as observed by the camera 23.
  • Position-0 has a number "0" associated with it and is the right hand corner of the Room-0. This information is stored in a server 24 in a position co-ordinate table.
  • Figure 2b describes the working of the system when there are many users in a room.
  • the camera installed in the room observes the user and detects their position 30. This observed position and co-ordinate is stored in the position co-ordinate table on the server.
  • the user enters the field of view of the camera 31.
  • the camera notifies the server about the position of the user in its frame, and the server can identify the coordinate of the user from the position co-ordinate table. This can be done successfully for many users.
  • the user wants to procure their position co-ordinate, they wave their hand 32.
  • the camera recognizes the co-ordinates of the person waving the hand from the position co-ordinate table.
  • the device obtains the co-ordinates of the user waving their hand from the server.
  • the server will be unable to identify the user who is waving their hand, and the corresponding device associated with that user 33.
  • the mobile device held by the user transmits a code through its Flash.
  • Flash can transmit 64 as '1000000'.
  • the camera observes this.
  • This code gets associated with the co-ordinate of the device 34.
  • the mobile device wants to obtain the user's co-ordinate from the server, it uses this code to obtain the same.
  • when the user enters the room, they can tap their phone on the NFC (Near Field Communication) device. This gesture is recorded and the device is linked to the position co-ordinate table on the server. Thereafter the user is tracked using the camera. Through this tracking, the server always identifies the particular user by associating the user with the flash pattern emitted by the user's hand held device.
  • NFC: Near Field Communication
  • Figure 2c describes the steps in the initial configuration of the system, for one or more fixed devices in a room.
  • the user can mark a Region of Interest (ROI) 40 in which a device 41 is located.
  • the states the device can occupy, including "ON", "OFF", "SLEEP", etc., are also identified.
  • Each of the devices in the ROI 40 is provided with a unique ID, called the device-ID, stored in a state table 42 maintained in a server 43.
  • the user will set priority of the device states manually. For example, they can set all the states involving sound as the highest priority. Further, they can set all states involving light as the second highest priority and then entertainment and so on. All states of the devices are classified into groups such as audio, light, entertainment etc. Intelligent contextual algorithms are used to determine the action on the devices.
  • the contextual algorithm will consider the state of all the devices and assign priorities to them based upon user settings. It will then provide the gesture information to the device having its state as the highest priority. For example, if audio is the highest priority and a phone is ringing, the phone's state is at the highest priority. As soon as a gesture is made, the phone is subsequently notified. When a gesture is made in the direction of the ROI 40, the state of the device is modified. The camera 44 recognizes the gesture made by the user. The state of the device is obtained from the state table 42 in the server 43. The user can use this feature in many ways.
  • For example, if the user is watching TV and needs to move away from that location but wants to watch the show on their phone while travelling, they make a gesture towards the TV to enable this.
  • the information about the channel being played is stored and bookmarked on the server. This information is shared with other devices registered with the server. As soon as they turn on another device, they will be able to view the channel on their mobile phone.
  • Figure 2d describes a method for tracking the user using multiple cameras.
  • each of these cameras has their own position co-ordinate tables.
  • the intersection co-ordinate 50 of both these cameras is identified during the initialization.
  • the initialization of camera_1 ends and the initialization of camera_2 begins.
  • camera_2 initializes its grid based on camera_1, and that particular co-ordinate is marked as (0, 6). This can be done either manually or automatically based on the user inputs. If indoor GPS co-ordinates are available through hand held devices, this process becomes much easier.
  • Figure 3 describes a method where a user can control their device that is not within their proximity.
  • a user first logs onto a server 60 and registers their mobile device with desired settings. This enables the user's phone to receive notifications from the server on the user state.
  • APIs: Application Program Interfaces
  • the different devices in the room are registered with the server 61.
  • the camera in the room observes the user making a gesture 62.
  • the different devices registered with the server constantly update their current state to the server 63.
  • the camera observing the user uploads the video to the server 64, which is based upon certain analytics. For example, it could be user detection or motion detection. Once the user or the motion is detected, only then the video is uploaded to the server.
  • the video analytics can be performed either on the camera or the server or this could be a combination of both. For example, a people detection algorithm can be applied on a processor on the camera.
  • the server analyses the video stream and analytics received from the camera to detect the gesture of the user 65.
  • the server detects the current state of the various registered devices. Based on the priority of the devices state, an appropriate device is notified. Alternatively, all the devices can be notified about the current user state.
  • An application running on the device receives the notification from the server. Further, the application performs the appropriate desired action on the device 66.
  • Figure 4 describes a method where the user's profile is shared with a third party.
  • a waiter can obtain the profile of their customer using the interactive device.
  • their position is identified either by the indoor GPS co-ordinates 71, or when the user taps their phone against an NFC device 72. This information is then conveyed to the server. Further, the user is tracked using the images obtained from the camera.
  • the server can associate the user's profile with the co-ordinate of the user.
  • the device is then immediately associated with the server 74.
  • the waiter obtains the profile of the user from the server 76.
  • facial recognition 73 can also be used in such situations.
  • the profile of the user is already stored on the server. Once their face is recognized, a link is created between the user and their profile. The user is then tracked. Once they wave, their profile is shared with the third party.
  • Figure 5 describes the architecture of the present invention.
  • the user's position is captured in a position co-ordinate table 94.
  • a camera 90 installed in the room captures multiple events 91 associated with the user, having a mobile device 89, being any of a mobile phone, PDA, etc.
  • the event is the gesture made by the user.
  • the event detection 91 is used to upload this video to the server 92.
  • the event detection layer 91 could be detecting a person, motion detection, or an action to be performed.
  • the computer vision layer 93 processes the video stream to get the appropriate gesture.
  • the visual analytics layer 95 will process the state of the device and the gestures to generate appropriate notifications 96 for the devices.
  • the devices 97a and 97b are notified, they get registered 98 and the information about the state of the device gets stored in the server in the form of a state table. Further, the devices 97a and 97b are connected to a device state layer 99 to determine the state of the devices 97a and 97b.
  • the device state layer 99 is connected to the visual analytics layer.
  • the user will set priority of the device states manually. For example, they can set all the states involving sound as the highest priority. Further, they can set all states involving light as the second highest priority and then entertainment and so on. All states of the devices are classified into groupings such as audio, light, entertainment etc. Certain intelligent contextual algorithms are used to determine the action on the devices.
  • the contextual algorithm will consider the state of all the devices and assign priorities to them based upon user settings. It will then provide the gesture information to the device having its state as the highest priority. For example, if audio is the highest priority and a phone is ringing, the phone's state is at the highest priority. As soon as a gesture is made, the phone is notified.
  • the devices to be controlled could be present in a room, a restaurant, an aircraft, etc. and they are controlled based on the user's preference.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention describes a system and method for gesture-based, context-sensitive response in a user's interaction with their environment, using one or more cameras and sensors to detect user-specific preferences. The system and method described in the invention involve three modules: initial configuration, gesture recognition and context-sensitive response. Here, the user's surroundings are organized based on the prerequisites set by the user, simply by making a gesture. The system makes use of a camera and a cloud server to create a customized user space and a method to interact with a third person.

Description

METHOD FOR INTERACTION WITH DEVICES
BACKGROUND OF THE INVENTION
FIELD OF INVENTION
The present invention describes a system and method of interaction with devices using gestures.
DISCUSSION OF PRIOR ART
Many innovative ways are being developed for interacting with electronic devices with enhanced usability. These innovations include different display and input techniques such that a controllable device is provided with the desirable features of interconnection and communication. Different input technologies have been developed for this communication. The use of a touch screen is now prevalent, where the system recognizes user inputs via touch, using sensors. However, this technique is limited to the detection of physical interactions between the user and the computing device. Input techniques also include user interfaces based on speech recognition, where a voice input is analyzed and the corresponding instruction is generated; however, this technique suffers from low speech recognition rates. Input techniques are now being diversified so that the system recognizes a user's gesture to receive a command.
Gesture recognition technology helps a device to interpret human gestures using mathematical algorithms. Such gestures are mainly a combination of hand, arm, body and facial gestures. Gestures are interpreted based on a number of variables including spatial information, the path a gesture takes, symbolic information encoded in a gesture or information associated with a user's emotions. Gestures can be captured using many peripherals, including wired gloves, depth-aware cameras, stereo cameras, controllers and radars [1]. Vision analysis and image processing form an integral part of such gesture recognition systems.
In the present invention, a vision-based approach has been developed where the image from the camera is treated as the gesture data, from which the features of the gesture are then extracted. Vision-based gesture recognition mainly depends on the static human pose, static hand pose, body activity gestures and hand activity gestures of the user with respect to the camera. Different tasks are associated with gesture recognition systems, such as hand tracking, dynamic gesture recognition, static gesture recognition, sign language recognition and pointing. A vision-based gesture recognition method involves capturing and storing images, and mapping the stored images to corresponding control commands to obtain the desired function output. Vision-based gesture recognition works by image processing followed by video content analysis. Signal processing is performed on the images obtained through the camera to extract the required features of the gesture. Video content analysis relies on the automated detection of temporal and spatial events, from motion detection against an estimated fixed background scene, where the process includes detecting the object and segmenting it from the video foreground using background subtraction or compression-based methods.
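As an illustration of the background-subtraction and foreground-segmentation step mentioned above, the following sketch uses OpenCV's MOG2 subtractor to isolate moving regions (such as a hand) before any gesture mapping; the camera index, area threshold and frame count are assumptions, not values from the patent.

```python
import cv2

cap = cv2.VideoCapture(0)                      # camera stream; index 0 is an assumption
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

for _ in range(300):                           # process a bounded number of frames
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)             # foreground mask: the moving user / hand
    # Segment candidate gesture regions from the foreground mask.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    blobs = [c for c in contours if cv2.contourArea(c) > 1000]
    # 'blobs' would be handed on to the gesture-recognition / command-mapping stage.

cap.release()
```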
WO 2013090868 A1, titled "Interacting with a mobile device within a vehicle using gestures", describes a method for recognizing gestures using a mobile device within a vehicle. Here, a mobile device is used to perform operations based on the gestures made by the user. The gestures made by the user within an interaction space are captured by a camera, and this information is transferred to the mobile device, instructing it to perform the desired operation. However, the current invention is not restricted to a vehicle. The current invention describes a method where gestures are used to interact in a place with many users.
[1] http://en.wikipedia.org/wiki/Gesture_recognition
EP 26352 A1, titled "Method and device for detecting gesture inputs", describes a method where gesture inputs are detected for two consecutive gestures made in front of a detecting device. Here, when two different gestures are made consecutively, each of these gestures is recognized by the detecting device to give different output signals. In contrast, the current invention describes a method in which multiple signals made by many users in a room are recognized, and a corresponding output is given.
EP 26552 A1, titled "Method and apparatus for providing a mechanism for gesture recognition", describes a mechanism for utilizing gesture recognition, which further includes down-sampling of images to generate down-sampled image blocks for a plurality of image frames. Further, the moving status of the down-sampled images is determined based on the different values of the respective features in consecutive frames. Also, based on the first border and second border of the projection histogram, the motion status of an object is determined.
SUMMARY OF THE INVENTION
The present invention describes a system and method for gesture based, context sensitive response in a user's interaction with their environment, using one or more cameras and sensors to detect the user-specific preferences. The present invention proposes a system and method, comprised of three abstract modules including: (a) Initial Configuration, (b) Gesture Detection, and (c) Context Sensitive Response. The system consists of one or more cameras that capture one or more events, a server where the event is detected, analyzed and the information is stored in the form of tables. Further, the desired action is performed based on the information stored and the context determined by the user. The detailed descriptions of the modules are as follows:
Initial configuration: The system is initially configured for the user and the devices that are to be controlled by the user. When there is one user in a room, the user configures the indoor locations by moving across the room with a hand-held mobile device that has an application installed on it. One or more cameras mounted in the room detect the user's location, and the location information (GPS co-ordinate) can be obtained from the hand-held mobile device. This information is further stored in a server in the form of a position co-ordinate table. The table gives the real-world co-ordinates of the user based upon their location as observed by the camera. Once the initial configuration is done, it can be utilized for any number of users: by observing the position of a user, the GPS co-ordinate for that user can be identified. However, when there are many users in the room, a user can fetch the GPS co-ordinate by sending a unique flash pattern emitted by their hand-held device. Based upon the location of the flash pattern (as seen by the camera), the user's co-ordinates are transmitted back to the hand-held device. Initialization of devices in the room takes place by marking the region of interest (ROI). Each of these devices is provided with an ID, and this is stored in the server in the form of a Device ID table. Gesture Detection: Once the camera captures the user making a gesture, the video is streamed to the server, or the processing can be done on the camera's processor itself. The server detects and analyzes the gesture based on the direction towards which the user is making the gesture. When the user enters the field of view of the camera, the user makes a gesture towards a device they want to control or towards a person with whom they want to share their profile. In the case of many users in a room, as in a restaurant, it is not important to know the identity of the user. Only the co-ordinates of the gesture are identified and attended to. When many users are making gestures, all of them must be attended to, and the identity of the users is not important. Context Sensitive Response: Upon detection of the user's gesture by the server, the direction towards which the user is making the gesture is analyzed. Further, based on the state table stored in the server, the devices are accordingly notified. When there are many users, and one user waves towards a fan and another towards a TV, then both devices will change their state. The gestures made by the users are detected independent of the flash design, and actions are performed independent of the flash design. The user's profile, stored in an application installed on the user's handheld device, is shared with the server. A linkage between the user's profile and the co-ordinate of the user is made through the flash pattern emitted by the user's handheld device or by face recognition. Subsequently, the waiter is notified; hence the waiter attends to the user along with their profile. This information from the server can further be transmitted to a third party's mobile device, based on the direction towards which the user is making a gesture.
BRIEF DESCRIPTION OF THE DRAWINGS
The drawings are shown for illustration purposes only and should not be treated as limitations of the invention.
Figure 1 describes the method of interacting with one or more devices by making gestures. Figure 2a describes the initial configuration of the system for a user.
Figure 2b describes the working of the system when there are many users in a room.
Figure 2c describes the steps for initial configuration of the system, for one or more fixed devices in a room.
Figure 2d describes a method for tracking the user using multiple cameras. Figure 3 describes the steps to enable a user to control their device, which is not within their proximity.
Figure 4 describes a method where the user's profile is shared with a third party. Figure 5 describes the architecture of the system of the present invention.
DETAILED DESCRIPTION OF THE ACCOMPANYING EMBODIMENTS
Figure 1 describes the method for interacting with one or more devices by making gestures. Considering there is one user in a room, one or more cameras mounted in the room detect the user in the scene 1. The camera recognizes the location of the user, and a co-ordinate is associated with the position of the user 2. The user moves around the room with an application installed on their mobile device. Every time the user moves, the mounted camera records the position of the user. The indoor position (GPS co-ordinate) is obtained from the mobile device and is tagged with the user's position. This is stored in a server as a position co-ordinate table 3. Further, one or more fixed devices in the room are registered 4. The camera uses other still objects in the room to mark the depth of the devices. The position of each of these devices, along with their state, is stored in the server in the form of a state table 5. The state of the device typically depends on the type of device. The camera in the room captures the hand of the user, from the elbow to the finger 6. Additionally, the orientation of the wrist is detected from the key feature points on the finger 7. This information is utilized to identify the device at which the user is making a gesture 8. The (x, y, z) co-ordinates of both the user and all the devices are known. Considering two devices in the room at (x1, y1, z1) and (x2, y2, z2), the user's position is at (x, y, z). The direction of the device from the user can be determined by using the dot product:
[Equation image not reproduced: dot-product expression used to determine the direction of each device relative to the user.]
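Since the original equation image is not reproduced, the following hedged sketch shows one natural reading of the dot-product test: normalize the dot product (cosine similarity) between the user's pointing direction and the vector from the user at (x, y, z) to each device, and pick the device with the best match. The function and variable names are illustrative, not taken from the patent.

```python
import math

def cosine(u, v):
    """Cosine of the angle between vectors u and v (dot product, normalized)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def pointed_device(user_pos, pointing_dir, devices):
    """Return the ID of the device whose direction from the user best matches
    the pointing direction. devices: mapping of device-ID -> (x, y, z)."""
    best_id, best_score = None, -2.0
    for dev_id, dev_pos in devices.items():
        to_device = tuple(d - u for d, u in zip(dev_pos, user_pos))
        score = cosine(pointing_dir, to_device)
        if score > best_score:
            best_id, best_score = dev_id, score
    return best_id

# Example: user at (0, 0, 1) pointing roughly towards device_2.
devices = {"device_1": (1.0, 0.0, 1.0), "device_2": (3.0, 2.0, 1.0)}
print(pointed_device((0.0, 0.0, 1.0), (0.8, 0.6, 0.0), devices))  # -> device_2
```

In this reading, the device at (x2, y2, z2) is selected when the pointing vector is more closely aligned with (x2 - x, y2 - y, z2 - z) than with (x1 - x, y1 - y, z1 - z).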
Further, based on the position co-ordinate table and the state table stored in the server, the state of devices belonging to the user may be altered 9. The current state of the device is obtained from the state table. Actions associated with the gesture are decided by the state of the device 10. Figure 2a describes the initial configuration of the system by a user 20. Here, the user 20 can initially configure the indoor locations by moving across the room 21 with a mobile device 22 that has an application installed on it. A camera 23 mounted in the room records the position of the user 20, and the indoor position (co-ordinate) obtained from the mobile device 22 is tagged to the user's 20 position. For example, Position-0 is the right-hand corner of Room-0. The location and ID tuples are stored in a table. Alternatively, the user 20 can press a number at each of their locations, and that number gets associated with the position at which the user is present, as observed by the camera 23. For example, Position-0 has the number "0" associated with it and is the right-hand corner of Room-0. This information is stored in a server 24 in a position co-ordinate table.
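A minimal sketch of the position co-ordinate table described above, assuming it maps a camera-observed frame position to the indoor (GPS) co-ordinate reported by the mobile device during the walk-through; the key and value layout, camera ID and co-ordinate values are assumptions.

```python
# Position co-ordinate table: (camera, frame position) -> indoor co-ordinate.
position_table = {}

def register_position(camera_id, frame_pos, indoor_coord):
    """Called during the walk-through each time the camera sees the user at a new spot."""
    position_table[(camera_id, frame_pos)] = indoor_coord

def lookup_coordinate(camera_id, frame_pos):
    """Server-side lookup: position observed by the camera -> real-world co-ordinate."""
    return position_table.get((camera_id, frame_pos))

# Example: "Position-0", the right-hand corner of Room-0.
register_position("camera_23", (0, 0),
                  {"room": "Room-0", "label": "Position-0", "gps": (12.9716, 77.5946)})
print(lookup_coordinate("camera_23", (0, 0)))
```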
Figure 2b describes the working of the system when there are many users in a room. The camera installed in the room observes the user and detects their position 30. This observed position and co-ordinate is stored in the position co-ordinate table on the server. The user enters the field of view of the camera 31. The camera notifies the server about the position of the user in its frame, and the server can identify the co-ordinate of the user from the position co-ordinate table. This can be done successfully for many users. Now, if the user wants to procure their position co-ordinate, they wave their hand 32. The camera recognizes the co-ordinates of the person waving their hand from the position co-ordinate table. The device obtains the co-ordinates of the user waving their hand from the server. In the case of many users, the server will be unable to identify which user is waving their hand and the corresponding device associated with that user 33. Here, the mobile device held by the user transmits a code through its flash. For example, the flash can transmit 64 as '1000000'. The camera observes this. This code gets associated with the co-ordinate of the device 34. When the mobile device wants to obtain the user's co-ordinate from the server, it uses this code to do so. Alternatively, when the user enters the room, they can tap their phone on the NFC (Near Field Communication) device. This gesture is recorded and the device is linked to the position co-ordinate table on the server. Thereafter the user is tracked using the camera. Through this tracking, the server always identifies the particular user by associating the user with the flash pattern emitted by the user's hand-held device.
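The flash code mentioned above (64 transmitted as '1000000') amounts to blinking the device ID in binary. A small sketch, assuming a fixed bit count, of how the handset could encode an ID and the camera side could decode the observed on/off sequence:

```python
def id_to_flash_pattern(device_id: int, bits: int = 7) -> str:
    """Handset side: e.g. 64 -> '1000000' (flash ON for '1', OFF for '0')."""
    return format(device_id, "0{}b".format(bits))

def flash_pattern_to_id(pattern: str) -> int:
    """Camera side: decode the observed ON/OFF sequence back to the ID."""
    return int(pattern, 2)

assert id_to_flash_pattern(64) == "1000000"
assert flash_pattern_to_id("1000000") == 64
```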
Figure 2c describes the steps in the initial configuration of the system for one or more fixed devices in a room. The user can mark a Region of Interest (ROI) 40 in which a device 41 is located. The states the device can occupy, including "ON", "OFF", "SLEEP", etc., are also identified. Each of the devices in the ROI 40 is provided with a unique ID, called the device-ID, stored in a state table 42 maintained in a server 43. The user sets the priority of the device states manually. For example, they can set all the states involving sound as the highest priority. Further, they can set all states involving light as the second highest priority, then entertainment, and so on. All states of the devices are classified into groups such as audio, light, entertainment, etc. Intelligent contextual algorithms are used to determine the action on the devices. Here, the contextual algorithm considers the state of all the devices and assigns priorities to them based upon user settings. It then provides the gesture information to the device whose state has the highest priority. For example, if audio is the highest priority and a phone is ringing, the phone's state is at the highest priority. As soon as a gesture is made, the phone is notified. When a gesture is made in the direction of the ROI 40, the state of the device is modified. The camera 44 recognizes the gesture made by the user. The state of the device is obtained from the state table 42 in the server 43. The user can use this feature in many ways. For example, if the user is watching TV and needs to move away from that location but wants to watch the show on their phone while travelling, they make a gesture towards the TV to enable this. The information about the channel being played is stored and bookmarked on the server. This information is shared with other devices registered with the server. As soon as the user turns on another device, they will be able to view the channel on their mobile phone.
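A hedged sketch of the priority behaviour described for Figure 2c: device states are grouped (audio, light, entertainment), the user ranks the groups, and a detected gesture is routed to the device whose current state falls in the highest-ranked group. The table layout, group names and device IDs are illustrative assumptions.

```python
# State table and user-defined group priority (lower number = higher priority).
state_table = {
    "phone_01": {"state": "RINGING", "group": "audio"},
    "tv_01":    {"state": "PLAYING", "group": "entertainment"},
    "lamp_01":  {"state": "ON",      "group": "light"},
}
group_priority = {"audio": 0, "light": 1, "entertainment": 2}

def route_gesture(gesture, table, priority):
    """Pick the device to notify for a detected gesture, based on group priority."""
    target = min(table, key=lambda dev_id: priority[table[dev_id]["group"]])
    return target, gesture

print(route_gesture("wave", state_table, group_priority))  # phone_01 receives the wave
```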
Figure 2d describes a method for tracking the user using multiple cameras. Here, considering two cameras, each camera has its own position co-ordinate table. The intersection co-ordinate 50 of both cameras is identified during initialization. When a person moves from the co-ordinate (0, 5), which lies in the field of view of camera_1 51, to (0, 6), which is in the field of view of camera_2 52, the initialization of camera_1 ends and the initialization of camera_2 begins. Camera_2 initializes its grid based on camera_1, and that particular co-ordinate is marked as (0, 6). This can be done either manually or automatically based on the user inputs. If indoor GPS co-ordinates are available through hand-held devices, this process becomes much easier. Further, there can be a common area where both cameras can view the user. Figure 3 describes a method where a user can control their device that is not within their proximity. Here, a user first logs onto a server 60 and registers their mobile device with the desired settings. This enables the user's phone to receive notifications from the server on the user state. Through the server, APIs (Application Program Interfaces) are provided to application developers to read the notifications received from the server and create applications based on them. The different devices in the room are registered with the server 61. The camera in the room observes the user making a gesture 62. Simultaneously, the different devices registered with the server constantly update their current state to the server 63. The camera observing the user uploads the video to the server 64 based upon certain analytics, for example user detection or motion detection: only once the user or the motion is detected is the video uploaded to the server. The video analytics can be performed on the camera, on the server, or on a combination of both. For example, a people-detection algorithm can be applied on a processor on the camera; only once the user is detected in the room is the video streamed to the server. The server analyses the video stream and analytics received from the camera to detect the gesture of the user 65. The server detects the current state of the various registered devices. Based on the priority of the device states, an appropriate device is notified. Alternatively, all the devices can be notified about the current user state. An application running on the device receives the notification from the server. Further, the application performs the appropriate desired action on the device 66. Here, it is assumed that all the devices are linked to the Internet or a common network.
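A small sketch of the analytics-gated upload described for Figure 3, where video is streamed to the server only after person or motion detection fires on the camera; the detector and upload callables here are placeholders, not APIs named in the patent.

```python
def process_frames(frames, detect_person, detect_motion, upload):
    """Stream frames to the server only after person/motion analytics fire."""
    streaming = False
    for frame in frames:
        if not streaming and (detect_person(frame) or detect_motion(frame)):
            streaming = True            # analytics trigger: start the upload (step 64)
        if streaming:
            upload(frame)               # server-side gesture detection follows (step 65)

# Toy demo: nothing happens in frames 0-2, a person appears from frame 3 onward.
process_frames(
    frames=range(6),
    detect_person=lambda f: f >= 3,
    detect_motion=lambda f: False,
    upload=lambda f: print("uploading frame", f),
)
```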
Figure 4 describes a method where the user's profile is shared with a third party. Here, considering the example of a restaurant, a waiter can obtain the profile of their customer using the interactive device. When the user enters a restaurant 70, their position is identified either by the indoor GPS co-ordinates 71 or when the user taps their phone against an NFC device 72. This information is then conveyed to the server. Further, the user is tracked using the images obtained from the camera. At any point, the server can associate the user's profile with the co-ordinate of the user. The device is then immediately associated with the server 74. When the user makes a gesture 75, this is captured by the camera, and the waiter obtains the profile of the user from the server 76. Alternatively, facial recognition 73 can also be used in such situations. Here, the profile of the user is already stored on the server. Once their face is recognized, a link is created between the user and their profile. The user is then tracked. Once they wave, their profile is shared with the third party.
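A minimal sketch, under assumed identifiers, of the profile-linking step in Figure 4: once a flash pattern, NFC tap or face match ties an observed co-ordinate to a registered profile, a wave at that co-ordinate shares the profile with the third party (the waiter).

```python
profiles = {"user_42": {"name": "A. Customer", "preferences": ["window seat"]}}
coordinate_to_user = {}     # (x, y) observed by the camera -> user ID

def link_user(coordinate, user_id):
    """Record the association made via NFC tap, flash pattern or face match."""
    coordinate_to_user[coordinate] = user_id

def on_wave(coordinate, notify_third_party):
    """When a wave is detected at a co-ordinate, share that user's profile."""
    user_id = coordinate_to_user.get(coordinate)
    if user_id is not None:
        notify_third_party(profiles[user_id])   # e.g. push to the waiter's device

link_user((3, 7), "user_42")
on_wave((3, 7), notify_third_party=print)
```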
Figure 5 describes the architecture of the present invention. During initialization, the user's position is captured in a position co-ordinate table 94. A camera 90 installed in the room captures multiple events 91 associated with the user, who has a mobile device 89, which may be a mobile phone, PDA, etc. Here, the event is the gesture made by the user. Event detection 91 is used to upload this video to the server 92. The event detection layer 91 could be detecting a person, detecting motion, or detecting an action to be performed. The computer vision layer 93 processes the video stream to extract the appropriate gesture. Further, the visual analytics layer 95 processes the state of the device and the gestures to generate appropriate notifications 96 for the devices. Once the devices 97a and 97b are notified, they get registered 98 and the information about the state of each device gets stored in the server in the form of a state table. Further, the devices 97a and 97b are connected to a device state layer 99 to determine the state of the devices 97a and 97b. The device state layer 99 is connected to the visual analytics layer. The user sets the priority of the device states manually. For example, they can set all the states involving sound as the highest priority. Further, they can set all states involving light as the second highest priority, then entertainment, and so on. All states of the devices are classified into groups such as audio, light, entertainment, etc. Intelligent contextual algorithms are used to determine the action on the devices. Here, the contextual algorithm considers the state of all the devices and assigns priorities to them based upon user settings. It then provides the gesture information to the device whose state has the highest priority. For example, if audio is the highest priority and a phone is ringing, the phone's state is at the highest priority. As soon as a gesture is made, the phone is notified. Here, the devices to be controlled could be present in a room, a restaurant, an aircraft, etc., and they are controlled based on the user's preferences.
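The layered flow of Figure 5 can be summarized as a simple pipeline sketch, with each layer reduced to a stub function; the detection and recognition internals are out of scope, and the function names and return values are assumptions rather than the patent's API.

```python
def event_detection_layer(frame):
    """Layer 91: decide whether the frame holds a person/motion/action worth uploading."""
    return {"event": "gesture_candidate", "frame": frame}

def computer_vision_layer(event):
    """Layer 93: process the uploaded video to extract the gesture and its target direction."""
    return {"gesture": "wave", "direction": "tv_01"}

def device_state_layer():
    """Layer 99: current states of the registered devices (97a, 97b)."""
    return {"tv_01": "ON", "fan_01": "OFF"}

def visual_analytics_layer(gesture_info, device_states):
    """Layer 95: combine gesture and device state into a notification (96)."""
    target = gesture_info["direction"]
    return {"device": target, "current_state": device_states[target], "action": "toggle"}

# End-to-end flow for a single captured event:
event = event_detection_layer(frame="<frame bytes>")
notification = visual_analytics_layer(computer_vision_layer(event), device_state_layer())
print(notification)   # e.g. {'device': 'tv_01', 'current_state': 'ON', 'action': 'toggle'}
```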

Claims

1. A system for interaction with devices for gesture-based, context-sensitive response, using multiple cameras and sensors to detect user preferences, wherein gestures initiate multiple actions, having (a) one or more cameras 90, (b) an event detection layer 91, (c) a server 92, (d) a computer vision layer 93, (e) a position co-ordinate table 94, (f) an analytics layer 95, (g) a notification layer 96, (h) one or more devices 97a,b at different proximities, (i) a device state layer 99 and (j) mobile devices associated with users 89, comprising three abstract modules including:
i. An Initial Configuration Module which further:
a. Enables the user to set an initial configuration of the indoor locations 20;
b. Detects a scene using one or more camera mounted in the room 1;
c. Recognizes the position of the user and a co-ordinate is associated with the position of the user by the camera 2; d. Obtains the indoor position (GPS co-ordinate) from the mobile device by tagging with the user's position and creating a position co-ordinate table in a server 3;
e. Registers one or more relevant devices in the room 4 in a device state table;
f. Sets priority for the devices; and
g. Stores the position of each of the registered devices along with their state in the server in the form of a state table 5; ii. A Gesture Detection Module, which further;
a. Captures the hand of the person from the elbow to the fingers 6; b. Detects the orientation of the wrist from the key feature points on the finger 7; and
c. Identifies the device towards which the user is making a gesture 8; and
iii. A module for Context Sensitive Response, which further:
a. Identifies the device towards which the user is making a gesture 9; and
b. Associates an action with the gesture 10;
2. A system for interaction with devices of Claim 1 wherein:
a) The camera 90 installed in the room tracks users and captures the gesture made by the users;
b) The event detection layer 91 which detects a person, motion detection, or an action to be performed is used to upload this video to the server 92;
c) The computer vision layer 93 processes the video stream to get the appropriate gesture;
d) The visual analytics layer 95 processes the state of the device and the gestures to generate appropriate notifications 96 for the devices;
e) Once the devices 97a and 97b are notified, they get registered 98 and the information about the state of the device gets stored in the server in the form of a state table;
f) The position co-ordinate table 94 records the user's position;
g) The devices 97a through 97b are connected to a device state layer 99 to determine the state of the devices; and
h) The device state layer 99 is connected to the visual analytics layer 95;
3. A system for interaction with devices of Claim 1 wherein the position coordinate table 25 stored on a server 24 is configured by (a) A user 20 moving across a space 21 and indicating the position for configuration using a mobile device 22, said position being captured by a camera 23 present in the same space; and (b) Multiple users in a room moving across a space 21 and indicating the position for configuration using their mobile device 22, which is captured by a camera 23 present in the same space by unique codes associated with each user's mobile-device based camera Flash 34.
4. A system for interaction with devices of Claim 1 wherein the device state table is configured by (a) Marking a Region of Interest (ROI) 40 in which a device 41 is located, (b) Identifying the states the device can occupy including "ON", "OFF", "SLEEP" modes, (c) Providing each of the devices in the ROI 40 with a unique ID, called the device-ID, stored in a state table 42 maintained in the server 43, and (d) Setting a priority of the device states manually;
5. A system for interaction with devices of Claim 1 wherein a priority is associated with the devices, said priority being set by the user and classified into groups including audio, light and entertainment, such that:
a) A contextual method is enabled to assess the state of all the devices and assigns priorities to them based upon user settings; and b) A contextual method is enabled to provide the gesture information to the device having its state as the highest priority;
6. A system for interaction with devices of Claim 1 wherein contextual information about user-preferences and events are retained to enable continuous and seamless user-interaction with the devices in their environment;
7. A system for interaction with devices of Claim 1 wherein the user is tracked using more than one camera such that: a) Multiple cameras identify the intersection co-ordinates 50 of one or more cameras during the initialization wherein the initialization of one camera ends where the initialization of other camera begins;
b) One camera initializes the grids present on the other camera either manually or automatically based on the user inputs; and
c) One or more cameras have one or more common area where both the cameras can view the user;
8. The system for interaction with devices as in Claim 1 wherein a user can control their device not within their proximity by logging into a server 60, such that: a) The user checks if the device is registered or not registered:
i. If the device is already registered, the camera observes the user making a gesture in the room 62 and uploads a video to the server 64; and
ii. If the device is not registered, registering the device with server 61 and uploading the current state of the device 63; b) The server analyzes the video stream and analytics received from the camera to detect the gesture of the user 65; and
c) The application running on the fixed device performs the desired action 66;
9. The system for interaction with devices as in Claim 1 wherein the user's profile is shared with a third party 76 such that:
a) The camera identifies the user when they enter a room 70;
b) One or more actions of the user on their mobile device are identified, including (a) using indoor GPS co-ordinates 71, (b) tapping the device against NFC 72, and (c) face recognition 73, and linked to the server 74; and
c) The user's profile of their gesture 75 is shared with a third party 76;
10. A method for interaction with devices for gesture-based, context-sensitive response, using multiple cameras and sensors to detect user preferences wherein gestures initiate multiple actions having (a) one or more cameras 90, (b) an event detection layer 91, (c) a server 92, (d) a computer vision layer 93, (e) a position co-ordinate table 94, (f) an analytics layer 95, (g) a notification layer 96, (h) one or more devices 97a,b at different proximities (i) a device state layer 99 and (j) mobile devices associated with users 89 comprising the steps of:
i. Initial Configuration further comprising the steps of:
a. Enabling the user to set an initial configuration of the indoor locations 20;
b. Detecting a scene using one or more camera mounted in the room 1;
c. Recognizing the position of the user and a co-ordinate is associated with the position of the user by the camera 2; d. Obtaining the indoor position (GPS co-ordinate) from the mobile device by tagging with the user's position and creating a position co-ordinate table in a server 3;
e. Registering one or more relevant devices in the room 4 in a device state table;
f. Setting priority for the devices; and
g. Storing the position of each of the registered devices along with their state in the server in the form of a state table 5; ii. Gesture Detection further comprising the steps of:
a. Capturing the hand of the person from the elbow to the fingers 6;
b. Detecting the orientation of the wrist from the key feature points on the finger 7; and c. Identifying the device towards which the user is making a gesture 8; and
iii. Context Sensitive Response further comprising the steps of:
a. Identifying the device towards which the user is making a gesture 9; and
b. Associating an action with the gesture 10;
11. A method for interaction with devices of Claim 10 wherein:
a) The camera 90 installed in the room tracks users and captures the gesture made by the users;
b) The event detection layer 91 which detects a person, motion detection, or an action to be performed is used to upload this video to the server 92;
c) The computer vision layer 93 processes the video stream to get the appropriate gesture;
d) The visual analytics layer 95 processes the state of the device and the gestures to generate appropriate notifications 96 for the devices;
e) Once the devices 97a and 97b are notified, they get registered 98 and the information about the state of the device gets stored in the server in the form of a state table;
f) The position co-ordinate table 94 records the user's position;
g) The devices 97a through 97b are connected to a device state layer 99 to determine the state of the devices; and
h) The device state layer 99 is connected to the visual analytics layer 95;
12. A method for interaction with devices of Claim 10 wherein the position coordinate table 25 stored on a server 24 is configured by (a) A user 20 moving across a space 21 and indicating the position for configuration using a mobile device 22, said position being captured by a camera 23 present in the same space; and (b) Multiple users in a room moving across a space 21 and indicating the position for configuration using their mobile device 22, which is captured by a camera 23 present in the same space by unique codes associated with each user's mobile-device based camera Flash 34;
13. A method for interaction with devices of Claim 10 wherein the device state table is configured by (a) Marking a Region of Interest (ROI) 40 in which a device 41 is located, (b) Identifying the states the device can occupy including "ON", "OFF", "SLEEP" modes, (c) Providing each of the devices in the ROI 40 with a unique ID, called the device-ID, stored in a state table 42 maintained in the server 43, and (d) Setting a priority of the device states manually;
14. A method for interaction with devices of Claim 10 wherein a priority is associated with the devices, said priority being set by the user and classified into groups including audio, light and entertainment, further comprising the steps of:
a) Enabling one or more contextual methods to assess the state of all the devices and assign priorities to them based upon user settings; and b) Enabling one or more contextual methods to provide the gesture information to the device having its state as the highest priority;
15. A method for interaction with devices of Claim 10 wherein contextual information about user-preferences and events are retained to enable continuous and seamless user-interaction with the devices in their environment;
16. A method for interaction with devices of Claim 10 wherein the user is tracked using more than one camera further comprising the steps of:
a) Multiple cameras identifying the intersection co-ordinates 50 of one or more cameras during the initialization wherein the initialization of one camera ends where the initialization of other camera begins; b) One camera initializing the grids present on the other camera either manually or automatically based on the user inputs; and c) One or more cameras having one or more common area where both the cameras can view the user;
17. A method for interaction with devices of Claim 10 wherein a user can control their device not within their proximity by logging into a server 60, further comprising the steps of:
a) The user checking if the device is registered or not registered:
i. If the device is already registered, the camera observes the user making a gesture in the room 62 and uploads a video to the server 64; and
ii. If the device is not registered, registering the device with server 61 and uploading the current state of the device 63; b) The server analyzing the video stream and analytics received from the camera to detect the gesture of the user 65; and
c) The application running on the fixed device performs the desired action 66; and
18. A method for interaction with devices of Claim 10 wherein the user's profile is shared with a third party 76 further comprising the steps of:
a) Identifying, by means of the camera, the user when they enter a room 70;
b) Identifying one or more actions of the user on their mobile device, including (a) using indoor GPS co-ordinates 71, (b) tapping the device against NFC 72, and (c) face recognition 73, and linking them to the server 74; and
c) Sharing the user's profile of their gesture 75 with a third party 76.
PCT/IN2015/000241 2014-06-12 2015-06-12 Method for interaction with devices WO2015189860A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN2854CH2014 2014-06-12
IN2854/CHE/2014 2014-06-12

Publications (2)

Publication Number Publication Date
WO2015189860A2 true WO2015189860A2 (en) 2015-12-17
WO2015189860A3 WO2015189860A3 (en) 2016-01-28

Family

ID=54834515

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2015/000241 WO2015189860A2 (en) 2014-06-12 2015-06-12 Method for interaction with devices

Country Status (1)

Country Link
WO (1) WO2015189860A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116704614A (en) * 2023-06-29 2023-09-05 北京百度网讯科技有限公司 Action recognition method, device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4304337B2 (en) * 2001-09-17 2009-07-29 独立行政法人産業技術総合研究所 Interface device
US6937742B2 (en) * 2001-09-28 2005-08-30 Bellsouth Intellectual Property Corporation Gesture activated home appliance

Also Published As

Publication number Publication date
WO2015189860A3 (en) 2016-01-28

Similar Documents

Publication Publication Date Title
US10572073B2 (en) Information processing device, information processing method, and program
US9979921B2 (en) Systems and methods for providing real-time composite video from multiple source devices
CN104956292B (en) The interaction of multiple perception sensing inputs
KR102062310B1 (en) Method and apparatus for prividing control service using head tracking in an electronic device
US9912970B1 (en) Systems and methods for providing real-time composite video from multiple source devices
US9746927B2 (en) User interface system and method of operation thereof
US9898090B2 (en) Apparatus, method and recording medium for controlling user interface using input image
CN111045511B (en) Gesture-based control method and terminal equipment
US20150009124A1 (en) Gesture based user interface
US20110261213A1 (en) Real time video process control using gestures
US9671873B2 (en) Device interaction with spatially aware gestures
EP3111300A2 (en) Controlling a computing-based device using gestures
CN102200830A (en) Non-contact control system and control method based on static gesture recognition
CN102541256A (en) Position aware gestures with visual feedback as input method
US10528145B1 (en) Systems and methods involving gesture based user interaction, user interface and/or other features
US11188145B2 (en) Gesture control systems
JP2016507810A (en) Using distance between objects in touchless gesture interface
CN110837766B (en) Gesture recognition method, gesture processing method and device
CN113497912A (en) Automatic framing through voice and video positioning
WO2015189860A2 (en) Method for interaction with devices
US9761009B2 (en) Motion tracking device control systems and methods
US9948894B2 (en) Virtual representation of a user portion
Lee et al. Mouse operation on monitor by interactive analysis of intuitive hand motions
WO2012007034A1 (en) Sending and receiving information
Devi et al. AI-Enhanced Cursor Navigator

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15807472

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15807472

Country of ref document: EP

Kind code of ref document: A2