US11199906B1 - Global user input management - Google Patents
Global user input management
- Publication number
- US11199906B1 (application US14/018,331)
- Authority
- US
- United States
- Prior art keywords
- application
- user
- computing device
- input
- command
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/16—Constructional details or arrangements
- G06F1/1613—Constructional details or arrangements for portable computers
- G06F1/1626—Constructional details or arrangements for portable computers with a single-body enclosure integrating a flat display, e.g. Personal Digital Assistants [PDAs]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/0482—Interaction with lists of selectable items, e.g. menus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0487—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
- G06F3/0488—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
- G06F3/04883—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
Definitions
- some personal electronic devices are capable of detecting touches and other touch-based gestures, such as by capacitive touch sensors incorporated in a touchscreen.
- the tap of a virtual key of a soft keyboard displayed on the touchscreen may correspond to entry of the key into a device.
- a swipe of the touchscreen may navigate a user to a different portion of a graphical user interface presented on the touchscreen.
- Other devices can detect device motion via inertial sensors, such as accelerometers, gyroscopes, magnetometers, and/or inclinometers, and perform actions based on the detected motion.
- a device can detect a rotation of the device of approximately ninety degrees, interpret such motion as an intent of the user to change the orientation of content being displayed on the device from portrait mode to landscape mode (or vice versa), and re-display the content according to the changed orientation of the device.
- As electronic devices become more powerful and capable of sensing more of the world around them, new approaches can be developed for users to interact with such devices.
- FIGS. 1A-1B illustrate an example approach of detecting and managing various user inputs in accordance with an embodiment
- FIG. 2 illustrates an example of a software architecture that can be used in accordance with an embodiment
- FIG. 3 illustrates an example system for detecting and managing various user inputs in accordance with an embodiment
- FIG. 4 illustrates an example approach for detecting and managing various user inputs in accordance with an embodiment
- FIG. 5 illustrates an example approach for configuring a system for detecting and managing various user inputs in accordance with an embodiment
- FIG. 6 illustrates an example process for detecting and managing various user inputs in accordance with an embodiment
- FIG. 7 illustrates an example of a computing device that can be used in accordance with various embodiments.
- FIG. 8 illustrates an example configuration of components of a computing device such as that illustrated in FIG. 7 .
- users may desire to interact concurrently with multiple applications in a multi-tasking environment.
- Conventional systems and approaches may support multi-tasking, wherein a device can provide for concurrent execution of multiple user applications.
- conventional devices and techniques may be limited to direct interaction with a single application at a time. For example, a user may be operating a first user application, such as a web browser or an email application, while a music player application is concurrently executing. At a particular point in time, the user may wish to replay a song or skip a song playing on the music player.
- the user may be required to halt interaction with the first user application, select the music player as the active or foreground application, direct the music player to replay the song or skip the song, and re-select the first user application to continue interacting with the first user application.
- the user may be interacting with a first user application while a second user application is concurrently running in the background.
- the user may change the orientation of a first graphical user interface corresponding to the first user application, such as by tilting the device to a new orientation.
- the user may then switch to operation of the second user application.
- a second graphical user interface corresponding to the second user application may not immediately reflect the new orientation of the device. Instead, the user may have to re-tilt the device and/or there may be a delay associated with re-determining the new orientation of the device and re-displaying the second graphical interface to comport with the new orientation.
- Systems and methods in accordance with various embodiments of the present disclosure may overcome one or more of the aforementioned and other deficiencies experienced in conventional approaches for managing user gestures and commands in a multi-tasking environment.
- various embodiments enable concurrent interaction with multiple applications in a multi-tasking environment via a global user input detection and management system.
- a device operating according to various embodiments can be configured to recognize an assortment of gestures and commands, such as touch-based gestures (e.g., taps, swipes, or other pointer gestures), auditory commands (e.g., voice commands, whistles, finger snaps), device motions and/or orientations (e.g., rotations or translations of the device, device gestures), visual gestures (e.g., hand gestures, facial movements, body movements), among others.
- User input recognition can be centralized instead of being performed on an ad-hoc, application-by-application basis. In this manner, gestures and commands may be better managed.
- a type of user input is a category of commands or gestures supported by an application, such as audio commands, touch gestures, device gestures, or visual gestures.
- a type of input can correspond to one or more sensors or input devices. For example, audio or voice commands may be associated with a microphone, touch gestures may be associated with one or more touch sensors, device gestures may be associated with accelerometers, gyroscopes, and magnetometers, and visual gestures may be associated with one or more cameras or other optical input devices. It will be appreciated that certain types of user inputs may correspond to sensors or other input devices that are also associated with other types of user inputs.
- voice commands may be based on audio data captured by a microphone and image data of a user's lip movement captured by one or more cameras, which can be used to enhance voice recognition.
- Other sensors and input devices whose data can be influenced by a user or whose data can provide additional context for command/gesture recognition can also be used in various embodiments, such as thermal sensors (e.g., the user placing a device closer or further away from the user's body), location determination components (e.g., GPS, cellular network system, radio frequency (RF) antenna, NFC antenna, Bluetooth®, altimeter), ambient light sensors (e.g., influencing cameras and optical sensors), among others.
- a computing device can be configured to intelligently distribute user input received to the device to an appropriate application.
- the device may process a set of rules for propagating user input and select at least one of the user applications for receiving the recognized gesture or command based on the state of each user application and the propagation rules.
- a user may be concurrently operating multiple applications on a computing device, with a first user application running in the foreground and a second user application running in the background. The user may change the orientation of content being displayed by the first user application by tilting the device. The new orientation of the device can be propagated to each user application configured to receive and recognize such user input.
- determination of the orientation of the device can occur once and be distributed to interested applications. This may reduce processing by the computing device and increase battery life. Further, there may be less latency associated with the change in the orientation of the second graphical user interface such that the device may be more responsive than conventional systems and techniques.
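The "determine once, distribute to interested applications" idea can be sketched as a small publish/subscribe hub. This is a minimal illustration only: the class name `OrientationBroadcaster`, its methods, and the 45/135 degree thresholds are assumptions made for the example and do not come from the disclosure.

```python
from typing import Callable, Dict


class OrientationBroadcaster:
    """Hypothetical hub: computes orientation once, then notifies subscribers."""

    def __init__(self) -> None:
        self._listeners: Dict[str, Callable[[str], None]] = {}
        self.detections = 0  # how many times orientation was computed

    def register(self, app_name: str, listener: Callable[[str], None]) -> None:
        self._listeners[app_name] = listener

    def on_rotation(self, degrees: float) -> str:
        # Single determination, shared by foreground and background apps.
        self.detections += 1
        # Assumed thresholds: roughly a quarter turn counts as landscape.
        orientation = "landscape" if 45 <= abs(degrees) % 180 < 135 else "portrait"
        for listener in self._listeners.values():
            listener(orientation)
        return orientation


received = {}
hub = OrientationBroadcaster()
hub.register("browser", lambda o: received.update(browser=o))
hub.register("music_player", lambda o: received.update(music_player=o))
hub.on_rotation(90)
```

Both registered applications receive the new orientation from one detection, so a background application does not need to re-determine it when it later comes to the foreground.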
- a user may be operating multiple applications in multiple windows, such as a video game in one window and an email application in a second window.
- the game may be a first-person perspective game wherein navigation is based on device motion (e.g., tilting the device forward, backward, right, or left causes the video game character to move forward, backward, right, or left, respectively).
- the email application may also include a motion-based interface.
- the user may interact with the email application by performing certain gestures with the device (e.g., tilting the device forward may cause an email to be opened, tilting the device to the right may result in selection of a next email, and tilting the device to the left may result in selection of a previous email).
- a tilt of the device may be passed to the video game for consumption by the video game because propagation rules may prioritize the video game for receiving such user input.
- the video game may be paused, however, and the device motion may be distributed to the email application instead.
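The video game and email scenario can be sketched as a selection function that combines a propagation rule with application state. The rule format here (an integer priority per application, with paused applications skipped) is an assumption for illustration; the disclosure does not specify how propagation rules are encoded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class App:
    name: str
    modalities: set
    priority: int          # higher wins; assumed rule format
    paused: bool = False


def select_recipient(apps: List[App], modality: str) -> Optional[App]:
    """Pick the highest-priority app that supports the modality and is not paused."""
    candidates = [a for a in apps if modality in a.modalities and not a.paused]
    return max(candidates, key=lambda a: a.priority, default=None)


game = App("video_game", {"device_motion"}, priority=2)
email = App("email", {"device_motion", "touch"}, priority=1)
apps = [game, email]

first = select_recipient(apps, "device_motion").name   # game is active
game.paused = True
second = select_recipient(apps, "device_motion").name  # falls through to email
```

With the game active, a tilt goes to the game; once the game is paused, the same tilt is delivered to the email application instead.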
- An electronic device that implements a global approach for handling user input may also improve device power usage by exercising greater control over activation and deactivation of cameras, sensors, and other input devices.
- a user application may request that certain types of user input or input modalities be available in specific instances.
- an application may indicate that certain types of user input or input modalities must be available when the application is running (e.g., the user has launched the application and the application is running but could be running in the background), when the application is visible on the screen, or when the application has focus (e.g., the application is displayed on the screen and has priority over other applications for receiving input).
- the device could maintain state information for each executing user application and activate/deactivate sensors and other input devices based on the execution state of an application (e.g., the application is running, displayed, or focused). It will be appreciated that in at least some embodiments, multiple applications can be running and displayed simultaneously.
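One way to picture state-driven sensor activation is a function that turns per-application requests into the set of sensors that should be powered. The `State` enum, the `SENSOR_FOR` mapping, and the request tuple layout are hypothetical names chosen for this sketch; only the running/visible/focused distinction comes from the text.

```python
from enum import IntEnum
from typing import Dict, List, Set, Tuple


class State(IntEnum):
    STOPPED = 0
    RUNNING = 1   # launched, possibly in the background
    VISIBLE = 2   # shown on screen
    FOCUSED = 3   # has input priority


# Modality-to-sensor mapping, following the examples given in the text.
SENSOR_FOR = {
    "voice": "microphone",
    "visual_gesture": "camera",
    "device_motion": "accelerometer",
}


def active_sensors(
    requests: List[Tuple[str, str, State]], states: Dict[str, State]
) -> Set[str]:
    """requests: (app, modality, minimum state at which the modality must be available)."""
    return {
        SENSOR_FOR[modality]
        for app, modality, needed in requests
        if states.get(app, State.STOPPED) >= needed
    }


requests = [
    ("music_player", "voice", State.RUNNING),
    ("home_screen", "visual_gesture", State.FOCUSED),
]
background = active_sensors(
    requests, {"music_player": State.RUNNING, "home_screen": State.VISIBLE}
)
focused = active_sensors(
    requests, {"music_player": State.RUNNING, "home_screen": State.FOCUSED}
)
```

In this sketch the camera is powered only while the home screen has focus, whereas the microphone stays available for as long as the music player is running.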
- a user application may have focus but may not necessarily be displayed at the top-most layer of a graphical user interface. For example, a first user application may retain focus even when a pop-up window overlays the first user application.
- whether a particular application has focus may also depend on input modality. For instance, a first user application may have focus with respect to visual gestures and a second user application may have focus with respect to entry via a keyboard.
- a user application may have an interface that is based on visual gestures.
- the device may keep a camera turned on and continuously sample image data while the application is executing to monitor for a visual gesture from a user. This may quickly drain the battery of the device, especially if multiple applications are concurrently executing.
- a global user input management system could utilize a different approach that uses power more efficiently, such as sampling images at a lower resolution, sampling over longer periods of time until an initial user motion is detected, or sampling only portions of images, among other techniques.
- the device could monitor a remaining amount of battery life and implement a more power-efficient approach for recognizing user input when the battery life is low.
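A power-aware sampling policy along these lines might look like the following sketch. The resolutions, sampling intervals, and the 20% low-battery threshold are illustrative assumptions, not values from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingPlan:
    width: int
    height: int
    interval_ms: int


# Illustrative values only.
FULL = SamplingPlan(1280, 720, 33)
LOW_POWER = SamplingPlan(320, 180, 250)


def choose_plan(
    battery_fraction: float, motion_detected: bool, low_battery: float = 0.2
) -> SamplingPlan:
    """Use cheap sampling until motion is seen; stay cheap when the battery is low."""
    if battery_fraction <= low_battery:
        return LOW_POWER
    return FULL if motion_detected else LOW_POWER
```

The camera is sampled coarsely and infrequently until an initial motion is detected, and the cheaper plan is kept regardless of motion when the remaining battery is low.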
- FIGS. 1A-1B illustrate an example approach for detecting and managing various user inputs in accordance with an embodiment.
- a user 102 can be seen viewing a display screen 108 of a computing device 104 .
- a portable computing device (e.g., a smart phone, tablet, or portable media player)
- the display screen 108 is a touchscreen comprising a plurality of capacitive touch sensors and capable of detecting the user's fingertip touching points of the screen as input for the device.
- the display element may implement a different touch technology (e.g., resistive, optical, ultrasonic).
- the computing device includes at least one camera 106 located on the front of the device and on the same surface as the display screen to capture image data of subject matter facing the front of the device, such as the user 102 viewing the display screen.
- Although the components of the example device are shown to be on a “front” of the device, there can be similar or alternative components on the “top,” “side,” or “back” of the device as well (or instead). Further, directions such as “top,” “side,” and “back” are used for purposes of explanation and are not intended to require specific orientations unless otherwise stated.
- a computing device may also include more than one camera on the front of the device and/or one or more cameras on the back (and/or sides) of the device capable of capturing image data facing the back surface (and/or top, bottom, or side surface) of the computing device.
- the camera 106 comprises a digital camera incorporating a CMOS image sensor.
- a camera of a device can incorporate other types of image sensors (such as a charge-coupled device (CCD)) and/or can incorporate multiple cameras, including at least one wide-angle optical element, such as a fish eye lens, that enables the camera to capture images over a wide range of angles, such as 180 degrees or more.
- each camera can comprise a digital still camera, configured to capture subsequent frames in rapid succession, or a video camera able to capture streaming video.
- a computing device can include other types of imaging elements, such as ambient light sensors, IR sensors, and other optical, light, imaging, or photon sensors.
- the computing device also includes one or more motion or orientation determination elements, such as accelerometers, gyroscopes, magnetometers, inclinometers, proximity sensors, distance sensors, depth sensors, range finders, ultrasonic transceivers, among others.
- motion or orientation can be determined using image analysis techniques.
- a combination of approaches such as one or more techniques based on inertial sensors and one or more image analysis techniques can be aggregated or fused to estimate motion of the device.
- the computing device 100 also includes one or more microphones 110 or other audio capture components capable of capturing audio data, such as words spoken by the user 102 of the device.
- the microphone 110 is placed on the same side of the device 100 as the display screen 108 , such that the microphone 110 will typically be better able to capture words spoken by a user of the device.
- the microphone can be a directional microphone that captures sound information from substantially directly in front of the device, and picks up only a limited amount of sound from other directions, which can help to better capture words spoken by a primary user of the device.
- a computing device may include multiple microphones to capture 3D audio.
- a computing device can also include an audio output element, such as internal speakers or one or more ports to support peripheral audio output components, such as headphones or loudspeakers.
- FIG. 1B illustrates an example 120 of the contents displayed on touchscreen 108 of computing device 104.
- a home screen 122 with application icons 124 can be seen overlaid by email application 126 and music player 128.
- home screen application 122, email application 126, and music player 128 each include a respective touch-based interface enabling a user to interact with each application by tapping interface elements or performing other touch gestures.
- Conventional pointer-based user interfaces such as those enabling control via a user's finger, a stylus, a mouse, a pointing stick, a track pad, among others, can be utilized for a multi-tasking platform, but user interaction may be limited to a certain extent.
- physical pointers (e.g., user's finger, stylus)
- virtual pointers (e.g., mouse, pointing stick, track pad)
- a tap of a physical pointer or a click by a virtual pointer located at a particular region within a conventional pointer-based user interface may only enable the user to control one of the home screen application, email application, or music player corresponding to the region with which the user interacted.
- Electronic devices are incorporating new types of sensors and other input mechanisms that enable user interactions that are not limited to the windows, icons, menus, pointer paradigm.
- the user 102 may desire to interact with any one of user applications 122 , 126 , and 128 without necessarily having to first select one of the applications as the active application or the foreground application.
- Approaches in accordance with various embodiments enable concurrent interaction with multiple applications in a multi-tasking environment.
- a user may wish to interact with any one of applications 122 , 126 , and 128 by voice command, such as “Start up App A” for the home screen application, “Create a new email message” for the email application, or “Play the next song” for the music player.
- home screen application 122 may be configured to recognize the gaze of the user with respect to the device as input, such as for rendering the content of the home screen according to the user's gaze, and music player 128 may support hand or finger gestures.
- Shaking a thumb in front of the camera 106 in a leftward direction can cause the selection of a previous track of an album being played by the music player, shaking the thumb in a rightward direction can cause selection of the next track, shaking the thumb upward may cause the current track to be played, shaking the thumb downward may cause the music player to stop playing the current track, and shaking an open palm toward the front of the camera may cause the music player to pause the current track.
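The thumb and palm gestures described above amount to a lookup from a recognized gesture to a music player command. A minimal sketch, assuming the recognizer reports a hand shape and a direction as strings (the key and action names are hypothetical):

```python
from typing import Optional

# Gesture-to-command table taken from the music player example in the text.
GESTURE_ACTIONS = {
    ("thumb", "left"): "previous_track",
    ("thumb", "right"): "next_track",
    ("thumb", "up"): "play",
    ("thumb", "down"): "stop",
    ("open_palm", "toward_camera"): "pause",
}


def action_for(hand_shape: str, direction: str) -> Optional[str]:
    """Return the music player command for a gesture, or None if unmapped."""
    return GESTURE_ACTIONS.get((hand_shape, direction))
```

Keeping the mapping in a table rather than in branching code makes it straightforward for an application to register a different gesture set.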
- the device may be capable of concurrently recognizing head tracking gestures and hand gestures to enable the user to cause the contents of the home screen to be rendered according to a new direction of his gaze and perform thumb gestures to control music playback at substantially the same time.
- the device can recognize a particular type of user input (e.g., one of facial movement or hand/finger gesture) and forward the user input to the appropriate user application for receiving the recognized user input.
- User input distribution may be based on propagation rules and/or a respective state of each user application, as discussed elsewhere herein.
- head or facial movements can be recognized as user input.
- Approaches for recognizing facial expressions or movements as input for a computing device are discussed in co-pending U.S. patent application Ser. No. 12/332,049, filed Dec. 8, 2010, entitled, “Movement Recognition as Input Mechanism,” which is incorporated by reference herein.
- other facial features such as a user's eyes, mouth, nose, or other facial features, can be analyzed over a set of images to determine whether changes in the user's facial features correspond to user input.
- eye winks, patterns of eye winks, or other ocular motions can be recognized by a computing device to perform various actions.
- Approaches for detecting a user's eye movements as input for a computing device are discussed in co-pending U.S. patent application Ser. No. 13/791,265, filed Mar. 7, 2013, entitled, “User Eye Input to Display Content,” which is incorporated by reference herein.
- some embodiments can detect other bodily movements, such as motion of the arms, legs, and/or other parts of a user, as input for a computing device.
- Approaches for detecting bodily movements as user input for a computing device are discussed in co-pending U.S. patent application Ser. No. 13/914,306, filed Jun. 10, 2013, entitled, “Dynamic User Detection and Tracking,” which is incorporated by reference herein.
- a device may include one or more microphones for capturing audio data.
- the device may be capable of analyzing the received audio data to recognize auditory commands, such as voice commands, whistles, hand claps, finger snaps, among others.
- Approaches for recognizing auditory commands as user input are discussed in allowed U.S. patent application Ser. No. 12/879,981, filed Sep. 10, 2010, entitled, “Speech-Inclusive Device Interfaces,” which is incorporated by reference herein.
- voice command recognition may be enhanced based on image analysis techniques performed on image data captured of the user's mouth or other user motion (e.g., nodding or shaking of the user's head).
- Such approaches are discussed in co-pending U.S. patent application Ser. No. 13/626,5
- motion of a computing device can be recognized as user input.
- motion of the device can be detected using one or more inertial sensors, such as accelerometers, gyroscopes, and/or magnetometers.
- motion of the device can be estimated based on analyzing one or more objects captured over a sequence of images using image analysis techniques such as block-matching, optical flow, phase correlation, feature-based methods, among others.
- data from cameras, inertial sensors, and other input devices can be combined using sensor fusion techniques to estimate motion of the device.
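The disclosure does not name a particular sensor fusion technique. As one common example, and purely as an assumption for illustration, a complementary filter blends an angle integrated from gyroscope rates with an angle derived from the accelerometer; the function name, the `alpha` weight, and the sample values below are all hypothetical.

```python
def complementary_filter(angle_prev, gyro_rate, accel_angle, dt, alpha=0.98):
    """Blend integrated gyroscope rate with an accelerometer-derived angle (degrees)."""
    return alpha * (angle_prev + gyro_rate * dt) + (1.0 - alpha) * accel_angle


angle = 0.0
for _ in range(200):  # device held at 30 degrees, gyroscope reads no rotation
    angle = complementary_filter(angle, gyro_rate=0.0, accel_angle=30.0, dt=0.01)
```

The gyroscope term tracks fast changes while the accelerometer term slowly corrects drift, so the estimate converges toward the accelerometer reading when the device is held still.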
- FIG. 2 illustrates an example of software architecture 200 for a personal computing device that can be used in accordance with an embodiment.
- Software architecture 200 may be based on the open-source Android® platform, but it will be appreciated that other platforms can be utilized in various embodiments, such as iOS®, Windows Phone®, Blackberry®, webOS®, among others.
- At the bottom of the software stack 200 resides the kernel 210, which provides a level of abstraction between the hardware of the device and the upper layers of the software stack.
- the kernel 210 may be based on the open-source Linux® kernel.
- the kernel 210 may be responsible for providing low level system services such as the driver model, memory management, process management, power management, networking, security, support for shared libraries, logging, among others.
- the next layer in the software stack 200 is the system libraries layer 230 which can provide support for functionality such as windowing (e.g., Surface Manager), 2D and 3D graphics rendering, Secure Sockets Layer (SSL) communication, SQL database management, audio and video playback, font rendering, webpage rendering, System C libraries, among others.
- the system libraries layer 230 can comprise open-source libraries such as Skia Graphics Library (SGL) (e.g., 2D graphics rendering), Open Graphics Library (OpenGL) or OpenGL for Embedded Systems (OpenGL ES) (e.g., 3D graphics rendering), OpenSSL (e.g., SSL communication), SQLite (e.g., SQL database management), FreeType (e.g., font rendering), WebKit (e.g., webpage rendering), and libc (e.g., System C libraries).
- the system libraries layer 230 can also include a hardware abstraction layer 220 comprising a set of interfaces that hardware drivers are required to implement. Each hardware interface may be loaded by the system at runtime on an as-needed basis.
- the hardware abstraction layer 220 can provide interfaces for hardware components of a computing device, such as the graphics card, audio card, cameras, GPS, radio frequency (RF) modem, WiFi antenna, among others.
- Located on the same level as the system libraries layer is the runtime layer 240, which can include core libraries and the virtual machine engine.
- the virtual machine engine may be based on Dalvik®.
- the virtual machine engine provides a multi-tasking execution environment that allows for multiple processes to execute concurrently.
- Each application running on the device is executed as an instance of a Dalvik® virtual machine.
- application code is translated from Java® class files (.class, .jar) to Dalvik® bytecode (.dex).
- the core libraries provide for interoperability between Java® and the Dalvik® virtual machine, and expose the core APIs for Java®, including data structures, utilities, file access, network access, graphics, among others.
- the application framework 250 comprises a set of services through which user applications interact. These services manage the basic functions of a computing device, such as resource management, voice call management, data sharing, among others.
- the Activity Manager controls the activity life cycle of user applications.
- the Package Manager enables user applications to determine information about other user applications currently installed on a device.
- the Window Manager is responsible for organizing contents of a display screen.
- the Resource Manager provides access to various types of resources utilized by user applications, such as strings and user interface layouts.
- Content Providers allow user applications to publish and share data with other user applications.
- the View System is an extensible set of views used to create user interfaces for user applications.
- the Notification Manager allows for user applications to display alerts and notifications to end users.
- the Telephony Manager manages voice calls.
- the Location Manager provides for location management, such as by GPS or cellular network.
- Other hardware managers in the application framework 250 include the Bluetooth Manager, WiFi Manager, USB Manager, Sensor Manager, among others (not shown here).
- Located at the top of the software stack 200 are user applications, such as the home screen application, email application, music player, web browser, among others.
- FIG. 3 illustrates an example of a system for detecting and managing various user inputs in an environment.
- the software stack 300 may comprise at least some similar elements to software architecture 200 of FIG. 2, including kernel 310, core libraries 320 including a hardware abstraction layer, application framework 350, and user application layer 360.
- Although software architecture 200 of FIG. 2 is used for purposes of explanation, different software stacks may be used, as appropriate, to implement various embodiments.
- a global user input management system can be implemented as a system service in the application framework layer 350 . Centralizing user input detection and recognition can have certain advantages over conventional approaches that perform user input detection and recognition on an ad-hoc application-by-application basis.
- Code for implementing user input detection and recognition can be shared, which may result in less processing by a computing device. Latency can be improved because there may be less competition for sensors and other hardware input components. Further, such an approach can facilitate concurrent interaction with multiple applications in a multi-tasking environment.
- User applications such as a home screen application, email application, music player, browser, among others, can interface with the User Input Manager service 352 , including registering/unregistering the input modalities supported by each user application, defining the rules by which each user application receives gestures or commands, and providing information about the state of each application.
- the User Input Manager 352 may interact with other components 354 within the application framework 350 , such as to determine state information for applications currently executing on a device. These other components 354 may include the Activity Manager, Package Manager, Window Manager, Resource Manager, View System, Notification Manager, Telephony Manager, Location Manager, among others.
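- As a rough illustration of the registration interface described above, the following Python sketch models a central service with which user applications register and unregister input modalities. The class name `UserInputManager`, its method names, and the modality strings are hypothetical; the description does not specify a programming interface.

```python
from collections import defaultdict


class UserInputManager:
    """Hypothetical central registry of input modalities per application."""

    def __init__(self):
        # modality name -> set of application names registered for it
        self._registrations = defaultdict(set)

    def register(self, app, modality):
        self._registrations[modality].add(app)

    def unregister(self, app, modality):
        self._registrations[modality].discard(app)

    def registered_apps(self, modality):
        # Sorted so callers see a deterministic order.
        return sorted(self._registrations[modality])

    def active_modalities(self):
        # Only modalities with at least one registrant need monitoring.
        return sorted(m for m, apps in self._registrations.items() if apps)


manager = UserInputManager()
manager.register("email", "touch")
manager.register("email", "motion")
manager.register("music", "touch")
manager.register("music", "voice")
manager.unregister("email", "motion")
```

- In this sketch, a modality with no remaining registrants drops out of `active_modalities()`, which is one way a central service could decide that a sensor no longer needs to be monitored.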
- the global user input management system can include an extensible set of recognizers for the various types of inputs or modalities supported by a computing device, such as an Audio Command Recognizer, Visual Gesture Recognizer, and Device Motion Recognizer.
- the system can be extended to include new types of recognizers for other sensors and input devices of a computing device. Further, each of the recognizers can be extended in various embodiments.
- the system includes a Voice Command Recognizer which extends from the Audio Command Recognizer and a Head Gesture Recognizer and a Hand Gesture Recognizer which each extend from the Visual Gesture Recognizer.
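- The recognizer hierarchy described above could be expressed with ordinary class inheritance. The following Python sketch is illustrative only: the class names mirror the recognizers named in the description, but the `modality` attribute and the `recognize` method are assumptions, and the voice recognizer is a placeholder that treats its input as already-transcribed text.

```python
class Recognizer:
    """Hypothetical base type for all recognizers."""

    modality = "generic"

    def recognize(self, data):
        raise NotImplementedError


class AudioCommandRecognizer(Recognizer):
    modality = "audio"


class VisualGestureRecognizer(Recognizer):
    modality = "visual"


class DeviceMotionRecognizer(Recognizer):
    modality = "motion"


class VoiceCommandRecognizer(AudioCommandRecognizer):
    def recognize(self, data):
        # Placeholder: normalize text that stands in for transcribed speech.
        return data.strip().lower()


class HeadGestureRecognizer(VisualGestureRecognizer):
    pass


class HandGestureRecognizer(VisualGestureRecognizer):
    pass
```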
- the recognizers interface with components of the hardware abstraction layer to detect and recognize user input.
- recognizers can fuse data from multiple sensors to more accurately detect and recognize user gestures and commands.
- the Voice Command Recognizer may enhance voice recognition by analyzing image data corresponding to a user's lip movement. Therefore, in addition to analyzing audio data captured by audio components, the Voice Command Recognizer may also analyze image data captured by a camera of a computing device.
- recognizers may also pre-process raw user input such as by translating speech to text or sampling a gesture spatially and rendering the gesture as a two-dimensional image.
- a gesture may correspond to touches, a finger waving in the air, or motion of a device.
- the gesturing object (i.e., a fingertip on a touchscreen, a finger in the air, or the device) can be pointillized and sampled in space such that the gesture forms a shape that can be represented as the 2-D image.
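- One possible way to render a sampled gesture as a two-dimensional image is to normalize the sampled points to their bounding box and mark the grid cells they fall in. The following Python sketch assumes the gesturing object has already been reduced to a list of (x, y) samples; the function name `rasterize_gesture` and the grid size are hypothetical.

```python
def rasterize_gesture(points, size=8):
    """Render sampled (x, y) points as a size x size binary image.

    Coordinates are normalized to the gesture's bounding box, so the
    result does not depend on where or how large the gesture was made.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    # Guard against zero extent (e.g. a perfectly horizontal line).
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0

    image = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(int((x - min_x) / span_x * size), size - 1)
        row = min(int((y - min_y) / span_y * size), size - 1)
        image[row][col] = 1
    return image


# A diagonal swipe sampled at five points.
swipe = [(10, 10), (20, 20), (30, 30), (40, 40), (50, 50)]
image = rasterize_gesture(swipe, size=5)
```

- Normalizing each axis independently discards the aspect ratio of the gesture; a recognizer that needs to tell a tall stroke from a wide one would have to scale both axes by the same factor instead.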
- the recognizers may utilize a “library” or “dictionary” that maps data corresponding to user input, whether raw or pre-processed, to a higher level command.
- a media playing application may incorporate a visual gesture interface wherein particular gestures may be mapped to higher level commands such as skipping to a previous track or stopping play of a current track.
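- A gesture "library" or "dictionary" of the kind described above can be pictured as a lookup table from recognized gesture labels to higher level commands. In the following Python sketch the gesture labels and command names are invented for illustration and do not come from the description.

```python
# Hypothetical gesture dictionary for a media playing application.
GESTURE_LIBRARY = {
    "swipe_left": "PREVIOUS_TRACK",
    "swipe_right": "NEXT_TRACK",
    "open_palm": "STOP",
}


def to_command(gesture_label, library=GESTURE_LIBRARY):
    """Map a recognized gesture label to a higher level command.

    Returns None when the gesture has no entry in the library.
    """
    return library.get(gesture_label)
```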
- FIG. 4 illustrates an example approach 400 for detecting and managing various user inputs in accordance with an embodiment.
- a multi-window multi-tasking environment can be seen.
- email application 410 and music player 430 can be seen overlaying a home screen application.
- a user has interacted with user interface element 420 of the email application to cause display of an input modality interface 412 indicating the types of inputs or modalities supported by the email application: touch gestures as represented by touch icon 414, voice commands as represented by voice icon 416, and device motion as represented by motion icon 418.
- touch icon 414 and motion icon 418 are underlined to indicate that the email application has registered with a global input management service for these types of user input while voice icon 416 is not underlined to indicate that the email application has not been registered with the global input management service for voice commands.
- whether a user application registers a particular input modality supported by the application can be based on the state of the application and other executing applications, propagation rules, user preferences, or some combination thereof.
- the user application can issue a propagation rule that declares that a particular input modality should be supported when the application has focus and/or that the input modality can be deactivated when the application does not have focus.
- music player 430 similarly exposes an input modality interface indicating the types of user input supported by the music player.
- the input modalities capable of being recognized by the music player include touch gestures as represented by touch icon 432 , voice commands as indicated by voice icon 434 , device motions as indicated by motion icon 436 , and visual gestures as indicated by visual icon 438 .
- the music player has registered with the global user input management service to receive touch gestures, voice commands, and visual gestures but not device motions.
- user applications can be capable of supporting other input modalities in various embodiments. For instance, in other embodiments, gestures and commands supported by user applications can be broader.
- user applications are not necessarily limited to voice commands and may be capable of responding to auditory commands generally, such as whistles, hand claps, tongue clicks, among others.
- Input modalities supported by user applications may also be more granular in other embodiments.
- visual gestures may be further categorized according to specific user features, such as the user's head, face, eyes, mouth, hand, finger(s), arms, legs, among others.
- Provision of an input modality interface can be advantageous for users.
- a user may select or unselect certain modes of input for each user application to customize how she interacts with the device. For example, a user may have elected for voice commands to bypass email application 410 and/or selected voice commands to be received by music player 430 in order to concurrently interact with both applications. The user could maximize the graphical user interface corresponding to the email application on the touchscreen yet continue to interact with the music player via voice command. In addition, these user settings can be automatically saved for future use.
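- The per-application selections described above, including saving them for future use, could be stored as a small nested mapping. The following Python sketch is one possible arrangement; the function names, the application and modality strings, and the use of JSON for persistence are all assumptions.

```python
import json


def set_modality(settings, app, modality, enabled):
    """Record whether `app` should receive input of type `modality`."""
    settings.setdefault(app, {})[modality] = enabled


def accepts(settings, app, modality):
    """Unconfigured modalities default to disabled in this sketch."""
    return settings.get(app, {}).get(modality, False)


settings = {}
set_modality(settings, "email", "touch", True)
set_modality(settings, "email", "voice", False)  # voice bypasses email
set_modality(settings, "music_player", "voice", True)

# Serialize so the choices can be restored in a later session.
saved = json.dumps(settings, sort_keys=True)
restored = json.loads(saved)
```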
- FIG. 5 illustrates an example approach 500 for configuring a system for detecting and managing various user inputs in accordance with an embodiment.
- a user application 510 enabling a user to modify input modalities is depicted.
- User interface element 512 is provided to enable the user to modify other input modalities by swiping to a new page or screen of application 510 .
- the user applications listed in the first screen of application 510 are dynamically generated based on the user applications currently executing on the device.
- every user application can be listed to provide the user more control over how she may interact with each application.
- user interface elements 514 and 516 indicate that voice commands have been disabled for a home screen application and an email application, respectively. Voice commands are enabled for the music player, and an example of a propagation rule 518 is provided as another selection for the user.
- propagation rules can be used by a global user input management system to determine how to distribute user inputs that have been received and recognized by the system.
- Propagation rules can be defined by the device platform, user applications, or the user in various embodiments.
- An example of a propagation rule is to broadcast a type of user input to any executing application that has registered for that type of input.
- a propagation rule can forward a user input to the last active user application supporting the type of the user input.
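- The two example rules above, broadcasting to every registered application and forwarding to the last active application, can be contrasted in a few lines. In the following Python sketch the `App` record, its field names, and the timestamps are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    modalities: frozenset
    last_active: float  # time of the most recent direct interaction


def broadcast(apps, modality):
    """Deliver to every executing app registered for the modality."""
    return [a.name for a in apps if modality in a.modalities]


def forward_to_last_active(apps, modality):
    """Deliver only to the most recently active app supporting the modality."""
    candidates = [a for a in apps if modality in a.modalities]
    if not candidates:
        return []
    return [max(candidates, key=lambda a: a.last_active).name]


apps = [
    App("home", frozenset({"touch"}), last_active=100.0),
    App("email", frozenset({"touch", "motion"}), last_active=300.0),
    App("music", frozenset({"touch", "voice", "visual"}), last_active=200.0),
]
```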
- Some rules, such as rule 518, may require certain content to be included in the user input or a certain format for the user input in order to be propagated to a user application.
- Content can include keywords, image data, gestures, a change in sensor data meeting certain thresholds, among others.
- a keyword could be a name of the application or a voice command that pertains to the application.
- a user application that is only interested in facial movement may require that the image data includes at least one instance of a person's face.
- certain gestures can act as a cue or indicator that the user intends for input to be directed to a specific application.
- a specified format for a propagation rule can be defined using a template, such as a phrase pattern for a voice command or a gesture pattern for a touch gesture or visual gesture.
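- A phrase pattern for a voice command could be written as a regular expression over the transcribed text. The following Python sketch assumes a hypothetical template of the form "application name, then command" and two invented application names; the description does not define a concrete pattern syntax.

```python
import re

# Hypothetical phrase pattern: "<application name>[,] <command>".
PHRASE_PATTERN = re.compile(
    r"^(?P<app>music player|email)[,:]?\s+(?P<command>.+)$",
    re.IGNORECASE,
)


def match_phrase(utterance):
    """Return (app, command) if the utterance fits the template, else None."""
    m = PHRASE_PATTERN.match(utterance.strip())
    if m is None:
        return None
    return m.group("app").lower(), m.group("command").lower()
```

- An utterance that omits the application name does not match, so under a rule like this it would not be propagated to either application.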
- Propagation rules can also be based on threshold lengths of time (minimum and/or maximum). Certain propagation rules can depend on the state of an executing application, such as bypassing a user application when the application is in a paused or suspended state.
- propagation rules may be based on the detected command or gesture being within threshold confidence levels. Propagation rules can also be based on a priority of each executing application as determined by a category of the application (e.g., business, finance, games), a time the user last directly interacted with the application, the percentage of a display screen corresponding to the application, the frequency of usage of the application, among others.
- a propagation rule may dictate that a certain command or gesture or a type of command or gesture is “monolithic” and is to be propagated to every executing application.
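- The confidence-based and state-based rules described above can be combined into a single filter. The following Python sketch uses an assumed minimum confidence of 0.8 and invented application records; the description does not give numeric thresholds.

```python
INACTIVE_STATES = {"paused", "suspended"}


def eligible_targets(apps, modality, confidence, min_confidence=0.8):
    """Return apps that may receive the input under two example rules.

    Rule 1: drop the input if recognition confidence is below the threshold.
    Rule 2: bypass apps that are in a paused or suspended state.
    """
    if confidence < min_confidence:
        return []
    return [
        name
        for name, info in apps.items()
        if modality in info["modalities"] and info["state"] not in INACTIVE_STATES
    ]


apps = {
    "email": {"modalities": {"touch", "motion"}, "state": "running"},
    "music_player": {"modalities": {"touch", "voice"}, "state": "running"},
    "video": {"modalities": {"voice"}, "state": "paused"},
}
```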
- FIG. 6 illustrates an example process 600 for detecting and managing various user gestures or commands in accordance with an embodiment. It should be understood that, for any process discussed herein, there can be additional, fewer, or alternative steps performed in similar or alternative orders, or in parallel, within the scope of the various embodiments unless otherwise stated.
- the process begins with concurrent execution of at least a first user application and a second user application 602 on a computing device.
- the user applications may each include their own respective graphical user interfaces, which can be displayed simultaneously on a screen of the computing device.
- one user application may be operating in the foreground, and another user application may be concurrently executing in the background.
- the device may determine one or more input modalities or types of user input supported by the application 604 .
- an application may accept auditory commands (e.g., voice commands, whistles, hand claps, finger snaps, or other sounds); device motions (e.g., rotations, translations, and other device gestures); and/or visual gestures (e.g., facial expressions or movements, hand or finger gestures, other user feature gestures).
- the application may register the input modalities or types of user input supported by the application.
- the system may activate the appropriate software and hardware for detecting the user input corresponding to the modalities supported by the user application 606 .
- a microphone can be activated
- certain input modalities may only be available when an application has focus or is directly being interacted with by the user.
- two user applications may be concurrently executing on a device and a first application supports a touch interface and the second application does not support a touch interface.
- touch-related software and/or hardware may be activated to monitor touch interactions.
- When the second application has focus (or the first application is sent to the background), the touch software and/or hardware may be deactivated.
- a user application can declare, via a propagation rule, whether a certain input modality should be available when the application has focus, such as via touch, and/or whether an input modality should always be available even when the application is running in the background, such as via audio command or visual gesture.
- the global user input management system can monitor those conditions and deactivate software and/or hardware when those conditions are not met.
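- The focus-dependent activation described above can be sketched as a function that computes which modalities need active software and hardware at a given moment. In the following Python sketch the policy strings "focus" and "always" and the application names are hypothetical stand-ins for whatever an application declares via a propagation rule.

```python
def required_modalities(apps, focused):
    """Return the set of modalities whose hardware should be active.

    Each app maps a modality to a policy: "focus" means the modality is
    needed only while the app has focus; "always" means it is needed
    even while the app runs in the background.
    """
    needed = set()
    for name, policies in apps.items():
        for modality, policy in policies.items():
            if policy == "always" or (policy == "focus" and name == focused):
                needed.add(modality)
    return needed


apps = {
    "word_processor": {"touch": "focus"},
    "music_player": {"voice": "always", "visual": "always"},
}
```

- Any modality missing from the returned set is a candidate for deactivation, which matches the touch example above: once the word processor loses focus, touch is no longer required.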
- the device may monitor for user input corresponding to the modalities supported by each executing user application (and when certain conditions are met) by capturing input data using a sensor or other input device corresponding to the supported modalities 608 .
- the input data must be capable of being responded to meaningfully by the user application.
- a user application that does not recognize voice commands can hypothetically have voice data forwarded to the application.
- Such a user application may simply discard the voice data as it would be unintelligible by the user application.
- Such a response however is not a meaningful response as used herein.
- two user applications may be capable of recognizing touch gestures as a general matter. However, a touch outside of a window corresponding to a user application in a multi-window environment or a touch while a user application is in the background would not be meaningfully responded to by that user application.
- user applications may be multi-modal and one of the types of input supported by such applications may be de-selected.
- a user may be operating a word processor and a music player concurrently.
- the word processor and the music player may each include a touch-based interface as well as support voice commands.
- the user may wish to operate the word processor using the touch-based interface of the word processor and the music player using the voice-based interface of the music player.
- the user may configure the word processor to bypass voice commands.
- the user may interact with the word processor via the touch-based interface without having to switch between the graphical user interface of the word processor and the graphical user interface of the music player.
- the user can maximize the graphical user interface of the word processor while still being able to control the music player via voice command.
- the settings of the types of input corresponding to the types of user input supported by a user application can be configured by the user, and determination of the state of the user application can include identification of such settings.
- the device may determine at least one of the user applications for receiving data corresponding to the user input 610 .
- user input data can be pre-processed by the device and forwarded to a suitable user application.
- audio data captured by a microphone of a device can be pre-processed by converting the audio data from an analog format to a digital format, converting digital voice data and/or mapping a voice command encapsulated in the audio data to a higher level command to the device.
- visual gestures can be pre-processed by pointillizing an object to be tracked for gesture recognition, sampling the tracked point/object in space, converting the sampled data to a 2-D image, and mapping the image to a higher-level command from a gesture dictionary or library.
- pre-processing can include classifying or identifying the user input and correlating the user input to a higher level command.
- the raw sensor data e.g., voice data, image data, motion data
- an intermediate form of the user input can be forwarded to user applications, such as text corresponding to voice data or motion data corresponding to visual gestures.
- determination of the user application for receiving data corresponding to the user input can be based at least in part on a set of propagation rules.
- one propagation rule may be based on ranking or prioritizing each executing user application for receiving user input. The ranking or sorting of user applications may be based on a category of each user application, the last time the user directly interacted with each user application, the frequency of usage of each application, or the percentage of a display screen taken up by each application, among others.
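- A ranking of this kind could be computed as a weighted score over the listed factors. The following Python sketch is only one possible scheme: the weights, the normalization of each factor to the range 0 to 1, and the omission of application category are assumptions made for brevity.

```python
# Hypothetical weights; each factor is assumed to be normalized to [0, 1].
DEFAULT_WEIGHTS = {"recency": 0.5, "screen_share": 0.3, "usage": 0.2}


def rank_apps(apps, weights=DEFAULT_WEIGHTS):
    """Return app names ordered from highest to lowest priority."""

    def score(app):
        return sum(w * app[factor] for factor, w in weights.items())

    return [app["name"] for app in sorted(apps, key=score, reverse=True)]


apps = [
    {"name": "home", "recency": 0.1, "screen_share": 0.1, "usage": 0.5},
    {"name": "email", "recency": 1.0, "screen_share": 0.7, "usage": 0.4},
    {"name": "music_player", "recency": 0.6, "screen_share": 0.2, "usage": 0.9},
]
ranking = rank_apps(apps)
```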
- Another propagation rule may be based on the content of the user input, such as the user input including a cue or indicator or conforming to a specified format. Propagation rules can also direct the user input to be broadcast to multiple user applications.
- the device can propagate the data to the selected user application(s) 612 and the user application may perform an action in response to receiving the data corresponding to the user input.
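- The steps of process 600 can be tied together in a small dispatcher. The following Python sketch is a simplified model, not the described implementation: the class `InputDispatcher`, the rule functions, and the callback-based delivery are all hypothetical, and recognition is assumed to have already produced a text command.

```python
class InputDispatcher:
    """Hypothetical central dispatcher mirroring steps 604 to 612."""

    def __init__(self):
        self._handlers = {}  # app name -> {modality: callback}

    def register(self, app, modality, callback):
        # Step 604: an application declares a modality it supports.
        self._handlers.setdefault(app, {})[modality] = callback

    def monitored_modalities(self):
        # Steps 606/608: only registered modalities need active sensors.
        return {m for handlers in self._handlers.values() for m in handlers}

    def dispatch(self, modality, command, rule):
        # Step 610: the propagation rule picks targets among supporting apps.
        supporting = [a for a, h in self._handlers.items() if modality in h]
        targets = rule(supporting, command)
        # Step 612: propagate the command; each app performs its own action.
        return {a: self._handlers[a][modality](command) for a in targets}


def broadcast_rule(apps, command):
    return list(apps)


def keyword_rule(apps, command):
    # Content-based rule: the command must name the target application.
    return [a for a in apps if a in command]


dispatcher = InputDispatcher()
dispatcher.register("email", "voice", lambda c: "email: " + c)
dispatcher.register("music", "voice", lambda c: "music: " + c)
dispatcher.register("email", "touch", lambda c: "email: " + c)

result = dispatcher.dispatch("voice", "music next track", keyword_rule)
```

- Passing the rule as a function keeps target selection separate from delivery, so a platform, an application, or the user could each supply a different rule without changing the dispatcher.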
- FIG. 7 illustrates an example computing device 700 that can be used to perform approaches described in accordance with various embodiments.
- the computing device includes a camera 706 located at the top of a front face of the device and on the same surface as the display element 708 , and enabling the device to capture images in accordance with various embodiments, such as images of a user viewing the display element and/or operating the device.
- the computing device includes audio input element 710 , such as a microphone, to receive audio input from a user.
- the computing device also includes an inertial measurement unit (IMU) 712 , comprising a three-axis gyroscope, three-axis accelerometer, and magnetometer, that can be used to detect the motion of the device, from which position and/or orientation information can be derived.
- FIG. 8 illustrates a logical arrangement of a set of general components of an example computing device 800 such as the device 700 described with respect to FIG. 7 .
- the computing device includes a processor 802 for executing instructions that can be stored in a memory element 804 .
- the computing device can include many types of memory, data storage, or non-transitory computer-readable storage media, such as a first data storage for program instructions for execution by the processor 802 , a separate storage for images or data, a removable memory for sharing information with other computing devices, etc.
- the computing device typically will include some type of display element 808 , such as a touchscreen, electronic ink (e-ink), organic light emitting diode (OLED), liquid crystal display (LCD), etc., although computing devices such as portable media players might convey information via other means, such as through audio speakers.
- the display screen provides for touch or swipe-based input using, for example, capacitive or resistive touch technology.
- the computing device in many embodiments will include one or more cameras or image sensors 806 for capturing image or video content.
- a camera can include, or be based at least in part upon, any appropriate technology, such as a CCD or CMOS image sensor having sufficient resolution, focal range, and viewable area to capture an image of the user when the user is operating the device.
- An image sensor can include a camera or infrared sensor that is able to image projected images or other objects in the vicinity of the computing device.
- Methods for capturing images or video using a camera with a computing device are well known in the art and will not be discussed herein in detail. It should be understood that image capture can be performed using a single image, multiple images, periodic imaging, continuous image capturing, image streaming, etc.
- a computing device can include the ability to start and/or stop image capture, such as when receiving a command from a user, application, or other computing device.
- the example computing device can similarly include at least one audio component, such as a mono or stereo microphone or microphone array, operable to capture audio information from at least one primary direction.
- a microphone can be a uni- or omni-directional microphone as known for such components.
- the computing device 800 includes at least one capacitive component or other proximity sensor, which can be part of, or separate from, the display assembly.
- the proximity sensor can take the form of a capacitive touch sensor capable of detecting the proximity of a finger or other such object as discussed herein.
- the computing device also includes various power components 814 known in the art for providing power to a computing device, which can include capacitive charging elements for use with a power pad or similar component.
- the computing device can include one or more communication elements or networking sub-systems 816 , such as a Wi-Fi, Bluetooth, RF, wired, or wireless communication system.
- the computing device in many embodiments can communicate with a network, such as the Internet, and may be able to communicate with other computing devices.
- the computing device can include at least one additional input component 818 able to receive conventional input from a user.
- This conventional input component can include, for example, a push button, touch pad, touchscreen, wheel, joystick, keyboard, mouse, keypad, or any other such component or element whereby a user can input a command to the computing device.
- a computing device might not include any buttons at all, and might be controlled only through a combination of visual and audio commands, such that a user can control the device without having to be in contact with the computing device.
- the computing device 800 also can include one or more orientation and/or motion determination sensors 812 .
- Such sensor(s) can include an accelerometer or gyroscope operable to detect an orientation and/or change in orientation, or an electronic or digital compass, which can indicate a direction in which the device is determined to be facing.
- the mechanism(s) also (or alternatively) can include or comprise a global positioning system (GPS) or similar positioning element operable to determine relative coordinates for a position of the computing device, as well as information about relatively large movements of the computing device.
- the computing device can include other elements as well, such as may enable location determinations through triangulation or another such approach. These mechanisms can communicate with the processor 802 , whereby the computing device can perform any of a number of actions described or suggested herein.
- the computing device 800 can include the ability to activate and/or deactivate detection and/or command modes, such as when receiving a command from a user or an application, or retrying to determine an audio input or video input, etc.
- a computing device might not attempt to detect or communicate with other computing devices when there is not a user in the room. If a proximity sensor of the computing device, such as an IR sensor, detects a user entering the room, for instance, the computing device can activate a detection or control mode such that the device can be ready when needed by the user, but conserve power and resources when a user is not nearby.
- the computing device 800 may include a light-detecting element that is able to determine whether the computing device is exposed to ambient light or is in relative or complete darkness.
- a light-detecting element can be beneficial in a number of ways.
- the light-detecting element can be used to determine when a user is holding the device up to the user's face (causing the light-detecting element to be substantially shielded from the ambient light), which can trigger an action such as temporarily shutting off the display element (since the user cannot see the display element while holding the device to the user's ear).
- the light-detecting element could be used in conjunction with information from other elements to adjust the functionality of the computing device.
- the computing device might determine that it has likely been set down by the user and might turn off the display element and disable certain functionality. If the computing device is unable to detect a user's view location, a user is not holding the computing device and the computing device is further not exposed to ambient light, the computing device might determine that the computing device has been placed in a bag or other compartment that is likely inaccessible to the user and thus might turn off or disable additional features that might otherwise have been available.
- a user must either be looking at the computing device, holding the computing device or have the computing device out in the light in order to activate certain functionality of the computing device.
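- The combination of gaze, grip, and ambient light signals described above can be read as a small decision function. The following Python sketch is an interpretation under stated assumptions: the state names are invented, and the condition for the "set down" case (no gaze, not held, but exposed to light) is inferred from the contrasting bag example rather than stated outright in the text.

```python
def power_state(gaze_detected, held, in_light):
    """Classify the device situation from three boolean signals.

    Returns one of three hypothetical states:
      "active"   - the user is looking at or holding the device
      "set_down" - no user contact but ambient light is present
                   (display off, certain functionality disabled)
      "stowed"   - no user contact and no light, e.g. inside a bag
                   (additional features disabled)
    """
    if gaze_detected or held:
        return "active"
    if in_light:
        return "set_down"
    return "stowed"
```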
- the computing device may include a display element that can operate in different modes, such as reflective (for bright situations) and emissive (for dark situations). Based on the detected light, the computing device may change modes.
- the computing device 800 can disable features for reasons substantially unrelated to power savings.
- the computing device can use voice recognition to determine people near the computing device, such as children, and can disable or enable features, such as Internet access or parental controls, based thereon.
- the computing device can analyze recorded noise to attempt to determine an environment, such as whether the computing device is in a car or on a plane, and that determination can help to decide which features to enable/disable or which actions are taken based upon other inputs. If speech or voice recognition is used, words can be used as input, either directly spoken to the computing device or indirectly as picked up through conversation.
- the computing device determines that it is in a car, facing the user and detects a word such as “hungry” or “eat,” then the computing device might turn on the display element and display information for nearby restaurants, etc.
- a user can have the option of turning off voice recording and conversation monitoring for privacy and other such purposes.
- the actions taken by the computing device relate to deactivating certain functionality for purposes of reducing power consumption. It should be understood, however, that actions can correspond to other functions that can adjust similar and other potential issues with use of the computing device. For example, certain functions, such as requesting Web page content, searching for content on a hard drive and opening various applications, can take a certain amount of time to complete. For computing devices with limited resources, or that have heavy usage, a number of such operations occurring at the same time can cause the computing device to slow down or even lock up, which can lead to inefficiencies, degrade the user experience and potentially use more power. In order to address at least some of these and other such issues, approaches in accordance with various embodiments can also utilize information such as user gaze direction to activate resources that are likely to be used in order to spread out the need for processing capacity, memory space and other such resources.
- the computing device can have sufficient processing capability, and the camera and associated image analysis algorithm(s) may be sensitive enough to distinguish between the motion of the computing device, motion of a user's head, motion of the user's eyes and other such motions, based on the captured images alone.
- the one or more orientation and/or motion sensors may comprise a single- or multi-axis accelerometer that is able to detect factors such as three-dimensional position of the device and the magnitude and direction of movement of the device, as well as vibration, shock, etc.
- the computing device can use the background in the images to determine movement. For example, if a user holds the computing device at a fixed orientation (e.g. distance, angle, etc.) to the user and the user changes orientation to the surrounding environment, analyzing an image of the user alone will not result in detecting a change in an orientation of the computing device. Rather, in some embodiments, the computing device can still detect movement of the device by recognizing the changes in the background imagery behind the user.
- the computing device can determine that the computing device has changed orientation, even though the orientation of the computing device with respect to the user has not changed.
- the computing device may detect that the user has moved with respect to the device and adjust accordingly. For example, if the user tilts their head to the left or right with respect to the computing device, the content rendered on the display element may likewise tilt to keep the content in orientation with the user.
- the various embodiments can be further implemented in a wide variety of operating environments, which in some cases can include one or more user computers or computing devices which can be used to operate any of a number of applications.
- User or client devices can include any of a number of general purpose personal computers, such as desktop or laptop computers running a standard operating system, as well as cellular, wireless and handheld devices running mobile software and capable of supporting a number of networking and messaging protocols.
- Such a system can also include a number of workstations running any of a variety of commercially-available operating systems and other known applications for purposes such as development and database management.
- These computing devices can also include other electronic devices, such as dummy terminals, thin-clients, gaming systems and other devices capable of communicating via a network.
- the operating environments can include a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across the network. In a particular set of embodiments, the information may reside in a storage-area network (SAN) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers, servers or other network component may be stored locally and/or remotely, as appropriate.
- each such device can include hardware elements that may be electrically coupled via a bus, the elements including, for example, at least one central processing unit (CPU), at least one input element (e.g., a mouse, keyboard, controller, touch-sensitive display element or keypad) and at least one output element (e.g., a display screen, printer, or speaker).
- Such a system may also include one or more storage components, such as disk drives, optical storage components and solid-state storage systems such as random access memory (RAM) or read-only memory (ROM), as well as removable media components, memory cards, flash cards, etc.
- Such computing devices can also include a computer-readable storage media reader, a communications component (e.g., a modem, a network card (wireless or wired), an infrared communication element), and working memory, as described above.
- the computer-readable storage media reader can be connected with, or configured to receive, a computer-readable storage medium representing remote, local, fixed and/or removable storage components as well as storage media for temporarily and/or more permanently containing, storing, transmitting and retrieving computer-readable information.
- the system and various devices also typically will include a number of software applications, modules, services or other elements located within at least one working memory component, including an operating system and application programs such as a client application or Web browser. It should be appreciated that alternate embodiments may have numerous variations from that described above. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software (including portable software, such as applets) or both. Further, connection to other computing devices such as network input/output devices may be employed.
- Storage media and computer readable media for containing code, or portions of code can include any appropriate media known or used in the art, including storage media and communication media, such as but not limited to volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information such as computer readable instructions, data structures, program modules or other data, including RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage components or any other medium which can be used to store the desired information and which can be accessed by a system.
Abstract
Systems and approaches enable concurrent interaction with multiple user applications in a multi-tasking environment. User input, such as voice commands, head movement, hand or finger gestures, or device motion, can be received by a centralized component of a system. State information for each user application can be determined, and the centralized component can send a recognized command or gesture to the appropriate user application(s) based on the state information and/or rules for propagating user input. Additionally, users can configure the input modalities of each user application to customize interaction with systems.
Description
As personal electronic devices, such as laptop computers, tablets, smartphones, or portable media players, become increasingly sophisticated, people are able to interact with such devices in new and interesting ways. For example, some personal electronic devices are capable of detecting touches and other touch-based gestures, such as by capacitive touch sensors incorporated in a touchscreen. The tap of a virtual key of a soft keyboard displayed on the touchscreen may correspond to entry of the key into a device. A swipe of the touchscreen may navigate a user to a different portion of a graphical user interface presented on the touchscreen. Other devices can detect device motion via inertial sensors, such as accelerometers, gyroscopes, magnetometers, and/or inclinometers, and perform actions based on the detected motion. For instance, a device can detect a rotation of the device of approximately ninety degrees, interpret such motion as an intent of the user to change the orientation of content being displayed on the device from portrait mode to landscape mode (or vice versa), and re-display the content according to the changed orientation of the device. As electronic devices become more powerful and capable of sensing more of the world around them, new approaches can be developed for users to interact with such devices.
Various embodiments in accordance with the present disclosure will be described with reference to the drawings, in which:
In certain situations, users may desire to interact concurrently with multiple applications in a multi-tasking environment. Conventional systems and approaches may support multi-tasking, wherein a device can provide for concurrent execution of multiple user applications. However, conventional devices and techniques may be limited to direct interaction with a single application at a time. For example, a user may be operating a first user application, such as a web browser or an email application, while a music player application is concurrently executing. At a particular point in time, the user may wish to replay a song or skip a song playing on the music player. In conventional systems and approaches, the user may be required to halt interaction with the first user application, select the music player as the active or foreground application, direct the music player to replay the song or skip the song, and re-select the first user application to continue interacting with the first user application. As another example, the user may be interacting with a first user application while a second user application is concurrently running in the background. The user may change the orientation of a first graphical user interface corresponding to the first user application, such as by tilting the device to a new orientation. The user may then switch to operation of the second user application. In conventional devices and approaches, a second graphical user interface corresponding to the second user application may not immediately reflect the new orientation of the device. Instead, the user may have to re-tilt the device and/or there may be a delay associated with re-determining the new orientation of the device and re-displaying the second graphical interface to comport with the new orientation.
Systems and methods in accordance with various embodiments of the present disclosure may overcome one or more of the aforementioned and other deficiencies experienced in conventional approaches for managing user gestures and commands in a multi-tasking environment. In particular, various embodiments enable concurrent interaction with multiple applications in a multi-tasking environment via a global user input detection and management system. A device operating according to various embodiments can be configured to recognize an assortment of gestures and commands, such as touch-based gestures (e.g., taps, swipes, or other pointer gestures), auditory commands (e.g., voice commands, whistles, finger snaps), device motions and/or orientations (e.g., rotations or translations of the device, device gestures), visual gestures (e.g., hand gestures, facial movements, body movements), among others. User input recognition can be centralized instead of being performed on an ad-hoc, application-by-application basis. In this manner, gestures and commands may be better managed. For example, after a particular user input has been received and recognized, the device can determine a state of each user application currently executing on the device, including the types of input each user application supports.
A type of user input is a category of commands or gestures supported by an application, such as audio commands, touch gestures, device gestures, or visual gestures. A type of input can correspond to one or more sensors or input devices. For example, audio or voice commands may be associated with a microphone, touch gestures may be associated with one or more touch sensors, device gestures may be associated with accelerometers, gyroscopes, and/or magnetometers, and visual gestures may be associated with one or more cameras or other optical input devices. It will be appreciated that certain types of user inputs may correspond to sensors or other input devices that are also associated with other types of user inputs. For instance, in certain embodiments, voice commands may be based on audio data captured by a microphone and image data of a user's lip movement captured by one or more cameras, which can be used to enhance voice recognition. Other sensors and input devices whose data can be influenced by a user or whose data can provide additional context for command/gesture recognition can also be used in various embodiments, such as thermal sensors (e.g., the user placing a device closer or further away from the user's body), location determination components (e.g., GPS, cellular network system, radio frequency (RF) antenna, NFC antenna, Bluetooth®, altimeter), ambient light sensors (e.g., influencing cameras and optical sensors), among others.
In various embodiments, a computing device can be configured to intelligently distribute user input received to the device to an appropriate application. The device may process a set of rules for propagating user input and select at least one of the user applications for receiving the recognized gesture or command based on the state of each user application and the propagation rules. In one embodiment, a user may be concurrently operating multiple applications on a computing device, with a first user application running in the foreground and a second user application running in the background. The user may change the orientation of content being displayed by the first user application by tilting the device. The new orientation of the device can be propagated to each user application configured to receive and recognize such user input. Instead of each user application having to re-execute code (separate or shared) to ascertain the orientation of the device, determination of the orientation of the device can occur once and be distributed to interested applications. This may reduce processing by the computing device and increase battery life. Further, there may be less latency associated with the change in the orientation of the second graphical user interface such that the device may be more responsive than conventional systems and techniques.
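By way of illustration only, the single determination and distribution of device orientation described above might be sketched as follows. The class `OrientationBroadcaster`, its method names, and the 45-degree threshold are hypothetical and are not taken from the disclosure; the sketch assumes each interested application registers a callback.

```python
from typing import Callable, Dict


class OrientationBroadcaster:
    """Hypothetical central component: derives orientation once, then fans it out."""

    def __init__(self) -> None:
        self._listeners: Dict[str, Callable[[str], None]] = {}
        self.computations = 0  # how many times orientation was derived from sensor data

    def register(self, app_name: str, callback: Callable[[str], None]) -> None:
        self._listeners[app_name] = callback

    def unregister(self, app_name: str) -> None:
        self._listeners.pop(app_name, None)

    def on_rotation(self, roll_degrees: float) -> str:
        # The orientation is computed a single time per sensor event...
        self.computations += 1
        # Assumed threshold: within 45 degrees of a quarter turn counts as landscape.
        quarter_turn = 45 <= abs(roll_degrees) % 180 < 135
        orientation = "landscape" if quarter_turn else "portrait"
        # ...and the result is delivered to every registered application.
        for callback in self._listeners.values():
            callback(orientation)
        return orientation


received = {}
broadcaster = OrientationBroadcaster()
broadcaster.register("browser", lambda o: received.__setitem__("browser", o))
broadcaster.register("music_player", lambda o: received.__setitem__("music_player", o))
broadcaster.on_rotation(90.0)
```

In this sketch both the foreground and background applications receive the same result, while the orientation itself is derived only once.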
As another example, a user may be operating multiple applications in multiple windows, such as a video game in one window and an email application in a second window. The game may be a first-person perspective game wherein navigation is based on device motion (e.g., tilting the device forward, backward, right, or left causes the video game character to move forward, backward, right, or left, respectively). The email application may also include a motion-based interface. The user may interact with the email application by performing certain gestures with the device (e.g., tilting the device forward may cause an email to be opened, tilting the device to the right may result in selection of a next email, and tilting the device to the left may result in selection of a previous email). In an embodiment, a tilt of the device may be passed to the video game for consumption by the video game because propagation rules may prioritize the video game for receiving such user input. The video game may be paused, however, and the device motion may be distributed to the email application instead.
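The video game and email example above can be sketched, purely as an assumption-laden illustration, as a selection of the highest-priority application that supports device motion and is not paused. The `App` record, the numeric priorities, and `route_tilt` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class App:
    name: str
    priority: int              # assumed: larger value means higher priority
    supports_motion: bool = True
    paused: bool = False


def route_tilt(apps: List[App]) -> Optional[str]:
    """Return the name of the application that should consume a tilt, if any."""
    candidates = [a for a in apps if a.supports_motion and not a.paused]
    if not candidates:
        return None
    return max(candidates, key=lambda a: a.priority).name


game = App("video_game", priority=2)
email = App("email", priority=1)
first_target = route_tilt([game, email])   # the game is prioritized while it is running
game.paused = True
second_target = route_tilt([game, email])  # the tilt falls through to the email application
```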
An electronic device that implements a global approach for handling user input may also improve device power usage by exercising greater control over activation and deactivation of cameras, sensors, and other input devices. Thus, a user application may request that certain types of user input or input modalities be available in specific instances. For example, an application may indicate that certain types of user input or input modalities must be available when the application is running (e.g., the user has launched the application and the application is running but could be running in the background), when the application is visible on the screen, or when the application has focus (e.g., the application is displayed on the screen and has priority over other applications for receiving input). The device could maintain state information for each executing user application and activate/deactivate sensors and other input devices based on the execution state of an application (e.g., the application is running, displayed, or focused). It will be appreciated that in at least some embodiments, multiple applications can be running and displayed simultaneously. In at least some embodiments, a user application may have focus but may not necessarily be displayed at the top-most layer of a graphical user interface. For example, a first user application may retain focus even when a pop-up window overlays the first user application. In some embodiments, whether a particular application has focus may also depend on input modality. For instance, a first user application may have focus with respect to visual gestures and a second user application may have focus with respect to entry via a keyboard.
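One possible way to relate execution state to sensor activation is sketched below. It assumes, consistent with the parenthetical definitions above, that a focused application is also displayed and a displayed application is also running, so the three states can be ordered. `ExecState`, `active_sensors`, and the request tuples are hypothetical.

```python
from enum import IntEnum
from typing import Dict, Iterable, Set, Tuple


class ExecState(IntEnum):
    RUNNING = 1   # launched, possibly in the background
    VISIBLE = 2   # displayed on the screen
    FOCUSED = 3   # displayed and prioritized for receiving input


# Each request: (application, sensor, minimum state at which the sensor is needed).
Request = Tuple[str, str, ExecState]


def active_sensors(requests: Iterable[Request], states: Dict[str, ExecState]) -> Set[str]:
    """Return the sensors that should be powered given current application states."""
    active = set()
    for app, sensor, required in requests:
        state = states.get(app)  # None means the application is not executing
        if state is not None and state >= required:
            active.add(sensor)
    return active


requests = [
    ("music_player", "microphone", ExecState.RUNNING),
    ("gallery", "camera", ExecState.FOCUSED),
]
background_only = active_sensors(
    requests, {"music_player": ExecState.RUNNING, "gallery": ExecState.VISIBLE}
)
```

Sensors absent from the returned set are candidates for deactivation. Per-modality focus, as mentioned above, would require tracking a state per application and modality rather than a single state per application.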
As another example, a user application may have an interface that is based on visual gestures. The device may keep a camera turned on and continuously sample image data while the application is executing to monitor for a visual gesture from a user. This may quickly drain the battery of the device, especially if multiple applications are concurrently executing. A global user input management system could utilize a different approach that uses power more efficiently, such as sampling images at a lower resolution, sampling over longer periods of time until an initial user motion is detected, sampling only portions of images, among other techniques. Alternatively, or in addition, the device could monitor a remaining amount of battery life and implement a more power-efficient approach for recognizing user input when the battery life is low.
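A minimal sketch of such a power-aware sampling policy follows. The resolutions, intervals, and the 20% battery threshold are arbitrary illustrative values, and `SamplingPlan` and `choose_sampling_plan` are hypothetical names; the disclosure does not specify any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingPlan:
    width: int
    height: int
    interval_ms: int  # time between captured frames


def choose_sampling_plan(battery_fraction: float, motion_detected: bool) -> SamplingPlan:
    """Pick a camera sampling plan from battery level and whether motion was seen."""
    if not motion_detected:
        # Idle: low resolution and long intervals until an initial motion is detected.
        return SamplingPlan(160, 120, 500)
    if battery_fraction < 0.2:
        # Motion seen but battery is low: use a reduced-cost plan.
        return SamplingPlan(320, 240, 66)
    # Motion seen and battery is adequate: sample at full rate.
    return SamplingPlan(640, 480, 33)


idle_plan = choose_sampling_plan(battery_fraction=0.9, motion_detected=False)
```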
Various other functions and advantages are described and suggested below in accordance with the various embodiments.
In this example, the computing device includes at least one camera 106 located on the front of the device and on the same surface as the display screen to capture image data of subject matter facing the front of the device, such as the user 102 viewing the display screen. It should be understood that, while the components of the example device are shown to be on a “front” of the device, there can be similar or alternative components on the “top,” “side,” or “back” of the device as well (or instead). Further, directions such as “top,” “side,” and “back” are used for purposes of explanation and are not intended to require specific orientations unless otherwise stated. In some embodiments, a computing device may also include more than one camera on the front of the device and/or one or more cameras on the back (and/or sides) of the device capable of capturing image data facing the back surface (and/or top, bottom, or side surface) of the computing device. In this example, the camera 106 comprises a digital camera incorporating a CMOS image sensor. In other embodiments, a camera of a device can incorporate other types of image sensors (such as a charge-coupled device (CCD)) and/or can incorporate multiple cameras, including at least one wide-angle optical element, such as a fisheye lens, that enables the camera to capture images over a wide range of angles, such as 180 degrees or more. Further, each camera can comprise a digital still camera, configured to capture subsequent frames in rapid succession, or a video camera able to capture streaming video. In still other embodiments, a computing device can include other types of imaging elements, such as ambient light sensors, IR sensors, and other optical, light, imaging, or photon sensors.
In this example, although not visible from the exterior of the device, the computing device also includes one or more motion or orientation determination elements, such as accelerometers, gyroscopes, magnetometers, inclinometers, proximity sensors, distance sensors, depth sensors, range finders, ultrasonic transceivers, among others. In other embodiments, motion or orientation can be determined using image analysis techniques. In still other embodiments, a combination of approaches, such as one or more techniques based on inertial sensors and one or more image analysis techniques can be aggregated or fused to estimate motion of the device.
The computing device 100 also includes one or more microphones 110 or other audio capture components capable of capturing audio data, such as words spoken by the user 102 of the device. In this example, the microphone 110 is placed on the same side of the device 100 as the display screen 108, such that the microphone 110 will typically be better able to capture words spoken by a user of the device. In at least some embodiments, the microphone can be a directional microphone that captures sound information from substantially directly in front of the device, and picks up only a limited amount of sound from other directions, which can help to better capture words spoken by a primary user of the device. In other embodiments, a computing device may include multiple microphones to capture 3D audio. In at least some embodiments, a computing device can also include an audio output element, such as internal speakers or one or more ports to support peripheral audio output components, such as headphones or loudspeakers.
Approaches in accordance with various embodiments enable concurrent interaction with multiple applications in a multi-tasking environment. For example, a user may wish to interact with any one of applications 122, 126, and 128 by voice command, such as “Start up App A” for the home screen application, “Create a new email message” for the email application, or “Play the next song” for the music player. As another example, home screen application 122 may be configured to recognize the gaze of the user with respect to the device as input, such as for rendering the content of the home screen according to the user's gaze, and music player 128 may support hand or finger gestures. Shaking a thumb in front of the camera 106 in a leftward direction can cause the selection of a previous track of an album being played by the music player, shaking the thumb in a rightward direction can cause selection of the next track, shaking the thumb upward may cause the current track to be played, shaking the thumb downward may cause the music player to stop playing the current track, and shaking an open palm toward the front of the camera may cause the music player to pause the current track. In some embodiments, the device may be capable of concurrently recognizing head tracking gestures and hand gestures to enable the user to cause the contents of the home screen to be rendered according to a new direction of his gaze and perform thumb gestures to control music playback at substantially the same time. In other embodiments, the device can recognize a particular type of user input (e.g., one of facial movement or hand/finger gesture) and forward the user input to the appropriate user application for receiving the recognized user input. User input distribution may be based on propagation rules and/or a respective state of each user application, as discussed elsewhere herein.
It will be appreciated that other embodiments may recognize various other types of user gestures and commands as input for a computing device. In some embodiments, head or facial movements can be recognized as user input. Approaches for recognizing facial expressions or movements as input for a computing device are discussed in co-pending U.S. patent application Ser. No. 12/332,049, filed Dec. 8, 2010, entitled, “Movement Recognition as Input Mechanism,” which is incorporated by reference herein. Further, other facial features, such as a user's eyes, mouth, nose, or other facial features, can be analyzed over a set of images to determine whether changes in the user's facial features correspond to user input. For example, eye winks, patterns of eye winks, or other ocular motions can be recognized by a computing device to perform various actions. Approaches for detecting a user's eye movements as input for a computing device are discussed in co-pending U.S. patent application Ser. No. 13/791,265, filed Mar. 7, 2013, entitled, “User Eye Input to Display Content,” which is incorporated by reference herein. In addition, some embodiments can detect other bodily movements, such as motion of the arms, legs, and/or other parts of a user, as input for a computing device. Approaches for detecting bodily movements as user input for a computing device are discussed in co-pending U.S. patent application Ser. No. 13/914,306, filed Jun. 10, 2013, entitled, “Dynamic User Detection and Tracking,” which is incorporated by reference herein.
In some embodiments, a device may include one or more microphones for capturing audio data. The device may be capable of analyzing the received audio data to recognize auditory commands, such as voice commands, whistles, hand claps, finger snaps, among others. Approaches for recognizing auditory commands as user input are discussed in allowed U.S. patent application Ser. No. 12/879,981, filed Sep. 10, 2010, entitled, “Speech-Inclusive Device Interfaces,” which is incorporated by reference herein. In at least some embodiments, voice command recognition may be enhanced based on image analysis techniques performed on image data captured of the user's mouth or other user motion (e.g., nodding or shaking of the user's head). Such approaches are discussed in co-pending U.S. patent application Ser. No. 13/626,580, filed Sep. 25, 2012, entitled, “Gesture and Vocalization Recognition,” which is incorporated by reference herein.
As mentioned, in some embodiments, motion of a computing device can be recognized as user input. In at least some embodiments, motion of the device can be detected using one or more inertial sensors, such as accelerometers, gyroscopes, and/or magnetometers. In other embodiments, motion of the device can be estimated based on analyzing one or more objects captured over a sequence of images using image analysis techniques such as block-matching, optical flow, phase correlation, feature-based methods, among others. In still other embodiments, data from cameras, inertial sensors, and other input devices can be combined using sensor fusion techniques to estimate motion of the device. These various approaches are discussed in co-pending U.S. patent application Ser. No. 13/965,126, filed Aug. 12, 2013, entitled, “Robust User Detection and Tracking,” which is incorporated by reference herein.
The next layer in the software stack 200 is the system libraries layer 230 which can provide support for functionality such as windowing (e.g., Surface Manager), 2D and 3D graphics rendering, Secure Sockets Layer (SSL) communication, SQL database management, audio and video playback, font rendering, webpage rendering, System C libraries, among others. In an embodiment, system libraries layer 230 can comprise open source libraries such as Skia Graphics Library (SGL) (e.g., 2D graphics rendering), Open Graphics Library (OpenGL) or OpenGL for Embedded Systems (OpenGL ES) (e.g., 3D graphics rendering), OpenSSL (e.g., SSL communication), SQLite (e.g., SQL database management), FreeType (e.g., font rendering), WebKit (e.g., webpage rendering), and libc (e.g., System C libraries). In this example, the system libraries layer 230 can also include a hardware abstraction layer 220 comprising a set of interfaces that hardware drivers are required to implement. Each hardware interface may be loaded by the system at runtime on an as-needed basis. The hardware abstraction layer 220 can provide interfaces for hardware components of a computing device, such as the graphics card, audio card, cameras, GPS, radio frequency (RF) modem, WiFi antenna, among others.
Located on the same level as the system libraries layer is the runtime layer 240, which can include core libraries and the virtual machine engine. In an embodiment, the virtual machine engine may be based on Dalvik®. The virtual machine engine provides a multi-tasking execution environment that allows for multiple processes to execute concurrently. Each application running on the device is executed as an instance of a Dalvik® virtual machine. To execute within a Dalvik® virtual machine, application code is translated from Java® class files (.class, .jar) to Dalvik® bytecode (.dex). The core libraries provide for interoperability between Java® and the Dalvik® virtual machine, and expose the core APIs for Java®, including data structures, utilities, file access, network access, graphics, among others.
The application framework 250 comprises a set of services through which user applications interact. These services manage the basic functions of a computing device, such as resource management, voice call management, data sharing, among others. In particular, the Activity Manager controls the activity life cycle of user applications. The Package Manager enables user applications to determine information about other user applications currently installed on a device. The Window Manager is responsible for organizing contents of a display screen. The Resource Manager provides access to various types of resources utilized by user applications, such as strings and user interface layouts. Content Providers allow user applications to publish and share data with other user applications. The View System is an extensible set of views used to create user interfaces for user applications. The Notification Manager allows for user applications to display alerts and notifications to end users. The Telephony Manager manages voice calls. The Location Manager provides for location management, such as by GPS or cellular network. Other hardware managers in the application framework 250 include the Bluetooth Manager, WiFi Manager, USB Manager, Sensor Manager, among others (not shown here).
Located at the top of the software stack 200 are user applications, such as the home screen application, email application, music player, web browser, among others.
User applications, such as a home screen application, email application, music player, browser, among others, can interface with the User Input Manager service 352, including registering/unregistering the input modalities supported by each user application, defining the rules by which each user application receives gestures or commands, and providing information about the state of each application. The User Input Manager 352 may interact with other components 354 within the application framework 350, such as to determine state information for applications currently executing on a device. These other components 354 may include the Activity Manager, Package Manager, Window Manager, Resource Manager, View System, Notification Manager, Telephony Manager, Location Manager, among others. The global user input management system can include an extensible set of recognizers for the various types of inputs or modalities supported by a computing device, such as an Audio Command Recognizer, Visual Gesture Recognizer, and Device Motion Recognizer. The system can be extended to include new types of recognizers for other sensors and input devices of a computing device. Further, each of the recognizers can be extended in various embodiments. In this example, the system includes a Voice Command Recognizer which extends from the Audio Command Recognizer and a Head Gesture Recognizer and a Hand Gesture Recognizer which each extend from the Visual Gesture Recognizer. The recognizers interface with components of the hardware abstraction layer to detect and recognize user input. In various embodiments, recognizers can fuse data from multiple sensors to more accurately detect and recognize user gestures and commands. Here, the Voice Command Recognizer may enhance voice recognition by analyzing image data corresponding to a user's lip movement. 
Therefore, in addition to analyzing audio data captured by audio components, the Voice Command Recognizer may also analyze image data captured by a camera of a computing device.
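The extensible recognizer hierarchy described above might be modeled, as a rough sketch only, with ordinary class inheritance. The class names mirror the recognizers named in this example, but the `recognize` signature, the score dictionaries, and the additive fusion of audio and lip-movement scores are assumptions made for illustration rather than the disclosed technique.

```python
from typing import Dict, Optional


class Recognizer:
    """Hypothetical base type for all recognizers."""
    modality = "generic"

    def recognize(self, scores: Dict[str, float],
                  context: Optional[Dict[str, float]] = None) -> Optional[str]:
        # Default behavior: pick the candidate with the highest primary score.
        return max(scores, key=scores.get) if scores else None


class AudioCommandRecognizer(Recognizer):
    modality = "audio"


class VisualGestureRecognizer(Recognizer):
    modality = "visual"


class DeviceMotionRecognizer(Recognizer):
    modality = "motion"


class HeadGestureRecognizer(VisualGestureRecognizer):
    pass


class HandGestureRecognizer(VisualGestureRecognizer):
    pass


class VoiceCommandRecognizer(AudioCommandRecognizer):
    def recognize(self, scores, context=None):
        # Assumed fusion: add lip-movement scores (from camera data) to audio scores.
        lip_scores = context or {}
        fused = {cmd: s + lip_scores.get(cmd, 0.0) for cmd, s in scores.items()}
        return super().recognize(fused)


voice = VoiceCommandRecognizer()
audio_only = voice.recognize({"play": 0.5, "pause": 0.55})
with_lips = voice.recognize({"play": 0.5, "pause": 0.55}, context={"play": 0.3})
```

Here the image-derived scores change which command is selected, which is the kind of enhancement the Voice Command Recognizer is described as providing.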
In some embodiments, recognizers may also pre-process raw user input such as by translating speech to text or sampling a gesture spatially and rendering the gesture as a two-dimensional image. For example, a gesture may correspond to touches, a finger waving in the air, or motion of a device. The gesturing object, i.e., fingertip on a touchscreen, finger in the air, or device, can be pointillized and sampled in space such that the gesture forms a shape that can be represented as the 2-D image. In some embodiments, the recognizers may utilize a “library” or “dictionary” that maps data corresponding to user input, whether raw or pre-processed, to a higher level command. For instance, a media playing application may incorporate a visual gesture interface wherein particular gestures may be mapped to higher level commands such as skipping to a previous track or stopping play of a current track.
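The two ideas in the preceding paragraph, sampling a gesture into a two-dimensional image and mapping recognized input to a higher level command through a dictionary, can be sketched as follows. The grid size, the function names, and the gesture labels are hypothetical; the command names follow the media player example given elsewhere herein.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def rasterize(points: Sequence[Point], size: int = 8) -> List[List[int]]:
    """Render sampled gesture points as a size-by-size binary image."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    # Avoid division by zero for perfectly horizontal or vertical gestures.
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(size - 1, int((x - min_x) / span_x * size))
        row = min(size - 1, int((y - min_y) / span_y * size))
        grid[row][col] = 1
    return grid


# Hypothetical "dictionary" mapping recognized gesture labels to commands.
COMMAND_LIBRARY: Dict[str, str] = {
    "thumb_left": "previous_track",
    "thumb_right": "next_track",
    "thumb_up": "play",
    "thumb_down": "stop",
    "open_palm": "pause",
}


def to_command(gesture_label: str) -> Optional[str]:
    return COMMAND_LIBRARY.get(gesture_label)


diagonal = rasterize([(0, 0), (1, 1), (2, 2), (3, 3)], size=4)
```

A classifier that turns the rasterized image into a gesture label is outside the scope of this sketch.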
It will be appreciated by those of ordinary skill in the art that a global user input management system could operate equally well in a system having fewer or a greater number of components than are illustrated in FIG. 3. Thus, the depiction of the system 300 in FIG. 3 should be taken as being illustrative in nature and not limiting to the scope of the disclosure.
Also illustrated in example 400 is music player 430 similarly exposing an input modality interface indicating the types of user input supported by the music player. Here, the input modalities capable of being recognized by the music player include touch gestures as represented by touch icon 432, voice commands as indicated by voice icon 434, device motions as indicated by motion icon 436, and visual gestures as indicated by visual icon 438. In this example, the music player has registered with the global user input management service to receive touch gestures, voice commands, and visual gestures but not device motions. It will be appreciated that user applications can be capable of supporting other input modalities in various embodiments. For instance, in other embodiments, gestures and commands supported by user applications can be broader. In some embodiments, user applications are not necessarily limited to voice commands and may be capable of responding to auditory commands generally, such as whistles, hand claps, tongue clicks, among others. Input modalities supported by user applications may also be more granular in other embodiments. For instance, visual gestures may be further categorized according to specific user features, such as the user's head, face, eyes, mouth, hand, finger(s), arms, legs, among others.
Provision of an input modality interface, such as interface 412, can be advantageous for users. A user may select or unselect certain modes of input for each user application to customize how she interacts with the device. For example, a user may have elected for voice commands to bypass email application 410 and/or selected voice commands to be received by music player 430 in order to concurrently interact with both applications. The user could maximize the graphical user interface corresponding to the email application on the touchscreen yet continue to interact with the music player via voice command. In addition, these user settings can be automatically saved for future use.
As mentioned, propagation rules can be used by a global user input management system to determine how to distribute user inputs that have been received and recognized by the system. Propagation rules can be defined by the device platform, user applications, or the user in various embodiments. An example of a propagation rule is to broadcast a type of user input to any executing application that has registered for that type of input. As another example, a propagation rule can forward a user input to the last active user application supporting the type of the user input. Some rules, such as rule 518, may require certain content to be included in the user input or a certain format for the user input in order to be propagated to a user application. Content can include keywords, image data, gestures, a change in sensor data meeting certain thresholds, among others. For example, a keyword could be a name of the application or a voice command that pertains to the application. A user application that is only interested in facial movement may require that the image data includes at least one instance of a person's face. Similar to keywords, certain gestures can act as a cue or indicator that the user intends for input to be directed to a specific application. A specified format for a propagation rule can be defined using a template, such as a phrase pattern for a voice command or a gesture pattern for a touch gesture or visual gesture. Propagation rules can also be based on threshold lengths of time (minimum and/or maximum). Certain propagation rules can depend on the state of an executing application, such as bypassing a user application when the application is in a paused or suspended state. Other propagation rules may be based on the detected command or gesture being within threshold confidence levels. 
Propagation rules can also be based on a priority of each executing application as determined by a category of the application (e.g., business, finance, games), a time the user last directly interacted with the application, the percentage of a display screen corresponding to the application, the frequency of usage of the application, among others. A propagation rule may dictate that a certain command or gesture or a type of command or gesture is “monolithic” and is to be propagated to every executing application. Various other examples should be apparent in light of the teachings and suggestions contained herein.
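The propagation rules described above can be illustrated with a minimal sketch. It assumes that a rule is simply a function from a recognized input and the list of executing applications to the applications that should receive the input. The names `UserInput`, `App`, `propagate`, the rule functions and the 0.6 confidence threshold are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class UserInput:
    modality: str      # e.g. "voice", "touch", "visual_gesture"
    content: str       # recognized text or gesture name
    confidence: float  # recognizer confidence in [0, 1] (assumed scale)


@dataclass
class App:
    name: str
    registered: Set[str]      # input types the application registered for
    state: str = "running"    # "running", "paused" or "suspended"
    last_active: float = 0.0  # time of the last direct user interaction


# A rule returns the applications that should receive the input.
Rule = Callable[[UserInput, List[App]], List[App]]


def eligible(inp: UserInput, apps: List[App]) -> List[App]:
    """Registered for this input type and not paused or suspended."""
    return [a for a in apps
            if inp.modality in a.registered and a.state == "running"]


def broadcast_rule(inp: UserInput, apps: List[App]) -> List[App]:
    """Broadcast to every executing application registered for the type."""
    return eligible(inp, apps)


def last_active_rule(inp: UserInput, apps: List[App]) -> List[App]:
    """Forward to the last active application supporting the type."""
    candidates = eligible(inp, apps)
    return [max(candidates, key=lambda a: a.last_active)] if candidates else []


def keyword_rule(inp: UserInput, apps: List[App]) -> List[App]:
    """Forward only to applications whose name appears in the input."""
    return [a for a in eligible(inp, apps)
            if a.name.lower() in inp.content.lower()]


def propagate(inp: UserInput, apps: List[App], rules: List[Rule],
              min_confidence: float = 0.6) -> List[App]:
    """Apply rules in order; the first rule that selects a target wins."""
    if inp.confidence < min_confidence:
        return []  # below the (assumed) confidence threshold
    for rule in rules:
        targets = rule(inp, apps)
        if targets:
            return targets
    return []
```

Trying the rules in a fixed order is one possible design choice; a platform could equally combine rules or let applications supply their own.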
In some embodiments, certain input modalities may only be available when an application has focus or is directly being interacted with by the user. For example, two user applications may be concurrently executing on a device and a first application supports a touch interface and the second application does not support a touch interface. When the first application has focus, touch-related software and/or hardware may be activated to monitor touch interactions. However, when the second application has focus (or the first application is sent to the background), the touch software and/or hardware may be deactivated. Such an approach can potentially conserve power and free computing resources for active processes. A user application can declare, via a propagation rule, whether a certain input modality should be available when the application has focus, such as via touch, and/or whether an input modality should always be available even when the application is running in the background, such as via audio command or visual gesture. When a type of input should only be available under certain conditions, the global user input management system can monitor those conditions and deactivate software and/or hardware when those conditions are not met.
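As a minimal sketch of the focus-dependent behavior described above, the following assumes each application declares which modalities it needs only while it has focus and which it needs at all times. The names `ModalityDeclaration` and `active_modalities` are hypothetical; any modality absent from the result could have its software and/or hardware deactivated.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ModalityDeclaration:
    # Modalities needed only while the application has focus, e.g. touch.
    focus_only: Set[str] = field(default_factory=set)
    # Modalities needed even in the background, e.g. voice.
    always: Set[str] = field(default_factory=set)


def active_modalities(declarations: Dict[str, ModalityDeclaration],
                      focused: Optional[str]) -> Set[str]:
    """Return the modalities that should currently be monitored."""
    active: Set[str] = set()
    for app_name, decl in declarations.items():
        active |= decl.always
        if app_name == focused:
            active |= decl.focus_only
    return active
```

With one application declaring touch as focus-only and another declaring voice as always available, touch monitoring is active only while the first application has focus, while voice monitoring stays active throughout.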
Further, the device may monitor for user input corresponding to the modalities supported by each executing user application (and when certain conditions are met) by capturing input data using a sensor or other input device corresponding to the supported modalities 608. In some embodiments, the input data must be capable of being responded to meaningfully by the user application. For example, a user application that does not recognize voice commands can hypothetically have voice data forwarded to the application. Such a user application may simply discard the voice data as it would be unintelligible to the user application. Such a response, however, is not a meaningful response as used herein. As another example, two user applications may be capable of recognizing touch gestures as a general matter. However, a touch outside of a window corresponding to a user application in a multi-window environment or a touch while a user application is in the background would not be meaningfully responded to by that user application.
In some embodiments, user applications may be multi-modal and one of the types of input supported by such applications may be de-selected. For instance, a user may be operating a word processor and a music player concurrently. The word processor and the music player may each include a touch-based interface as well as support voice commands. The user may wish to operate the word processor using the touch-based interface of the word processor and the music player using the voice-based interface of the music player. The user may configure the word processor to bypass voice commands. Using such an approach, the user may interact with the word processor via the touch-based interface without having to switch between the graphical user interface of the word processor and the graphical user interface of the music player. Further, the user can maximize the graphical user interface of the word processor while still being able to control the music player via voice command. Thus, the settings of the types of input corresponding to the types of user input supported by a user application can be configured by the user, and determination of the state of the user application can include identification of such settings.
In this example process, the device may determine at least one of the user applications for receiving data corresponding to the user input 610. In some embodiments, user input data can be pre-processed by the device and forwarded to a suitable user application. For example, audio data captured by a microphone of a device can be pre-processed by converting the audio data from an analog format to a digital format, converting digital voice data and/or mapping a voice command encapsulated in the audio data to a higher level command to the device. As another example, visual gestures can be pre-processed by pointillizing an object to be tracked for gesture recognition, sampling the tracked point/object in space, converting the sampled data to a 2-D image, and mapping the image to a higher-level command from a gesture dictionary or library. In some embodiments, pre-processing can include classifying or identifying the user input and correlating the user input to a higher level command. In other embodiments, the raw sensor data (e.g., voice data, image data, motion data) captured by the device can be forwarded to interested applications. In still other embodiments, an intermediate form of the user input can be forwarded to user applications, such as text corresponding to voice data or motion data corresponding to visual gestures.
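The three forwarding options described above (raw sensor data, an intermediate form, or a higher-level command) can be sketched as follows. This is an illustration only: `fake_transcribe` is a placeholder that treats the audio bytes as UTF-8 text rather than performing speech recognition, and the `COMMANDS` dictionary and its entries are hypothetical.

```python
from enum import Enum
from typing import Optional, Union


class Level(Enum):
    RAW = "raw"                    # forward captured sensor data unchanged
    INTERMEDIATE = "intermediate"  # forward text derived from voice data
    COMMAND = "command"            # forward a mapped higher-level command


# Hypothetical command dictionary; a real system would define its own.
COMMANDS = {"next track": "PLAYER_NEXT", "pause": "PLAYER_PAUSE"}


def fake_transcribe(audio: bytes) -> str:
    """Placeholder for speech recognition: decodes the bytes as text."""
    return audio.decode("utf-8").strip().lower()


def prepare(audio: bytes, level: Level) -> Optional[Union[bytes, str]]:
    """Pre-process audio to the level an interested application asked for."""
    if level is Level.RAW:
        return audio
    text = fake_transcribe(audio)
    if level is Level.INTERMEDIATE:
        return text
    return COMMANDS.get(text)  # None when no command matches
```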
In some embodiments, determination of the user application for receiving data corresponding to the user input can be based at least in part on a set of propagation rules. For example, one propagation rule may be based on ranking or prioritizing each executing user application for receiving user input. The ranking or sorting of user applications may be based on a category of each user application, the last time the user directly interacted with each user application, the frequency of usage of each application, or the percentage of a display screen taken up by each application, among others. Another propagation rule may be based on the content of the user input, such as the user input including a cue or indicator or conforming to a specified format. Propagation rules can also direct the user input to be broadcast to multiple user applications. Various other examples should be apparent in light of the teachings and suggestions contained herein. After one or more of the user applications have been selected for receiving the data corresponding to the user input, the device can propagate the data to the selected user application(s) 612 and the user application may perform an action in response to receiving the data corresponding to the user input.
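One way the ranking described above might be computed is a simple additive score over the four listed factors. The specification does not give a formula, so the weights, the normalization and the names `AppStats`, `CATEGORY_WEIGHT`, `score` and `rank` below are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AppStats:
    name: str
    category: str                     # e.g. "business", "finance", "games"
    seconds_since_interaction: float  # time since last direct interaction
    uses_per_day: float               # frequency of usage
    screen_fraction: float            # share of the display, 0.0 to 1.0


# Hypothetical per-category weights; unknown categories score 0.
CATEGORY_WEIGHT = {"business": 1.0, "finance": 0.8, "games": 0.5}


def score(app: AppStats) -> float:
    """Sum four terms, each scaled to roughly the range 0 to 1."""
    recency = 1.0 / (1.0 + app.seconds_since_interaction)
    frequency = min(app.uses_per_day / 10.0, 1.0)
    return (CATEGORY_WEIGHT.get(app.category, 0.0)
            + recency + frequency + app.screen_fraction)


def rank(apps: List[AppStats]) -> List[AppStats]:
    """Highest-priority application first."""
    return sorted(apps, key=score, reverse=True)
```

The first application in the ranked list would then be the default recipient when no content-based rule selects a target.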
The computing device 800 includes at least one capacitive component or other proximity sensor, which can be part of, or separate from, the display assembly. In at least some embodiments the proximity sensor can take the form of a capacitive touch sensor capable of detecting the proximity of a finger or other such object as discussed herein. The computing device also includes various power components 814 known in the art for providing power to a computing device, which can include capacitive charging elements for use with a power pad or similar component. The computing device can include one or more communication elements or networking sub-systems 816, such as a Wi-Fi, Bluetooth, RF, wired, or wireless communication system. The computing device in many embodiments can communicate with a network, such as the Internet, and may be able to communicate with other computing devices. In some embodiments the computing device can include at least one additional input component 818 able to receive conventional input from a user. This conventional input component can include, for example, a push button, touch pad, touchscreen, wheel, joystick, keyboard, mouse, keypad, or any other such component or element whereby a user can input a command to the computing device. In some embodiments, however, such a computing device might not include any buttons at all, and might be controlled only through a combination of visual and audio commands, such that a user can control the device without having to be in contact with the computing device.
The computing device 800 also can include one or more orientation and/or motion determination sensors 812. Such sensor(s) can include an accelerometer or gyroscope operable to detect an orientation and/or change in orientation, or an electronic or digital compass, which can indicate a direction in which the device is determined to be facing. The mechanism(s) also (or alternatively) can include or comprise a global positioning system (GPS) or similar positioning element operable to determine relative coordinates for a position of the computing device, as well as information about relatively large movements of the computing device. The computing device can include other elements as well, such as may enable location determinations through triangulation or another such approach. These mechanisms can communicate with the processor 802, whereby the computing device can perform any of a number of actions described or suggested herein.
In some embodiments, the computing device 800 can include the ability to activate and/or deactivate detection and/or command modes, such as when receiving a command from a user or an application, or retrying to determine an audio input or video input, etc. For example, a computing device might not attempt to detect or communicate with other computing devices when there is not a user in the room. If a proximity sensor of the computing device, such as an IR sensor, detects a user entering the room, for instance, the computing device can activate a detection or control mode such that the device can be ready when needed by the user, but conserve power and resources when a user is not nearby.
In some embodiments, the computing device 800 may include a light-detecting element that is able to determine whether the computing device is exposed to ambient light or is in relative or complete darkness. Such an element can be beneficial in a number of ways. For example, the light-detecting element can be used to determine when a user is holding the device up to the user's face (causing the light-detecting element to be substantially shielded from the ambient light), which can trigger an action such as temporarily shutting off the display element (since the user cannot see the display element while holding the device to the user's ear). The light-detecting element could be used in conjunction with information from other elements to adjust the functionality of the computing device. For example, if the computing device is unable to detect a user's view location and a user is not holding the computing device but the computing device is exposed to ambient light, the computing device might determine that it has likely been set down by the user and might turn off the display element and disable certain functionality. If the computing device is unable to detect a user's view location, a user is not holding the computing device and the computing device is further not exposed to ambient light, the computing device might determine that the computing device has been placed in a bag or other compartment that is likely inaccessible to the user and thus might turn off or disable additional features that might otherwise have been available. In some embodiments, a user must either be looking at the computing device, holding the computing device or have the computing device out in the light in order to activate certain functionality of the computing device. In other embodiments, the computing device may include a display element that can operate in different modes, such as reflective (for bright situations) and emissive (for dark situations). Based on the detected light, the computing device may change modes.
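The set-down and stowed cases described above amount to a small decision table over three signals. The sketch below assumes those signals are already available as booleans; the context labels and action names are hypothetical.

```python
from typing import Dict, List


def infer_context(view_detected: bool, held: bool, ambient_light: bool) -> str:
    """Classify the device context from gaze, grip and light signals."""
    if view_detected or held:
        return "in_use"
    if ambient_light:
        return "set_down"  # not viewed, not held, but out in the light
    return "stowed"        # not viewed, not held and in the dark


# Hypothetical actions per context; "stowed" disables more than "set_down".
ACTIONS: Dict[str, List[str]] = {
    "in_use": [],
    "set_down": ["display_off", "disable_some_features"],
    "stowed": ["display_off", "disable_some_features",
               "disable_additional_features"],
}


def actions_for(view_detected: bool, held: bool,
                ambient_light: bool) -> List[str]:
    """Look up the actions for the inferred context."""
    return ACTIONS[infer_context(view_detected, held, ambient_light)]
```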
In some embodiments, the computing device 800 can disable features for reasons substantially unrelated to power savings. For example, the computing device can use voice recognition to determine people near the computing device, such as children, and can disable or enable features, such as Internet access or parental controls, based thereon. Further, the computing device can analyze recorded noise to attempt to determine an environment, such as whether the computing device is in a car or on a plane, and that determination can help to decide which features to enable/disable or which actions are taken based upon other inputs. If speech or voice recognition is used, words can be used as input, either directly spoken to the computing device or indirectly as picked up through conversation. For example, if the computing device determines that it is in a car, facing the user and detects a word such as “hungry” or “eat,” then the computing device might turn on the display element and display information for nearby restaurants, etc. A user can have the option of turning off voice recording and conversation monitoring for privacy and other such purposes.
In some of the above examples, the actions taken by the computing device relate to deactivating certain functionality for purposes of reducing power consumption. It should be understood, however, that actions can correspond to other functions that can adjust similar and other potential issues with use of the computing device. For example, certain functions, such as requesting Web page content, searching for content on a hard drive and opening various applications, can take a certain amount of time to complete. For computing devices with limited resources, or that have heavy usage, a number of such operations occurring at the same time can cause the computing device to slow down or even lock up, which can lead to inefficiencies, degrade the user experience and potentially use more power. In order to address at least some of these and other such issues, approaches in accordance with various embodiments can also utilize information such as user gaze direction to activate resources that are likely to be used in order to spread out the need for processing capacity, memory space and other such resources.
In some embodiments, the computing device can have sufficient processing capability, and the camera and associated image analysis algorithm(s) may be sensitive enough to distinguish between the motion of the computing device, motion of a user's head, motion of the user's eyes and other such motions, based on the captured images alone. In other embodiments, such as where it may be desirable for an image process to utilize a fairly simple camera and image analysis approach, it can be desirable to include at least one motion and/or orientation determining element that is able to determine a current orientation of the computing device. In one example, the one or more orientation and/or motion sensors may comprise a single- or multi-axis accelerometer that is able to detect factors such as three-dimensional position of the device and the magnitude and direction of movement of the device, as well as vibration, shock, etc. Methods for using elements such as accelerometers to determine orientation or movement of a computing device are also known in the art and will not be discussed herein in detail. Other elements for detecting orientation and/or movement can be used as well within the scope of various embodiments for use as the orientation determining element. When the input from an accelerometer or similar element is used along with the input from the camera, the relative movement can be more accurately interpreted, allowing for a more precise input and/or a less complex image analysis algorithm.
When using a camera of the computing device to detect motion of the device and/or user, for example, the computing device can use the background in the images to determine movement. For example, if a user holds the computing device at a fixed orientation (e.g. distance, angle, etc.) to the user and the user changes orientation to the surrounding environment, analyzing an image of the user alone will not result in detecting a change in an orientation of the computing device. Rather, in some embodiments, the computing device can still detect movement of the device by recognizing the changes in the background imagery behind the user. So, for example, if an object (e.g., a window, picture, tree, bush, building, car, etc.) moves to the left or right in the image, the computing device can determine that the computing device has changed orientation, even though the orientation of the computing device with respect to the user has not changed. In other embodiments, the computing device may detect that the user has moved with respect to the device and adjust accordingly. For example, if the user tilts their head to the left or right with respect to the computing device, the content rendered on the display element may likewise tilt to keep the content in orientation with the user.
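Detecting that background objects have moved left or right between two images can be sketched, under strong simplifying assumptions, as estimating a horizontal shift between two one-dimensional brightness profiles of the background. The function `estimate_shift` and the use of a mean absolute difference are illustrative choices, not the method of the specification.

```python
from typing import Sequence


def estimate_shift(prev: Sequence[float], curr: Sequence[float],
                   max_shift: int = 3) -> int:
    """Return the shift s (in pixels) that best aligns prev[i] with curr[i + s].

    A positive result means background content moved right in the image.
    """
    best_shift, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        # Compare only the positions where both profiles overlap.
        pairs = [(prev[i], curr[i + s])
                 for i in range(len(prev)) if 0 <= i + s < len(curr)]
        if not pairs:
            continue
        cost = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift
```

A nonzero shift while the user's position in the image is unchanged would suggest that the device and user have together changed orientation relative to the environment.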
The various embodiments can be further implemented in a wide variety of operating environments, which in some cases can include one or more user computers or computing devices which can be used to operate any of a number of applications. User or client devices can include any of a number of general purpose personal computers, such as desktop or laptop computers running a standard operating system, as well as cellular, wireless and handheld devices running mobile software and capable of supporting a number of networking and messaging protocols. Such a system can also include a number of workstations running any of a variety of commercially-available operating systems and other known applications for purposes such as development and database management. These computing devices can also include other electronic devices, such as dummy terminals, thin-clients, gaming systems and other devices capable of communicating via a network.
The operating environments can include a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across the network. In a particular set of embodiments, the information may reside in a storage-area network (SAN) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers, servers or other network component may be stored locally and/or remotely, as appropriate. Where a system includes computerized devices, each such device can include hardware elements that may be electrically coupled via a bus, the elements including, for example, at least one central processing unit (CPU), at least one input element (e.g., a mouse, keyboard, controller, touch-sensitive display element or keypad) and at least one output element (e.g., a display screen, printer, or speaker). Such a system may also include one or more storage components, such as disk drives, optical storage components and solid-state storage systems such as random access memory (RAM) or read-only memory (ROM), as well as removable media components, memory cards, flash cards, etc.
Such computing devices can also include a computer-readable storage media reader, a communications component (e.g., a modem, a network card (wireless or wired), an infrared communication element), and working memory, as described above. The computer-readable storage media reader can be connected with, or configured to receive, a computer-readable storage medium representing remote, local, fixed and/or removable storage components as well as storage media for temporarily and/or more permanently containing, storing, transmitting and retrieving computer-readable information. The system and various devices also typically will include a number of software applications, modules, services or other elements located within at least one working memory component, including an operating system and application programs such as a client application or Web browser. It should be appreciated that alternate embodiments may have numerous variations from that described above. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software (including portable software, such as applets) or both. Further, connection to other computing devices such as network input/output devices may be employed.
Storage media and computer readable media for containing code, or portions of code, can include any appropriate media known or used in the art, including storage media and communication media, such as but not limited to volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information such as computer readable instructions, data structures, program modules or other data, including RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage components or any other medium which can be used to store the desired information and which can be accessed by a system. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.
The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the claims.
Claims (13)
1. A computing system, comprising:
one or more processors;
one or more microphones;
one or more cameras; and
memory including instructions that, when executed by the one or more processors, cause the computing system to:
execute a first application of the computing system;
execute a second application of the computing system during a first period of time in which the first application is also executed;
capture audio data during the first period of time using the one or more microphones;
capture image data during the first period of time using the one or more cameras;
process the audio data to identify a first keyword;
process the image data to identify a first gesture;
determine that the first keyword corresponds to the first application, and the first gesture corresponds to the second application;
send, based on the first keyword corresponding to the first application, a first command to the first application; and
send, based on the first gesture corresponding to the second application, a second command to the second application.
2. The computing system of claim 1, further comprising further instructions that, when executed by the one or more processors, further cause the computing system to:
receive, from the first application, a registration of the first keyword.
3. The computing system of claim 2, further comprising further instructions that, when executed by the one or more processors, further cause the computing system to:
prioritize the first application for receiving the first command over the second application receiving the second command.
4. A computer-implemented method, comprising:
associating one or more first keywords with a first application;
associating one or more second gestures with a second application;
executing the first application on a computing device during a first period of time in which the second application is also executed on the computing device;
receiving audio input data captured during the first period of time by one or more audio input components of the computing device;
receiving image data captured during the first period of time by one or more cameras of the computing device;
processing the audio input data to identify a first keyword;
processing the image data to identify a first gesture;
determining that the first keyword corresponds to the first application, and the first gesture corresponds to the second application;
sending, based on the first keyword corresponding to the first application, a first command to the first application; and
sending, based on the first gesture corresponding to the second application, a second command to the second application.
5. The computer-implemented method of claim 4, wherein the image data corresponds to lip movement, the method further comprising:
analyzing the audio input data and the image data corresponding to lip movement to enhance recognition of the audio input data.
6. The computer-implemented method of claim 4, further comprising:
determining that the first application has focus; and
determining that the second application does not have focus.
7. The computer-implemented method of claim 4, further comprising:
receiving, from the first application, a registration of the one or more first keywords.
8. The computer-implemented method of claim 7, further comprising:
prioritizing the first application for receiving the first command over the second application receiving the second command.
9. The computer-implemented method of claim 8, further comprising:
setting a prioritization of the first application over the second application based at least in part upon a category of the first application, a time a user last directly interacted with the first application, a percentage of a display screen corresponding to the first application, or a frequency of usage of the first application.
10. The computer-implemented method of claim 4, further comprising:
capturing second input data using a second input component of the computing device; and
processing the second input data to increase a confidence level associated with identifying the first keyword.
11. A non-transitory computer-readable storage medium storing instructions, the instructions when executed by a processor causing a computing device to:
associate one or more first keywords with a first application;
associate one or more second gestures with a second application;
execute the first application on the computing device during a first period of time in which the second application is also executed on the computing device;
receive audio input data captured during the first period of time by one or more audio input components of the computing device;
receive image data captured during the first period of time by one or more cameras of the computing device;
process the audio input data to identify a first keyword;
process the image data to identify a first gesture;
determine that the first keyword corresponds to the first application, and the first gesture corresponds to the second application;
send, based on the first keyword corresponding to the first application, a first command to the first application; and
send, based on the first gesture corresponding to the second application, a second command to the second application.
12. The non-transitory computer-readable storage medium of claim 11, wherein the image data corresponds to lip movement, further comprising further instructions that, when executed by the processor, further cause the computing device to:
analyze the audio input data and the image data corresponding to lip movement to enhance recognition of the audio input data.
13. The non-transitory computer-readable storage medium of claim 11, further comprising further instructions that, when executed by the processor, further cause the computing device to:
determine that the first application has focus; and
determine that the second application does not have focus.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/018,331 US11199906B1 (en) | 2013-09-04 | 2013-09-04 | Global user input management |
Publications (1)
Publication Number | Publication Date |
---|---|
US11199906B1 (en) | 2021-12-14 |
Family
ID=78828702
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/018,331 Active 2034-11-27 US11199906B1 (en) | 2013-09-04 | 2013-09-04 | Global user input management |
Country Status (1)
Country | Link |
---|---|
US (1) | US11199906B1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220283694A1 (en) * | 2021-03-08 | 2022-09-08 | Samsung Electronics Co., Ltd. | Enhanced user interface (ui) button control for mobile applications |
US20230158886A1 (en) * | 2020-03-17 | 2023-05-25 | Audi Ag | Operator control device for operating an infotainment system, method for providing an audible signal for an operator control device, and motor vehicle having an operator control device |
Citations (201)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4836670A (en) | 1987-08-19 | 1989-06-06 | Center For Innovative Technology | Eye movement detector |
US4866778A (en) | 1986-08-11 | 1989-09-12 | Dragon Systems, Inc. | Interactive speech recognition apparatus |
US5563988A (en) | 1994-08-01 | 1996-10-08 | Massachusetts Institute Of Technology | Method and system for facilitating wireless, full-body, real-time user interaction with a digitally represented visual environment |
US5594469A (en) | 1995-02-21 | 1997-01-14 | Mitsubishi Electric Information Technology Center America Inc. | Hand gesture machine control system |
US5616078A (en) | 1993-12-28 | 1997-04-01 | Konami Co., Ltd. | Motion-controlled video entertainment system |
US5621858A (en) | 1992-05-26 | 1997-04-15 | Ricoh Corporation | Neural network acoustic and visual speech recognition system training method and apparatus |
US5632002A (en) | 1992-12-28 | 1997-05-20 | Kabushiki Kaisha Toshiba | Speech recognition interface system suitable for window systems and speech mail systems |
US5960394A (en) | 1992-11-13 | 1999-09-28 | Dragon Systems, Inc. | Method of speech command recognition with dynamic assignment of probabilities according to the state of the controlled applications |
GB2350712A (en) | 1998-03-10 | 2000-12-06 | Fujitsu Ltd | Document processor and recording medium |
US6185529B1 (en) | 1998-09-14 | 2001-02-06 | International Business Machines Corporation | Speech recognition aided by lateral profile image |
US6249763B1 (en) | 1997-11-17 | 2001-06-19 | International Business Machines Corporation | Speech recognition apparatus and method |
US6266059B1 (en) * | 1997-08-27 | 2001-07-24 | Microsoft Corporation | User interface for switching between application modes |
US6272231B1 (en) | 1998-11-06 | 2001-08-07 | Eyematic Interfaces, Inc. | Wavelet-based facial motion capture for avatar animation |
US6339758B1 (en) | 1998-07-31 | 2002-01-15 | Kabushiki Kaisha Toshiba | Noise suppress processing apparatus and method |
WO2002015560A2 (en) | 2000-08-12 | 2002-02-21 | Georgia Tech Research Corporation | A system and method for capturing an image |
US6385331B2 (en) | 1997-03-21 | 2002-05-07 | Takenaka Corporation | Hand pointing device |
JP2002164990A (en) | 2000-11-28 | 2002-06-07 | Kyocera Corp | Mobile communication terminal |
US6404438B1 (en) | 1999-12-21 | 2002-06-11 | Electronic Arts, Inc. | Behavioral learning for a visual representation in a communication environment |
US6434255B1 (en) | 1997-10-29 | 2002-08-13 | Takenaka Corporation | Hand pointing apparatus |
US20020135618A1 (en) | 2001-02-05 | 2002-09-26 | International Business Machines Corporation | System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input |
US20020178344A1 (en) | 2001-05-22 | 2002-11-28 | Canon Kabushiki Kaisha | Apparatus for managing a multi-modal user interface |
JP2002351603A (en) | 2001-05-25 | 2002-12-06 | Mitsubishi Electric Corp | Portable information processor |
US20020194005A1 (en) | 2001-03-27 | 2002-12-19 | Lahr Roy J. | Head-worn, trimodal device to increase transcription accuracy in a voice recognition system and to process unvocalized speech |
US20030023435A1 (en) * | 2000-07-13 | 2003-01-30 | Josephson Daryl Craig | Interfacing apparatus and methods |
US20030023953A1 (en) * | 2000-12-04 | 2003-01-30 | Lucassen John M. | MVC (model-view-controller) based multi-modal authoring tool and development environment |
US20030028382A1 (en) * | 2001-08-01 | 2003-02-06 | Robert Chambers | System and method for voice dictation and command input modes |
US6518957B1 (en) * | 1999-08-13 | 2003-02-11 | Nokia Mobile Phones Limited | Communications device with touch sensitive screen |
US20030083872A1 (en) | 2001-10-25 | 2003-05-01 | Dan Kikinis | Method and apparatus for enhancing voice recognition capabilities of voice recognition software and systems |
US6594629B1 (en) | 1999-08-06 | 2003-07-15 | International Business Machines Corporation | Methods and apparatus for audio-visual speech detection and recognition |
US20030171921A1 (en) | 2002-03-04 | 2003-09-11 | Ntt Docomo, Inc. | Speech recognition system, speech recognition method, speech synthesis system, speech synthesis method, and program product |
US20030190076A1 (en) | 2002-04-05 | 2003-10-09 | Bruno Delean | Vision-based operating method and system |
US6633305B1 (en) | 2000-06-05 | 2003-10-14 | Corel Corporation | System and method for magnifying and editing images |
US20040046795A1 (en) * | 2002-03-08 | 2004-03-11 | Revelations In Design, Lp | Electric device control apparatus and methods for making and using same |
US6728680B1 (en) | 2000-11-16 | 2004-04-27 | International Business Machines Corporation | Method and apparatus for providing visual feedback of speech production |
US20040080487A1 (en) * | 2002-10-29 | 2004-04-29 | Griffin Jason T. | Electronic device having keyboard for thumb typing |
US20040107103A1 (en) | 2002-11-29 | 2004-06-03 | Ibm Corporation | Assessing consistency between facial motion and speech signals in video |
US20040105573A1 (en) | 2002-10-15 | 2004-06-03 | Ulrich Neumann | Augmented virtual environments |
US6750848B1 (en) | 1998-11-09 | 2004-06-15 | Timothy R. Pryor | More useful man machine interfaces and applications |
US20040122666A1 (en) | 2002-12-18 | 2004-06-24 | Ahlenius Mark T. | Method and apparatus for displaying speech recognition results |
US20040140956A1 (en) | 2003-01-16 | 2004-07-22 | Kushler Clifford A. | System and method for continuous stroke word-based text input |
US20040205482A1 (en) | 2002-01-24 | 2004-10-14 | International Business Machines Corporation | Method and apparatus for active annotation of multimedia content |
JP2004318826A (en) | 2003-04-04 | 2004-11-11 | Mitsubishi Electric Corp | Portable terminal device and character input method |
US20040260438A1 (en) * | 2003-06-17 | 2004-12-23 | Chernetsky Victor V. | Synchronous voice user interface/graphical user interface |
US6863609B2 (en) | 2000-08-11 | 2005-03-08 | Konami Corporation | Method for controlling movement of viewing point of simulated camera in 3D video game, and 3D video game machine |
US6868383B1 (en) | 2001-07-12 | 2005-03-15 | At&T Corp. | Systems and methods for extracting meaning from multimodal inputs using finite-state devices |
US20050064912A1 (en) | 2003-09-19 | 2005-03-24 | Ki-Gon Yang | Hand-held phone capable of providing various vibrations with only one vibration motor |
US20050133693A1 (en) | 2003-12-18 | 2005-06-23 | Fouquet Julie E. | Method and system for wavelength-dependent imaging and detection using a hybrid filter |
US6927694B1 (en) | 2001-08-20 | 2005-08-09 | Research Foundation Of The University Of Central Florida | Algorithm for monitoring head/eye motion for driver alertness with one camera |
US20050212754A1 (en) * | 2004-03-23 | 2005-09-29 | Marvit David L | Dynamic adaptation of gestures for motion controlled handheld devices |
US6959102B2 (en) | 2001-05-29 | 2005-10-25 | International Business Machines Corporation | Method for increasing the signal-to-noise in IR-based eye gaze trackers |
CN1694045A (en) | 2005-06-02 | 2005-11-09 | 北京中星微电子有限公司 | Non-contact type visual control operation system and method |
US20050278467A1 (en) | 2004-05-25 | 2005-12-15 | Gupta Anurag K | Method and apparatus for classifying and ranking interpretations for multimodal input fusion |
WO2006036069A1 (en) | 2004-09-27 | 2006-04-06 | Hans Gude Gudensen | Information processing system and method |
US7039198B2 (en) | 2000-11-10 | 2006-05-02 | Quindi | Acoustic source localization system and method |
US7069215B1 (en) | 2001-07-12 | 2006-06-27 | At&T Corp. | Systems and methods for extracting meaning from multimodal inputs using finite-state devices |
US20060143006A1 (en) | 2001-10-22 | 2006-06-29 | Yasuharu Asano | Speech recognition apparatus and speech recognition method |
US20060155546A1 (en) | 2005-01-11 | 2006-07-13 | Gupta Anurag K | Method and system for controlling input modalities in a multimodal dialog system |
US20060167784A1 (en) | 2004-09-10 | 2006-07-27 | Hoffberg Steven M | Game theoretic prioritization scheme for mobile ad hoc networks permitting hierarchal deference |
US20060197753A1 (en) * | 2005-03-04 | 2006-09-07 | Hotelling Steven P | Multi-functional hand-held device |
US20060224382A1 (en) | 2003-01-24 | 2006-10-05 | Moria Taneda | Noise reduction and audio-visual speech activity detection |
US20070002026A1 (en) * | 2005-07-01 | 2007-01-04 | Microsoft Corporation | Keyboard accelerator |
US20070025555A1 (en) | 2005-07-28 | 2007-02-01 | Fujitsu Limited | Method and apparatus for processing information, and computer product |
US20070061148A1 (en) * | 2005-09-13 | 2007-03-15 | Cross Charles W Jr | Displaying speech command input state information in a multimodal browser |
US20070071277A1 (en) | 2003-05-28 | 2007-03-29 | Koninklijke Philips Electronics | Apparatus and method for embedding a watermark using sub-band filtering |
US7199767B2 (en) | 2002-03-07 | 2007-04-03 | Yechezkal Evan Spero | Enhanced vision for driving |
JP2007121489A (en) | 2005-10-26 | 2007-05-17 | Nec Corp | Portable display device |
US20070118520A1 (en) | 2005-11-07 | 2007-05-24 | Google Inc. | Local Search and Mapping for Mobile Devices |
US20070164989A1 (en) | 2006-01-17 | 2007-07-19 | Ciaran Thomas Rochford | 3-Dimensional Graphical User Interface |
US7257575B1 (en) | 2002-10-24 | 2007-08-14 | At&T Corp. | Systems and methods for generating markup-language based expressions from multi-modal and unimodal inputs |
US20070260972A1 (en) * | 2006-05-05 | 2007-11-08 | Kirusa, Inc. | Reusable multimodal application |
US20070273611A1 (en) | 2004-04-01 | 2007-11-29 | Torch William C | Biosensors, communicators, and controllers monitoring eye movement and methods for using them |
US20080005418A1 (en) | 2006-05-09 | 2008-01-03 | Jorge Julian | Interactive interface for electronic devices |
US20080013826A1 (en) | 2006-07-13 | 2008-01-17 | Northrop Grumman Corporation | Gesture recognition interface system |
US20080019589A1 (en) | 2006-07-19 | 2008-01-24 | Ho Sub Yoon | Method and apparatus for recognizing gesture in image processing system |
GB2440348A (en) | 2006-06-30 | 2008-01-30 | Motorola Inc | Positioning a cursor on a computer device user interface in response to images of an operator |
US20080040692A1 (en) | 2006-06-29 | 2008-02-14 | Microsoft Corporation | Gesture input |
US20080059578A1 (en) * | 2006-09-06 | 2008-03-06 | Jacob C Albertson | Informing a user of gestures made by others out of the user's line of sight |
US20080072155A1 (en) * | 2006-09-19 | 2008-03-20 | Detweiler Samuel R | Method and apparatus for identifying hotkey conflicts |
JP2008097220A (en) | 2006-10-10 | 2008-04-24 | Nec Corp | Character input device, character input method and program |
US7379566B2 (en) | 2005-01-07 | 2008-05-27 | Gesturetek, Inc. | Optical flow based tilt sensor |
US20080136916A1 (en) | 2005-01-26 | 2008-06-12 | Robin Quincey Wolff | Eye tracker/head tracker/camera tracker controlled camera/weapon positioner control system |
US20080141181A1 (en) * | 2006-12-07 | 2008-06-12 | Kabushiki Kaisha Toshiba | Information processing apparatus, information processing method, and program |
US20080158096A1 (en) | 1999-12-15 | 2008-07-03 | Automotive Technologies International, Inc. | Eye-Location Dependent Vehicular Heads-Up Display System |
US20080167868A1 (en) | 2007-01-04 | 2008-07-10 | Dimitri Kanevsky | Systems and methods for intelligent control of microphones for speech recognition applications |
US7401783B2 (en) | 1999-07-08 | 2008-07-22 | Pryor Timothy R | Camera based man machine interfaces |
US20080174570A1 (en) | 2006-09-06 | 2008-07-24 | Apple Inc. | Touch Screen Device, Method, and Graphical User Interface for Determining Commands by Applying Heuristics |
JP2008186247A (en) | 2007-01-30 | 2008-08-14 | Oki Electric Ind Co Ltd | Face direction detector and face direction detection method |
US20080255850A1 (en) * | 2007-04-12 | 2008-10-16 | Cross Charles W | Providing Expressive User Interaction With A Multimodal Application |
US20080262849A1 (en) | 2007-02-02 | 2008-10-23 | Markus Buck | Voice control system |
US20080266257A1 (en) | 2007-04-24 | 2008-10-30 | Kuo-Ching Chiang | User motion detection mouse for electronic device |
US20080266530A1 (en) | 2004-10-07 | 2008-10-30 | Japan Science And Technology Agency | Image Display Unit and Electronic Glasses |
US20080276196A1 (en) | 2007-05-04 | 2008-11-06 | Apple Inc. | Automatically adjusting media display in a personal display system |
US20090031240A1 (en) | 2007-07-27 | 2009-01-29 | Gesturetek, Inc. | Item selection using enhanced control |
US20090079813A1 (en) | 2007-09-24 | 2009-03-26 | Gesturetek, Inc. | Enhanced Interface for Voice and Video Communications |
US7519223B2 (en) | 2004-06-28 | 2009-04-14 | Microsoft Corporation | Recognizing gestures and using gestures for interacting with software applications |
US20090153341A1 (en) | 2007-12-13 | 2009-06-18 | Karin Spalink | Motion activated user interface for mobile communications device |
US20090157206A1 (en) | 2007-12-13 | 2009-06-18 | Georgia Tech Research Corporation | Detecting User Gestures with a Personal Mobile Communication Device |
US20090203408A1 (en) * | 2008-02-08 | 2009-08-13 | Novarra, Inc. | User Interface with Multiple Simultaneous Focus Areas |
US20090216529A1 (en) | 2008-02-27 | 2009-08-27 | Sony Ericsson Mobile Communications Ab | Electronic devices and methods that adapt filtering of a microphone signal responsive to recognition of a targeted speaker's voice |
US7587053B1 (en) | 2003-10-28 | 2009-09-08 | Nvidia Corporation | Audio-based position tracking |
US7599712B2 (en) * | 2006-09-27 | 2009-10-06 | Palm, Inc. | Apparatus and methods for providing directional commands for a mobile computing device |
US7603143B2 (en) * | 2005-08-26 | 2009-10-13 | Lg Electronics Inc. | Mobile telecommunication handset having touch pad |
US20090265627A1 (en) | 2008-04-17 | 2009-10-22 | Kim Joo Min | Method and device for controlling user interface based on user's gesture |
US7613310B2 (en) | 2003-08-27 | 2009-11-03 | Sony Computer Entertainment Inc. | Audio input system |
US20090307726A1 (en) | 2002-06-26 | 2009-12-10 | Andrew Christopher Levin | Systems and methods for recommending age-range appropriate episodes of program content |
US20090313584A1 (en) | 2008-06-17 | 2009-12-17 | Apple Inc. | Systems and methods for adjusting a display based on the user's position |
US20100030400A1 (en) * | 2006-06-09 | 2010-02-04 | Garmin International, Inc. | Automatic speech recognition system and method for aircraft |
US20100063880A1 (en) | 2006-09-13 | 2010-03-11 | Alon Atsmon | Providing content responsive to multimedia signals |
US20100082341A1 (en) | 2008-09-30 | 2010-04-01 | Samsung Electronics Co., Ltd. | Speaker recognition device and method using voice signal analysis |
US20100092007A1 (en) | 2008-10-15 | 2010-04-15 | Microsoft Corporation | Dynamic Switching of Microphone Inputs for Identification of a Direction of a Source of Speech Sounds |
US20100105443A1 (en) * | 2008-10-27 | 2010-04-29 | Nokia Corporation | Methods and apparatuses for facilitating interaction with touch screen apparatuses |
US20100122167A1 (en) * | 2008-11-11 | 2010-05-13 | Pantech Co., Ltd. | System and method for controlling mobile terminal application using gesture |
US20100138680A1 (en) * | 2008-12-02 | 2010-06-03 | At&T Mobility Ii Llc | Automatic display and voice command activation with hand edge sensing |
US20100138224A1 (en) * | 2008-12-03 | 2010-06-03 | At&T Intellectual Property I, Lp. | Non-disruptive side conversation information retrieval |
US20100179811A1 (en) | 2009-01-13 | 2010-07-15 | Crim | Identifying keyword occurrences in audio data |
US7761302B2 (en) | 2005-06-03 | 2010-07-20 | South Manchester University Hospitals Nhs Trust | Method for generating output data |
US7760248B2 (en) | 2002-07-27 | 2010-07-20 | Sony Computer Entertainment Inc. | Selective sound source listening in conjunction with computer interactive processing |
US20100188328A1 (en) * | 2009-01-29 | 2010-07-29 | Microsoft Corporation | Environmental gesture recognition |
US20100188426A1 (en) | 2009-01-27 | 2010-07-29 | Kenta Ohmori | Display apparatus, display control method, and display control program |
US20100208914A1 (en) | 2008-06-24 | 2010-08-19 | Yoshio Ohtsuka | Microphone device |
US20100233996A1 (en) | 2009-03-16 | 2010-09-16 | Scott Herz | Capability model for mobile devices |
US20100241431A1 (en) | 2009-03-18 | 2010-09-23 | Robert Bosch Gmbh | System and Method for Multi-Modal Input Synchronization and Disambiguation |
US20100238323A1 (en) | 2009-03-23 | 2010-09-23 | Sony Ericsson Mobile Communications Ab | Voice-controlled image editing |
US20100280983A1 (en) | 2009-04-30 | 2010-11-04 | Samsung Electronics Co., Ltd. | Apparatus and method for predicting user's intention based on multimodal information |
US20100283735A1 (en) * | 2009-05-07 | 2010-11-11 | Samsung Electronics Co., Ltd. | Method for activating user functions by types of input signals and portable terminal adapted to the method |
US20100328319A1 (en) | 2009-06-26 | 2010-12-30 | Sony Computer Entertainment Inc. | Information processor and information processing method for performing process adapted to user motion |
US20100332229A1 (en) * | 2009-06-30 | 2010-12-30 | Sony Corporation | Apparatus control based on visual lip share recognition |
US20110032845A1 (en) | 2009-08-05 | 2011-02-10 | International Business Machines Corporation | Multimodal Teleconferencing |
US20110032182A1 (en) * | 2009-08-10 | 2011-02-10 | Samsung Electronics Co., Ltd. | Portable terminal having plural input devices and method for providing interaction thereof |
US20110035058A1 (en) * | 2009-03-30 | 2011-02-10 | Altorr Corporation | Patient-lifting-device controls |
US20110055846A1 (en) * | 2009-08-31 | 2011-03-03 | Microsoft Corporation | Techniques for using human gestures to control gesture unaware programs |
US20110071830A1 (en) | 2009-09-22 | 2011-03-24 | Hyundai Motor Company | Combined lip reading and voice recognition multimodal interface system |
US20110112921A1 (en) | 2009-11-10 | 2011-05-12 | Voicebox Technologies, Inc. | System and method for providing a natural language content dedication service |
US7949964B2 (en) | 2003-05-29 | 2011-05-24 | Computer Associates Think, Inc. | System and method for visualization of node-link structures |
US20110164105A1 (en) | 2010-01-06 | 2011-07-07 | Apple Inc. | Automatic video stream selection |
US20110184735A1 (en) | 2010-01-22 | 2011-07-28 | Microsoft Corporation | Speech recognition analysis via identification information |
US20110193939A1 (en) * | 2010-02-09 | 2011-08-11 | Microsoft Corporation | Physical interaction zone for gesture-based user interfaces |
US20110205156A1 (en) * | 2008-09-25 | 2011-08-25 | Movea S.A | Command by gesture interface |
EP2365422A2 (en) * | 2010-03-08 | 2011-09-14 | Sony Corporation | Information processing apparatus controlled by hand gestures and corresponding method and program |
US20110244924A1 (en) * | 2010-04-06 | 2011-10-06 | Lg Electronics Inc. | Mobile terminal and controlling method thereof |
US20110254691A1 (en) | 2009-09-07 | 2011-10-20 | Sony Corporation | Display device and control method |
US20110270609A1 (en) | 2010-04-30 | 2011-11-03 | American Teleconferencing Services Ltd. | Real-time speech-to-text conversion in an audio conference session |
US20110285807A1 (en) | 2010-05-18 | 2011-11-24 | Polycom, Inc. | Voice Tracking Camera with Speaker Identification |
US20110291926A1 (en) * | 2002-02-15 | 2011-12-01 | Canesta, Inc. | Gesture recognition system using depth perceptive sensors |
US20110313768A1 (en) | 2010-06-18 | 2011-12-22 | Christian Klein | Compound gesture-speech commands |
US20120015674A1 (en) * | 2010-05-20 | 2012-01-19 | Google Inc. | Automatic Routing of Search Results |
US20120030637A1 (en) * | 2009-06-19 | 2012-02-02 | Prasenjit Dey | Qualified command |
US20120057064A1 (en) | 2010-09-08 | 2012-03-08 | Apple Inc. | Camera-based orientation fix from portrait to landscape |
US8150063B2 (en) | 2008-11-25 | 2012-04-03 | Apple Inc. | Stabilizing directional audio input from a moving microphone array |
US20120131098A1 (en) | 2009-07-24 | 2012-05-24 | Xped Holdings Py Ltd | Remote control arrangement |
WO2012093779A2 (en) * | 2011-01-04 | 2012-07-12 | 목포대학교산학협력단 | User terminal supporting multimodal interface using user touch and breath and method for controlling same |
US8228292B1 (en) * | 2010-04-02 | 2012-07-24 | Google Inc. | Flipping for motion-based input |
US20120221929A1 (en) * | 2008-03-04 | 2012-08-30 | Gregory Dennis Bolsinga | Touch Event Processing for Web Pages |
US20120257121A1 (en) | 2011-04-07 | 2012-10-11 | Sony Corporation | Next generation user interface for audio video display device such as tv with multiple user input modes and hierarchy thereof |
US20120280900A1 (en) | 2011-05-06 | 2012-11-08 | Nokia Corporation | Gesture recognition using plural sensors |
US20120304132A1 (en) * | 2011-05-27 | 2012-11-29 | Chaitanya Dev Sareen | Switching back to a previously-interacted-with application |
US20130016129A1 (en) * | 2011-07-14 | 2013-01-17 | Google Inc. | Region-Specific User Input |
US20130021240A1 (en) | 2011-07-18 | 2013-01-24 | Stmicroelectronics (Rousset) Sas | Method and device for controlling an apparatus as a function of detecting persons in the vicinity of the apparatus |
WO2013021385A2 (en) * | 2011-08-11 | 2013-02-14 | Eyesight Mobile Technologies Ltd. | Gesture based interface system and method |
US20130044080A1 (en) * | 2010-06-16 | 2013-02-21 | Holy Stone Enterprise Co., Ltd. | Dual-view display device operating method |
US20130053007A1 (en) * | 2011-08-24 | 2013-02-28 | Microsoft Corporation | Gesture-based input mode selection for mobile devices |
US20130050458A1 (en) * | 2009-11-11 | 2013-02-28 | Sungun Kim | Display device and method of controlling the same |
US20130050263A1 (en) * | 2011-08-26 | 2013-02-28 | May-Li Khoe | Device, Method, and Graphical User Interface for Managing and Interacting with Concurrently Open Software Applications |
US20130050131A1 (en) * | 2011-08-23 | 2013-02-28 | Garmin Switzerland Gmbh | Hover based navigation user interface control |
US20130063346A1 (en) | 2009-08-28 | 2013-03-14 | Ian George Fletcher-Price | Point and click device for a computer workstation |
US8432366B2 (en) * | 2009-03-03 | 2013-04-30 | Microsoft Corporation | Touch discrimination |
US20130127719A1 (en) * | 2011-11-18 | 2013-05-23 | Primax Electronics Ltd. | Multi-touch mouse |
US20130138424A1 (en) | 2011-11-28 | 2013-05-30 | Microsoft Corporation | Context-Aware Interaction System Using a Semantic Model |
US20130169530A1 (en) | 2011-12-29 | 2013-07-04 | Khalifa University Of Science And Technology & Research (Kustar) | Human eye controlled computer mouse interface |
US20130182914A1 (en) | 2010-10-07 | 2013-07-18 | Sony Corporation | Information processing device and information processing method |
US20130187855A1 (en) * | 2012-01-20 | 2013-07-25 | Microsoft Corporation | Touch mode and input type recognition |
US20130190054A1 (en) | 2012-01-24 | 2013-07-25 | Charles J. Kulas | User interface for a portable device including detecting proximity of a finger near a touchscreen to prevent changing the display |
US20130191779A1 (en) * | 2012-01-20 | 2013-07-25 | Microsoft Corporation | Display of user interface elements based on touch or hardware input |
US20130207898A1 (en) * | 2012-02-14 | 2013-08-15 | Microsoft Corporation | Equal Access to Speech and Touch Input |
US20130227419A1 (en) * | 2012-02-24 | 2013-08-29 | Pantech Co., Ltd. | Apparatus and method for switching active application |
US20130265437A1 (en) * | 2012-04-09 | 2013-10-10 | Sony Mobile Communications Ab | Content transfer via skin input |
US20130293488A1 (en) | 2012-05-02 | 2013-11-07 | Lg Electronics Inc. | Mobile terminal and control method thereof |
US20130304479A1 (en) | 2012-05-08 | 2013-11-14 | Google Inc. | Sustained Eye Gaze for Determining Intent to Interact |
US20130311508A1 (en) * | 2012-05-17 | 2013-11-21 | Grit Denker | Method, apparatus, and system for facilitating cross-application searching and retrieval of content using a contextual user model |
US20130332160A1 (en) | 2012-06-12 | 2013-12-12 | John G. Posa | Smart phone with self-training, lip-reading and eye-tracking capabilities |
US20130344859A1 (en) * | 2012-06-21 | 2013-12-26 | Cellepathy Ltd. | Device context determination in transportation and other scenarios |
US20130342480A1 (en) * | 2012-06-21 | 2013-12-26 | Pantech Co., Ltd. | Apparatus and method for controlling a terminal using a touch input |
US20140007019A1 (en) | 2012-06-29 | 2014-01-02 | Nokia Corporation | Method and apparatus for related user inputs |
US20140043229A1 (en) | 2011-04-07 | 2014-02-13 | Nec Casio Mobile Communications, Ltd. | Input device, input method, and computer program |
US20140050370A1 (en) | 2012-08-15 | 2014-02-20 | International Business Machines Corporation | Ocular biometric authentication with system verification |
US8700392B1 (en) | 2010-09-10 | 2014-04-15 | Amazon Technologies, Inc. | Speech-inclusive device interfaces |
US20140132505A1 (en) | 2011-05-23 | 2014-05-15 | Hewlett-Packard Development Company, L.P. | Multimodal interactions based on body postures |
US8744645B1 (en) | 2013-02-26 | 2014-06-03 | Honda Motor Co., Ltd. | System and method for incorporating gesture and voice recognition into a single system |
US20140168074A1 (en) | 2011-07-08 | 2014-06-19 | The Dna Co., Ltd. | Method and terminal device for controlling content by sensing head gesture and hand gesture, and computer-readable recording medium |
US8788977B2 (en) | 2008-11-20 | 2014-07-22 | Amazon Technologies, Inc. | Movement recognition as input mechanism |
US20140214415A1 (en) | 2013-01-25 | 2014-07-31 | Microsoft Corporation | Using visual cues to disambiguate speech inputs |
US20140210727A1 (en) * | 2011-10-03 | 2014-07-31 | Sony Ericsson Mobile Communications Ab | Electronic device with touch-based deactivation of touch input signaling |
US20140223384A1 (en) | 2011-12-29 | 2014-08-07 | David L. Graumann | Systems, methods, and apparatus for controlling gesture initiation and termination |
US20140282272A1 (en) * | 2013-03-15 | 2014-09-18 | Qualcomm Incorporated | Interactive Inputs for a Background Task |
US20140337016A1 (en) | 2011-10-17 | 2014-11-13 | Nuance Communications, Inc. | Speech Signal Enhancement Using Visual Information |
US8892446B2 (en) | 2010-01-18 | 2014-11-18 | Apple Inc. | Service orchestration for intelligent automated assistant |
US20150019227A1 (en) * | 2012-05-16 | 2015-01-15 | Xtreme Interactions, Inc. | System, device and method for processing interlaced multimodal user input |
US9007301B1 (en) | 2012-10-11 | 2015-04-14 | Google Inc. | User interface |
US9026939B2 (en) * | 2013-06-13 | 2015-05-05 | Google Inc. | Automatically switching between input modes for a user interface |
US9035874B1 (en) | 2013-03-08 | 2015-05-19 | Amazon Technologies, Inc. | Providing user input to a computing device with an eye closure |
US20150161992A1 (en) | 2012-07-09 | 2015-06-11 | Lg Electronics Inc. | Speech recognition apparatus and method |
2013-09-04: US application US14/018,331, patent US11199906B1, status Active
Patent Citations (205)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4866778A (en) | 1986-08-11 | 1989-09-12 | Dragon Systems, Inc. | Interactive speech recognition apparatus |
US4836670A (en) | 1987-08-19 | 1989-06-06 | Center For Innovative Technology | Eye movement detector |
US5621858A (en) | 1992-05-26 | 1997-04-15 | Ricoh Corporation | Neural network acoustic and visual speech recognition system training method and apparatus |
US5960394A (en) | 1992-11-13 | 1999-09-28 | Dragon Systems, Inc. | Method of speech command recognition with dynamic assignment of probabilities according to the state of the controlled applications |
US5632002A (en) | 1992-12-28 | 1997-05-20 | Kabushiki Kaisha Toshiba | Speech recognition interface system suitable for window systems and speech mail systems |
US5616078A (en) | 1993-12-28 | 1997-04-01 | Konami Co., Ltd. | Motion-controlled video entertainment system |
US5563988A (en) | 1994-08-01 | 1996-10-08 | Massachusetts Institute Of Technology | Method and system for facilitating wireless, full-body, real-time user interaction with a digitally represented visual environment |
US5594469A (en) | 1995-02-21 | 1997-01-14 | Mitsubishi Electric Information Technology Center America Inc. | Hand gesture machine control system |
US6385331B2 (en) | 1997-03-21 | 2002-05-07 | Takenaka Corporation | Hand pointing device |
US6266059B1 (en) * | 1997-08-27 | 2001-07-24 | Microsoft Corporation | User interface for switching between application modes |
US6434255B1 (en) | 1997-10-29 | 2002-08-13 | Takenaka Corporation | Hand pointing apparatus |
US6249763B1 (en) | 1997-11-17 | 2001-06-19 | International Business Machines Corporation | Speech recognition apparatus and method |
GB2350712A (en) | 1998-03-10 | 2000-12-06 | Fujitsu Ltd | Document processor and recording medium |
US6339758B1 (en) | 1998-07-31 | 2002-01-15 | Kabushiki Kaisha Toshiba | Noise suppress processing apparatus and method |
US6185529B1 (en) | 1998-09-14 | 2001-02-06 | International Business Machines Corporation | Speech recognition aided by lateral profile image |
US6272231B1 (en) | 1998-11-06 | 2001-08-07 | Eyematic Interfaces, Inc. | Wavelet-based facial motion capture for avatar animation |
US6750848B1 (en) | 1998-11-09 | 2004-06-15 | Timothy R. Pryor | More useful man machine interfaces and applications |
US7401783B2 (en) | 1999-07-08 | 2008-07-22 | Pryor Timothy R | Camera based man machine interfaces |
US6594629B1 (en) | 1999-08-06 | 2003-07-15 | International Business Machines Corporation | Methods and apparatus for audio-visual speech detection and recognition |
US6518957B1 (en) * | 1999-08-13 | 2003-02-11 | Nokia Mobile Phones Limited | Communications device with touch sensitive screen |
US20080158096A1 (en) | 1999-12-15 | 2008-07-03 | Automotive Technologies International, Inc. | Eye-Location Dependent Vehicular Heads-Up Display System |
US6404438B1 (en) | 1999-12-21 | 2002-06-11 | Electronic Arts, Inc. | Behavioral learning for a visual representation in a communication environment |
US6633305B1 (en) | 2000-06-05 | 2003-10-14 | Corel Corporation | System and method for magnifying and editing images |
US20030023435A1 (en) * | 2000-07-13 | 2003-01-30 | Josephson Daryl Craig | Interfacing apparatus and methods |
US6863609B2 (en) | 2000-08-11 | 2005-03-08 | Konami Corporation | Method for controlling movement of viewing point of simulated camera in 3D video game, and 3D video game machine |
WO2002015560A2 (en) | 2000-08-12 | 2002-02-21 | Georgia Tech Research Corporation | A system and method for capturing an image |
US7039198B2 (en) | 2000-11-10 | 2006-05-02 | Quindi | Acoustic source localization system and method |
US6728680B1 (en) | 2000-11-16 | 2004-04-27 | International Business Machines Corporation | Method and apparatus for providing visual feedback of speech production |
JP2002164990A (en) | 2000-11-28 | 2002-06-07 | Kyocera Corp | Mobile communication terminal |
US20030023953A1 (en) * | 2000-12-04 | 2003-01-30 | Lucassen John M. | MVC (model-view-controller) based multi-modal authoring tool and development environment |
US20020135618A1 (en) | 2001-02-05 | 2002-09-26 | International Business Machines Corporation | System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input |
US20020194005A1 (en) | 2001-03-27 | 2002-12-19 | Lahr Roy J. | Head-worn, trimodal device to increase transcription accuracy in a voice recognition system and to process unvocalized speech |
US7082393B2 (en) | 2001-03-27 | 2006-07-25 | Rast Associates, Llc | Head-worn, trimodal device to increase transcription accuracy in a voice recognition system and to process unvocalized speech |
US20020178344A1 (en) | 2001-05-22 | 2002-11-28 | Canon Kabushiki Kaisha | Apparatus for managing a multi-modal user interface |
JP2002351603A (en) | 2001-05-25 | 2002-12-06 | Mitsubishi Electric Corp | Portable information processor |
US6959102B2 (en) | 2001-05-29 | 2005-10-25 | International Business Machines Corporation | Method for increasing the signal-to-noise in IR-based eye gaze trackers |
US6868383B1 (en) | 2001-07-12 | 2005-03-15 | At&T Corp. | Systems and methods for extracting meaning from multimodal inputs using finite-state devices |
US7069215B1 (en) | 2001-07-12 | 2006-06-27 | At&T Corp. | Systems and methods for extracting meaning from multimodal inputs using finite-state devices |
US20030028382A1 (en) * | 2001-08-01 | 2003-02-06 | Robert Chambers | System and method for voice dictation and command input modes |
US6927694B1 (en) | 2001-08-20 | 2005-08-09 | Research Foundation Of The University Of Central Florida | Algorithm for monitoring head/eye motion for driver alertness with one camera |
US20060143006A1 (en) | 2001-10-22 | 2006-06-29 | Yasuharu Asano | Speech recognition apparatus and speech recognition method |
US20030083872A1 (en) | 2001-10-25 | 2003-05-01 | Dan Kikinis | Method and apparatus for enhancing voice recognition capabilities of voice recognition software and systems |
US20040205482A1 (en) | 2002-01-24 | 2004-10-14 | International Business Machines Corporation | Method and apparatus for active annotation of multimedia content |
US20110291926A1 (en) * | 2002-02-15 | 2011-12-01 | Canesta, Inc. | Gesture recognition system using depth perceptive sensors |
US20030171921A1 (en) | 2002-03-04 | 2003-09-11 | Ntt Docomo, Inc. | Speech recognition system, speech recognition method, speech synthesis system, speech synthesis method, and program product |
US7199767B2 (en) | 2002-03-07 | 2007-04-03 | Yechezkal Evan Spero | Enhanced vision for driving |
US20040046795A1 (en) * | 2002-03-08 | 2004-03-11 | Revelations In Design, Lp | Electric device control apparatus and methods for making and using same |
US20030190076A1 (en) | 2002-04-05 | 2003-10-09 | Bruno Delean | Vision-based operating method and system |
US20090307726A1 (en) | 2002-06-26 | 2009-12-10 | Andrew Christopher Levin | Systems and methods for recommending age-range appropriate episodes of program content |
US7760248B2 (en) | 2002-07-27 | 2010-07-20 | Sony Computer Entertainment Inc. | Selective sound source listening in conjunction with computer interactive processing |
US20040105573A1 (en) | 2002-10-15 | 2004-06-03 | Ulrich Neumann | Augmented virtual environments |
US7257575B1 (en) | 2002-10-24 | 2007-08-14 | At&T Corp. | Systems and methods for generating markup-language based expressions from multi-modal and unimodal inputs |
US20040080487A1 (en) * | 2002-10-29 | 2004-04-29 | Griffin Jason T. | Electronic device having keyboard for thumb typing |
US20040107103A1 (en) | 2002-11-29 | 2004-06-03 | Ibm Corporation | Assessing consistency between facial motion and speech signals in video |
US20040122666A1 (en) | 2002-12-18 | 2004-06-24 | Ahlenius Mark T. | Method and apparatus for displaying speech recognition results |
US20040140956A1 (en) | 2003-01-16 | 2004-07-22 | Kushler Clifford A. | System and method for continuous stroke word-based text input |
US20060224382A1 (en) | 2003-01-24 | 2006-10-05 | Moria Taneda | Noise reduction and audio-visual speech activity detection |
JP2004318826A (en) | 2003-04-04 | 2004-11-11 | Mitsubishi Electric Corp | Portable terminal device and character input method |
US20070071277A1 (en) | 2003-05-28 | 2007-03-29 | Koninklijke Philips Electronics | Apparatus and method for embedding a watermark using sub-band filtering |
US7949964B2 (en) | 2003-05-29 | 2011-05-24 | Computer Associates Think, Inc. | System and method for visualization of node-link structures |
US20040260438A1 (en) * | 2003-06-17 | 2004-12-23 | Chernetsky Victor V. | Synchronous voice user interface/graphical user interface |
US7613310B2 (en) | 2003-08-27 | 2009-11-03 | Sony Computer Entertainment Inc. | Audio input system |
US20050064912A1 (en) | 2003-09-19 | 2005-03-24 | Ki-Gon Yang | Hand-held phone capable of providing various vibrations with only one vibration motor |
US7587053B1 (en) | 2003-10-28 | 2009-09-08 | Nvidia Corporation | Audio-based position tracking |
US20050133693A1 (en) | 2003-12-18 | 2005-06-23 | Fouquet Julie E. | Method and system for wavelength-dependent imaging and detection using a hybrid filter |
US7301526B2 (en) * | 2004-03-23 | 2007-11-27 | Fujitsu Limited | Dynamic adaptation of gestures for motion controlled handheld devices |
US20050212754A1 (en) * | 2004-03-23 | 2005-09-29 | Marvit David L | Dynamic adaptation of gestures for motion controlled handheld devices |
US20070273611A1 (en) | 2004-04-01 | 2007-11-29 | Torch William C | Biosensors, communicators, and controllers monitoring eye movement and methods for using them |
US20050278467A1 (en) | 2004-05-25 | 2005-12-15 | Gupta Anurag K | Method and apparatus for classifying and ranking interpretations for multimodal input fusion |
US7519223B2 (en) | 2004-06-28 | 2009-04-14 | Microsoft Corporation | Recognizing gestures and using gestures for interacting with software applications |
US20060167784A1 (en) | 2004-09-10 | 2006-07-27 | Hoffberg Steven M | Game theoretic prioritization scheme for mobile ad hoc networks permitting hierarchal deference |
WO2006036069A1 (en) | 2004-09-27 | 2006-04-06 | Hans Gude Gudensen | Information processing system and method |
US20080266530A1 (en) | 2004-10-07 | 2008-10-30 | Japan Science And Technology Agency | Image Display Unit and Electronic Glasses |
US7379566B2 (en) | 2005-01-07 | 2008-05-27 | Gesturetek, Inc. | Optical flow based tilt sensor |
US20060155546A1 (en) | 2005-01-11 | 2006-07-13 | Gupta Anurag K | Method and system for controlling input modalities in a multimodal dialog system |
US20080136916A1 (en) | 2005-01-26 | 2008-06-12 | Robin Quincey Wolff | Eye tracker/head tracker/camera tracker controlled camera/weapon positioner control system |
US20060197753A1 (en) * | 2005-03-04 | 2006-09-07 | Hotelling Steven P | Multi-functional hand-held device |
CN1694045A (en) | 2005-06-02 | 2005-11-09 | 北京中星微电子有限公司 | Non-contact type visual control operation system and method |
US7761302B2 (en) | 2005-06-03 | 2010-07-20 | South Manchester University Hospitals Nhs Trust | Method for generating output data |
US20070002026A1 (en) * | 2005-07-01 | 2007-01-04 | Microsoft Corporation | Keyboard accelerator |
US20070025555A1 (en) | 2005-07-28 | 2007-02-01 | Fujitsu Limited | Method and apparatus for processing information, and computer product |
US7603143B2 (en) * | 2005-08-26 | 2009-10-13 | Lg Electronics Inc. | Mobile telecommunication handset having touch pad |
US20070061148A1 (en) * | 2005-09-13 | 2007-03-15 | Cross Charles W Jr | Displaying speech command input state information in a multimodal browser |
JP2007121489A (en) | 2005-10-26 | 2007-05-17 | Nec Corp | Portable display device |
US20070118520A1 (en) | 2005-11-07 | 2007-05-24 | Google Inc. | Local Search and Mapping for Mobile Devices |
US20070164989A1 (en) | 2006-01-17 | 2007-07-19 | Ciaran Thomas Rochford | 3-Dimensional Graphical User Interface |
US20070260972A1 (en) * | 2006-05-05 | 2007-11-08 | Kirusa, Inc. | Reusable multimodal application |
US20080005418A1 (en) | 2006-05-09 | 2008-01-03 | Jorge Julian | Interactive interface for electronic devices |
US20100030400A1 (en) * | 2006-06-09 | 2010-02-04 | Garmin International, Inc. | Automatic speech recognition system and method for aircraft |
US20080040692A1 (en) | 2006-06-29 | 2008-02-14 | Microsoft Corporation | Gesture input |
GB2440348A (en) | 2006-06-30 | 2008-01-30 | Motorola Inc | Positioning a cursor on a computer device user interface in response to images of an operator |
US20080013826A1 (en) | 2006-07-13 | 2008-01-17 | Northrop Grumman Corporation | Gesture recognition interface system |
US20080019589A1 (en) | 2006-07-19 | 2008-01-24 | Ho Sub Yoon | Method and apparatus for recognizing gesture in image processing system |
US20080059578A1 (en) * | 2006-09-06 | 2008-03-06 | Jacob C Albertson | Informing a user of gestures made by others out of the user's line of sight |
US20080174570A1 (en) | 2006-09-06 | 2008-07-24 | Apple Inc. | Touch Screen Device, Method, and Graphical User Interface for Determining Commands by Applying Heuristics |
US20100063880A1 (en) | 2006-09-13 | 2010-03-11 | Alon Atsmon | Providing content responsive to multimedia signals |
US20080072155A1 (en) * | 2006-09-19 | 2008-03-20 | Detweiler Samuel R | Method and apparatus for identifying hotkey conflicts |
US7599712B2 (en) * | 2006-09-27 | 2009-10-06 | Palm, Inc. | Apparatus and methods for providing directional commands for a mobile computing device |
JP2008097220A (en) | 2006-10-10 | 2008-04-24 | Nec Corp | Character input device, character input method and program |
US20080141181A1 (en) * | 2006-12-07 | 2008-06-12 | Kabushiki Kaisha Toshiba | Information processing apparatus, information processing method, and program |
US20080167868A1 (en) | 2007-01-04 | 2008-07-10 | Dimitri Kanevsky | Systems and methods for intelligent control of microphones for speech recognition applications |
JP2008186247A (en) | 2007-01-30 | 2008-08-14 | Oki Electric Ind Co Ltd | Face direction detector and face direction detection method |
US20080262849A1 (en) | 2007-02-02 | 2008-10-23 | Markus Buck | Voice control system |
US20080255850A1 (en) * | 2007-04-12 | 2008-10-16 | Cross Charles W | Providing Expressive User Interaction With A Multimodal Application |
US20080266257A1 (en) | 2007-04-24 | 2008-10-30 | Kuo-Ching Chiang | User motion detection mouse for electronic device |
US20080276196A1 (en) | 2007-05-04 | 2008-11-06 | Apple Inc. | Automatically adjusting media display in a personal display system |
US20090031240A1 (en) | 2007-07-27 | 2009-01-29 | Gesturetek, Inc. | Item selection using enhanced control |
US20090079813A1 (en) | 2007-09-24 | 2009-03-26 | Gesturetek, Inc. | Enhanced Interface for Voice and Video Communications |
US20090157206A1 (en) | 2007-12-13 | 2009-06-18 | Georgia Tech Research Corporation | Detecting User Gestures with a Personal Mobile Communication Device |
US20090153341A1 (en) | 2007-12-13 | 2009-06-18 | Karin Spalink | Motion activated user interface for mobile communications device |
US20090203408A1 (en) * | 2008-02-08 | 2009-08-13 | Novarra, Inc. | User Interface with Multiple Simultaneous Focus Areas |
US20090216529A1 (en) | 2008-02-27 | 2009-08-27 | Sony Ericsson Mobile Communications Ab | Electronic devices and methods that adapt filtering of a microphone signal responsive to recognition of a targeted speaker's voice |
US20120221929A1 (en) * | 2008-03-04 | 2012-08-30 | Gregory Dennis Bolsinga | Touch Event Processing for Web Pages |
US20090265627A1 (en) | 2008-04-17 | 2009-10-22 | Kim Joo Min | Method and device for controlling user interface based on user's gesture |
US20090313584A1 (en) | 2008-06-17 | 2009-12-17 | Apple Inc. | Systems and methods for adjusting a display based on the user's position |
US20100208914A1 (en) | 2008-06-24 | 2010-08-19 | Yoshio Ohtsuka | Microphone device |
US20110205156A1 (en) * | 2008-09-25 | 2011-08-25 | Movea S.A | Command by gesture interface |
US20100082341A1 (en) | 2008-09-30 | 2010-04-01 | Samsung Electronics Co., Ltd. | Speaker recognition device and method using voice signal analysis |
US20100092007A1 (en) | 2008-10-15 | 2010-04-15 | Microsoft Corporation | Dynamic Switching of Microphone Inputs for Identification of a Direction of a Source of Speech Sounds |
US20100105443A1 (en) * | 2008-10-27 | 2010-04-29 | Nokia Corporation | Methods and apparatuses for facilitating interaction with touch screen apparatuses |
US20100122167A1 (en) * | 2008-11-11 | 2010-05-13 | Pantech Co., Ltd. | System and method for controlling mobile terminal application using gesture |
US9304583B2 (en) | 2008-11-20 | 2016-04-05 | Amazon Technologies, Inc. | Movement recognition as input mechanism |
US8788977B2 (en) | 2008-11-20 | 2014-07-22 | Amazon Technologies, Inc. | Movement recognition as input mechanism |
US8150063B2 (en) | 2008-11-25 | 2012-04-03 | Apple Inc. | Stabilizing directional audio input from a moving microphone array |
US20100138680A1 (en) * | 2008-12-02 | 2010-06-03 | At&T Mobility Ii Llc | Automatic display and voice command activation with hand edge sensing |
US20100138224A1 (en) * | 2008-12-03 | 2010-06-03 | At&T Intellectual Property I, Lp. | Non-disruptive side conversation information retrieval |
US20100179811A1 (en) | 2009-01-13 | 2010-07-15 | Crim | Identifying keyword occurrences in audio data |
US20100188426A1 (en) | 2009-01-27 | 2010-07-29 | Kenta Ohmori | Display apparatus, display control method, and display control program |
US20100188328A1 (en) * | 2009-01-29 | 2010-07-29 | Microsoft Corporation | Environmental gesture recognition |
US8432366B2 (en) * | 2009-03-03 | 2013-04-30 | Microsoft Corporation | Touch discrimination |
US20100233996A1 (en) | 2009-03-16 | 2010-09-16 | Scott Herz | Capability model for mobile devices |
US20100241431A1 (en) | 2009-03-18 | 2010-09-23 | Robert Bosch Gmbh | System and Method for Multi-Modal Input Synchronization and Disambiguation |
US20100238323A1 (en) | 2009-03-23 | 2010-09-23 | Sony Ericsson Mobile Communications Ab | Voice-controlled image editing |
US20110035058A1 (en) * | 2009-03-30 | 2011-02-10 | Altorr Corporation | Patient-lifting-device controls |
US20100280983A1 (en) | 2009-04-30 | 2010-11-04 | Samsung Electronics Co., Ltd. | Apparatus and method for predicting user's intention based on multimodal information |
US20100283735A1 (en) * | 2009-05-07 | 2010-11-11 | Samsung Electronics Co., Ltd. | Method for activating user functions by types of input signals and portable terminal adapted to the method |
US20120030637A1 (en) * | 2009-06-19 | 2012-02-02 | Prasenjit Dey | Qualified command |
US20100328319A1 (en) | 2009-06-26 | 2010-12-30 | Sony Computer Entertainment Inc. | Information processor and information processing method for performing process adapted to user motion |
US20100332229A1 (en) * | 2009-06-30 | 2010-12-30 | Sony Corporation | Apparatus control based on visual lip share recognition |
US20120131098A1 (en) | 2009-07-24 | 2012-05-24 | Xped Holdings Pty Ltd | Remote control arrangement |
US20110032845A1 (en) | 2009-08-05 | 2011-02-10 | International Business Machines Corporation | Multimodal Teleconferencing |
US20110032182A1 (en) * | 2009-08-10 | 2011-02-10 | Samsung Electronics Co., Ltd. | Portable terminal having plural input devices and method for providing interaction thereof |
US20130063346A1 (en) | 2009-08-28 | 2013-03-14 | Ian George Fletcher-Price | Point and click device for a computer workstation |
US20110055846A1 (en) * | 2009-08-31 | 2011-03-03 | Microsoft Corporation | Techniques for using human gestures to control gesture unaware programs |
US20110254691A1 (en) | 2009-09-07 | 2011-10-20 | Sony Corporation | Display device and control method |
US20110071830A1 (en) | 2009-09-22 | 2011-03-24 | Hyundai Motor Company | Combined lip reading and voice recognition multimodal interface system |
US20110112921A1 (en) | 2009-11-10 | 2011-05-12 | Voicebox Technologies, Inc. | System and method for providing a natural language content dedication service |
US20130050458A1 (en) * | 2009-11-11 | 2013-02-28 | Sungun Kim | Display device and method of controlling the same |
US20110164105A1 (en) | 2010-01-06 | 2011-07-07 | Apple Inc. | Automatic video stream selection |
US8892446B2 (en) | 2010-01-18 | 2014-11-18 | Apple Inc. | Service orchestration for intelligent automated assistant |
US20110184735A1 (en) | 2010-01-22 | 2011-07-28 | Microsoft Corporation | Speech recognition analysis via identification information |
US20110193939A1 (en) * | 2010-02-09 | 2011-08-11 | Microsoft Corporation | Physical interaction zone for gesture-based user interfaces |
EP2365422A2 (en) * | 2010-03-08 | 2011-09-14 | Sony Corporation | Information processing apparatus controlled by hand gestures and corresponding method and program |
US8228292B1 (en) * | 2010-04-02 | 2012-07-24 | Google Inc. | Flipping for motion-based input |
US20110244924A1 (en) * | 2010-04-06 | 2011-10-06 | Lg Electronics Inc. | Mobile terminal and controlling method thereof |
US20110270609A1 (en) | 2010-04-30 | 2011-11-03 | American Teleconferencing Services Ltd. | Real-time speech-to-text conversion in an audio conference session |
US20110285807A1 (en) | 2010-05-18 | 2011-11-24 | Polycom, Inc. | Voice Tracking Camera with Speaker Identification |
US20120015674A1 (en) * | 2010-05-20 | 2012-01-19 | Google Inc. | Automatic Routing of Search Results |
US20130044080A1 (en) * | 2010-06-16 | 2013-02-21 | Holy Stone Enterprise Co., Ltd. | Dual-view display device operating method |
US8296151B2 (en) | 2010-06-18 | 2012-10-23 | Microsoft Corporation | Compound gesture-speech commands |
US20110313768A1 (en) | 2010-06-18 | 2011-12-22 | Christian Klein | Compound gesture-speech commands |
US20120057064A1 (en) | 2010-09-08 | 2012-03-08 | Apple Inc. | Camera-based orientation fix from portrait to landscape |
US8700392B1 (en) | 2010-09-10 | 2014-04-15 | Amazon Technologies, Inc. | Speech-inclusive device interfaces |
US20130182914A1 (en) | 2010-10-07 | 2013-07-18 | Sony Corporation | Information processing device and information processing method |
WO2012093779A2 (en) * | 2011-01-04 | 2012-07-12 | 목포대학교산학협력단 | User terminal supporting multimodal interface using user touch and breath and method for controlling same |
US20120257121A1 (en) | 2011-04-07 | 2012-10-11 | Sony Corporation | Next generation user interface for audio video display device such as tv with multiple user input modes and hierarchy thereof |
US20140043229A1 (en) | 2011-04-07 | 2014-02-13 | Nec Casio Mobile Communications, Ltd. | Input device, input method, and computer program |
US20120280900A1 (en) | 2011-05-06 | 2012-11-08 | Nokia Corporation | Gesture recognition using plural sensors |
US20140132505A1 (en) | 2011-05-23 | 2014-05-15 | Hewlett-Packard Development Company, L.P. | Multimodal interactions based on body postures |
US20120304132A1 (en) * | 2011-05-27 | 2012-11-29 | Chaitanya Dev Sareen | Switching back to a previously-interacted-with application |
US20140168074A1 (en) | 2011-07-08 | 2014-06-19 | The Dna Co., Ltd. | Method and terminal device for controlling content by sensing head gesture and hand gesture, and computer-readable recording medium |
US20130016129A1 (en) * | 2011-07-14 | 2013-01-17 | Google Inc. | Region-Specific User Input |
US20130021240A1 (en) | 2011-07-18 | 2013-01-24 | Stmicroelectronics (Rousset) Sas | Method and device for controlling an apparatus as a function of detecting persons in the vicinity of the apparatus |
WO2013021385A2 (en) * | 2011-08-11 | 2013-02-14 | Eyesight Mobile Technologies Ltd. | Gesture based interface system and method |
US20130050131A1 (en) * | 2011-08-23 | 2013-02-28 | Garmin Switzerland Gmbh | Hover based navigation user interface control |
US20130053007A1 (en) * | 2011-08-24 | 2013-02-28 | Microsoft Corporation | Gesture-based input mode selection for mobile devices |
US20130050263A1 (en) * | 2011-08-26 | 2013-02-28 | May-Li Khoe | Device, Method, and Graphical User Interface for Managing and Interacting with Concurrently Open Software Applications |
US20140210727A1 (en) * | 2011-10-03 | 2014-07-31 | Sony Ericsson Mobile Communications Ab | Electronic device with touch-based deactivation of touch input signaling |
US20140337016A1 (en) | 2011-10-17 | 2014-11-13 | Nuance Communications, Inc. | Speech Signal Enhancement Using Visual Information |
US20130127719A1 (en) * | 2011-11-18 | 2013-05-23 | Primax Electronics Ltd. | Multi-touch mouse |
US20130138424A1 (en) | 2011-11-28 | 2013-05-30 | Microsoft Corporation | Context-Aware Interaction System Using a Semantic Model |
US20140223384A1 (en) | 2011-12-29 | 2014-08-07 | David L. Graumann | Systems, methods, and apparatus for controlling gesture initiation and termination |
US20130169530A1 (en) | 2011-12-29 | 2013-07-04 | Khalifa University Of Science And Technology & Research (Kustar) | Human eye controlled computer mouse interface |
US20130191779A1 (en) * | 2012-01-20 | 2013-07-25 | Microsoft Corporation | Display of user interface elements based on touch or hardware input |
US20130187855A1 (en) * | 2012-01-20 | 2013-07-25 | Microsoft Corporation | Touch mode and input type recognition |
US20130190054A1 (en) | 2012-01-24 | 2013-07-25 | Charles J. Kulas | User interface for a portable device including detecting proximity of a finger near a touchscreen to prevent changing the display |
US20130207898A1 (en) * | 2012-02-14 | 2013-08-15 | Microsoft Corporation | Equal Access to Speech and Touch Input |
US20130227419A1 (en) * | 2012-02-24 | 2013-08-29 | Pantech Co., Ltd. | Apparatus and method for switching active application |
US20130265437A1 (en) * | 2012-04-09 | 2013-10-10 | Sony Mobile Communications Ab | Content transfer via skin input |
US20130293488A1 (en) | 2012-05-02 | 2013-11-07 | Lg Electronics Inc. | Mobile terminal and control method thereof |
US20130304479A1 (en) | 2012-05-08 | 2013-11-14 | Google Inc. | Sustained Eye Gaze for Determining Intent to Interact |
US20150019227A1 (en) * | 2012-05-16 | 2015-01-15 | Xtreme Interactions, Inc. | System, device and method for processing interlaced multimodal user input |
US20130311508A1 (en) * | 2012-05-17 | 2013-11-21 | Grit Denker | Method, apparatus, and system for facilitating cross-application searching and retrieval of content using a contextual user model |
US20130332160A1 (en) | 2012-06-12 | 2013-12-12 | John G. Posa | Smart phone with self-training, lip-reading and eye-tracking capabilities |
US20130342480A1 (en) * | 2012-06-21 | 2013-12-26 | Pantech Co., Ltd. | Apparatus and method for controlling a terminal using a touch input |
US20130344859A1 (en) * | 2012-06-21 | 2013-12-26 | Cellepathy Ltd. | Device context determination in transportation and other scenarios |
US20140007019A1 (en) | 2012-06-29 | 2014-01-02 | Nokia Corporation | Method and apparatus for related user inputs |
US20150161992A1 (en) | 2012-07-09 | 2015-06-11 | Lg Electronics Inc. | Speech recognition apparatus and method |
US20140050370A1 (en) | 2012-08-15 | 2014-02-20 | International Business Machines Corporation | Ocular biometric authentication with system verification |
US9007301B1 (en) | 2012-10-11 | 2015-04-14 | Google Inc. | User interface |
US20140214415A1 (en) | 2013-01-25 | 2014-07-31 | Microsoft Corporation | Using visual cues to disambiguate speech inputs |
US8744645B1 (en) | 2013-02-26 | 2014-06-03 | Honda Motor Co., Ltd. | System and method for incorporating gesture and voice recognition into a single system |
US9035874B1 (en) | 2013-03-08 | 2015-05-19 | Amazon Technologies, Inc. | Providing user input to a computing device with an eye closure |
US20140282272A1 (en) * | 2013-03-15 | 2014-09-18 | Qualcomm Incorporated | Interactive Inputs for a Background Task |
US9026939B2 (en) * | 2013-06-13 | 2015-05-05 | Google Inc. | Automatically switching between input modes for a user interface |
Non-Patent Citations (50)
Title |
---|
"Face Detection: Technology Puts Portraits in Focus", Consumerreports.org, http://www.comsumerreports.org/cro/electronics-computers/camera-photograph/cameras, 2007, 1 page. |
"Final Office Action dated Apr. 16, 2013", U.S. Appl. No. 12/902,986, filed Apr. 16, 2013, 31 pages. |
"Final Office Action dated Feb. 26, 2013", U.S. Appl. No. 12/879,981, filed Feb. 26, 2013, 29 pages. |
"Final Office Action dated Jun. 6, 2013", U.S. Appl. No. 12/332,049, 70 pages. |
"Final Office Action dated Oct. 27, 2011", U.S. Appl. No. 12/332,049, 66 pages. |
"First Office Action dated Mar. 22, 2013", China Application 200980146841.0, 40 pages. |
"International Search Report dated Apr. 7, 2010", International Application PCT/US09/65364, dated Apr. 7, 2010, 2 pages. |
"International Supplementary Search Report dated Aug. 19, 2014" Europe Application 09828299.9, 3 pages. |
"International Supplementary Search Report dated Jul. 23, 2014" Europe Application 09828299.9, 16 pages. |
"International Written Opinion dated Apr. 7, 2010", International Application PCT/US09/65364, Apr. 7, 2010, 7 pages. |
"Introducing the Wii MotionPlus, Nintendo's Upcoming Accessory for The Revolutionary Wii Remote at Nintendo:: What's New", Nintendo Games, http://www.nintendo.com/whatsnew/detail/eMMuRj_N6vntHPDycCJAKWhE09zBvyPH, Jul. 14, 2008, 2 pages. |
"Non Final Office Action dated Apr. 2, 2013", Japan Application 2011-537661, 2 pages. |
"Non Final Office Action dated Aug. 8, 2014" U.S. Appl. No. 13/791,265, 25 pages. |
"Non Final Office Action dated Dec. 26, 2012", U.S. Appl. No. 12/902,986, filed Dec. 26, 2012, 27 pages. |
"Non Final Office Action dated Jun. 11, 2011", U.S. Appl. No. 12/332,049, 53 pages. |
"Non Final Office Action dated Nov. 13, 2012", U.S. Appl. No. 12/879,981, filed Nov. 13, 2012, 27 pages. |
"Non Final Office Action dated Nov. 7, 2012", U.S. Appl. No. 12/332,049, 64 pages. |
"Non Final Office Action dated Oct. 6, 2014" U.S. Appl. No. 14/298,577, 9 pages. |
"Non Final Office Action dated Oct. 8, 2014" U.S. Appl. No. 12/902,986, 37 pages. |
"Notice of Allowance dated May 13, 2013", U.S. Appl. No. 12/879,981, filed May 13, 2013, 9 pages. |
"Office Action dated May 13, 2013", Canada Application 2,743,914, 2 pages. |
"Reexamination Report dated Sep. 9, 2014" Japan Application 2011-537661, 3 pages. |
"Third Office Action dated Jun. 3, 2014" China Application 20098014641.0, 17 pages. |
Bilmes, Jeff A., "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", International Computer Science Institute 4, No. 510, Email from D. Nguyen to J.O'Neill (Amazon) sent Jun. 5, 2013, 1998, 15 pages. |
Brashear, Helene et al., "Using Multiple Sensors for Mobile Sign Language Recognition", International Symposium on Wearable Computers, 2003, 8 pages. |
Cappelletta, Luca et al.; "Phoneme-to-Viseme mapping for visual speech recognition", Department of Electronic and Electrical Engineering, Trinity College Dublin, Ireland, 2012. |
Cornell, Jay, "Does this Headline Know You're Reading It?", h+ Magazine, located at <http://hplusmagazine.com/articles/ai/does-headline-know-you%E2%80%99re-reading-it>, last accessed on Jun. 7, 2010, Mar. 19, 2010, 4 pages. |
D. Weimer and S. K. Ganapathy. 1989. A synthetic visual environment with hand gesturing and voice input. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '89), K. Bice and C. Lewis (Eds.). ACM, New York, NY, USA, 235-240. DOI=http://dx.doi.org/10.1145/67449.67495. * |
Faceshift Documentation: Faceshift Studio Beta, http://www.faceshift.com/help/studio/beta/, 2012. |
Final Office Action dated Oct. 23, 2013; in corresponding U.S. Appl. No. 12/786,297. |
Haro, Antonio et al., "Mobile Camera-Based Adaptive Viewing", MUM '05 Proceedings of the 4th International Conference on Mobile and Ubiquitous Multimedia, 2005, 6 pages. |
Hartley, Richard et al.; "Multiple View Geometry in Computer Vision", vol. 2; Cambridge, 2000. |
Hjelmas, Erik, et al., "Face Detection: A Survey," Computer Vision and Image Understanding 83, No. 3, 2001, pp. 236-274 (previously listed in the IDS filed Nov. 11, 2013 but lined through in the corresponding 1449 because document was omitted). |
Horn, Berthold et al.; "Determining Optical Flow", Artificial Intelligence 17, No. 1, 1981, pp. 185-203. |
International Preliminary Examination Report on Patentability dated Oct. 17, 2013; in corresponding PCT patent application No. PCT/US2012/032148. |
Lucas, Bruce et al.; "An Iterative Image Registration Technique with an application to stereo vision", Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI) Aug. 24-28, 1981; Vancouver, British Columbia, 1981, pp. 674-679. |
Niklfeld, Georg, Robert Finan, and Michael Pucher. "Architecture for adaptive multimodal dialog systems based on voiceXML." INTERSPEECH. 2001. * |
Nokia N95 8GB Data Sheet, Nokia, 2007, 1 page. |
Non-Final Office Action dated Mar. 28, 2013; in corresponding U.S. Appl. No. 12/786,297. |
Notice of Allowance and Fee(s) Due dated Jan. 6, 2014; in corresponding U.S. Appl. No. 12/879,981. |
Notice of Allowance and Fee(s) Due dated Mar. 4, 2014; in corresponding U.S. Appl. No. 12/332,049. |
Padilla, Raymond, "Eye Toy (PS2)", http://www.archive.gamespy.com/hardware/august03/eyetoyps2/index.shtml, Aug. 16, 2003, 2 pages. |
Purcell, , "Maximum Likelihood Estimation Primer", http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_1.html, May 20, 2007. |
Schneider, Jason , "Does Face Detection Technology Really Work? Can the hottest new digital camera feature of 2007 actually improve your people pictures? Here's the surprising answer!", http://www.adorama.com/catalog.tpl?article=052107op=academy_new, May 21, 2007, 5 pages. |
Tyser, Peter , "Control an iPod with Gestures", http://www.videsignline.com/howto/170702555, Sep. 11, 2005, 4 pages. |
Valin, Jean-Marc et al., "Robust Sound Source Localization Using a Microphone Array on a Mobile Robot", Research Laboratory on Mobile Robotics and Intelligent Systems; Department of Electrical Engineering and Computer Engineering; Universite de Sherbrooke, Quebec, Canada, 9 pages. |
Van Veen, Barry D et al., "Beamforming: A Versatile Approach to Spatial Filtering", IEEE ASSP Magazine, 1988. |
Vanden Berg, Thomas; "Near Infrared Light Absorption in the Human Eye Media", Vision Res., vol. 37, No. 2, 1997, pp. 249-253. |
Yang, Ming-Hsuan et al., "Detecting Faces in Images: A Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, No. 1, 2002, pp. 34-58. |
Zyga, Lisa , "Hacking the Wii Remote for Physics Class", PHYSorg.com, http://www.physorg.com/news104502773.html, Jul. 24, 2007, 2 pages. |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230158886A1 (en) * | 2020-03-17 | 2023-05-25 | Audi Ag | Operator control device for operating an infotainment system, method for providing an audible signal for an operator control device, and motor vehicle having an operator control device |
US20220283694A1 (en) * | 2021-03-08 | 2022-09-08 | Samsung Electronics Co., Ltd. | Enhanced user interface (ui) button control for mobile applications |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102423826B1 (en) | User terminal device and methods for controlling the user terminal device thereof | |
US11175726B2 (en) | Gesture actions for interface elements | |
WO2021244443A1 (en) | Split-screen display method, electronic device, and computer readable storage medium | |
US9268407B1 (en) | Interface elements for managing gesture control | |
US9483113B1 (en) | Providing user input to a computing device with an eye closure | |
US10891005B2 (en) | Electronic device with bent display and method for controlling thereof | |
US10139898B2 (en) | Distracted browsing modes | |
US9378581B2 (en) | Approaches for highlighting active interface elements | |
JP6275706B2 (en) | Text recognition driven functionality | |
US9075514B1 (en) | Interface selection element display | |
US9213436B2 (en) | Fingertip location for gesture input | |
US9501218B2 (en) | Increasing touch and/or hover accuracy on a touch-enabled device | |
US20140282269A1 (en) | Non-occluded display for hover interactions | |
US9377860B1 (en) | Enabling gesture input for controlling a presentation of content | |
US9201585B1 (en) | User interface navigation gestures | |
US9411412B1 (en) | Controlling a computing device based on user movement about various angular ranges | |
US11803233B2 (en) | IMU for touch detection | |
WO2021213449A1 (en) | Touch operation method and device | |
US9110541B1 (en) | Interface selection approaches for multi-dimensional input | |
US9400575B1 (en) | Finger detection for element selection | |
US20140354564A1 (en) | Electronic device for executing application in response to user input | |
US9471154B1 (en) | Determining which hand is holding a device | |
KR102030669B1 (en) | Login management method and mobile terminal for implementing the same | |
US9350918B1 (en) | Gesture control for managing an image view display | |
US11199906B1 (en) | Global user input management |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |