WO2014149700A1 - System and method for assigning voice and gesture command areas - Google Patents

System and method for assigning voice and gesture command areas

Info

Publication number
WO2014149700A1
Authority
WO
WIPO (PCT)
Prior art keywords
user input
user
voice
air
gesture
Prior art date
Application number
PCT/US2014/020479
Other languages
English (en)
Inventor
Glen J. Anderson
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corporation filed Critical Intel Corporation
Priority to EP14769838.5A priority Critical patent/EP2972685A4/fr
Priority to KR1020157021980A priority patent/KR101688359B1/ko
Priority to CN201480009014.8A priority patent/CN105074620B/zh
Priority to JP2015558234A priority patent/JP2016512632A/ja
Publication of WO2014149700A1 publication Critical patent/WO2014149700A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures

Definitions

  • the present disclosure relates to user interfaces, and, more particularly, to a system and method for assigning voice and air-gesture command areas for interacting with and controlling multiple applications in a computing environment.
  • each window may display information and/or contain an interface for interacting with and controlling corresponding applications executed on the computing system.
  • one window may correspond to a word processing application and display a letter in progress
  • another window may correspond to a web browser and display a web page
  • another window may correspond to a media player application and display a video.
  • Windows may be presented on a user's computer display in an area metaphorically referred to as the "desktop".
  • Current computing systems allow a user to maintain a plurality of open windows on the display, such that information associated with each window is continuously and readily available to the user.
  • When multiple windows are displayed simultaneously, they may be displayed independently at the same time or may partially or completely overlap one another.
  • the presentation of multiple windows may result in a cluttered display and may require the user to continuously manipulate each window to control its associated content.
  • the management of and user interaction with multiple windows within a display may further be complicated in computing systems incorporating user-performed air-gesture input technology.
  • Some current computing systems accept user input through user-performed air-gestures for interacting with and controlling applications on the computing system.
  • these user-performed gestures, made in free space rather than on a touch screen, are referred to as air-gestures (as opposed to touch-screen gestures).
  • extraneous air-gestures may cause unwanted interaction and input with one of a plurality of running applications. This may be particularly true when a user attempts air-gestures in a multi-windowed display, wherein the user intends to interact with only one of the plurality of open windows. For example, a user may wish to control playback of a song on a media player window currently open on a display having additional open windows. The user may perform an air-gesture associated with the "play" command for the media player, such as a wave of the user's hand in a predefined motion. However, the same air-gesture may represent a different command for another application. For example, the air-gesture representing the "play" command on the media player may also represent an "exit" command for the web browser.
  • a user's air-gesture may be ambiguous with regard to the particular application the user intends to control.
  • the computing system may not be able to recognize that the user's air-gesture was intended to control the media player, and instead may cause the user's air-gesture to control a different and unintended application. This may be particularly frustrating for the user and may require a greater degree of user interaction with the computing system in order to control desired applications and programs.
  • FIG. 1 is a block diagram illustrating one embodiment of a system for assigning voice and air-gesture command areas consistent with the present disclosure
  • FIG. 2 is a block diagram illustrating another embodiment of a system for assigning voice and air-gesture command areas consistent with the present disclosure
  • FIG. 3 is a block diagram illustrating the system of FIG. 1 in greater detail
  • FIG. 4 illustrates an electronic display including an exemplary graphical user interface (GUI) having multiple windows displayed thereon and assigned voice and air-gesture command areas for interacting with the multiple windows consistent with the present disclosure
  • FIG. 5 illustrates a perspective view of a computing environment including the electronic display and GUI and assigned voice and air-gesture command areas of FIG. 4 and a user for interacting with the GUI via the command areas consistent with various embodiments of the present disclosure
  • FIG. 6 is a flow diagram illustrating one embodiment for assigning voice and air-gesture command areas consistent with the present disclosure.
  • the present disclosure is generally directed to a system and method for assigning user input command areas for receiving user voice and air-gesture commands and allowing user interaction and control of a plurality of applications based on assigned user input command areas.
  • the system includes a voice and air-gesture capturing system configured to monitor user interaction with one or more applications via a GUI within a computing environment.
  • the GUI may include, for example, multiple open windows presented on an electronic display, wherein each window corresponds to an open and running application.
  • the voice and air-gesture capturing system is configured to allow a user to assign user input command areas for one or more applications corresponding to, for example, each of the multiple windows, wherein each user input command area defines a three-dimensional space within the computing environment and in relation to at least the electronic display.
  • the voice and air-gesture capturing system is configured to receive data captured by one or more sensors in the computing environment, wherein the data includes user speech and/or air-gesture commands within one or more user input command areas.
  • the voice and air-gesture capturing system is further configured to identify user input based on analysis of the captured data. More specifically, the voice and air-gesture capturing system is configured to identify specific voice and/or air-gesture commands performed by the user, as well as corresponding user input command areas in which the voice and/or air-gesture commands occurred.
  • the voice and air-gesture capturing system is further configured to identify an application corresponding to the user input based, at least in part, on the identified user input command area and allow the user to interact with and control the identified application based on the user input.
  • a system consistent with the present disclosure provides a user with an improved means of managing and interacting with a variety of applications by way of assigned user input command areas within a computing environment.
  • the system is configured to provide an efficient and effective means of controlling the applications associated with each window.
  • the system is configured to allow a user to assign a three-dimensional command area corresponding to each window presented on the display, such that the user may interact with and control each window and an associated application based on voice and/or air-gesture commands performed within the corresponding three-dimensional command area.
  • a system consistent with the present disclosure allows a user to utilize the same voice and/or air-gesture command to control a variety of different windows by performing such command within one of the assigned user input command areas, thereby lessening the chance for ambiguity and interaction with an unintended window and associated application.
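To make the idea above concrete, the following sketch (Python, with hypothetical names and mappings not taken from the disclosure) shows how one and the same air-gesture can resolve to different commands depending on the command area in which it was performed:

```python
# Illustrative sketch only -- names and mappings are hypothetical, not from the
# disclosure. The same air-gesture resolves to a different command depending on
# the command area in which it was performed.

# Hypothetical assignment of command areas to applications.
AREA_TO_APP = {
    "area_c": "media_player",
    "area_e": "web_browser",
}

# Hypothetical per-application gesture vocabularies.
APP_GESTURES = {
    "media_player": {"wave_up": "play", "wave_down": "pause"},
    "web_browser": {"wave_up": "scroll_up", "swipe_left": "back"},
}

def dispatch(gesture, area):
    """Resolve a gesture to (application, command) using the area it occurred in."""
    app = AREA_TO_APP.get(area)
    if app is None:
        return None  # gesture performed outside any assigned command area
    command = APP_GESTURES[app].get(gesture)
    return (app, command) if command else None

# The same "wave_up" gesture controls different applications:
print(dispatch("wave_up", "area_c"))  # ('media_player', 'play')
print(dispatch("wave_up", "area_e"))  # ('web_browser', 'scroll_up')
```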
  • the system includes a computing device 12, a voice and air-gesture capturing system 14, one or more sensors 16 and an electronic display 18.
  • the voice and air-gesture capturing system 14 is configured to monitor a computing environment and identify user input and interaction with a graphical user interface (GUI) presented on the electronic display 18 within the computing environment. More specifically, the voice and air-gesture capturing system 14 is configured to allow a user to efficiently and effectively manage multiple open windows of the GUI presented on the electronic display 18, wherein each window corresponds to an open and running application of the computing device 12.
  • the voice and air-gesture capturing system 14 is configured to allow a user to assign user input command areas for each of the multiple windows, wherein each user input command area defines a three-dimensional space within the computing environment and in relation to at least the electronic display 18 (shown in FIGS. 4 and 5).
  • the voice and air-gesture capturing system 14 is configured to receive data captured by the one or more sensors 16 in the computing environment.
  • the one or more sensors 16 may be configured to capture at least one of user speech and air-gesture commands within one or more assigned user input command areas of the computing environment, described in greater detail herein.
  • Upon receiving and processing data captured by the one or more sensors 16, the voice and air-gesture capturing system 14 is configured to identify user input based on the captured data.
  • the identified user input may include specific voice and/or air-gesture commands performed by the user, as well as corresponding user input command areas in which the voice and/or air-gesture commands occurred.
  • the voice and air-gesture capturing system 14 is further configured to identify a window corresponding to the user input based, at least in part, on the identified user input command area and allow the user to interact with and control the identified window and associated application based on the user input.
  • the computing device 12, voice and air-gesture capturing system 14, one or more sensors 16 and electronic display 18 may be configured to communicate with one another via any known wired or wireless communication transmission protocol.
  • the computing device 12 may include hardware components and/or software components such that the computing device 12 may be used to execute applications, such as gaming applications, non-gaming applications, or the like.
  • one or more running applications may include associated windows presented on a user interface of the electronic display 18.
  • the computing device 12 may include, but is not limited to, a personal computer (PC) (e.g. desktop or notebook computer), tablet computer, netbook computer, smart phone, portable video game device, video game console, personal digital assistant (PDA), portable media player (PMP), e-book, mobile internet device, personal navigation device, or other computing device.
  • the electronic display 18 may include any audiovisual display device configured to receive input from the computing device 12 and voice and air-gesture capturing system 14 and provide visual and/or audio information related to the input.
  • the electronic display 18 is configured to provide visuals and/or audio of one or more applications executed on the computing device 12 and based on user input from the voice and air-gesture capturing system 14.
  • the electronic display 18 may include, but is not limited to, a television, a monitor, electronic billboard, high-definition television (HDTV), or the like.
  • the voice and air-gesture capturing system 14, one or more sensors 16 and electronic display 18 are separate from one another.
  • the computing device 12 may optionally include the one or more sensors 16 and/or electronic display 18, as shown in the system 10a of FIG. 2, for example.
  • the optional inclusion of the one or more sensors 16 and/or electronic display 18 as part of the computing device 12, rather than elements external to computing device 12, is denoted in FIG. 2 with broken lines.
  • the voice and air-gesture capturing system 14 may be separate from the computing device 12.
  • the voice and air-gesture capturing system 14 is configured to receive data captured from at least one sensor 16.
  • the system 10 may include a variety of sensors configured to capture various attributes of at least one user within a computing environment, such as, for example, physical characteristics of the user, including movement of one or more parts of the user's body, and audible characteristics, including voice input from the user.
  • the system 10 includes at least one camera 20 configured to capture digital images of the computing environment and one or more users within it, and at least one microphone 22 configured to capture sound data of the environment, including voice data of the one or more users.
  • FIG. 3 further illustrates the voice and air-gesture capturing system 14 of FIG. 1 in greater detail.
  • The voice and air-gesture capturing system 14 shown in FIG. 3 is one example of a voice and air-gesture capturing system consistent with the present disclosure.
  • a voice and air-gesture capturing system consistent with the present disclosure may have more or fewer components than shown, may combine two or more components, or may have a different configuration or arrangement of the components.
  • the various components shown in FIG. 3 may be implemented in hardware, software, or a combination of hardware and software.
  • the voice and air-gesture capturing system 14 further includes a speech and gesture recognition module 26 configured to receive data captured by at least one of the sensors 16 and establish user input 28 based on the captured data.
  • the speech and gesture recognition module 26 is configured to receive one or more digital images captured by the at least one camera 20.
  • the camera 20 includes any device (known or later discovered) for capturing digital images representative of a computing environment and one or more users within the computing environment.
  • the camera 20 may include a still camera (i.e., a camera configured to capture still photographs) or a video camera (i.e., a camera configured to capture a plurality of moving images in a plurality of frames).
  • the camera 20 may be configured to capture images in the visible spectrum or with other portions of the electromagnetic spectrum (e.g., but not limited to, the infrared spectrum, ultraviolet spectrum, etc.).
  • the camera 20 may be further configured to capture digital images with depth information, such as, for example, depth values determined by any technique (known or later discovered) for determining depth values, described in greater detail herein.
  • the camera 20 may include a depth camera that may be configured to capture the depth image of a scene within the computing environment.
  • the camera 20 may also include a three-dimensional (3D) camera and/or an RGB camera configured to capture the depth image of a scene.
  • the camera 20 may be incorporated within the computing device 12 and/or voice and air-gesture capturing system 14 or may be a separate device configured to communicate with the computing device 12 and voice and air-gesture capturing system 14 via wired or wireless communication.
  • Specific examples of the camera 20 may include wired (e.g., Universal Serial Bus (USB), Ethernet, Firewire, etc.) or wireless (e.g., WiFi, Bluetooth, etc.) web cameras as may be associated with computers, video monitors, etc., mobile device cameras (e.g., cell phone or smart phone cameras integrated in, for example, the previously discussed example computing devices), integrated laptop computer cameras, integrated tablet computer cameras, etc.
  • the system 10 may include a single camera 20 within the computing environment positioned in a desired location, such as, for example, adjacent the electronic display 18 (shown in FIG. 5) and configured to capture images of the computing environment and one or more users within the computing environment within close proximity to the electronic display 18.
  • the system 10 may include multiple cameras 20 positioned in various positions within the computing environment to capture images of one or more users within the environment from different angles so as to obtain visual stereo, for example, to be used in determining depth information.
  • the speech and gesture recognition module 26 may be configured to identify one or more parts of a user's body within image(s) provided by the camera 20 and track movement of such identified body parts to determine one or more air-gestures performed by the user.
  • the speech and gesture recognition module 26 may include custom, proprietary, known and/or after-developed identification and detection code (or instruction sets), hardware, and/or firmware that are generally well-defined and operable to receive an image (e.g., but not limited to, an RGB color image) and identify, at least to a certain extent, a user's hand in the image and track the detected hand through a series of images to determine an air-gesture based on hand movement.
  • the speech and gesture recognition module 26 may be configured to identify and track movement of a variety of body parts and regions, including, but not limited to, head, torso, arms, hands, legs, feet and the overall position of a user within a scene.
  • the speech and gesture recognition module 26 may further be configured to identify a specific spatial area within the computing environment in which movement of the user's identified body part occurred.
  • the speech and gesture recognition module 26 may include custom, proprietary, known and/or after-developed spatial recognition code (or instruction sets), hardware, and/or firmware that are generally well-defined and operable to identify, at least to a certain extent, one of a plurality of user input command areas in which movement of an identified user body part, such as the user's hand, occurred.
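A minimal sketch of such a spatial-recognition step, assuming each command area is approximated as an axis-aligned box in display-relative coordinates (the geometry, dimensions, and names are illustrative assumptions, not details from the disclosure):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Illustrative geometry only: each command area is modeled as an axis-aligned
# 3D box in display-relative coordinates (x to the right, y up, z out of the
# screen toward the user), and a tracked hand position is matched to the first
# area that contains it.

@dataclass
class CommandArea:
    name: str
    min_corner: Tuple[float, float, float]  # metres relative to display centre
    max_corner: Tuple[float, float, float]

    def contains(self, point: Tuple[float, float, float]) -> bool:
        return all(lo <= p <= hi
                   for p, lo, hi in zip(point, self.min_corner, self.max_corner))

def locate_area(hand_position, areas) -> Optional[str]:
    """Return the name of the command area containing the hand, or None."""
    for area in areas:
        if area.contains(hand_position):
            return area.name
    return None

areas = [
    CommandArea("area_c", (-0.9, -0.3, 0.2), (-0.3, 0.3, 0.8)),  # left of display
    CommandArea("area_e", (0.3, -0.3, 0.2), (0.9, 0.3, 0.8)),    # right of display
]
print(locate_area((-0.5, 0.0, 0.5), areas))  # 'area_c'
```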
  • the speech and gesture recognition module 26 is further configured to receive voice data of a user in the computing environment captured by the at least one microphone 22.
  • the microphone 22 includes any device (known or later discovered) for capturing voice data of one or more persons, and may have adequate digital resolution for voice analysis of the one or more persons. It should be noted that the microphone 22 may be incorporated within the computing device 12 and/or voice and air-gesture capturing system 14 or may be a separate device configured to communicate with the voice and air-gesture capturing system 14 via any known wired or wireless communication.
  • the speech and gesture recognition module 26 may be configured to use any known speech analyzing methodology to identify particular subject matter of the voice data.
  • the speech and gesture recognition module 26 may include custom, proprietary, known and/or after-developed speech recognition and characteristics code (or instruction sets), hardware, and/or firmware that are generally well-defined and operable to receive voice data and translate speech into text data.
  • the speech and gesture recognition module 26 may be configured to identify one or more spoken commands from the user for interaction with one or more windows of the GUI on the electronic display, as generally understood by one skilled in the art.
  • the speech and gesture recognition module 26 may be further configured to identify a specific spatial area within the computing environment in which the user's voice input was projected or occurred.
  • the speech and gesture recognition module 26 may include custom, proprietary, known and/or after-developed spatial recognition code (or instruction sets), hardware, and/or firmware that are generally well-defined and operable to identify, at least to a certain extent, one of a plurality of user input command areas toward or within which a user's voice input was projected.
  • the system 10 may include a single microphone configured to capture voice data within the computing environment.
  • the system 10 may include an array of microphones positioned throughout the computing environment, each microphone configured to capture voice data of a particular area of the computing environment, thereby enabling spatial recognition.
  • a first microphone may be positioned on one side of the electronic display 18 and configured to capture only voice input directed towards that side of the display 18.
  • a second microphone may be positioned on the opposing side of the display 18 and configured to capture only voice input directed towards that opposing side of the display.
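A rough sketch of how such a microphone arrangement could attribute a voice command to a command area, assuming one microphone per area and a simple loudest-signal heuristic (an assumption for illustration; the disclosure does not specify this mechanism):

```python
import math

# Illustrative heuristic only: with one microphone assigned to each command
# area, a voice command is attributed to the area whose microphone captured
# the highest RMS energy during the utterance.

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0

def loudest_area(samples_by_area):
    """samples_by_area maps a command-area name to that microphone's samples."""
    return max(samples_by_area, key=lambda area: rms(samples_by_area[area]))

capture = {
    "area_c": [0.02, -0.03, 0.05, -0.04],  # microphone on one side of the display
    "area_e": [0.30, -0.28, 0.35, -0.31],  # microphone on the opposing side (louder)
}
print(loudest_area(capture))  # 'area_e'
```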
  • Upon receiving and analyzing the captured data, including images and/or voice data, from the sensors 16, the speech and gesture recognition module 26 is configured to generate user input 28 based on the analysis of the captured data.
  • the user input 28 may include, but is not limited to, identified air-gestures based on user movement, corresponding user input command areas in which air-gestures occurred, voice commands and corresponding user input command areas in which voice commands were directed towards or occurred within.
  • the voice and air-gesture capturing system 14 further includes an application control module 30 configured to allow a user to interact with each window and associated application presented on the electronic display 18. More specifically, the application control module 30 is configured to receive user input 28 from the speech and gesture recognition module 26 and identify one or more applications to be controlled based on the user input 28.
  • the voice and air-gesture capturing system 14 includes an input mapping module 32 configured to allow a user to assign user input command areas for a corresponding one of a plurality of applications or functions configured to be executed on the computing device 12.
  • the input mapping module 32 may include custom, proprietary, known and/or after-developed training code (or instruction sets), hardware, and/or firmware that are generally well-defined and operable to allow a user to assign a predefined user input command area of the computing environment to a corresponding application from an application database 34, such that any user input (e.g. voice and/or air-gesture commands) within an assigned user input command area will result in control of one or more parameters of the corresponding application.
  • the application control module 30 may be configured to compare data related to the received user input 28 with data associated with one or more assignment profiles 33(1)-33(n) stored in the input mapping module 32 to identify an application associated with the user input 28.
  • the application control module 30 may be configured to compare the identified user input command areas of the user input 28 with assignment profiles 33(1)-33(n) in order to find a profile that has a matching user input command area.
  • Each assignment profile 33 may generally include data related to one of a plurality of user input command areas of the computing environment and the corresponding application to which the one input command area is assigned.
  • a computing environment may include six different user input command areas, wherein each command area may be associated with a separate application. As such, any voice and/or air-gestures performed within a particular user input command area will only control parameters of the application associated with that particular user input command area.
  • Upon finding a matching profile in the input mapping module 32, by any known or later discovered matching technique, the application control module 30 is configured to identify an application from the application database 34 to which the user input command area in which the voice and/or gesture commands occurred is assigned, based on the data of the matching profile. The application control module 30 is further configured to permit user control of one or more parameters of the running application based on the user input 28 (e.g. voice and/or air-gesture commands). As generally understood, each application may have a predefined set of known voice and gesture commands from a corresponding voice and gesture database 36 for controlling various parameters of the application.
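The assignment-profile lookup described above might be sketched as follows; the data structures and function names are illustrative assumptions, not the disclosed implementation:

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative data structures only; "AssignmentProfile", "UserInput" and the
# function names are assumptions, not identifiers from the disclosure.

@dataclass
class AssignmentProfile:
    command_area: str   # e.g. "area_c"
    application: str    # e.g. "media_player"

@dataclass
class UserInput:
    command: str        # recognized voice or air-gesture command
    command_area: str   # command area in which the command occurred

def assign(profiles: List[AssignmentProfile], area: str, application: str) -> None:
    """Input-mapping step: bind a command area to an application."""
    profiles.append(AssignmentProfile(area, application))

def resolve(profiles: List[AssignmentProfile], user_input: UserInput) -> Optional[str]:
    """Application-control step: find the profile whose area matches the input."""
    for profile in profiles:
        if profile.command_area == user_input.command_area:
            return profile.application
    return None

profiles: List[AssignmentProfile] = []
assign(profiles, "area_c", "media_player")
assign(profiles, "area_e", "web_browser")
print(resolve(profiles, UserInput("play", "area_c")))  # 'media_player'
```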
  • the voice and air-gesture capturing system 14 further includes a display rendering module 38 configured to receive input from the application control module 30, including user input commands for controlling one or more running applications, and provide audiovisual signals to the electronic display 18 and allow user interaction and control of windows associated with the running applications.
  • the voice and air-gesture capturing system 14 may further include one or more processor(s) 40 configured to perform operations associated with voice and air-gesture capturing system 14 and one or more of the modules included therein.
  • FIG. 4 depicts a front view of one embodiment of an electronic display 18 having an exemplary graphical user interface (GUI) 102 with multiple windows 104(1)-104(n) displayed thereon.
  • each window 104 generally corresponds to an application executed on the computing device 12.
  • window 104(1) may correspond to a media player application
  • window 104(2) may correspond to a video game application
  • window 104(3) may correspond to a web browser
  • window 104(n) may correspond to a word processing application.
  • some applications configured to be executed on the computing device 12 may not include an associated window presented on the display 18. As such, some user input command areas may be assigned to such applications.
  • user input command areas A-D are included within the computing environment 100.
  • the user input command areas A-D generally define three-dimensional spaces (shown in FIG. 5) in relation to the electronic display 18 and the one or more sensors 16 in which the user may perform specific voice and/or air-gesture commands to control one or more applications and corresponding windows 104(1)-104(n).
  • Turning to FIG. 5, a perspective view of the computing environment 100 of FIG. 4 is generally illustrated.
  • the computing environment 100 includes the electronic display 18 having a GUI 102 with multiple windows 104(1)-104(n) presented thereon.
  • the one or more sensors 16 are positioned within the computing environment 100.
  • the computing environment 100 further includes assigned voice and air-gesture command areas A-E and a user 106 interacting with the multi-window GUI 102 via the command areas A-E.
  • each user input command area A-E defines a three-dimensional space within the computing environment 100 and in relation to at least the electronic display 18.
  • to interact with a specific window 104, the user need only perform one or more voice and/or air-gesture commands within the assigned user input command area A-E associated with that window.
  • the user 106 may wish to interact with a media player application of window 104(1) and interact with a web browser of window 104(3).
  • the user may have utilized the voice and air-gesture capturing system 14 to assign user input command area C to correspond to window 104(1) and user input command area E to correspond to window 104(3), as previously described.
  • the user may speak and/or perform one or more motions with one or more portions of their body, such as their arms and hands, within the computing environment 100.
  • the user 106 may speak a predefined voice command in a direction towards user input command area C and perform a predefined air-gesture (e.g. wave their arm upwards) within user input command area E.
  • the camera 20 and microphone 22 are configured to capture data related to the user's voice and/or air-gesture commands.
  • the voice and air-gesture capturing system 14 is configured to receive and process the captured data to identify user input, including the predefined voice and air-gesture commands performed by the user 106 and the specific user input command areas (areas C and E, respectively) in which the user's voice and air-gesture commands were performed.
  • the voice and air-gesture capturing system 14 is configured to identify windows 104(1) and 104(3) corresponding to the identified user input command areas (areas C and E, respectively) and further allow the user 106 to control one or more parameters of the applications associated with windows 104(1) and 104(3) (e.g. media player and web browser, respectively) based on the user input.
  • the user input command areas A-E are positioned on all sides of the electronic display 18 (e.g. top, bottom, left and right) as well as the center of the electronic display 18. It should be noted that in other embodiments, the voice and air-gesture capturing system 14 may be configured to assign a plurality of different user input command areas in a variety of different dimensions and positions in relation to the electronic display 18 and is not limited to the arrangement depicted in FIGS. 4 and 5.
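For illustration only, a configuration along the lines of FIGS. 4 and 5 might define five display-relative boxes; the letter-to-position mapping and the dimensions below are assumptions, as the disclosure does not specify them:

```python
# Assumed layout, for illustration only: five command areas as (min, max) box
# corners in metres relative to the display centre (x right, y up, z toward the
# user). Which letter sits on which side is not specified by the disclosure.
COMMAND_AREAS = {
    "A": ((-0.3, 0.3, 0.2), (0.3, 0.9, 0.8)),    # above the display
    "B": ((-0.3, -0.9, 0.2), (0.3, -0.3, 0.8)),  # below the display
    "C": ((-0.9, -0.3, 0.2), (-0.3, 0.3, 0.8)),  # left of the display
    "D": ((0.3, -0.3, 0.2), (0.9, 0.3, 0.8)),    # right of the display
    "E": ((-0.3, -0.3, 0.2), (0.3, 0.3, 0.8)),   # centred in front of the display
}
```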
  • the method includes monitoring a computing environment and at least one user within the computing environment attempting to interact with a user interface (operation 610).
  • the computing environment may include an electronic display upon which the user interface is displayed.
  • the user interface may have a plurality of open windows, wherein each open window may correspond to an open and running application.
  • the method further includes capturing data related to user speech and/or air-gesture interaction with the user interface (operation 620).
  • the data may be captured by one or more sensors in the computing environment, wherein the data includes user speech and/or air-gesture commands within one or more assigned user input command areas.
  • Each user input command area defines a three-dimensional space within the computing environment and in relation to at least the electronic display.
  • the method further includes identifying user input and one of a plurality of user input command areas based on analysis of the captured data (operation 630).
  • the user input includes identified voice and/or air-gesture commands performed by the user, as well as corresponding user input command areas in which the identified voice and/or air-gesture commands occurred.
  • the method further includes identifying an associated application presented on the electronic display based, at least in part, on the identified user input command area (operation 640).
  • the method further includes providing user control of the identified associated application based on the user input (operation 650).
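Putting operations 610-650 together, a schematic pipeline might look like the following sketch, with the sensing and recognition stages abstracted as callables (hypothetical stand-ins, not the disclosed implementation):

```python
from typing import Callable, Dict, Optional, Tuple

# Schematic pipeline for operations 610-650; the capture, recognize, and
# control callables are hypothetical stand-ins for the sensor, recognition,
# and application-control stages.

def run_pipeline(
    capture: Callable[[], dict],                                        # 610/620
    recognize: Callable[[dict], Tuple[Optional[str], Optional[str]]],   # 630
    area_to_app: Dict[str, str],                                        # 640
    control: Callable[[str, str], None],                                # 650
) -> None:
    data = capture()
    command, area = recognize(data)
    if command is None or area is None:
        return  # nothing recognizable in this capture
    app = area_to_app.get(area)
    if app is None:
        return  # command performed outside any assigned command area
    control(app, command)

# Usage with trivial stand-ins:
run_pipeline(
    capture=lambda: {"gesture": "wave_up", "area": "area_c"},
    recognize=lambda d: (d["gesture"], d["area"]),
    area_to_app={"area_c": "media_player"},
    control=lambda app, cmd: print(f"{app} <- {cmd}"),
)
```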
  • While FIG. 6 illustrates method operations according to various embodiments, it is to be understood that not all of these operations are necessary in every embodiment. Indeed, it is fully contemplated herein that in other embodiments of the present disclosure, the operations depicted in FIG. 6 may be combined in a manner not specifically shown in any of the drawings, but still fully consistent with the present disclosure. Thus, claims directed to features and/or operations that are not exactly shown in one drawing are deemed within the scope and content of the present disclosure.
  • Some of the figures may include a logic flow. Although such figures presented herein may include a particular logic flow, it can be appreciated that the logic flow merely provides an example of how the general functionality described herein can be implemented. Further, the given logic flow does not necessarily have to be executed in the order presented unless otherwise indicated. In addition, the given logic flow may be implemented by a hardware element, a software element executed by a processor, or any combination thereof. The embodiments are not limited to this context.
  • the term "module" may refer to software, firmware and/or circuitry configured to perform any of the aforementioned operations.
  • Software may be embodied as a software package, code, instructions, instruction sets and/or data recorded on a non-transitory computer readable storage medium.
  • Firmware may be embodied as code, instructions or instruction sets and/or data that are hard-coded (e.g., nonvolatile) in memory devices.
  • Circuitry may comprise, for example, singly or in any combination, hardwired circuitry, programmable circuitry such as computer processors comprising one or more individual instruction processing cores, state machine circuitry, and/or firmware that stores instructions executed by programmable circuitry.
  • the modules may, collectively or individually, be embodied as circuitry that forms part of a larger system, for example, an integrated circuit (IC), system on-chip (SoC), desktop computers, laptop computers, tablet computers, servers, smart phones, etc. Any of the operations described herein may be implemented in a system that includes one or more storage mediums having stored thereon, individually or in combination, instructions that when executed by one or more processors perform the methods.
  • the processor may include, for example, a server CPU, a mobile device CPU, and/or other programmable circuitry.
  • the storage medium may include any type of tangible medium, for example, any type of disk including hard disks, floppy disks, optical disks, compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks, semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic and static RAMs, erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), flash memories, Solid State Disks (SSDs), magnetic or optical cards, or any type of media suitable for storing electronic instructions.
  • ROMs read-only memories
  • RAMs random access memories
  • EPROMs erasable programmable read-only memories
  • EEPROMs electrically erasable programmable read-only memories
  • SSDs Solid State Disks
  • some embodiments may be implemented as software modules executed by a programmable control device.
  • the storage medium may be non-transitory.
  • various embodiments may be implemented using hardware elements, software elements, or any combination thereof.
  • hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate array (FPGA), logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth.
  • an apparatus for assigning voice and air-gesture command areas may include a recognition module configured to receive data captured by at least one sensor related to a computing environment and at least one user within it, and to identify one or more attributes of the user based on the captured data.
  • the recognition module is further configured to establish user input based on the user attributes, wherein the user input includes at least one of a voice command and air-gesture command and a corresponding one of a plurality of user input command areas in which the voice or air-gesture command occurred.
  • the apparatus may further include an application control module configured to receive and analyze the user input and identify an application to be controlled by the user input based, at least in part, on the user input command area in which the user input occurred.
  • the application control module is further configured to permit user interaction with and control of one or more parameters of the identified application based on the user input.
  • the above example apparatus may be further configured, wherein the at least one sensor is a camera configured to capture one or more images of the computing environment and the at least one user within.
  • the example apparatus may be further configured, wherein the recognition module is configured to identify and track movement of one or more user body parts based on the captured images and determine one or more air-gesture commands corresponding to the identified user body part movements and identify a corresponding user input command area in which each air-gesture command occurred.
  • the above example apparatus may be further configured, alone or in combination with the above further configurations, wherein the at least one sensor is a microphone configured to capture voice data of the user within the computing environment.
  • the example apparatus may be further configured, wherein the recognition module is configured to identify one or more voice commands from the user based on the captured voice data and identify a corresponding user input command area in which each voice command occurred or was directed towards.
  • the above example apparatus may further include, alone or in combination with the above further configurations, an input mapping module configured to allow a user to assign one of the plurality of user input command areas to a corresponding one of a plurality of applications.
  • the example apparatus may be further configured, wherein the input mapping module includes one or more assignment profiles, each assignment profile including data related to one of the plurality of user input command areas and a corresponding application to which the one user input command area is assigned.
  • the example apparatus may be further configured, wherein the application control module is configured to compare user input received from the recognition module with each of the assignment profiles to identify an application associated with the user input.
  • the example apparatus may be further configured, wherein the application control module is configured to compare identified user input command areas of the user input with user input command areas of each of the assignment profiles and identify a matching assignment profile based on the comparison.
  • each user input command area includes a three-dimensional space within the computing environment and is positioned relative to an electronic display upon which a multi-window user interface is presented, wherein some of the windows correspond to applications.
  • the method may include monitoring a computing environment and at least one user within the computing environment attempting to interact with a user interface, receiving data captured by at least one sensor within the computing environment, identifying one or more attributes of the at least one user in the computing environment based on the captured data and establishing user input based on the user attributes, the user input including at least one of a voice command and an air-gesture command and a corresponding one of a plurality of user input command areas in which the voice or air-gesture command occurred and identifying an application to be controlled by the user input based, at least in part, on the corresponding user input command area.
  • the above example method may further include permitting user control of one or more parameters of the identified associated application based on the user input.
  • the above example method may further include, alone or in combination with the above further configurations, assigning one of the plurality of user input command areas to a corresponding one of a plurality of applications and generating an assignment profile having data related to the one of the plurality of user input command areas and the corresponding application to which the user input command area is assigned.
  • the example method may be further configured, wherein the identifying an application to be controlled by the user input includes comparing user input with a plurality of assignment profiles having data related to an application and one of the plurality of user input command areas assigned to the application and identifying an assignment profile having data matching the user input based on the comparison.
  • the example method may be further configured, wherein the identifying a matching assignment profile includes comparing identified user input command areas of the user input with user input command areas of each of the assignment profiles and identifying an assignment profile having a matching user input command area.
  • At least one computer accessible medium storing instructions which, when executed by a machine, cause the machine to perform the operations of any of the above example methods.
  • the system may include means for monitoring a computing environment and at least one user within the computing environment attempting to interact with a user interface, means for receiving data captured by at least one sensor within the computing environment, means for identifying one or more attributes of the at least one user in the computing environment based on the captured data, means for establishing user input based on the user attributes, the user input including at least one of a voice command and an air-gesture command and a corresponding one of a plurality of user input command areas in which the voice or air-gesture command occurred, and means for identifying an application to be controlled by the user input based, at least in part, on the corresponding user input command area.
  • the above example system may further include means for permitting user control of one or more parameters of the identified associated application based on the user input.
  • the above example system may further include, alone or in combination with the above further configurations, means for assigning one of the plurality of user input command areas to a corresponding one of a plurality of applications and means for generating an assignment profile having data related to the one of the plurality of user input command areas and the corresponding application to which the user input command area is assigned.
  • the example system may be further configured, wherein the means for identifying an application to be controlled by the user input includes means for comparing user input with a plurality of assignment profiles having data related to an application and one of the plurality of user input command areas assigned to the application and means for identifying an assignment profile having data matching the user input based on the comparison.
  • the example system may be further configured, wherein the means for identifying a matching assignment profile includes means for comparing identified user input command areas of the user input with user input command areas of each of the assignment profiles and means for identifying an assignment profile having a matching user input command area.

Abstract

The invention relates to a system and method for assigning user input command areas for receiving user voice and air-gesture commands and allowing a user to interact with and control multiple applications of a computing device. The system includes a voice and air-gesture capturing system configured to allow a user to assign three-dimensional user input command areas within the computing environment for each of the multiple applications. The voice and air-gesture capturing system is configured to receive data captured by one or more sensors in the computing environment and to identify user input based on the data, including user voice and/or air-gesture commands within one or more user input command areas. The voice and air-gesture capturing system is further configured to identify an application corresponding to the user input based on the identified user input command area, and to allow user interaction with the identified application based on the user input.
PCT/US2014/020479 2013-03-15 2014-03-05 System and method for assigning voice and gesture command areas WO2014149700A1 (fr)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP14769838.5A EP2972685A4 (fr) 2013-03-15 2014-03-05 System and method for assigning voice and gesture command areas
KR1020157021980A KR101688359B1 (ko) 2013-03-15 2014-03-05 System and method for assigning voice and gesture command areas
CN201480009014.8A CN105074620B (zh) 2013-03-15 2014-03-05 System and method for assigning voice and gesture command areas
JP2015558234A JP2016512632A (ja) 2013-03-15 2014-03-05 System and method for assigning voice and gesture command areas

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201313188405A 2013-03-15 2013-03-15
US 13/840,525 2013-03-15

Publications (1)

Publication Number Publication Date
WO2014149700A1 (fr) 2014-09-25

Family

ID=51580645

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/020479 WO2014149700A1 (fr) 2013-03-15 2014-03-05 System and method for assigning voice and gesture command areas

Country Status (1)

Country Link
WO (1) WO2014149700A1 (fr)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002093344A1 (fr) 2001-05-14 2002-11-21 Koninklijke Philips Electronics N.V. Procede destine a interagir avec des flux de contenu en temps reel
JP2011192081A (ja) 2010-03-15 2011-09-29 Canon Inc 情報処理装置及びその制御方法
US20110254791A1 (en) * 2008-12-29 2011-10-20 Glenn A Wong Gesture Detection Zones
US20120127072A1 (en) * 2010-11-22 2012-05-24 Kim Hyeran Control method using voice and gesture in multimedia device and multimedia device thereof
US20130009861A1 (en) * 2011-07-04 2013-01-10 3Divi Methods and systems for controlling devices using gestures and related 3d sensor
US20130027296A1 (en) * 2010-06-18 2013-01-31 Microsoft Corporation Compound gesture-speech commands
US20130033644A1 (en) * 2011-08-05 2013-02-07 Samsung Electronics Co., Ltd. Electronic apparatus and method for controlling thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2972685A4 *

Similar Documents

Publication Publication Date Title
US20140282273A1 (en) System and method for assigning voice and gesture command areas
US11354825B2 (en) Method, apparatus for generating special effect based on face, and electronic device
US11710351B2 (en) Action recognition method and apparatus, and human-machine interaction method and apparatus
US11625841B2 (en) Localization and tracking method and platform, head-mounted display system, and computer-readable storage medium
US20140281975A1 (en) System for adaptive selection and presentation of context-based media in communications
US20150088515A1 (en) Primary speaker identification from audio and video data
US10438588B2 (en) Simultaneous multi-user audio signal recognition and processing for far field audio
US20170046965A1 (en) Robot with awareness of users and environment for use in educational applications
US10831440B2 (en) Coordinating input on multiple local devices
WO2020220809A1 (fr) Procédé et dispositif de reconnaissance d'action pour objet cible, et appareil électronique
US20190155484A1 (en) Method and apparatus for controlling wallpaper, electronic device and storage medium
TW201250609A (en) Gesture recognition using depth images
KR20110076458A (ko) 디스플레이 장치 및 그 제어방법
US20190026548A1 (en) Age classification of humans based on image depth and human pose
US10649536B2 (en) Determination of hand dimensions for hand and gesture recognition with a computing interface
US20180174586A1 (en) Speech recognition using depth information
US20170177087A1 (en) Hand skeleton comparison and selection for hand and gesture recognition with a computing interface
US20240104744A1 (en) Real-time multi-view detection of objects in multi-camera environments
US10861169B2 (en) Method, storage medium and electronic device for generating environment model
KR20210124313A (ko) 인터랙티브 대상의 구동 방법, 장치, 디바이스 및 기록 매체
KR20200054354A (ko) 전자 장치 및 그 제어 방법
JP7268063B2 (ja) 低電力のリアルタイムオブジェクト検出用のシステム及び方法
US11057549B2 (en) Techniques for presenting video stream next to camera
KR20210000671A (ko) 헤드 포즈 추정
WO2017052861A1 (fr) Entrée de calcul perceptive pour déterminer des effets post-production

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201480009014.8

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14769838

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20157021980

Country of ref document: KR

Kind code of ref document: A

Ref document number: 2015558234

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2014769838

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE