WO2018186832A1 - Headsets to activate digital assistants - Google Patents

Headsets to activate digital assistants

Info

Publication number: WO2018186832A1
Authority: WO (WIPO (PCT))
Application number: PCT/US2017/025833
Prior art keywords: host device, headset, digital assistant, audio, user
Other languages: French (fr)
Inventors: David H. Hanes, Jon R. Dory, John Michael Main
Original assignee: Hewlett-Packard Development Company, L.P.
Priority date: 2017-04-04
Filing date: 2017-04-04

Links

Classifications

    • G06F1/1684 Constructional details or arrangements related to integrated I/O peripherals of portable computers, not covered by groups G06F1/1635 - G06F1/1675
    • G06F1/169 Constructional details or arrangements related to integrated I/O peripherals, the I/O peripheral being an integrated pointing device, e.g. trackball in the palm rest area, mini-joystick integrated between keyboard keys, touch pads or touch stripes
    • H04R1/1041 Earpieces; Earphones; Monophonic headphones; Mechanical or electronic switches, or control elements
    • H04R1/1008 Earpieces of the supra-aural or circum-aural type
    • H04R2201/107 Monophonic and stereophonic headphones with microphone for two-way hands free communication


Abstract

In example implementations, a method for activating a digital assistant with a headset and an apparatus for performing the same are provided. The method is performed by a processor of a headset. The method includes detecting a user invoked event. A first signal is transmitted to a host device to block audio to a currently activated application and to activate a digital assistant application on the host device. A second signal is received from the host device indicating that the digital assistant application is activated. An audio cue that the digital assistant application is activated is played.

Description

HEADSETS TO ACTIVATE DIGITAL ASSISTANTS
BACKGROUND
[0001] Technology-based, or smart, devices are being used within the home. The smart devices may include a smartphone, a tablet computer, a desktop computer, or a smart television that can perform different tasks using voice control. For example, a user can speak to the smart device to perform a task. The smart device may be centrally located within a user's home. The user may speak to the smart device to activate a voice control as described above. By speaking to the smart device, the user may obtain certain information or perform a task without having to grab the device out of his or her pocket or look at a display on the device.
[0002] The tasks may include personal assistant type functions to check "to-do" items and appointments on a calendar, obtain travel times, check the weather, obtain the latest news, and the like. Other tasks may include turning lights on in the house, adjusting a thermostat, and the like.
BRIEF DESCRIPTION OF THE DRAWINGS
[0003] FIG. 1 is a block diagram of an example system of the present disclosure;
[0004] FIG. 2 is a block diagram of an example headset of the present disclosure;
[0005] FIG. 3 is a flow diagram of an example method for activating a digital assistant; and
[0006] FIG. 4 is an example non-transitory computer readable medium storing instructions executed by a processor of the present disclosure.
DETAILED DESCRIPTION
[0007] The present disclosure discloses a headset that can detect user invoked events to automatically activate a digital assistant and methods for performing the same. As discussed above, users can speak to a smart device to perform a task. The digital assistant may be executed on a host device (e.g., a desktop computer, a smart phone, a tablet computer, and the like). The user may be using a headset connected to the host device to communicate.
[0008] Examples of the present disclosure allow a user to invoke an event to automatically direct all voice input to the digital assistant on the host device. For example, the user invoked event may be touching the headset, waving a finger by the headset, pressing a button on the headset, and the like. When the user invoked event is detected, a signal may be sent to the host device to cause the host device to detect which applications are currently using the audio input received from the headset, block the audio from going to the applications that are detected, and activate the digital assistant on the host device. As a result, the headset may be used to interact with the digital assistant even when the headset is being used for other applications (e.g., a telephone call, a video call, and the like).
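The host-side sequence in the preceding paragraph (detect the applications consuming headset audio, block them, activate the assistant, confirm) can be summarized in a minimal sketch. The class and method names below (HostDevice, on_first_signal, and the like) are illustrative assumptions rather than interfaces defined by this disclosure:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class App:
    name: str
    audio_blocked: bool = False

@dataclass
class HostDevice:
    apps: List[App] = field(default_factory=list)
    assistant_active: bool = False

    def apps_using_headset_audio(self) -> List[App]:
        # Stand-in for enumerating audio sessions on the host.
        return [a for a in self.apps if not a.audio_blocked]

    def on_first_signal(self) -> None:
        """Handle the activation signal received from the headset."""
        for app in self.apps_using_headset_audio():
            app.audio_blocked = True       # block headset audio to the app
        self.assistant_active = True       # activate the digital assistant
        self.send_second_signal()          # tell the headset the DAA is ready

    def send_second_signal(self) -> None:
        print("second signal: digital assistant activated")

host = HostDevice(apps=[App("video_conference")])
host.on_first_signal()                     # blocks the call audio, activates the DAA
```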
[0009] FIG. 1 illustrates a block diagram of a system 100 of the present disclosure. The system 100 may include a headset 102 and a host device 104. In one example, the headset 102 of the present disclosure may be modified to detect gestures as discussed in further detail below with respect to FIG. 2. The host device 104 may be a desktop computer, a laptop computer, a tablet computer, and the like. In some implementations, the host device 104 may have a display 116 (e.g., a monitor, a touch screen, and the like). In other implementations, the host device 104 may be a voice activated device that does not include a display 116.
[0010] The host device 104 may include a digital assistant application (DAA) 106 and other applications 108. The DAA 106 may be a voice activated assistant that can perform various voice activated functions. In some implementations, a user may speak an activation word that activates the DAA 106. The user may then interact with the DAA 106 via audio commands. For example, a user may verbally ask the DAA 106 for directions, ask the DAA 106 a question that can be searched on the Internet, ask the DAA 106 to generate a text message based on audio input, and the like.
[0011] The present disclosure provides the headset 102 that is modified, or configured, to activate the DAA 106 on the host device 104 using user invoked events (e.g., non-contact gestures, laterally moving gestures, and the like) detected by the headset 102. In one implementation, the headset 102 may establish a two-way communications path 110 with the host device 104. The two-way communications path 110 may be a wireless, or wired, communication path. When the headset 102 detects a user invoked event, the headset 102 may send a signal via the two-way communications path 110 to activate the DAA 106.
[0012] The detection of the user invoked events may be advantageous for various scenarios. For example, the user may want to activate the DAA 106, but be away from the host device 104. The host device 104 may be out of audible range of the user, or the host device 104 may be in the user's pocket. Thus, the user may use the user invoked event to activate the DAA 106.
[0013] In another scenario, the user may be using an application 108 on the host device 104. For example, the application 108 may be a video conference application, and the user may be using the headset 102 to communicate over the video conference application. Thus, the user may not be able to speak the activation word associated with the DAA 106 without interrupting the conversation in the video conference application. However, using the headset 102, the user may use the user invoked event that is detected by the headset 102 to activate the DAA 106 without speaking.
[0014] In one example, when the user invoked event is detected by the headset 102, the headset 102 may transmit a first signal over the two-way communications path 110 to the host device 104 to activate the DAA 106. In one example, the host device may detect the application 108 that is currently using audio signals from the headset in response to receiving the first signal. The host device 104 may then block audio to the application 108 that is currently using the audio signal from the headset.
[0015] In one implementation, the host device 104 may block audio that is received from the headset 102 (e.g., from a user speaking through a microphone on the headset), block audio that is transmitted to the headset 102 from the host device 104 (e.g., audio generated by the application 108 or collected from a microphone of the host device 104), or both. In one implementation, the user may set which audio sources to block using an interface of the host device 104 (e.g., a settings menu or a control panel window).
[0016] The headset 102 may receive a second signal from the host device 104 that the DAA 106 is activated. For example, the second signal may be an audio signal that includes the voice of the DAA 106 indicating that the DAA 106 is ready to receive a command. In another example, the audio signal may be a tone, a beep, or a similar audio signal.
[0017] The user may then interact with the DAA 106 using audio commands that are blocked to the application 108 (e.g., the video conference application). For example, other users on the video conference application would not hear the voice commands issued by the user that is interacting with the DAA 106. When the user has completed interaction with the DAA 106, the DAA 106 may be deactivated and the audio may be unblocked to the application 108 that was active.
[0018] In one implementation, the user may provide an audio command to deactivate the DAA 106. In another implementation, the DAA 106 may be automatically deactivated after a period of inactivity, or silence (e.g., 15 seconds, 30 seconds, 1 minute, and the like).
[0019] In one example, when the host device 104 blocks the audio signals from the headset 102 or the application 108, the audio signals may be paused or delayed during interaction with the DAA 106. For example, the audio signals from the headset 102 or the application 108 may be temporarily stored in memory or a buffer while the user interacts with the DAA 106. After the DAA 106 is deactivated, the audio signals from the headset 102 or the application 108 may then be played. Any subsequent audio signals to the headset 102 or from the application 108 may also be stored in the memory buffer and played in sequence until the audio signals are "live". This may help prevent audio signals from being heard by users out of sequence (e.g., playing a current audio signal and then interjecting the stored audio signals that were buffered during interaction with the DAA 106).
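A minimal sketch of this pause-and-replay behavior, assuming audio arrives as discrete chunks and a simple FIFO buffer stands in for the memory described above:

```python
from collections import deque

class BufferedAudioChannel:
    """Hold audio in a FIFO while the assistant is active, then replay it
    in order so nothing is heard out of sequence."""

    def __init__(self):
        self.buffer = deque()
        self.paused = False

    def on_assistant_activated(self):
        self.paused = True                 # start buffering instead of playing

    def push(self, chunk: bytes):
        if self.paused:
            self.buffer.append(chunk)      # stored during DAA interaction
        else:
            self.play(chunk)               # normal "live" path

    def on_assistant_deactivated(self):
        self.paused = False
        while self.buffer:                 # drain in sequence until audio is live
            self.play(self.buffer.popleft())

    def play(self, chunk: bytes):
        print(f"playing {len(chunk)} bytes")

channel = BufferedAudioChannel()
channel.on_assistant_activated()
channel.push(b"caller audio while user talks to the DAA")
channel.on_assistant_deactivated()         # replays the buffered chunk
```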
[0020] In one implementation, when the DAA 106 is activated after the host device 104 receives the first signal, the host device 104 may also determine whether audio signals from the headset 102 should be directed towards the DAA 106 or the application 108. In one example, the host device 104 may make the determination based on detecting an activation word associated with the DAA 106. For example, the DAA 106 may activate when a name is called. The host device 104 may determine that an audio signal that begins with the name should be directed towards the DAA 106 and any other audio signals are directed towards the application 108 and should be blocked or temporarily stored, as described above.
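A minimal sketch of the activation-word routing described above; the function name and string-matching approach are illustrative assumptions, since a real implementation would operate on recognized speech:

```python
def route_utterance(utterance: str, activation_word: str) -> str:
    """Return the destination for an utterance: audio that begins with the
    activation word goes to the DAA; everything else goes to the application."""
    if utterance.strip().lower().startswith(activation_word.lower()):
        return "assistant"
    return "application"

assert route_utterance("Assistant, what is the weather?", "assistant") == "assistant"
assert route_utterance("Back to the agenda.", "assistant") == "application"
```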
[0021] In another example, the host device 104 may make the determination by analyzing the audio signal. The host device 104 may determine whether the audio signal is more relevant to the context of a conversation with a remote person using the application 108 or to the context of the DAA 106. For example, if the conversation with the remote person contains work-related dialogue and suddenly the audio signal asks "what is the weather today?", the host device 104 may determine that the audio signal "what is the weather today?" was directed towards the DAA 106.
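As a toy illustration of this context-based determination, word overlap against illustrative keyword sets can stand in for the relevance analysis; an actual implementation would need a trained relevance model rather than the assumed lists below:

```python
# Purely illustrative keyword lists, not part of the disclosure.
ASSISTANT_HINTS = {"weather", "remind", "calendar", "timer", "directions", "today"}

def likely_assistant_directed(utterance: str, topic_words: set) -> bool:
    words = set(utterance.lower().replace("?", "").split())
    # More overlap with assistant-style requests than with the
    # ongoing conversation suggests the utterance targets the DAA.
    return len(words & ASSISTANT_HINTS) > len(words & topic_words)

work_topic = {"quarterly", "budget", "report", "revenue"}
print(likely_assistant_directed("what is the weather today?", work_topic))       # True
print(likely_assistant_directed("the quarterly report looks good", work_topic))  # False
```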
[0022] In one implementation, the user invoked event may be a gesture made with a hand 112 of the user. In one example, the user invoked event may be a non-contact gesture. For example, the hand 112 may be waved near the headset 102 (e.g., side-to-side as shown by an arrow 114) that can be detected by the headset 102. In another example, the user invoked event may be a side-to-side swipe against a surface of the headset 102.
[0023] It should be noted that the side-to-side motion is one example. For example, the non-contact gestures may also be made up-down, diagonally, towards and away from the headset 102, and the like. Additional examples of non-contact gestures may include different hand positions (e.g., holding a user's hand still with two fingers up and two fingers down) or holding a hand still while the fingers are moving (e.g., wiggling fingers, a pinching motion with the fingers, spreading fingers, rubbing fingers, pointing fingers up, and the like). Other non-contact gestures may include stationary gestures. For example, a user may hold his or her hand or fingers near the headset 102.
[0024] In one example, the user invoked event may be depressing a physical button. For example, a user may press a button that is associated with activating the DAA 106 on the host device 104.
[0025] Notably, the user invoked event includes non-tapping gestures. In other words, the user invoked events do not include tapping gestures such as single taps, double taps, and the like. Rather, the user invoked events may include non-contact gestures or contact gestures that move laterally.
[0026] In one implementation, the headset 102 may be programmed to launch different DAAs 106 using different user invoked events. For example, the host device 104 may include a plurality of different DAAs 106. The user may assign a non-contact gesture to activate a first one of the plurality of different DAAs 106, a contact gesture that slides side-to-side to activate a second one of the plurality of different DAAs 106, and the like.
[0027] In one example, the headset 102 may have a user interface to set, or to assign, the different user invoked events to different DAAs 106. In another example, the host device 104 may include configuration software that provides a user interface for the user to set, or to assign, the different user invoked events to different DAAs 106. The configuration software may be executed when the headset 102 has established the two-way communications path 110 with the host device 104. The settings may be established by exchanging data and control signals over the two-way communications path 110.
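A minimal sketch of such an assignment table, with hypothetical gesture and assistant names; the table could equally be stored in headset memory (see paragraph [0033] below):

```python
# Hypothetical assignment table maintained by the configuration software.
gesture_to_daa = {
    "non_contact_wave": "daa_one",
    "side_to_side_swipe": "daa_two",
    "button_press": "daa_three",
}

def assign(gesture: str, daa: str) -> None:
    # In practice, this setting would be exchanged over the two-way path.
    gesture_to_daa[gesture] = daa

def daa_for_event(gesture: str) -> str:
    return gesture_to_daa.get(gesture, "default_daa")

assign("side_to_side_swipe", "daa_two")
print(daa_for_event("non_contact_wave"))   # daa_one
```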
[0028] FIG. 2 illustrates a block diagram of the headset 102 of the present disclosure. In one implementation, the headset 102 may include a processor 202, a communication module 204, a microphone 206, a speaker 208 and a gesture detection sensor 210. The processor 202 may be in communication with the communication module 204, the microphone 206, the speaker 208 and the gesture detection sensor 210. The processor 202 may control operation of the communication module 204, the microphone 206, the speaker 208 and the gesture detection sensor 210.
[0029] In one example, the gesture detection sensor 210 may be used to detect the user invoked events that cause the DAA 106 on the host device 104 to automatically be activated. The gesture detection sensor 210 may be a capacitive sensor that can detect a finger slide, or another gesture motion that moves from side-to-side on a surface of the headset 102. The gesture detection sensor 210 may be an electromagnetic force-based sensor, or a miniature radar, that can detect non-contact gestures, or motion, near the headset 102.
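A minimal sketch of how readings from a capacitive strip might be classified as a lateral swipe, assuming normalized x-positions sampled during a touch; the travel threshold is an illustrative assumption:

```python
from typing import List, Optional

def classify_touch_track(x_positions: List[float],
                         min_travel: float = 0.5) -> Optional[str]:
    """Classify normalized x-positions from a capacitive strip as a
    lateral swipe. Thresholds are illustrative, not from the disclosure."""
    if len(x_positions) < 2:
        return None
    travel = x_positions[-1] - x_positions[0]
    if travel >= min_travel:
        return "swipe_right"
    if travel <= -min_travel:
        return "swipe_left"
    return None   # too little lateral travel (e.g., a tap), so not an event

print(classify_touch_track([0.1, 0.35, 0.6, 0.9]))   # swipe_right
print(classify_touch_track([0.5, 0.52]))             # None
```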
[0030] The communication module 204 may be a Bluetooth® radio, a network adapter, a universal serial bus (USB) interface, and the like that can establish the two-way communications path 110. The processor 202 may send the first signal via the communication module 204 in response to the gesture detection sensor 210 detecting the user invoked event.
[0031] The speaker 208 may emit an audio cue indicating that the DAA 106 is ready to receive input. For example, the audio cue may be a voice of the DAA 106 telling the user that the DAA 106 is ready to receive input. In another example, the audio cue may be an audible tone, such as a beep, ringtone, and the like.
[0032] The microphone 206 may capture audio commands that are spoken by the user. The processor 202 may receive the audio commands captured by the microphone 206 and process the audio commands for transmission to the host device 104 via the communications module 204.
[0033] In one implementation, the headset 102 may also include memory that stores information. For example, when the host device 104 has a plurality of different DAAs 106 and the user assigns different user invoked events to the plurality of different DAAs 106, the assignments may be stored in memory.
[0034] FIG. 3 illustrates a flow diagram of an example method 300 for activating a digital assistant. In one example, the method 300 may be performed by the headset 102 or an apparatus 400 described below and illustrated in FIG. 4.
[0035] At block 302, the method 300 begins. At block 304, the method 300 detects a user invoked event. For example, a user event detection sensor or a gesture detection sensor on the headset may detect the user invoked event. In one example, the user invoked event may be a non-contact gesture, such as, for example, a hand wave near a surface of the headset (e.g., within a few centimeters to a few inches). In one example, the user invoked event may be a laterally moving gesture (e.g., a side-to-side swipe) that contacts a surface of the headset. In one example, the user invoked event may be a button depression.
[0036] At block 306, the method 300 transmits a first signal to a host device connected to the headset to block audio to a currently activated application and to activate a digital assistant application on the host device. For example, the host device may identify an application that is currently using the audio from the headset. After the application is identified, the host device may block subsequently received audio from the headset to the application.
[0037] At block 308, the method 300 receives a second signal from the host device that the digital assistant application is activated. In one implementation, the digital assistant application that is activated may be dependent on the type of user invoked event that is detected. For example, the host device may have several different digital assistant applications. The user may configure different user invoked events to activate different digital assistant applications. In one example, a non-contact gesture may be used to activate a first digital assistant application, a finger swipe against the headset may be used to activate a second digital assistant application, a button depression may be used to activate a third digital assistant application, and the like.
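A minimal headset-side sketch of blocks 304 through 310, assuming a hypothetical transport object for the two-way communications path and an illustrative event-to-assistant table:

```python
class FakeHostLink:
    """Stand-in for the two-way communications path; not a real transport."""
    def send(self, signal: str, **fields):
        print(f"-> {signal} {fields}")
    def wait_for(self, signal: str) -> dict:
        return {"cue": "Assistant ready"}   # canned second signal

class HeadsetMethod300:
    EVENT_TO_DAA = {
        "non_contact_gesture": "first_daa",   # block 304 event types mapped
        "finger_swipe": "second_daa",         # to different assistants
        "button_press": "third_daa",
    }

    def __init__(self, link: FakeHostLink):
        self.link = link

    def on_user_invoked_event(self, event: str) -> None:
        daa = self.EVENT_TO_DAA[event]
        self.link.send("first_signal", daa=daa)       # block 306
        second = self.link.wait_for("second_signal")  # block 308
        print(f"audio cue: {second['cue']}")          # block 310

HeadsetMethod300(FakeHostLink()).on_user_invoked_event("finger_swipe")
```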
[0038] At block 310, the method 300 plays an audio cue that the digital assistant application is activated. For example, the audio cue may be contained in the second signal that was received in block 308. The audio cue may include an identification of which digital assistant application was activated. For example, the audio cue may include a voice of the digital assistant application indicating that the digital assistant application is ready to receive input. Each digital assistant application may have a unique voice that can be used to identify the digital assistant application.
[0039] In one example, the audio cue may be an audible tone. For example, the audio cue may be a beep, a ringtone, a customized tone selected for a particular digital assistant application, and the like.
[0040] In one implementation, the headset may monitor the interaction of the user with the digital assistant application to detect when the interaction with the digital assistant application is complete. In one example, the user may provide an audio command to close the digital assistant application.
[0041] In another example, a pre-defined period of time (e.g., 15 seconds, 30 seconds, 1 minute, and the like) may be used. When no interaction (e.g., no audio commands) is detected before expiration of the pre-defined period of time, the headset may assume that the interaction with the digital assistant application is complete.
[0042] In another example, a second user invoked event may be used. For example, if the digital assistant application is activated and the user invoked event is detected, then the headset may determine that the user invoked event was to deactivate the digital assistant application. Similarly, if the digital assistant application is not activated and the user invoked event is detected, then the headset may determine that the user invoked event was to activate the digital assistant application to perform blocks 304 and 306, described above.
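A minimal sketch combining this toggle behavior with the inactivity timeout from the preceding example; the names and the periodic tick() polling approach are illustrative assumptions, and the 30-second default is one of the example durations above:

```python
import time

class AssistantSession:
    """Toggle semantics from paragraph [0042] plus the inactivity timeout
    from paragraph [0041]."""

    def __init__(self, timeout_s: float = 30.0):
        self.active = False
        self.timeout_s = timeout_s
        self.last_interaction = 0.0

    def on_user_invoked_event(self):
        if self.active:
            self.deactivate()                        # same event, opposite meaning
        else:
            self.active = True                       # proceed with blocks 304/306
            self.last_interaction = time.monotonic()

    def on_audio_command(self):
        self.last_interaction = time.monotonic()     # any command resets the clock

    def tick(self):
        """Call periodically; ends the session after a silent period."""
        if self.active and time.monotonic() - self.last_interaction > self.timeout_s:
            self.deactivate()

    def deactivate(self):
        self.active = False
        print("third signal: deactivate DAA, restore application audio")

session = AssistantSession(timeout_s=0.1)
session.on_user_invoked_event()   # activates
time.sleep(0.2)
session.tick()                    # timeout elapsed: deactivates
```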
[0043] After the headset determines that the interaction with the digital assistant application is complete, the headset may send a third signal to the host device. The third signal may cause the host device to deactivate the digital assistant application and pass audio from the headset to the current application. In other words, the blocking of audio to the current application initiated in block 306 may be removed and the audio to the current application may be restored. At block 312, the method 300 ends.
[0044] FIG. 4 illustrates an example of an apparatus 400. In one example, the apparatus 400 may be the headset 102. In one example, the apparatus 400 may include a processor 402 and a non-transitory computer readable storage medium 404. The non-transitory computer readable storage medium 404 may include instructions 406, 408, 410 and 412 that when executed by the processor 402, cause the processor 402 to perform various functions.
[0045] In one example, the instructions 406 may include instructions to detect a laterally moving gesture. The instructions 408 may include instructions to transmit a first signal to a host device connected to the headset to activate a digital assistant application on the host device in response to the laterally moving gesture that is detected. The instructions 410 may include instructions to receive a second signal from the host device that the digital assistant application is ready to receive an input. The instructions 412 may include instructions to play an audio cue indicating that the digital assistant application is ready to receive the input.
[0046] It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.

Claims

1. A method, comprising:
detecting, by a processor of a headset, a user invoked event;
transmitting, by the processor, a first signal to a host device connected to the headset to block audio to a currently activated application and to activate a digital assistant application on the host device;
receiving, by the processor, a second signal from the host device that the digital assistant application is activated; and
playing, by the processor, an audio cue that the digital assistant application is activated.
2. The method of claim 1, wherein the user invoked event comprises a touching of the headset.
3. The method of claim 2, wherein the touching comprises at least one of: a tap or a swipe.
4. The method of claim 1, wherein the user invoked event comprises a non-contact motion.
5. The method of claim 1, wherein the user invoked event comprises a button depression.
6. The method of claim 1, wherein the transmitting the first signal to the host device causes the host device to detect the application that is currently using audio signals from the headset.
7. The method of claim 1, comprising:
detecting, by the processor, that interaction with the digital assistant application is complete; and
sending, by the processor, a third signal to the host device to cause the host device to deactivate the digital assistant application and pass the audio from the headset to the current application.
8. The method of claim 7, wherein the detecting that the interaction with the digital assistant application is complete is based on expiration of a pre-defined period of time or detection of a second user invoked event.
9. A non-transitory computer readable storage medium encoded with instructions executable by a processor of a headset, the non-transitory computer-readable storage medium comprising:
instructions to detect a laterally moving gesture;
instructions to transmit a first signal to a host device connected to the headset to activate a digital assistant application on the host device in response to the laterally moving gesture that is detected;
instructions to receive a second signal from the host device that the digital assistant application is ready to receive an input; and
instructions to play an audio cue indicating that the digital assistant application is ready to receive the input.
10. The non-transitory computer readable storage medium of claim 9, comprising:
instructions to configure a different laterally moving gesture for each one of a plurality of different digital assistant applications on the host device.
11. The non-transitory computer readable storage medium of claim 10, wherein the laterally moving gesture comprises a swipe on a surface of the headset.
12. The non-transitory computer readable storage medium of claim 9, wherein the laterally moving gesture comprises a non-contact movement near the headset.
13. A headset, comprising:
a communication module to establish a two-way communications path with a host device;
a microphone to receive audio inputs from a user;
a speaker to output audio outputs from the host device;
a gesture detection sensor that detects a non-contact movement or a sliding movement on a surface of the headset; and
a processor in communication with the communication module, the microphone, the speaker and the gesture detection sensor, wherein the processor transmits a signal to activate a digital assistant application on the host device in response to a gesture that is detected by the gesture detection sensor.
14. The headset of claim 13, wherein the gesture detection sensor comprises a capacitive sensor.
15. The headset of claim 13, wherein the gesture detection sensor comprises an electromagnetic force-based sensor.

Priority Applications (1)

Application number: PCT/US2017/025833
Priority date: 2017-04-04
Filing date: 2017-04-04
Title: Headsets to activate digital assistants

Publications (1)

Publication number: WO2018186832A1 (en)

Family

ID=63712144

Family Applications (1)

Application number: PCT/US2017/025833
Title: Headsets to activate digital assistants
Priority date: 2017-04-04
Filing date: 2017-04-04

Country Status (1)

Country: WO
Link: WO2018186832A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
    • US20110187640A1 * (Kopin Corporation; priority 2009-05-08, published 2011-08-04): Wireless Hands-Free Computing Headset With Detachable Accessories Controllable by Motion, Body Gesture and/or Vocal Commands
    • US20120196540A1 * (Cisco Technology, Inc.; priority 2011-02-02, published 2012-08-02): Method and apparatus for a bluetooth-enabled headset with a multitouch interface
    • US20140059428A1 * (Samsung Electronics Co., Ltd.; priority 2012-08-23, published 2014-02-27): Portable device and guide information provision method thereof
    • US20150058810A1 * (Wistron Corporation; priority 2013-08-23, published 2015-02-26): Electronic Device with Lateral Touch Control Combining Shortcut Function
    • US20150244848A1 * (LG Electronics Inc.; priority 2014-02-21, published 2015-08-27): Wireless receiver and method for controlling the same
    • US20160357508A1 * (Apple Inc.; priority 2015-06-05, published 2016-12-08): Mechanism for retrieval of previously captured audio



Legal Events

    • 121 (Ep): The EPO has been informed by WIPO that EP was designated in this application. (Ref document number: 17904917; country of ref document: EP; kind code of ref document: A1)
    • NENP: Non-entry into the national phase. (Ref country code: DE)
    • 122 (Ep): PCT application non-entry in European phase. (Ref document number: 17904917; country of ref document: EP; kind code of ref document: A1)