WO2022031260A1 - A gesture input for a wearable device - Google Patents

A gesture input for a wearable device

Info

Publication number
WO2022031260A1
Authority
WO
WIPO (PCT)
Prior art keywords
gesture
location
audio input
swipe
microphone
Application number
PCT/US2020/044711
Other languages
French (fr)
Inventor
Yao Ding
Matteo CARRARA
Original Assignee
Google Llc
Application filed by Google Llc filed Critical Google Llc
Priority to PCT/US2020/044711 priority Critical patent/WO2022031260A1/en
Publication of WO2022031260A1 publication Critical patent/WO2022031260A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G06F 3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G06F 3/165 - Management of the audio stream, e.g. setting of volume, audio stream path

Definitions

  • Wearable and hearable devices have inputs that allow a user to provide commands.
  • the earbuds may have inputs that allow the user to start or stop content, adjust the playback volume, or fast forward or rewind content.
  • These inputs may be a touchpad that receives a physical input or a microphone that receives a voice command.
  • a physical input, such as a touch or a tap, may provide undesirable noise or discomfort for the user when the user swipes or taps the earbud.
  • a touch or a tap on the earbud may affect the performance of the antenna or active noise cancellation (ANC), as the antenna or ANC microphones may be in the same location or nearby the touch area.
  • a voice command may be embarrassing or disruptive for a user when in a quiet or discreet location.
  • the present disclosure provides systems and methods for determining an input command based on a gesture of a user.
  • the gesture may be a swipe gesture or a tap gesture.
  • the gesture may be performed on the skin of the body of the user in a region near a wearable device.
  • the wearable device may have one or more microphones that receive audio input created by the gesture of the user.
  • the wearable device may determine the type of gesture.
  • the device may determine the direction of the gesture. Based on the type of gesture and/or direction of the gesture, the device may determine an input command.
  • the input command may be increasing or decreasing the playback volume, fast-forwarding or rewinding the content, answering or ending a call, muting a notification, etc.
  • One aspect of the disclosure includes a wearable electronic device comprising one or more microphones and one or more processors in communication with the one or more microphones.
  • the one or more processors may be configured to receive, at each of the one or more microphones, an audio input based on a gesture of a user, determine, based on the received audio input, a first location of the gesture and a second location of the gesture, and determine, based on the first location and second location of the gesture, an input command.
  • the gesture may be a swipe gesture or a tap gesture. When the gesture is the swipe gesture, the first location may be different than the second location. When the gesture is the swipe or tap gesture, the user may swipe or tap an area of skin in the region of the device.
  • Determining the first location and the second location may include comparing audio signals captured by each of the one or more microphones.
  • the one or more processors may be configured to determine, based on the first location and the second location, a trajectory of the gesture.
  • the device may be an earbud.
  • the earbud may include a housing shaped to be worn on a human body, wherein at least one first surface of the housing is shaped to come in contact with the human body and at least one second surface of the housing is shaped to be exposed when worn on the body, and wherein at least two of the one or more microphones are located on the second surface.
  • the gesture may be a tap or swipe on an area of skin on an ear or the face of the user.
  • Another aspect of the disclosure includes a method comprising receiving, at each of one or more microphones, an audio input based on a gesture of a user, determining, by one or more processors based on the received audio input, a first location of the gesture and a second location of the gesture, and determining, by the one or more processors based on the first location and second location of the gesture, an input command.
  • Figure 1A is a functional diagram of an example device according to aspects of the disclosure.
  • Figure 1B is a functional block diagram of a device system in accordance with aspects of the disclosure.
  • Figure 2A is a pictorial diagram illustrating an example use of the device according to aspects of the disclosure.
  • Figure 2B is a pictorial diagram illustrating an example use of the device according to aspects of the disclosure.
  • Figure 3 is a pictorial diagram illustrating an example use of the device according to aspects of the disclosure.
  • Figures 4A and 4B are graphical representations illustrating an example use of the device according to aspects of the disclosure.
  • Figure 5 is a flow diagram illustrating a method of determining a gesture input using one or more microphones of a wearable device according to aspects of the disclosure.
  • a wearable, or hearable, device may use one or more microphones to determine an input command based on a gesture of a user.
  • the wearable or hearable device may be any device that is capable of receiving an audible input, such as earbuds, smart glasses, smart watch, AR/VR headsets, helmets, etc.
  • the one or more microphones of the device may receive an audio input from the gesture of the user.
  • the gesture may be a swipe or a tap gesture.
  • the user may swipe or tap the skin in the region around the device.
  • the friction between the user’s finger(s) and the skin of the user as the user swipes the skin near the device may be received as an audio input, or signal, by the one or more microphones. Additionally or alternatively, the noise created by the user tapping their finger(s) on the skin may be received as audio input by the one or more microphones.
  • the user may tap or swipe the skin in a region near the device.
  • the user may provide a gesture on the skin of the ear wearing the earbud or the skin on the face of the user near the ear wearing the earbud.
  • the user may provide a gesture on the skin near the wrist, forearm, or hand in the region of the smartwatch.
  • the device may determine, based on the time each of the one or more microphones receives the audio input and/or the sound intensity of the audio input received by each microphone, the type of gesture.
  • the device may determine a first location of the gesture and a second location of the gesture based on the received audio input.
  • when the gesture is a swipe gesture, the first location may be different than the second location.
  • when the gesture is a tap gesture, the first location may be the same, or substantially the same, as the second location.
  • Different gestures may trigger the device to perform any of a variety of functions.
  • a swipe gesture may adjust the playback volume of the device, while a tap gesture may start or stop the audio content being played by the device. While these are merely a few examples, it should be understood that any number of different gestures are possible, each triggering any of a variety of functions.
  • Providing a gesture input on the skin in a region near the device may provide the user a better user experience as compared to making contact with the device itself to provide input or using voice commands.
  • a gesture input on the skin of the user in the region of the device may provide for a larger surface for the user to provide the input. This may allow for adaptive use for a plurality of users.
  • the larger surface for the gesture input may make it easier for the user to provide an up-down swipe, left-right swipe, or a tapping gesture without having to worry about where to provide the input on the device.
  • a gesture input on the skin in the region near the device may prevent any loud or uncomfortable sounds in the ear canal as compared to when the gesture input is provided on the device itself.
  • a gesture input on the skin in the region near the device may reduce the risk of moving the device.
  • the gesture input on the skin may reduce the risk of moving the earbud in the user’s ear.
  • providing a gesture on the skin in a region of the device may provide for a more discreet user experience as compared to using voice commands.
  • FIG. 1A illustrates an example system 100A in which the features described herein may be implemented. It should not be considered limiting the scope of the disclosure or usefulness of the features described herein.
  • system 100A may include a wearable device 130.
  • a wearable device may be a device that is capable of detecting and/or receiving audio input using one or more microphones.
  • a wearable device may be earbuds, a smartwatch, a headset, smartglasses, a VR/AR headset, etc.
  • the wearable device 130 may be a pair of earbuds 110, 120.
  • FIG. 1B illustrates an example system 100B in which the features described above and herein may be implemented.
  • system 100B may include wearable devices 110, 120.
  • Wearable device 110 may contain one or more processors 114, memory 116, instructions 111, data 119, and one or more microphones 118.
  • the one or more processors 114 may be any conventional processors, such as commercially available microprocessors. Alternatively, the one or more processors may be a dedicated device such as an application specific integrated circuit (ASIC) or other hardware-based processor.
  • although Figure 1B functionally illustrates the processor, memory, and other elements of wearable device 110 as being within the same block, it will be understood by those of ordinary skill in the art that the processor, computing device, or memory may actually include multiple processors, computing devices, or memories that may or may not be stored within the same physical housing.
  • the memory may be a hard drive or other storage media located in a housing different from that of wearable device 110. Accordingly, references to a processor or computing device will be understood to include references to a collection of processors or computing devices or memories that may or may not operate in parallel.
  • Memory 116 may store information that is accessible by the processors, including instructions 111 that may be executed by the processors 114, and data 119.
  • the memory 116 may be a type of memory operative to store information accessible by the processors 114, including a non-transitory computer-readable medium, or other medium that stores data that may be read with the aid of an electronic device, such as a hard-drive, memory card, read-only memory (“ROM”), random access memory (“RAM”), optical disks, as well as other write-capable and read-only memories.
  • the subject matter disclosed herein may include different combinations of the foregoing, whereby different portions of the instructions 111 and data 119 are stored on different types of media.
  • Data 119 in memory 116 may be retrieved, stored or modified by processors 114 in accordance with the instructions 111.
  • the data 119 may be stored in computer registers, in a relational database as a table having a plurality of different fields and records, XML documents, or flat files.
  • the data 119 may also be formatted in a computer-readable format such as, but not limited to, binary values, ASCII or Unicode.
  • the data 119 may be stored as bitmaps comprised of pixels that are stored in compressed or uncompressed form, in various image formats (e.g., JPEG), vector-based formats (e.g., SVG) or computer instructions for drawing graphics.
  • the data 119 may comprise information sufficient to identify the relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories (including other network locations) or information that is used by a function to calculate the relevant data.
  • the instructions 111 can be any set of instructions to be executed directly, such as machine code, or indirectly, such as scripts, by the processor 114.
  • the terms “instructions,” “application,” “steps,” and “programs” can be used interchangeably herein.
  • the instructions can be stored in object code format for direct processing by the processor, or in any other computing device language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance. Functions, methods and routines of the instructions are explained in more detail below.
  • the wearable device 110 may further include one or more microphones 118.
  • the microphones 118 of wearable device 110 may be located on a surface of the housing that is exposed when wearable device 110 is worn on the body.
  • the microphones 118 may be able to receive audio input.
  • the audio input may be the sound created by a user tapping or swiping on the skin near the wearable device 110. For example, as the user swipes the skin near wearable device 110, friction between the skin in the region near the device and the object being used to swipe the skin may create an audible sound. The sound created by the swiping motion may be received by the microphones 118 as audio input. In examples where the user taps the skin near wearable device 110, the microphones 118 may receive the tapping noise as audio input.
  • Wearable device 120 may include one or more processors 124, memory 126, instructions 121, data 129, and one or more microphones 128 that are substantially similar to those described herein with respect to wearable device 110.
  • FIG. 2A illustrates a user wearing the wearable device.
  • User 240 may be wearing at least one wearable device 210.
  • the user 240 may be wearing earbud 210.
  • Earbud 210 may have a housing shaped to be worn on a human body and, more specifically, within the inner portion of ear 242.
  • the housing may include a first surface shaped to be in contact with ear 242 and at least one second surface 215 shaped to be exposed when worn on the body.
  • Two microphones 218A, 218B may be located on the second surface 215. While only two microphones 218A, 218B are shown, the earbud 210 may have more than two microphones on the second surface 215. For example, the earbud 210 may have three, five, etc. microphones on the second surface 215.
  • the position of the microphones 218A, 218B may be modified.
  • one microphone may be placed on the second surface 215, another microphone may be placed on a third surface, different than the first surface and the second surface 215.
  • two microphones 218A, 218B, as shown, is merely one example and is not meant to be limiting.
  • the microphones 218A, 218B may be located at the top of earbud 210 and bottom of earbud 210.
  • the top of earbud 210 may be considered the area or region of the earbud 210 that is closest to the helix of the ear 242 and the bottom of earbud 210 may be the area or region of the earbud 210 that is closest to the lobule, or ear lobe, of ear 242.
  • the top and bottom of the earbud 210 may be relative and, therefore, may be used for descriptive purposes only.
  • the microphones 218A, 218B may receive audio input.
  • the audio input may be ambient sounds occurring around user 240 and, therefore, around earbud 210.
  • the ambient sounds may include receiving the audio input of a gesture performed by user 240 on the skin in the region around earbud 210.
  • Each microphone 218A, 218B may receive the audio input at a different time. For example, if the gesture, such as a swipe, begins closer to microphone 218A than 218B, microphone 218A may receive the audio input before microphone 218B, and vice versa. Additionally or alternatively, each microphone 218A, 218B may receive the audio input at a different sound intensity, or volume level. Sound intensity may be inversely proportional to the square of the distance between the sound source and the microphone. For example, if the gesture, such as a tap, is closer to microphone 218A than 218B, microphone 218A may receive audio input with a greater sound intensity than the audio input received by microphone 218B, and vice versa.
  • the type of gesture may be determined based on the time the audio input is received by each microphone 218A, 218B. For example, if microphone 218A receives audio input before microphone 218B but, as the gesture continues, microphone 218B begins to receive the audio input before 218A, the wearable device, or earbud 210, may determine that the gesture is a swipe. In some examples, if microphone 218A receives audio input before microphone 218B and, as the gesture continues, microphone 218A continues to receive audio input before microphone 218B, with the same time of flight difference, earbud 210 may determine that the gesture is a tap.
  • the type of gesture may be determined based on the sound intensity of the audio input received by each microphone 218A, 218B. According to some examples, if microphone 218A receives audio input with a greater sound intensity than microphone 218B at the beginning of a gesture but, as the gesture continues, microphone 218B receives the audio input with a greater sound intensity than microphone 218A, earbud 210 may determine that the gesture is a swipe.
  • the wearable device, or earbud 210, may use either or both the time the audio input is received by each microphone and the sound intensity of the audio input received by each microphone to determine the type of gesture.
  • the earbud 210 may use either or both the time the audio input is received and the sound intensity of the audio input received to determine the direction of the swipe.
  • microphone 218A may receive the audio input first and/or with a higher intensity as compared to the audio input received by microphone 218B.
  • microphone 218B may receive the audio input before microphone 218A and/or with a higher intensity as compared to microphone 218A. This may indicate that the gesture is a downward swipe.
  • microphone 218B may receive the audio input before and/or with a greater sound intensity than the audio input received by microphone 218A at the beginning of the gesture.
  • microphone 218A may receive the audio input before and/or with a greater sound intensity than 218B. This may indicate that the gesture is an upward swipe.
  • the direction of the swipe gesture may be determined based on the sound intensity received by microphones 218A, 218B and/or the time at which microphones 218A, 218B receive the audio input.
  • the audio input created by the swipe gesture may not be constant throughout the gesture. For example, the user may provide uneven pressure when performing the gesture or the skin surface may be uneven.
  • the audio created by the swipe gesture may be received by one microphone before the other microphone, as the swipe gesture may begin closer to the first microphone. As the swipe gesture progresses, the time the audio input arrives at each microphone may become similar.
  • the swipe gesture may be at a location equidistant from microphones 218A, 218B, or substantially at a midpoint between microphones 218A, 218B.
  • the audio created by the swipe gesture may arrive at the second microphone earlier as compared to when the first microphone receives the audio input. This may indicate that the user swiped away from the first microphone.
  • microphones 218A, 218B may notice a similar change in sound intensity or time at which the audio input is received.
  • the gesture may begin at a first location and end at a second location.
  • the first location may be farther away from the earbud 210 than the second location.
  • the first location may be a region of the user’s face, such as an area below the temple or the cheek.
  • the second location may be a region of the user’s face closest to the ear 242.
  • the first location may be a region of the user’s ear 242 farthest away from earbud 210 and the second location may be a region of the user’s ear 242 closest to earbud 210. If the swipe gesture begins at the first location and ends at the second location, the sound intensity of the audio input received by microphones 218A, 218B may increase during the gesture. In examples where the swipe gesture begins at the second location and ends at the first location, the sound intensity of the audio input received by microphones 218A, 218B may decrease during the gesture.
  • the time difference between when microphone 218A and microphone 218B receive the audio input may be consistent throughout the duration of the swipe.
  • the audio input received by microphones 218A, 218B may be compared against details of gestures using a trained machine learning model to determine the type and/or direction of the gesture.
  • a machine learning (“ML”) model may be trained to determine the type and/or direction of the gesture.
  • Each training example may consist of a gesture provided by a user.
  • the input features to the ML model may include the time the audio input is received by each microphone, the sound intensity of the audio input received by each microphone, the speed at which the user provides the gesture, the direction and/or location of the gesture, etc.
  • the ML model may use the input features to predict the type and/or direction of the gesture.
  • the output of the ML model may be a predicted type and/or direction of the gesture provided by the user.
  • the device may request feedback from the user. For example, the user may be asked whether the predicted type and/or direction of the gesture is accurate. The user may provide feedback, such as a yes or no, indicating that the predicted type and/or direction of the gesture is accurate.
  • the time the audio input is received by each of the microphones may be different.
  • the location of the gesture may be determined. For example, based on the time each microphone receives the audio created by the gesture, a distance between the location of the gesture that created the audio and the location of the microphone may be determined. Using the distance between the location of the gesture and each microphone, the location of the gesture may be determined. According to some examples, the location may be a first distance from the first microphone, a second distance from the second microphone, and a third distance from the third microphone. Based on the first, second, and third distances, the location of the gesture may be triangulated.
  • FIG. 2B illustrates example touch regions near the wearable device.
  • a first touch region 244 may be on the ear 242 of user 240.
  • a second touch region 246 may be on the face of the user 240, such as the flat area below the temple. While the touch regions 244, 246 are shown as rectangular, the touch regions can be of any shape or size. For example, the touch regions may be oblong, polygonal, circular, irregular, etc. and may take up more or less room on the user’s ear 242 or face than shown in Figure 2B. The touch regions may, additionally or alternatively, be located on different regions of the user’s ear 242, face, or other nearby areas. Thus, touch regions 244, 246 are only one example and are not meant to be limiting.
  • the touch regions may provide a larger surface area for a user to provide an input command as compared to a surface area for touch input on the device.
  • the touch input on earbud 210 may be on the second surface 215 of earbud 210.
  • a swipe gesture may be from the top (“T”) of the touch region 244, 246 to the bottom (“B”) of touch region 244, 246, or from the bottom “B” to the top “T.”
  • microphone 218A may receive audio input related to the swipe gesture at a time before microphone 218B receives the same audio input. Additionally or alternatively, microphone 218A may receive audio input with a greater sound intensity than microphone 218B. As the swipe gesture continues from the top “T” to bottom “B,” microphone 218B may begin receiving the audio input at a time before microphone 218A receives the audio input.
  • microphone 218B may receive the audio input at a greater sound intensity than microphone 218A. This may indicate that the swipe is a downward swipe. The opposite may occur when the gesture begins at bottom “B” and goes towards top “T.”
  • a swipe gesture may, additionally or alternatively, be from the left side (“L”) of the touch region 244, 246 to the right side (“R”) of the touch region 244, 246, or from the right side “R” to the left side “L.”
  • the sound intensity of the audio input received by microphones 218A, 218B may increase or decrease based on the touch region 244, 246.
  • the sound intensity of the audio input received by microphones 218A, 218B from swiping from left side “L” to right side “R” in touch region 244 may increase as the gesture will end in a position closer to the microphones 218A, 218B as compared to the position where the gesture started.
  • the sound intensity of the audio input received from swiping from left side “L” to right side “R” in touch region 246 may decrease as the gesture will end in a position farther from the microphones 218A, 218B as compared to the position where the gesture started. The opposite may occur when the gesture goes from the right side “R” to the left side “L” in touch regions 244, 246.
  • FIG. 3 illustrates an example where the wearable device is a smartwatch.
  • Smartwatch 310 may be worn on the wrist 350 of a user.
  • Smartwatch 310 may include all the components as described above with respect to wearable devices 110, 120, 210.
  • smartwatch 310 may include two or more microphones 318A, 318B on a second surface 315 of smartwatch 310.
  • the second surface 315 may be the surface of the housing that is not in contact with the human body, or wrist.
  • the touch regions 344, 346 may be located on the wrist 350 of the user and/or on the back of the hand 348 of the user. As shown, the touch regions 344, 346 may be ellipses. However, as described above, the touch regions 344, 346 may be any shape and size. The touch regions 344, 346 may have a top “T,” bottom “B,” left side “L,” and right side “R” for purposes of determining the direction of a swipe gesture. Smartwatch 310 may determine the type of gesture and, in some examples, the direction of a swipe gesture in the same or substantially the same manner that was described with respect to earbud 210.
  • Figures 4A and 4B illustrate graphical representations of the sound intensity of the audio input received by each of the microphones due to a gesture.
  • the gesture may be a tap gesture or a swipe gesture.
  • Figure 4A illustrates a graph 400A of audio input 450A, 450B received by a first microphone 418A and a second microphone 418B, respectively, in response to a tap gesture.
  • the first microphone 418A may be located near the top of the device and the second microphone 418B may be located near the bottom of the device.
  • a tap gesture may create a short, distinctive burst of sound.
  • the first and second microphones 418A, 418B may receive the sound created by the tap as audio input 450A, 450B.
  • the audio input 450A received by the first microphone 418A may be short and distinctive.
  • the audio input 450B received by the second microphone 418B may have the same or similar shape on the graph 400A as the audio input 450A received by the first microphone 418A.
  • Audio input 450A received by the first microphone 418A may have a different sound intensity as compared to the audio input 450B received by the second microphone 418B.
  • the differences in sound intensities may be due to the location of the tap gesture. For example, if the tap gesture was in a region of the body closer to the first microphone 418A than the second microphone 418B, the sound intensity of audio input 450A may be greater than the sound intensity of audio input 450B.
  • Figure 4B illustrates a graph 400B of audio input 452A, 452B received by the first and second microphones 418A, 418B, respectively, in response to a swipe gesture.
  • a swipe gesture may create a sound having a duration equal, or substantially equal, to the length of the gesture.
  • Each microphone 418A, 418B may receive audio input 452A, 452B starting at a different time as compared to when the swipe gesture starts. Additionally or alternatively, each microphone 418A, 418B may stop receiving audio input 452A, 452B at a different time as compared to when the swipe gesture ends.
  • a user may perform a swipe gesture from the bottom of the touch region to the top of the touch region.
  • the second microphone 418B may receive audio input 452B, corresponding to the start of the swipe gesture, at time “t1.”
  • the first microphone 418A may receive audio input 452A at time “t2.”
  • t2 may be after t1 such that the second microphone 418B received audio input 452B before the first microphone 418A received audio input 452A. This may be due to the second microphone 418B being located at the bottom of the device, which may correspond to the bottom of the touch region.
  • the first microphone 418A may experience a decreasing arrival time of audio input 452A.
  • a decreasing arrival time of audio input 452A may indicate that the sound is coming from a location closer to the first microphone 418A than it was previously. This may be due to the swipe gesture approaching the top of the touch region, corresponding to the top of the device and, therefore, the location of the first microphone 418A.
  • Figure 5 illustrates an example method of determining a gesture input using one or more microphones of a wearable device.
  • the following operations do not have to be performed in the precise order described below. Rather, various operations can be handled in a different order or simultaneously, and operations may be added or omitted.
  • the device may receive an audio input from a gesture of a user.
  • the audio input may be received by each of the one or more microphones.
  • the audio input may be received by a first microphone located at a first location on a surface of the device that is exposed when worn on the body and the audio input may be received by a second microphone located at a second location on the surface of the device that is exposed.
  • the first location may be different than the second location.
  • the first location may be the relative top of the device and the second location may be the relative bottom of the device.
  • the first location may be the relative left side of the device and the second location may be the relative right side of the device.
  • the device may determine, based on the received audio input, a first location of the gesture.
  • the device may determine, based on the received audio input, a second location of the gesture.
  • the gesture may be performed by the user in a touch region.
  • the touch region may be a region of the skin of the human body at or near the device.
  • the touch region may have a top, bottom, left, and right side.
  • the gesture is a swipe gesture
  • the gesture may begin at one side of the touch region and end in another.
  • the gesture is a tap gesture
  • the gesture may occur in substantially the same location within the touch region such that the first location of the gesture and the second location of the gesture are the same or substantially the same location.
  • the device may determine, based on the first location and the second location of the gesture, an input command. For example, if the first location is on the top of the touch region and the second location is on the bottom of the touch region, the gesture may be a downward swipe. According to some examples, a downward swipe may be an input command, such as to decrease the playback volume of the content. If the first location is on the bottom of the touch region and the second location is on the top of the touch region, the gesture may be an upward swipe. An upward swipe may be an input command, such as to increase the playback volume of the content. In some examples, the first location may be the left or right side of the touch region and the second location may be the right or left side, respectively. This may be a sideways swipe, which may be an input command to fast-forward or rewind the content.
  • the gesture may be a tap.
  • a tap may be an input command to start or stop content from being output.
  • a tap may answer or hang up a phone call. Therefore, the types of input commands described herein, i.e. increasing volume, decreasing volume, fast-forwarding, rewinding, stopping, and playing, are merely examples and are not meant to be limiting. An illustrative sketch of such a gesture-to-command mapping follows this list.
  • although the device described herein is described as having one or more microphones on the surface of the device that is exposed when worn on the body, all or some of the one or more microphones may be within the device and/or on the surface of the device that is in contact with the human body when the device is worn on the body.
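
As referenced in the list above, once the type and direction of the gesture have been determined, the remaining step is to map that result to an action. The sketch below shows one way such a mapping could be wired up in Python; the player object, its method names, and the gesture labels are hypothetical and only mirror the examples given above.

```python
from typing import Callable, Dict


def build_command_table(player) -> Dict[str, Callable[[], None]]:
    """Map recognised gesture labels to playback actions.

    `player` is a hypothetical object exposing the listed methods; the label
    set mirrors the examples above and is not exhaustive or required.
    """
    return {
        "swipe_up": player.volume_up,
        "swipe_down": player.volume_down,
        "swipe_left": player.rewind,
        "swipe_right": player.fast_forward,
        "tap": player.toggle_playback,
    }


def dispatch(gesture: str, commands: Dict[str, Callable[[], None]]) -> None:
    # Unknown or unrecognised gestures are simply ignored.
    action = commands.get(gesture)
    if action is not None:
        action()
```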

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure provides devices and methods for determining an input command based on a gesture of a user. The device may be a hearable device such that the device has one or more microphones configured to receive audio input. The audio input may be the noise created by the gesture. For example, the gesture may be a swipe gesture or a tap gesture on the skin in the region of the device. Each microphone on the device may receive the audio input at different times and/or different intensities. Based on the time and/or intensity each microphone receives the audio input, the type of gesture may be determined. The type of gesture may be used to determine an input command.

Description

A GESTURE INPUT FOR A WEARABLE DEVICE
BACKGROUND
[0001] Wearable and hearable devices have inputs that allow a user to provide commands. For example, when the device is a pair of earbuds, the earbuds may have inputs that allow the user to start or stop content, adjust the playback volume, or fast forward or rewind content. These inputs may be a touchpad that receives a physical input or a microphone that receives a voice command. A physical input, such as a touch or a tap, may provide undesirable noise or discomfort for the user when the user swipes or taps the earbud. Moreover, a touch or a tap on the earbud may affect the performance of the antenna or active noise cancellation (ANC), as the antenna or ANC microphones may be in the same location or nearby the touch area. A voice command may be embarrassing or disruptive for a user when in a quiet or discreet location.
BRIEF SUMMARY
[0002] The present disclosure provides systems and methods for determining an input command based on a gesture of a user. The gesture may be a swipe gesture or a tap gesture. The gesture may be performed on the skin of the body of the user in a region near a wearable device. For example, the wearable device may have one or more microphones that receive audio input created by the gesture of the user. The wearable device may determine the type of gesture. In examples where the gesture is a swipe, the device may determine the direction of the gesture. Based on the type of gesture and/or direction of the gesture, the device may determine an input command. The input command may be increasing or decreasing the playback volume, fast-forwarding or rewinding the content, answering or ending a call, muting a notification, etc.
[0003] One aspect of the disclosure includes a wearable electronic device comprising one or more microphones and one or more processors in communication with the one or more microphones. The one or more processors may be configured to receive, at each of the one or more microphones, an audio input based on a gesture of a user, determine, based on the received audio input, a first location of the gesture and a second location of the gesture, and determine, based on the first location and second location of the gesture, an input command. The gesture may be a swipe gesture or a tap gesture. When the gesture is the swipe gesture, the first location may be different than the second location. When the gesture is the swipe or tap gesture, the user may swipe or tap an area of skin in the region of the device.
[0004] Determining the first location and the second location may include comparing audio signals captured by each of the one or more microphones. The one or more processors may be configured to determine, based on the first location and the second location, a trajectory of the gesture.
[0005] The device may be an earbud. The earbud may include a housing shaped to be worn on a human body, wherein at least one first surface of the housing is shaped to come in contact with the human body and at least one second surface of the housing is shaped to be exposed when worn on the body, and wherein at least two of the one or more microphones are located on the second surface. The gesture may be a tap or swipe on an area of skin on an ear or the face of the user.
[0006] Another aspect of the disclosure includes a method comprising receiving, at each of one or more microphones, an audio input based on a gesture of a user, determining, by one or more processors based on the received audio input, a first location of the gesture and a second location of the gesture, and determining, by the one or more processors based on the first location and second location of the gesture, an input command.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] Figure 1A is a functional diagram of an example device according to aspects of the disclosure.
[0008] Figure 1B is a functional block diagram of a device system in accordance with aspects of the disclosure.
[0009] Figure 2A is a pictorial diagram illustrating an example use of the device according to aspects of the disclosure.
[0010] Figure 2B is a pictorial diagram illustrating an example use of the device according to aspects of the disclosure.
[0011] Figure 3 is a pictorial diagram illustrating an example use of the device according to aspects of the disclosure.
[0012] Figures 4A and 4B are graphical representations illustrating an example use of the device according to aspects of the disclosure.
[0013] Figure 5 is a flow diagram illustrating a method of determining a gesture input using one or more microphones of a wearable device according to aspects of the disclosure.
DETAILED DESCRIPTION
[0014] A wearable, or hearable, device may use one or more microphones to determine an input command based on a gesture of a user. The wearable or hearable device may be any device that is capable of receiving an audible input, such as earbuds, smart glasses, smart watch, AR/VR headsets, helmets, etc. The one or more microphones of the device may receive an audio input from the gesture of the user. The gesture may be a swipe or a tap gesture. The user may swipe or tap the skin in the region around the device. The friction between the user’s finger(s) and the skin of the user as the user swipes the skin near the device may be received as an audio input, or signal, by the one or more microphones. Additionally or alternatively, the noise created by the user tapping their finger(s) on the skin may be received as audio input by the one or more microphones. The user may tap or swipe the skin in a region near the device. For example, if the device is a pair of earbuds, the user may provide a gesture on the skin of the ear wearing the earbud or the skin on the face of the user near the ear wearing the earbud. In examples where the device is a smartwatch, the user may provide a gesture on the skin near the wrist, forearm, or hand in the region of the smartwatch.
[0015] The device may determine, based on the time each of the one or more microphones receives the audio input and/or the sound intensity of the audio input received by each microphone, the type of gesture. For example, the device may determine a first location of the gesture and a second location of the gesture based on the received audio input. In examples where the gesture is a swipe gesture, the first location may be different than the second location. In examples where the gesture is a tap gesture, the first location may be the same, or substantially the same, as the second location.
[0016] Different gestures may trigger the device to perform any of a variety of functions. By way of example only, a swipe gesture may adjust the playback volume of the device, while a tap gesture may start or stop the audio content being played by the device. While these are merely a few examples, it should be understood that any number of different gestures are possible, each triggering any of a variety of functions.
[0017] Providing a gesture input on the skin in a region near the device may provide the user a better user experience as compared to making contact with the device itself to provide input or using voice commands. A gesture input on the skin of the user in the region of the device may provide for a larger surface for the user to provide the input. This may allow for adaptive use for a plurality of users. Additionally or alternatively, the larger surface for the gesture input may make it easier for the user to provide an up-down swipe, left-right swipe, or a tapping gesture without having to worry about where to provide the input on the device. According to some examples, a gesture input on the skin in the region near the device may prevent any loud or uncomfortable sounds in the ear canal as compared to when the gesture input is provided on the device itself. Additionally or alternatively, a gesture input on the skin in the region near the device may reduce the risk of moving the device. In examples where the device is an earbud, the gesture input on the skin may reduce the risk of moving the earbud in the user’s ear. According to some examples, providing a gesture on the skin in a region of the device may provide for a more discreet user experience as compared to using voice commands.
[0018] Figure 1A illustrates an example system 100A in which the features described herein may be implemented. It should not be considered limiting the scope of the disclosure or usefulness of the features described herein. In this example, system 100A may include a wearable device 130. A wearable device may be a device that is capable of detecting and/or receiving audio input using one or more microphones. For example, a wearable device may be earbuds, a smartwatch, a headset, smartglasses, a VR/AR headset, etc. As shown, the wearable device 130 may be a pair of earbuds 110, 120.
[0019] Figure 1B illustrates an example system 100B in which the features described above and herein may be implemented. In this example, system 100B may include wearable devices 110, 120. Wearable device 110 may contain one or more processors 114, memory 116, instructions 111, data 119, and one or more microphones 118.
[0020] The one or more processors 114 may be any conventional processors, such as commercially available microprocessors. Alternatively, the one or more processors may be a dedicated device such as an application specific integrated circuit (ASIC) or other hardware-based processor. Although Figure 1B functionally illustrates the processor, memory, and other elements of wearable device 110 as being within the same block, it will be understood by those of ordinary skill in the art that the processor, computing device, or memory may actually include multiple processors, computing devices, or memories that may or may not be stored within the same physical housing. Similarly, the memory may be a hard drive or other storage media located in a housing different from that of wearable device 110. Accordingly, references to a processor or computing device will be understood to include references to a collection of processors or computing devices or memories that may or may not operate in parallel.
[0021] Memory 116 may store information that is accessible by the processors, including instructions 111 that may be executed by the processors 114, and data 119. The memory 116 may be a type of memory operative to store information accessible by the processors 114, including a non-transitory computer-readable medium, or other medium that stores data that may be read with the aid of an electronic device, such as a hard-drive, memory card, read-only memory ("ROM"), random access memory ("RAM"), optical disks, as well as other write-capable and read-only memories. The subject matter disclosed herein may include different combinations of the foregoing, whereby different portions of the instructions 111 and data 119 are stored on different types of media.
[0022] Data 119 in memory 116 may be retrieved, stored or modified by processors 114 in accordance with the instructions 111. For instance, although the present disclosure is not limited by a particular data structure, the data 119 may be stored in computer registers, in a relational database as a table having a plurality of different fields and records, XML documents, or flat files. The data 119 may also be formatted in a computer-readable format such as, but not limited to, binary values, ASCII or Unicode. By further way of example only, the data 119 may be stored as bitmaps comprised of pixels that are stored in compressed or uncompressed form, in various image formats (e.g., JPEG), vector-based formats (e.g., SVG) or computer instructions for drawing graphics. Moreover, the data 119 may comprise information sufficient to identify the relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories (including other network locations) or information that is used by a function to calculate the relevant data.
[0023] The instructions 111 can be any set of instructions to be executed directly, such as machine code, or indirectly, such as scripts, by the processor 114. In that regard, the terms “instructions,” “application,” “steps,” and “programs” can be used interchangeably herein. The instructions can be stored in object code format for direct processing by the processor, or in any other computing device language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance. Functions, methods and routines of the instructions are explained in more detail below.
[0024] The wearable device 110 may further include one or more microphones 118. The microphones 118 of wearable device 110 may be located on a surface of the housing that is exposed when wearable device 110 is worn on the body.
[0025] The microphones 118 may be able to receive audio input. The audio input may be the sound created by a user tapping or swiping on the skin near the wearable device 110. For example, as the user swipes the skin near wearable device 110, friction between the skin in the region near the device and the object being used to swipe the skin may create an audible sound. The sound created by the swiping motion may be received by the microphones 118 as audio input. In examples where the user taps the skin near wearable device 110, the microphones 118 may receive the tapping noise as audio input.
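Before the microphone signals can be compared, the device must first notice that a gesture sound is present at all. One simple, assumed approach, not specified in the disclosure, is to flag frames whose energy rises well above the ambient noise floor; the frame length and threshold below are illustrative placeholders.

```python
import numpy as np


def gesture_segments(samples: np.ndarray, frame_len: int = 256,
                     threshold_ratio: float = 4.0) -> list:
    """Return (start, end) sample indices of spans whose energy rises well
    above the ambient noise floor, as a crude stand-in for detecting the
    friction or tap sound described above.  The threshold is an assumption.
    """
    usable = len(samples) // frame_len * frame_len
    frames = samples[:usable].reshape(-1, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    noise_floor = np.median(energy) + 1e-12      # rough ambient estimate
    active = energy > threshold_ratio * noise_floor

    segments, start = [], None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start * frame_len, i * frame_len))
            start = None
    if start is not None:
        segments.append((start * frame_len, usable))
    return segments
```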
[0026] Wearable device 120 may include one or more processors 124, memory 126, instructions 121, data 129, and one or more microphones 128 that are substantially similar to those described herein with respect to wearable device 110.
[0027] Figure 2A illustrates a user wearing the wearable device. User 240 may be wearing at least one wearable device 210. For example, the user 240 may be wearing earbud 210. Earbud 210 may have a housing shaped to be worn on a human body and, more specifically, within the inner portion of ear 242. The housing may include a first surface shaped to be in contact with ear 242 and at least one second surface 215 shaped to be exposed when worn on the body. Two microphones 218A, 218B may be located on the second surface 215. While only two microphones 218A, 218B are shown, the earbud 210 may have more than two microphones on the second surface 215. For example, the earbud 210 may have three, five, etc. microphones on the second surface 215. Moreover, the position of the microphones 218A, 218B may be modified. For example, while one microphone may be placed on the second surface 215, another microphone may be placed on a third surface, different than the first surface and the second surface 215. Thus, two microphones 218A, 218B, as shown, is merely one example and is not meant to be limiting.
[0028] The microphones 218A, 218B may be located at the top of earbud 210 and bottom of earbud 210. In some examples, the top of earbud 210 may be considered the area or region of the earbud 210 that is closest to the helix of the ear 242 and the bottom of earbud 210 may be the area or region of the earbud 210 that is closest to the lobule, or ear lobe, of ear 242. However, the top and bottom of the earbud 210 may be relative and, therefore, may be used for descriptive purposes only.
[0029] The microphones 218A, 218B may receive audio input. For example, the audio input may be ambient sounds occurring around user 240 and, therefore, around earbud 210. The ambient sounds may include receiving the audio input of a gesture performed by user 240 on the skin in the region around earbud 210.
[0030] Each microphone 218A, 218B may receive the audio input at a different time. For example, if the gesture, such as a swipe, begins closer to microphone 218A than 218B, microphone 218A may receive the audio input before microphone 218B, and vice versa. Additionally or alternatively, each microphone 218A, 218B may receive the audio input at a different sound intensity, or volume level. Sound intensity may be inversely proportional to the square of the distance between the sound source and the microphone. For example, if the gesture, such as a tap, is closer to microphone 218A than 218B, microphone 218A may receive audio input with a greater sound intensity than the audio input received by microphone 218B, and vice versa.
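By way of a minimal, non-limiting sketch, the two quantities discussed above, relative arrival time and received level, could be estimated from a pair of synchronized microphone frames as follows; the sample rate, the use of cross-correlation, and the function names are assumptions for illustration rather than details of the disclosure.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz; assumed capture rate for both microphones


def arrival_time_difference(sig_top: np.ndarray, sig_bottom: np.ndarray) -> float:
    """Arrival time at the top microphone minus arrival time at the bottom one.

    A positive result means the bottom microphone heard the sound first.
    """
    corr = np.correlate(sig_top, sig_bottom, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(sig_bottom) - 1)
    return lag_samples / SAMPLE_RATE


def rms_level(sig: np.ndarray) -> float:
    """Root-mean-square level of a frame, used as a proxy for sound intensity."""
    return float(np.sqrt(np.mean(np.square(sig))))


# Example with synthetic data: the same burst arrives 2 ms later, and weaker,
# at the top microphone than at the bottom microphone.
rng = np.random.default_rng(0)
burst = rng.standard_normal(400)
bottom = np.concatenate([burst, np.zeros(200)])
top = 0.5 * np.concatenate([np.zeros(32), burst, np.zeros(168)])  # 32 samples = 2 ms
print(arrival_time_difference(top, bottom))  # about +0.002 s: bottom mic heard it first
print(rms_level(top) < rms_level(bottom))    # True: the sound is weaker at the top mic
```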
[0031] The type of gesture may be determined based on the time the audio input is received by each microphone 218A, 218B. For example, if microphone 218A receives audio input before microphone 218B but, as the gesture continues, microphone 218B begins to receive the audio input before 218A, the wearable device, or earbud 210, may determine that the gesture is a swipe. In some examples, if microphone 218A receives audio input before microphone 218B and, as the gesture continues, microphone 218A continues to receive audio input before microphone 218B, with the same time of flight difference, earbud 210 may determine that the gesture is a tap.
[0032] The type of gesture may be determined based on the sound intensity of the audio input received by each microphone 218A, 218B. According to some examples, if microphone 218A receives audio input with a greater sound intensity than microphone 218B at the beginning of a gesture but, as the gesture continues, microphone 218B receives the audio input with a greater sound intensity than microphone 218A, earbud 210 may determine that the gesture is a swipe.
[0033] The wearable device, or earbud 210, may use either or both the time the audio input is received by each microphone and the sound intensity of the audio input received by each microphone to determine the type of gesture. In examples where the gesture is a swipe, the earbud 210 may use either or both the time the audio input is received and the sound intensity of the audio input received to determine the direction of the swipe.
[0034] For example, at the beginning of the gesture, microphone 218A may receive the audio input first and/or with a higher intensity as compared to the audio input received by microphone 218B. As the gesture continues, microphone 218B may receive the audio input before microphone 218A and/or with a higher intensity as compared to microphone 218A. This may indicate that the gesture is a downward swipe. According to some examples, microphone 218B may receive the audio input before and/or with a greater sound intensity than the audio input received by microphone 218A at the beginning of the gesture. As the gesture continues, microphone 218A may receive the audio input before and/or with a greater sound intensity than 218B. This may indicate that the gesture is an upward swipe.
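One possible way to turn a sequence of such per-frame estimates into a tap/swipe decision and an up/down direction, consistent with the behavior described in paragraphs [0031] through [0034], is sketched below; the frame structure, the jitter tolerance, and the labels are illustrative assumptions, not disclosed details.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FrameEstimate:
    delay_top_minus_bottom: float  # seconds; positive when the bottom mic leads
    level_top: float               # RMS level at the top microphone
    level_bottom: float            # RMS level at the bottom microphone


def classify_gesture(frames: List[FrameEstimate],
                     delay_jitter: float = 30e-6) -> Optional[str]:
    """Rough tap/swipe decision from per-frame delay estimates.

    A tap keeps roughly the same time-of-flight difference throughout, while a
    swipe shows the leading microphone change as the finger moves.  The
    `delay_jitter` tolerance is an assumed placeholder, not a disclosed value.
    """
    if not frames:
        return None
    delays = [f.delay_top_minus_bottom for f in frames]
    if max(delays) - min(delays) < delay_jitter:
        return "tap"
    if delays[-1] > delays[0]:
        # The sound source moved toward the bottom microphone: downward swipe.
        return "swipe_down"
    # The sound source moved toward the top microphone: upward swipe.
    return "swipe_up"
```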
[0035] According to some examples, the direction of the swipe gesture may be determined based on the sound intensity received by microphones 218A, 218B and/or the time at which microphones 218A, 218B receive the audio input. The audio input created by the swipe gesture may not be constant throughout the gesture. For example, the user may provide uneven pressure when performing the gesture or the skin surface may be uneven. The audio created by the swipe gesture may be received by one microphone before the other microphone, as the swipe gesture may begin closer to the first microphone. As the swipe gesture progresses, the time the audio input arrives at each microphone may become similar. For example, the swipe gesture may be at a location equidistant from microphones 218A, 218B, or substantially at a midpoint between microphones 218A, 218B. At or near the end of the swipe gesture, the audio created by the swipe gesture may arrive at the second microphone earlier as compared to when the first microphone receives the audio input. This may indicate that the user swiped away from the first microphone.
[0036] In examples where the swipe gesture is a sideways swipe, microphones 218A, 218B may notice a similar change in sound intensity or time at which the audio input is received. For example, the gesture may begin at a first location and end at a second location. The first location may be farther away from the earbud 210 than the second location. For example, the first location may be a region of the user’s face, such as an area below the temple or the cheek. The second location may be a region of the user’s face closest to the ear 242. Additionally or alternatively, the first location may be a region of the user’s ear 242 farthest away from earbud 210 and the second location may be a region of the user’s ear 242 closest to earbud 210. If the swipe gesture begins at the first location and ends at the second location, the sound intensity of the audio input received by microphones 218A, 218B may increase during the gesture. In examples where the swipe gesture begins at the second location and ends at the first location, the sound intensity of the audio input received by microphones 218A, 218B may decrease during the gesture.
[0037] Additionally or alternatively, when the gesture is a sideways swipe, the time difference between when microphone 218A and microphone 218B receive the audio input may be consistent throughout the duration of the swipe.
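For the sideways case described in paragraphs [0035] through [0037], the inter-microphone delay stays roughly constant while the overall received level rises or falls depending on whether the finger moves toward or away from the device. A small illustrative test of that trend, again only an assumed sketch, might look like the following:

```python
import numpy as np


def sideways_swipe_direction(levels_top: np.ndarray, levels_bottom: np.ndarray) -> str:
    """Label a swipe whose inter-microphone delay stays roughly constant.

    levels_top / levels_bottom are per-frame RMS levels at the two microphones.
    Fitting a line to the combined level and reading its sign is an assumed,
    illustrative test, not a method specified in the disclosure.
    """
    combined = 0.5 * (np.asarray(levels_top) + np.asarray(levels_bottom))
    slope = np.polyfit(np.arange(len(combined)), combined, 1)[0]
    # Rising level: the gesture ended closer to the microphones than it began.
    return "swipe_toward_device" if slope > 0 else "swipe_away_from_device"
```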
[0038] According to some examples, the audio input received by microphones 218A, 218B may be compared against details of gestures using a trained machine learning model to determine the type and/or direction of the gesture. A machine learning (“ML”) model may be trained to determine the type and/or direction of the gesture. Each training example may consist of a gesture provided by a user. The input features to the ML model may include the time the audio input is received by each microphone, the sound intensity of the audio input received by each microphone, the speed at which the user provides the gesture, the direction and/or location of the gesture, etc. The ML model may use the input features to predict the type and/or direction of the gesture. The output of the ML model may be a predicted type and/or direction of the gesture provided by the user. In some examples, the device may request feedback from the user. For example, the user may be asked whether the predicted type and/or direction of the gesture is accurate. The user may provide feedback, such as a yes or no, indicating whether the predicted type and/or direction of the gesture is accurate.
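The disclosure does not specify a particular model, so the following is only a minimal sketch of how per-gesture features such as those listed above might be fed to an off-the-shelf classifier (here, a scikit-learn random forest). The feature ordering, the toy training rows, and the label names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-gesture feature vector:
# [onset time at mic A, onset time at mic B,
#  peak intensity at mic A, peak intensity at mic B,
#  gesture duration (s), sign of the overall intensity trend]
X_train = np.array([
    [0.00, 0.02, 0.8, 0.3, 0.40, -1.0],   # row labelled as a downward swipe
    [0.02, 0.00, 0.3, 0.8, 0.38, +1.0],   # row labelled as an upward swipe
    [0.01, 0.01, 0.6, 0.6, 0.05,  0.0],   # row labelled as a tap
])
y_train = ["swipe_down", "swipe_up", "tap"]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

new_gesture = np.array([[0.00, 0.03, 0.7, 0.2, 0.45, -1.0]])
predicted = model.predict(new_gesture)[0]
print(predicted)  # the device could then ask the user to confirm this label
```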
[0039] In examples where the wearable device, or earbud 210, includes three or more microphones, the time the audio input is received by each of the microphones may be different. Using the time the audio input is received by each microphone and the known location of the microphones, the location of the gesture may be determined. For example, based on the time each microphone receives the audio created by the gesture, a distance between the location of the gesture that created the audio and the location of that microphone may be determined. Using the distance between the location of the gesture and each microphone, the location of the gesture may be determined. According to some examples, the location may be a first distance from the first microphone, a second distance from the second microphone, and a third distance from the third microphone. Based on the first, second, and third distances, the location of the gesture may be triangulated.
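A minimal sketch of such a triangulation is shown below, assuming three microphones at known two-dimensional positions on the exposed surface and a single propagation speed; the positions, the speed value, and the use of a least-squares solver are illustrative assumptions rather than the disclosed method.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical 2D microphone positions on the exposed surface (metres) and an
# assumed propagation speed; real values would be device- and medium-specific.
MIC_POSITIONS = np.array([[0.000, 0.010],
                          [0.000, -0.010],
                          [0.008, 0.000]])
SPEED_OF_SOUND = 343.0  # m/s, assuming airborne sound near the skin

def locate_gesture(arrival_times):
    """Estimate a 2D gesture location from per-microphone arrival times by
    least-squares fitting the time-difference-of-arrival equations."""
    t = np.asarray(arrival_times, dtype=float)
    t = t - t.min()  # only differences in arrival time matter

    def residuals(params):
        x, y, t0 = params  # t0 absorbs the unknown emission time
        dists = np.linalg.norm(MIC_POSITIONS - np.array([x, y]), axis=1)
        return dists / SPEED_OF_SOUND + t0 - t

    fit = least_squares(residuals, x0=[0.01, 0.0, 0.0])
    return fit.x[:2]  # estimated (x, y) position of the gesture

print(locate_gesture([0.000030, 0.000070, 0.000050]))
```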
[0040] Figure 2B illustrates example touch regions near the wearable device. For example, a first touch region 244 may be on the ear 242 of user 240. A second touch region 246 may be on the face of the user 240, such as the flat area below the temple. While the touch regions 244, 246 are shown as rectangular, the touch regions can be of any shape or size. For example, the touch regions may be oblong, polygonal, circular, irregular, etc. and may take up more or less room on the user’s ear 242 or face than shown in Figure 2B. The touch regions may, additionally or alternatively, be located on different regions of the user’s ear 242, face, or other nearby areas. Thus, touch regions 244, 246 are only one example and are not meant to be limiting.
[0041] The touch regions may provide a larger surface area for a user to provide an input command as compared to a surface area for touch input on the device. For example, the touch input on earbud 210 may be on the second surface 215 of earbud 210. By providing a gesture input to a region of skin near the wearable device, any negative effects of touching the device itself or issues with providing an input command on a small surface may be negated.
[0042] As described above, a swipe gesture may be from the top (“T”) of the touch region 244, 246 to the bottom (“B”) of touch region 244, 246, or from the bottom “B” to the top “T.” In examples where the swipe gesture goes from the top “T” to the bottom “B”, microphone 218A may receive audio input related to the swipe gesture at a time before microphone 218B receives the same audio input. Additionally or alternatively, microphone 218A may receive audio input with a greater sound intensity than microphone 218B. As the swipe gesture continues from the top “T” to bottom “B,” microphone 218B may begin receiving the audio input at a time before microphone 218A receives the audio input. In some examples, as the gesture approaches bottom “B,” microphone 218B may receive the audio input at a greater sound intensity than microphone 218A. This may indicate that the swipe is a downward swipe. The opposite may occur when the gesture begins at bottom “B” and goes towards top “T.”
[0043] A swipe gesture may, additionally or alternatively, be from the left side (“L”) of the touch region 244, 246 to the right side (“R”) of the touch region 244, 246, or from the right side “R” to the left side “L.” In examples where the swipe gesture goes from the left side “L” to the right side “R,” the sound intensity of the audio input received by microphones 218A, 218B may increase or decrease based on the touch region 244, 246. For example, the sound intensity of the audio input received by microphones 218A, 218B from swiping from left side “L” to right side “R” in touch region 244 may increase as the gesture will end in a position closer to the microphones 218A, 218B as compared to the position where the gesture started. The sound intensity of the audio input received from swiping from left side “L” to right side “R” in touch region 246 may decrease as the gesture will end in a position farther from the microphones 218A, 218B as compared to the position where the gesture started. The opposite may occur when the gesture goes from the right side “R” to the left side “L” in touch regions 244, 246.
[0044] Figure 3 illustrates an example where the wearable device is a smartwatch. Smartwatch 310 may be worn on the wrist 350 of a user. Smartwatch 310 may include all the components as described above with respect to wearable devices 110, 120, 210. In particular, smartwatch 310 may include two or more microphones 318A, 318B on a second surface 315 of smartwatch 310. The second surface 315 may be the surface of the housing that is not in contact with the human body, or wrist.
[0045] The touch regions 344, 346 may be located on the wrist 350 of the user and/or on the back of the hand 348 of the user. As shown, the touch regions 344, 346 may be ellipses. However, as described above, the touch regions 344, 346 may be any shape and size. The touch regions 344, 346 may have a top “T,” bottom “B,” left side “L,” and right side “R” for purposes of determining the direction of a swipe gesture. Smartwatch 310 may determine the type of gesture and, in some examples, the direction of a swipe gesture in the same or substantially the same manner that was described with respect to earbud 210.
[0046] Figures 4A and 4B illustrate graphical representations of the sound intensity of the audio input received by each of the microphones due to a gesture. The gesture may be a tap gesture or a swipe gesture.
[0047] Figure 4A illustrates a graph 400A of audio input 450A, 450B received by a first microphone 418A and a second microphone 418B, respectively, in response to a tap gesture. The first microphone 418A may be located near the top of the device and the second microphone 418B may be located near the bottom of the device. A tap gesture may create a short, distinctive burst of sound. The first and second microphones 418A, 418B may receive the sound created by the tap as audio input 450A, 450B. As shown on graph 400A, the audio input 450A received by the first microphone 418A may be short and distinctive. The audio input 450B received by the second microphone 418B may have the same or similar shape on the graph 400A as the audio input 450A received by the first microphone 418A. Audio input 450A received by the first microphone 418A may have a different sound intensity as compared to the audio input 450B received by the second microphone 418B. According to some examples, the differences in sound intensities may be due to the location of the tap gesture. For example, if the tap gesture was in a region of the body closer to the first microphone 418A than the second microphone 418B, the sound intensity of audio input 450A may be greater than the sound intensity of audio input 450B.
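As an illustration of the tap case shown in Figure 4A, the sketch below flags a sufficiently short burst of energy as a tap and guesses which microphone the tap occurred closer to from the peak intensities. The sample rate, duration limit, and threshold are assumed values for illustration only.

```python
import numpy as np

def classify_tap(mic_a, mic_b, sample_rate=16000, max_tap_seconds=0.1, threshold=0.01):
    """Return ('tap', closer_mic) when the received burst is short enough to be
    a tap, otherwise None. The closer microphone is guessed from peak intensity."""
    a = np.asarray(mic_a, dtype=float)
    b = np.asarray(mic_b, dtype=float)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]

    combined = np.abs(a) + np.abs(b)
    active = np.where(combined > threshold)[0]
    if len(active) == 0:
        return None  # no burst detected

    duration = (active[-1] - active[0]) / sample_rate
    if duration > max_tap_seconds:
        return None  # burst lasts too long: more likely a swipe than a tap

    closer = "mic_a" if np.max(np.abs(a)) > np.max(np.abs(b)) else "mic_b"
    return ("tap", closer)
```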
[0048] Figure 4B illustrates a graph 400B of audio input 452A, 452B received by the first and second microphones 418A, 418B, respectively, in response to a swipe gesture. A swipe gesture may create a sound having a duration equal, or substantially equal, to the length of the gesture. Each microphone 418A, 418B may receive audio input 452A, 452B starting at a different time as compared to when the swipe gesture starts. Additionally or alternatively, each microphone 418A, 418B may stop receiving audio input 452A, 452B at a different time as compared to when the swipe gesture ends.
[0049] In one example, a user may perform a swipe gesture from the bottom of the touch region to the top of the touch region. In such an example, the second microphone 418B may receive audio input 452B, corresponding to the start of the swipe gesture, at time “t1.” The first microphone 418A may receive audio input 452A at time “t2.” As shown on graph 400B, t2 may be after t1 such that the second microphone 418B received audio input 452B before the first microphone 418A received audio input 452A. This may be due to the second microphone 418B being located at the bottom of the device, which may correspond to the bottom of the touch region. As the swipe gesture reaches or approaches the top of the region, the audio input 452A may arrive at the first microphone 418A with a decreasing delay. According to some examples, this decreasing delay may indicate that the sound is being created at a location closer to the first microphone 418A than it was previously. This may be due to the swipe gesture approaching the top of the touch region, corresponding to the top of the device and, therefore, the location of the first microphone 418A.
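A minimal sketch of the onset comparison suggested by times t1 and t2 in Figure 4B is given below; the energy threshold and the naming of the microphones as top and bottom are assumptions for illustration.

```python
import numpy as np

def swipe_start_side(mic_top, mic_bottom, threshold=0.01):
    """Return which microphone first crosses the energy threshold, i.e. roughly
    which end of the touch region the swipe started from (cf. t1 and t2)."""
    def onset(x):
        idx = np.where(np.abs(np.asarray(x, dtype=float)) > threshold)[0]
        return idx[0] if len(idx) else np.inf

    t_top, t_bottom = onset(mic_top), onset(mic_bottom)
    if t_top == t_bottom:
        return None  # onsets indistinguishable at this sample rate
    return "started_near_top" if t_top < t_bottom else "started_near_bottom"
```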
[0050] Figure 5 illustrates an example method of determining a gesture input using one or more microphones of a wearable device. The following operations do not have to be performed in the precise order described below. Rather, various operations can be handled in a different order or simultaneously, and operations may be added or omitted.
[0051] For example, in block 510 the device may receive an audio input from a gesture of a user. The audio input may be received by each of the one or more microphones. For example, the audio input may be received by a first microphone located at a first location on a surface of the device that is exposed when worn on the body and the audio input may be received by a second microphone located at a second location on the surface of the device that is exposed. The first location may be different than the second location. For example, the first location may be the relative top of the device and the second location may be the relative bottom of the device. According to some examples, the first location may be the relative left side of the device and the second location may be the relative right side of the device. These examples are not intended to be limiting as the first and second locations may be any location on the surface of the device that is exposed when worn on the body.
[0052] In block 520, the device may determine, based on the received audio input, a first location of the gesture. In block 530, the device may determine, based on the received audio input, a second location of the gesture. For example, the gesture may be performed by the user in a touch region. The touch region may be a region of the skin of the human body at or near the device. The touch region may have a top, bottom, left, and right side. In examples where the gesture is a swipe gesture, the gesture may begin at one side of the touch region and end in another. In examples where the gesture is a tap gesture, the gesture may occur in substantially the same location within the touch region such that the first location of the gesture and the second location of the gesture are the same or substantially the same location.
[0053] In block 540, the device may determine, based on the first location and the second location of the gesture, an input command. For example, if the first location is on the top of the touch region and the second location is on the bottom of the touch region, the gesture may be a downward swipe. According to some examples, a downward swipe may be an input command, such as to decrease the playback volume of the content. If the first location is on the bottom of the touch region and the second location is on the top of the touch region, the gesture may be an upward swipe. An upward swipe may be an input command, such as to increase the playback volume of the content. In some examples, the first location may be the left or right side of the touch region and the second location may be the right or left side, respectively. This may be a sideways swipe, which may be an input command to fast-forward or rewind the content. In examples where the first location and the second location are the same or substantially the same location, the gesture may be a tap. A tap may be an input command to start or stop content from being output. According to some examples, a tap may answer or hang up a phone call. Therefore, the types of input commands described herein, i.e., increasing volume, decreasing volume, fast-forwarding, rewinding, stopping, and playing, are merely examples and are not meant to be limiting.
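By way of illustration only, the mapping from a pair of estimated gesture locations to an input command described in blocks 520-540 could be sketched as follows; the location labels and command names are hypothetical placeholders rather than the claimed method.

```python
def command_from_locations(first, second):
    """Map the estimated start and end locations of a gesture to a playback
    command, loosely following blocks 520-540. Labels are hypothetical."""
    if first == second:
        return "toggle_playback"        # tap: start/stop content or answer/end a call
    if (first, second) == ("top", "bottom"):
        return "volume_down"            # downward swipe
    if (first, second) == ("bottom", "top"):
        return "volume_up"              # upward swipe
    if (first, second) == ("left", "right"):
        return "fast_forward"           # sideways swipe
    if (first, second) == ("right", "left"):
        return "rewind"                 # sideways swipe in the other direction
    return "no_op"                      # unrecognised trajectory


print(command_from_locations("bottom", "top"))  # prints: volume_up
```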
[0054] While the device is described herein as having one or more microphones on the surface of the device that is exposed when worn on the body, all or some of the one or more microphones may be within the device and/or on the surface of the device that is in contact with the human body when the device is worn on the body.
[0055] Unless otherwise stated, the foregoing alternative examples are not mutually exclusive, but may be implemented in various combinations to achieve unique advantages. As these and other variations and combinations of the features discussed above can be utilized without departing from the subject matter defined by the claims, the foregoing description of the embodiments should be taken by way of illustration rather than by way of limitation of the subject matter defined by the claims. In addition, the provision of the examples described herein, as well as clauses phrased as "such as," "including" and the like, should not be interpreted as limiting the subject matter of the claims to the specific examples; rather, the examples are intended to illustrate only one of many possible embodiments. Further, the same reference numbers in different drawings can identify the same or similar elements.

Claims

1. A wearable electronic device, comprising: one or more microphones; and one or more processors in communication with the one or more microphones, the one or more processors configured to: receive, at each of the one or more microphones, an audio input based on a gesture of a user; determine, based on the received audio input, a first location of the gesture and a second location of the gesture; and determine, based on the first location and second location of the gesture, an input command.
2. The wearable electronic device of claim 1, wherein the gesture is a swipe gesture or a tap gesture.
3. The wearable electronic device of claim 2, wherein when the gesture is the swipe gesture, the first location of the gesture is different than the second location of the gesture.
4. The wearable electronic device of claim 2, wherein when the gesture is the swipe or tap gesture, the swipe or tap is on an area of skin in a region of the device.
5. The wearable electronic device of claim 1, wherein determining the first location and the second location includes comparing an audio signal captured by each of the one or more microphones.
6. The wearable electronic device of claim 1, wherein the one or more processors are configured to determine, based on the first location and the second location, a trajectory of the gesture.
7. The wearable electronic device of claim 1, wherein the wearable device is an earbud.
8. The wearable electronic device of claim 7, wherein the earbud includes a housing shaped to be worn on a human body, wherein at least one first surface of the housing is shaped to come in contact with the human body and at least one second surface of the housing is shaped to be exposed when worn on the body, and wherein at least two of the one or more microphones are located on the second surface.
9. The wearable electronic device of claim 7, wherein the gesture of the user is a tap or swipe on an area of skin on an ear or face of the user.
10. The wearable electronic device of claim 1, wherein the one or more processors are configured to perform the determined input command.
11. A method, comprising: receiving, at each of one or more microphones of a wearable device, an audio input based on a gesture of a user; determining, by one or more processors based on the received audio input, a first location of the gesture and a second location of the gesture; and determining, by the one or more processors based on the first location and the second location of the gesture, an input command.
12. The method of claim 11, wherein the gesture is a swipe gesture or a tap gesture.
13. The method of claim 12, wherein when the gesture is the swipe gesture, the first location of the gesture is different than the second location of the gesture.
14. The method of claim 12, wherein when the gesture is the swipe or tap gesture, the swipe or tap is on an area of skin in a region of the wearable device.
15. The method of claim 11, wherein determining the first location and the second location includes comparing, by the one or more processors, an audio signal captured by each of the one or more microphones.
16. The method of claim 11, further comprising determining, by the one or more processors based on the first location and the second location, a trajectory of the gesture.
17. The method of claim 11, wherein the wearable device is an earbud.
18. The method of claim 17, wherein the earbud includes a housing shaped to be worn on a human body, wherein at least one first surface of the housing is shaped to come in contact with the human body and at least one second surface of the housing is shaped to be exposed when worn on the body, and wherein at least two of the one or more microphones are located on the second surface.
19. The method of claim 17, wherein the gesture of the user is a tap or swipe on an area of skin on an ear or face of the user.
20. The method of claim 11, further comprising performing, by the one or more processors, the determined input command.
PCT/US2020/044711 2020-08-03 2020-08-03 A gesture input for a wearable device WO2022031260A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2020/044711 WO2022031260A1 (en) 2020-08-03 2020-08-03 A gesture input for a wearable device

Publications (1)

Publication Number Publication Date
WO2022031260A1 (en)

Family

ID=72088408

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/044711 WO2022031260A1 (en) 2020-08-03 2020-08-03 A gesture input for a wearable device

Country Status (1)

Country Link
WO (1) WO2022031260A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024054458A1 (en) * 2022-09-06 2024-03-14 Apple Inc. Devices, methods, and user interfaces for controlling operation of wireless electronic accessories
US11941319B2 (en) 2020-07-20 2024-03-26 Apple Inc. Systems, methods, and graphical user interfaces for selecting audio output modes of wearable audio output devices

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080252595A1 (en) * 2007-04-11 2008-10-16 Marc Boillot Method and Device for Virtual Navigation and Voice Processing
US20180014107A1 (en) * 2016-07-06 2018-01-11 Bragi GmbH Selective Sound Field Environment Processing System and Method
US20180035217A1 (en) * 2016-08-01 2018-02-01 Qualcomm Incorporated Audio-based device control

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20757478; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20757478; Country of ref document: EP; Kind code of ref document: A1)