US11270706B1 - Voice controlled assistant with coaxial speaker and microphone arrangement - Google Patents

Voice controlled assistant with coaxial speaker and microphone arrangement

Info

Publication number
US11270706B1
Authority
US
United States
Prior art keywords: audio data, speech, determining, processors, data represents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US15/930,967
Inventor
Timothy Theodore List
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Amazon Technologies Inc
Original Assignee
Amazon Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Amazon Technologies Inc
Priority to US15/930,967
Priority to US17/027,155
Assigned to AMAZON TECHNOLOGIES, INC. (assignment of assignors interest; assignor: RAWLES LLC)
Assigned to RAWLES LLC (assignment of assignors interest; assignor: LIST, TIMOTHY T.)
Application granted
Publication of US11270706B1
Priority to US18/074,798
Legal status: Active
Adjusted expiration

Classifications

    • G10L17/22: Interactive procedures; man-machine interfaces (speaker identification or verification)
    • H04R3/005: Circuits for combining the signals of two or more microphones
    • G06F3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G10L15/08: Speech classification or search
    • G10L15/1822: Parsing for meaning understanding
    • G10L15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L21/0208: Noise filtering (speech enhancement, e.g. noise reduction or echo cancellation)
    • H04R1/08: Mouthpieces; microphones; attachments therefor
    • H04R1/323: Arrangements for obtaining a desired directional characteristic, for loudspeakers
    • H04R1/326: Arrangements for obtaining a desired directional characteristic, for microphones
    • H04R1/345: Desired directional characteristic obtained with a single transducer using sound reflecting, diffracting, directing or guiding means, for loudspeakers
    • H04R3/002: Damping circuit arrangements for transducers, e.g. motional feedback circuits
    • G10L15/20: Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise
    • G10L2021/02082: Noise filtering where the noise is echo or reverberation of the speech
    • G10L2021/02166: Microphone arrays; beamforming
    • H04R2227/003: Digital public address systems using, e.g., a LAN or the Internet
    • H04R2420/07: Applications of wireless loudspeakers or wireless microphones
    • H04R27/00: Public address systems
    • H04R3/12: Circuits for distributing signals to two or more loudspeakers

Definitions

  • the housing 404 has openings or slots 436 formed adjacent to the base end 408 . These slots 436 permit passage of the sound waves, and particularly the high frequency sound waves, emitted from the mid-high frequency speaker 420 .
  • the slots 436 are comparatively smaller than the size or diameters of the speakers 420 and 422 . However, the sound is still efficiently directed out through the slots 436 by the sound distribution cone 428 .
  • Structural posts 438 provide structural stability between the base end 408 and the middle section 406 of the housing 404.
  • FIG. 5 is a flow diagram of an illustrative process 500 to operate a communication device.
  • This process (as well as other processes described throughout) is illustrated as a logical flow graph, each operation of which represents a sequence of operations that can be implemented in hardware, software, or a combination thereof.
  • the operations represent computer-executable instructions stored on one or more tangible computer-readable storage media that, when executed by one or more processors, perform the recited operations.
  • computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types.
  • the order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the process.
  • the process 500 is described with reference to the voice controlled assistant 104 . However, the process may be performed by other electronic devices.
  • audio input is received by the one or more microphones 126(1)-(M) of the microphone array 124 at the top end 410 of the housing 404.
  • the audio input may be speech of the near end talker, other voices in the environment, sounds from other electronics (e.g., TVs, radios, etc.), ambient noise, and so forth.
  • the audio input is processed.
  • various computing components 132 are used to process the audio input.
  • the AEC module 210 may detect and cancel echoes in the input signal at 504A.
  • the double talk reduction module 212 may determine the likelihood of double talk and seek to reduce or eliminate that component in the input signal at 504B.
  • the filtered audio input is then transmitted to the far end talker.
  • additional processing may be performed.
  • the speech recognition module 208 can parse the resulting data in an effort to recognize the primary speech utterances from the near end talker at 504C. From this recognized speech, the query formation module 214 may form a query or request at 504D. This query or request may then be transmitted to a cloud service 116 for further processing and generation of a response and/or execution of a function (see the pipeline sketch following this list).
  • any sound is output by the one or more speakers 130(1)-(P) mounted in the base end 408 of the housing 404.
  • When the assistant 104 is resting on the base end 408, the sound is output in a downward direction opposite to the microphones 126(1)-(M) in the top end 410.
  • At 508, at least parts of the sound, particularly the mid-high frequency ranges, are redirected from the initial downward path outward in a radial direction from the base end 408 of the assistant 104. These parts of the sound are output symmetrical to, and equidistant from, the microphone array 124 in the top end 410 of the housing 404.
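As a way to tie these steps together, the sketch below chains the operations 504A-504D into one pipeline. The objects and method names (aec, recognizer, cloud, and so on) are hypothetical stand-ins for the modules described with reference to FIG. 2, not names used in the patent.

```python
# Hedged sketch of the FIG. 5 processing flow as a pipeline. The objects passed
# in and their method names are hypothetical stand-ins for the FIG. 2 modules.
def process_audio_input(frames, aec, double_talk_reducer, recognizer, query_former, cloud):
    filtered = aec.cancel(frames)                     # 504A: cancel acoustic echoes
    filtered = double_talk_reducer.reduce(filtered)   # 504B: reduce double talk
    transcript = recognizer.parse(filtered)           # 504C: recognize near-end speech
    if transcript:
        request = query_former.form(transcript)      # 504D: form a query or request
        return cloud.submit(request)                  # hand off for further processing
    return None
```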

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Otolaryngology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

A voice controlled assistant has a housing to hold one or more microphones, one or more speakers, and various computing components. The housing has an elongated cylindrical body extending along a center axis between a base end and a top end. The microphone(s) are mounted in the top end and the speaker(s) are mounted proximal to the base end. The microphone(s) and speaker(s) are coaxially aligned along the center axis. The speaker(s) are oriented to output sound directionally toward the base end and opposite to the microphone(s) in the top end. The sound may then be redirected in a radial outward direction from the center axis at the base end so that the sound is output symmetric to, and equidistant from, the microphone(s).

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of, and claims priority to, U.S. patent application Ser. No. 16/397,931, filed on Apr. 29, 2019, entitled “Voice Controlled Assistant With Coaxial Speaker and Microphone Arrangement”, which is a continuation of U.S. patent application Ser. No. 15/649,256, filed on Jul. 13, 2017, entitled “Voice Controlled Assistant With Coaxial Speaker and Microphone Arrangement”, now U.S. Pat. No. 10,283,121, which issued on May 7, 2019, which is a continuation of U.S. patent application Ser. No. 15/207,249, filed on Jul. 11, 2016, entitled “Voice Controlled Assistant With Coaxial Speaker and Microphone Arrangement”, now U.S. Pat. No. 9,837,083, which issued on Dec. 5, 2017, which is a continuation of U.S. patent application Ser. No. 14/738,669, filed on Jun. 12, 2015, entitled “Voice Controlled Assistant with Coaxial Speaker and Microphone Arrangement”, now U.S. Pat. No. 9,390,724, which issued on Jul. 12, 2016, which is a continuation of U.S. patent application Ser. No. 13/486,774, filed on Jun. 1, 2012, entitled “Voice Controlled Assistant with Coaxial Speaker and Microphone Arrangement”, now U.S. Pat. No. 9,060,224, which issued on Jun. 16, 2015, all of which are expressly incorporated herein by reference in their entirety.
BACKGROUND
Homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, and portable communication devices. As these computing devices evolve, many different ways have been introduced to allow users to interact with computing devices, such as through mechanical devices (e.g., keyboards, mice, etc.), touch screens, motion, and gesture. Another way to interact with computing devices is through speech.
When using speech as an input, the device is commonly equipped with microphones to receive voice input and a speech recognition component that attempts to recognize the voice input. This voice input often competes with other audible sounds that might be received by the microphones, such as background voices, ambient noise, and perhaps even double talk. Double talk refers to a situation where sound from the near end talker reaches the microphones simultaneously with sound from the far end talker that is played out through the device loudspeakers. That is, sound played out of the loudspeaker (e.g., sound corresponding to signals received from the far end talker) echoes and reaches the microphones, along with sound from the near end talker.
These additional sounds concurrent with the speech input can negatively impact acoustic performance of the device, including both input and output of audio. Accordingly, there is a need for improved architectures of voice enabled devices that enhance acoustic performance.
BRIEF DESCRIPTION OF THE DRAWINGS
The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same reference numbers in different figures indicate similar or identical items.
FIG. 1 shows an illustrative voice interactive computing architecture set in an example environment that includes a near end talker communicating with a far end talker or cloud service through use of a voice controlled assistant.
FIG. 2 shows a block diagram of selected functional components implemented in the voice controlled assistant of FIG. 1.
FIG. 3 is a cross sectional view of the voice controlled assistant of FIG. 1 according to one example implementation in which the speakers and microphone array are coaxially aligned.
FIG. 4 shows a top down view of the voice controlled assistant of FIG. 1 to illustrate one example arrangement of microphones in the microphone array.
FIG. 5 is a flow diagram showing an illustrative process of operating the voice controlled assistant of FIG. 1.
DETAILED DESCRIPTION
A voice controlled assistant having a coaxially aligned speaker and microphone arrangement is described. The voice controlled assistant is described in the context of an architecture in which the assistant is connected to far end talkers or a network accessible computing platform, or “cloud service”, via a network. The voice controlled assistant may be implemented as a hands-free device equipped with a wireless LAN (WLAN) interface. The voice controlled assistant relies primarily, if not exclusively, on voice interactions with a user.
The voice controlled assistant may be positioned in a room (e.g., at home, work, store, etc.) to receive user input in the form of voice interactions, such as spoken requests or a conversational dialogue. Depending on the request, the voice controlled assistant may perform any number of actions. For instance, the assistant may play music or emit verbal answers to the user. The assistant may alternatively function as a communication device to facilitate network voice communications with a far end talker. As still another alternative, the user may ask a question or submit a search request to be performed by a remote cloud service. For instance, the user's voice input may be transmitted from the assistant over a network to the cloud service, where the voice input is interpreted and used to perform a function. In the event that the function creates a response, the cloud service transmits the response back over the network to the assistant, where it may be audibly emitted.
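To make that round trip concrete, the sketch below shows one way captured voice input might be posted to a cloud service and any returned speech queued for audible playback. The endpoint URL, headers, and response fields are hypothetical; the patent does not specify a wire protocol.

```python
# Minimal sketch of the assistant-to-cloud round trip described above.
# The endpoint, payload format, and playback queue are illustrative assumptions.
import json
import urllib.request

CLOUD_ENDPOINT = "https://cloud.example.com/voice"  # hypothetical service URL

def send_voice_input(pcm_bytes: bytes, sample_rate: int = 16000) -> dict:
    """Transmit captured voice input to the cloud service and return its response."""
    request = urllib.request.Request(
        CLOUD_ENDPOINT,
        data=pcm_bytes,
        headers={"Content-Type": "audio/L16", "X-Sample-Rate": str(sample_rate)},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)

def handle_response(response: dict, playback_queue: list) -> None:
    """If the cloud function produced an audible response, queue it for output."""
    if "speech_audio" in response:
        playback_queue.append(response["speech_audio"])
```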
Some of the techniques described herein may be implemented in other electronic devices besides a voice controlled assistant 104. For instance, aspects may be implemented in communication devices, tablets or other computing devices, or any number of electronic devices that are capable of producing sound from one or more speakers and receiving sound through one or more microphones.
The architecture may be implemented in many ways. Various example implementations are provided below. However, the architecture may be implemented in many other contexts and situations different from those shown and described below.
Illustrative Environment and Device
FIG. 1 shows an illustrative architecture 100, set in an exemplary environment 102, which includes a voice controlled assistant 104 and a user 106 of the voice controlled assistant 104. Although only one user 106 is illustrated in FIG. 1, multiple users may use the voice controlled assistant 104. The user 106 may be located proximal to the voice controlled assistant 104, and hence serve as a near end talker in some contexts.
In this illustration, the voice controlled assistant 104 is physically positioned on a table 108 within the environment 102. The voice controlled assistant 104 is shown sitting upright and supported on its base end. In other implementations, the assistant 104 may be placed in any number of locations (e.g., ceiling, wall, in a lamp, beneath a table, on a work desk, in a hall, under a chair, etc.). The voice controlled assistant 104 is shown communicatively coupled to remote entities 110 over a network 112. The remote entities 110 may include individual people, such as person 114, or automated systems (not shown) that serve as far end talkers to verbally interact with the user 106. The remote entities 110 may alternatively comprise cloud services 116 hosted, for example, on one or more servers 118(1), . . . , 118(S). These servers 118(1)-(S) may be arranged in any number of ways, such as server farms, stacks, and the like that are commonly used in data centers.
The cloud services 116 generally refer to a network accessible platform implemented as a computing infrastructure of processors, storage, software, data access, and so forth that is maintained and accessible via a network such as the Internet. Cloud services 116 do not require end-user knowledge of the physical location and configuration of the system that delivers the services. Common expressions associated with cloud services include “on-demand computing”, “software as a service (SaaS)”, “platform computing”, “network accessible platform”, and so forth.
The cloud services 116 may host any number of applications that can process the user input received from the voice controlled assistant 104, and produce a suitable response. Example applications might include web browsing, online shopping, banking, email, work tools, productivity, entertainment, educational, and so forth.
In FIG. 1, the user 106 is shown communicating with the remote entities 110 via the voice controlled assistant 104. The assistant 104 outputs an audible question, "What do you want to do?" as represented by dialog bubble 120. This output may represent a question from a far end talker 114, or from a cloud service 116 (e.g., an entertainment service). The user 106 is shown replying to the question by stating, "I'd like to buy tickets to a movie" as represented by the dialog bubble 122.
The voice controlled assistant 104 is equipped with an array 124 of microphones 126(1), . . . , 126(M) to receive the voice input from the user 106 as well as any other audio sounds in the environment 102. The microphones 126(1)-(M) are generally arranged at a first or top end of the assistant 104 opposite the base end seated on the table 108, as will be described in more detail with reference to FIGS. 3 and 4. Although multiple microphones are illustrated, in some implementations, the assistant 104 may be embodied with only one microphone.
The voice controlled assistant 104 may further include a speaker array 128 of speakers 130(1), . . . , 130(P) to output sounds in humanly perceptible frequency ranges. The speakers 130(1)-(P) may be configured to emit sounds at various frequency ranges, so that each speaker has a different range. In this manner, the assistant 104 may output high frequency signals, mid frequency signals, and low frequency signals. The speakers 130(1)-(P) are generally arranged at a second or base end of the assistant 104 and oriented to emit the sound in a downward direction toward the base end and opposite to the microphone array 124 in the top end. One particular arrangement is described below in more detail with reference to FIG. 3. Although multiple speakers are illustrated, in some implementations, the assistant 104 may be embodied with only one speaker.
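Because each speaker covers a different frequency range, the driving signal is typically split by a crossover. The sketch below is a minimal illustration of that idea, assuming a 2 kHz corner frequency and fourth-order Butterworth filters; neither value comes from the patent.

```python
# Sketch of a two-way crossover that splits one program signal into a low band
# (for a low frequency speaker) and a mid-high band (for a mid-high speaker).
# The corner frequency and filter order are illustrative choices only.
import numpy as np
from scipy.signal import butter, sosfilt

def split_bands(signal: np.ndarray, sample_rate: int, corner_hz: float = 2000.0):
    """Return (low_band, mid_high_band) driving signals for the two speakers."""
    lowpass = butter(4, corner_hz, btype="low", fs=sample_rate, output="sos")
    highpass = butter(4, corner_hz, btype="high", fs=sample_rate, output="sos")
    return sosfilt(lowpass, signal), sosfilt(highpass, signal)

# Example: split one second of white noise sampled at 48 kHz.
noise = np.random.randn(48000)
low, mid_high = split_bands(noise, sample_rate=48000)
```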
The voice controlled assistant 104 may further include computing components 132 that process the voice input received by the microphone array 124, enable communication with the remote entities 110 over the network 112, and generate the audio to be output by the speaker array 128. The computing components 132 are generally positioned between the microphone array 124 and the speaker array 128, although essentially any other arrangement may be used. One collection of computing components 132 is illustrated and described with reference to FIG. 2.
Illustrative Voice Controlled Assistant
FIG. 2 shows selected functional components of the voice controlled assistant 104 in more detail. Generally, the voice controlled assistant 104 may be implemented as a standalone device that is relatively simple in terms of functional capabilities with limited input/output components, memory and processing capabilities. For instance, the voice controlled assistant 104 may not have a keyboard, keypad, or other form of mechanical input. Nor does it have a display or touch screen to facilitate visual presentation and user touch input. Instead, the assistant 104 may be implemented with the ability to receive and output audio, a network interface (wireless or wire-based), power, and limited processing/memory capabilities.
In the illustrated implementation, the voice controlled assistant 104 includes the microphone array 124, a speaker array 128, a processor 202, and memory 204. The microphone array 124 may be used to capture speech input from the user 106, or other sounds in the environment 102. The speaker array 128 may be used to output speech from a far end talker, audible responses provided by the cloud services, forms of entertainment (e.g., music, audible books, etc.), or any other form of sound. The speaker array 128 may output a wide range of audio frequencies including both human perceptible frequencies and non-human perceptible frequencies.
The memory 204 may include computer-readable storage media (“CRSM”), which may be any available physical media accessible by the processor 202 to execute instructions stored on the memory. In one basic implementation, CRSM may include random access memory (“RAM”) and Flash memory. In other implementations, CRSM may include, but is not limited to, read-only memory (“ROM”), electrically erasable programmable read-only memory (“EEPROM”), or any other medium which can be used to store the desired information and which can be accessed by the processor 202.
Several modules such as instructions, datastores, and so forth may be stored within the memory 204 and configured to execute on the processor 202. An operating system module 206 is configured to manage hardware and services (e.g., wireless unit, USB, codec) within and coupled to the assistant 104 for the benefit of other modules. Several other modules may be provided to process verbal input from the user 106. For instance, a speech recognition module 208 provides some level of speech recognition functionality. In some implementations, this functionality may be limited to specific commands that perform fundamental tasks like waking up the device, configuring the device, and the like. The amount of speech recognition capability implemented on the assistant 104 is an implementation detail, but the architecture described herein can support having some speech recognition at the local assistant 104 together with more expansive speech recognition at the cloud services 116.
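The split between limited local recognition and more expansive cloud recognition might be arranged as a simple routing decision, as in the sketch below. The local command set and the forward_to_cloud hook are hypothetical examples, not names from the patent.

```python
# Sketch of splitting recognition between the device and the cloud: a small
# local vocabulary handles fundamental commands (waking, configuring), and
# everything else is forwarded to the more capable cloud recognizer.
# The command set and forward_to_cloud() hook are hypothetical.

LOCAL_COMMANDS = {"wake up", "go to sleep", "configure"}

def route_utterance(local_transcript: str, audio_frames: bytes, forward_to_cloud) -> str:
    """Handle simple commands locally; otherwise defer to the cloud service."""
    text = local_transcript.strip().lower()
    if text in LOCAL_COMMANDS:
        return f"handled locally: {text}"
    return forward_to_cloud(audio_frames)
```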
An acoustic echo cancellation module 210 and a double talk reduction module 212 are provided to process the audio signals to substantially cancel acoustic echoes and substantially reduce double talk that may occur. These modules may work together to identify times when echoes are present, when double talk is likely, and when background noise is present, and attempt to reduce these external factors to isolate and focus on the near talker. By isolating the near talker, better signal quality is provided to the speech recognition module 208 to enable more accurate interpretation of the speech utterances.
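The patent does not prescribe a particular echo cancellation algorithm. One common approach, sketched below, adapts a finite impulse response estimate of the echo path with normalized least mean squares (NLMS) and freezes adaptation when a crude energy test suggests double talk; all parameter values are illustrative only.

```python
# Sketch of acoustic echo cancellation with a simple double-talk guard.
# NLMS adaptation and the energy-ratio double-talk test are common techniques
# used here for illustration; the patent does not mandate a specific algorithm.
import numpy as np

def cancel_echo(mic: np.ndarray, far_end: np.ndarray, taps: int = 256,
                step: float = 0.5, dt_threshold: float = 2.0) -> np.ndarray:
    """Return the microphone signal with an estimate of the loudspeaker echo removed."""
    weights = np.zeros(taps)   # adaptive estimate of the echo path
    history = np.zeros(taps)   # most recent far-end (loudspeaker) samples
    output = np.zeros_like(mic)
    for n in range(len(mic)):
        history = np.roll(history, 1)
        history[0] = far_end[n]
        echo_estimate = weights @ history
        error = mic[n] - echo_estimate
        output[n] = error
        # Crude double-talk detection: if the microphone is much louder than the
        # predicted echo, assume the near-end talker is active and freeze adaptation.
        double_talk = abs(mic[n]) > dt_threshold * (abs(echo_estimate) + 1e-6)
        if not double_talk:
            norm = history @ history + 1e-6
            weights += step * error * history / norm
    return output
```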
A query formation module 214 may also be provided to receive the parsed speech content output by the speech recognition module 208 and to form a search query or some form of request. This query formation module 214 may utilize natural language processing (NLP) tools as well as various language modules to enable accurate construction of queries based on the user's speech input.
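A toy version of query formation could map recognized text onto a structured request, as in the following sketch. The intent patterns and request fields are made up for illustration; a real module would lean on fuller natural language processing tooling.

```python
# Toy sketch of query formation: map recognized speech to a structured request.
# The intent pattern and request fields are hypothetical examples.
import re

def form_query(recognized_text: str) -> dict:
    text = recognized_text.lower()
    match = re.search(r"buy tickets? to (a |the )?(?P<item>.+)", text)
    if match:
        return {"intent": "purchase_tickets", "item": match.group("item").strip()}
    return {"intent": "search", "query": text}

print(form_query("I'd like to buy tickets to a movie"))
# {'intent': 'purchase_tickets', 'item': 'movie'}
```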
The modules shown stored in the memory 204 are merely representative. Other modules 216 for processing the user voice input, interpreting that input, and/or performing functions based on that input may be provided.
The voice controlled assistant 104 might further include a codec 218 coupled to the microphones of the microphone array 124 and the speakers of the speaker array 128 to encode and/or decode the audio signals. The codec 218 may convert audio data between analog and digital formats. A user may interact with the assistant 104 by speaking to it, and the microphone array 124 captures the user speech. The codec 218 encodes the user speech and transfers that audio data to other components. The assistant 104 can communicate back to the user by emitting audible statements passed through the codec 218 and output through the speaker array 128. In this manner, the user interacts with the voice controlled assistant simply through speech, without use of a keyboard or display common to other types of devices.
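The format conversion a codec performs can be pictured as moving between floating point samples and fixed-point PCM bytes, as in this sketch. The choice of 16-bit little-endian PCM is an assumption for illustration; the patent does not specify a sample format.

```python
# Sketch of the sample-format conversion a codec performs: floating point
# samples to 16-bit PCM bytes for the other components, and back for playback.
import numpy as np

def encode_pcm16(samples: np.ndarray) -> bytes:
    """Convert float samples in [-1, 1] to 16-bit little-endian PCM bytes."""
    clipped = np.clip(samples, -1.0, 1.0)
    return (clipped * 32767.0).astype("<i2").tobytes()

def decode_pcm16(data: bytes) -> np.ndarray:
    """Convert 16-bit little-endian PCM bytes back to float samples."""
    return np.frombuffer(data, dtype="<i2").astype(np.float32) / 32767.0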
The voice controlled assistant 104 includes a wireless unit 220 coupled to an antenna 222 to facilitate a wireless connection to a network. The wireless unit 220 may implement one or more of various wireless technologies, such as wifi, Bluetooth, RF, and so on.
A USB port 224 may further be provided as part of the assistant 104 to facilitate a wired connection to a network, or a plug-in network device that communicates with other wireless networks. In addition to the USB port 224, or as an alternative thereto, other forms of wired connections may be employed, such as a broadband connection. A power unit 226 is further provided to distribute power to the various components on the assistant 104.
The voice controlled assistant 104 is designed to support audio interactions with the user, in the form of receiving voice commands (e.g., words, phrases, sentences, etc.) from the user and outputting audible feedback to the user. Accordingly, in the illustrated implementation, there are no haptic input devices, such as navigation buttons, keypads, joysticks, keyboards, touch screens, and the like. Further, there is no display for text or graphical output. In one implementation, the voice controlled assistant 104 may include non-input control mechanisms, such as basic volume control button(s) for increasing/decreasing volume, as well as power and reset buttons. There may also be a simple light element (e.g., LED) to indicate a state such as, for example, when power is on. But otherwise, the assistant 104 does not use or need to use any input devices or displays.
Accordingly, the assistant 104 may be implemented as an aesthetically appealing device with smooth and rounded surfaces, with some apertures for passage of sound waves, and merely having a power cord and optionally a wired interface (e.g., broadband, USB, etc.). In the illustrated implementation, the assistant 104 has a housing of an elongated cylindrical shape. Apertures or slots are formed in a base end to allow emission of sound waves. A more detailed discussion of one particular structure is provided below with reference to FIG. 3. Once plugged in, the device may self-configure, automatically or with slight aid from the user, and be ready to use. As a result, the assistant 104 may generally be produced at a low cost. In other implementations, other I/O components may be added to this basic model, such as specialty buttons, a keypad, display, and the like.
FIG. 3 is a cross sectional view 400 of the voice controlled assistant 104 taken along a plane that intersects a center axis 402 and passes through a diameter of the cylindrical-shaped housing. The assistant 104 has a housing 404 with an elongated, cylindrical-shaped middle section 406 extending between a first or base end 408 and a second or top end 410. The cylindrical-shaped middle section 406 has a smooth outer surface and due to the rounded shape, the two ends 408 and 410 are circular in shape. The base end 408 is designed to rest on a surface, such as a table 108 in FIG. 1, to support the housing 404. In this position, the top end 410 is distal and upward relative to the base end 408.
One or more microphones 126 are mounted proximal to the top end 410 of the housing 404 to capture audio input, such as voice input from the user. Multiple orifices 412 are formed in the top end 410 to hold the microphones. There are many possible arrangements of the microphones in the microphone array.
FIG. 4 shows one example arrangement of microphones in the top end 410. More particularly, FIG. 4 shows a top down view of the voice controlled assistant 104 taken along line A-A to illustrate the top end 410 of the housing 404. In this example, the microphone array has seven microphones 126(1), . . . , 126(7). Six of the microphones 126(1)-(6) are placed along a circle concentric to the perimeter of the top end 410. A seventh microphone 126(7) is positioned at the center point of the circular top end 410. It is noted that this is merely one example. Arrays with more or less than seven microphones may be used, and other layouts are possible.
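The geometry of this seven-microphone layout lends itself to beamforming toward the talker. The sketch below computes the microphone coordinates and the per-microphone steering delays for a plane wave arriving from a chosen azimuth; the 50 mm circle radius and the plane-wave assumption are illustrative, since the patent gives no dimensions.

```python
# Sketch of the seven-microphone layout of FIG. 4 and of the per-microphone
# delays a delay-and-sum beamformer could apply to steer toward a talker.
# The 50 mm radius is an assumed dimension for illustration only.
import numpy as np

SPEED_OF_SOUND = 343.0  # meters per second

def mic_positions(radius_m: float = 0.05) -> np.ndarray:
    """Six microphones on a circle concentric with the top end, plus one at the center."""
    angles = np.arange(6) * (2 * np.pi / 6)
    ring = np.stack([radius_m * np.cos(angles), radius_m * np.sin(angles)], axis=1)
    return np.vstack([ring, [[0.0, 0.0]]])

def steering_delays(positions: np.ndarray, talker_azimuth_rad: float) -> np.ndarray:
    """Relative arrival delays (seconds) for a plane wave from the given direction."""
    direction = np.array([np.cos(talker_azimuth_rad), np.sin(talker_azimuth_rad)])
    return positions @ direction / SPEED_OF_SOUND

delays = steering_delays(mic_positions(), np.deg2rad(30.0))
```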
With reference again to FIG. 3, the housing 404 defines a hollow chamber 414 therein. Within this chamber 414 are two skeletal members: a first or lower skeletal member 416 that provides structural support for components in the lower half of the chamber 414 and a second or upper skeletal member 418 that provides structural support for components in the upper half of the chamber 414.
The computing components 132 are mounted in the upper skeletal member 418, but are not shown in FIG. 3 to better illustrate the structural arrangement of the speakers and microphones. The computing components 132 may include any number of processing and memory capabilities, as well as power, codecs, network interfaces, and so forth. Example components are shown in FIG. 2.
Two speakers are shown mounted in the housing 404. A first speaker 420 is shown mounted within the lower skeletal member 416. The first speaker 420 outputs a first range of frequencies of audio sound. In one implementation, the first speaker 420 is a mid-high frequency speaker that plays the middle to high frequency ranges in the human-perceptible audible range. A second speaker 422 is shown mounted within the upper skeletal member 418 elevationally above the first speaker 420. In this implementation, the second speaker 422 is a low frequency speaker that plays the low frequency ranges in the human-perceptible audible range. The mid-high frequency speaker 420 is smaller than the low frequency speaker 422.
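The patent does not describe how program audio is divided between the two drivers, but a conventional way to feed a low frequency speaker and a mid-high frequency speaker is a crossover filter. The sketch below, which assumes a 300 Hz crossover point and fourth-order Butterworth filters purely for illustration, shows one such split using SciPy.

import numpy as np
from scipy.signal import butter, sosfilt

def split_bands(audio: np.ndarray, sample_rate: int = 44100,
                crossover_hz: float = 300.0, order: int = 4):
    """Split a mono signal into a low band (for the low frequency speaker 422)
    and a mid-high band (for the mid-high frequency speaker 420).

    The crossover frequency and filter order are assumptions made for this
    example; the patent does not specify crossover parameters.
    """
    lowpass = butter(order, crossover_hz, btype="lowpass",
                     fs=sample_rate, output="sos")
    highpass = butter(order, crossover_hz, btype="highpass",
                      fs=sample_rate, output="sos")
    return sosfilt(lowpass, audio), sosfilt(highpass, audio)

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 44100, endpoint=False)
    # 80 Hz tone (low frequency driver territory) mixed with a 2 kHz tone
    signal = np.sin(2 * np.pi * 80 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)
    low, mid_high = split_bands(signal)
    print("low-band RMS:", np.sqrt(np.mean(low ** 2)))
    print("mid-high-band RMS:", np.sqrt(np.mean(mid_high ** 2)))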
The two speakers 420 and 422 are mounted in a coaxial arrangement along the center axis 402, with the low frequency speaker 422 atop the mid-high frequency speaker 420. The speakers are also coaxial along the center axis 402 to the microphone array, or more particularly, to the plane intersecting the microphone array. The middle microphone 126(7) is positioned at the center point and lies along the center axis 402. Further, the two speakers 420 and 422 are oriented to output sound in a downward direction toward the base end 408 and away from the microphones 126 mounted in the top end 410. The low frequency speaker 422 outputs sound waves that pass through one or more openings in the lower skeletal member 416. The low frequency waves may emanate from the housing in any number of directions. Said another way, in some implementations, the low frequency speaker 422 may function as a woofer to generate low frequency sound waves that flow omni-directionally from the assistant 104.
The mid-high frequency speaker 420 is mounted within a protective shielding 424, which shields it from the sound waves emitted by the low frequency speaker 422. Small openings or slots 426 are formed in the middle section 406 of the housing 404 near the base end 408 to pass sound waves from the chamber 414, but the low frequency waves need not be constrained to these slots.
The mid-high frequency speaker 420 emits mid-high frequency sound waves in a downward direction onto a sound distribution cone 428 mounted to the base end 408. The sound distribution cone 428 is coaxially arranged in the housing 404 along the center axis 402 and adjacent to the mid-high frequency speaker 420. The sound distribution cone 428 has a conical shape with a smooth upper nose portion 430, a middle portion 432 with increasing radii from top to bottom, and a lower flange portion 434 with a smooth U-shaped flange. The sound distribution cone 428 directs the mid-high frequency sound waves from the mid-high frequency speaker 420 along the smooth conical surface downward along the middle portion 432 and in a radially outward direction from the center axis 402 along the lower flange portion 434 at the base end 408 of the housing 404. The radially outward direction is substantially perpendicular to the initial downward direction of the sound along the center axis 402. In this manner, the sound distribution cone 428 essentially delivers the sound out of the base end 408 of the housing 404 symmetrically about, and equidistant from, the microphone array 124 in the top end 410 of the housing. The sound distribution cone 428 may also have the effect of amplifying the sound emitted from the mid-high frequency speaker 420.
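The symmetry of this arrangement can be checked numerically: because the exit ring at the base end and the microphone ring at the top end share the center axis 402, every ring microphone sees the same distribution of path lengths to the exit ring. The sketch below uses made-up housing dimensions (220 mm height, 45 mm exit-ring radius, 30 mm microphone-ring radius) solely to illustrate that the mean distance comes out identical for each microphone; none of these numbers come from the patent.

import math

def mean_distance_to_exit_ring(mic_xy, housing_height_mm=220.0,
                               exit_radius_mm=45.0, samples=360):
    """Average 3D distance from a top-end microphone to points sampled on the
    ring at the base end where mid-high frequency sound exits radially.
    All dimensions are illustrative assumptions only.
    """
    mx, my = mic_xy
    total = 0.0
    for k in range(samples):
        a = 2.0 * math.pi * k / samples
        exit_point = (exit_radius_mm * math.cos(a),
                      exit_radius_mm * math.sin(a),
                      -housing_height_mm)  # base end lies below the top end
        total += math.dist((mx, my, 0.0), exit_point)
    return total / samples

if __name__ == "__main__":
    ring = [(30.0 * math.cos(2 * math.pi * k / 6),
             30.0 * math.sin(2 * math.pi * k / 6)) for k in range(6)]
    for i, mic in enumerate(ring, start=1):
        print(f"mic {i}: mean distance to exit ring = "
              f"{mean_distance_to_exit_ring(mic):.2f} mm")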
The housing 404 has openings or slots 436 formed adjacent to the base end 408. These slots 436 permit passage of the sound waves, and particularly the high frequency sound waves, emitted from the mid-high frequency speaker 420. The slots 436 are smaller than the diameters of the speakers 420 and 422. However, the sound is still efficiently directed out through the slots 436 by the sound distribution cone 428. Structural posts 438 provide structural stability between the base end 408 and the middle section 406 of the housing 404.
Illustrative Operation
FIG. 5 is a flow diagram of an illustrative process 500 to operate a communication device. This process (as well as other processes described throughout) is illustrated as a logical flow graph, each operation of which represents a sequence of operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the operations represent computer-executable instructions stored on one or more tangible computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the process.
For purposes of discussion, the process 500 is described with reference to the voice controlled assistant 104. However, the process may be performed by other electronic devices.
At 502, audio input is received by the one or more microphones 126(1)-(M) of the microphone array 124 at the top end 410 of the housing 404. The audio input may be speech of the near end talker, other voices in the environment, sounds from other electronics (e.g., TVs, radios, etc.), ambient noise, and so forth.
At 504, the audio input is processed. Depending upon the implementation and environment, various computing components 132 are used to process the audio input. As examples, the AEC module 210 may detect and cancel echoes in the input signal at 504A. The double talk reduction module 212 may determine the likelihood of double talk and seek to reduce or eliminate that component in the input signal at 504B. In some implementations, the filtered audio input is then transmitted to the far end talker. In other implementations, however, additional processing may be performed. For instance, once these and other non-primary components are removed from the audio input, the speech recognition module 208 can parse the resulting data in an effort to recognize the primary speech utterances from the near end talker at 504C. From this recognized speech, the query formation module 214 may form a query or request at 504D. This query or request may then be transmitted to a cloud service 116 for further processing and generation of a response and/or execution of a function.
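As a rough, hypothetical sketch of the ordering of operations 504A-504D (the function bodies below are placeholders, not implementations of the actual AEC, double talk reduction, speech recognition, or query formation modules), the processing pipeline can be pictured as follows:

from dataclasses import dataclass

@dataclass
class Query:
    text: str

def cancel_echo(frame):          # placeholder for the AEC module 210 (504A)
    return frame

def reduce_double_talk(frame):   # placeholder for the double talk reduction module 212 (504B)
    return frame

def recognize_speech(frame):     # placeholder for the speech recognition module 208 (504C)
    return "what is the weather"

def form_query(utterance):       # placeholder for the query formation module 214 (504D)
    return Query(text=utterance)

def send_to_cloud(query: Query): # placeholder for transmission to the cloud service 116
    print(f"sending query: {query.text!r}")

def process_audio_input(frame):
    """Order of operations mirrors 504A-504D in FIG. 5."""
    frame = cancel_echo(frame)
    frame = reduce_double_talk(frame)
    utterance = recognize_speech(frame)
    send_to_cloud(form_query(utterance))

if __name__ == "__main__":
    process_audio_input(frame=b"\x00\x01" * 160)  # a dummy 16-bit PCM frame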
At 506, any sound is output by the one or more speakers 130(1)-(P) mounted in the base end 408 of the housing 404. When the assistant 104 is resting on the base end 408, the sound is output in a downward direction opposite to the microphones 126(1)-(M) in the top end 410.
At 508, at least parts of the sound, particularly the mid-high frequency ranges, are redirected from the initial downward path outward in a radial direction from the base end 408 of the assistant 104. These parts of the sound are output symmetrically about, and equidistant from, the microphone array 124 in the top end 410 of the housing 404.
CONCLUSION
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as illustrative forms of implementing the claims.

Claims (26)

What is claimed is:
1. A device, comprising:
a housing;
a microphone;
a network interface configured to communicate over a network;
one or more processors; and
non-transitory computer-readable media storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
generating first audio data representing first audio captured by the microphone;
determining, utilizing a first speech recognition component, that the first audio data represents first speech;
sending, using the network interface, the first audio data to a first remote system;
generating second audio data representing second audio captured by the microphone;
determining, utilizing a second speech recognition component, that the second audio data represents second speech, the second speech recognition component differing from the first speech recognition component; and
sending, using the network interface, the second audio data to a second remote system.
2. The device of claim 1, wherein the non-transitory computer-readable media is further configured with instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising, prior to determining that the first audio data represents first speech, performing beamforming on the first audio data.
3. The device of claim 1, wherein the non-transitory computer-readable media is further configured with instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising, prior to determining that the first audio data represents first speech, performing acoustic echo cancellation on the first audio data.
4. The device of claim 1, further comprising a light element, and wherein the non-transitory computer-readable media is further configured with instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
causing the light element to emit first light based at least in part on determining that the first audio data represents the first speech; and
causing the light element to emit second light based at least in part on determining that the second audio data represents the second speech, the second light differing from the first light.
5. The device of claim 1, further comprising a light element, and wherein the non-transitory computer-readable media is further configured with instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising causing the light element to emit light based at least in part on at least one of determining that the first audio data represents the first speech or determining that the second audio data represents the second speech.
6. The device of claim 1, further comprising a speaker, and wherein the non-transitory computer-readable media is further configured with instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
receiving, from the first remote system, third audio data indicating that a first action associated with the first speech has been performed; and
causing the speaker to output audio corresponding to the third audio data.
7. The device of claim 1, wherein the non-transitory computer-readable media is further configured with instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising causing the device to transition from a first state to a second state based at least in part on at least one of the first audio data representing at least a portion of the first speech or the second audio data representing at least a portion of the second speech.
8. A method, comprising:
generating first audio data representing first audio captured by a microphone of a device;
determining, utilizing a first speech recognition component of the device, that the first audio data represents first speech;
sending the first audio data to a first remote system;
generating second audio data representing second audio captured by the microphone;
determining, utilizing a second speech recognition component of the device, that the second audio data represents second speech, the second speech recognition component differing from the first speech recognition component; and
sending the second audio data to a second remote system.
9. The method of claim 8, further comprising, prior to determining that the first audio data represents first speech, performing beamforming on the first audio data.
10. The method of claim 8, further comprising, prior to determining that the first audio data represents first speech, performing acoustic echo cancellation on the first audio data.
11. The method of claim 8, further comprising:
causing a light element of the device to emit first light based at least in part on determining that the first audio data represents the first speech; and
causing the light element to emit second light based at least in part on determining that the second audio data represents the second speech, the second light differing from the first light.
12. The method of claim 8, further comprising causing a light element of the device to emit light based at least in part on at least one of determining that the first audio data represents the first speech or determining that the second audio data represents the second speech.
13. The method of claim 8, further comprising:
receiving, from the first remote system, third audio data indicating that a first action associated with the first speech has been performed; and
causing a speaker of the device to output audio corresponding to the third audio data.
14. The method of claim 8, further comprising causing the device to transition from a first state to a second state based at least in part on at least one of the first audio data representing at least a portion of the first speech or the second audio data representing at least a portion of the second speech.
15. A device, comprising:
a housing;
a microphone;
a network interface configured to communicate over a network;
one or more processors; and
non-transitory computer-readable media storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
generating first audio data representing first audio captured by the microphone;
determining, utilizing a speech recognition component, that the first audio data represents first speech;
sending, using the network interface, the first audio data to a first remote system;
generating second audio data representing second audio captured by the microphone;
determining, utilizing the speech recognition component, that the second audio data represents second speech, the second speech differing from the first speech; and
sending, using the network interface, the second audio data to a second remote system.
16. The device of claim 15, wherein the non-transitory computer-readable media is further configured with instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising, prior to determining that the first audio data represents first speech, performing beamforming on the first audio data.
17. The device of claim 15, wherein the non-transitory computer-readable media is further configured with instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising, prior to determining that the first audio data represents first speech, performing acoustic echo cancellation on the first audio data.
18. The device of claim 15, further comprising a light element, and wherein the non-transitory computer-readable media is further configured with instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
causing the light element to emit first light based at least in part on determining that the first audio data represents the first speech; and
causing the light element to emit second light based at least in part on determining that the second audio data represents the second speech, the second light differing from the first light.
19. The device of claim 15, further comprising a light element, and wherein the non-transitory computer-readable media is further configured with instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising causing the light element to emit light based at least in part on determining that the first audio data represents the first speech or determining that the second audio data represents the second speech.
20. The device of claim 15, further comprising a speaker, and wherein the non-transitory computer-readable media is further configured with instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
receiving, from the first remote system, third audio data indicating that a first action associated with the first speech has been performed; and
causing the speaker to output audio corresponding to the third audio data.
21. A method, comprising:
generating first audio data representing first audio captured by a microphone of a device;
determining, utilizing a speech recognition component of the device, that the first audio data represents first speech;
sending the first audio data to a first remote system;
generating second audio data representing second audio captured by the microphone;
determining, utilizing the speech recognition component, that the second audio data represents second speech, the second speech differing from the first speech; and
sending the second audio data to a second remote system.
22. The method of claim 21, further comprising, prior to determining that the first audio data represents first speech, performing beamforming on the first audio data.
23. The method of claim 21, further comprising, prior to determining that the first audio data represents first speech, performing acoustic echo cancellation on the first audio data.
24. The method of claim 21, further comprising:
causing a light element of the device to emit first light based at least in part on determining that the first audio data represents the first speech; and
causing the light element to emit second light based at least in part on determining that the second audio data represents the second speech, the second light differing from the first light.
25. The method of claim 21, further comprising causing a light element of the device to emit light based at least in part on determining that the first audio data represents the first speech or determining that the second audio data represents the second speech.
26. The method of claim 21, further comprising:
receiving, from the first remote system, third audio data indicating that a first action associated with the first speech has been performed; and
causing a speaker of the device to output audio corresponding to the third audio data.
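Read as a routing scheme, independent claims 1 and 8 describe audio data being evaluated by two distinct on-device speech recognition components and then sent to one of two remote systems. The following minimal sketch uses hypothetical recognizer and transport callables to illustrate that branching only; it is not an implementation of the claimed device, and the magic-byte recognizers are stand-ins.

from typing import Callable

def route_audio(audio_data: bytes,
                first_recognizer: Callable[[bytes], bool],
                second_recognizer: Callable[[bytes], bool],
                send_to_first_system: Callable[[bytes], None],
                send_to_second_system: Callable[[bytes], None]) -> None:
    """If the first recognizer determines the audio data represents speech it
    handles, send the audio data to the first remote system; if the second,
    distinct recognizer makes that determination, send it to the second."""
    if first_recognizer(audio_data):
        send_to_first_system(audio_data)
    elif second_recognizer(audio_data):
        send_to_second_system(audio_data)

if __name__ == "__main__":
    # Trivial stand-in recognizers keyed on magic bytes, for illustration only.
    route_audio(b"WAKE_A hello",
                first_recognizer=lambda a: a.startswith(b"WAKE_A"),
                second_recognizer=lambda a: a.startswith(b"WAKE_B"),
                send_to_first_system=lambda a: print("to first remote system"),
                send_to_second_system=lambda a: print("to second remote system"))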
US15/930,967 2012-06-01 2020-05-13 Voice controlled assistant with coaxial speaker and microphone arrangement Active 2032-08-15 US11270706B1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US15/930,967 US11270706B1 (en) 2012-06-01 2020-05-13 Voice controlled assistant with coaxial speaker and microphone arrangement
US17/027,155 US11521624B1 (en) 2012-06-01 2020-09-21 Voice controlled assistant with coaxial speaker and microphone arrangement
US18/074,798 US12014742B1 (en) 2012-06-01 2022-12-05 Voice controlled assistant with coaxial speaker and microphone arrangement

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US13/486,774 US9060224B1 (en) 2012-06-01 2012-06-01 Voice controlled assistant with coaxial speaker and microphone arrangement
US14/738,669 US9390724B2 (en) 2012-06-01 2015-06-12 Voice controlled assistant with coaxial speaker and microphone arrangement
US15/207,249 US9837083B1 (en) 2012-06-01 2016-07-11 Voice controlled assistant with coaxial speaker and microphone arrangement
US15/649,256 US10283121B1 (en) 2012-06-01 2017-07-13 Voice controlled assistant with coaxial speaker and microphone arrangement
US16/397,931 US10657970B1 (en) 2012-06-01 2019-04-29 Voice controlled assistant with coaxial speaker and microphone arrangement
US15/930,967 US11270706B1 (en) 2012-06-01 2020-05-13 Voice controlled assistant with coaxial speaker and microphone arrangement

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US16/397,931 Continuation US10657970B1 (en) 2012-06-01 2019-04-29 Voice controlled assistant with coaxial speaker and microphone arrangement

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/027,155 Continuation US11521624B1 (en) 2012-06-01 2020-09-21 Voice controlled assistant with coaxial speaker and microphone arrangement

Publications (1)

Publication Number Publication Date
US11270706B1 true US11270706B1 (en) 2022-03-08

Family

ID=53280009

Family Applications (8)

Application Number Title Priority Date Filing Date
US13/486,774 Active 2033-07-24 US9060224B1 (en) 2012-06-01 2012-06-01 Voice controlled assistant with coaxial speaker and microphone arrangement
US14/738,669 Active US9390724B2 (en) 2012-06-01 2015-06-12 Voice controlled assistant with coaxial speaker and microphone arrangement
US15/207,249 Active US9837083B1 (en) 2012-06-01 2016-07-11 Voice controlled assistant with coaxial speaker and microphone arrangement
US15/649,256 Active US10283121B1 (en) 2012-06-01 2017-07-13 Voice controlled assistant with coaxial speaker and microphone arrangement
US16/397,931 Active US10657970B1 (en) 2012-06-01 2019-04-29 Voice controlled assistant with coaxial speaker and microphone arrangement
US15/930,967 Active 2032-08-15 US11270706B1 (en) 2012-06-01 2020-05-13 Voice controlled assistant with coaxial speaker and microphone arrangement
US17/027,155 Active 2032-10-22 US11521624B1 (en) 2012-06-01 2020-09-21 Voice controlled assistant with coaxial speaker and microphone arrangement
US18/074,798 Active US12014742B1 (en) 2012-06-01 2022-12-05 Voice controlled assistant with coaxial speaker and microphone arrangement

Country Status (1)

Country Link
US (8) US9060224B1 (en)

Families Citing this family (142)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9060224B1 (en) 2012-06-01 2015-06-16 Rawles Llc Voice controlled assistant with coaxial speaker and microphone arrangement
US8971543B1 (en) * 2012-06-25 2015-03-03 Rawles Llc Voice controlled assistant with stereo sound from two speakers
CN110882094A (en) 2013-03-15 2020-03-17 威廉·L·亨特 Devices, systems, and methods for monitoring hip replacements
US20160192878A1 (en) 2013-06-23 2016-07-07 William L. Hunter Devices, systems and methods for monitoring knee replacements
WO2015123658A1 (en) 2014-02-14 2015-08-20 Sonic Blocks, Inc. Modular quick-connect a/v system and methods thereof
CN112190236A (en) 2014-09-17 2021-01-08 卡纳里医疗公司 Devices, systems, and methods for using and monitoring medical devices
CN108810732B (en) * 2014-09-30 2020-03-24 苹果公司 Loudspeaker
USRE49437E1 (en) 2014-09-30 2023-02-28 Apple Inc. Audio driver and power supply unit architecture
US9574762B1 (en) * 2014-09-30 2017-02-21 Amazon Technologies, Inc. Light assemblies for electronic devices containing audio circuitry
US9641919B1 (en) * 2014-09-30 2017-05-02 Amazon Technologies, Inc. Audio assemblies for electronic devices
USD793347S1 (en) * 2015-09-03 2017-08-01 Interactive Voice, Inc. Voice controlled automation assistant
US9826599B2 (en) * 2015-12-28 2017-11-21 Amazon Technologies, Inc. Voice-controlled light switches
KR101726815B1 (en) * 2016-02-01 2017-04-17 주식회사 이베스트 Microphone device with a speaker
US9965247B2 (en) 2016-02-22 2018-05-08 Sonos, Inc. Voice controlled media playback system based on user profile
US9772817B2 (en) 2016-02-22 2017-09-26 Sonos, Inc. Room-corrected voice detection
US10509626B2 (en) 2016-02-22 2019-12-17 Sonos, Inc Handling of loss of pairing between networked devices
US10095470B2 (en) 2016-02-22 2018-10-09 Sonos, Inc. Audio response playback
US10264030B2 (en) 2016-02-22 2019-04-16 Sonos, Inc. Networked microphone device control
US9947316B2 (en) 2016-02-22 2018-04-17 Sonos, Inc. Voice control of a media playback system
WO2017165717A1 (en) 2016-03-23 2017-09-28 Canary Medical Inc. Implantable reporting processor for an alert implant
US11191479B2 (en) 2016-03-23 2021-12-07 Canary Medical Inc. Implantable reporting processor for an alert implant
CN109313465A (en) * 2016-04-05 2019-02-05 惠普发展公司,有限责任合伙企业 The audio interface docked for multiple microphones and speaker system with host
US9978390B2 (en) 2016-06-09 2018-05-22 Sonos, Inc. Dynamic player selection for audio signal processing
US10853761B1 (en) 2016-06-24 2020-12-01 Amazon Technologies, Inc. Speech-based inventory management system and method
US11315071B1 (en) * 2016-06-24 2022-04-26 Amazon Technologies, Inc. Speech-based storage tracking
WO2018005895A1 (en) * 2016-06-29 2018-01-04 Oneview Controls, Inc. Common distribution of audio and power signals
US10134399B2 (en) 2016-07-15 2018-11-20 Sonos, Inc. Contextualization of voice inputs
US10152969B2 (en) 2016-07-15 2018-12-11 Sonos, Inc. Voice detection by multiple devices
US10115400B2 (en) 2016-08-05 2018-10-30 Sonos, Inc. Multiple voice services
US10063929B1 (en) 2016-09-01 2018-08-28 Nufbee Llc Community controlled audio entertainment system
US10798044B1 (en) 2016-09-01 2020-10-06 Nufbee Llc Method for enhancing text messages with pre-recorded audio clips
US10587950B2 (en) 2016-09-23 2020-03-10 Apple Inc. Speaker back volume extending past a speaker diaphragm
US9942678B1 (en) 2016-09-27 2018-04-10 Sonos, Inc. Audio playback settings for voice interaction
US9743204B1 (en) 2016-09-30 2017-08-22 Sonos, Inc. Multi-orientation playback device microphones
US10181323B2 (en) 2016-10-19 2019-01-15 Sonos, Inc. Arbitration-based voice recognition
US10726835B2 (en) * 2016-12-23 2020-07-28 Amazon Technologies, Inc. Voice activated modular controller
US20180213396A1 (en) * 2017-01-20 2018-07-26 Essential Products, Inc. Privacy control in a connected environment based on speech characteristics
KR102580418B1 (en) * 2017-02-07 2023-09-20 삼성에스디에스 주식회사 Acoustic echo cancelling apparatus and method
US10764474B2 (en) 2017-03-02 2020-09-01 Amazon Technologies, Inc. Assembly for electronic devices
US11183181B2 (en) 2017-03-27 2021-11-23 Sonos, Inc. Systems and methods of multiple voice services
KR20180118461A (en) * 2017-04-21 2018-10-31 엘지전자 주식회사 Voice recognition module and and voice recognition method
USD864466S1 (en) 2017-05-05 2019-10-22 Hubbell Incorporated Lighting fixture
TWI617197B (en) 2017-05-26 2018-03-01 和碩聯合科技股份有限公司 Multimedia apparatus and multimedia system
US10474417B2 (en) 2017-07-20 2019-11-12 Apple Inc. Electronic device with sensors and display devices
US10475449B2 (en) 2017-08-07 2019-11-12 Sonos, Inc. Wake-word detection suppression
US10553235B2 (en) * 2017-08-28 2020-02-04 Apple Inc. Transparent near-end user control over far-end speech enhancement processing
JP2019041359A (en) 2017-08-29 2019-03-14 オンキヨー株式会社 Speaker device
US10048930B1 (en) 2017-09-08 2018-08-14 Sonos, Inc. Dynamic computation of system response volume
US10446165B2 (en) 2017-09-27 2019-10-15 Sonos, Inc. Robust short-time fourier transform acoustic echo cancellation during audio playback
US10482868B2 (en) 2017-09-28 2019-11-19 Sonos, Inc. Multi-channel acoustic echo cancellation
US10051366B1 (en) 2017-09-28 2018-08-14 Sonos, Inc. Three-dimensional beam forming with a microphone array
US10621981B2 (en) 2017-09-28 2020-04-14 Sonos, Inc. Tone interference cancellation
US10466962B2 (en) 2017-09-29 2019-11-05 Sonos, Inc. Media playback system with voice assistance
US10880650B2 (en) 2017-12-10 2020-12-29 Sonos, Inc. Network microphone devices with automatic do not disturb actuation capabilities
US10818290B2 (en) 2017-12-11 2020-10-27 Sonos, Inc. Home graph
CA3086506A1 (en) 2017-12-20 2019-06-27 Hubbell Incorporated Voice responsive in-wall device
EP3729650B1 (en) 2017-12-20 2024-06-12 Hubbell Incorporated Gesture control for in-wall device
USD877121S1 (en) 2017-12-27 2020-03-03 Yandex Europe Ag Speaker device
RU2707149C2 (en) 2017-12-27 2019-11-22 Общество С Ограниченной Ответственностью "Яндекс" Device and method for modifying audio output of device
EP3776169A4 (en) 2017-12-29 2022-01-26 Polk Audio, LLC Voice-control soundbar loudspeaker system with dedicated dsp settings for voice assistant output signal and mode switching method
USD927433S1 (en) 2018-01-05 2021-08-10 Hubbell Incorporated Front panel of in-wall fan controller with indicator component
WO2019139991A1 (en) 2018-01-09 2019-07-18 Polk Audio, Llc System and method for generating an improved voice assist algorithm signal input
JP6984420B2 (en) * 2018-01-09 2021-12-22 トヨタ自動車株式会社 Dialogue device
US11343614B2 (en) 2018-01-31 2022-05-24 Sonos, Inc. Device designation of playback and network microphone device arrangements
US20190236208A1 (en) * 2018-02-01 2019-08-01 Nano Shield Technology Co., Ltd. Smart speaker with music recognition
US11337061B2 (en) 2018-03-15 2022-05-17 Ways Investments, LLC System, method, and apparatus for virtualizing digital assistants
US11044364B2 (en) 2018-03-15 2021-06-22 Ways Investments, LLC System, method, and apparatus for providing help
WO2019177804A1 (en) 2018-03-15 2019-09-19 Ways Investments, LLC System, method, and apparatus for providing help
US10674014B2 (en) 2018-03-15 2020-06-02 Ways Investments, LLC System, method, and apparatus for providing help
CN108461083B (en) * 2018-03-23 2024-06-21 北京小米移动软件有限公司 Electronic equipment main board, audio processing method and device and electronic equipment
US11146871B2 (en) * 2018-04-05 2021-10-12 Apple Inc. Fabric-covered electronic device
US10978056B1 (en) 2018-04-20 2021-04-13 Facebook, Inc. Grammaticality classification for natural language generation in assistant systems
US11115410B1 (en) 2018-04-20 2021-09-07 Facebook, Inc. Secure authentication for assistant systems
US11010436B1 (en) 2018-04-20 2021-05-18 Facebook, Inc. Engaging users by personalized composing-content recommendation
US11715042B1 (en) 2018-04-20 2023-08-01 Meta Platforms Technologies, Llc Interpretability of deep reinforcement learning models in assistant systems
US11886473B2 (en) 2018-04-20 2024-01-30 Meta Platforms, Inc. Intent identification for agent matching by assistant systems
US10770089B2 (en) 2018-05-10 2020-09-08 Caterpillar Inc. Sound dampening and pass through filtering
US11175880B2 (en) 2018-05-10 2021-11-16 Sonos, Inc. Systems and methods for voice-assisted media content selection
US10847178B2 (en) 2018-05-18 2020-11-24 Sonos, Inc. Linear filtering for noise-suppressed speech detection
US10959029B2 (en) 2018-05-25 2021-03-23 Sonos, Inc. Determining and adapting to changes in microphone performance of playback devices
US10681460B2 (en) 2018-06-28 2020-06-09 Sonos, Inc. Systems and methods for associating playback devices with voice assistant services
US10896295B1 (en) 2018-08-21 2021-01-19 Facebook, Inc. Providing additional information for identified named-entities for assistant systems
US10949616B1 (en) 2018-08-21 2021-03-16 Facebook, Inc. Automatically detecting and storing entity information for assistant systems
CN110867191B (en) 2018-08-28 2024-06-25 洞见未来科技股份有限公司 Speech processing method, information device and computer program product
US11076035B2 (en) 2018-08-28 2021-07-27 Sonos, Inc. Do not disturb feature for audio notifications
US10461710B1 (en) 2018-08-28 2019-10-29 Sonos, Inc. Media playback system with maximum volume setting
US10878811B2 (en) 2018-09-14 2020-12-29 Sonos, Inc. Networked devices, systems, and methods for intelligently deactivating wake-word engines
US10587430B1 (en) 2018-09-14 2020-03-10 Sonos, Inc. Networked devices, systems, and methods for associating playback devices based on sound codes
US11024331B2 (en) 2018-09-21 2021-06-01 Sonos, Inc. Voice detection optimization using sound metadata
US10811015B2 (en) 2018-09-25 2020-10-20 Sonos, Inc. Voice detection optimization based on selected voice assistant service
US11100923B2 (en) 2018-09-28 2021-08-24 Sonos, Inc. Systems and methods for selective wake word detection using neural network models
US10692518B2 (en) 2018-09-29 2020-06-23 Sonos, Inc. Linear filtering for noise-suppressed speech detection via multiple network microphone devices
US11899519B2 (en) 2018-10-23 2024-02-13 Sonos, Inc. Multiple stage network microphone device with reduced power consumption and processing load
JP2020071764A (en) * 2018-11-01 2020-05-07 東芝テック株式会社 Instruction management apparatus and control program thereof
EP3654249A1 (en) 2018-11-15 2020-05-20 Snips Dilated convolutions and gating for efficient keyword spotting
US11183183B2 (en) 2018-12-07 2021-11-23 Sonos, Inc. Systems and methods of operating media playback systems having multiple voice assistant services
CN111312253A (en) 2018-12-11 2020-06-19 青岛海尔洗衣机有限公司 Voice control method, cloud server and terminal equipment
US11132989B2 (en) 2018-12-13 2021-09-28 Sonos, Inc. Networked microphone devices, systems, and methods of localized arbitration
US10602268B1 (en) 2018-12-20 2020-03-24 Sonos, Inc. Optimization of network microphone devices using noise classification
KR102196628B1 (en) 2019-01-30 2020-12-30 주식회사 오투오 Type-c universal serial bus digital media player device
US10867604B2 (en) 2019-02-08 2020-12-15 Sonos, Inc. Devices, systems, and methods for distributed voice processing
US11315556B2 (en) 2019-02-08 2022-04-26 Sonos, Inc. Devices, systems, and methods for distributed voice processing by transmitting sound data associated with a wake word to an appropriate device for identification
US10979810B2 (en) * 2019-03-19 2021-04-13 Amazon Technologies, Inc. Electronic device
US11120794B2 (en) 2019-05-03 2021-09-14 Sonos, Inc. Voice assistant persistence across multiple network microphone devices
US10586540B1 (en) 2019-06-12 2020-03-10 Sonos, Inc. Network microphone device with command keyword conditioning
US11200894B2 (en) 2019-06-12 2021-12-14 Sonos, Inc. Network microphone device with command keyword eventing
US11361756B2 (en) 2019-06-12 2022-06-14 Sonos, Inc. Conditional wake word eventing based on environment
US11442992B1 (en) 2019-06-28 2022-09-13 Meta Platforms Technologies, Llc Conversational reasoning with knowledge graph paths for assistant systems
US11657094B2 (en) 2019-06-28 2023-05-23 Meta Platforms Technologies, Llc Memory grounded conversational reasoning and question answering for assistant systems
US11477568B2 (en) * 2019-07-12 2022-10-18 Lg Electronics Inc. Voice input apparatus
US11138975B2 (en) 2019-07-31 2021-10-05 Sonos, Inc. Locally distributed keyword detection
US10871943B1 (en) 2019-07-31 2020-12-22 Sonos, Inc. Noise classification for event detection
US11138969B2 (en) 2019-07-31 2021-10-05 Sonos, Inc. Locally distributed keyword detection
KR20210025812A (en) 2019-08-28 2021-03-10 삼성전자주식회사 Electronic apparatus, display apparatus and method for controlling thereof
USD947152S1 (en) 2019-09-10 2022-03-29 Yandex Europe Ag Speaker device
US11567788B1 (en) 2019-10-18 2023-01-31 Meta Platforms, Inc. Generating proactive reminders for assistant systems
US11861674B1 (en) 2019-10-18 2024-01-02 Meta Platforms Technologies, Llc Method, one or more computer-readable non-transitory storage media, and a system for generating comprehensive information for products of interest by assistant systems
USD947137S1 (en) 2019-10-22 2022-03-29 Hubbell Incorporated Front panel of in-wall fan controller with indicator component
US11189286B2 (en) 2019-10-22 2021-11-30 Sonos, Inc. VAS toggle based on device orientation
US20210136471A1 (en) * 2019-11-01 2021-05-06 Microsoft Technology Licensing, Llc Audio device
US11200900B2 (en) 2019-12-20 2021-12-14 Sonos, Inc. Offline voice control
US11562740B2 (en) 2020-01-07 2023-01-24 Sonos, Inc. Voice verification for media playback
US11556307B2 (en) 2020-01-31 2023-01-17 Sonos, Inc. Local voice data processing
US11308958B2 (en) 2020-02-07 2022-04-19 Sonos, Inc. Localized wakeword verification
US11562744B1 (en) 2020-02-13 2023-01-24 Meta Platforms Technologies, Llc Stylizing text-to-speech (TTS) voice response for assistant systems
US11159767B1 (en) 2020-04-07 2021-10-26 Facebook Technologies, Llc Proactive in-call content recommendations for assistant systems
US11727919B2 (en) 2020-05-20 2023-08-15 Sonos, Inc. Memory allocation for keyword spotting engines
US11482224B2 (en) 2020-05-20 2022-10-25 Sonos, Inc. Command keywords with input detection windowing
US11308962B2 (en) 2020-05-20 2022-04-19 Sonos, Inc. Input detection windowing
US11658835B2 (en) 2020-06-29 2023-05-23 Meta Platforms, Inc. Using a single request for multi-person calling in assistant systems
US11698771B2 (en) 2020-08-25 2023-07-11 Sonos, Inc. Vocal guidance engines for playback devices
US11984123B2 (en) 2020-11-12 2024-05-14 Sonos, Inc. Network device interaction by range
US11563706B2 (en) 2020-12-29 2023-01-24 Meta Platforms, Inc. Generating context-aware rendering of media contents for assistant systems
US11809480B1 (en) 2020-12-31 2023-11-07 Meta Platforms, Inc. Generating dynamic knowledge graph of media contents for assistant systems
US11551700B2 (en) 2021-01-25 2023-01-10 Sonos, Inc. Systems and methods for power-efficient keyword detection
US11861315B2 (en) 2021-04-21 2024-01-02 Meta Platforms, Inc. Continuous learning for natural-language understanding models for assistant systems
US12118790B2 (en) 2021-04-21 2024-10-15 Meta Platforms, Inc. Auto-capture of interesting moments by assistant systems
US12045568B1 (en) 2021-11-12 2024-07-23 Meta Platforms, Inc. Span pointer networks for non-autoregressive task-oriented semantic parsing for assistant systems
US11641505B1 (en) * 2022-06-13 2023-05-02 Roku, Inc. Speaker-identification model for controlling operation of a media player
US12081936B2 (en) * 2022-07-22 2024-09-03 Dell Products Lp Method and apparatus for a fragmented radial sound box hub device
US11983329B1 (en) 2022-12-05 2024-05-14 Meta Platforms, Inc. Detecting head gestures using inertial measurement unit signals
US12112001B1 (en) 2023-03-14 2024-10-08 Meta Platforms, Inc. Preventing false activations based on don/doff detection for assistant systems

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4574906A (en) 1984-11-15 1986-03-11 Audio Technica U.S., Inc. Outdoor speaker
US6765971B1 (en) 2000-08-08 2004-07-20 Hughes Electronics Corp. System method and computer program product for improved narrow band signal detection for echo cancellation
US6826533B2 (en) * 2000-03-30 2004-11-30 Micronas Gmbh Speech recognition apparatus and method
US20050207591A1 (en) 2001-09-14 2005-09-22 Sony Corporation Audio input unit, audio input method and audio input and output unit
US7227566B2 (en) 2003-09-05 2007-06-05 Sony Corporation Communication apparatus and TV conference apparatus
US7277566B2 (en) 2002-09-17 2007-10-02 Riken Microscope system
US20070263845A1 (en) 2006-04-27 2007-11-15 Richard Hodges Speakerphone with downfiring speaker and directional microphones
US7418392B1 (en) 2003-09-25 2008-08-26 Sensory, Inc. System and method for controlling the operation of a device by voice commands
US7720683B1 (en) 2003-06-13 2010-05-18 Sensory, Inc. Method and apparatus of specifying and performing speech recognition operations
WO2011088053A2 (en) 2010-01-18 2011-07-21 Apple Inc. Intelligent automated assistant
US20120223885A1 (en) 2011-03-02 2012-09-06 Microsoft Corporation Immersive display experience
US8385557B2 (en) 2008-06-19 2013-02-26 Microsoft Corporation Multichannel acoustic echo reduction
US9060224B1 (en) 2012-06-01 2015-06-16 Rawles Llc Voice controlled assistant with coaxial speaker and microphone arrangement

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7346315B2 (en) * 2004-03-30 2008-03-18 Motorola Inc Handheld device loudspeaker system
US7769408B2 (en) * 2006-06-21 2010-08-03 Sony Ericsson Mobile Communications Ab Mobile radio terminal having speaker port selection and method
US20180336892A1 (en) * 2017-05-16 2018-11-22 Apple Inc. Detecting a trigger of a digital assistant

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4574906A (en) 1984-11-15 1986-03-11 Audio Technica U.S., Inc. Outdoor speaker
US6826533B2 (en) * 2000-03-30 2004-11-30 Micronas Gmbh Speech recognition apparatus and method
US6765971B1 (en) 2000-08-08 2004-07-20 Hughes Electronics Corp. System method and computer program product for improved narrow band signal detection for echo cancellation
US20050207591A1 (en) 2001-09-14 2005-09-22 Sony Corporation Audio input unit, audio input method and audio input and output unit
US7277566B2 (en) 2002-09-17 2007-10-02 Riken Microscope system
US7720683B1 (en) 2003-06-13 2010-05-18 Sensory, Inc. Method and apparatus of specifying and performing speech recognition operations
US7227566B2 (en) 2003-09-05 2007-06-05 Sony Corporation Communication apparatus and TV conference apparatus
US7418392B1 (en) 2003-09-25 2008-08-26 Sensory, Inc. System and method for controlling the operation of a device by voice commands
US7774204B2 (en) * 2003-09-25 2010-08-10 Sensory, Inc. System and method for controlling the operation of a device by voice commands
US20070263845A1 (en) 2006-04-27 2007-11-15 Richard Hodges Speakerphone with downfiring speaker and directional microphones
US7925004B2 (en) 2006-04-27 2011-04-12 Plantronics, Inc. Speakerphone with downfiring speaker and directional microphones
US8385557B2 (en) 2008-06-19 2013-02-26 Microsoft Corporation Multichannel acoustic echo reduction
WO2011088053A2 (en) 2010-01-18 2011-07-21 Apple Inc. Intelligent automated assistant
US20120016678A1 (en) * 2010-01-18 2012-01-19 Apple Inc. Intelligent Automated Assistant
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US20120223885A1 (en) 2011-03-02 2012-09-06 Microsoft Corporation Immersive display experience
US9060224B1 (en) 2012-06-01 2015-06-16 Rawles Llc Voice controlled assistant with coaxial speaker and microphone arrangement
US20150279387A1 (en) 2012-06-01 2015-10-01 Rawles Llc Voice Controlled Assistant with Coaxial Speaker and Microphone Arrangement

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
Office action for U.S. Appl. No. 13/486,774, dated Sep. 8, 2014, List, "Voice Controlled Assistant with Coaxial Speaker and Microphone Arrangement", 11 pages.
Office action for U.S. Appl. No. 14/738,669, dated Feb. 11, 2016, List, "Voice Controlled Assistant with Coaxial Speaker and Microphone Arrangement", 6 pages.
Office action for U.S. Appl. No. 14/738,669, dated Sep. 24, 2015, List, "Voice Controlled Assistant with Coaxial Speaker and Microphone Arrangement", 9 pages.
Office Action for U.S. Appl. No. 15/207,249, dated Feb. 1, 2017, List, "Voice Controlled Assistant With Coaxial Speaker and Microphone Arrangement", 15 pages.
Office action for U.S. Appl. No. 15/207,249, dated Sep. 28, 2016, List, "Voice Controlled Assistant With Coaxial Speaker and Microphone Arrangement", 13 pages.
Office action for U.S. Appl. No. 15/649,256, dated Mar. 22, 2018, List, "Voice Controlled Assistant With Coaxial Speaker and Microphone Arrangement", 6 pages.
Office action for U.S. Appl. No. 15/649,256, dated Oct. 17, 2018, List, "Voice Controlled Assistant With Coaxial Speaker and Microphone Arrangement", 11 pages.
Office Action for U.S. Appl. No. 16/397,931, dated Sep. 19, 2019, List, "Voice Controlled Assistant With Coaxial Speaker and Microphone Arrangement", 10 pages.
Pinhanez, "The Everywhere Displays Projector: A Device to Create Ubiquitous Graphical Interfaces", IBM Thomas Watson Research Center, Ubicomp 2001, 18 pages.

Also Published As

Publication number Publication date
US11521624B1 (en) 2022-12-06
US12014742B1 (en) 2024-06-18
US9837083B1 (en) 2017-12-05
US20150279387A1 (en) 2015-10-01
US9060224B1 (en) 2015-06-16
US9390724B2 (en) 2016-07-12
US10657970B1 (en) 2020-05-19
US10283121B1 (en) 2019-05-07

Similar Documents

Publication Publication Date Title
US12014742B1 (en) Voice controlled assistant with coaxial speaker and microphone arrangement
US10123119B1 (en) Voice controlled assistant with stereo sound from two speakers
US11488591B1 (en) Altering audio to improve automatic speech recognition
US11974082B2 (en) Audio assemblies for electronic devices
US11501792B1 (en) Voice controlled system
US11455994B1 (en) Identifying a location of a voice-input device
US9659577B1 (en) Voice controlled assistant with integrated control knob
US11600271B2 (en) Detecting self-generated wake expressions
US9087520B1 (en) Altering audio based on non-speech commands
US11287565B1 (en) Light assemblies for electronic devices
US10887710B1 (en) Characterizing environment using ultrasound pilot tones
US9466286B1 (en) Transitioning an electronic device between device states
US9799329B1 (en) Removing recurring environmental sounds
US11862153B1 (en) System for recognizing and responding to environmental noises
US9805721B1 (en) Signaling voice-controlled devices
US10438582B1 (en) Associating identifiers with audio signals

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE