WO2019136460A1

WO2019136460A1 - Synchronized voice-control module, loudspeaker system and method for incorporating vc functionality into a separate loudspeaker system

Info

Publication number: WO2019136460A1
Application number: PCT/US2019/012738
Authority: WO
Inventors: Matthew P. LYONS; Bradley M. Starobin; John Crisco
Original assignee: Polk Audio, Llc
Priority date: 2018-01-08
Filing date: 2019-01-08
Publication date: 2019-07-11
Also published as: EP3776880A4; EP3776880A1

Abstract

A system (400) and method for incorporating Voice-Controlled ("VC") or smart speaker system features into a separate high-performance host loudspeaker system (604) includes a "Smart Puck" VC module (404) programmed to incorporate VC functionality into the separate host loudspeaker system (604). The Smart Puck VC module (404) includes a microphone array (424) and senses noise stimuli emitted by the separate host loudspeaker system (604) in a user's listening space which is used to derive a complex transfer function (TF) between the host loudspeaker system and the microphone array, and the captured TF is utilized to improve the VC speaker system's sound in the listening space.

Description

Synchronized Voice-Control Module, Loudspeaker System and Method for Incorporating VC Functionality into a Separate Loudspeaker System

BACKGROUND OF THE INVENTION

Priority Claim and Reference to Related Applications:

[0001] This application claims priority to and benefit of US Provisional Application No. 62/614,726, filed January 8, 2018 by Matthew LYONS and Bradley M. STAROBIN, and entitled “Synchronized Voice-Control Module, Loudspeaker System and Method for Incorporating VC Functionality Into Existing Bespoke or High-performance Loudspeaker Systems,” the disclosure of which is hereby incorporated herein by reference.

Field of the Invention:

[0002] The present invention relates to Voice Controlled (“VC”) media playback systems or Smart Speakers adapted to receive and respond to a user’s spoken commands.

Discussion of the Prior Art:

[0003] Music listeners listen to music over compact audio reproduction systems such as the applicant’s I-Sonic® music playback system (see, e.g., U.S. Patent 7817812) which enables them to enjoy surprisingly high-fidelity playback in an easy-to-place, aesthetically pleasing product. The widespread adoption of home Wi-Fi systems has led to the use of a wide variety of “connected” audio products, including voice-controlled “Smart” speaker systems such as the Google Home™, Amazon Echo™, or the Apple HomePod™ systems which are becoming commonplace not only in homes but in a wide variety of other locations. Voice-controlled (“VC”) or “smart” speakers incorporate microphones to pick up a user’s voice commands, include components to connect to the user’s Wi-Fi system and may include components that can be used to control the user’s smart equipment. Known prior art VC smart speakers, sold as Amazon Echo™ “voice-controlled assistants,” are described and illustrated in US Patents Nos. 8971543 and 9060224 (assigned to Rawles, LLC). This prior art is illustrated in Figs. 1A-1C of the present application, which show the elements of exemplary (typical) Amazon Echo™ system architectures and are included for purposes of establishing the background of the present invention, and the nomenclature used to describe such devices.

[0004] Some VC speaker systems (e.g., the Amazon Echo™ 104, Google Home™ or Cortana™ VC speaker systems) are capable of running existing third party voice-based software (“chat-bots”) or assistant applications (e.g., Skills™ or Actions™) and can respond to a user’s spoken commands with voice-based synthesized audible responses generated as part of Voice Assistance (“VA”) operations. In these devices, the VC speaker senses or detects user-spoken trigger phrases (i.e., “wake” words or phrases) or commands and generates an audible VA reply or acknowledgement in response. Amazon’s VA or voice software system is known as “Alexa”; Google’s VA or voice software system may be summoned by “Hey Google” and Apple’s VA or voice software system may be summoned by addressing “Siri”. Each of these VA systems is programmable to respond to a user’s “wake word” or response-triggering phrase, whereupon the VA takes over control of the VC speaker and responds to the user with an audible response or reply.

[0005] VC loudspeaker systems reproduce several types of audio program material, including music, movie soundtracks, news, podcasts, etc., but many of the VC speakers currently offered do a mediocre job of reproducing anything more sonically demanding than news reports. Most loudspeaker products with voice-control systems embedded within them, though desirable for obvious reasons, present some unwelcome issues for many consumers. The prior art Amazon Echo™ and similar VC speaker devices fail to recognize when and how the host system is operating and offer nothing that would allow the user to alter the performance characteristics of the VC device in order to optimize overall system performance.

[0006] There is a need, therefore, for a convenient, user-friendly, flexible and unobtrusive system and method for high performance loudspeaker system users and music listeners who are acclimated to VC or “smart” speaker systems and want improved VC functionality in high performance home theater loudspeaker systems or soundbars.

SUMMARY OF THE INVENTION

[0007] Briefly, the present invention is directed to an improved Voice-Controlled (“VC”) Loudspeaker system and method which incorporates a synchronized VC module or “smart puck” programmed to provide a method for incorporating VC functionality into separate high-performance host loudspeaker systems.

[0008] More particularly, the invention is directed to an improved VC loudspeaker system and method for incorporating VC functionality into separate yet synchronized high-performance host loudspeaker systems, where the synchronized VC module is programmed to optimize its own performance and the performance of the separate host loudspeaker system. In accordance with the invention, the synchronized VC module not only accounts for its characteristic acoustic environment, but also recognizes when and how the host system is operating, while providing a mechanism to alter its performance characteristics in order to optimize overall VC system performance.

[0009] Broadly speaking, an exemplary embodiment of the present invention would be an improved or enhanced smart or Voice-Controlled (“VC”) loudspeaker system incorporating a synchronized Voice-Control (“VC”) “Smart Puck” device configured to operate in conjunction with a separate host loudspeaker system. For example, a suitable separate high-performance loudspeaker system would be as full featured as one of applicant’s existing high performance Soundbar systems. The invention further is directed to a method for incorporating VC functionality into other separate host loudspeaker systems.

[0010] The applicant for the present invention has developed many great sounding feature-rich loudspeaker and audio reproduction devices and systems that are adapted for use with a user’s Wi-Fi system. Such devices and systems incorporate DSP elements programmable to achieve specific sonic goals for specific audio program playback applications. US Patents 9,277,044, 9,374,640, 9,584,935, 9,706,320, 9,767,786 and 9,807,484 provide useful context and background for the present invention and are incorporated herein in their entireties by reference.

[0011] The VC speaker system of the present invention includes at least one synchronized Voice-Control (“VC”) “Smart Puck” module which incorporates a Controller or Computing module with a pre-programmed DSP system to provide optimized sound quality during audio program playback as well as subjectively pleasant-sounding Voice Assistance response as perceived by a user when used, for example, with an Amazon Alexa or a Google voice device. This result is obtained regardless of the audio settings that the user may have selected on the separate high-performance host loudspeaker system, even when those settings might otherwise inhibit VA intelligibility.

[0012] When compared to a conventional voice-control or voice assistant module, the synchronized VC module of the present invention provides several important advantages. In accordance with conventional methods for measuring in-room acoustic transfer functions which commonly rely on sine-sweep or pink noise stimuli, the “Smart Puck” synchronized VC module can be initiated to capture the complex Transfer Function (“TF”) of audio signals between a separate host loudspeaker (such as a soundbar) and a microphone array on the Smart Puck housing. The captured TFs, most efficiently expressed as Finite Impulse Response (FIR) filter parameters, may be utilized to improve the Smart Puck’s recognition of the user’s voice queries or commands. That the Smart Puck and separate host audio system are synchronized - indeed, requests for program material should be expected to pass through the Puck - means that the Smart Puck synchronized VC module will be able to monitor both the program broadcast from the separate host loudspeaker(s) and user voice commands at any given time.
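By way of non-limiting illustration only, the capture of a transfer function as FIR filter parameters might be sketched as follows. The sketch assumes a white-noise stimulus (the text above mentions sine-sweep or pink noise), a least-squares fit, and a made-up four-tap room path; the function names `fir_filter`, `solve_linear` and `estimate_fir` are hypothetical and are not taken from the disclosure.

```python
import random


def fir_filter(signal, taps):
    """Convolve signal with FIR taps, truncated to len(signal) samples."""
    return [
        sum(taps[k] * signal[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(signal))
    ]


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for i in range(size - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (aug[i][size] - tail) / aug[i][i]
    return x


def estimate_fir(stimulus, captured, num_taps):
    """Least-squares taps h with captured[n] ~= sum_k h[k] * stimulus[n - k]."""
    n = len(stimulus)

    def delayed(t, k):
        # Stimulus delayed by k samples, zero before the start of playback.
        return stimulus[t - k] if t - k >= 0 else 0.0

    # Normal equations: autocorr * h = crosscorr.
    autocorr = [
        [sum(delayed(t, i) * delayed(t, j) for t in range(n))
         for j in range(num_taps)]
        for i in range(num_taps)
    ]
    crosscorr = [
        sum(delayed(t, i) * captured[t] for t in range(n))
        for i in range(num_taps)
    ]
    return solve_linear(autocorr, crosscorr)


# Simulated acquisition: noise stimulus through an assumed 4-tap room path.
rng = random.Random(0)
stimulus = [rng.gauss(0.0, 1.0) for _ in range(500)]
true_path = [0.8, 0.3, -0.2, 0.1]  # illustrative values only
captured = fir_filter(stimulus, true_path)
estimated_path = estimate_fir(stimulus, captured, num_taps=4)
```

In this noise-free simulation the estimated taps match the assumed path to within floating-point error; a practical acquisition routine would have to contend with ambient noise and a far longer room response.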

[0013] To further improve the Smart Puck synchronized VC module’s recognition of voice queries, transfer functions that model room effects at the location of the Smart Puck also are derived over a range of DSP settings via acquisition routines involving noise stimuli emitted by the loudspeaker and captured by the microphone array. Then, the inverse of the transfer function that best matches the current state of the separate host speaker system’s DSP settings (e.g., using Polk Audio’s DSP Audio enhancement modes, master volume = 70%, bass = +2, Voice Adjust = +1, Movie Mode, etc.) is recalled from the Smart Puck’s memory and imposed on the signals acquired by the Smart Puck microphone array as a means of “subtracting” the separate host loudspeaker’s contribution from the microphone array’s acquired signal, thereby greatly improving the “signal” (voice commands/queries) to “noise” (separate host loudspeaker output) ratio and permitting greatly improved responsiveness to a user’s voice commands. Additionally, as a sophisticated listening device, the Smart Puck synchronized VC module can monitor ambient environmental noise apart from program material, and by virtue of the methods and system elements described and illustrated in Polk Audio’s commonly owned Patent 9,767,786 (Starobin, Lyons, et al., the entire disclosure of which is hereby incorporated herein by reference), a “micro quiet zone” centered about the smart puck can be established for purposes of further improving the signal to noise ratio associated with voice commands.
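As a non-limiting sketch of the recall-and-subtract step, the following assumes that each stored transfer function is held as FIR taps keyed by the DSP settings under which it was captured, and that the host contribution is removed by filtering the known program signal through the recalled taps and subtracting the result from the microphone signal. That time-domain subtraction is a stand-in chosen for illustration, not necessarily the inverse-TF operation described above. The mismatch score, the dictionary layout and all numeric values are assumptions.

```python
def fir_filter(signal, taps):
    """Convolve signal with FIR taps, truncated to len(signal) samples."""
    return [
        sum(taps[k] * signal[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(signal))
    ]


def settings_distance(stored, current):
    """Hypothetical mismatch score between two DSP-setting dictionaries."""
    score = 0.0
    for key, value in stored.items():
        if isinstance(value, str):
            # Categorical settings (e.g. listening mode) must match exactly.
            score += 0.0 if value == current.get(key) else 100.0
        else:
            score += abs(value - current.get(key, 0))
    return score


def recall_best_path(stored_paths, current_settings):
    """Return the stored FIR taps whose settings best match the current state."""
    best = min(
        stored_paths,
        key=lambda entry: settings_distance(entry["settings"], current_settings),
    )
    return best["taps"]


def subtract_host_contribution(mic_signal, program_signal, taps):
    """Remove the predicted host-loudspeaker signal from the mic signal."""
    predicted = fir_filter(program_signal, taps)
    return [m - p for m, p in zip(mic_signal, predicted)]


# Illustrative stored table and signals (all values are made up).
stored_paths = [
    {"settings": {"volume": 40, "bass": 0, "voice_adjust": 0, "mode": "music"},
     "taps": [0.4, 0.1]},
    {"settings": {"volume": 70, "bass": 2, "voice_adjust": 1, "mode": "movie"},
     "taps": [0.7, 0.2]},
]
current = {"volume": 68, "bass": 2, "voice_adjust": 1, "mode": "movie"}
program = [1.0, -1.0, 0.5, 0.0, 0.25]
voice = [0.0, 0.1, 0.2, 0.1, 0.0]
mic = [h + v for h, v in zip(fir_filter(program, [0.7, 0.2]), voice)]

taps = recall_best_path(stored_paths, current)
recovered_voice = subtract_host_contribution(mic, program, taps)
```

Here the recalled taps are those captured under the Movie Mode settings, and the residual equals the simulated voice signal because the stored path matches the simulated room exactly; any mismatch between the stored and actual paths would leave residual program material in the recovered signal.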

[0014] Another aspect of the system and method of the present invention pertains to the use of the Smart Puck synchronized VC module as means for capturing the separate host loudspeaker system’s acoustic frequency response at the primary listening location in conjunction with “room-smoothing” algorithms such as commercially available ones (e.g., Audyssey™) or Polk Audio’s proprietary algorithm (as described in commonly owned Patent No. 8,194,874 to Starobin, Lyons, et al., the entire disclosure of which is also hereby incorporated herein by reference).

[0015] Another advantageous aspect of the system and method of the present invention exploits the smart puck’s utility as a microphone array capable of integrating with a host soundbar-subwoofer system for purposes of determining the location of the soundbar and subwoofer relative to a customer’s primary listening position. Once the subwoofer’s location is known relative to the soundbar, certain DSP settings may be modified in order to optimize overall system performance.
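One way such a location measurement could be approached, offered purely as an illustrative sketch, is to estimate the acoustic time of flight from a loudspeaker to a microphone by cross-correlating an emitted test burst with the captured signal and converting the delay to a distance. The sketch assumes the puck and host share a common time base (consistent with their being synchronized), a nominal speed of sound of 343 m/s, and a 48 kHz sample rate; the function names are hypothetical.

```python
import random

SPEED_OF_SOUND_M_S = 343.0  # nominal value for air at room temperature


def estimate_delay_samples(emitted, captured, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(
            emitted[i] * captured[i + lag]
            for i in range(len(emitted))
            if i + lag < len(captured)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_distance_m(delay_samples, sample_rate_hz):
    """Convert a propagation delay in samples to a path length in metres."""
    return delay_samples / sample_rate_hz * SPEED_OF_SOUND_M_S


# Simulated capture: the burst arrives 140 samples late at half amplitude.
rng = random.Random(1)
burst = [rng.uniform(-1.0, 1.0) for _ in range(256)]
true_delay = 140
captured = [0.0] * true_delay + [0.5 * s for s in burst] + [0.0] * 200

delay = estimate_delay_samples(burst, captured, max_lag=300)
distance_m = delay_to_distance_m(delay, sample_rate_hz=48000)
```

Repeating the measurement for the soundbar and the subwoofer, and across several microphones of the array, would give the set of path lengths from which relative positions could be inferred; that geometric step is not shown.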

[0016] In summary, then, an enhanced Voice-Controlled (“VC”) Loudspeaker system and method in accordance with the invention includes providing a Smart Puck synchronized VC module with a microphone array and configured to generate an audio output linked to a separate host loudspeaker system, where the Smart Puck synchronized VC module is programmed to provide a method for incorporating VC functionality into the separate host loudspeaker system. The system and method further includes programming the Smart Puck synchronized VC module to optimize its performance with the separate host loudspeaker system, the programming including, for example, the steps of determining the characteristic acoustic environment of the system, determining when and how the host loudspeaker system is operating, and altering the performance characteristics of the Smart Puck synchronized VC module in response thereto to optimize overall VC system performance. It will be understood that the host loudspeaker system is separate and remote from the VC module; that is, it is a separate loudspeaker which may incorporate a soundbar and may also include a subwoofer.

[0017] The Smart Puck synchronized VC module of the present invention incorporates a processor linked to the microphone array and to the separate host loudspeaker system and is further linked to remote entities by way of a network for obtaining responses to signals from the Smart Puck synchronized VC module’s microphone array and directing the responses to the separate host loudspeaker system. The processor includes a microprocessor incorporating suitable memory and operating system components connected to a pre-programmed digital signal processing (DSP) engine to provide optimized sound quality during audio program playback as well as subjectively pleasant-sounding Voice Assistance response as perceived by a user.

[0018] The host loudspeaker system may include user controls for providing system settings, and the Smart Puck synchronized VC module’s DSP system is preprogrammed to account for such system settings. The processor includes an operating system that is configured and programmed to operate the DSP engine and to manage hardware such as a wireless unit, a USB unit, and a Codec within the synchronized module as well as to operate a variety of control element modules in the processor.

[0019] In accordance with the method of the present invention, an enhanced Voice-Controlled (“VC”) Loudspeaker system incorporates a microphone array, a synchronized VC module linked to the array and having an audio output linked to an existing remotely located bespoke or high-performance host loudspeaker system, where the Smart Puck synchronized VC module includes a processor programmed to provide a method for incorporating VC functionality into the separate host loudspeaker system, and operates by capturing (at the microphone array, through the use of acquisition routines) noise stimuli emitted by the separate host loudspeaker system and deriving from such stimuli a complex transfer function (TF) between the host loudspeaker system and the microphone array. This captured TF, most efficiently expressed as Finite Impulse Response (FIR) filter parameters, is utilized to improve recognition of voice queries in the system.

[0020] In addition, the method further includes providing a digital signal processing (DSP) engine preprogrammed to include a range of transfer functions corresponding to the separate host loudspeaker’s system settings, capturing transfer functions (TFs) that reflect room effects derived over a range of preprogrammed DSP settings, recalling from memory the inverse of the transfer function that best matches the current state of the DSP settings, and imposing on the signals acquired by the microphone array the best match inverse signals to subtract the separate host loudspeaker’s contribution from user voice signals acquired by the microphone array, thereby greatly improving the signal to noise ratio of voice command/query signals to loudspeaker output noise to provide greatly improved responsiveness to user voice commands. The method may further include monitoring ambient environmental noise apart from desired program material produced by the host loudspeaker system and modifying the DSP settings in accordance with the monitored environmental noise to provide a “micro quiet zone” centered about the module for further improving the signal to noise ratio associated with voice commands.

[0021] The method further includes capturing the acoustic frequency response of the loudspeaker system at the user’s primary listening location in conjunction with “room-smoothing” algorithms and modifying by inverse magnitude shaping the DSP settings in accordance with the captured acoustic frequency response to optimize system performance. Alternatively, the method may modify the DSP settings in accordance with the captured acoustic frequency response by imposing time-delayed correction signals to optimize system performance.
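For illustration only, inverse magnitude shaping can be pictured as computing, for each frequency band, a correction gain equal and opposite to the measured deviation from a target level. The sketch below assumes the captured response is available as per-band levels in decibels, and it limits boost and cut to assumed bounds of 6 dB and 12 dB so that deep room nulls are not over-driven; the function name, band layout and limits are hypothetical and do not represent any particular room-smoothing algorithm.

```python
def inverse_magnitude_correction(measured_db, target_db=0.0,
                                 max_boost_db=6.0, max_cut_db=12.0):
    """Per-band EQ gains (dB) that push the measured response toward target."""
    corrections = []
    for level in measured_db:
        gain = target_db - level  # equal and opposite to the deviation
        gain = max(-max_cut_db, min(max_boost_db, gain))  # clamp boost and cut
        corrections.append(gain)
    return corrections


# Example: a 4 dB peak, a 3 dB dip, a deep 10 dB null and a 15 dB room mode.
measured = [4.0, -3.0, -10.0, 15.0]
gains = inverse_magnitude_correction(measured)
```

With these inputs the first two bands are corrected fully, while the deep null and the large peak are corrected only up to the assumed limits.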

[0022] The method of the invention may also include determining, in a host loudspeaker system containing a sound bar SB and a subwoofer SW, the location of the sound bar relative to the subwoofer and modifying the DSP settings in accordance with the determined relative locations to optimize system performance. In addition, the steps of determining the location of the soundbar and subwoofer relative to a user’s primary listening position and modifying DSP settings accordingly may also be provided in order to optimize system performance.

[0023] The system and the method of operating the present invention enable the user of a bespoke or high-performance audio or home theater system (a host loudspeaker system) incorporating, for example, a high performance soundbar to add VC functionality without requiring the user to replace the entire bespoke or high-performance loudspeaker system. Instead, the invention adds a synchronized module or smart puck which is compatible with an existing (e.g., the Amazon Echo™) system architecture and which may be configured as a puck-like Wi-Fi enabled device with a host of capabilities prompted via voice control. The synchronized module is also configured to permit the user to optimize its performance with the bespoke or high performance (“host”) loudspeaker system not only to account for the characteristic acoustic environment, but also to recognize when and how the host system is operating and to alter its performance characteristics in order to optimize overall VC and loudspeaker system performance.

BRIEF DESCRIPTION OF THE DRAWINGS

[0024] The following detailed description of the present invention is taken in conjunction with reference to the following drawings, wherein the same reference numbers in the different Figures indicate similar or identical components:

[0025] Figs. 1A-1C illustrate typical Voice-Control (“VC”) speaker architectures and Methods, in accordance with the Prior Art.

[0026] Fig. 2 is a diagram illustrating a synchronized voice-controlled “smart puck” module and loudspeaker system architecture and method for incorporating VC functionality into a separate high performance loudspeaker system, in accordance with the present invention.

[0027] Fig. 3 is a block diagram of the synchronized voice-controlled “smart puck” module and system architecture of Fig. 2, illustrating the interconnected processor components for incorporating VC functionality into a separate high performance loudspeaker system, in accordance with the present invention.

DESCRIPTION OF PREFERRED EMBODIMENTS

[0028] The following description, taken with the accompanying drawings, discloses specific, exemplary embodiments for an improved or enhanced smart or voice-controlled (“VC”) loudspeaker system incorporating a synchronized voice-controlled (“VC”) module, a host loudspeaker system and a method for incorporating VC functionality into a separate high performance host loudspeaker system including features similar to those found in applicant’s existing high performance Soundbar systems, in accordance with the present invention.

VC Speaker Architectures and Nomenclature

[0029] In order to place the present invention in its proper context and to set forth some VC speaker system architecture nomenclature, reference is made to the prior art illustrated in Figs. 1A-1C which are further described and illustrated in US Patents Nos. 8971543 and 9060224. These Figures are illustrative of typical VC speaker systems such as those sold as the Amazon Echo™ “voice-controlled assistant”; Fig. 1A thus illustrates a first exemplary (typical) prior art system architecture 100 set in an exemplary VC speaker use environment 102 which includes a typical VC speaker system 104 and at least a first user 106. The user 106 is typically near or proximal to VC speaker system 104 in the use environment 102. In this illustration, the VC speaker system 104 is physically positioned on a table 108 within the environment 102 and is shown sitting upright and supported on its first base or bottom end.

[0030] The VC speaker system 104 as illustrated is communicatively coupled to one or more remote entities 110 over a network 112. The remote entities 110 may include individual people, such as person 114, or automated systems (not shown) that serve as far end talkers to verbally interact with the user 106. Additionally, or alternatively, the remote entities 110 may comprise cloud services 116 hosted, for example, on one or more servers 118(1) . . . 118(S). These servers 118(1)-(S) may be arranged in any number of ways, such as server farms, stacks, and the like that are commonly used in data centers. The cloud services 116 generally refer to a network accessible platform implemented as a computing infrastructure of processors, storage, software, data access, and so forth that is maintained and accessible via a network 112 such as the Internet. Cloud services 116 do not require end-user knowledge of the physical location and configuration of the system that delivers the services. Common expressions associated with cloud services include "on-demand computing", "Software as a Service (SaaS)", "platform computing", "network accessible platform", and so forth.

[0031] The cloud services 116 may host any number of applications that can process the user input received from the VC speaker system 104 and produce a suitable response. Examples of typical applications might include web browsing, online shopping, banking, email, work tools, productivity, entertainment, educational, and so forth. In Fig. 1A, user 106 is shown communicating with the remote entities 110 via VC speaker system 104. When activated, a VC speaker system 104 Voice Assist module outputs an audible question such as, "What do you want to do?" as represented by dialog bubble 120. This output may represent a question from a far end talker 114, or from a cloud service 116 (e.g., an entertainment service). The user 106 is shown replying to the question by stating, "I'd like to buy tickets to a movie" as represented by the dialog bubble 122.

[0032] The VC speaker system 104 (or voice-controlled assistant 104) is equipped with an array 124 of microphones 126(1) . . . 126(M) to receive the voice input from the user 106 as well as any other audio sounds in the environment 102. The microphones 126(1) - (M) are generally arranged at a second or top end of the VC speaker system 104 opposite the base end seated on the table 108. Although multiple microphones are illustrated, in some implementations, the VC speaker system 104 may be embodied with only one microphone. The VC speaker system 104 may further include a speaker array 128 of speakers 130(1) . . . 130(P) to output sounds in humanly perceptible frequency ranges. The speakers 130(1) - (P) may be configured to emit sounds at various frequency ranges, so that each speaker has a different range. In this manner, the VC speaker system 104 may output high frequency signals, mid frequency signals, and low frequency signals. The speakers 130(1) - (P) are generally arranged at the first or base end of the VC speaker system 104 and are oriented to emit the sound in a downward direction toward the base end and in an outward direction generally opposite to or away from the microphone array 124 in the top end.

[0033] The voice-controlled assistant or VC speaker system 104 may further include computing components 132 that process the voice input received by the microphone array 124, enable communication with the remote entities 110 over the network 112, and generate the audio to be output by the speaker array 128. The computing components 132 are generally positioned between the microphone array 124 and the speaker array 128, although essentially any other arrangement may be used. In the Fig. 1A architecture 100, the VC speaker system 104 may be configured to produce stereo or non-stereo output. The speakers 130(1) - (P) may receive a mono signal for output in a non-stereo configuration. Alternatively, the computing components 132 may generate as an output to the speakers 130(1) - (P) two different channel signals for stereo output. In this stereo configuration, a first channel signal (e.g., left channel signal) is provided to one of the speakers, such as the larger speaker 130(1). A second channel signal (e.g., right channel signal) is provided to the other of the speakers, such as the smaller speaker 130(P). Due to the vertically stacked arrangement of the speakers, however, the two-channel stereo output may not be appreciated by the user 106.

[0034] Fig. 1B illustrates at 200 another implementation of voice interactive computing architecture that is similar to the architecture 100 of Fig. 1A, but in this illustration a voice-controlled assistant or VC speaker system 204 has a different physical packaging layout. In this embodiment, speaker system 204 has a laterally spaced arrangement of the speakers to better provide stereo output, rather than the vertically stacked arrangement found in the system 104 of Fig. 1A. More particularly, the speakers 130(1) - (P) are shown at a horizontally spaced distance from one another. Optionally, VC speaker system 204 is able to play full spectrum stereo using only two speakers of different sizes.

[0035] In Fig. 1B, the VC speaker system 204 is communicatively coupled over the network 112 to an entertainment service 206 that is part of the cloud services 116. The entertainment service 206 is hosted on one or more servers, such as servers 208(1) . . . 208(K), which may be arranged in any number of configurations, such as server farms, stacks, and the like that are commonly used in data centers. The entertainment service 206 may be configured to stream or otherwise download entertainment content, such as movies, music, audio books, and the like to the voice-controlled assistant. When audio content is involved, the voice-controlled assistant 204 can play the audio in stereo with full spectrum sound quality, even though the device has a small form factor and only two speakers. In this scenario, the user 106 is shown using the audible statement, "Pause the music" (in dialog bubble 210) to direct the VC speaker system 204 to pause the music being played. To support this scenario, the VC speaker system 204 is not only designed to play music in full spectrum stereo, but is also configured with an acoustic echo cancellation (AEC) module to cancel audio components being received at the microphone array 124 so that the VC speaker system 204 can clearly hear the statements and commands spoken by the user 106.
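The prior art references do not spell out here how such an AEC module operates. As general background only, one widely used approach is a normalized least-mean-squares (NLMS) adaptive filter, which learns the echo path from the playback signal and subtracts the predicted echo from the microphone signal. The following minimal sketch assumes that technique, an eight-tap filter, a step size of 0.5 and a made-up three-tap echo path; none of these choices is taken from the cited patents.

```python
import random


def nlms_echo_cancel(reference, microphone, num_taps=8, step=0.5, eps=1e-8):
    """Adaptively subtract the echo of `reference` from `microphone` (NLMS)."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference samples, newest first
    residual = []
    for ref, mic in zip(reference, microphone):
        history = [ref] + history[:-1]
        predicted_echo = sum(w * x for w, x in zip(weights, history))
        error = mic - predicted_echo  # what remains after cancellation
        power = sum(x * x for x in history) + eps
        weights = [w + step * error * x / power
                   for w, x in zip(weights, history)]
        residual.append(error)
    return residual, weights


# Simulated playback and its echo at the microphone (no near-end speech).
rng = random.Random(2)
playback = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, 0.3, 0.1]  # illustrative values only
echo = [
    sum(echo_path[k] * playback[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(playback))
]

residual, weights = nlms_echo_cancel(playback, echo)
tail_echo_energy = sum(e * e for e in echo[-200:])
tail_residual_energy = sum(e * e for e in residual[-200:])
```

In this noise-free simulation the residual energy falls far below the echo energy once the filter has converged. A practical canceller would also pause adaptation during double talk, which is the role of the double talk reduction module described below.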

[0036] Fig. 1C shows selected functional components of the voice-controlled assistant or VC speaker systems 104 and 204 in more detail. Generally, each of the VC speaker systems 104 and 204 may be implemented as a standalone device that is relatively simple in terms of functional capabilities with limited input/output components, memory, and processing capabilities. For instance, the VC speaker systems 104 and 204 do not have a keyboard, keypad, or other form of mechanical input. Nor do they have a display or touch screen to facilitate visual presentation and user touch input. Instead, the assistants 104 and 204 may be implemented with the ability to receive and transmit audio signals, a network interface (wireless or wire-based), power input, and limited processing/memory capabilities.

[0037] In the implementations shown in Figs 1A-1C, each VC speaker system 104/204 includes the microphone array 124, a speaker array 128, a processor 302, and memory 304. The microphone array 124 is used to capture speech input from the user 106, or other sounds in the environment 102. The speaker array 128 is used to output speech from a far end talker, audible responses provided by the cloud services, forms of entertainment (e.g., music, audible books, etc.), or any other form of sound. The speaker array 128 produces a wide range of output audio frequencies including both human perceptible and non-perceptible frequencies. In one implementation, the speaker array 128 is formed of two speakers capable of outputting full spectrum stereo sound, as will be described below in more detail. Two speaker array arrangements are shown, including the vertically stacked arrangement 128A and the horizontally spaced arrangement 128B.

[0038] The memory 304 (Fig. 1C) may include computer-readable storage media ("CRSM"), which may be any available physical media accessible by the processor 302 to execute instructions stored on the memory. In one basic implementation, CRSM may include random access memory ("RAM") and Flash memory. In other implementations, CRSM may include, but is not limited to, read-only memory ("ROM"), electrically erasable programmable read-only memory ("EEPROM"), or any other medium which can be used to store the desired information and which can be accessed by the processor 302. Several modules such as instructions, datastores, and so forth may be stored within the memory 304 and configured to execute on the processor 302. An operating system module 306 is configured to manage hardware and services (e.g., wireless unit, USB, Codec) within and coupled to the assistant 104/204 for the benefit of other modules. Several other modules may be provided to process verbal input from the user 106. For instance, a speech recognition module 308 provides some level of speech recognition functionality. In some implementations, this functionality may be limited to specific commands that perform fundamental tasks like waking up the device, configuring the device, and the like. The amount of speech recognition capability included on the VC speaker system 104/204 is an implementation detail, but the architecture described herein can support having some speech recognition at the local VC speaker system 104/204 together with more expansive speech recognition at the cloud service 116.

[0039] An acoustic echo cancellation module 310 and a double talk reduction module 312 are provided to process the audio signals to substantially cancel acoustic echoes and substantially reduce double talk that might occur. These modules may work together to identify times where echoes are present, where double talk is likely, or where background noise is present, and attempt to reduce these external factors to isolate and focus on the “near talker” (i.e., user 106). By isolating on the near talker, better signal quality is provided to the speech recognition module 308 to enable more accurate interpretation of the speech utterances. A query formation module 314 may also be provided to receive the parsed speech content output by the speech recognition module 308 and to form a search query or some form of request. This query formation module 314 may utilize natural language processing (NLP) tools as well as various language modules to enable accurate construction of queries based on the user's speech input.

[0040] The modules shown stored in the memory 304 are merely representative. Other modules 316 for processing the user voice input, interpreting that input, and/or performing functions based on that input may be provided. The voice controlled assistant 104/204 might further include a codec 318 coupled to the microphones of the microphone array 124 and the speakers of the speaker array 128 to encode and/or decode the audio signals. The codec 318 may convert audio data between analog and digital formats. In this case, a user interacts with the assistant 104/204 by speaking to it, and the microphone array 124 receives the user speech. The codec 318 encodes the user speech and transfers that audio data to other components. The assistant 104/204 can communicate back to the user by emitting audible statements passed through the codec 318 and output through the speaker array 128. In this manner, the user interacts with the voice-controlled assistant simply through speech, without use of a keyboard or display common to other types of devices.

[0041] The VC speaker system or voice controlled assistant 104/204 includes a wireless unit 320 coupled to an antenna 322 to facilitate a wireless connection to network 112. The wireless unit 320 may implement one or more of various wireless technologies, such as Wi-Fi, Bluetooth, RF, and so on. A USB port 324 may further be provided as part of the assistant 104/204 to facilitate a wired connection to a network, or a plug-in network device that communicates with other wireless networks. In addition to the USB port 324, or as an alternative thereto, other forms of wired connections may be employed, such as a broadband connection. A power unit 326 is further provided to distribute power to the various components on the assistant 104/204. A stereo component 328 is optionally provided to output stereo signals to the various speakers in the speaker array 128.

The“Smart Puck” VC Speaker System and Method of the Present Invention

[0042] Turning now to the system of the present invention as illustrated in Figs. 2 and 3, the VC speaker system 400 incorporates a Voice Controlled (“VC”) Smart Puck synchronized module 404 that is synchronized with and responsive to a separate high performance loudspeaker system 604 (which may be referred to as a “host” loudspeaker system) such that the Smart Puck 404 and the separate host loudspeaker system 604 are linked. Separate high performance host loudspeaker system 604 may incorporate a selected set of the applicant’s commonly owned loudspeaker performance improving developments, including those adapted for use with users’ Wi-Fi systems and incorporating DSP elements programmable to achieve specific sonic goals for specific audio program playback applications. US Patents 9,277,044; 9,374,640; 9,584,935; 9,706,320; 9,767,786 and 9,807,484 provide useful context and background for this aspect of the present invention and are hereby incorporated herein in their entireties by reference.

[0044] As illustrated in Figs. 2 and 3 of the accompanying drawings, the VC speaker system 400 of the present invention incorporates a Voice Controlled (“VC”) Smart Puck synchronized module 404 that is preferably configured as a compact puck-shaped product having a housing with a base 406 that rests on a support such as a table 408 and a top 410 which carries and aims an array 424 of multiple (e.g., eight) microphones. Smart Puck VC module 404 preferably is not limited to any particular industrial design and is synchronized with and connected to a separate high performance loudspeaker system 604 (which may be referred to as a “host” loudspeaker system) such that the Smart Puck 404 and the separate host loudspeaker system 604 are linked in several particular ways that permit superior responsiveness to voice commands 122 when compared to a conventional voice-controlled puck and its incorporated loudspeaker system. The synchronized voice control module 404 of system 400 includes a digital processor 432 including components such as a controller or computing module 434 having a pre-programmed DSP system to provide optimized sound quality on the user’s host bespoke or high-performance loudspeaker product 604 during audio program playback. The DSP module also provides a subjectively pleasant-sounding Voice Assistance response as perceived by a user when used, for example, with VC devices such as an Amazon Alexa or a Google voice device. This result is obtained regardless of the audio settings that the user may have selected on the user’s bespoke or high-performance loudspeaker product even when those settings might otherwise inhibit VA intelligibility.

[0045] The system and the method of operating the present invention enable the user of a separate loudspeaker system 604 (e.g., optionally an existing bespoke or high-performance audio or home theater system which is configured or reconfigurable for synchronization with Smart Puck VC module 404) incorporating, for example, a high performance soundbar to add VC functionality to that high-performance loudspeaker system without replacing it. Instead, the invention adds a synchronized module which is compatible with, for example, the Amazon Echo™ architecture and which may be configured as the puck-like Wi-Fi enabled device 404 with a host of capabilities prompted via voice control. The Smart Puck synchronized VC module 404 is also configured to incorporate control modules that optimize its performance with the separate host loudspeaker system 604 not only to account for the characteristic acoustic environment, but also to recognize when and how the separate host loudspeaker system 604 is operating and to alter its performance characteristics in order to optimize overall VC and loudspeaker system performance.

[0046] More particularly, in the illustrated embodiment of the invention, the VC speaker system illustrated in Fig. 2 at 400 has at least one Smart Puck synchronized VC module 404. Fig. 2 also illustrates the Smart Puck synchronized VC module 404 in an enlarged diagrammatic view at 404 as incorporating a controller or computing module 432 containing processor components further illustrated in detail in Fig. 3. Computing module 432 incorporates a pre-programmed DSP system 434 (as illustrated in Figs 2-3) which, as is known in the art, provides optimized sound quality during audio program playback as well as subjectively pleasant sounding Voice Assistance response (as indicated at 120) as perceived by user 106 when Smart Puck synchronized VC module 404 is connected to the user’s high performance loudspeaker product 604 and when used with, for example, an Amazon Alexa or a Google voice assist, regardless of the audio settings that the user may have selected and which might otherwise inhibit VA intelligibility.

[0047] The system 400, smart puck module 404, and the method of the present invention allow the user 106 of the existing high-performance audio or home theater loudspeaker system 604, which may, for example, be a conventional soundbar SB, may include a separate subwoofer SW, and may include conventional user controls 608, to add VC functionality without requiring the user to replace an entire separate high-performance loudspeaker system (e.g., a separate high performance soundbar-subwoofer system 604). This is accomplished through the addition of the synchronized VC module 404 which is compatible with, for example, the Amazon Echo™ architecture and is connectable to the remote (i.e., separate) host loudspeaker system 604 either by a direct wired connection or wirelessly (e.g., by a Bluetooth connection via a TX/RX module retrofitted (or included) in separate host loudspeaker system 604 as indicated at 606). The synchronized module 404 may be configured as a puck-shaped cylindrical Wi-Fi enabled device with a large number of capabilities prompted by voice control. The synchronized module 404 is also configured for optimized performance with the host loudspeaker system 604 by incorporating components which not only account for the characteristic acoustic environment 102, but also recognize and account for when and how the host system 604 is operating. By doing this the performance characteristics of the Smart Puck synchronized VC module 404 are altered to optimize overall VC loudspeaker system performance. Furthermore, the audio performance of the VC speaker system 400 is enhanced by incorporation of pre-programmed features in synchronized module 404, as will be described.

[0048] When compared to a conventional voice-control or voice assistant module or smart puck, the system 400 and synchronized VC module 404 of the present invention provide several important advantages. In accordance with conventional methods for measuring in-room acoustic transfer functions, which commonly rely on sine-sweep or pink noise stimuli, the Smart Puck synchronized VC module 404 may optionally be initiated to sense, record and so capture signal(s) revealing the complex Transfer Function (“TF”) between its host loudspeaker 604 and the puck’s microphone array. The captured TFs, most efficiently expressed as Finite Impulse Response (FIR) filters, may be utilized to improve recognition of a user’s voice queries 122. That the puck 404 and separate host loudspeaker system 604 are synchronized (indeed, requests for program material should be expected to pass through the puck 404) means that the puck will be able to monitor the program at any given time. Through the use of acquisition routines involving noise stimuli emitted by the loudspeaker 604 and captured by the puck’s microphone array, transfer functions that also reflect room effects are derived over a range of DSP settings. Then, the inverse of the transfer function that best matches the current state of the separate host loudspeaker’s DSP settings (e.g., master volume = 70%, bass = +2, Voice Adjust = +1, Movie Mode, etc.) is recalled from the puck’s memory and imposed on the signals acquired by the microphone array. This DSP programmed method or procedure “subtracts” the host loudspeaker’s contribution from the signals acquired by the microphone array, thereby greatly improving the “signal” (voice commands/queries) to “noise” (loudspeaker output) ratio and permitting greatly improved responsiveness to voice commands. This method provides an enhanced method for Acoustic or Automatic Echo Cancellation (“AEC”).
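The acquisition and “subtraction” steps described above can be illustrated with a minimal Python sketch. It assumes an idealized, noise-free measurement and uses conventional reference subtraction (filter the known program signal through the estimated FIR, then subtract it from the microphone signal), which is a common form of echo cancellation and not necessarily the exact inverse-filter procedure of the invention. The function names, the four-tap room response, and the signal lengths are hypothetical.

```python
import random

def convolve(signal, fir):
    """Filter `signal` with FIR taps `fir`, truncated to len(signal) samples."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(fir):
            if k > n:
                break
            acc += tap * signal[n - k]
        out.append(acc)
    return out

def estimate_fir(stimulus, recorded, num_taps):
    """Recover FIR taps by time-domain deconvolution.

    Assumption: noise-free recording and stimulus[0] != 0. A practical
    system would use averaging or an adaptive filter instead.
    """
    if stimulus[0] == 0:
        raise ValueError("stimulus must start with a non-zero sample")
    taps = []
    for n in range(num_taps):
        acc = recorded[n]
        for k in range(1, n + 1):
            acc -= stimulus[k] * taps[n - k]
        taps.append(acc / stimulus[0])
    return taps

def remove_echo(mic, program, fir):
    """Subtract the predicted loudspeaker contribution from the mic signal."""
    echo = convolve(program, fir)
    return [m - e for m, e in zip(mic, echo)]

rng = random.Random(0)
room = [0.6, 0.3, -0.1, 0.05]  # hypothetical loudspeaker-to-microphone response

# Acquisition routine: play a noise stimulus and record it at the microphone.
stimulus = [1.0] + [rng.uniform(-0.5, 0.5) for _ in range(63)]
recorded = convolve(stimulus, room)
fir = estimate_fir(stimulus, recorded, num_taps=len(room))

# Playback: the microphone hears the voice plus the program filtered by the room.
program = [rng.uniform(-1.0, 1.0) for _ in range(64)]
voice = [0.1 * rng.uniform(-1.0, 1.0) for _ in range(64)]
mic = [v + e for v, e in zip(voice, convolve(program, room))]

cleaned = remove_echo(mic, program, fir)
residual = max(abs(c - v) for c, v in zip(cleaned, voice))
```

In this idealized case `residual` is at the level of floating-point rounding, because the puck knows the program signal exactly; real rooms add noise and time variation, so the improvement in practice is bounded by the accuracy of the stored FIR.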

[0049] Additionally, as a sophisticated listening device, Smart Puck synchronized VC module 404 may optionally be used to monitor ambient environmental noise apart from program material, and by virtue of the methods and system elements described and illustrated in Polk’s commonly owned Patent No. 9,767,786 (to Starobin, Lyons, et al, the entire disclosure of which is hereby incorporated herein by reference), a “micro quiet zone” centered about Smart Puck synchronized VC module 404 can be established for purposes of further improving the signal to noise ratio associated with voice commands. Another aspect of the system 400 and method of the present invention pertains to the use of the smart puck 404 for capturing the loudspeaker system’s acoustic frequency response at a user’s primary listening location in conjunction with “room-smoothing” algorithms such as commercially available ones (e.g., Audyssey™) or Polk Audio’s proprietary one as described in commonly owned Patent No. 8,194,874 to Starobin, Lyon, et al, the entire disclosure of which is hereby incorporated herein by reference.

[0050] A final advantageous aspect of system 400 and method of the present invention further exploits the puck’s utility as a microphone array capable of integrating with a host soundbar-subwoofer system for the purpose of determining the location of the soundbar and subwoofer relative to a customer’s primary listening position. Once the subwoofer’s location is known relative to the soundbar, certain DSP settings may be modified in order to optimize system performance.

[0051] Figs. 2 and 3 illustrate selected functional components of the Smart Puck synchronized VC module 404 which are utilized to carry out the method of the present invention. These components are preferably implemented as a standalone device that is relatively simple in terms of functional capabilities, requiring only limited input/output components, memory, and processing capabilities with the ability to receive and output audio, and including a network interface (wireless or wire-based), power, and limited processing/memory capabilities. In the illustrated implementation, each synchronized module 404 includes the microphone array 424 and a computing/communications/audio processor 432.

[0052] As best seen in Fig. 3, processor 432 includes a microprocessor or controller 800 and a memory 802. The microphone array 424 is used to sense and capture speech input 122 from the user 106 or other sounds in the environment 102. The synchronized module 404 uses separate host loudspeaker system 604 to output speech (e.g., 120) from a far end talker, audible responses provided by the cloud services, forms of entertainment (e.g., music, audible books, etc.), or any other form of sound. The separate host loudspeaker system 604 may output a wide range of audio frequencies and in one implementation, the host speaker 604 in the illustrated example comprises a soundbar SB and subwoofer SW (Fig. 2) which may be connected to module 404 by way of the wired or wireless connection 606.

[0053] The memory component 802 of processor 432 may include computer-readable storage media ("CRSM"), which may be any available physical media accessible by the microprocessor 800 to execute instructions stored in the memory. In one basic implementation, the CRSM may include both random access memory ("RAM") and Flash memory. In other implementations, the CRSM may include, but is not limited to, read-only memory ("ROM"), electrically erasable programmable read-only memory ("EEPROM"), or any other medium which can be used to store the desired information, and which can be accessed by the microprocessor 800. Several programming modules such as instructions, datastores, and so forth may be stored within the memory 802 and configured to execute on the microprocessor 800.

[0054] The processor 432 incorporates an operating system 804 configured and programmed to operate the Digital Signal Processing (“DSP”) engine 434 and to manage hardware and services (e.g., wireless unit 806, USB unit 808, Codec 810) within synchronized module 404 for the benefit of other control elements 812-832 in the processor. The control elements and their interconnections as shown in the diagram of Fig. 3 are merely representative; other elements or sub-circuits for processing the user voice input, interpreting that input, and/or performing functions based on that input may be provided. The codec unit 810 illustrated as part of the synchronized module 404 converts audio data between analog and digital formats. It is coupled to the microphones of the microphone array 424 and to the processor to encode the received audio signals, and similarly is connected via the communications module 822 to the host loudspeaker 604 to decode the audio signals to be broadcast.

[0055] A user 106 may interact with the Smart Puck synchronized VC module 404 by speaking to it: the microphone array 424 receives the user speech 122, and the codec unit 810 encodes the user speech and transfers that audio data to other components via the operating system for use in carrying out commands or responding to queries, for example, in known manner. The Smart Puck synchronized VC module 404 can then communicate back to the user by way of the audio output module 816 and the communication module 822, which produce signals that pass through the codec and are delivered to the host loudspeaker system 604, either by a direct wired connection or wirelessly (e.g., via the Bluetooth connection indicated at 606), to be output, or broadcast, as audible statements 120 through the host speaker 604. In this manner, the user interacts with the VC loudspeaker system 400 and synchronized module 404 simply through speech 122, without use of the keyboard or display that is common in other types of devices.

[0056] Other elements, such as the unwanted sound ID module 828, the cancellation signal ID module 830, and the cancellation signal generating module 832 cooperate with the microprocessor 800 and the DSP component 434 to cancel extraneous environmental sounds that would otherwise degrade the response of the system to voice commands or queries. The onboard sensor element 818 and the sensor manager module 820 control various sensor components that may be incorporated in the module 404.

[0057] The wireless unit 806 of the VC speaker system’s synchronized module or smart puck 404 is coupled to an antenna 840 to facilitate a wireless connection to the network 112. The wireless unit 806 may implement one or more of various wireless technologies, such as Wi-Fi, Bluetooth, RF, and so on. The USB port 808 may be provided as part of the synchronized module 404 to facilitate a wired connection to a network, or a plug-in network device that communicates with other wireless networks. In addition to the USB port, or as an alternative thereto, other forms of wired connections may be employed, such as a broadband connection. The power supply module, or unit, 812 is provided to distribute power to the various components in synchronized module 404. A stereo component may optionally be provided in the communication module 822 to output stereo signals to the host speaker 604 or other speakers.

[0058] The VC speaker system 400 and synchronized module 404 are designed to support audio interactions with the user 106, in the form of receiving voice commands (e.g., 122, words, phrases, sentences, etc.) from the user and outputting audible feedback to the user. In one implementation, the synchronized module 404 may include non-input control mechanisms, such as basic volume control button(s) for increasing/decreasing volume, as well as power and reset buttons as a part of the user interface module 814. There may also be a simple light element (e.g., LED) to indicate a state such as, for example, when power is on. But otherwise, synchronized module 404 does not need any input devices or displays to perform its functions.

[0059] The method of the present invention further exploits the utility of Smart Puck synchronized VC module 404 as a microphone array capable of integrating with a separate host soundbar-subwoofer system 604 by providing suitable processor components such as the location ID module 826 for the purpose of determining the location of the soundbar and subwoofer loudspeaker system 604 relative to a customer’s primary listening position. As noted above, the processor 432 incorporates digital processor components such as a controller or computing module with a pre-programmed DSP system 434 to provide optimized sound quality on the user’s bespoke or high-performance loudspeaker product, or host loudspeaker, during audio program playback as well as subjectively pleasant sounding Voice Assistance response as perceived by a user when used, for example, with VC devices such as an Amazon Alexa or a Google voice device. This result is obtained regardless of the audio settings that the user may have selected via the interface module 814 or via controls on the user’s bespoke or high-performance loudspeaker product even when those settings might otherwise inhibit VA intelligibility. Once the subwoofer’s location is known relative to the soundbar, certain DSP settings may be modified, as discussed above, in order to optimize system performance.

[0060] In a method for measuring in-room acoustic transfer functions (which may rely on sine-sweep or pink noise stimuli), the Smart Puck’s Cancellation signal generation module 832 is initiated to generate signal(s) which can then be sensed and captured to determine the complex Transfer Function (“TF”) signal(s) between separate host loudspeaker system 604 and the puck’s microphone array. The captured TFs, most efficiently expressed as Finite Impulse Response (FIR) filters, may be utilized to improve recognition of voice queries. Since the puck and separate host loudspeaker system 604 are synchronized, the puck will be able to monitor the program at any given time. Through the use of acquisition routines involving noise stimuli emitted by separate host loudspeaker system 604 and captured by the puck’s microphone array, the unwanted sound ID module 828, Cancellation signal ID module 830 and Cancellation signal generation module 832 are used in this method to derive transfer functions that also reflect room effects over a range of DSP settings. Then, the inverse of the transfer function that best matches the current state of the separate host loudspeaker system’s DSP settings (e.g., master volume = 70%, bass = +2, Voice Adjust = +1, Movie Mode, etc.) is recalled from puck memory unit 802 and imposed on the signals acquired by the microphone array 424. This procedure “subtracts” the loudspeaker’s contribution from the signals acquired by the microphone array, thereby greatly improving the “signal” (voice commands/queries) to “noise” (loudspeaker output) ratio and permitting greatly improved responsiveness to voice commands.
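The recall of the stored transfer function that “best matches” the host loudspeaker’s current DSP settings can be pictured as a nearest-match lookup. The following Python sketch is only illustrative: the settings keys, the distance weights, and the stored FIR coefficients are all hypothetical and are not taken from the disclosure.

```python
def settings_distance(a, b):
    """Hypothetical mismatch score between two DSP settings snapshots."""
    score = 0.0 if a["mode"] == b["mode"] else 1.0
    score += abs(a["volume"] - b["volume"]) / 100.0          # volume in percent
    score += abs(a["bass"] - b["bass"]) / 10.0               # bass in steps
    score += abs(a["voice_adjust"] - b["voice_adjust"]) / 10.0
    return score

def recall_fir(stored, current):
    """Return the FIR taps whose recorded settings are closest to `current`."""
    best = min(stored, key=lambda entry: settings_distance(entry["settings"], current))
    return best["fir"]

# Family of TFs acquired during set-up, one per settings permutation (made-up values).
stored_tfs = [
    {"settings": {"volume": 70, "bass": 2, "voice_adjust": 1, "mode": "Movie"},
     "fir": [0.60, 0.30, -0.10]},
    {"settings": {"volume": 40, "bass": 0, "voice_adjust": 0, "mode": "Music"},
     "fir": [0.35, 0.15, -0.05]},
]

current_settings = {"volume": 65, "bass": 2, "voice_adjust": 1, "mode": "Movie"}
selected_fir = recall_fir(stored_tfs, current_settings)
```

Here the Movie-mode entry is selected because only its volume differs slightly from the current state. How finely the settings space must be sampled is a design choice that depends on how strongly each setting changes the measured response.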

[0061] Additionally, since the Smart Puck synchronized VC module 404 functions as a sophisticated listening device, the puck can monitor ambient environmental noise apart from the desired program material. By virtue of the methods and system elements described and illustrated in commonly owned Patent No. 9,767,786 to Starobin, Lyons, et al, a “micro quiet zone” centered about the puck 404 can be established for purposes of further improving the signal to noise ratio associated with voice commands. Another aspect of the system 400 and the method of the present invention enables the smart puck 404 to capture the loudspeaker system’s acoustic frequency response at the primary listening location in conjunction with “room-smoothing” algorithms such as commercially available ones (e.g., Audyssey™) or Polk Audio’s proprietary one as described in commonly owned Patent No. 8,194,874 to Starobin, Lyon, et al.

[0062] More than one smart puck 404 may be used in a system 400 with more than one host speaker system 604 and once plugged in, each synchronized module 404 may automatically self-configure, or may be configured with a slight amount of assistance by the user and be ready to use. As a result, the VC speaker system 400 with synchronized module 404 is very convenient to use and may be installed and set up for use at a low cost. When more than one Smart Puck synchronized VC module 404 is employed within a system, the Puck that most clearly“hears” a voice command 122 shall control the associated separate host loudspeaker system 604; any other Puck(s) shall cede control to that which hears the particular command best.
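One way to picture the hand-off between multiple pucks is a simple election on a per-command quality score. The Python sketch below assumes each puck reports the RMS level of the detected speech and of its noise floor, and that “hears best” is judged by signal-to-noise ratio; the metric, the report format, and the puck names are assumptions made for illustration.

```python
import math

def snr_db(speech_rms, noise_rms):
    """Signal-to-noise ratio in decibels from two RMS amplitudes."""
    return 20.0 * math.log10(speech_rms / noise_rms)

def elect_controller(reports):
    """Pick the puck with the highest SNR; the others cede control.

    `reports` maps a puck identifier to (speech_rms, noise_rms).
    """
    return max(reports, key=lambda puck_id: snr_db(*reports[puck_id]))

# Hypothetical reports for one voice command heard by two pucks.
reports = {
    "living_room": (0.20, 0.02),
    "kitchen": (0.05, 0.02),
}
controller = elect_controller(reports)
```

With these made-up levels the living-room puck wins the election and would control its associated host loudspeaker system for that command.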

[0063] There are five major aspects to the system 400 and to the method of the present invention, as will be described. First, the system incorporates Smart Puck synchronized VC module 404 that is preferably configured as a compact puck-shaped product having a housing with a base that rests on a support, the housing carrying and orienting an array 424 of multiple (e.g., eight) microphones; Smart Puck synchronized VC module 404 is configured and programmed to employ beamforming and aiming capabilities when sensing with microphone array 424. Synchronized VC module 404 preferably is not limited to any particular industrial design and is synchronized with a host loudspeaker system such that the smart puck 404 and the host loudspeaker 604 are linked in several particular ways that permit superior responsiveness to voice commands 122 relative to a conventional voice-controlled puck linked to a loudspeaker system.

[0064] In use, and as part of the method of the invention, as long as audio content reproduced by the host loudspeakers 604 is selected via connection to the puck 404, primarily by direct voice command 122, then it is possible to electro-acoustically extract from the real-time output of the microphone array 424 those signals which are associated with the audio output from the loudspeakers 604, as modified by room effects, such as resonances, reflections, diffractions, etc. A procedure that establishes the relationship, or transfer function, between the loudspeakers’ audio output and the microphone array’s received input signal(s) is conducted for purposes of acquiring the complex Transfer Function (“TF”) between the host loudspeaker and the puck’s microphone array signal(s). The captured TFs, most efficiently expressed as Finite Impulse Response (FIR) filters, are then appropriately imposed, in inverse fashion, on the known audio signal reproduced by the loudspeaker system as a means of extracting the loudspeaker audio signals from the open microphone array’s signals. Furthermore, any loudspeaker system settings (such as master volume, Bass, Voice Adjust™, etc.) and the various program modes (Movie, Music, Sports, etc.), to the extent that they have been determined to significantly affect the puck’s responsiveness to voice commands, the best measure of which is signal to noise ratio, are also taken into account by relaying these settings to the puck and applying the appropriate inverse transfer function (as expressed by the previously computed FIR filter and its unique sets of coefficients). That is, a host or family of TFs is acquired to reflect the range of permutations of possible audio settings to the extent that the VC performance is significantly improved by taking them into account. The transfer function acquisition procedure is semi-automated or may be completely automated, allowing the user to set program modes and to establish other settings by voice command (with affirmation by user 106).

[0065] A second major aspect of the invention, which pertains only to host soundbar-subwoofer systems (e.g., 604), is use of the puck 404 as a means for locating the subwoofer relative to the soundbar SB and communicating to the system’s DSP alternative settings for optimal integration between the soundbar SB and subwoofer SW. When a subwoofer’s location is based mainly on aesthetics and convenience within room 102, audio performance inevitably suffers. Ideally, the soundbar and subwoofer are placed relatively close together, approximately equidistant from most listening positions. Failing that, certain DSP settings may be altered for purposes of compensating for the issues associated with non-optimal subwoofer positions. In particular, the delay imposed on audio signals sent to the soundbar may be adjusted so as to synchronize the time of arrival of incident sounds from both soundbar and subwoofer at the listening location. So long as audio does not lag video by more than 30 ms, no “lip synch” issues should be expected. This constraint does imply certain limits on the placement location of the subwoofer relative to the soundbar for which optimization is possible without incurring lip synch issues.
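The delay adjustment and the lip-synch limit lend themselves to a short worked calculation. This Python sketch assumes a speed of sound of 343 m/s and that the distances from the listening position to the soundbar and subwoofer are already known; the function names and the `existing_lag_ms` parameter are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature
MAX_AUDIO_LAG_MS = 30.0     # lip-synch bound stated in the description

def soundbar_delay_ms(soundbar_dist_m, subwoofer_dist_m):
    """Delay to add to the soundbar so both arrivals coincide at the listener.

    If the subwoofer is the nearer source, no soundbar delay is applied
    (a negative delay is not realizable), so the result is clamped at zero.
    """
    extra_path_m = subwoofer_dist_m - soundbar_dist_m
    return max(0.0, extra_path_m / SPEED_OF_SOUND_M_S * 1000.0)

def within_lip_sync(delay_ms, existing_lag_ms=0.0):
    """True if the total audio lag stays inside the 30 ms bound."""
    return existing_lag_ms + delay_ms <= MAX_AUDIO_LAG_MS

# Subwoofer 1.715 m farther from the listener than the soundbar.
delay = soundbar_delay_ms(soundbar_dist_m=3.0, subwoofer_dist_m=4.715)
ok = within_lip_sync(delay)
```

For this geometry the required soundbar delay is about 5 ms, well inside the bound. Under the same assumptions, 30 ms of delay corresponds to roughly 10.3 m of extra subwoofer path, which is the kind of placement limit the paragraph alludes to.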

[0066] A third aspect of the method of the present invention concerns use of the puck 404 for purposes of acquiring the loudspeaker system’s in-room acoustic frequency response in accordance with available “room-smoothing” algorithms. Audyssey™, available in several mass-marketed brand name AVRs such as Denon™, Marantz™, Onkyo™ and others, attempts to improve a loudspeaker system’s low-frequency performance by imposing inverse magnitude shaping (equalization) in accordance with the acquired in-room, time-averaged acoustic response. An alternative room-smoothing technique, as described by patent # 8,194,874, imposes time delayed “correction signals” as a means of addressing troublesome room modes. These “room-smoothing” techniques involve placing Smart Puck synchronized VC module 404 within the primary listening area during the set-up sequence, which involves initiating pink-noise, swept-sine or other stimuli so as to capture the loudspeaker-to-puck transfer function, which necessarily includes acoustic room artifacts such as resonances and reflections. That said, “room-smoothing” techniques are most effective at addressing low-order room modes, or low-frequency resonances.
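Inverse magnitude shaping can be sketched as computing, per frequency band, the gain that would bring the measured in-room level back to a target. The Python example below is a generic illustration and not the algorithm of any named product; the band values, the 0 dB target, and the boost and cut limits are assumptions (limits of this kind are commonly used because deep nulls cannot be filled by boosting).

```python
def inverse_eq_gains_db(measured_db, target_db=0.0, max_boost_db=6.0, max_cut_db=12.0):
    """Per-band correction gains: the inverse of the measured deviation, clamped."""
    gains = []
    for level in measured_db:
        gain = target_db - level
        gain = min(gain, max_boost_db)   # limit boost into room nulls
        gain = max(gain, -max_cut_db)    # limit cut of room peaks
        gains.append(gain)
    return gains

# Hypothetical time-averaged response at the puck, in dB relative to target,
# for three low-frequency bands.
band_centers_hz = [40, 63, 100]
measured_db = [8.0, -10.0, 3.0]
gains_db = inverse_eq_gains_db(measured_db)
```

With these made-up measurements the 40 Hz peak is cut by 8 dB, the 100 Hz band is cut by 3 dB, and the 63 Hz dip receives only the maximum 6 dB of boost rather than the full 10 dB.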

[0067] A fourth aspect of the present invention pertains to use of the host loudspeaker 604 as secondary (sound-cancelling) source(s) for creating a “micro quiet zone” in accordance with commonly owned US patent #9,767,786 (the entire disclosure of which is hereby incorporated herein by reference). In this usage, the host loudspeaker system 604 further improves the signal to noise ratio for the smart puck microphone array 424 in the presence of external noise, in addition to performing its primary audio broadcasting duty. Its combined output when fulfilling this noise-reduction function will generally reflect the phase-inverted noise detected by the puck’s microphone array 424 but transformed to account for both its spatial displacement relative to the puck and room effects. A separate host loudspeaker system 604 with an elongated soundbar comprising many aligned loudspeaker drivers is especially suitable here, for its beam forming capabilities. Loudspeaker system 604 may be configured as a beam-forming array targeting the area immediately around smart puck 404 as a micro-quiet zone. Soundbars composed of many transducers especially lend themselves to this application but even simple full-range or two-way loudspeakers may offer advantages with respect to an improved signal to noise ratio via this active noise cancellation technique.

[0068] The user may have a desire to use the system 400 with more than one designated micro-quiet zone and optionally more than one Smart Puck synchronized VC module 404. While the fourth aspect describes a micro quiet zone about the puck where it will normally be positioned for receiving voice commands (e.g., on table 408), a fifth (related) aspect involves a similar advantage whereby Smart Puck synchronized VC module 404 may be positioned within another targeted micro quiet zone (e.g., in accordance with patent # 9,767,786). A transfer function between a second spatially displaced micro-quiet zone, such as the primary seating area or the user’s head pillow for a bedroom system, and the normal, fixed location of the smart-puck may optionally be established so as to determine the appropriate TF ratio that shall be applied to the noise cancellation signal that is reproduced by the separate host loudspeaker system 604. Depending on its transducer configuration, the separate host loudspeaker system 604 may radiate a tightly controlled beam, characterized by a high directivity factor, towards the selected micro quiet zone. Further, to the extent that Smart Puck VC module 404 controls program material content, background noise within the listening area may be reduced by this method when noise cancellation signals are combined with program material and any other corrective signals such as those associated with room-smoothing techniques.

[0069] Having described preferred embodiments of a new and improved or enhanced VC speaker system 400 including Smart Puck synchronized VC module 404, separate host loudspeaker system 604 and the method for setting up and optimizing the VC system, it is believed that other modifications, variations and changes will be suggested to those skilled in the art in view of the teachings set forth herein. It is therefore to be understood that all such variations, modifications and changes are believed to fall within the scope of the present invention as set forth in the appended claims.

Claims

What is claimed is:

Claim 1. An enhanced Voice-Controlled (“VC”) Loudspeaker system 400, comprising:

a microphone array 424;

an audio output linked to a separate host loudspeaker system 604; and

a synchronized VC module 404 linked to said microphone array, said synchronized VC module being programmed to synchronize with and incorporate VC functionality into said separate host loudspeaker system 604.

Claim 2. The enhanced VC Loudspeaker system of claim 1, further including a pre-programmed digital signal processing (DSP) engine 434 within said synchronized VC module 404, said DSP engine including DSP programming to optimize its performance with the host loudspeaker system 604, the DSP programming including:

determining the characteristic acoustic environment 102 of said system 400;

determining when and how the host loudspeaker system 604 is operating; and

altering the performance characteristics of synchronized VC module 404 in response thereto to optimize overall VC system performance.

Claim 3. The enhanced VC Loudspeaker system of claim 2, wherein said separate host loudspeaker system 604 is remote from the synchronized VC module 404.

Claim 4. The enhanced VC Loudspeaker system of claim 3, wherein said synchronized VC module 404 incorporates a processor 432 linked to said microphone array 424 and to said host loudspeaker system and is further linked to remote entities 110 by way of a network 112 for obtaining responses to signals from said microphone array 424 and directing said responses to said host loudspeaker system.

Claim 5. The enhanced VC Loudspeaker system of claim 4, wherein said synchronized VC module’s processor 432 incorporates a microprocessor 800 incorporating suitable memory 802 and operating system components 804 connected to the pre-programmed digital signal processing (DSP) engine 434 to provide optimized sound quality during audio program playback as well as subjectively pleasant-sounding Voice Assistance response as perceived by a user.

Claim 6. The enhanced VC Loudspeaker system of claim 5, wherein said host loudspeaker system 604 includes first and second spaced loudspeakers (e.g., SB and SW) and wherein said synchronized VC module 404 further includes a location ID module connected to said microprocessor 800 for determining the spatial relationship of said loudspeakers and altering said DSP system to compensate for the spacing of the loudspeakers.

Claim 7. The enhanced VC Loudspeaker system of claim 6, wherein said separate host loudspeaker system 604 further includes user controls for providing system settings, and wherein said synchronized VC module’s DSP system is preprogrammed to account for said separate host loudspeaker settings.

Claim 8. The enhanced VC Loudspeaker system of claim 5, wherein said synchronized VC module’s processor’s operating system 804 is configured and programmed to operate the DSP engine 434 and to manage hardware wireless unit 806, USB unit 808 and Codec 810 within synchronized VC module 404 for the benefit of control element modules 812-832 in the processor.

Claim 9. A method for operating an enhanced Voice-Controlled (“VC”) Loudspeaker system 400 incorporating a microphone array, a synchronized VC module 404 linked to said array and having an audio output linked to an existing remotely located high-performance host loudspeaker system, said VC module including a processor 432 being programmed to provide a method for incorporating VC functionality into said host loudspeaker system, the method including:

capturing at the microphone array, through the use of acquisition routines, noise stimuli emitted by the host loudspeaker system;

deriving from said stimuli a complex transfer function (TF) between the host loudspeaker system and the microphone array;

wherein the captured TF, expressed as Finite Impulse Response (FIR) filter parameters, is utilized to improve recognition of a user’s voice queries or commands.
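By way of non-limiting illustration only, the derivation of FIR filter parameters from captured noise stimuli may be sketched as follows. The sketch assumes an approximately white noise stimulus and uses a simple cross-correlation estimate, which is one of several possible estimators and is not stated in the claim. The names `estimate_fir`, `convolve` and `h_true`, the four-tap response and the simulated capture are hypothetical and are introduced here solely for illustration.

```python
import random

def convolve(signal, taps):
    """Direct-form FIR filtering: out[n] = sum_k taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out

def estimate_fir(stimulus, captured, num_taps):
    """Cross-correlation estimate of FIR taps (assumes a near-white stimulus)."""
    energy = sum(s * s for s in stimulus)
    taps = []
    for k in range(num_taps):
        # Correlate the captured signal with the stimulus delayed by k samples.
        xcorr = sum(captured[n] * stimulus[n - k] for n in range(k, len(stimulus)))
        taps.append(xcorr / energy)
    return taps

random.seed(0)  # fixed seed so the sketch is repeatable
stimulus = [random.gauss(0.0, 1.0) for _ in range(20000)]  # stand-in noise burst
h_true = [0.5, 0.3, -0.2, 0.1]         # hypothetical loudspeaker-to-microphone response
captured = convolve(stimulus, h_true)  # simulated microphone array capture
h_est = estimate_fir(stimulus, captured, num_taps=len(h_true))
print([round(t, 2) for t in h_est])
```

In this idealized, noise-free simulation the estimated taps approach the hypothetical response as the stimulus length grows; a practical acquisition routine would also have to contend with ambient noise and a much longer room response.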

Claim 10. The method of Claim 9, further including:

providing a digital signal processing (DSP) engine pre-programmed to include a range of transfer functions corresponding to host loudspeaker system settings;

capturing transfer functions TF that reflect room effects derived over a range of preprogrammed DSP settings;

recalling from memory 802 the inverse of the transfer function that best matches the current state of the DSP settings;

imposing on the signals acquired by the microphone array the best-match inverse signals to subtract the loudspeaker’s contribution from user voice signals acquired by the microphone array, thereby greatly improving the signal-to-noise ratio of voice command/query signals to loudspeaker output noise to provide greatly improved responsiveness to user voice commands.
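By way of non-limiting illustration only, the recall-and-subtract steps above may be sketched as follows. The sketch interprets the subtraction as predicting the loudspeaker's contribution from a stored FIR model and removing it from the microphone signal, in the manner of an echo canceller; that interpretation, the table `stored_tfs`, its keys (treated here as host volume settings in dB), the tap values and all function names are assumptions made for illustration. For simplicity the simulated acoustic path is taken to be identical to the nearest stored model, so the cancellation is exact.

```python
import math

def convolve(signal, taps):
    """Direct-form FIR filtering: out[n] = sum_k taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out

# Hypothetical stored FIR models, keyed by host volume setting in dB.
stored_tfs = {
    -20: [0.10, 0.05, 0.02],
    -10: [0.32, 0.16, 0.06],
    0: [1.00, 0.50, 0.20],
}

def best_match(setting_db):
    """Recall the stored model whose key is closest to the current setting."""
    key = min(stored_tfs, key=lambda k: abs(k - setting_db))
    return stored_tfs[key]

def remove_loudspeaker(mic, playback, setting_db):
    """Subtract the predicted loudspeaker contribution from the microphone signal."""
    predicted = convolve(playback, best_match(setting_db))
    return [m - p for m, p in zip(mic, predicted)]

def power(samples):
    """Mean-square power of a sample sequence."""
    return sum(s * s for s in samples) / len(samples)

playback = [math.sin(0.3 * n) for n in range(200)]      # program material
voice = [0.05 * math.sin(0.05 * n) for n in range(200)]  # quiet user speech
echo = convolve(playback, stored_tfs[-10])               # idealized acoustic path
mic = [v + e for v, e in zip(voice, echo)]
cleaned = remove_loudspeaker(mic, playback, setting_db=-9)
residual = [c - v for c, v in zip(cleaned, voice)]
print(power(echo) > power(residual))
```

In practice the stored model only approximates the actual path, so some loudspeaker residue would remain after subtraction.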

Claim 11. The method of Claim 10, further including:

monitoring ambient environmental noise apart from desired program material produced by said host loudspeaker system; and

modifying said DSP settings in accordance with the monitored environmental noise to provide a “micro quiet zone” centered about the synchronized VC module 404 for further improving the signal-to-noise ratio associated with said user’s voice commands.

Claim 12. The method of Claim 10, further including:

capturing the acoustic frequency response of the loudspeaker system at the user’s primary listening location in conjunction with “room-smoothing” algorithms; and

modifying by inverse magnitude shaping said DSP settings in accordance with the captured acoustic frequency response to optimize system performance.
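By way of non-limiting illustration only, inverse magnitude shaping may be sketched as follows. The sketch assumes the captured response is available as per-band levels in dB, stands in a three-band moving average for the “room-smoothing” algorithms, and limits each correction to ±4 dB; the band values, the flat 0 dB target, the limit and the function names are all hypothetical.

```python
def smooth(values_db, width=3):
    """Moving average across neighboring bands (a stand-in for room smoothing)."""
    half = width // 2
    out = []
    for i in range(len(values_db)):
        window = values_db[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def inverse_magnitude_eq(measured_db, target_db=0.0, limit_db=4.0):
    """Per-band correction gain: the inverse of the smoothed deviation, clamped."""
    corrections = []
    for level in smooth(measured_db):
        gain = target_db - level  # a peak yields a cut, a dip yields a boost
        corrections.append(max(-limit_db, min(limit_db, gain)))
    return corrections

measured_db = [3.0, 6.0, 0.0, -3.0, -6.0]  # hypothetical captured band levels
eq_gains_db = inverse_magnitude_eq(measured_db)
print(eq_gains_db)  # [-4.0, -3.0, -1.0, 3.0, 4.0]
```

Clamping the correction is a design choice of this sketch, intended to avoid excessive boost in bands where the room response has deep dips.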

Claim 13. The method of Claim 10, further including:

capturing the acoustic frequency response of the loudspeaker system at the user’s primary listening location in conjunction with “room-smoothing” algorithms; and

modifying said DSP settings in accordance with the captured acoustic frequency response by imposing time-delayed correction signals to optimize system performance.

Claim 14. The method of Claim 10, further including:

determining, in a host loudspeaker system containing a sound bar SB and a subwoofer SW, the location of the sound bar relative to the subwoofer; and

modifying the DSP settings in accordance with the determined relative locations to optimize system performance.

Claim 15. The method of Claim 14, further including:

determining the location of the sound bar and subwoofer relative to a user’s primary listening position; and

modifying DSP settings in order to optimize system performance.
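By way of non-limiting illustration only, one DSP setting that could be modified from the determined locations is a per-loudspeaker alignment delay, sketched below. The claims do not specify which settings are modified; the choice of delay alignment, the distances, the 48 kHz sample rate, the nominal 343 m/s speed of sound and the function name `alignment_delays` are assumptions made for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # nominal value in air at room temperature

def alignment_delays(distances_m, sample_rate_hz=48000):
    """Delay, in samples, that lets every loudspeaker arrive with the farthest one."""
    farthest = max(distances_m.values())
    return {
        name: round((farthest - d) / SPEED_OF_SOUND_M_S * sample_rate_hz)
        for name, d in distances_m.items()
    }

# Hypothetical distances from the primary listening position, in meters.
distances_m = {"SB": 3.0, "SW": 2.0}
delays = alignment_delays(distances_m)
print(delays)  # {'SB': 0, 'SW': 140}
```

Here the subwoofer SW is one meter closer than the sound bar SB, so its signal is delayed by about 2.9 ms (140 samples at 48 kHz) so that both arrive at the listening position together.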