WO2019133942A1 - Voice-control soundbar loudspeaker system with dedicated dsp settings for voice assistant output signal and mode switching method - Google Patents

Voice-control soundbar loudspeaker system with dedicated dsp settings for voice assistant output signal and mode switching method

Info

Publication number
WO2019133942A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
user
soundbar
enhanced
dsp
Prior art date
Application number
PCT/US2018/068074
Other languages
French (fr)
Inventor
Bradley M. Starobin
Brian E. COX
Original Assignee
Polk Audio, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Polk Audio, Llc
Priority to EP18895921.7A (patent EP3776169A4)
Publication of WO2019133942A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/165 Management of the audio stream, e.g. setting of volume, audio stream path
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00 Stereophonic arrangements
    • H04R5/04 Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2420/00 Details of connection covered by H04R, not provided for in its groups
    • H04R2420/01 Input selection or mixing for amplifiers or loudspeakers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R27/00 Public address systems

Definitions

  • the present invention relates to Voice Controlled media playback systems or Smart Speakers adapted to generate Voice Controlled Assistant Output Signals and receive and respond to a user’s spoken commands.
  • Some VC speaker systems (e.g., the Amazon Echo™ 104, Google Home™ or Cortana™ VC speaker systems) will run third party voice-based software (“chatbots”) or assistant applications (e.g., Skills™ or Actions™) and can respond to a user’s spoken commands 122 with a voice-based application’s synthesized audible response or Voice Controlled Assistant Output Signal 120 generated as part of Voice Assistance (“VA”) operations, where the VC speaker (e.g., 104) senses or detects user-spoken trigger phrases (i.e., “wake” words or phrases) or commands and generates an audible VA reply or acknowledgement (e.g., 120) in response.
  • VA Voice Assistance
  • Amazon’s VA or voice software system is known as “Alexa”
  • Google’s VA or voice software system is known as “Assistant”
  • Apple’s VA or voice software system is known as “Siri”
  • each of these VA systems is programmable to respond to a user’s “wake word” or response-triggering phrase, whereupon the VA takes over control of the VC speaker (e.g., 104) and responds to the user with an audible response or reply.
  • VC loudspeaker systems necessarily reproduce several types of audio program material, including music, news, podcasts, etc., and issue Voice Assistance (VA) audible feedback or Voice Controlled Assistant Output Signals 120 to the user in response to detecting the user’s voice control commands and queries (e.g., 122, 210). Reproduced music may be significantly enhanced if the audio program signal is modified with Digital Signal Processing (“DSP”) parameters selected for optimal audio performance, but any sensed wake word or other user-spoken voice
  • DSP Digital Signal Processing
  • VA response or Voice Controlled Assistant Output Signal e.g. 120
  • every VA response or Voice Controlled Assistant Output Signal 120 to a user’s spoken voice command is clearly heard and easily understood by the user (e.g., 106), but in current VC speaker systems, if the DSP parameters are optimized for music playback (for example), the VA response will be less intelligible and less understandable because the VA reply from Voice Controlled Assistant Output Signal 120 is played with those music-enhancing DSP settings.
  • Surround-sound or home theater loudspeaker systems can also be integrated with Voice Controlled Assistant or VA features and are configured for use with standardized home theater audio systems having a plurality of playback channels, each typically served by an amplifier and a loudspeaker.
  • in Dolby™ home theater audio playback systems there are typically five channels of substantially full range material plus a subwoofer channel configured to reproduce band-limited low frequency material.
  • the five substantially full range channels in a Dolby Digital 5.1™ system are typically center, left front, right front, left surround and right surround.
  • the center channel is typically positioned in a home theater system directly over or under the video display, and that channel is used by content creators for most of the dialog, which has the desirable effect of making reproduced dialog sound as if it were emanating from the display.
  • when typical surround sound loudspeaker systems are installed in listeners’ homes, setup problems are encountered and many users struggle with speaker placement, component connections and related complications.
  • many listeners have turned to “soundbar” style home theater loudspeaker systems (e.g., 350 as shown in Fig. 1D), which incorporate at least left, center and right channels into a single enclosure 352 configured for use near the user’s video display (e.g., as seen in Fig. 1G).
  • Soundbar style single enclosure loudspeaker systems (e.g., 350, as shown in Figs 1D-1F) are simpler to install and connect, but usually require significant care in product design to provide satisfactory performance for listeners who listen from listening positions arrayed in a listening space, so that the listener can actually hear dialog and localize the center channel and the dialog as appearing to emanate from the display while appreciating a high fidelity, natural dynamic quality to that portion of the program material rendered in the center channel.
  • VA controlled soundbar for example, the VA response (e.g. 120) was less intelligible and less understandable because the VA reply from Voice Controlled Assistant Output Signal (e.g., 120) was not well suited for processing through those movie soundtrack center-channel enhancing (e.g., “Voice Adjust”) DSP settings.
  • Figs 1A-1C illustrate typical Voice-Control (“VC”) speaker architectures as used in an exemplary Alexa™ brand Voice Controlled Assistant system, in
  • Figs 1D-1G illustrate one of Applicant’s Soundbar/Subwoofer home theater loudspeaker systems configured with Polk Audio’s Voice Adjust™ Digital Signal Processing (“DSP”) system for enhancing intelligibility of dialog and center channel fidelity in a multi-element single enclosure soundbar loudspeaker system, in accordance with the Prior Art.
  • Fig 2 is a diagram illustrating a Voice-Control Loudspeaker System with Dedicated DSP Settings for the Voice Controlled Assistant’s (e.g., Alexa’s) spoken responses (which differ from the DSP settings used when playing audio program material such as movie soundtracks, music or sportscasts) and a Mode Switching Method, in accordance with the present invention.
  • Voice Controlled Assistant e.g., Alexa
  • Fig 3 is a diagram illustrating the mode switching method for the Voice-Control Loudspeaker System of Fig. 2, in which pre-defined and dedicated DSP Settings for the Voice Controlled Assistant’s Output Signal are selected in response to a user’s wake word or command, in accordance with the present invention.
  • Fig 4 is a process flow diagram illustrating the mode switching method for the Voice-Control Loudspeaker System 400 of Figs. 2 and 3, in which pre-defined and dedicated DSP Settings for the Voice Controlled Assistant’s Output Signal are selected in response to a user’s wake word or command, in accordance with the present invention.
  • Fig 5 is a table illustrating the DSP program mode switching method for the Voice-Control Loudspeaker System 400 of Figs. 2-4, in which pre-defined and dedicated DSP Settings for the Voice Controlled Assistant’s Output Signal are selected in response to a user’s wake word or command, in accordance with the present invention.
  • the DSP program mode switching method of the present invention as illustrated in Fig 5 is also embodied in the Soundbar/Subwoofer system 500 of Figs 6A-7.
  • Fig 6A-6C illustrate an improved Voice Actuated or Voice Controlled
  • Fig 7 illustrates an improved DSP signal flow and mode switching method for the preferred embodiment of the Voice-Controlled Soundbar/Subwoofer loudspeaker system 500 of Figs 6A and 6B with Dedicated DSP Settings for the Voice Controlled Assistant’s Output Signal in accordance with the present invention.
  • VC Voice-Controlled
  • VC speakers must also generate VA audible responses to user’s commands and queries and VA response audible feedback (e.g., Alexa’s answers) was discovered to benefit from a very different set of DSP settings.
  • VA response audible feedback e.g., Alexa’s answers
  • the VC speaker system architecture of the present invention 400 incorporates a DSP mode switching system and method (as illustrated in Figs 2-5) to provide a more consistently intelligible and more subjectively pleasant sounding Voice Assistance response 420 as perceived by user 106 (e.g., when used with Amazon™ Alexa™, Apple™ Siri™ or Google™ Voice Assist), regardless of the audio settings that the user may have selected on the host VC speaker product 404 (which may otherwise inhibit VA response intelligibility).
  • VA sound quality and Voice Controlled Assistant Output Signal 420
  • VA sound quality are controlled with dedicated, preprogrammed DSP settings that override the particular DSP settings selected by the user on the basis of program material, listening conditions and personal taste.
  • Figs 3-5 include diagrams and a table illustrating the DSP mode switching method for the Voice-Control Loudspeaker System 400 of Fig. 2, in which pre-defined and dedicated DSP Settings for the Voice Controlled Assistant’s Output Signal are selected in response to sensing a user’s wake word or command 422, in accordance with the present invention.
  • the general principles of the method of the present invention illustrated in Figs 3-5 apply to (a) the single enclosure standalone embodiment 404 of Fig. 2 and (b) the Soundbar/Subwoofer system embodiment 500 of Figs 6A-7 (as described below).
  • VC speaker system architecture e.g., sold as Amazon Echo™ or Alexa™ brand “voice controlled assistants”
  • Fig. 1A illustrates a first exemplary (typical) prior art system architecture 100 set in an exemplary VC speaker use environment 102, which includes a typical VC speaker system 104 and at least a first user 106.
  • the user 106 is typically near VC speaker system 104.
  • VC speaker system 104 is physically positioned on a table 108 within the environment 102.
  • the VC speaker system 104 is shown sitting upright and supported on its base end.
  • the VC speaker system 104 is shown communicatively coupled to remote entities 110 over a network 112 and remote entities 110 may include individual people, such as person 114, or automated systems (not shown) that serve as far end talkers to verbally interact with the user 106.
  • the remote entities 110 may alternatively comprise cloud services 116 hosted, for example, on one or more servers 118(1), . . . , 118(S). These servers 118(1)-(S) may be arranged in any number of ways, such as server farms, stacks, and the like that are commonly used in data centers.
  • the cloud services 116 generally refer to a network accessible platform implemented as a computing infrastructure of processors, storage, software, data access, and so forth that is maintained and accessible via a network such as the Internet. Cloud services 116 do not require end-user knowledge of the physical location and configuration of the system that delivers the services. Common expressions associated with cloud services include “on-demand computing”, “Software as a Service (SaaS)”, “platform computing”, “network accessible platform”, and so forth.
  • Cloud services 116 may host any number of applications that can process the user input received from the VC speaker system 104, and produce a suitable response.
  • Example applications might include web browsing, online shopping, banking, email, work tools, productivity, entertainment, educational, and so forth.
  • user 106 is shown communicating with the remote entities 110 via VC speaker system 104.
  • the VC speaker system 104 Voice Assist outputs an audible question, "What do you want to do?" as represented by dialog bubble 120. This output may represent a question from a far end talker 114, or from a cloud service 116 (e.g., an entertainment service).
  • the user 106 is shown replying to the question by stating, "I'd like to buy tickets to a movie" as represented by the dialog bubble 122.
  • the VC speaker system 104 (or voice controlled assistant 104) is equipped with an array 124 of microphones 126(1), . . . , 126(M) to receive the voice input from the user 106 as well as any other audio sounds in the environment 102.
  • the microphones 126(1)-(M) are generally arranged at a first or top end of the VC speaker system 104 opposite the base end seated on the table 108. Although multiple microphones are illustrated, in some implementations, the VC speaker system 104 may be embodied with only one microphone.
  • the VC speaker system 104 may further include a speaker array 128 of speakers 130(1), . . . , 130(P) to output sounds in humanly perceptible frequency ranges.
  • the speakers 130(1)-(P) may be configured to emit sounds at various frequency ranges, so that each speaker has a different range. In this manner, the VC speaker system 104 may output high frequency signals, mid frequency signals, and low frequency signals.
  • the speakers 130(1)-(P) are generally arranged at a second or base end of the VC speaker system 104 and oriented to emit the sound in a downward direction toward the base end and opposite to the microphone array 124 in the top end.
  • the voice controlled assistant or VC speaker system 104 may further include computing components 132 that process the voice input received by the microphone array 124, enable communication with the remote entities 110 over the network 112, and generate the audio to be output by the speaker array 128.
  • the computing components 132 are generally positioned between the microphone array 124 and the speaker array 128, although essentially any other arrangement may be used.
  • the VC speaker system 104 may be configured to produce stereo or non-stereo output.
  • the speakers 130(1)-(P) may receive a mono signal for output in a non-stereo configuration.
  • the computing components 132 may generate and output to the speakers 130(1)-(P) two different channel signals for stereo output.
  • a first channel signal (e.g., left channel signal) is provided to one of the speakers, such as the larger speaker 130(1).
  • a second channel signal (e.g., right channel signal) is provided to the other of the speakers, such as the smaller speaker 130(P). Due to the vertically stacked arrangement of the speakers, however, the two-channel stereo output may not be appreciated by the user 106.
  • Fig. 1B shows another implementation of voice interactive computing architecture 200 similar to the architecture 100 of Fig. 1A, but in this illustration a voice controlled assistant or VC speaker system 204 has a different physical packaging layout that allows a spaced arrangement of the speakers to better provide stereo output, rather than the vertically stacked arrangement found in the assistant 104 of Fig. 1A. More particularly, the speakers 130(1)-(P) are shown at a horizontally spaced distance from one another. Optionally, VC speaker system 204 is able to play full spectrum stereo using only two speakers of different sizes. In Fig. 1B, VC speaker system 204 is communicatively coupled over the network 112 to an entertainment service 206 that is part of the cloud services 116.
  • the entertainment service 206 is hosted on one or more servers, such as servers 208(1), . . . , 208(K), which may be arranged in any number of configurations, such as server farms, stacks, and the like that are commonly used in data centers.
  • the entertainment service 206 may be configured to stream or otherwise download entertainment content, such as movies, music, audio books, and the like to the voice controlled assistant.
  • the voice controlled assistant 204 can play the audio in stereo with full spectrum sound quality, even though the device has a small form factor and only two speakers.
  • the user 106 is shown directing the VC speaker system 204 to pause the music being played through the audible statement, "Pause the music" in dialog bubble 210.
  • the VC speaker system 204 is not only designed to play music in full spectrum stereo, but is also configured with an acoustic echo cancellation (AEC) module to cancel audio components being received at the microphone array 124 so that the VC speaker system 204 can clearly hear the statements and commands spoken by the user 106.
  • AEC acoustic echo cancellation
  • Fig. 1C shows selected functional components of the voice controlled assistants or VC speaker systems 104 and 204 in more detail.
  • each of the VC speaker systems 104 and 204 may be implemented as a standalone device that is relatively simple in terms of functional capabilities with limited input/output components, memory, and processing capabilities.
  • the VC speaker systems 104 and 204 may not have a keyboard, keypad, or other form of
  • assistants 104 and 204 may be implemented with the ability to receive and output audio, a network interface
  • each VC speaker system 104/204 includes the microphone array 124, a speaker array 128, a processor 302, and memory 304.
  • the microphone array 124 may be used to capture speech input from the user 106, or other sounds in the environment 102.
  • the speaker array 128 may be used to output speech from a far end talker, audible responses provided by the cloud services, forms of entertainment (e.g., music, audible books, etc.), or any other form of sound.
  • the speaker array 128 may output a wide range of audio frequencies including both human perceptible frequencies and non-human perceptible
  • the speaker array 128 is formed of two speakers capable of outputting full spectrum stereo sound, as will be described below in more detail. Two speaker array arrangements are shown, including the vertically stacked arrangement 128A and the horizontally spaced arrangement 128B.
  • the memory 304 may include computer-readable storage media (“CRSM”), which may be any available physical media accessible by the processor 302 to execute instructions stored on the memory.
  • CRSM may include random access memory (“RAM”) and Flash memory.
  • RAM random access memory
  • CRSM may include, but is not limited to, read-only memory (“ROM”), electrically erasable programmable read-only memory (“EEPROM”), or any other medium which can be used to store the desired information and which can be accessed by the processor 302.
  • An operating system module 306 is configured to manage hardware and services (e.g., wireless unit, USB, Codec) within and coupled to the assistant 104/204 for the benefit of other modules.
  • a speech recognition module 308 provides some level of speech recognition functionality.
  • this functionality may be limited to specific commands that perform fundamental tasks like waking up the device, configuring the device, and the like.
  • the amount of speech recognition capabilities implemented on the VC speaker system 104/204 is an implementation detail, but the architecture described herein can support having some speech recognition at the local VC speaker system
  • An acoustic echo cancellation module 310 and a double talk reduction module 312 are provided to process the audio signals to substantially cancel acoustic echoes and substantially reduce double talk that may occur. These modules may work together to identify times where echoes are present, where double talk is likely, where background noise is present, and attempt to reduce these external factors to isolate and focus on the “near talker” (i.e., user 106). By isolating on the near talker, better signal quality is provided to the speech recognition module 308 to enable more accurate interpretation of the speech utterances.
  • a query formation module 314 may also be provided to receive the parsed speech content output by the speech recognition module 308 and to form a search query or some form of request.
  • This query formation module 314 may utilize natural language processing (NLP) tools as well as various language modules to enable accurate construction of queries based on the user's speech input.
  • NLP natural language processing
  • the modules shown stored in the memory 304 are merely representative.
  • Other modules 316 for processing the user voice input, interpreting that input, and/or performing functions based on that input may be provided.
  • the voice controlled assistant 104/204 might further include a codec 318 coupled to the microphones of the microphone array 124 and the speakers of the speaker array 128 to encode and/or decode the audio signals.
  • the codec 318 may convert audio data between analog and digital formats.
  • a user may interact with the assistant 104/204 by speaking to it, and the microphone array 124 receives the user speech.
  • the codec 318 encodes the user speech and transfers that audio data to other components.
  • the assistant 104/204 can
  • the VC speaker system or voice controlled assistant 104/204 includes a wireless unit 320 coupled to an antenna 322 to facilitate a wireless connection to a network.
  • the wireless unit 320 may implement one or more of various wireless technologies, such as Wi-Fi, Bluetooth, RF, and so on.
  • a USB port 324 may further be provided as part of the assistant 104/204 to facilitate a wired connection to a network, or a plug-in network device that communicates with other wireless networks. In addition to the USB port 324, or as an alternative thereto, other forms of wired connections may be employed, such as a broadband connection.
  • a power unit 326 is further provided to distribute power to the various components on the assistant 104/204.
  • a stereo component 328 is optionally provided to output stereo signals to the various speakers in the speaker array 128.
  • the VC speaker system or voice controlled assistant 104/204 is designed to support audio interactions with the user, in the form of receiving voice commands (e.g., words, phrases, sentences, etc.) from the user and outputting audible feedback to the user. In one implementation, the voice controlled assistant 104/204 may include non-input control mechanisms, such as basic volume control button(s) for increasing/decreasing volume, as well as power and reset buttons. There may also be a simple light element (e.g., LED) to indicate a state such as, for example, when power is on. But, otherwise, the assistant 104/204 does not use or need to use any input devices or displays.
  • voice commands e.g., words, phrase, sentences, etc.
  • the VC speaker system 104/204 may be implemented as an aesthetically appealing device with a power cord and optionally a wired interface (e.g., broadband, USB, etc.).
  • the cylindrical-shaped (e.g., Echo™) assistant 104 has an elongated cylindrical housing with apertures or slots formed in a base end to allow emission of sound waves.
  • a cube-shaped assistant 204 may also be implemented as an aesthetically appealing device with smooth surfaces, and covered apertures for passage of sound waves. The cube or box shape enables the two speakers to be spaced apart to provide a stereo sound experience for the user. Once plugged in, each device 104/204 may self-configure automatically, or with slight aid from the user, and be ready to use. As a result, the VC speaker system or assistant 104/204 may be generally produced at a low cost.
  • the audio performance of VC speaker system 404 is enhanced by condition responsive use or implementation of dedicated DSP settings optimized for reproduction of various kinds of program material (e.g., movies, music, sports and news) and may be further affected by user settings intended to improve or alter audio playback such as dialogue/voice enhancement (or a “late night” listening mode).
  • a first exemplary enhanced system architecture 400 is set in a VC speaker use environment 102 where the enhanced VC speaker system 404 is shown with a first user 106.
  • the user 106 is typically near or proximal to enhanced VC speaker system 404.
  • enhanced VC speaker system 404 is physically positioned on a table 108 within the environment 102 and is shown sitting upright and supported on its base end.
  • the enhanced VC speaker system 404 is shown communicatively coupled to remote entities 110 over a network 112 and remote entities 110 may include individual people, such as person 114, or automated systems (not shown) that serve as far end talkers to verbally interact with the user 106.
  • the remote entities 110 may alternatively comprise cloud services 116 hosted, for example, on one or more servers 118(1), . . . , 118(S). These servers 118(1)-(S) may be arranged in any number of ways, as described above. Cloud services 116 generally refer to a network accessible platform implemented as a computing infrastructure of processors, storage, software, data access, and so forth that is maintained and accessible via a network such as the Internet. Cloud services 116 do not require end-user knowledge of the physical location and configuration of the system that delivers the services.
  • the cloud services 116 may host any number of applications that can process the user input received from the enhanced VC speaker system 404, and produce a suitable response. Example applications might include web browsing, online shopping, banking, email, work tools, productivity, entertainment, educational, and so forth, and user 106 may communicate with the remote entities 110 via enhanced VC speaker system 404.
  • enhanced VC speaker system 404 incorporates Voice Assist which outputs a VA output signal 420 comprising audible questions (e.g., "What do you want to do?" as represented by dialog bubble 420), and this VA output may represent a question from a far end talker 114 or from cloud service 116 (e.g., an entertainment service).
  • the user 106 is shown replying to the question by stating, "I'd like to hear another song" as represented by the dialog bubble 422.
  • the enhanced VC speaker system 404 is equipped with array 124 of microphones 126(1), . . . , 126(M) to receive the voice input from the user 106 as well as any other audio sounds in the environment 102.
  • the microphones 126(1)-(M) may be arranged at a first or top end of enhanced VC speaker system 404 opposite the base end seated on the table 108. Although multiple microphones are illustrated, in some implementations, the enhanced VC speaker system 404 may be embodied with only one microphone.
  • the enhanced VC speaker system 404 may further include a speaker array 128 of speakers 130(1), . . . , 130(P) to output sounds in humanly perceptible frequency ranges.
  • the speakers 130(1)-(P) may be configured to emit sounds at various frequency ranges, so that each speaker has a different range. In this manner, the enhanced VC speaker system 404 may output high frequency signals, mid
  • the speakers 130(1)-(P) are generally arranged within enhanced VC speaker system 404 and oriented to emit the sound in a selected direction.
  • the enhanced VC speaker system 404 may further include computing components 432 (best seen in Figs 2 and 3 and further including a processor 302, memory 304, wireless unit 320 and related components as illustrated in Fig. 1C) and these computing components process the user’s voice input (e.g., 422) as received by the microphone array 124, enable communication with the remote entities 110 over the network 112, and generate the audio to be output by the speaker array 128.
  • the computing components 432 are generally positioned between the microphone array 124 and the speaker array 128, although essentially any other arrangement may be used.
  • the enhanced VC speaker system 404 may be configured to produce stereo or non-stereo (e.g., mono or multi-channel home theater) output.
  • enhanced VC speaker system 404 has a physical packaging layout that allows a spaced arrangement of the speakers to better provide stereo or other multi-channel output with speakers 130(1)-(P) spaced from one another and is able to play full spectrum stereo using speakers of different sizes.
  • enhanced VC speaker system 404 is optionally communicatively coupled over the network 112 to an entertainment service 206 that is part of the cloud services 116.
  • the entertainment service 206 is hosted on one or more servers (e.g., such as servers 208(1), . . . , 208(K) as shown in Fig. 1 B), which may be arranged in any number of configurations, as described above.
  • the entertainment service 206 may be configured to stream or otherwise download entertainment content, such as movies, music, audio books, and the like to the enhanced VC speaker system 404.
  • the enhanced VC speaker system 404 can play the audio in stereo or in a multi-channel home theater (e.g., soundbar) mode with full spectrum sound quality.
  • the user 106 is shown directing the enhanced VC speaker system 404 to pause the music being played and select another recording for playback through the audible
  • enhanced VC speaker system 404 is not only designed to play music in full spectrum stereo, but is also configured to clearly hear the statements and commands spoken by the user 106.
  • enhanced VC speaker system 404 has pre-programmed DSP modes (see, e.g., Fig. 5) including audio content listening modes with selected effects which are user selected on the basis of the nature of the program material and personal taste.
  • a VC speaker configured as a soundbar will be programmed with appropriate DSP settings for optimal audio reproduction of TV/Movie program material.
  • “Movie mode” may entail augmented surround channel effects, boosted bass and other audio enhancements.
  • a further example pertains to low level listening to movie or music program material when the associated DSP modes apply loudness compensation for improved perceived spectral balance.
  • voice control feedback e.g., formerly VA output signal 120
  • the loudness compensation settings which prove to be acceptable for music and movie program material may inhibit VA output signal speech intelligibility and render VA speech unnaturally bass- heavy, with a chesty, muffled quality.
  • the system 400 and method of the present invention establishes a plurality of distinct DSP modes with dedicated DSP settings for voice-control VA feedback (typically programmed into enhanced VC speaker system 404) so as to optimize VA intelligibility and VA voice (or audio) quality regardless of the DSP mode/effects the user may have imposed on the basis of program material or personal taste, thus providing a dedicated DSP response for use in generating enhanced VA dialog output signal 420.
  • the range of control for some of the settings, master volume in particular, associated with a voice-controlled audio system may exceed that which is optimal for voice feedback.
  • for master volume, there are settings below and above which voice feedback should not be played for some voice-controlled audio products.
  • voice feedback signal (formerly 120) will be difficult to hear for user 106 when the master volume is set too low even if certain program material may be enjoyed at that level under some listening conditions.
  • voice feedback 420 should be played at a comfortable, intelligible sound level - decidedly not that at which the program material was playing before the user’s voice command was issued.
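The volume constraint described above can be sketched as a simple clamp. This is only an illustration: the 0 to 100 master volume scale, the bounds `VA_MIN_VOLUME` and `VA_MAX_VOLUME`, and the function name `va_playback_volume` are all assumptions, since the specification gives no numeric limits.

```python
# Hypothetical bounds on an assumed 0-100 master volume scale.
# The specification does not state numeric limits for VA playback.
VA_MIN_VOLUME = 30
VA_MAX_VOLUME = 70


def va_playback_volume(master_volume: int) -> int:
    """Clamp the current master volume into the range assumed suitable for VA feedback."""
    return max(VA_MIN_VOLUME, min(VA_MAX_VOLUME, master_volume))


print(va_playback_volume(5))   # 30: too quiet for speech, raised to the floor
print(va_playback_volume(50))  # 50: already inside the constrained range
print(va_playback_volume(95))  # 70: too loud for speech, lowered to the ceiling
```

With a clamp of this kind, program material can still be played at any master volume, while the VA response is always reproduced inside the constrained subset.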
  • the DSP or program mode switching method of the present invention comprises the enhanced VC speaker system 404 including at least one microphone transducer (e.g., like 128 but preferably an array like 124) configured and aimed to sense and receive a first user signal spoken by user 106 (e.g., a wake word, trigger sound, user query or command (e.g., 422)).
  • Enhanced VC speaker 404 also includes a controller or processor 432 configured to implement a first (“audio program enhancing”) playback DSP mode and a second (“VA response enhancing”) playback DSP mode which differs from the first DSP mode to process the first user signal 422 and generate an enhanced-intelligibility first Voice Assist (“VA”) response output signal 420 in response to the first user signal 422.
  • the enhanced VC speaker system 404 is programmed to switch from the first audio-playback DSP settings (which may include user-selectable settings for movie effects or increased low frequencies, etc.) to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility) in response to a sensed user command or wake word (e.g., 422), and then, if no further user commands are sensed, switching enhanced VC speaker system 404 back from said second (VA response enhancing) playback DSP mode to said first playback DSP mode by restoring the first audio-playback DSP settings.
  • Enhanced VC speaker system 404 is also programmed to attenuate or mute the listener’s selected music or movie program material in response to sensing that user command before changing from said first audio-playback DSP settings (which may include user-selectable settings for movie effects or increased low frequencies, etc.) to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility), and is also programmed to change a master volume setting to a constrained subset of a master volume range when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
  • the enhanced VC speaker system 404 is also programmed to disable or mute any home theater (e.g., Dolby 5.1) “surround left” or “surround right” signals when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
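The mode switching sequence described above (mute the program material, switch to the VA dedicated settings, then restore the earlier settings when no further command is sensed) can be sketched as a small state holder. This is a sketch under stated assumptions, not the implementation of the enhanced VC speaker system 404: the names `SpeakerState`, `VA_MODE`, `on_wake_word` and `on_no_further_commands` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

VA_MODE = "va_response"  # hypothetical label for the second (VA response enhancing) mode


@dataclass
class SpeakerState:
    dsp_mode: str = "movie"            # first (audio program enhancing) mode
    program_muted: bool = False
    surrounds_enabled: bool = True
    saved_mode: Optional[str] = None   # program mode to restore afterwards
    saved_surrounds: bool = True


def on_wake_word(state: SpeakerState) -> None:
    """Mute program audio first, then switch to the VA dedicated settings."""
    if state.dsp_mode == VA_MODE:
        return  # already in VA mode: keep the originally saved program mode
    state.program_muted = True
    state.saved_mode = state.dsp_mode
    state.saved_surrounds = state.surrounds_enabled
    state.dsp_mode = VA_MODE
    state.surrounds_enabled = False


def on_no_further_commands(state: SpeakerState) -> None:
    """Restore the first playback DSP settings once the exchange has ended."""
    if state.saved_mode is None:
        return  # nothing to restore
    state.dsp_mode = state.saved_mode
    state.surrounds_enabled = state.saved_surrounds
    state.saved_mode = None
    state.program_muted = False


state = SpeakerState(dsp_mode="music")
on_wake_word(state)
print(state.dsp_mode, state.program_muted)  # va_response True
on_no_further_commands(state)
print(state.dsp_mode, state.program_muted)  # music False
```

Saving the earlier mode before switching is what allows the user's chosen program settings to come back unchanged after the VA response has played.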
  • FIG. 1 Another, but not limiting example, pertains to the configuration and physical lay-out of a voice controlled multi-loudspeaker system such as a multi-driver, single enclosure soundbar 552 as used in a soundbar/subwoofer home theater system (e.g., 500, as illustrated in Figs 6A and 6B). While this configuration may be optimal for reproduction of movie soundtracks - owing to their discrete, multichannel nature and wide bandwidth - employing all of the available loudspeaker drivers (e.g., all five mid-bass drivers 354, 356, 358, 360 and 362 as shown in Fig.
  • for reproduction of a voice assistant response 520 will only deleteriously affect sound quality and intelligibility relative to use of only the soundbar in such a setup.
  • attempted derivation of a VA response 520 for the surround channels may result in only objectionable noise, since the VA response signal is monophonic and surround derivation depends on the difference between Front Left and Right channels (e.g., as described in commonly owned US patent 7231053, the entire disclosure of which is incorporated herein by reference).
  • the system may feature an “all-channel stereo” mode in which the surround channels emulate the front channel signals, resulting in poor intelligibility due to multiple sources playing identical program material if the surround loudspeakers were to reproduce the VA signal.
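The point about monophonic VA signals and difference-based surround derivation can be illustrated with a deliberately simplified sketch. The function `derive_surround` below is hypothetical and is not the method of US patent 7231053; it only assumes that a surround signal is formed from the sample-by-sample difference of the front left and front right channels.

```python
def derive_surround(front_left, front_right):
    """Simplified difference-based surround derivation: S = L - R per sample."""
    return [left - right for left, right in zip(front_left, front_right)]


# A monophonic VA response feeds identical samples to both front channels.
va_mono = [0.5, -0.25, 0.125, 0.0]
surround = derive_surround(va_mono, va_mono)
print(surround)  # [0.0, 0.0, 0.0, 0.0]
```

Because the two front channels carry the same signal, the difference contains no speech at all, so anything a real system reproduces from the surround channels in this case would be residual noise rather than useful VA content.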
  • VA signal (or input signal from a VA signal engine, as best seen in Fig. 3) may be reproduced by the soundbar 552 alone and the DSP parameters associated with VA playback are preferably established as a separate dedicated DSP mode for reproduction solely via amplifiers and drivers in the soundbar 552.
  • enhanced VC soundbar speaker system 552 is configured with at least three pre-programmed user-selectable sound reproduction (e.g., DSP) modes including a Movie Mode for which enhanced VC soundbar subwoofer speaker system 500 is acoustically optimized for both movie and TV content and provides a bass boost, increased spatialization, and enhanced center channel Voice Adjust levels for improved dialogue clarity.
  • Movie Mode is typically the default sound mode for HDMI and Optical input sources.
  • a second user-selectable sound reproduction mode is the Music Mode which gives the user and listener a spectrally balanced sound and smoother bass while minimizing spatialization effects to ensure more natural sound reproduction of musical performance program material.
  • a third user-selectable sound reproduction mode is a Sport Mode which gives the user enhanced vocal intelligibility for dialogue-rich content, like sporting events, news casts and talk shows.
  • this mode e.g., Polk’s own“Voice Adjust”TM sound reproduction system and method
  • the DSP adjustments boost dialogue clarity and optimize the subwoofer volume levels.
  • enhanced VC soundbar subwoofer speaker system 500 is also configured to provide a user-selectable sound reproduction mode known as “Night Mode” which reduces bass and volume dynamics while improving voice intelligibility for low-volume listening.
  • any of these user-selectable sound reproduction (or DSP) modes are changed when user 106 utters a wake word or speaks a voice command 522, in accordance with the present invention.
  • in Fig. 6A, another exemplary enhanced system architecture 500 is set in a VC soundbar speaker system use environment 102 wherein the enhanced VC soundbar speaker system 552 is shown with a first user 106.
  • the user 106 is typically near or proximal to enhanced VC soundbar speaker system 552.
  • enhanced VC soundbar speaker system 552 is physically positioned on a table 108 within the environment 102 and is shown sitting upright and
  • the enhanced VC soundbar speaker system 552 is shown communicatively coupled to remote entities 110 over a network 112 and remote entities 110 may include individual people, such as person 114, or automated systems (not shown) that serve as far end talkers to verbally interact with the user 106.
  • the remote entities 110 may alternatively comprise cloud services 116 hosted, for example, on one or more servers 118(1), . . . , 118(S). These servers 118(1)-(S) may be arranged in any number of ways, as described above, and cloud services 116 generally refer to a network accessible platform implemented as a computing infrastructure of processors, storage, software, data access, and so forth that is maintained and accessible via a network such as the Internet.
  • Cloud services 116 do not require end-user knowledge of the physical location and configuration of the system that delivers the services.
  • the cloud services 116 may host any number of applications that can process the user input received from the enhanced VC soundbar speaker system 552, and produce a suitable response.
  • Example applications might include web browsing, online shopping, banking, email, work tools, productivity, entertainment, educational, and so forth, and user 106 may communicate with the remote entities 110 via enhanced VC soundbar speaker system 552.
  • Enhanced VC soundbar speaker system 552 incorporates Voice Assist which outputs a VA output signal 520 comprising audible responses or questions (e.g., "What do you want to do?" as represented by dialog bubble 520), and this VA output may represent a question from a far end talker 114, or from cloud service 116 (e.g., an entertainment service).
  • the user 106 is shown replying to the question by stating, "I'd like to Change Sound Modes" as represented by the dialog bubble 522.
  • the enhanced VC soundbar speaker system 552 is equipped with array 524 of microphones 126(1), . . . , 126(M) to receive the voice input from the user 106 as well as any other audio sounds in the environment 102.
  • the microphone array 524 is preferably arranged in a circular array in the center of a first or top end of enhanced VC soundbar speaker system 552 opposite the base, as best seen in Fig. 6B.
  • the enhanced VC soundbar speaker system 552 further includes a soundbar enclosure speaker array 528 of loudspeaker drivers (similar to the array of drivers illustrated in Fig. 1 F) to output sounds in humanly perceptible frequency ranges.
  • the soundbar enclosure speaker array 528 includes tweeter and mid-bass drivers configured to emit sounds at various frequency ranges, so that each speaker has a different range.
  • enhanced VC soundbar speaker system 552 may output high frequency signals, mid frequency signals, and low frequency signals which are supplemented or augmented by low frequency signals from subwoofer 554.
  • the speakers in soundbar enclosure speaker array 528 are generally supported within and aimed by an enclosure front baffle to emit the sound in a selected direction toward a listening position (e.g., in a manner similar to that shown in Figs 1 F and 1G).
  • the enhanced VC soundbar speaker system 552 may further include computing components 532 (similar to 432, for the embodiment of Figs. 2-5 and including a processor 302, memory 304, wireless unit 320 and related components as illustrated in Fig. 1C) that process the user’s voice input (e.g., 522) as received by the microphone array 524, enable communication with the remote entities 110 over the network 112, and generate the audio to be output by the speaker array 528.
  • the computing components 532 are generally responsive to signals from the microphone array 524 and include DSP circuits used to drive amplifiers connected to the speaker array 528.
  • the enhanced VC soundbar speaker system 552 may be configured to produce stereo or non-stereo (e.g., mono or multi-channel home theater) output.
  • enhanced VC soundbar speaker system 552 is optionally communicatively coupled over the network 112 to an entertainment service that is part of the cloud services 116.
  • the entertainment service is hosted on one or more servers which may be arranged in any number of configurations, as described above.
  • the entertainment service may be configured to stream or otherwise download entertainment content, such as movies, music, audio books, and the like to the enhanced VC soundbar speaker system 552.
  • the enhanced VC soundbar speaker system 552 can play the audio in stereo or in a multi-channel home theater (e.g., soundbar) mode with full spectrum sound quality.
  • the user 106 is shown directing the enhanced VC soundbar speaker system 552 to select another audio playback mode (e.g., thereby controlling whether the DSP engine is optimized with “Voice Adjust” processing) through the audible statement, "Change the Sound Mode" in dialog bubble 522.
  • the enhanced VC soundbar speaker system 552 is not only designed to play music in full spectrum stereo, but is also configured to clearly hear the statements and commands spoken by the user 106.
  • the audio content listening modes and effects are user selected on the basis of the nature of the program material and personal taste.
  • a VC speaker configured as an enhanced VC soundbar speaker system 552 will be programmed with appropriate DSP settings for optimal audio reproduction of TV/Movie program material.
  • “Movie mode” may entail augmented surround channel effects, boosted bass and other audio enhancements.
  • These DSP settings generally do not lend themselves to voice control feedback (e.g., 520) whose optimal DSP settings will both enhance speech intelligibility and clarity, while providing the VA assistant’s desired sound quality at the possible expense of bandwidth and soundstage (breadth of acoustic image).
  • a further example pertains to low level listening to movie or music program material when the associated DSP modes apply loudness compensation for improved perceived spectral balance.
  • the loudness compensation settings which prove to be acceptable for music and movie program material may inhibit VA output signal speech intelligibility and render VA speech unnaturally bass-heavy, with a chesty, muffled quality.
  • the system 500 and method of the present invention establishes dedicated DSP settings for voice-control VA feedback (typically programmed into enhanced VC soundbar speaker system 552) so as to optimize VA intelligibility and VA voice (or audio) quality regardless of the DSP mode/effects the user may have imposed on the basis of program material or personal taste, thus providing a dedicated DSP response for use in generating enhanced VA dialog output signal 520.
  • the range of control for some of the settings, master volume in particular, associated with a voice-controlled audio system may exceed that which is optimal for voice feedback.
  • master volume there are settings below and above which voice feedback should not be played for some voice-controlled audio products.
  • the VA voice feedback signal (formerly 120) will be difficult to hear for user 106 because the master volume is set too low for VA response intelligibility even if certain program material may be enjoyed at that level under some listening conditions.
  • the voice feedback 520 should be played at a comfortable, intelligible sound level - decidedly not that at which the program material was playing before the user’s voice command was issued.
  • a voice command or wake word e.g., 522
  • that command or wake word is detected and that detection triggers (within enhanced VC soundbar speaker system 552) a response which first attenuates or mutes the audio program material or system program audio signal then being processed and amplified for playback through the speakers (e.g., within soundbar speaker array 528).
  • the system switches the DSP settings (e.g., amplitude, frequency response, subwoofer level, magnitude shaping, etc.) from the program material enhancing DSP settings or mode to a dedicated VA response enhancing mode, as described and shown in Figs 6A-6C and Fig. 7.
  • enhanced VC soundbar speaker system 552 plays the VA response audio signal for user 106, and then enhanced VC soundbar speaker system 552 is programmed to monitor the microphones to detect and sense any subsequent response (e.g., command 522) from user 106.
  • user 106 may use commands 522 to ask the VA (e.g.,“Alexa”) to perform many useful tasks.
  • user 106 simply says, for example, “Alexa” and then asks the VA (Alexa) for sports updates, weather reports, cooking questions or asks the VA (Alexa) to control the sound bar (e.g., to switch to a different selected input audio signal source, change sound modes, adjust the bass or choose a user-customized sound mode (or DSP setting) such as Polk Audio’s Voice Adjust® center channel level).
  • the table of Fig. 5 illustrates, in table form, preferred method steps, responses, and dedicated DSP selection modes for VA signals when reproduced or played aloud using enhanced VC soundbar speaker system 552 also.
  • the crossover between the soundbar enclosure drivers (e.g., generally within enclosure 552) and the subwoofer (e.g., generally within enclosure 554) is usually about 120Hz, but in the present invention, for the dedicated mode pre-programmed for more intelligible Voice Assistant responses (e.g., 520) that crossover between soundbar and subwoofer is shifted downwardly to about 80Hz, specifically so that substantially all of the audio for Voice Assistant responses (e.g., 520) emanates from the soundbar 552, meaning that soundbar 552 is used over a wider passband (nearly full range) than it is employed for non-VA signals.
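The effect of shifting the crossover from about 120Hz to about 80Hz can be illustrated numerically. The sketch below assumes a Butterworth-style magnitude response of order 4 for both the subwoofer low-pass and the soundbar high-pass; the specification does not state the filter type or order, so the function names and the `order` parameter are assumptions chosen only for illustration.

```python
import math


def butterworth_lowpass_db(freq_hz, cutoff_hz, order=4):
    """Magnitude (dB) of an assumed Butterworth low-pass feeding the subwoofer."""
    mag = 1.0 / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** (2 * order))
    return 20.0 * math.log10(mag)


def butterworth_highpass_db(freq_hz, cutoff_hz, order=4):
    """Magnitude (dB) of an assumed Butterworth high-pass feeding the soundbar."""
    mag = 1.0 / math.sqrt(1.0 + (cutoff_hz / freq_hz) ** (2 * order))
    return 20.0 * math.log10(mag)


# Compare how a 100 Hz speech component is split at each crossover setting.
for crossover_hz in (120.0, 80.0):
    sub_db = butterworth_lowpass_db(100.0, crossover_hz)
    bar_db = butterworth_highpass_db(100.0, crossover_hz)
    print(f"crossover {crossover_hz:.0f} Hz: "
          f"subwoofer {sub_db:.1f} dB, soundbar {bar_db:.1f} dB at 100 Hz")
```

Under these assumptions, lowering the crossover reduces the subwoofer's share of a 100 Hz component and raises the soundbar's share, which is consistent with the stated goal of having substantially all VA audio emanate from the soundbar 552.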
  • the microphone array may be housed in an enclosure separate and distinct from the enhanced VC soundbar speaker system 552 and linked with controller/signal processing block 532 wirelessly.
  • enhanced VC soundbar speaker system 552 may be configured to function in the same manner as a Google Mini™ or Amazon Echo™ type device which is configured and programmed to be paired with a loudspeaker to make a voice controlled speaker system with better Voice Assist sound quality and
  • VA audio may include “special effects” that are best reproduced via spatialization techniques.
  • the surround channels are enabled (for example) for optimal VA reproduction.
  • the system and method of the present invention may be readily adapted for use in a VA controllable Soundbar or enhanced VC soundbar speaker system 552 or Surround-sound or home theater loudspeaker system with Voice Controlled Assistant or VA features configured for use with a plurality of playback channels, each typically served by an amplifier and a loudspeaker.
  • a DolbyTM compatible home theater audio playback system including DSP as illustrated in Fig. 6C
  • the five substantially full range channels in a Dolby Digital 5.1™ system are typically center (“C”), left front (“FL”), right front (“FR”), left surround (“SL”) and right surround (“SR”).
  • the center channel (“C”) is positioned in a home theater system directly over or under the video display (as shown in Fig. 1G) and in the system and method of the present invention the VA output signal playback 520 is preferably via the soundbar enclosure speakers.
  • enhanced VC soundbar speaker system 552 includes a VA speaker subsystem including, preferably, a VC speaker microphone array 524 and related components near the center of enclosure in an upward facing surface (as best seen in Fig. 6B).
  • the enhanced VC soundbar speaker system 552 is otherwise very similar to soundbar 352 of loudspeaker system 350 (as shown in Fig. 1D), which also incorporates at least left, center and right channels into a single enclosure 352 configured for use near the user’s video display (e.g., as seen in Fig. 1G), but includes the enhanced DSP for generating a more intelligible Voice Controlled Assistant Output Signal 520, as best seen in Fig. 7.
  • soundbar style single enclosure loudspeaker systems can provide unsatisfactory performance for listeners in some of the listening positions arrayed in a listening space. There is often only one position directly in front of the center of the soundbar which provides acceptable center-channel performance, meaning that the listener can actually hear dialog, localize the center channel and the dialog as appearing to emanate from the display and appreciate a high fidelity, natural dynamic quality to that portion of the program material rendered in the center channel.
  • the audibility of Voice Controlled Assistant Output Signal e.g., 120
  • a typical soundbar speaker system e.g. 350
  • the typical soundbar loudspeaker (by definition, multi-element, single-enclosure, e.g., 352) can thus do a poor job of reproducing the Voice Controlled Assistant Output Signal 120, whether discrete within a multichannel mix (such as Dolby Digital 5.1) or derived from a 2-channel mixdown via any appropriate means (such as SRS or Dolby ProLogic algorithms), and so listeners experience poor intelligibility of dialog, a lack of overall clarity, with a chesty or muted unnatural timbre and this poor performance is experienced and appreciated for most of the listener seating and viewing locations in typical (domestic) home theater environments.
  • the Voice Assist output signal 120 is separated from the other audio signal inputs and processed separately from the Soundbar System’s program material (e.g., music or movie soundtrack) signals, as illustrated in the table of Fig. 5 and the signal flow diagram of Fig 7.
  • enhanced VC soundbar speaker system 552 is modified with a VA speaker subsystem including, preferably, a VC speaker microphone array (e.g., 524) and related components preferably incorporated near the center of enclosure in an upward facing surface (as shown in Fig. 6B).
  • Soundbar subwoofer systems are designed such that each major loudspeaker subsystem - the soundbar system 552 and subwoofer 554 - reproduces the passband for which it has been designed.
  • the Voice Assistant’s primary signal (e.g., 520) content - its voice itself - is relatively band-limited to the range above approximately 80Hz, where the human speaking voice resides.
  • VA content (i.e., Voice Controlled Assistant Output Signal 520) may be characterized by its relatively low dynamic range.
  • VA output signal 520 - that it is relatively both band-limited and restricted in dynamic range - facilitates designation of the soundbar for expanded usage in terms of VA reproduction by the methods described in the present invention and to thereby address the shortfalls of VA reproduction by conventional soundbar-subwoofer systems.
  • by moving the soundbar-subwoofer crossover frequency down to 80Hz or even lower, the soundbar 552 would reproduce substantially all of the VA content (notwithstanding any low frequency effects mixed to that channel) and thereby avoid the potential problems described above (see Fig. 7).
  • in the soundbar-subwoofer DSP system of Fig. 6C, all of the channels are subjected to a bass level control.
  • the present invention processes the VA’s signal and effectively bypasses the bass summation and adjustment control, thereby ensuring that the VA signal (now Voice Controlled Assistant Output Signal 520) is clear, natural sounding and highly intelligible regardless of the bass adjustment setting.
  • a Compression/Expansion (often referred to as a “compander”) feature may be placed in the VA response or voice signal path.
  • the compander boosts signals above a set threshold (e.g. -45dB) by as much as a prescribed maximum gain level (e.g. +12dB). Additionally, the compander will typically permit setting an expansion ratio (e.g. 3:1).
  • Fig. 7 includes a compander within the VA response’s signal path. It should be noted that the compander is placed upstream of the subwoofer and soundbar within the VA’s signal path so that the compander operates on both the subwoofer and soundbar signal components.
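One possible static gain curve for such a compander is sketched below, using the example figures quoted above (threshold of -45dB, maximum gain of +12dB, expansion ratio of 3:1). The specification does not define the curve, so the following are assumptions made only for illustration: levels are in dB relative to full scale, the boost above threshold is capped so the output never exceeds 0 dB, and the expansion ratio is applied as downward expansion below the threshold. The function name `compander_gain_db` is hypothetical.

```python
def compander_gain_db(level_db, threshold_db=-45.0, max_gain_db=12.0,
                      expansion_ratio=3.0):
    """Return the gain (dB) applied to a signal at level_db (dB re full scale).

    Assumed behavior, not taken from the specification:
    - at or above the threshold, boost by up to max_gain_db, but never push
      the output above 0 dB;
    - below the threshold, expand downward so that each 1 dB drop in input
      produces an expansion_ratio dB drop in output.
    """
    if level_db >= threshold_db:
        headroom_db = max(0.0, -level_db)
        return min(max_gain_db, headroom_db)
    return max_gain_db + (expansion_ratio - 1.0) * (level_db - threshold_db)


for level in (-60.0, -50.0, -45.0, -20.0, -6.0):
    gain = compander_gain_db(level)
    print(f"input {level:6.1f} dB -> gain {gain:6.1f} dB -> output {level + gain:6.1f} dB")
```

With these assumptions, quiet speech just above the threshold receives the full boost, louder speech receives progressively less, and low-level noise below the threshold is pushed further down, which narrows the already limited dynamic range of the VA response.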
  • Bandpass frequency (such as 120Hz).
  • Bass Level Bandpass frequency
  • subwoofer reproduces any substantial portion of the VA’s bandwidth
  • unintended consequences such as poor spectral balance and inappropriate spatial cues may result, depending upon system settings (Bass Level) and subwoofer placement relative to the soundbar.
  • the dedicated system 500 for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal 520 in a Voice-Controlled Soundbar/Subwoofer Loudspeaker Product provides a useful combination of elements including an enhanced Voice-Controlled Soundbar/Subwoofer Loudspeaker including a subwoofer 554 and a soundbar 552 including at least one microphone transducer 524 configured and aimed to receive a first user signal (e.g., 522) spoken by a user 106 including wake word, trigger sound or user query or command (e.g., such as 122, 210); and a DSP system programmed into a controller or processor 532 configured to implement a first (“soundbar subwoofer audio program”) playback DSP mode and a second (“soundbar optimized VA output signal”) playback DSP mode which differs from said first playback DSP mode to process the first user signal 522 and generate an enhanced-intelligibility first Voice Assist
  • enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to switch from the first audio-playback DSP settings (which may include user-selectable settings for movie effects or increased low frequencies, etc.) to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility) in response to a sensed user command or wake word (e.g., 522), and then, if no further user commands are sensed, switching the Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 back from said second (VA response enhancing) playback DSP mode to said first playback DSP mode by restoring the first audio-playback DSP settings.
  • the enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to attenuate or mute program material in response to sensing said user command before changing from said first audio-playback DSP settings (which may include user-selectable settings for movie effects or increased low frequencies, etc.) to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
  • the enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to change a master volume setting to a constrained subset of a master volume range when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility), and the enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is preferably also programmed to generate substantially no subwoofer signal upon changing to the second VA dedicated DSP settings and filters the VA response signal in a manner which permits all of the VA response signal 520 to be played through the soundbar 552 when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
  • the Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is also programmed to (a) disable or mute any “surround left” or “surround right” signals and (b) mute or bypass any spatialization processing such as SRS, 3D or D2 Widesound processing when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
  • the method of the present invention may be considered to have as many as four dedicated “modes” of audio program or DSP control pre-programmed into the system of the present invention (e.g., 500), namely, a first (“audio program enhancing”) playback DSP mode, a second (“VA response enhancing”) playback DSP mode, a third (movie, music, sports, Voice Adjust audio program enhancing) playback DSP mode and a fourth (“Soundbar optimized home theater VA response enhancing”) playback DSP mode.
  • the method for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal in a Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 comprises:
  • microphone transducer 524 configured and aimed to receive a first user signal (e.g., 522) spoken by a user 106 including wake word, trigger sound or user query or command (e.g., such as 122, 210); and a DSP system, controller or processor (e.g., as illustrated in Figs 5 and 7) configured to implement a third (“soundbar subwoofer audio program”) playback DSP mode and a fourth (“soundbar optimized VA output signal”) playback mode which differs from said first or third DSP modes to process the first user signal 522 and generate an enhanced-intelligibility first Voice Assist (“VA”) output signal 520 reproduced primarily through the soundbar 552 in response to the first user signal 522;
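The VA-dedicated settings listed above (a constrained master volume range, substantially no subwoofer signal, muted surround channels and bypassed spatial-enhancement processing) can be sketched as a single transformation of the current playback settings. This is only an illustrative sketch: the `DspSettings` fields, the 0 to 100 volume scale and the 30 to 60 constrained range are assumptions chosen for the example and do not come from the application.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DspSettings:
    """Hypothetical snapshot of the settings the text says change in VA mode."""
    master_volume: int        # assumed 0..100 scale
    subwoofer_enabled: bool
    surround_enabled: bool
    spatial_processing: bool  # stands in for SRS / 3D / Widesound-style processing


# Assumed constrained subset of the master volume range used for VA replies.
VA_VOLUME_MIN = 30
VA_VOLUME_MAX = 60


def to_va_settings(current: DspSettings) -> DspSettings:
    """Derive VA-dedicated settings from whatever playback settings are active."""
    clamped = max(VA_VOLUME_MIN, min(VA_VOLUME_MAX, current.master_volume))
    return replace(
        current,
        master_volume=clamped,     # keep the reply inside the constrained range
        subwoofer_enabled=False,   # no subwoofer signal; reply plays through the soundbar
        surround_enabled=False,    # mute surround left/right
        spatial_processing=False,  # bypass spatial-enhancement processing
    )


movie = DspSettings(master_volume=85, subwoofer_enabled=True,
                    surround_enabled=True, spatial_processing=True)
va = to_va_settings(movie)
print(va)
```

Because `DspSettings` is immutable, the original playback settings (`movie` here) remain available unchanged, which is convenient when they must later be restored.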

Abstract

A dedicated system (400 or 500) and method for optimizing the sound quality and intelligibility of a Voice Assistant Output Signal (e.g., 420, 520) in a Voice-Controlled Loudspeaker Product comprises an enhanced Voice-Controlled Loudspeaker including at least one microphone transducer (e.g., 524) configured and aimed to receive a first user signal (e.g., 522) spoken by a user 106 including a wake word, trigger sound or user query or command (e.g., 522); and a DSP system, controller or processor (e.g., as illustrated in Figs. 3, 4, 5 and 7) configured to implement an audio ("soundbar subwoofer audio program") playback DSP mode and a pre-programmed, dedicated ("VA output signal") VA playback DSP mode which differs from the audio playback DSP mode to process the first user signal (e.g., 522) and generate an enhanced-intelligibility first Voice Assist ("VA") output signal (e.g., 520) reproduced primarily through the soundbar 552.

Description

PCT PATENT APPLICATION
Inventors: Bradley M. STAROBIN and Brian E. COX
For: Voice-Control Soundbar Loudspeaker System with Dedicated DSP Settings for Voice Assistant Output Signal and Mode Switching Method
PRIORITY CLAIM AND REFERENCE TO RELATED APPLICATIONS
[001] This application claims the benefit of U.S. Provisional Patent Application No. 62/611,832, filed Dec. 29, 2017 and entitled "Voice-Control Loudspeaker System with Dedicated DSP Settings for Voice Assistant and Mode Switching Method", the entire disclosure of which is hereby incorporated herein by reference. The subject matter of this invention is also related to the following commonly owned applications:
(1) Ser. No. 10/692,692, filed Oct. 27, 2003, now U.S. Pat. No. 6,937,737,
(2) Ser. No. 11/147,447, filed Jun. 8, 2005, now U.S. Pat. No. 7,231,053, and
(3) Ser. No. 14/563,508 filed Jun. 21, 2016, now U.S. Pat. No. 9,374,640, the entireties of which are also incorporated herein by reference.
BACKGROUND OF THE INVENTION
Field of the Invention:
[002] The present invention relates to Voice Controlled media playback systems or Smart Speakers adapted to generate Voice Controlled Assistant Output Signals and receive and respond to a user’s spoken commands.
Discussion of the Prior Art:
[003] Music listeners listen to music over compact audio reproduction systems such as the applicant’s I-Sonic® music playback system (see, e.g., U.S. Patent 7817812) which enables music listeners to enjoy surprisingly high-fidelity playback in an easy to place, aesthetically pleasing product. The widespread adoption of home Wi-Fi systems has led to a wide variety of “connected” audio products including voice controlled “Smart” speaker systems such as the Google Home™, Amazon Echo™ or the Apple HomePod™ smart speaker systems. Voice Controlled (“VC”) or “smart” speakers have microphones to pick up a user’s voice commands, connect to the user’s home Wi-Fi system and can be used to control the user’s smart home gadgets. The VC speakers sold as Amazon Echo™ or Alexa™ brand “Voice Controlled Assistants” are described and illustrated in US Patents 8971543 and 9060224 (to Rawles, LLC). This prior art is illustrated in Figs 1A-1C, which show the elements of exemplary (typical) Amazon Echo™ system architectures 100 and 200, for purposes of establishing the background of the present invention, and the nomenclature.
[004] Some VC speaker systems (e.g., the Amazon Echo™ 104, Google Home™ or Cortana™ VC speaker systems) will run third party voice-based software (“chat-bots”) or assistant applications (e.g., Skills™ or Actions™) and can respond to a user’s spoken commands 122 with a voice-based application’s synthesized audible response or Voice Controlled Assistant Output Signal 120 generated as part of Voice Assistance (“VA”) operations, where the VC speaker (e.g., 104) senses or detects user-spoken trigger phrases (i.e., “wake” words or phrases) or commands and generates an audible VA reply or acknowledgement (e.g., 120) in response. Amazon’s VA or voice software system is known as “Alexa”, Google’s VA or voice software system is known as “Assistant” and Apple’s VA or voice software system is known as “Siri”, and each of these VA systems is programmable to respond to a user’s “wake word” or response-triggering phrase, whereupon the VA takes over control of the VC speaker (e.g., 104) and responds to the user with an audible response or reply.
[005] VC loudspeaker systems necessarily reproduce several types of audio program material, including music, news, podcasts, etc., and issue Voice Assistance (VA) audible feedback or Voice Controlled Assistant Output Signals 120 to the user in response to detecting the user’s voice control commands and queries (e.g., 122, 210). Reproduced music may be significantly enhanced if the audio program signal is modified with Digital Signal Processing (“DSP”) parameters selected for optimal audio performance, but any sensed wake word or other user-spoken voice command will require an interruption in audio program playback and generation of a VA response or Voice Controlled Assistant Output Signal (e.g., 120) which might not be enhanced with those music-optimizing DSP settings. Ideally, every VA response or Voice Controlled Assistant Output Signal 120 to a user’s spoken voice command is clearly heard and easily understood by the user (e.g., 106), but in current VC speaker systems, if the DSP parameters are optimized for music playback (for example), the VA response will be less intelligible and less understandable because the VA reply from Voice Controlled Assistant Output Signal 120 is played with those music-enhancing DSP settings.
[006] Surround-sound or home theater loudspeaker systems can also be integrated with Voice Controlled Assistant or VA features and are configured for use with standardized home theater audio systems having a plurality of playback channels, each typically served by an amplifier and a loudspeaker. In Dolby™ home theater audio playback systems, there are typically five channels of substantially full range material plus a subwoofer channel configured to reproduce band-limited low frequency material. The five substantially full range channels in a Dolby Digital 5.1™ system are typically center, left front, right front, left surround and right surround. The center channel is typically positioned in a home theater system directly over or under the video display and that channel is used by content creators for most of the dialog, which has the desirable effect of making reproduced dialog sound as if it were emanating from the display. Unfortunately, when typical surround sound loudspeaker systems are installed in listeners’ homes, setup problems are encountered and many users struggle with speaker placement, component connections and related complications. In response, many listeners have turned to “soundbar” style home theater loudspeaker systems (e.g., 350 as shown in Fig. 1D) which incorporate at least left, center and right channels into a single enclosure 352 configured for use near the user’s video display (e.g., as seen in Fig. 1G).
[007] Soundbar style single enclosure loudspeaker systems (e.g., 350, as shown in Figs 1D-1F) are simpler to install and connect, but usually require significant care in product design to provide satisfactory performance for listeners who listen from listening positions arrayed in a listening space, so that the listener can actually hear dialog and localize the center channel and the dialog as appearing to emanate from the display while appreciating a high fidelity, natural dynamic quality to that portion of the program material rendered in the center channel. Listeners positioned elsewhere in the listening space often suffer with significantly poorer center channel dialog intelligibility and localization, and those other listeners usually notice that the center channel sound they hear is difficult to understand, not easy to localize, and distorted or compressed, especially if the audio program material is dynamic (e.g., with explosions or other loud effects). Applicant’s commonly owned US patent 9,374,640 illustrates and describes a system and “Voice Adjust” DSP method for enhancing intelligibility of movie soundtrack dialog and center channel fidelity for multi-element single enclosure loudspeaker systems (e.g., 350, as shown in Figs 1D-1F) in home theater sound systems. Integrating a Voice Controlled Assistant capability (like that illustrated in Figs 1A-1C) into a high performance soundbar/subwoofer system (e.g., such as 350, shown in Figs 1D-1G) is not really a straightforward matter of plugging in new modules, however. In applicant’s development work, it was discovered that if the “Voice Adjust” DSP parameters selected to optimize playback of the center channel signals were used in a VA controlled soundbar (for example), the VA response (e.g., 120) was less intelligible and less understandable because the VA reply from Voice Controlled Assistant Output Signal (e.g., 120) was not well suited for processing through those movie soundtrack center-channel enhancing (e.g., “Voice Adjust”) DSP settings.
[008] There is a need, therefore, for a convenient, user-friendly, flexible and unobtrusive system and method for VC users and music listeners to more effectively use VC smart speaker systems without having the VC speaker’s Voice Assistance functions (especially Voice Controlled Assistant Output Signal 120) adversely affected by the VC speaker’s DSP settings when optimized for music or other program material. More particularly, there is a need to improve the sound quality and intelligibility of a VC speaker’s Voice Controlled Assistant Output Signal 120.
DESCRIPTION OF THE DRAWINGS
[009] Figs 1A-1C illustrate typical Voice-Control (“VC”) speaker architectures as used in an exemplary Alexa™ brand Voice Controlled Assistant system, in accordance with the Prior Art.
[010] Figs 1D-1G illustrate one of Applicant’s Soundbar/Subwoofer home theater loudspeaker systems configured with Polk Audio’s Voice Adjust™ Digital Signal Processing (“DSP”) system for enhancing intelligibility of dialog and center channel fidelity in a multi-element single enclosure soundbar loudspeaker system, in accordance with the Prior Art.
[011] Fig 2 is a diagram illustrating a Voice-Control Loudspeaker System with Dedicated DSP Settings for the Voice Controlled Assistant’s (e.g., Alexa’s) spoken responses (which differ from the DSP settings used when playing audio program material such as movie soundtracks, music or sportscasts) and a Mode Switching Method, in accordance with the present invention.
[012] Fig 3 is a diagram illustrating the mode switching method for the Voice-Control Loudspeaker System of Fig. 2, in which pre-defined and dedicated DSP Settings for the Voice Controlled Assistant’s Output Signal are selected in response to a user’s wake word or command, in accordance with the present invention.
[013] Fig 4 is a process flow diagram illustrating the mode switching method for the Voice-Control Loudspeaker System 400 of Figs. 2 and 3, in which pre-defined and dedicated DSP Settings for the Voice Controlled Assistant’s Output Signal are selected in response to a user’s wake word or command, in accordance with the present invention.
[014] Fig 5 is a table illustrating the DSP program mode switching method for the Voice-Control Loudspeaker System 400 of Figs. 2-4, in which pre-defined and dedicated DSP Settings for the Voice Controlled Assistant’s Output Signal are selected in response to a user’s wake word or command, in accordance with the present invention. The DSP program mode switching method of the present invention as illustrated in Fig 5 is also embodied in the Soundbar/Subwoofer system 500 of Figs 6A-7.
[015] Figs 6A-6C illustrate an improved Voice Actuated or Voice Controlled Soundbar/Subwoofer loudspeaker system with Dedicated DSP Settings for the Voice Controlled Assistant’s Spoken Output and the Mode Switching Method adapted to receive Voice-Controlled commands and generate an improved and more reliable and universally intelligible VA Output Signal.
[016] Fig 7 illustrates an improved DSP signal flow and mode switching method for the preferred embodiment of the Voice-Controlled Soundbar/Subwoofer loudspeaker system 500 of Figs 6A and 6B with Dedicated DSP Settings for the Voice Controlled Assistant’s Output Signal, in accordance with the present invention.
DESCRIPTION OF PREFERRED EMBODIMENTS
[017] Turning now to Figs 2-7, an enhanced smart or Voice-Controlled (“VC”) loudspeaker system with dedicated DSP settings for use during Voice Assistance (“VA”) operation and a mode switching method are described and illustrated, in accordance with the present invention.
[018] Reproduced music or movie soundtracks, when played through compact audio reproduction systems, require carefully optimized DSP (Digital Signal Processing) parameters for optimal audio performance. VC speakers must also generate VA audible responses to users’ commands and queries, and VA response audible feedback (e.g., Alexa’s answers) was discovered to benefit from a very different set of DSP settings. The present invention recognizes the problems faced by users of voice-controlled audio systems (e.g., 100) and provides an improved system and signal control method which overcomes the problems.
[019] The applicant for the present invention has developed many great sounding loudspeaker and audio reproduction systems adapted for use with users’ Wi-Fi systems and incorporating DSP elements programmable to achieve specific sonic goals for specific audio program playback applications. US Patents 9,277,044, 9,374,640, 9,584,935, 9,706,320 and 9,807,484 provide useful context and background for the present invention and are incorporated herein in their entireties by reference.
[020] As illustrated in Figs 2-5, the VC speaker system architecture of the present invention 400 incorporates a DSP mode switching system and method to provide a more consistently intelligible and more subjectively pleasant sounding Voice Assistance response 420 as perceived by user 106 (e.g., when used with Amazon™ Alexa™, Apple™ Siri™ or Google™ Voice Assist), regardless of the audio settings that the user may have selected on the host VC speaker product 404 (which may otherwise inhibit VA response intelligibility). VA sound quality (and Voice Controlled Assistant Output Signal 420) is controlled with dedicated, preprogrammed DSP settings that override the particular DSP settings selected by the user on the basis of program material, listening conditions and personal taste. Figs 3-5 include diagrams and a table illustrating the DSP mode switching method for the Voice-Control Loudspeaker System 400 of Fig. 2, in which pre-defined and dedicated DSP Settings for the Voice Controlled Assistant’s Output Signal are selected in response to sensing a user’s wake word or command 422, in accordance with the present invention. The general principles of the method of the present invention illustrated in Figs 3-5 apply to (a) the single enclosure standalone embodiment 404 of Fig. 2 and (b) the Soundbar/Subwoofer system embodiment 500 of Figs 6A-7 (as described below).
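A minimal sketch of the mode switching behavior described in this paragraph is given below: a sensed wake word mutes the program material and replaces the user-selected DSP settings with the dedicated VA settings, and the user's settings are restored once no further command has been sensed. The class name, the dictionary keys and the timeout value are hypothetical; the application does not specify how long the system waits before restoring the first mode, so time is passed in explicitly rather than read from a clock.

```python
class ModeSwitcher:
    """Toy two-mode DSP switcher: 'playback' (user settings) vs 'va' (dedicated)."""

    def __init__(self, user_settings: dict, va_settings: dict, timeout_s: float = 8.0):
        self.user_settings = dict(user_settings)  # saved so they can be restored
        self.va_settings = dict(va_settings)      # dedicated, pre-programmed
        self.timeout_s = timeout_s                # assumed idle time before restoring
        self.active = dict(user_settings)
        self.mode = "playback"
        self.program_muted = False
        self._last_command_at = None

    def on_wake_word(self, now: float) -> None:
        # Attenuate or mute the program first, then override the user's settings.
        self.program_muted = True
        self.active = dict(self.va_settings)
        self.mode = "va"
        self._last_command_at = now

    def tick(self, now: float) -> None:
        # With no further command within the timeout, restore the first mode.
        if self.mode == "va" and now - self._last_command_at >= self.timeout_s:
            self.active = dict(self.user_settings)
            self.mode = "playback"
            self.program_muted = False


switcher = ModeSwitcher({"bass_boost_db": 6, "mode": "movie"},
                        {"bass_boost_db": 0, "mode": "va"}, timeout_s=5.0)
switcher.on_wake_word(now=0.0)
state_during_reply = (switcher.mode, switcher.active["bass_boost_db"], switcher.program_muted)
switcher.tick(now=3.0)   # still inside the VA window
mode_at_3s = switcher.mode
switcher.tick(now=5.0)   # timeout reached, user settings restored
print(state_during_reply, mode_at_3s, switcher.mode, switcher.active)
```

Keeping a private copy of the user's settings, rather than mutating them in place, means the override never loses what the user selected.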
[021] In order to place the present invention in its proper context, we’ll need to set forth some VC speaker system architecture nomenclature with a few illustrative drawings, so referring again to Prior Art Figs 1A-1C, typical VC speaker systems (e.g., sold as Amazon Echo™ or Alexa™ brand “voice controlled assistants”) are described and illustrated in US Patents 8971543 and 9060224, and Fig. 1A illustrates a first exemplary (typical) prior art system architecture 100 set in an exemplary VC speaker use environment 102, which includes a typical VC speaker system 104 and at least a first user 106. The user 106 is typically near VC speaker system 104. In this illustration, VC speaker system 104 is physically positioned on a table 108 within the environment 102. The VC speaker system 104 is shown sitting upright and supported on its base end. The VC speaker system 104 is shown communicatively coupled to remote entities 110 over a network 112 and remote entities 110 may include individual people, such as person 114, or automated systems (not shown) that serve as far end talkers to verbally interact with the user 106. The remote entities 110 may alternatively comprise cloud services 116 hosted, for example, on one or more servers 118(1), . . . , 118(S). These servers 118(1)-(S) may be arranged in any number of ways, such as server farms, stacks, and the like that are commonly used in data centers. The cloud services 116 generally refer to a network accessible platform implemented as a computing infrastructure of processors, storage, software, data access, and so forth that is maintained and accessible via a network such as the Internet. Cloud services 116 do not require end-user knowledge of the physical location and configuration of the system that delivers the services. Common expressions associated with cloud services include "on-demand computing", "Software as a Service (SaaS)", "platform computing", "network accessible platform", and so forth.
[022] Cloud services 116 may host any number of applications that can process the user input received from the VC speaker system 104, and produce a suitable response. Example applications might include web browsing, online shopping, banking, email, work tools, productivity, entertainment, educational, and so forth. In Fig 1A, user 106 is shown communicating with the remote entities 110 via VC speaker system 104. The VC speaker system 104 Voice Assist outputs an audible question, "What do you want to do?" as represented by dialog bubble 120. This output may represent a question from a far end talker 114, or from a cloud service 116 (e.g., an entertainment service). The user 106 is shown replying to the question by stating, "I'd like to buy tickets to a movie" as represented by the dialog bubble 122. The VC speaker system 104 (or voice controlled assistant 104) is equipped with an array 124 of microphones 126(1), . . . , 126(M) to receive the voice input from the user 106 as well as any other audio sounds in the environment 102. The microphones 126(1)-(M) are generally arranged at a first or top end of the VC speaker system 104 opposite the base end seated on the table 108. Although multiple microphones are illustrated, in some implementations, the VC speaker system 104 may be embodied with only one microphone. The VC speaker system 104 may further include a speaker array 128 of speakers 130(1), . . . , 130(P) to output sounds in humanly perceptible frequency ranges. The speakers 130(1)-(P) may be configured to emit sounds at various frequency ranges, so that each speaker has a different range. In this manner, the VC speaker system 104 may output high frequency signals, mid frequency signals, and low frequency signals. The speakers 130(1)-(P) are generally arranged at a second or base end of the VC speaker system 104 and oriented to emit the sound in a downward direction toward the base end and opposite to the microphone array 124 in the top end.
[023] The voice controlled assistant or VC speaker system 104 may further include computing components 132 that process the voice input received by the microphone array 124, enable communication with the remote entities 110 over the network 112, and generate the audio to be output by the speaker array 128. The computing components 132 are generally positioned between the microphone array 124 and the speaker array 128, although essentially any other arrangement may be used. In the Fig 1A architecture 100, the VC speaker system 104 may be configured to produce stereo or non-stereo output. The speakers 130(1)-(P) may receive a mono signal for output in a non-stereo configuration. Alternatively, the computing components 132 may generate and output to the speakers 130(1)-(P) two different channel signals for stereo output. In this stereo configuration, a first channel signal (e.g., left channel signal) is provided to one of the speakers, such as the larger speaker 130(1). A second channel signal (e.g., right channel signal) is provided to the other of the speakers, such as the smaller speaker 130(P). Due to the vertically stacked arrangement of the speakers, however, the two-channel stereo output may not be appreciated by the user 106.
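The stereo and non-stereo output configurations described in this paragraph can be illustrated for a two-speaker device as follows. This is a sketch under stated assumptions: the function name is hypothetical, audio is represented as a list of `(left, right)` sample pairs, and the mono signal is formed by averaging the two channels, which is a common downmix but is not specified by the source.

```python
def route_to_two_speakers(frames, stereo):
    """Return (speaker_1_samples, speaker_2_samples) for a two-speaker device.

    frames: list of (left, right) sample pairs.
    stereo: True  -> left channel to speaker 1, right channel to speaker 2.
            False -> both speakers receive the same mono signal (assumed to be
                     the average of left and right).
    """
    if stereo:
        return [left for left, _ in frames], [right for _, right in frames]
    mono = [(left + right) / 2 for left, right in frames]
    return mono, list(mono)


frames = [(1.0, 0.0), (0.5, 0.5), (0.0, -1.0)]
stereo_out = route_to_two_speakers(frames, stereo=True)
mono_out = route_to_two_speakers(frames, stereo=False)
print(stereo_out)
print(mono_out)
```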
[024] Fig 1B shows another implementation of voice interactive computing architecture 200 similar to the architecture 100 of Fig. 1A, but in this illustration a voice controlled assistant or VC speaker system 204 has a different physical packaging layout that allows a spaced arrangement of the speakers to better provide stereo output, rather than the vertically stacked arrangement found in the assistant 104 of Fig 1A. More particularly, the speakers 130(1)-(P) are shown at a horizontally spaced distance from one another. Optionally, VC speaker system 204 is able to play full spectrum stereo using only two speakers of different sizes. In Fig 1B, VC speaker system 204 is communicatively coupled over the network 112 to an entertainment service 206 that is part of the cloud services 116. The entertainment service 206 is hosted on one or more servers, such as servers 208(1), . . . , 208(K), which may be arranged in any number of configurations, such as server farms, stacks, and the like that are commonly used in data centers. The entertainment service 206 may be configured to stream or otherwise download entertainment content, such as movies, music, audio books, and the like to the voice controlled assistant. When audio content is involved, the voice controlled assistant 204 can play the audio in stereo with full spectrum sound quality, even though the device has a small form factor and only two speakers. In this example scenario, the user 106 is shown directing the VC speaker system 204 to pause the music being played through the audible statement, "Pause the music" in dialog bubble 210. To support this scenario, the VC speaker system 204 is not only designed to play music in full spectrum stereo, but is also configured with an acoustic echo cancellation (AEC) module to cancel audio components being received at the microphone array 124 so that the VC speaker system 204 can clearly hear the statements and commands spoken by the user 106.
[025] Fig. 1C shows selected functional components of the voice controlled assistants or VC speaker systems 104 and 204 in more detail. Generally, each of the VC speaker systems 104 and 204 may be implemented as a standalone device that is relatively simple in terms of functional capabilities with limited input/output components, memory, and processing capabilities. For instance, the VC speaker systems 104 and 204 may not have a keyboard, keypad, or other form of mechanical input. Nor do they have a display or touch screen to facilitate visual presentation and user touch input. Instead, the assistants 104 and 204 may be implemented with the ability to receive and output audio, a network interface (wireless or wire-based), power, and limited processing/memory capabilities.
[026] In the illustrated implementation, each VC speaker system 104/204 includes the microphone array 124, a speaker array 128, a processor 302, and memory 304. The microphone array 124 may be used to capture speech input from the user 106, or other sounds in the environment 102. The speaker array 128 may be used to output speech from a far end talker, audible responses provided by the cloud services, forms of entertainment (e.g., music, audible books, etc.), or any other form of sound. The speaker array 128 may output a wide range of audio frequencies including both human perceptible frequencies and non-human perceptible frequencies. In one implementation, the speaker array 128 is formed of two speakers capable of outputting full spectrum stereo sound, as will be described below in more detail. Two speaker array arrangements are shown, including the vertically stacked arrangement 128A and the horizontally spaced arrangement 128B.
[027] The memory 304 may include computer-readable storage media ("CRSM"), which may be any available physical media accessible by the processor 302 to execute instructions stored on the memory. In one basic implementation, CRSM may include random access memory ("RAM") and Flash memory. In other implementations, CRSM may include, but is not limited to, read-only memory ("ROM"), electrically erasable programmable read-only memory ("EEPROM"), or any other medium which can be used to store the desired information and which can be accessed by the processor 302. Several modules such as instructions, datastores, and so forth may be stored within the memory 304 and configured to execute on the processor 302. An operating system module 306 is configured to manage hardware and services (e.g., wireless unit, USB, Codec) within and coupled to the assistant 104/204 for the benefit of other modules. Several other modules may be provided to process verbal input from the user 106. For instance, a speech recognition module 308 provides some level of speech recognition functionality. In some implementations, this functionality may be limited to specific commands that perform fundamental tasks like waking up the device, configuring the device, and the like. The amount of speech recognition capabilities implemented on the VC speaker system 104/204 is an implementation detail, but the architecture described herein can support having some speech recognition at the local VC speaker system 104/204 together with more expansive speech recognition at the cloud service 116.
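The split between limited on-device recognition and more expansive recognition at the cloud service can be sketched as a simple dispatch. Everything named here is hypothetical: the command table, the action labels and the `fake_cloud_recognizer` stub (which merely echoes the utterance) stand in for whatever the local module 308 and cloud service 116 actually provide, and real recognizers operate on audio rather than on text strings.

```python
# Assumed small on-device vocabulary for fundamental tasks.
LOCAL_COMMANDS = {
    "alexa": "WAKE",
    "volume up": "VOLUME_UP",
    "volume down": "VOLUME_DOWN",
}


def fake_cloud_recognizer(utterance: str) -> str:
    """Stub for the cloud-side recognizer; a real one would be a network call."""
    return f"CLOUD_QUERY({utterance.strip().lower()})"


def recognize(utterance: str, cloud=fake_cloud_recognizer):
    """Handle known commands locally and defer everything else to the cloud."""
    key = utterance.strip().lower()
    if key in LOCAL_COMMANDS:
        return "local", LOCAL_COMMANDS[key]
    return "cloud", cloud(utterance)


local_result = recognize(" Volume Up ")
cloud_result = recognize("Pause the music")
print(local_result)
print(cloud_result)
```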
[028] An acoustic echo cancellation module 310 and a double talk reduction module 312 are provided to process the audio signals to substantially cancel acoustic echoes and substantially reduce double talk that may occur. These modules may work together to identify times where echoes are present, where double talk is likely, where background noise is present, and attempt to reduce these external factors to isolate and focus on the “near talker” (i.e., user 106). By isolating on the near talker, better signal quality is provided to the speech recognition module 308 to enable more accurate interpretation of the speech utterances. A query formation module 314 may also be provided to receive the parsed speech content output by the speech recognition module 308 and to form a search query or some form of request. This query formation module 314 may utilize natural language processing (NLP) tools as well as various language modules to enable accurate construction of queries based on the user's speech input. The modules shown stored in the memory 304 are merely representative. Other modules 316 for processing the user voice input, interpreting that input, and/or performing functions based on that input may be provided. The voice controlled assistant 104/204 might further include a codec 318 coupled to the microphones of the microphone array 124 and the speakers of the speaker array 128 to encode and/or decode the audio signals. The codec 318 may convert audio data between analog and digital formats. A user may interact with the assistant 104/204 by speaking to it, and the microphone array 124 receives the user speech. The codec 318 encodes the user speech and transfers that audio data to other components. The assistant 104/204 can communicate back to the user by emitting audible statements passed through the codec 318 and output through the speaker array 128. In this manner, the user interacts with the voice controlled assistant simply through speech, without use of a keyboard or display common to other types of devices.
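The source does not say which algorithm the acoustic echo cancellation module 310 uses. Purely as an illustration of the general idea, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a widely used textbook approach: it learns how the device's own playback (the reference) leaks into the microphone and subtracts that estimate, leaving a residual in which the near talker would be easier to hear. The signal shapes, echo path coefficients, tap count and step size are all assumptions, and no near-talker speech or noise is included in this toy example.

```python
import math


def nlms_echo_cancel(reference, mic, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`."""
    weights = [0.0] * taps   # estimated echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate                       # what is left after cancellation
        norm = sum(h * h for h in history) + eps   # normalizes the step size
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual


# Playback signal emitted by the loudspeaker (assumed two-tone test signal).
n_samples = 2000
reference = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(n_samples)]

# Microphone picks up a delayed, attenuated copy of the playback (assumed echo path).
mic = [
    0.6 * (reference[n - 1] if n >= 1 else 0.0)
    + 0.2 * (reference[n - 2] if n >= 2 else 0.0)
    for n in range(n_samples)
]

residual = nlms_echo_cancel(reference, mic)

# Compare energy over the last 500 samples, after the filter has adapted.
mic_tail_energy = sum(s * s for s in mic[-500:])
residual_tail_energy = sum(s * s for s in residual[-500:])
print(residual_tail_energy < 0.05 * mic_tail_energy)
```

In a real device the microphone signal would also contain the user's speech, and adaptation is typically slowed or frozen while the user is talking, which is one reason a separate double talk module is useful.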
[029] The VC speaker system or voice controlled assistant 104/204 includes a wireless unit 320 coupled to an antenna 322 to facilitate a wireless connection to a network. The wireless unit 320 may implement one or more of various wireless technologies, such as Wi-Fi, Bluetooth, RF, and so on. A USB port 324 may further be provided as part of the assistant 104/204 to facilitate a wired connection to a network, or a plug-in network device that communicates with other wireless networks. In addition to the USB port 324, or as an alternative thereto, other forms of wired connections may be employed, such as a broadband connection. A power unit 326 is further provided to distribute power to the various components on the assistant 104/204. A stereo component 328 is optionally provided to output stereo signals to the various speakers in the speaker array 128.
[030] The VC speaker system or voice controlled assistant 104/204 is designed to support audio interactions with the user, in the form of receiving voice commands (e.g., words, phrases, sentences, etc.) from the user and outputting audible feedback to the user. In one implementation, the voice controlled assistant 104/204 may include non-input control mechanisms, such as basic volume control button(s) for increasing/decreasing volume, as well as power and reset buttons. There may also be a simple light element (e.g., LED) to indicate a state such as, for example, when power is on. But, otherwise, the assistant 104/204 does not use or need to use any input devices or displays. The VC speaker system 104/204 may be implemented as an aesthetically appealing device with a power cord and optionally a wired interface (e.g., broadband, USB, etc.). In the illustrated implementation, the cylindrical-shaped (e.g., Echo™) assistant 104 has an elongated cylindrical housing with apertures or slots formed in a base end to allow emission of sound waves. A cube-shaped assistant 204 may also be implemented as an aesthetically appealing device with smooth surfaces, and covered apertures for passage of sound waves. The cube or box shape enables the two speakers to be spaced apart to provide a stereo sound experience for the user. Once plugged in, each device 104/204 may self-configure automatically, or with slight aid from the user, and be ready to use. As a result, the VC speaker system or assistant 104/204 may be generally produced at a low cost.
[031] In the system 400 and method of the present invention, as described here and illustrated in Figs 2-5, the audio performance of VC speaker system 404 is enhanced by condition-responsive use or implementation of dedicated DSP settings optimized for reproduction of various kinds of program material (e.g., movies, music, sports and news) and may be further affected by user settings intended to improve or alter audio playback, such as dialogue/voice enhancement (or a “late night” listening mode).
[032] Referring again to Fig. 2, a first exemplary enhanced system architecture 400 is set in a VC speaker use environment 102 where the enhanced VC speaker system 404 is shown with a first user 106. The user 106 is typically near or proximal to enhanced VC speaker system 404. In this illustration, enhanced VC speaker system 404 is physically positioned on a table 108 within the environment 102 and is shown sitting upright and supported on its base end. The enhanced VC speaker system 404 is shown communicatively coupled to remote entities 110 over a network 112. The remote entities 110 may include individual people, such as person 114, or automated systems (not shown) that serve as far end talkers to verbally interact with the user 106. The remote entities 110 may alternatively comprise cloud services 116 hosted, for example, on one or more servers 118(1), . . . , 118(S). These servers 118(1)-(S) may be arranged in any number of ways, as described above. Cloud services 116 generally refer to a network accessible platform implemented as a computing infrastructure of processors, storage, software, data access, and so forth that is maintained and accessible via a network such as the Internet. Cloud services 116 do not require end-user knowledge of the physical location and configuration of the system that delivers the services. The cloud services 116 may host any number of applications that can process the user input received from the enhanced VC speaker system 404 and produce a suitable response. Example applications might include web browsing, online shopping, banking, email, work tools, productivity, entertainment, educational, and so forth. User 106 may communicate with the remote entities 110 via enhanced VC speaker system 404.
[033] In order to maintain standards compatibility with the prior art systems illustrated in Figs 1A-1C, enhanced VC speaker system 404 incorporates Voice Assist, which outputs a VA output signal 420 comprising audible questions (e.g., "What do you want to do?" as represented by dialog bubble 420). This VA output may represent a question from a far end talker 114 or from cloud service 116 (e.g., an entertainment service). The user 106 is shown replying to the question by stating, "I'd like to hear another song" as represented by the dialog bubble 422. The enhanced VC speaker system 404 is equipped with array 124 of microphones 126(1), . . . , 126(M) to receive the voice input from the user 106 as well as any other audio sounds in the environment 102.
[034] The microphones 126(1)-(M) may be arranged at a first or top end of enhanced VC speaker system 404 opposite the base end seated on the table 108. Although multiple microphones are illustrated, in some implementations, the enhanced VC speaker system 404 may be embodied with only one microphone. The enhanced VC speaker system 404 may further include a speaker array 128 of speakers 130(1), . . . , 130(P) to output sounds in humanly perceptible frequency ranges. The speakers 130(1)-(P) may be configured to emit sounds at various frequency ranges, so that each speaker has a different range. In this manner, the enhanced VC speaker system 404 may output high frequency signals, mid frequency signals, and low frequency signals. The speakers 130(1)-(P) are generally arranged within enhanced VC speaker system 404 and oriented to emit the sound in a selected direction.
[035] The enhanced VC speaker system 404 may further include computing components 432 (best seen in Figs 2 and 3, and further including a processor 302, memory 304, wireless unit 320 and related components as illustrated in Fig. 1C). These computing components process the user’s voice input (e.g., 422) as received by the microphone array 124, enable communication with the remote entities 110 over the network 112, and generate the audio to be output by the speaker array 128. The computing components 432 are generally positioned between the microphone array 124 and the speaker array 128, although essentially any other arrangement may be used. In the Fig 2 architecture 400, the enhanced VC speaker system 404 may be configured to produce stereo or non-stereo (e.g., mono or multi-channel home theater) output. Optionally, enhanced VC speaker system 404 has a physical packaging layout that allows a spaced arrangement of the speakers 130(1)-(P) to better provide stereo or other multi-channel output, and is able to play full spectrum stereo using speakers of different sizes.
[036] In Fig 2, enhanced VC speaker system 404 is optionally communicatively coupled over the network 112 to an entertainment service 206 that is part of the cloud services 116. The entertainment service 206 is hosted on one or more servers (e.g., such as servers 208(1), . . . , 208(K) as shown in Fig. 1B), which may be arranged in any number of configurations, as described above. The entertainment service 206 may be configured to stream or otherwise download entertainment content, such as movies, music, audio books, and the like to the enhanced VC speaker system 404. When audio content is involved, the enhanced VC speaker system 404 can play the audio in stereo or in a multi-channel home theater (e.g., soundbar) mode with full spectrum sound quality. In this example scenario, the user 106 is shown directing the enhanced VC speaker system 404 to pause the music being played and select another recording for playback through the audible statement, "I’d like to hear another song" in dialog bubble 422. To support this scenario, the enhanced VC speaker system 404 is not only designed to play music in full spectrum stereo, but is also configured to clearly hear the statements and commands spoken by the user 106.
[037] Generally, enhanced VC speaker system 404 has pre-programmed DSP modes (see, e.g., Fig. 5) including audio content listening modes with selected effects which are user selected on the basis of the nature of the program material and personal taste. For example, a VC speaker configured as a soundbar will be programmed with appropriate DSP settings for optimal audio reproduction of TV/Movie program material. “Movie mode” may entail augmented surround channel effects, boosted bass and other audio enhancements. These DSP settings or modes generally do not lend themselves to easily intelligible and natural sounding voice control feedback (e.g., 120), whose optimal DSP settings must differ in order to enhance speech intelligibility and clarity while providing the voice assistant’s desired sound quality, at the possible expense of bandwidth and soundstage (breadth of acoustic image). A further example pertains to low level listening to movie or music program material when the associated DSP modes apply loudness compensation for improved perceived spectral balance. When applied to voice control feedback (e.g., formerly VA output signal 120), the loudness compensation settings which prove to be acceptable for music and movie program material may inhibit VA output signal speech intelligibility and render VA speech unnaturally bass-heavy, with a chesty, muffled quality. The system 400 and method of the present invention establish a plurality of distinct DSP modes with dedicated DSP settings for voice-control VA feedback (typically programmed into enhanced VC speaker system 404) so as to optimize VA intelligibility and VA voice (or audio) quality regardless of the DSP mode/effects the user may have imposed on the basis of program material or personal taste, thus providing a dedicated DSP response for use in generating enhanced VA dialog output signal 420.
[038] Furthermore, the range of control for some of the settings, master volume in particular, associated with a voice-controlled audio system may exceed that which is optimal for voice feedback. With regard to master volume, there are settings below and above which voice feedback should not be played for some voice-controlled audio products. At very low master volume settings, the voice feedback signal (formerly 120) will be difficult for user 106 to hear, even if certain program material may be enjoyed at that level under some listening conditions. Similarly, if user 106 is listening to program material at extremely high levels and issues a voice command (e.g., 422), the voice feedback 420 should be played at a comfortable, intelligible sound level - decidedly not that at which the program material was playing before the user’s voice command was issued.
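The volume constraint described above can be sketched as a simple clamp. This is only an illustrative sketch: the 0-100 volume scale and the names and values of VA_MIN_VOLUME and VA_MAX_VOLUME are assumptions, since the specification gives no numeric limits.

```python
# Hypothetical limits on a 0-100 master volume scale; the specification
# does not state numeric values for the constrained VA range.
VA_MIN_VOLUME = 30
VA_MAX_VOLUME = 60


def va_playback_volume(master_volume: int) -> int:
    """Return the level used for VA feedback, held within a subset of the master range."""
    return max(VA_MIN_VOLUME, min(VA_MAX_VOLUME, master_volume))


# A very low, a moderate, and a very high program volume.
for level in (5, 45, 95):
    print(level, "->", va_playback_volume(level))
```

Under these assumed limits, a program level inside the range is left unchanged, while very low or very high levels are pulled to the nearest limit for the duration of the VA response.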
[039] Referring again to Fig. 5, the DSP or program mode switching method of the present invention is illustrated. The dedicated system 400 and method for optimizing the sound quality and intelligibility of a Voice Assistant Output Signal in a Voice-Controlled Loudspeaker Product comprise the enhanced VC speaker system 404, which includes at least one microphone transducer (e.g., like 126, but preferably an array like 124) configured and aimed to sense and receive a first user signal spoken by user 106 (e.g., a wake word, trigger sound, user query or command (e.g., 422)). Enhanced VC speaker 404 also includes a controller or processor 432 configured to implement a first (“audio program enhancing”) playback DSP mode and a second (“VA response enhancing”) playback DSP mode which differs from the first DSP mode, to process the first user signal 422, and to generate an enhanced-intelligibility first Voice Assist (“VA”) response output signal 420 in response to the first user signal 422. As illustrated in Figs 4 and 5, the enhanced VC speaker system 404 is programmed to switch from the first audio-playback DSP settings (which may include user-selectable settings for movie effects or increased low frequencies, etc.) to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility) in response to a sensed user command or wake word (e.g., 422), and then, if no further user commands are sensed, to switch back from said second (VA response enhancing) playback DSP mode to said first playback DSP mode by restoring the first audio-playback DSP settings. Enhanced VC speaker system 404 is also programmed to attenuate or mute the listener’s selected music or movie program material in response to sensing that user command before changing from said first audio-playback DSP settings to the second VA dedicated DSP settings, and is also programmed to change a master volume setting to a constrained subset of a master volume range when changing to the second VA dedicated DSP settings. The enhanced VC speaker system 404 is also programmed to disable or mute any home theater (e.g., Dolby 5.1) “surround left” or “surround right” signals when changing to the second VA dedicated DSP settings.
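The switching behavior described above (attenuate the program, apply the VA settings, constrain the volume, mute the surrounds, then restore) can be sketched as follows. This is a hedged illustration only, not the applicant's implementation: the DspSettings fields, the Soundbar class, the method names and all numeric values are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DspSettings:
    """Hypothetical snapshot of the active DSP state."""
    mode: str
    bass_boost_db: float
    surround_enabled: bool
    program_gain_db: float
    master_volume: int


# Assumed values; the specification gives no numbers.
VA_VOLUME_RANGE = (30, 60)
DUCK_GAIN_DB = -60.0


def enter_va_mode(current: DspSettings) -> DspSettings:
    """Build the VA settings from whatever program settings are active."""
    low, high = VA_VOLUME_RANGE
    # Attenuate the program material first, then apply the VA-specific settings.
    ducked = replace(current, program_gain_db=DUCK_GAIN_DB)
    return replace(
        ducked,
        mode="va",
        bass_boost_db=0.0,
        surround_enabled=False,  # surround left/right muted for VA playback
        master_volume=max(low, min(high, current.master_volume)),
    )


class Soundbar:
    def __init__(self, settings: DspSettings) -> None:
        self.settings = settings
        self._saved = None  # program settings to restore later

    def on_wake_word(self) -> None:
        if self._saved is None:  # keep the original snapshot on repeated commands
            self._saved = self.settings
        self.settings = enter_va_mode(self.settings)

    def on_no_further_command(self) -> None:
        if self._saved is not None:
            self.settings = self._saved  # restore the first playback DSP mode
            self._saved = None


movie = DspSettings("movie", bass_boost_db=6.0, surround_enabled=True,
                    program_gain_db=0.0, master_volume=90)
bar = Soundbar(movie)
bar.on_wake_word()
print(bar.settings)
bar.on_no_further_command()
print(bar.settings == movie)
```

Saving the program settings as an immutable snapshot means the restore step does not need to know which user-selected mode was active before the wake word.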
[040] Another, but not limiting, example pertains to the configuration and physical layout of a voice controlled multi-loudspeaker system such as a multi-driver, single enclosure soundbar 552 as used in a soundbar/subwoofer home theater system (e.g., 500, as illustrated in Figs 6A and 6B). While this configuration may be optimal for reproduction of movie soundtracks - owing to their discrete, multichannel nature and wide bandwidth - employing all of the available loudspeaker drivers (e.g., all five mid-bass drivers 354, 356, 358, 360 and 362 as shown in Fig. 1F) for reproduction of a voice assistant response 520 will only deleteriously affect sound quality and intelligibility relative to use of only the soundbar in such a setup. For example, attempted derivation of a VA response 520 for the surround channels may result in only objectionable noise, since the VA response signal is monophonic and surround derivation depends on the difference between Front Left and Right channels (e.g., as described in commonly owned US patent 7231053, the entire disclosure of which is incorporated herein by reference). Alternatively, the system may feature an “all-channel stereo” mode in which the surround channels emulate the front channel signals; if the surround loudspeakers were to reproduce the VA signal in that mode, the result would be poor intelligibility due to multiple sources playing identical program material. For these reasons, surround channel loudspeakers (SL and SR), if present, ought to be muted when the VA signal is being reproduced. Finally, non-ideal placement of the subwoofer 554, such as a position far removed from the soundbar 552, coupled with poorly chosen subwoofer settings, may yield a disembodied, unnatural character to the sound quality of a voice assistant reply 520. Given its band-limited nature, the VA signal (or input signal from a VA signal engine, as best seen in Fig. 3) may be reproduced by the soundbar 552 alone, and the DSP parameters associated with VA playback are preferably established as a separate dedicated DSP mode for reproduction solely via amplifiers and drivers in the soundbar 552.
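The point about surround derivation from a monophonic signal can be illustrated with a short sketch. It assumes the simplest possible difference scheme (surround = left minus right, sample by sample); it is not the method of US patent 7231053, and derive_surround and the sample values are hypothetical.

```python
def derive_surround(front_left, front_right):
    """Simplified difference-based surround derivation: S = L - R per sample."""
    return [left - right for left, right in zip(front_left, front_right)]


# Stereo program material: the left and right channels differ.
stereo_left = [0.5, 0.2, -0.3, 0.1]
stereo_right = [0.1, 0.4, -0.1, -0.2]

# A monophonic VA response is fed identically to both front channels.
va_mono = [0.3, -0.6, 0.4, 0.0]

print(derive_surround(stereo_left, stereo_right))  # non-zero difference signal
print(derive_surround(va_mono, va_mono))           # every sample cancels
```

With identical left and right inputs the difference cancels completely, so a difference-based derivation has no VA speech content to send to the surrounds, which is consistent with muting SL and SR during VA playback.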
[041] In an exemplary embodiment, enhanced VC soundbar speaker system 552 is configured with at least three pre-programmed user-selectable sound reproduction (e.g., DSP) modes, including a Movie Mode for which enhanced VC soundbar subwoofer speaker system 500 is acoustically optimized for both movie and TV content and provides a bass boost, increased spatialization, and enhanced center channel Voice Adjust levels for improved dialogue clarity. Movie Mode is typically the default sound mode for HDMI and Optical input sources. A second user-selectable sound reproduction mode is the Music Mode, which gives the user and listener a spectrally balanced sound and smoother bass while minimizing spatialization effects to ensure more natural sound reproduction of musical performance program material. A third user-selectable sound reproduction mode is a Sport Mode, which gives the user enhanced vocal intelligibility for dialogue-rich content, like sporting events, newscasts and talk shows. In this mode (e.g., Polk’s own “Voice Adjust”™ sound reproduction system and method) the DSP adjustments boost dialogue clarity and optimize the subwoofer volume levels. Optionally, enhanced VC soundbar subwoofer speaker system 500 is also configured to provide a user-selectable sound reproduction mode known as “Night Mode”, which reduces bass and volume dynamics while improving voice intelligibility for low-volume listening. In the present invention, any of these user-selectable sound reproduction (or DSP) modes is changed when user 106 utters a wake word or speaks a voice command 522.
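The user-selectable modes listed above can be represented as a small lookup table. The sketch below is illustrative only: the trait strings paraphrase the paragraph, while the SoundMode enum, the dictionary names and the fallback to Music Mode for other inputs are assumptions not found in the specification.

```python
from enum import Enum


class SoundMode(Enum):
    MOVIE = "movie"
    MUSIC = "music"
    SPORT = "sport"
    NIGHT = "night"


# Qualitative traits paraphrased from the description; no numeric tuning is given.
MODE_TRAITS = {
    SoundMode.MOVIE: ("bass boost", "increased spatialization", "enhanced center channel level"),
    SoundMode.MUSIC: ("spectrally balanced", "smoother bass", "minimal spatialization"),
    SoundMode.SPORT: ("boosted dialogue clarity", "optimized subwoofer level"),
    SoundMode.NIGHT: ("reduced bass", "reduced volume dynamics", "improved voice intelligibility"),
}

# The description names Movie Mode as the typical default for HDMI and Optical inputs.
DEFAULT_MODE_BY_INPUT = {"hdmi": SoundMode.MOVIE, "optical": SoundMode.MOVIE}


def default_mode(input_source: str) -> SoundMode:
    """Pick a starting mode; the Music Mode fallback is a hypothetical choice."""
    return DEFAULT_MODE_BY_INPUT.get(input_source.lower(), SoundMode.MUSIC)


print(default_mode("HDMI").value, MODE_TRAITS[default_mode("HDMI")])
```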
[042] Referring again to Fig. 6A, another exemplary enhanced system architecture 500 is set in a VC soundbar speaker system use environment 102 wherein the enhanced VC soundbar speaker system 552 is shown with a first user 106. The user 106 is typically near or proximal to enhanced VC soundbar speaker system 552. In this illustration, enhanced VC soundbar speaker system 552 is physically positioned on a table 108 within the environment 102 and is shown sitting upright and supported on its base or bottom surface. The enhanced VC soundbar speaker system 552 is shown communicatively coupled to remote entities 110 over a network 112. The remote entities 110 may include individual people, such as person 114, or automated systems (not shown) that serve as far end talkers to verbally interact with the user 106. The remote entities 110 may alternatively comprise cloud services 116 hosted, for example, on one or more servers 118(1), . . . , 118(S). These servers 118(1)-(S) may be arranged in any number of ways, as described above. Cloud services 116 generally refer to a network accessible platform implemented as a computing infrastructure of processors, storage, software, data access, and so forth that is maintained and accessible via a network such as the Internet. Cloud services 116 do not require end-user knowledge of the physical location and configuration of the system that delivers the services. The cloud services 116 may host any number of applications that can process the user input received from the enhanced VC soundbar speaker system 552 and produce a suitable response. Example applications might include web browsing, online shopping, banking, email, work tools, productivity, entertainment, educational, and so forth. User 106 may communicate with the remote entities 110 via enhanced VC soundbar speaker system 552. Enhanced VC soundbar speaker system 552 incorporates Voice Assist, which outputs a VA output signal 520 comprising audible responses or questions (e.g., "What do you want to do?" as represented by dialog bubble 520). This VA output may represent a question from a far end talker 114, or from cloud service 116 (e.g., an entertainment service). In the exemplary illustration of Fig. 6A, the user 106 is shown replying to the question by stating, "I'd like to Change Sound Modes" as represented by the dialog bubble 522. The enhanced VC soundbar speaker system 552 is equipped with array 524 of microphones 126(1), . . . , 126(M) to receive the voice input from the user 106 as well as any other audio sounds in the environment 102.
[043] The microphone array 524 is preferably arranged in a circular array in the center of a first or top end of enhanced VC soundbar speaker system 552 opposite the base, as best seen in Fig. 6B. The enhanced VC soundbar speaker system 552 further includes a soundbar enclosure speaker array 528 of loudspeaker drivers (similar to the array of drivers illustrated in Fig. 1F) to output sounds in humanly perceptible frequency ranges. The soundbar enclosure speaker array 528 includes tweeter and mid-bass drivers configured to emit sounds at various frequency ranges, so that each speaker has a different range. In this manner, enhanced VC soundbar speaker system 552 may output high frequency signals, mid frequency signals, and low frequency signals which are supplemented or augmented by low frequency signals from subwoofer 554. The speakers in soundbar enclosure speaker array 528 are generally supported within and aimed by an enclosure front baffle to emit the sound in a selected direction toward a listening position (e.g., in a manner similar to that shown in Figs 1F and 1G).
[044] The enhanced VC soundbar speaker system 552 may further include computing components 532 (similar to 432 for the embodiment of Figs. 2-5, and including a processor 302, memory 304, wireless unit 320 and related components as illustrated in Fig. 1C) that process the user’s voice input (e.g., 522) as received by the microphone array 524, enable communication with the remote entities 110 over the network 112, and generate the audio to be output by the speaker array 528. The computing components 532 are generally responsive to signals from the microphone array 524 and include DSP circuits used to drive amplifiers connected to the speaker array 528. In the Fig 6A architecture 500, the enhanced VC soundbar speaker system 552 may be configured to produce stereo or non-stereo (e.g., mono or multi-channel home theater) output. In Fig 6A, enhanced VC soundbar speaker system 552 is optionally communicatively coupled over the network 112 to an entertainment service that is part of the cloud services 116. The entertainment service is hosted on one or more servers which may be arranged in any number of configurations, as described above. The entertainment service may be configured to stream or otherwise download entertainment content, such as movies, music, audio books, and the like to the enhanced VC soundbar speaker system 552. When audio content is involved, the enhanced VC soundbar speaker system 552 can play the audio in stereo or in a multi-channel home theater (e.g., soundbar) mode with full spectrum sound quality. In this example scenario, the user 106 is shown directing the enhanced VC soundbar speaker system 552 to select another audio playback mode (e.g., thereby controlling whether the DSP engine is optimized with “Voice Adjust” processing) through the audible statement, "Change the Sound Mode" in dialog bubble 522. To support this scenario, the enhanced VC soundbar speaker system 552 is not only designed to play music in full spectrum stereo, but is also configured to clearly hear the statements and commands spoken by the user 106.
[045] As noted above, the audio content listening modes and effects are user selected on the basis of the nature of the program material and personal taste. For example, a VC speaker configured as an enhanced VC soundbar speaker system 552 will be programmed with appropriate DSP settings for optimal audio reproduction of TV/Movie program material. “Movie mode” may entail augmented surround channel effects, boosted bass and other audio enhancements. These DSP settings generally do not lend themselves to voice control feedback (e.g., 520), whose optimal DSP settings will enhance speech intelligibility and clarity while providing the voice assistant’s desired sound quality, at the possible expense of bandwidth and soundstage (breadth of acoustic image). A further example pertains to low level listening to movie or music program material when the associated DSP modes apply loudness compensation for improved perceived spectral balance. When applied to voice control feedback (e.g., formerly VA output signal 120), the loudness compensation settings which prove to be acceptable for music and movie program material may inhibit VA output signal speech intelligibility and render VA speech unnaturally bass-heavy, with a chesty, muffled quality. The system 500 and method of the present invention establish dedicated DSP settings for voice-control VA feedback (typically programmed into enhanced VC soundbar speaker system 552) so as to optimize VA intelligibility and VA voice (or audio) quality regardless of the DSP mode/effects the user may have imposed on the basis of program material or personal taste, thus providing a dedicated DSP response for use in generating enhanced VA dialog output signal 520.
[046] Furthermore, the range of control for some of the settings, master volume in particular, associated with a voice-controlled audio system may exceed that which is optimal for voice feedback. With regard to master volume, there are settings below and above which voice feedback should not be played for some voice-controlled audio products. With regard to very low master volume settings, the VA’s voice feedback signal (formerly 120) will be difficult to hear for user 106 because the master volume is set too low for VA response intelligibility even if certain program material may be enjoyed at that level under some listening conditions. Similarly, if user 106 is listening to program material at extremely high levels and issues a voice command (e.g., 522), the voice feedback 520 should be played at a comfortable, intelligible sound level - decidedly not that at which the program material was playing before the user’s voice command was issued.
[047] In the method of the present invention, as illustrated in Fig. 6A (and Figs 4 and 5), when user 106 speaks a voice command or wake word (e.g., 522), that command or wake word is detected and that detection triggers (within enhanced VC soundbar speaker system 552) a response which first attenuates or mutes the audio program material or system program audio signal then being processed and amplified for playback through the speakers (e.g., within soundbar speaker array 528). In the next step, the system switches the DSP settings (e.g., amplitude, frequency response, subwoofer level, magnitude shaping, etc.) from the program material enhancing DSP settings or mode to a dedicated VA response enhancing mode, as described and shown in Figs 6A-6C and Fig. 7. Next, once the dedicated DSP settings for VA mode have been enabled, enhanced VC soundbar speaker system 552 plays the VA response audio signal for user 106, and then enhanced VC soundbar speaker system 552 is programmed to monitor the microphones to detect and sense any subsequent response (e.g., command 522) from user 106. Once the enhanced soundbar-subwoofer system 500 is connected to the user’s network, user 106 may use commands 522 to ask the VA (e.g., “Alexa”) to perform many useful tasks. To get the attention of the soundbar (e.g., Polk Command Bar 552), user 106 simply says, for example, “Alexa” and then asks the VA (Alexa) for sports updates, weather reports or answers to cooking questions, or asks the VA (Alexa) to control the sound bar (e.g., to switch to a different selected input audio signal source, change sound modes, adjust the bass or choose a user-customized sound mode (or DSP setting) such as Polk Audio’s Voice Adjust® center channel level).
[048] The table of Fig. 5 also illustrates preferred method steps, responses, and dedicated DSP selection modes for VA signals when reproduced or played aloud using enhanced VC soundbar speaker system 552. For the soundbar subwoofer system 500 of Figs 6A-7, the crossover between the soundbar enclosure drivers (e.g., generally within enclosure 552) and the subwoofer (e.g., generally within enclosure 554) is usually about 120Hz. In the present invention, however, for the dedicated mode pre-programmed for more intelligible Voice Assistant responses (e.g., 520), that crossover between soundbar and subwoofer is shifted downwardly to about 80Hz, specifically so that substantially all of the audio for Voice Assistant responses (e.g., 520) emanates from the soundbar 552, meaning that soundbar 552 is used over a wider passband (nearly full range) than that over which it is employed for non-VA signals.
[049] Persons of skill in the art will appreciate that the system and method of the present invention overcome the problems with typical prior art VC speaker systems by providing an enhanced Voice Assistant in a VC loudspeaker, with dedicated tuning of the signal processing (e.g., DSP) for controlling the sound quality of the VA’s voice as perceived by the user. Sound quality and intelligibility are distinct attributes, so the VA voice (e.g., the “Alexa” voice) will, upon enhancement as described and illustrated here, sound at once both friendly (a sound quality aspect of the voice) and extremely intelligible.
[050] Alternative embodiments are also available. For example, the microphone array may be housed in an enclosure separate and distinct from the enhanced VC soundbar speaker system 552 and linked wirelessly with controller/signal processing block 532. Enhanced VC soundbar speaker system 552 may thus be configured to function in the same manner as a Google mini™ or Amazon Echo™ type device which is configured and programmed to be paired with a loudspeaker to make a voice controlled speaker system with better Voice Assist sound quality and intelligibility. The operating modes described and illustrated in the Table of Fig. 5 are provided for use with applicant’s prototypes with a variety of functions which can be used in conjunction with the dedicated DSP voice-assistant audio mode in enhanced VC soundbar speaker system 552 (or a variant thereof). Voice Assist (“VA”) audio may include “special effects” that are best reproduced via spatialization techniques. As noted in the Table in Fig 5, there may be circumstances when the surround channels are enabled (for example) for optimal VA reproduction.
[051] Referring again to Figs 6A, 6B, 6C and 7, the system and method of the present invention may be readily adapted for use in a VA controllable Soundbar or enhanced VC soundbar speaker system 552, or in a Surround-sound or home theater loudspeaker system with Voice Controlled Assistant or VA features configured for use with a plurality of playback channels, each typically served by an amplifier and a loudspeaker. In an exemplary Dolby™ compatible home theater audio playback system (including DSP as illustrated in Fig. 6C), there are typically five channels of substantially full range material plus a Subwoofer channel configured to reproduce band-limited low frequency material. The five substantially full range channels in a Dolby Digital 5.1™ system are typically center (“C”), left front (“FL”), right front (“FR”), left surround (“SL”) and right surround (“SR”). The center channel (“C”) is positioned in a home theater system directly over or under the video display (as shown in Fig. 1G), and in the system and method of the present invention the VA output signal playback 520 is preferably via the soundbar enclosure speakers. As noted above, enhanced VC soundbar speaker system 552 includes a VA speaker subsystem including, preferably, a VC speaker microphone array 524 and related components near the center of the enclosure in an upward facing surface (as best seen in Fig. 6B). The enhanced VC soundbar speaker system 552 is otherwise very similar to soundbar 352 of loudspeaker system 350 (as shown in Fig. 1D), which also incorporates at least left, center and right channels into a single enclosure 352 configured for use near the user’s video display (e.g., as seen in Fig. 1G), but includes the enhanced DSP for generating a more intelligible Voice Controlled Assistant Output Signal 520, as best seen in Fig. 7.
[052] As noted above, soundbar style single enclosure loudspeaker systems (e.g., 350) can provide unsatisfactory performance for listeners in some of the listening positions arrayed in a listening space. There is often only one position directly in front of the center of the soundbar which provides acceptable center-channel performance, meaning that the listener can actually hear dialog, localize the center channel and the dialog as appearing to emanate from the display and appreciate a high fidelity, natural dynamic quality to that portion of the program material rendered in the center channel. The audibility of Voice Controlled Assistant Output Signal (e.g., 120) if emanating from a typical soundbar speaker system (e.g. 350) is, in applicant’s development and testing work, often similarly compromised.
[053] Listeners positioned elsewhere in the listening space must suffer with significantly poorer Voice Controlled Assistant Output Signal intelligibility and localization, and those other listeners usually notice that the sound they hear is difficult to understand, not easy to localize, and distorted or compressed, especially if the VA response (e.g., 120) occurs when simultaneous audio program material is dynamic (e.g., with explosions or other loud effects). The typical soundbar loudspeaker (by definition, multi-element, single-enclosure, e.g., 352) can thus do a poor job of reproducing the Voice Controlled Assistant Output Signal 120, whether discrete within a multichannel mix (such as Dolby Digital 5.1) or derived from a 2-channel mixdown via any appropriate means (such as SRS or Dolby ProLogic algorithms). Listeners therefore experience poor intelligibility of dialog and a lack of overall clarity, with a chesty or muted unnatural timbre, and this poor performance is experienced at most of the listener seating and viewing locations in typical (domestic) home theater environments. In order to provide an improved Voice Controlled Assistant Output Signal 520, the Voice Assist output signal 120 is separated from the other audio signal inputs and processed separately from the Soundbar System’s program material (e.g., music or movie soundtrack) signals, as illustrated in the table of Fig. 5 and the signal flow diagram of Fig 7.
[054] In the exemplary embodiment of Figs. 5, 6A and 7, enhanced VC soundbar speaker system 552 is modified with a VA speaker subsystem including, preferably, a VC speaker microphone array (e.g., 524) and related components preferably incorporated near the center of the enclosure in an upward facing surface (as shown in Fig. 6B). Soundbar-subwoofer systems are designed such that each major loudspeaker subsystem - the soundbar system 552 and subwoofer 554 - reproduces the passband for which it has been designed. Limitations related to output capability, dynamic range and low-frequency extension impose certain requirements on the soundbar when all of the channels of information typically reproduced by the soundbar - front L/R, center, and surround L/R, in addition to the voice assistant in a voice controlled system - are taken into consideration. With the exception of the VA channel (e.g., 520), notwithstanding special effects mixed to that channel, the content reproduced by a soundbar-subwoofer system 500 is full-range in terms of its frequency content and hence exceeds the bandwidth of each loudspeaker subsystem of the soundbar-subwoofer system. Hence, the enhanced VA controlled soundbar system 500 answers the need for a VA output signal responsive “crossover system” - an active electronic network consisting of complementary low-pass and high-pass filters acting on the subwoofer and soundbar respectively. Yet, the Voice Assistant’s primary signal (e.g., 520) content - its voice itself - is relatively band-limited to the range above approximately 80Hz, where the human speaking voice resides. Furthermore, unlike music and other dynamic sound effects associated with the front, surround and (to a lesser extent) the center channel, VA content (i.e., Voice Controlled Assistant Output Signal 520) may be characterized by its relatively low dynamic range. The nature of the VA output signal 520 - that it is relatively both band-limited and restricted in dynamic range - facilitates designation of the soundbar for expanded usage in terms of VA reproduction by the methods described in the present invention and thereby addresses the shortfalls of VA reproduction by conventional soundbar-subwoofer systems.
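As a rough illustration of the complementary low-pass and high-pass split described above, the following Python sketch divides a signal into a subwoofer band and a soundbar band. The first-order filter, the 48 kHz sample rate and the names `one_pole_lowpass`, `split_crossover`, `rms` and `tone` are assumptions made for illustration only; the disclosure does not specify a filter topology, and a practical crossover would typically use higher-order filters.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """First-order low-pass filter (illustrative assumption, not from the source)."""
    # One-pole smoothing coefficient for the requested cutoff frequency.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    state = 0.0
    out = []
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out

def split_crossover(samples, crossover_hz, sample_rate_hz=48000):
    """Return (subwoofer_band, soundbar_band); the two bands sum back to the input."""
    low = one_pole_lowpass(samples, crossover_hz, sample_rate_hz)
    # The high band is the residual, which makes the pair complementary by construction.
    high = [x - lo for x, lo in zip(samples, low)]
    return low, high

def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def tone(freq_hz, seconds=0.5, sample_rate_hz=48000):
    n = int(seconds * sample_rate_hz)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]
```

Splitting a 300Hz test tone with `split_crossover(tone(300.0), 80.0)` and again with a 120Hz crossover shows the effect discussed in the text: the lower crossover leaves less of the tone in the subwoofer band.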
[055] There are three primary aspects to the system and method of the present invention (as illustrated in Figs 6A, 6B and 7), namely:
(1) In a typical voice assisted soundbar-subwoofer system (e.g., 350, and Fig. 6C, with the Voice output signal 120 simply mixed in) the soundbar-subwoofer crossover occurs within the lower register of the human voice, which means that some portion of the voice assistant’s voice (e.g., 120) is reproduced by both the subwoofer 354 and the soundbar 352. In applicant’s development work, it was found that when the overall system design - especially with regard to the subwoofer-soundbar crossover - and system settings, such as Bass and DSP mode (Movie, Music, etc.), are such that VA reproduction is spectrally balanced, there is no need for the subwoofer. In system 500, it was discovered that having both elements of the soundbar-subwoofer system contribute to the VA’s reproduction was not helpful, especially when placement of subwoofer 554 relative to the soundbar 552 was sub-optimal. In applicant’s prototype development work, the system’s standard audio program enhancing DSP settings caused the VA response signal (formerly 120) to sound spectrally unbalanced (bass-heavy or chesty, or too thin with insufficient bass, due in part to the Bass settings and/or DSP mode settings), and VA spatial cues appeared to originate far from the soundbar due to inappropriate placement of the subwoofer 554 relative to the soundbar 552. Restricting the subwoofer’s passband and (equivalently) expanding the soundbar’s portion of VA content was therefore attempted and found to generate a more intelligible enhanced Voice Controlled Assistant Output Signal 520 which is reproduced by the soundbar. By moving the soundbar-subwoofer crossover frequency down to 80Hz or even lower, the soundbar 552 would reproduce substantially all of the VA content (notwithstanding any low frequency effects mixed to that channel) and thereby avoid the potential problems described above (see Fig. 7).
(2) In the more conventional soundbar-subwoofer DSP system of Fig. 6C, all of the channels are subjected to a bass level control. While adjusting spectral balance as a means of compensating for the program material, room acoustics, subwoofer placement relative to the soundbar (e.g., such as 352) and/or personal taste is desirable and entirely appropriate for the great majority of typical music, movie soundtracks and other kinds of signal content, doing so for the VA’s signal (e.g., 120) can lead to unintended and undesirable artifacts which deleteriously affect the VA’s intelligibility and/or give rise to unnatural sounding speech reproduction. The present invention (including the signal processing system and method illustrated in Fig. 7) processes the VA’s signal and effectively bypasses the bass summation and adjustment control, thereby ensuring that the VA signal (now Voice Controlled Assistant Output Signal 520) is clear, natural sounding and highly intelligible regardless of the bass adjustment setting.
(3) Referring now to Fig. 7, to ensure that the VA output signal is sufficiently loud and clear, even when the master volume is set to low levels, a Compression/Expansion (often referred to as a “compander”) feature may be placed in the VA response or voice signal path. Employed in expansion mode, the compander boosts signals above a set threshold (e.g., -45dB) by as much as a prescribed maximum gain level (e.g., +12dB). Additionally, the compander will typically permit setting an expansion ratio (e.g., 3:1). Fig. 7 includes a compander within the VA response’s signal path. It should be noted that the compander is placed upstream of the subwoofer and soundbar within the VA’s signal path so that the compander operates on both the subwoofer and soundbar signal components.
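The expansion behaviour in item (3) can be sketched as a static gain curve. The example values (-45dB threshold, 3:1 ratio, +12dB maximum gain) come from the text; the shape of the curve, the function names `expander_gain_db` and `apply_gain`, and the omission of envelope detection with attack and release times are assumptions for illustration, so this is one plausible reading and not the implementation of Fig. 7.

```python
def expander_gain_db(level_db, threshold_db=-45.0, ratio=3.0, max_gain_db=12.0):
    """Static gain in dB for an upward expander (assumed curve, not from the source)."""
    if level_db <= threshold_db:
        # Signals at or below the threshold pass through unchanged.
        return 0.0
    # Above the threshold the output rises `ratio` dB per input dB, so the
    # added gain is (ratio - 1) dB for each dB over the threshold.
    gain_db = (ratio - 1.0) * (level_db - threshold_db)
    # The boost never exceeds the prescribed maximum gain.
    return min(gain_db, max_gain_db)

def apply_gain(sample, gain_db):
    """Scale a linear sample by a gain expressed in dB."""
    return sample * 10.0 ** (gain_db / 20.0)
```

Under these assumptions a signal 3dB above the threshold receives 6dB of boost, and any signal 6dB or more above the threshold receives the full 12dB. A real design would also need headroom management so that boosted peaks do not clip.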
[056] The advantage of using the lower crossover frequency (e.g., lower than 120Hz) for playback of VA responses 520, assuming it is sufficiently low (e.g., approximately 80Hz), is that the soundbar 552 will be more likely to cover the voice assistant’s entire passband (thereby providing the enhanced and more intelligible Voice Controlled Assistant Output Signal 520) as opposed to dividing that passband between the soundbar and subwoofer at a more typical, higher, Subwoofer Bandpass frequency (such as 120Hz). When the subwoofer reproduces any substantial portion of the VA’s bandwidth, unintended consequences such as poor spectral balance and inappropriate spatial cues may result, depending upon system settings (Bass Level) and subwoofer placement relative to the soundbar. Furthermore, there is some advantage to de-coupling the Bass Level control from VA signal content so as to preserve its intended spectral balance and intelligibility.
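The de-coupling of the Bass Level control from the VA content can be pictured with a per-sample routing sketch. It assumes that the program material has already been split into low and high bands, that the VA sample is summed only into the soundbar feed, and that the names `db_to_linear` and `route_outputs` are hypothetical; the actual signal flow is the one shown in Fig. 7.

```python
def db_to_linear(gain_db):
    """Convert a gain in dB to a linear multiplier."""
    return 10.0 ** (gain_db / 20.0)

def route_outputs(program_low, program_high, va_sample, bass_level_db):
    """Return (subwoofer_feed, soundbar_feed) for one sample (illustrative routing)."""
    # The Bass Level control acts only on the program material's low band.
    subwoofer_feed = program_low * db_to_linear(bass_level_db)
    # The VA sample bypasses the bass control and is summed only into the soundbar feed.
    soundbar_feed = program_high + va_sample
    return subwoofer_feed, soundbar_feed
```

In this sketch, changing `bass_level_db` alters the subwoofer feed but leaves the VA contribution to the soundbar feed untouched, which is the property the paragraph above argues for.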
[057] It will be appreciated by persons of skill in the art that the dedicated system 500 for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal 520 in a Voice-Controlled Soundbar/Subwoofer Loudspeaker Product provides a useful combination of elements including an enhanced VC Voice-Controlled Soundbar/Subwoofer Loudspeaker including a subwoofer 554 and a soundbar 552 including at least one microphone transducer 524 configured and aimed to receive a first user signal (e.g., 522) spoken by a user 106 including a wake word, trigger sound or user query or command (e.g., such as 122, 210); and a DSP system programmed into a controller or processor 532 configured to implement a first (“soundbar subwoofer audio program”) playback DSP mode and a second (“soundbar optimized VA output signal”) playback DSP mode which differs from said first playback DSP mode to process the first user signal 522 and generate an enhanced-intelligibility first Voice Assist (“VA”) output signal 520 reproduced in a passband between a selected lower frequency (e.g., 80Hz) and 20kHz through the soundbar 552 in response to the first user signal 522. Referring to Figs. 4, 5 and 7, enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to switch from the first audio-playback DSP settings (which may include user-selectable settings for movie effects or increased low frequencies, etc.) to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility) in response to a sensed user command or wake word (e.g., 522), and then, if no further user commands are sensed, to switch the Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 back from said second (VA response enhancing) playback DSP mode to said first playback DSP mode by restoring the first audio-playback DSP settings.
In addition, the enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to attenuate or mute program material in response to sensing said user command before changing from said first audio-playback DSP settings (which may include user-selectable settings for movie effects or increased low frequencies, etc.) to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility). Also, the enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to change a master volume setting to a constrained subset of a master volume range when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility). The enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is preferably also programmed to generate substantially no subwoofer signal upon changing to the second VA dedicated DSP settings and to filter the VA response signal in a manner which permits all of the VA response signal 520 to be played through the soundbar 552 when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility). The Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is also programmed to (a) disable or mute any “surround left” or “surround right” signals and (b) mute or bypass any specialization processing such as SRS, 3D or D2 Widesound processing when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
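The setting changes listed in this paragraph can be summarized as a derivation of VA settings from the current program settings. The sketch below is illustrative only: the `DspSettings` fields, the 0 to 100 volume scale, the constrained range of 20 to 60 and the function names are all hypothetical, and only the 80Hz crossover value and the list of things to disable are taken from the text.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DspSettings:
    crossover_hz: float
    bass_level_db: float
    surround_enabled: bool
    specialization_enabled: bool  # SRS, 3D or Widesound style processing
    subwoofer_enabled: bool
    master_volume: int  # hypothetical 0-100 scale

def clamp(value, low, high):
    return max(low, min(high, value))

def va_settings_from(program, va_volume_range=(20, 60), va_crossover_hz=80.0):
    """Derive VA-dedicated settings without modifying the stored program settings."""
    low, high = va_volume_range
    return replace(
        program,
        crossover_hz=va_crossover_hz,   # soundbar covers the voice passband
        bass_level_db=0.0,              # bass adjustment bypassed for VA content
        surround_enabled=False,         # surround left/right muted
        specialization_enabled=False,   # spatial-effect processing bypassed
        subwoofer_enabled=False,        # substantially no subwoofer signal
        master_volume=clamp(program.master_volume, low, high),
    )
```

Because `DspSettings` is frozen and `replace` returns a new object, the original program settings remain available for restoring the first playback mode once the VA response has finished.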
[058] The method of the present invention, as described and illustrated (see, e.g., Figs. 4, 5 and 7) may be considered to have as many as four dedicated “modes” of audio program or DSP control pre-programmed into the system of the present invention (e.g., 500), namely, a first (“audio program enhancing”) playback DSP mode, a second (“VA response enhancing”) playback DSP mode, a third (movie, music, sports, Voice Adjust audio program enhancing) playback DSP mode and a fourth (“Soundbar optimized home theater VA response enhancing”) playback DSP mode. The method for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal in a Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500, comprises:
(a) providing an enhanced VC Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker soundbar system 552 including at least one microphone transducer 524 configured and aimed to receive a first user signal (e.g., 522) spoken by a user 106 including a wake word, trigger sound or user query or command (e.g., such as 122, 210); and a DSP system, controller or processor (e.g., as illustrated in Figs. 5 and 7) configured to implement a third (“soundbar subwoofer audio program”) playback DSP mode and a fourth (“soundbar optimized VA output signal”) playback mode which differs from said first or third DSP modes to process the first user signal 522 and generate an enhanced-intelligibility first Voice Assist (“VA”) output signal 520 reproduced primarily through the soundbar 552 in response to the first user signal 522;
(b) sensing said first user signal (e.g., 522) spoken by a user 106 (e.g., a user’s spoken voice command or wake word);
(c) attenuating or muting program material playing through said Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500; and
(d) changing the DSP from the third mode used for program material playback to the fourth VA message playback mode providing enhanced intelligibility.
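Steps (b) through (d), together with the return to program playback described in paragraph [057], can be sketched as a small state machine. The class and method names (`Mode`, `ModeSwitcher`, `on_user_signal`, `on_va_response_finished`) and the action log are hypothetical and exist only to make the ordering of the steps explicit.

```python
from enum import Enum

class Mode(Enum):
    PROGRAM = "third (program playback) mode"
    VA = "fourth (VA playback) mode"

class ModeSwitcher:
    def __init__(self):
        self.mode = Mode.PROGRAM
        self.program_muted = False
        self.log = []

    def on_user_signal(self):
        """Steps (b)-(d): sense the signal, duck the program, then switch the DSP."""
        self.log.append("sensed user signal")
        if not self.program_muted:
            self.program_muted = True
            self.log.append("attenuate or mute program material")
        if self.mode is not Mode.VA:
            self.mode = Mode.VA
            self.log.append("switch DSP to VA mode")

    def on_va_response_finished(self, follow_up_sensed):
        """Stay in VA mode for a follow-up; otherwise restore program playback."""
        if follow_up_sensed:
            self.on_user_signal()
            return
        self.mode = Mode.PROGRAM
        self.program_muted = False
        self.log.append("restore program mode and resume playback")
```

The program material is attenuated before the DSP mode changes, matching the order of steps (c) and (d), and the switcher only returns to the program mode when no further user signal is sensed.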
[059] Having described and illustrated preferred embodiments of a new and improved or enhanced VC speaker system 400, an enhanced Voice-Controlled Soundbar Subwoofer Loudspeaker Product 500 and a novel pre-programmed, dedicated DSP or audio program mode switching method for more intelligible VA response playback, it is believed that other modifications, variations and changes will be suggested to those skilled in the art in view of the teachings set forth herein.
It is therefore to be understood that all such variations, modifications and changes are believed to fall within the scope of the present invention as set forth in the appended claims.

Claims

What is Claimed is:
1. A dedicated system and method for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal in a Voice-Controlled Loudspeaker Product, comprises:
an enhanced VC speaker system 404 including at least one microphone transducer 124 configured and aimed to receive a first user signal (e.g., 422) spoken by a user 106 including wake word, trigger sound or user query or command (e.g., 422); and
a controller or processor 432 configured to implement a first (“audio program enhancing”) playback DSP mode and a second (“VA response enhancing”) playback DSP mode which differs from said first DSP mode to process the first user signal 422 and generate an enhanced-intelligibility first Voice Assist (“VA”) response output signal 420 in response to the first user signal 422.
2. The dedicated system and method for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant in a Voice-Controlled Loudspeaker Product of claim 1:
wherein the enhanced VC speaker system 404 is programmed to switch from the first audio-playback DSP settings (which may include user-selectable settings for movie effects or increased low frequencies, etc.) to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility) in response to a sensed user command or wake word (e.g., 422), and then, if no further user commands are sensed, switching enhanced VC speaker system 404 back from said second (VA response enhancing) playback DSP mode to said first playback DSP mode by restoring the first audio-playback DSP settings.
3. The dedicated system and method for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant in a Voice-Controlled Loudspeaker Product of claim 2, wherein the enhanced VC speaker system 404 is programmed to attenuate or mute program material in response to sensing said user command before changing from said first audio-playback DSP settings (which may include user-selectable settings for movie effects or increased low frequencies, etc.) to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
4. The dedicated system and method for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant in a Voice-Controlled Loudspeaker Product of claim 2, wherein the enhanced VC speaker system 404 is programmed to change a master volume setting to a constrained subset of a master volume range when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
5. The dedicated system and method for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant in a Voice-Controlled Loudspeaker Product of claim 2, wherein the enhanced VC speaker system 404 is programmed to mute or bypass any specialization processing such as SRS, 3D or D2 Widesound processing when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
6. The dedicated system and method for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant in a Voice-Controlled Loudspeaker Product of claim 2, wherein the enhanced VC speaker system 404 is programmed to disable or mute any “surround left” or “surround right” signals when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
7. A dedicated system 500 for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal 520 in a Voice-Controlled Soundbar/Subwoofer Loudspeaker Product, comprises:
an enhanced VC Voice-Controlled Soundbar/Subwoofer Loudspeaker including a subwoofer 554 and a soundbar 552 including at least one microphone transducer 524 configured and aimed to receive a first user signal (e.g., 522) spoken by a user 106 including wake word, trigger sound or user query or command (e.g., such as 122, 210); and
a DSP system programmed into a controller or processor 532 configured to implement a first (“soundbar subwoofer audio program”) playback DSP mode and a second (“soundbar optimized VA output signal”) playback DSP mode which differs from said first playback DSP mode to process the first user signal 522 and generate an enhanced-intelligibility first Voice Assist (“VA”) output signal 520 reproduced in a passband between a selected lower frequency (e.g., 80Hz) and 20kHz through the soundbar 552 in response to the first user signal 522.
8. The dedicated system for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal in a Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 of Claim 7, wherein the enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to switch from the first audio-playback DSP settings (which may include user-selectable settings for movie effects or increased low frequencies, etc.) to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility) in response to a sensed user command or wake word (e.g., 522), and then, if no further user commands are sensed, switching the Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 back from said second (VA response enhancing) playback DSP mode to said first playback DSP mode by restoring the first audio-playback DSP settings.
9. The dedicated system for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal in a Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 of Claim 8,
wherein the enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to attenuate or mute program material in response to sensing said user command before changing from said first audio-playback DSP settings (which may include user-selectable settings for movie effects or increased low frequencies, etc.) to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
10. The dedicated system for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal in a Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 of Claim 8,
wherein the enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to change a master volume setting to a constrained subset of a master volume range when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
11. The dedicated system for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal in a Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 of Claim 8,
wherein the enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to generate substantially no subwoofer signal upon changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
12. The dedicated system for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal in a Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 of Claim 11,
wherein the enhanced Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to filter the VA response signal in a manner which permits all of the VA response signal 520 to be played through the soundbar 552 when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
13. The dedicated system for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal in a Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 of Claim 8,
wherein the Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to disable or mute any “surround left” or “surround right” signals when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
14. The dedicated system for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal in a Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 of Claim 8,
wherein the Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 is programmed to mute or bypass any specialization processing such as SRS, 3D or D2 Widesound processing when changing to the second VA dedicated DSP settings (for enhanced VA audio quality and intelligibility).
15. A method for optimizing the sound quality of and a user’s intelligibility of a Voice Assistant Output Signal in a Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500, comprises:
providing an enhanced VC Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker soundbar system 552 including at least one microphone transducer 524 configured and aimed to receive a first user signal (e.g., 522) spoken by a user 106 including wake word, trigger sound or user query or command (e.g., such as 122, 210); and a DSP system, controller or processor (e.g., as illustrated in Fig. 7) configured to implement four dedicated DSP or audio program modes including a first (“audio program enhancing”) playback DSP mode, a second (“VA response enhancing”) playback DSP mode, a third (“soundbar subwoofer audio program”) playback DSP mode and a fourth (“soundbar optimized VA output signal”) playback mode which differs from said first or third DSP modes to process the first user signal 522 and generate an enhanced-intelligibility first Voice Assist (“VA”) output signal 520 reproduced primarily through the soundbar 552 in response to the first user signal 522;
sensing said first user signal (e.g., 522) spoken by a user 106 (e.g., a user’s spoken voice command or wake word);
attenuating or muting program material playing through said Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500; and
changing the DSP from the third mode used for program material playback to the fourth VA message playback mode providing enhanced intelligibility.
16. The method of claim 15, further comprising:
playing or generating the audible VA response (e.g., 520) to the first user signal (e.g., 522) spoken by a user 106; and
monitoring for a subsequent user signal (e.g., 522) spoken by said user 106 in response to previous VA response, and, if none, resuming playback of said program material playing through said Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 with said third DSP mode settings.
17. The method of claim 15, further including:
sensing whether said user 106 provides a subsequent user signal (e.g., 522) spoken by a user 106, and, if so, attenuating or muting program material playing through said Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500;
changing the DSP from the third mode used for program material playback to the fourth VA message playback mode providing enhanced intelligibility;
playing or generating the audible VA response (e.g., 520) to the first user signal (e.g., 522) spoken by a user 106; and
monitoring for a subsequent user signal (e.g., 522) spoken by said user 106 in response to previous VA response, and, if none, resuming playback of said program material playing through said Voice-Controlled Soundbar Subwoofer Home Theater Loudspeaker Product 500 with said third DSP mode settings.
18. The method of claim 17, wherein said step of changing the DSP from the third mode used for program material playback to the fourth VA message playback mode providing enhanced intelligibility comprises applying a high pass filter to said voice assist signal and providing said filtered voice assist signal solely to amplifiers driving mid-bass drivers in said enhanced VC soundbar speaker system 552.
19. The method of claim 18, wherein said step of changing the DSP from the third mode used for program material playback to the fourth VA message playback mode providing enhanced intelligibility comprises applying a high pass filter to said voice assist signal and then compressing the dynamic range of said filtered voice assist signal and providing said filtered, compressed voice assist signal solely to amplifiers driving mid-bass drivers in said enhanced VC soundbar speaker system 552.
PCT/US2018/068074 2017-12-29 2018-12-29 Voice-control soundbar loudspeaker system with dedicated dsp settings for voice assistant output signal and mode switching method WO2019133942A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP18895921.7A EP3776169A4 (en) 2017-12-29 2018-12-29 Voice-control soundbar loudspeaker system with dedicated dsp settings for voice assistant output signal and mode switching method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762611832P 2017-12-29 2017-12-29
US62/611,832 2017-12-29

Publications (1)

Publication Number Publication Date
WO2019133942A1 true WO2019133942A1 (en) 2019-07-04

Family

ID=67064150

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/068074 WO2019133942A1 (en) 2017-12-29 2018-12-29 Voice-control soundbar loudspeaker system with dedicated dsp settings for voice assistant output signal and mode switching method

Country Status (2)

Country Link
EP (1) EP3776169A4 (en)
WO (1) WO2019133942A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9692742B1 (en) * 2014-12-23 2017-06-27 Amazon Technologies, Inc. Third party audio announcements

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040247134A1 (en) * 2003-03-18 2004-12-09 Miller Robert E. System and method for compatible 2D/3D (full sphere with height) surround sound reproduction
US6937737B2 (en) 2003-10-27 2005-08-30 Britannia Investment Corporation Multi-channel audio surround sound from front located loudspeakers
US7231053B2 (en) 2003-10-27 2007-06-12 Britannia Investment Corp. Enhanced multi-channel audio surround sound from front located loudspeakers
US7817812B2 (en) 2005-05-31 2010-10-19 Polk Audio, Inc. Compact audio reproduction system with large perceived acoustic size and image
US20120215537A1 (en) * 2011-02-17 2012-08-23 Yoshihiro Igarashi Sound Recognition Operation Apparatus and Sound Recognition Operation Method
US9060224B1 (en) 2012-06-01 2015-06-16 Rawles Llc Voice controlled assistant with coaxial speaker and microphone arrangement
US8971543B1 (en) 2012-06-25 2015-03-03 Rawles Llc Voice controlled assistant with stereo sound from two speakers
US20140093085A1 (en) * 2012-10-01 2014-04-03 Sonos, Inc. Providing a multi-channel and a multi-zone audio environment
US20140222436A1 (en) * 2013-02-07 2014-08-07 Apple Inc. Voice trigger for a digital assistant
US20160019907A1 (en) * 2013-04-11 2016-01-21 Nuance Communications, Inc. System For Automatic Speech Recognition And Audio Entertainment
US9277044B2 (en) 2013-05-09 2016-03-01 Steven P. Kahn Transportable wireless loudspeaker and system and method for managing multi-user wireless media playback over a media playback system
US9374640B2 (en) 2013-12-06 2016-06-21 Bradley M. Starobin Method and system for optimizing center channel performance in a single enclosure multi-element loudspeaker line array
US9584935B2 (en) 2015-05-29 2017-02-28 Sound United, Llc. Multi-zone media system and method for providing multi-zone media
US9706320B2 (en) 2015-05-29 2017-07-11 Sound United, LLC System and method for providing user location-based multi-zone media
US9807484B2 (en) 2015-10-02 2017-10-31 Sound United, LLC Loudspeaker system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3776169A4

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114302248A (en) * 2021-04-30 2022-04-08 海信视像科技股份有限公司 Display device and multi-window voice broadcasting method
CN114302248B (en) * 2021-04-30 2024-04-12 海信视像科技股份有限公司 Display equipment and multi-window voice broadcasting method
EP4020465A3 (en) * 2021-05-28 2022-11-30 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Method and apparatus for denoising voice data, storage medium, and program product
US11798573B2 (en) 2021-05-28 2023-10-24 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Method for denoising voice data, device, and storage medium
DE202022102864U1 (en) 2022-05-24 2023-09-05 Vierton-Audio AG Loudspeaker unit with hearing aid function

Also Published As

Publication number Publication date
EP3776169A4 (en) 2022-01-26
EP3776169A1 (en) 2021-02-17

Similar Documents

Publication Publication Date Title
AU2018200212B2 (en) Handsfree beam pattern configuration
US10440492B2 (en) Calibration of virtual height speakers using programmable portable devices
US20200107122A1 (en) Spatially ducking audio produced through a beamforming loudspeaker array
US9374640B2 (en) Method and system for optimizing center channel performance in a single enclosure multi-element loudspeaker line array
JP6009547B2 (en) Audio system and method for audio system
JP2011512768A (en) Audio apparatus and operation method thereof
JP5320303B2 (en) Sound reproduction apparatus and video / audio reproduction system
WO2019133942A1 (en) Voice-control soundbar loudspeaker system with dedicated dsp settings for voice assistant output signal and mode switching method
CN106792365B (en) Audio playing method and device
WO2021133779A1 (en) Audio device with speech-based audio signal processing
KR20050085360A (en) Personalized surround sound headphone system
JP2008537374A (en) Audio data processing apparatus, audio data processing method, program element, and computer-readable medium
JP4036140B2 (en) Sound output system
JP2021513263A (en) How to do dynamic sound equalization
US9813039B2 (en) Multiband ducker
JP4418479B2 (en) Sound playback device
WO2019136460A1 (en) Synchronized voice-control module, loudspeaker system and method for incorporating vc functionality into a separate loudspeaker system
WO2019139991A1 (en) System and method for generating an improved voice assist algorithm signal input
US20220360899A1 (en) Dynamics processing across devices with differing playback capabilities
RU2804680C2 (en) Playback at lower level
Sigismondi Personal monitor systems
JP5194614B2 (en) Sound field generator
JP2007295634A (en) Sound output system
KR100703923B1 (en) 3d sound optimizing apparatus and method for multimedia devices
CN113728661A (en) Lower layer reproduction

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 18895921
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

ENP Entry into the national phase
Ref document number: 2018895921
Country of ref document: EP
Effective date: 20200729

32PN EP: Public notification in the EP bulletin as address of the addressee cannot be established
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 08.10.2020)