EP4670366A1 - Playback devices with dedicated high-frequency transducers - Google Patents

Playback devices with dedicated high-frequency transducers

Info

Publication number
EP4670366A1
EP4670366A1 EP24713302.8A EP24713302A EP4670366A1 EP 4670366 A1 EP4670366 A1 EP 4670366A1 EP 24713302 A EP24713302 A EP 24713302A EP 4670366 A1 EP4670366 A1 EP 4670366A1
Authority
EP
European Patent Office
Prior art keywords
playback
playback device
audio
transducer
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP24713302.8A
Other languages
English (en)
French (fr)
Inventor
Jerad Lewis
Desiree AZZALINA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sonos Inc
Original Assignee
Sonos Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sonos Inc

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
    • H04R27/00 Public address systems
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/22 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired frequency characteristic only
    • H04R1/24 Structural combinations of separate transducers or of two parts of the same transducer and responsive respectively to two or more frequency ranges
    • H04R1/26 Spatial arrangements of separate transducers responsive to two or more frequency ranges
    • H04R29/00 Monitoring arrangements; Testing arrangements
    • H04R29/007 Monitoring arrangements; Testing arrangements for public address systems
    • H04R2201/00 Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
    • H04R2201/003 MEMS transducers or their use
    • H04R2201/02 Details casings, cabinets or mounting therein for transducers covered by H04R1/02 but not provided for in any of its subgroups
    • H04R2201/028 Structural combinations of loudspeakers with built-in power amplifiers, e.g. in the same acoustic enclosure
    • H04R2227/00 Details of public address [PA] systems covered by H04R27/00 but not provided for in any of its subgroups
    • H04R2227/005 Audio distribution systems for home, i.e. multi-room use

Definitions

  • the present disclosure is related to consumer goods and, more particularly, to methods, systems, products, features, services, and other elements directed to media playback or some aspect thereof.
  • Media content e.g., songs, podcasts, video sound
  • playback devices such that each room with a playback device can play back corresponding different media content.
  • rooms can be grouped together for synchronous playback of the same media content, and/or the same media content can be heard in all rooms synchronously.
  • Figure 1A is a partial cutaway view of an environment having a media playback system configured in accordance with aspects of the disclosed technology.
  • Figure 1B is a schematic diagram of the media playback system of Figure 1A and one or more networks.
  • Figure 1C is a block diagram of a playback device.
  • Figure 1D is a block diagram of a playback device.
  • Figure 1E is a block diagram of a bonded playback device.
  • Figure 1F is a block diagram of a network microphone device.
  • Figure 1G is a block diagram of a playback device.
  • Figure 1H is a partial schematic diagram of a control device.
  • Figures 1I through 1L are schematic diagrams of corresponding media playback system zones.
  • Figure 1M is a schematic diagram of media playback system areas.
  • Figure 2A is a front isometric view of a playback device configured in accordance with aspects of the disclosed technology.
  • Figure 2B is a front isometric view of the playback device of Figure 3A without a grille.
  • Figure 2C is an exploded view of the playback device of Figure 2A.
  • Figure 3A is a front view of a network microphone device configured in accordance with aspects of the disclosed technology.
  • Figure 3B is a side isometric view of the network microphone device of Figure 3A.
  • Figure 3C is an exploded view of the network microphone device of Figures 3A and 3B.
  • Figure 3D is an enlarged view of a portion of Figure 3B.
  • Figure 3E is a block diagram of the network microphone device of Figures 3A-3D.
  • Figure 3F is a schematic diagram of an example voice input.
  • Figures 4A-4D are schematic diagrams of a control device in various stages of operation in accordance with aspects of the disclosed technology.
  • Figure 5 is a front view of a control device in accordance with aspects of the disclosed technology.
  • Figure 6 is a message flow diagram of a media playback system in accordance with aspects of the disclosed technology.
  • Figure 7 is a diagram illustrating an example of a playback device including an auxiliary transducer in accordance with aspects of the disclosed technology.
  • Figure 8 is a block diagram of circuitry as may be included in examples of the playback device of Figure 7 in accordance with aspects of the disclosed technology.
  • Figure 9 is a schematic diagram illustrating an example of an audio-based identification technique in accordance with aspects of the disclosed technology.
  • Figure 10 is a diagram of a playback device configured to perform room detection using acoustic signals in accordance with aspects of the disclosed technology.
  • Figure 11 is a block diagram of an example of a pair of playback devices configured to perform distance determination using acoustic signals in accordance with aspects of the disclosed technology.
  • Figure 12 is a sequence diagram for one example of a distance determination process in accordance with aspects of the disclosed technology.
  • Figure 13A is an isometric view of a network device configured in accordance with aspects of the disclosed technology.
  • Figure 13B is a block diagram of circuitry as may be included in examples of the device of Figure 13A in accordance with some aspects of the disclosed technology.
  • Figure 14 is an isometric view of another network device configured in accordance with aspects of the disclosed technology.
  • Figure 15A is a schematic diagram illustrating an example of a network device communication in accordance with aspects of the disclosed technology.
  • Figure 15B is a schematic diagram illustrating an example of the network device of Figure 15A communicating in accordance with aspects of the disclosed technology.
  • Figure 15C is a block diagram of circuitry as may be included in examples of the device of Figures 15A and 15B in accordance with some aspects of the disclosed technology.
  • Figure 16 is a schematic diagram illustrating an example vehicle configured in accordance with aspects of the disclosed technology.
  • Embodiments described herein relate to transmission and reception of acoustic signals for presence detection and other purposes, and to playback devices configured for the same.
  • Playback devices can be controlled to transmit/output acoustic signals, such as acoustic chirp signals, that are separate from the playback of audio content by the playback devices.
  • acoustic signals can be detected by the playback devices and/or other playback devices in a media playback system (via a microphone, for example) and used for a variety of different purposes and applications, including presence detection, room detection, distance determination, or secure set-up functions, to name a few examples.
  • reception of acoustic chirp signals can be used for presence detection of nearby playback devices.
  • the acoustic chirp signals are unique to each playback device within a media playback system and, as such, can be analyzed to identify the one or more playback devices and subsequently determine which playback device is nearest to the receiving device (based on the strength of the acoustic signal, for example). In some examples, this allows the transfer of a playback session between the receiving device and the nearest playback device.
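  • As a minimal sketch of the nearest-device selection described above, the following Python example picks the playback device whose identified chirp was received most strongly. The `ChirpDetection` record, the `device_id` strings, and the use of a decibel level as the strength measure are assumptions made for illustration; the disclosure does not specify a data format or a strength metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChirpDetection:
    """One received chirp: the device it identifies and how strongly it arrived."""
    device_id: str      # identifier decoded from the chirp (hypothetical format)
    strength_db: float  # received level in dB; a higher value means a stronger signal


def nearest_device(detections):
    """Return the device_id with the strongest detection, or None if there are none."""
    best = {}
    for d in detections:
        # Keep only the strongest reading seen so far for each device.
        if d.device_id not in best or d.strength_db > best[d.device_id]:
            best[d.device_id] = d.strength_db
    if not best:
        return None
    return max(best, key=best.get)


detections = [
    ChirpDetection("110a", -42.0),
    ChirpDetection("110b", -31.5),
    ChirpDetection("110a", -39.0),
]
print(nearest_device(detections))  # prints "110b"
```

  • In this sketch, a playback session transfer would target the returned device; treating signal strength as a proxy for proximity is only as reliable as the acoustic conditions of the room allow.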
  • the reception of the acoustic chirp signals can be used to determine distances between the receiving device and other playback devices or other structures, such as walls, for example.
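  • The passage above does not state how a distance is derived from a received chirp. One common approach, assumed here purely for illustration, is time of flight: multiply the propagation delay by the speed of sound (roughly 343 m/s in air at about 20 °C). The function names and the assumption that emit and receive timestamps share a clock are hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value for dry air at about 20 degrees C


def one_way_distance_m(emit_time_s, receive_time_s):
    """Distance between emitter and receiver, assuming both timestamps share a clock."""
    elapsed = receive_time_s - emit_time_s
    if elapsed < 0:
        raise ValueError("receive time precedes emit time")
    return SPEED_OF_SOUND_M_PER_S * elapsed


def echo_distance_m(round_trip_s):
    """Distance to a reflecting surface such as a wall, from a round-trip echo delay."""
    if round_trip_s < 0:
        raise ValueError("round-trip delay must be non-negative")
    # The chirp travels to the surface and back, so halve the path length.
    return SPEED_OF_SOUND_M_PER_S * round_trip_s / 2.0
```

  • Under these assumptions, a 10 ms one-way delay corresponds to about 3.4 m, and a 20 ms echo delay corresponds to a wall about 3.4 m away.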
  • Playback devices generally use high frequency transducers (e.g., tweeters) to output the acoustic chirp signals.
  • certain playback devices or other devices in media playback systems such as a subwoofer or amplifier, for example, may not include a tweeter and therefore may lack the capability to transmit/output the acoustic chirp signals. Such devices therefore may not be able to employ the functionality associated with the use of acoustic chirp signals.
  • aspects and embodiments are directed to an approach in which devices that otherwise lack a tweeter or other high frequency transducer are equipped with a dedicated onboard auxiliary transducer, such as a piezoelectric or a micro-electromechanical system (MEMS) transducer, for example, configured to emit high frequency acoustic signals, and thereby enable such devices to employ acoustic chirp functionality.
  • a playback device includes a first audio transducer, and audio playback circuitry coupled to the first audio transducer to allow the playback device to play back audio content, optionally in synchrony with other playback devices, as discussed below.
  • the playback device further includes a second audio transducer uncoupled from the audio playback circuitry and configured to emit high frequency acoustic signals.
  • the playback device further includes one or more processors and at least one non-transitory computer readable medium storing program instructions executable by the one or more processors to control the playback device to play back one or more channels of audio content via the audio playback circuitry and the first audio transducer, and acoustically transmit, via the second audio transducer, a reference audio signal including an identifier that identifies the playback device.
  • the second audio transducer is a piezoelectric or MEMS transducer, as discussed further below.
  • the playback device may be a subwoofer, for example.
  • Figure 1A is a partial cutaway view of a media playback system 100 distributed in an environment 101 (e.g., a house).
  • the media playback system 100 comprises one or more playback devices 110 (identified individually as playback devices 110a-n), one or more network microphone devices 120 (“NMDs”) (identified individually as NMDs 120a-c), and one or more control devices 130 (identified individually as control devices 130a and 130b).
  • a playback device can generally refer to a network device configured to receive, process, and output data of a media playback system.
  • a playback device can be a network device that receives and processes audio content.
  • a playback device includes one or more transducers or speakers powered by one or more amplifiers.
  • a playback device includes one of (or neither of) the speaker and the amplifier.
  • a playback device can comprise one or more amplifiers configured to drive one or more speakers external to the playback device via a corresponding wire or cable.
  • the term “NMD” (i.e., a “network microphone device”) can generally refer to a network device that is configured for audio detection.
  • an NMD is a stand-alone device configured primarily for audio detection.
  • an NMD is incorporated into a playback device (or vice versa).
  • the term “control device” can generally refer to a network device configured to perform functions relevant to facilitating user access, control, and/or configuration of the media playback system 100.
  • Each of the playback devices 110 is configured to receive audio signals or data from one or more media sources (e.g., one or more remote servers, one or more local devices, etc.) and play back the received audio signals or data as sound.
  • the one or more NMDs 120 are configured to receive spoken word commands
  • the one or more control devices 130 are configured to receive user input.
  • the media playback system 100 can play back audio via one or more of the playback devices 110.
  • the playback devices 110 are configured to commence playback of media content in response to a trigger.
  • one or more of the playback devices 110 can be configured to play back a morning playlist upon detection of an associated trigger condition (e.g., presence of a user in a kitchen, detection of a coffee machine operation, etc.).
  • the media playback system 100 is configured to play back audio from a first playback device (e.g., the playback device 110a) in synchrony with a second playback device (e.g., the playback device 110b).
  • Interactions between the playback devices 110, NMDs 120, and/or control devices 130 of the media playback system 100 configured in accordance with the various embodiments of the disclosure are described in greater detail below with respect to Figures 1B-6.
  • the environment 101 comprises a household having several rooms, spaces, and/or playback zones, including (clockwise from upper left) a master bathroom 101a, a master bedroom 101b, a second bedroom 101c, a family room or den 101d, an office 101e, a living room 101f, a dining room 101g, a kitchen 101h, and an outdoor patio 101i. While certain embodiments and examples are described below in the context of a home environment, the technologies described herein may be implemented in other types of environments.
  • the media playback system 100 can be implemented in one or more commercial settings (e.g., a restaurant, mall, airport, hotel, a retail or other store), one or more vehicles (e.g., a sports utility vehicle, bus, car, a ship, a boat, an airplane, etc.), multiple environments (e.g., a combination of home and vehicle environments), and/or another suitable environment where multi-zone audio may be desirable.
  • the media playback system 100 can comprise one or more playback zones, some of which may correspond to the rooms in the environment 101.
  • the media playback system 100 can be established with one or more playback zones, after which additional zones may be added, or removed, to form, for example, the configuration shown in Figure 1A.
  • Each zone may be given a name according to a different room or space such as the office 101e, master bathroom 101a, master bedroom 101b, the second bedroom 101c, kitchen 101h, dining room 101g, living room 101f, and/or the balcony 101i.
  • a single playback zone may include multiple rooms or spaces.
  • a single room or space may include multiple playback zones.
  • the second bedroom 101c, the office 101e, the living room 101f, the dining room 101g, the kitchen 101h, and the outdoor patio 101i each include one playback device 110, and the master bathroom 101a, the master bedroom 101b, and the den 101d include a plurality of playback devices 110.
  • the playback devices 110l and 110m may be configured, for example, to play back audio content in synchrony as individual ones of playback devices 110, as a bonded playback zone, as a consolidated playback device, and/or any combination thereof.
  • the playback devices 110h-k can be configured, for instance, to play back audio content in synchrony as individual ones of playback devices 110, as one or more bonded playback devices, and/or as one or more consolidated playback devices. Additional details regarding bonded and consolidated playback devices are described below with respect to Figures 1B, 1E, and 1I-1M.
  • one or more of the playback zones in the environment 101 may each be playing different audio content.
  • a user may be grilling on the patio 101i and listening to hip hop music being played by the playback device 110c while another user is preparing food in the kitchen 101h and listening to classical music played by the playback device 110b.
  • a playback zone may play the same audio content in synchrony with another playback zone.
  • the user may be in the office 101e listening to the playback device 110f playing back the same hip hop music being played back by playback device 110c on the patio 101i.
  • Figure 1B is a schematic diagram of the media playback system 100 and a cloud network 102. For ease of illustration, certain devices of the media playback system 100 and the cloud network 102 are omitted from Figure 1B.
  • One or more communication links 103 (referred to hereinafter as “the links 103”) communicatively couple the media playback system 100 and the cloud network 102.
  • the links 103 can comprise, for example, one or more wired networks, one or more wireless networks, one or more wide area networks (WAN), one or more local area networks (LAN), one or more personal area networks (PAN), one or more telecommunication networks (e.g., one or more Global System for Mobiles (GSM) networks, Code Division Multiple Access (CDMA) networks, Long-Term Evolution (LTE) networks, 5G communication networks, and/or other suitable data transmission protocol networks), etc.
  • the cloud network 102 is configured to deliver media content (e.g., audio content, video content, photographs, social media content, etc.) to the media playback system 100 in response to a request transmitted from the media playback system 100 via the links 103.
  • the cloud network 102 is further configured to receive data (e.g., voice input data) from the media playback system 100 and correspondingly transmit commands and/
  • the cloud network 102 comprises computing devices 106 (identified separately as a first computing device 106a, a second computing device 106b, and a third computing device 106c).
  • the computing devices 106 can comprise individual computers or servers, such as, for example, a media streaming service server storing audio and/or other media content, a voice service server, a social media server, a media playback system control server, etc.
  • one or more of the computing devices 106 comprise modules of a single computer or server.
  • one or more of the computing devices 106 comprise one or more modules, computers, and/or servers.
  • While the cloud network 102 is described above in the context of a single cloud network, in some embodiments the cloud network 102 comprises a plurality of cloud networks comprising communicatively coupled computing devices. Furthermore, while the cloud network 102 is shown in Figure 1B as having three of the computing devices 106, in some embodiments, the cloud network 102 comprises fewer (or more than) three computing devices 106.
  • the media playback system 100 is configured to receive media content from the networks 102 via the links 103.
  • the received media content can comprise, for example, a Uniform Resource Identifier (URI) and/or a Uniform Resource Locator (URL).
  • the media playback system 100 can stream, download, or otherwise obtain data from a URI or a URL corresponding to the received media content.
  • a network 104 communicatively couples the links 103 and at least a portion of the devices (e.g., one or more of the playback devices 110, NMDs 120, and/or control devices 130) of the media playback system 100.
  • the network 104 can include, for example, a wireless network (e.g., a WI-FI network, a BLUETOOTH, a Z-WAVE network, a ZIGBEE network, and/or other suitable wireless communication protocol network) and/or a wired network (e.g., a network comprising Ethernet, Universal Serial Bus (USB), and/or another suitable wired communication).
  • WI-FI can refer to several different communication protocols including, for example, Institute of Electrical and Electronics Engineers (IEEE) 802.11a, 802.11b, 802.11g, 802.11n, 802.11ac, 802.
  • GHz gigahertz
  • the network 104 comprises a dedicated communication network that the media playback system 100 uses to transmit messages between individual devices and/or to transmit media content to and from media content sources (e.g., one or more of the computing devices 106).
  • the network 104 is configured to be accessible only to devices in the media playback system 100, thereby reducing interference and competition with other household devices.
  • the network 104 comprises an existing household or commercial facility communication network (e.g., a household or commercial facility WI-FI network).
  • the links 103 and the network 104 comprise one or more of the same networks.
  • the links 103 and the network 104 comprise a telecommunication network (e.g., an LTE network, a 5G network, etc.).
  • the media playback system 100 is implemented without the network 104, and devices comprising the media playback system 100 can communicate with each other, for example, via one or more direct connections, PANs, telecommunication networks, and/or other suitable communication links.
  • the network 104 may be referred to herein as a “local communication network” to differentiate the network 104 from the cloud network 102 that couples the media playback system 100 to remote devices, such as cloud servers that host cloud services.
  • audio content sources may be regularly added or removed from the media playback system 100.
  • the media playback system 100 performs an indexing of media items when one or more media content sources are updated, added to, and/or removed from the media playback system 100.
  • the media playback system 100 can scan identifiable media items in some or all folders and/or directories accessible to the playback devices 110, and generate or update a media content database comprising metadata (e.g., title, artist, album, track length, etc.) and other associated information (e.g., URIs, URLs, etc.) for each identifiable media item found.
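  • The indexing step described above can be sketched as follows, assuming media items are ordinary files under a set of root directories. The function name `build_media_index`, the extension list, and the use of the file name as a stand-in for the title are hypothetical; a real index would read title, artist, album, and track length from the file's embedded tags, which this sketch does not attempt.

```python
import os
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".flac", ".wav"}  # hypothetical set of recognized types


def build_media_index(root_dirs):
    """Walk each root directory and return {file URI: metadata} for recognized files."""
    index = {}
    for root in root_dirs:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                stem, ext = os.path.splitext(name)
                if ext.lower() not in AUDIO_EXTENSIONS:
                    continue  # skip files that are not identifiable media items
                path = Path(dirpath, name).resolve()
                index[path.as_uri()] = {
                    "title": stem,  # placeholder: file name used in place of a tag title
                    "size_bytes": path.stat().st_size,
                }
    return index
```

  • Rerunning `build_media_index` after a content source is added or removed would regenerate the database; an incremental update is left out of this sketch.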
  • the media content database is stored on one or more of the playback devices 110, network microphone devices 120, and/or control devices 130.
  • the playback devices 110l and 110m comprise a group 107a.
  • the playback devices 110l and 110m can be positioned in different rooms and be grouped together in the group 107a on a temporary or permanent basis based on user input received at the control device 130a and/or another control device 130 in the media playback system 100.
  • the playback devices 110l and 110m can be configured to play back the same or similar audio content in synchrony from one or more audio content sources.
  • the group 107a comprises a bonded zone in which the playback devices 110l and 110m comprise left audio and right audio channels, respectively, of multi-channel audio content, thereby producing or enhancing a stereo effect of the audio content.
  • the group 107a includes additional playback devices 110.
  • the media playback system 100 omits the group 107a and/or other grouped arrangements of the playback devices 110. Additional details regarding groups and other arrangements of playback devices are described in further detail below with respect to Figures 1I through 1M.
  • the media playback system 100 includes the NMDs 120a and 120b, each comprising one or more microphones configured to receive voice utterances from a user.
  • the NMD 120a is a standalone device and the NMD 120d is integrated into the playback device 110n.
  • the NMD 120a, for example, is configured to receive voice input 121 from a user 123.
  • the NMD 120a transmits data associated with the received voice input 121 to a voice assistant service (VAS) configured to (i) process the received voice input data and (ii) facilitate one or more operations on behalf of the media playback system 100.
  • the computing device 106c comprises one or more modules and/or servers of a VAS (e.g., a VAS operated by one or more of SONOS, AMAZON, GOOGLE, APPLE, MICROSOFT, etc.).
  • the computing device 106c can receive the voice input data from the NMD 120a via the network 104 and the links 103.
  • In response to receiving the voice input data, the computing device 106c processes the voice input data (i.e., “Play Hey Jude by The Beatles”), and determines that the processed voice input includes a command to play a song (e.g., “Hey Jude”). In some embodiments, after processing the voice input, the computing device 106c accordingly transmits commands to the media playback system 100 to play back “Hey Jude” by the Beatles from a suitable media service (e.g., via one or more of the computing devices 106) on one or more of the playback devices 110. In other embodiments, the computing device 106c may be configured to interface with media services on behalf of the media playback system 100.
  • after processing the voice input, instead of the computing device 106c transmitting commands to the media playback system 100 causing the media playback system 100 to retrieve the requested media from a suitable media service, the computing device 106c itself causes a suitable media service to provide the requested media to the media playback system 100 in accordance with the user’s voice utterance.
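  • To illustrate the kind of result the voice processing above might produce for an utterance such as “Play Hey Jude by The Beatles”, the following toy parser extracts a play command from already-transcribed text. It is an assumption-laden sketch, not the method of any voice assistant service: the function `parse_play_command`, the regular expression, and the returned dictionary shape are all hypothetical, and speech recognition itself is out of scope.

```python
import re

# "play <title>" optionally followed by "by <artist>"; case-insensitive.
PLAY_PATTERN = re.compile(
    r"^play\s+(?P<title>.+?)(?:\s+by\s+(?P<artist>.+))?$",
    re.IGNORECASE,
)


def parse_play_command(utterance):
    """Return a command dict for a 'play' utterance, or None if it is not one."""
    match = PLAY_PATTERN.match(utterance.strip())
    if match is None:
        return None
    return {
        "action": "play",
        "title": match.group("title"),
        "artist": match.group("artist"),  # None when no "by <artist>" is present
    }


print(parse_play_command("Play Hey Jude by The Beatles"))
```

  • A pattern this simple splits on the first " by ", so a title that itself contains the word "by" would be parsed incorrectly; a production service would rely on a trained language-understanding model instead.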
  • FIG. 1C is a block diagram of the playback device 110a comprising an input/output 111.
  • the input/output 111 can include an analog I/O 111a (e.g., one or more wires, cables, and/or other suitable communication links configured to carry analog signals) and/or a digital I/O 111b (e.g., one or more wires, cables, or other suitable communication links configured to carry digital signals).
  • the analog I/O 111a is an audio line-in input connection comprising, for example, an auto-detecting 3.5mm audio line-in connection.
  • the digital I/O 111b comprises a Sony/Philips Digital Interface Format (S/PDIF) communication interface and/or cable and/or a Toshiba Link (TOSLINK) cable.
  • the digital I/O 111b comprises a High-Definition Multimedia Interface (HDMI) interface and/or cable.
  • the digital I/O 111b includes one or more wireless communication links comprising, for example, a radio frequency (RF), infrared, WI-FI, BLUETOOTH, or another suitable communication link.
  • the analog I/O 111a and the digital I/O 111b comprise interfaces (e.g., ports, plugs, jacks, etc.) configured to receive connectors of cables transmitting analog and digital signals, respectively, without necessarily including cables.
  • the playback device 110a can receive media content (e.g., audio content comprising music and/or other sounds) from a local audio source 105 via the input/output 111 (e.g., a cable, a wire, a PAN, a BLUETOOTH connection, an ad hoc wired or wireless communication network, and/or another suitable communication link).
  • the local audio source 105 can comprise, for example, a mobile device (e.g., a smartphone, a tablet, a laptop computer, etc.) or another suitable audio component (e.g., a television, a desktop computer, an amplifier, a phonograph (such as an LP turntable), a Blu-ray player, a memory storing digital media files, etc.).
  • the local audio source 105 includes local music libraries on a smartphone, a computer, a network-attached storage (NAS), and/or another suitable device configured to store media files.
  • one or more of the playback devices 110, NMDs 120, and/or control devices 130 comprise the local audio source 105.
  • the media playback system omits the local audio source 105 altogether.
  • the playback device 110a does not include an input/output 111 and receives all audio content via the network 104.
  • the playback device 110a further comprises electronics 112, a user interface 113 (e.g., one or more buttons, knobs, dials, touch-sensitive surfaces, displays, touchscreens, etc.), and one or more transducers 114 (referred to hereinafter as “the transducers 114”).
  • the electronics 112 are configured to receive audio from an audio source (e.g., the local audio source 105) via the input/output 111 or one or more of the computing devices 106a-c via the network 104 (Figure 1B), amplify the received audio, and output the amplified audio for playback via one or more of the transducers 114.
  • the playback device 110a optionally includes one or more microphones 115 (e.g., a single microphone, a plurality of microphones, a microphone array) (hereinafter referred to as “the microphones 115”).
  • the playback device 110a having one or more of the optional microphones 115 can operate as an NMD configured to receive voice input from a user and correspondingly perform one or more operations based on the received voice input.
  • the electronics 112 comprise one or more processors 112a (referred to hereinafter as “the processors 112a”), memory 112b, software components 112c, a network interface 112d, one or more audio processing components 112g (referred to hereinafter as “the audio components 112g”), one or more audio amplifiers 112h (referred to hereinafter as “the amplifiers 112h”), and power 112i (e.g., one or more power supplies, power cables, power receptacles, batteries, induction coils, Power-over-Ethernet (PoE) interfaces, and/or other suitable sources of electric power).
  • the electronics 112 optionally include one or more other components 112j (e.g., one or more sensors, video displays, touchscreens, battery charging bases, etc.).
  • the processors 112a can comprise clock-driven computing component(s) configured to process data
  • the memory 112b can comprise a computer-readable medium (e.g., a tangible, non-transitory computer-readable medium loaded with one or more of the software components 112c) configured to store instructions for performing various operations and/or functions.
  • the processors 112a are configured to execute the instructions stored on the memory 112b to perform one or more of the operations.
  • the operations can include, for example, causing the playback device 110a to retrieve audio data from an audio source (e.g., one or more of the computing devices 106a-c (Figure 1B)) and/or another one of the playback devices 110.
  • the operations further include causing the playback device 110a to send audio data to another one of the playback devices 110 and/or another device (e.g., one of the NMDs 120).
  • Certain embodiments include operations causing the playback device 110a to pair with another of the one or more playback devices 110 to enable a multi-channel audio environment (e.g., a stereo pair, a bonded zone, etc.).
  • the processors 112a can be further configured to perform operations causing the playback device 110a to synchronize playback of audio content with another of the one or more playback devices 110.
  • a listener will preferably be unable to perceive time-delay differences between playback of the audio content by the playback device 110a and by the one or more other playback devices 110. Additional details regarding audio playback synchronization among playback devices can be found, for example, in U.S. Patent No. 8,234,395, incorporated by reference above.
  • the memory 112b is further configured to store data associated with the playback device 110a, such as one or more zones and/or zone groups of which the playback device 110a is a member, audio sources accessible to the playback device 110a, and/or a playback queue that the playback device 110a (and/or another of the one or more playback devices) can be associated with.
  • the stored data can comprise one or more state variables that are periodically updated and used to describe a state of the playback device 110a.
  • the memory 112b can also include data associated with a state of one or more of the other devices (e.g., the playback devices 110, NMDs 120, control devices 130) of the media playback system 100.
  • the state data is shared during predetermined intervals of time (e.g., every 5 seconds, every 10 seconds, every 60 seconds, etc.) among at least a portion of the devices of the media playback system 100, so that one or more of the devices have the most recent data associated with the media playback system 100.
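As a minimal sketch of how periodically shared state variables could be reconciled, the following assumes each device tags its state with a monotonically increasing version counter. The `DeviceState` class, the `version` field, and `merge_states` are hypothetical names introduced for illustration only; the source does not specify how the most recent data is identified.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceState:
    """State variables describing one device (hypothetical structure)."""
    device_id: str
    version: int = 0  # assumption: bumped on every local update
    variables: dict = field(default_factory=dict)

    def update(self, **changes):
        # Record the new values and mark the snapshot as newer.
        self.variables.update(changes)
        self.version += 1


def merge_states(local, received):
    """Keep, per device, whichever snapshot carries the higher version."""
    merged = dict(local)
    for device_id, state in received.items():
        if device_id not in merged or state.version > merged[device_id].version:
            merged[device_id] = state
    return merged
```

A device would call `merge_states` each time a sharing interval elapses and a peer's snapshot arrives, so that stale snapshots never overwrite newer ones.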
  • the network interface 112d is configured to facilitate a transmission of data between the playback device 110a and one or more other devices on a data network such as, for example, the links 103 and/or the network 104 (Figure 1B).
  • the network interface 112d is configured to transmit and receive data corresponding to media content (e.g., audio content, video content, text, photographs) and other signals (e.g., non-transitory signals) comprising digital packet data including an Internet Protocol (IP)-based source address and/or an IP-based destination address.
  • the network interface 112d can parse the digital packet data such that the electronics 112 properly receive and process the data destined for the playback device 110a.
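The destination-address check described above can be illustrated with a short sketch. The `Packet` type and `packets_for_device` function are hypothetical and stand in for whatever parsing the network interface 112d actually performs; only the standard-library `ipaddress` module is used.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Simplified packet with IP-based source and destination addresses."""
    src: str
    dst: str
    payload: bytes


def packets_for_device(packets, device_ip):
    """Return only the packets whose destination matches this device."""
    own_address = ipaddress.ip_address(device_ip)
    return [p for p in packets if ipaddress.ip_address(p.dst) == own_address]
```

Comparing parsed address objects rather than raw strings means equivalent spellings of the same address still match.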
  • the network interface 112d comprises one or more wireless interfaces 112e (referred to hereinafter as “the wireless interface 112e”).
  • the wireless interface 112e (e.g., a suitable interface comprising one or more antennae)
  • the wireless interface 112e can be configured to wirelessly communicate with one or more other devices (e.g., one or more of the other playback devices 110, NMDs 120, and/or control devices 130) that are communicatively coupled to the network 104 (Figure 1B) in accordance with a suitable wireless communication protocol (e.g., WI-FI, BLUETOOTH, LTE, etc.).
  • the network interface 112d optionally includes a wired interface 112f (e.g., an interface or receptacle configured to receive a network cable such as an Ethernet, a USB-A, USB-C, and/or Thunderbolt cable) configured to communicate over a wired connection with other devices in accordance with a suitable wired communication protocol.
  • the network interface 112d includes the wired interface 112f and excludes the wireless interface 112e.
  • the electronics 112 exclude the network interface 112d altogether and transmit and receive media content and/or other data via another communication path (e.g., the input/output 111).
  • the audio components 112g are configured to process and/or filter data comprising media content received by the electronics 112 (e.g., via the input/output 111 and/or the network interface 112d) to produce output audio signals.
  • the audio processing components 112g comprise, for example, one or more digital-to-analog converters (DACs), audio preprocessing components, audio enhancement components, digital signal processors (DSPs), and/or other suitable audio processing components, modules, circuits, etc.
  • one or more of the audio processing components 112g can comprise one or more subcomponents of the processors 112a.
  • the electronics 112 omit the audio processing components 112g.
  • the processors 112a execute instructions stored on the memory 112b to perform audio processing operations to produce the output audio signals.
  • the amplifiers 112h are configured to receive and amplify the audio output signals produced by the audio processing components 112g and/or the processors 112a.
  • the amplifiers 112h can comprise electronic devices and/or components configured to amplify audio signals to levels sufficient for driving one or more of the transducers 114.
  • the amplifiers 112h include one or more switching or class-D power amplifiers.
  • the amplifiers 112h include one or more other types of power amplifiers (e.g., linear gain power amplifiers, class-A amplifiers, class-B amplifiers, class-AB amplifiers, class-C amplifiers, class-D amplifiers, class-E amplifiers, class-F amplifiers, class-G amplifiers, class-H amplifiers, and/or another suitable type of power amplifier).
  • the amplifiers 112h comprise a suitable combination of two or more of the foregoing types of power amplifiers.
  • individual ones of the amplifiers 112h correspond to individual ones of the transducers 114.
  • the electronics 112 include a single one of the amplifiers 112h configured to output amplified audio signals to a plurality of the transducers 114. In some other embodiments, the electronics 112 omit the amplifiers 112h.
  • the transducers 114 receive the amplified audio signals from the amplifiers 112h and render or output the amplified audio signals as sound (e.g., audible sound waves having a frequency between about 20 Hertz (Hz) and 20 kilohertz (kHz)).
  • the transducers 114 can comprise a single transducer. In other embodiments, however, the transducers 114 comprise a plurality of audio transducers. In some embodiments, the transducers 114 comprise more than one type of transducer.
  • the transducers 114 can include one or more low frequency transducers (e.g., subwoofers, woofers), mid-range frequency transducers (e.g., mid-range transducers, mid-woofers), and one or more high frequency transducers (e.g., one or more tweeters).
  • low frequency can generally refer to audible frequencies below about 500 Hz
  • mid-range frequency can generally refer to audible frequencies between about 500 Hz and about 2 kHz
  • “high frequency” can generally refer to audible frequencies above 2 kHz.
  • one or more of the transducers 114 comprise transducers that do not adhere to the foregoing frequency ranges.
  • one of the transducers 114 may comprise a mid-woofer transducer configured to output sound at frequencies between about 200 Hz and about 5 kHz.
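The approximate band boundaries given above can be written out as a small classifier. This is a sketch only: the source describes the limits as "about" 500 Hz and 2 kHz, so the exact cut-off constants and the `band_for` function name are assumptions made for illustration.

```python
AUDIBLE_MIN_HZ = 20.0
AUDIBLE_MAX_HZ = 20_000.0
LOW_MAX_HZ = 500.0     # assumption: "about 500 Hz" treated as exact
MID_MAX_HZ = 2_000.0   # assumption: "about 2 kHz" treated as exact


def band_for(frequency_hz):
    """Classify a frequency as low, mid-range, or high."""
    if not AUDIBLE_MIN_HZ <= frequency_hz <= AUDIBLE_MAX_HZ:
        return "inaudible"
    if frequency_hz < LOW_MAX_HZ:
        return "low"
    if frequency_hz <= MID_MAX_HZ:
        return "mid-range"
    return "high"
```

A mid-woofer spanning about 200 Hz to about 5 kHz, as in the example above, would cover frequencies that this classifier places in all three bands, which is why individual transducers need not adhere to these ranges.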
  • Sonos, Inc. presently offers (or has offered) for sale certain playback devices including, for example, a “SONOS ONE,” “PLAY:1,” “PLAY:3,” “PLAY:5,” “PLAYBAR,” “PLAYBASE,” “CONNECT:AMP,” “CONNECT,” “AMP,” “PORT,” and “SUB.”
  • Other suitable playback devices may additionally or alternatively be used to implement the playback devices of example embodiments disclosed herein.
  • a playback device is not limited to the examples described herein or to SONOS product offerings.
  • one or more playback devices 110 comprise wired or wireless headphones (e.g., over-the-ear headphones, on-ear headphones, in-ear earphones, etc.).
  • one or more of the playback devices 110 comprise a docking station and/or an interface configured to interact with a docking station for personal mobile media playback devices.
  • a playback device may be integral to another device or component such as a television, an LP turntable, a lighting fixture, or some other device for indoor or outdoor use.
  • a playback device omits a user interface and/or one or more audio playback transducers. For example, FIG.
  • a playback device can comprise a device omitting an audio playback transducer that can receive audio data via one or more of a network interface and a hardware interface(s); filter, decode, and/or mix the audio data and send the resulting audio to another playback device without playing back audio itself.
  • Figure 1E is a block diagram of a bonded playback device 110q comprising the playback device 110a (Figure 1C) sonically bonded with the playback device 110i (e.g., a subwoofer) (Figure 1A).
  • the playback devices 110a and 110i are separate ones of the playback devices 110 housed in separate enclosures.
  • the bonded playback device 110q comprises a single enclosure housing both the playback devices 110a and 110i.
  • the bonded playback device 110q can be configured to process and reproduce sound differently than an unbonded playback device (e.g., the playback device 110a of Figure 1C) and/or paired or bonded playback devices (e.g., the playback devices 110l and 110m of Figure 1B).
  • the playback device 110a is a full-range playback device configured to render low frequency, midrange frequency, and high frequency audio content
  • the playback device 110i is a subwoofer configured to render low frequency audio content.
  • the playback device 110a, when bonded with the first playback device, is configured to render only the midrange and high frequency components of a particular audio content, while the playback device 110i renders the low frequency component of the particular audio content.
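The division of rendering responsibilities between a full-range device and a subwoofer can be summarized in a few lines. The function name `rendered_bands` and its boolean parameters are hypothetical; the sketch only restates the behavior described above and does not model any actual crossover filtering.

```python
FULL_RANGE = ("low", "mid-range", "high")


def rendered_bands(is_subwoofer, bonded_with_subwoofer=False):
    """Return the frequency bands a device is responsible for rendering."""
    if is_subwoofer:
        # A subwoofer renders low frequency content only.
        return ("low",)
    if bonded_with_subwoofer:
        # A full-range device hands the low band to the bonded subwoofer.
        return ("mid-range", "high")
    return FULL_RANGE
```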
  • the bonded playback device 110q includes additional playback devices and/or another bonded playback device. Additional playback device embodiments are described in further detail below with respect to Figures 2A-3D.
c. Suitable Network Microphone Devices (NMDs)
  • Figure 1F is a block diagram of the NMD 120a (Figures 1A and 1B).
  • the NMD 120a includes one or more voice processing components 124 (hereinafter “the voice components 124”) and several components described with respect to the playback device 110a ( Figure 1C) including the processors 112a, the memory 112b, and the microphones 115.
  • the NMD 120a optionally comprises other components also included in the playback device 110a ( Figure 1C), such as the user interface 113 and/or the transducers 114.
  • the NMD 120a is configured as a media playback device (e.g., one or more of the playback devices 110), and further includes, for example, one or more of the audio components 112g ( Figure 1C), the amplifiers 112h, and/or other playback device components.
  • the NMD 120a comprises an Internet of Things (IoT) device such as, for example, a thermostat, alarm panel, fire and/or smoke detector, etc.
  • the NMD 120a comprises the microphones 115, the voice processing components 124, and only a portion of the components of the electronics 112 described above with respect to Figure 1C.
  • the NMD 120a includes the processor 112a and the memory 112b ( Figure 1C), while omitting one or more other components of the electronics 112.
  • the NMD 120a includes additional components (e.g., one or more sensors, cameras, thermometers, barometers, hygrometers, etc.).
  • an NMD can be integrated into a playback device.
  • Figure 1G is a block diagram of a playback device 110r comprising an NMD 120d.
  • the playback device 110r can comprise many or all of the components of the playback device 110a and further include the microphones 115 and voice processing components 124 (Figure 1F).
  • the playback device 110r optionally includes an integrated control device 130c.
  • the control device 130c can comprise, for example, a user interface (e.g., the user interface 113 of Figure 1C) configured to receive user input (e.g., touch input, voice input, etc.) without a separate control device.
  • the playback device 110r receives commands from another control device (e.g., the control device 130a of Figure 1B). Additional NMD embodiments are described in further detail below with respect to Figures 3A-3F.
  • the microphones 115 are configured to acquire, capture, and/or receive sound from an environment (e.g., the environment 101 of Figure 1A) and/or a room in which the NMD 120a is positioned.
  • the received sound can include, for example, vocal utterances, audio played back by the NMD 120a and/or another playback device, background voices, ambient sounds, etc.
  • the microphones 115 convert the received sound into electrical signals to produce microphone data.
  • the voice processing components 124 receive and analyze the microphone data to determine whether a voice input is present in the microphone data.
  • the voice input can comprise, for example, an activation word followed by an utterance including a user request.
  • an activation word is a word or other audio cue signifying a user voice input. For instance, in querying the AMAZON VAS, a user might speak the activation word “Alexa.” Other examples include “Ok, Google” for invoking the GOOGLE VAS and “Hey, Siri” for invoking the APPLE VAS.
  • voice processing components 124 monitor the microphone data for an accompanying user request in the voice input.
  • the user request may include, for example, a command to control a third-party device, such as a thermostat (e.g., NEST thermostat), an illumination device (e.g., a PHILIPS HUE lighting device), or a media playback device (e.g., a SONOS playback device).
  • a user might speak the activation word “Alexa” followed by the utterance “set the thermostat to 68 degrees” to set a temperature in a home (e.g., the environment 101 of Figure 1A).
  • the user might speak the same activation word followed by the utterance “turn on the living room” to turn on illumination devices in a living room area of the home.
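To illustrate the structure of a voice input (an activation word followed by a user request), the sketch below splits a text transcript into those two parts. Operating on a transcript rather than on microphone data is a simplifying assumption, and `ACTIVATION_WORDS` and `split_voice_input` are hypothetical names; the voice processing components 124 are not described at this level of detail.

```python
# Activation words taken from the examples in the text, lowercased.
ACTIVATION_WORDS = ("alexa", "ok, google", "hey, siri")


def split_voice_input(transcript):
    """Return (activation_word, request), or None if no activation word leads."""
    text = transcript.strip()
    lowered = text.lower()
    for word in ACTIVATION_WORDS:
        if not lowered.startswith(word):
            continue
        rest = text[len(word):]
        # Reject partial matches such as "Alexander".
        if rest and rest[0].isalnum():
            continue
        return word, rest.lstrip(" ,.!?")
    return None
```

For example, `split_voice_input("Alexa, set the thermostat to 68 degrees")` yields the activation word and the request text separately, so the request can then be routed to the relevant device.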
  • the user may similarly speak an activation word followed by a request to play a particular song, an album, or a playlist of music on a playback device in the home. Additional description regarding receiving and processing voice input data can be found in further detail below with respect to Figures 3A-3F.
d. Suitable Control Devices
  • FIG. 1H is a partial schematic diagram of the control device 130a (Figures 1A and 1B).
  • the term “control device” can be used interchangeably with “controller” or “control system.”
  • the control device 130a is configured to receive user input related to the media playback system 100 and, in response, cause one or more devices in the media playback system 100 to perform an action(s) or operation(s) corresponding to the user input.
  • the control device 130a comprises a smartphone (e.g., an iPhone™, an Android phone, etc.) on which media playback system controller application software is installed.
  • control device 130a comprises, for example, a tablet (e.g., an iPad™), a computer (e.g., a laptop computer, a desktop computer, etc.), and/or another suitable device (e.g., a television, an automobile audio head unit, an IoT device, etc.).
  • the control device 130a comprises a dedicated controller for the media playback system 100.
  • the control device 130a is integrated into another device in the media playback system 100 (e.g., one or more of the playback devices 110, NMDs 120, and/or other suitable devices configured to communicate over a network).
  • the control device 130a includes electronics 132, a user interface 133, one or more speakers 134, and one or more microphones 135.
  • the electronics 132 comprise one or more processors 132a (referred to hereinafter as “the processors 132a”), a memory 132b, software components 132c, and a network interface 132d.
  • the processor 132a can be configured to perform functions relevant to facilitating user access, control, and configuration of the media playback system 100.
  • the memory 132b can comprise data storage that can be loaded with one or more of the software components executable by the processor 132a to perform those functions.
  • the software components 132c can comprise applications and/or other executable software configured to facilitate control of the media playback system 100.
  • the memory 132b can be configured to store, for example, the software components 132c, media playback system controller application software, and/or other data associated with the media playback system 100 and the user.
  • the network interface 132d is configured to facilitate network communications between the control device 130a and one or more other devices in the media playback system 100, and/or one or more remote devices.
  • the network interface 132d is configured to operate according to one or more suitable communication industry standards (e.g., infrared, radio, wired standards including IEEE 802.3, wireless standards including IEEE 802.11a, 802.11b, 802.11g, 802.11n, 802.11ac, 802.15, 4G, 5G, LTE, etc.).
  • the network interface 132d can be configured, for example, to transmit data to and/or receive data from the playback devices 110, the NMDs 120, other ones of the control devices 130, one of the computing devices 106 of Figure 1B, devices comprising one or more other media playback systems, etc.
  • the transmitted and/or received data can include, for example, playback device control commands, state variables, playback zone and/or zone group configurations.
  • the network interface 132d can transmit a playback device control command (e.g., volume control, audio playback control, audio content selection, etc.) from the control device 130a to one or more of the playback devices 110.
  • the network interface 132d can also transmit and/or receive configuration changes such as, for example, adding/removing one or more playback devices 110 to/from a zone, adding/removing one or more zones to/from a zone group, forming a bonded or consolidated player, separating one or more playback devices from a bonded or consolidated player, among others. Additional description of zones and groups can be found below with respect to Figures 1I through 1M.
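A control command of the kind described above could be serialized as a small structured message. The JSON layout, the field names (`target`, `action`, `params`), and the `make_command` helper are all assumptions made for illustration; the source does not define a wire format.

```python
import json


def make_command(target_device, action, **params):
    """Serialize a playback device control command (hypothetical format)."""
    message = {"target": target_device, "action": action, "params": params}
    # sort_keys keeps the output stable, which simplifies logging and testing.
    return json.dumps(message, sort_keys=True)
```

Under these assumptions, a volume change for the playback device 110a might be built as `make_command("110a", "set_volume", level=25)`, and a configuration change such as adding a device to a zone would use a different `action` value with its own parameters.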
  • the user interface 133 is configured to receive user input and can facilitate control of the media playback system 100.
  • the user interface 133 includes media content art 133a (e.g., album art, lyrics, videos, etc.), a playback status indicator 133b (e.g., an elapsed and/or remaining time indicator), media content information region 133c, a playback control region 133d, and a zone indicator 133e.
  • the media content information region 133c can include a display of relevant information (e.g., title, artist, album, genre, release year, etc.) about media content currently playing and/or media content in a queue or playlist.
  • the playback control region 133d can include selectable (e.g., via touch input and/or via a cursor or another suitable selector) icons to cause one or more playback devices in a selected playback zone or zone group to perform playback actions such as, for example, play or pause, fast forward, rewind, skip to next, skip to previous, enter/exit shuffle mode, enter/exit repeat mode, enter/exit cross fade mode, etc.
  • the playback control region 133d may also include selectable icons to modify equalization settings, playback volume, and/or other suitable playback actions.
  • the user interface 133 comprises a display presented on a touch screen interface of a smartphone (e.g., an iPhone™, an Android phone, etc.). In some embodiments, however, user interfaces of varying formats, styles, and interactive sequences may alternatively be implemented on one or more network devices to provide comparable control access to a media playback system.
  • the one or more speakers 134 can be configured to output sound to the user of the control device 130a.
  • the one or more speakers comprise individual transducers configured to correspondingly output low frequencies, mid-range frequencies, and/or high frequencies.
  • the control device 130a is configured as a playback device (e.g., one of the playback devices 110).
  • the control device 130a is configured as an NMD (e.g., one of the NMDs 120), receiving voice commands and other sounds via the one or more microphones 135.
  • the one or more microphones 135 can comprise, for example, one or more condenser microphones, electret condenser microphones, dynamic microphones, and/or other suitable types of microphones or transducers. In some embodiments, two or more of the microphones 135 are arranged to capture location information of an audio source (e.g., voice, audible sound, etc.) and/or configured to facilitate filtering of background noise. Moreover, in certain embodiments, the control device 130a is configured to operate as a playback device and an NMD. In other embodiments, however, the control device 130a omits the one or more speakers 134 and/or the one or more microphones 135.
  • control device 130a may comprise a device (e.g., a thermostat, an IoT device, a network device, etc.) comprising a portion of the electronics 132 and the user interface 133 (e.g., a touch screen) without any speakers or microphones. Additional control device embodiments are described in further detail below with respect to Figures 4A-4D and 5.
e. Suitable Playback Device Configurations
  • Figures 1I through 1M show example configurations of playback devices in zones and zone groups.
  • a single playback device may belong to a zone.
  • the playback device 110g in the second bedroom 101c (FIG. 1A) may belong to Zone C.
  • multiple playback devices may be “bonded” to form a “bonded pair” which together form a single zone.
  • the playback device 110l (e.g., a left playback device)
  • the playback device 110m (e.g., a right playback device)
  • Bonded playback devices may have different playback responsibilities (e.g., channel responsibilities).
  • multiple playback devices may be merged to form a single zone.
  • the playback device 110h (e.g., a front playback device)
  • the playback device 110i (e.g., a subwoofer)
  • the playback devices 110j and 110k (e.g., left and right surround speakers, respectively)
  • the playback devices 110b and 110d can be merged to form a merged group or a zone group 108b.
  • the merged playback devices 110b and 110d may not be specifically assigned different playback responsibilities. That is, the merged playback devices 110b and 110d may, aside from playing audio content in synchrony, each play audio content as they would if they were not merged.
  • Zone A may be provided as a single entity named Master Bathroom.
  • Zone B may be provided as a single entity named Master Bedroom.
  • Zone C may be provided as a single entity named Second Bedroom.
  • Playback devices that are bonded may have different playback responsibilities, such as responsibilities for certain audio channels.
  • the playback devices 110l and 110m may be bonded so as to produce or enhance a stereo effect of audio content.
  • the playback device 110l may be configured to play a left channel audio component
  • the playback device 110m may be configured to play a right channel audio component.
  • stereo bonding may be referred to as “pairing.”
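The channel responsibilities of a stereo pair can be sketched as a split of interleaved stereo samples into a left stream and a right stream. The interleaved left-then-right sample layout and the `split_stereo` function are assumptions made for illustration; the source does not state how channel data is represented.

```python
def split_stereo(interleaved):
    """Split [L0, R0, L1, R1, ...] into (left_samples, right_samples)."""
    left = interleaved[0::2]    # even positions: left channel component
    right = interleaved[1::2]   # odd positions: right channel component
    return left, right
```

In a bonded pair, the left device would play only the first returned stream and the right device only the second, with both kept in synchrony.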
  • bonded playback devices may have additional and/or different respective speaker drivers.
  • the playback device 110h named Front may be bonded with the playback device 110i named SUB.
  • the Front device 110h can be configured to render a range of mid to high frequencies and the SUB device 110i can be configured to render low frequencies. When unbonded, however, the Front device 110h can be configured to render a full range of frequencies.
  • Figure 1K shows the Front and SUB devices 110h and 110i further bonded with Left and Right playback devices 110j and 110k, respectively.
  • the Left and Right devices 110j and 110k can be configured to form surround or “satellite” channels of a home theater system.
  • the bonded playback devices 110h, 110i, 110j, and 110k may form a single Zone D (FIG. 1M).
  • Playback devices that are merged may not have assigned playback responsibilities, and may each render the full range of audio content the respective playback device is capable of. Nevertheless, merged devices may be represented as a single UI entity (i.e., a zone, as discussed above). For instance, the playback devices 110a and 110n in the master bathroom have the single UI entity of Zone A. In one embodiment, the playback devices 110a and 110n may each output, in synchrony, the full range of audio content that each respective playback device is capable of.
  • an NMD is bonded or merged with another device so as to form a zone.
  • the NMD 120b may be bonded with the playback device 110e, which together form Zone F, named Living Room.
  • a stand-alone network microphone device may be in a zone by itself. In other embodiments, however, a stand-alone network microphone device may not be associated with a zone. Additional details regarding associating network microphone devices and playback devices as designated or default devices may be found, for example, in U.S. Patent No. 10,499,146, which is hereby incorporated herein by reference in its entirety.
  • Zones of individual, bonded, and/or merged devices may be grouped to form a zone group.
  • Zone A may be grouped with Zone B to form a zone group 108a that includes the two zones.
  • Zone G may be grouped with Zone H to form the zone group 108b.
  • Zone A may be grouped with one or more other Zones C-I.
  • the Zones A-I may be grouped and ungrouped in numerous ways. For example, three, four, five, or more (e.g., all) of the Zones A-I may be grouped.
  • the zones of individual and/or bonded playback devices may play back audio in synchrony with one another, as described in previously referenced U.S. Patent No. 8,234,395. Playback devices may be dynamically grouped and ungrouped to form new or different groups that synchronously play back audio content.
  • the zones in an environment may be the default name of a zone within the group or a combination of the names of the zones within a zone group.
  • Zone Group 108b can be assigned a name such as “Dining + Kitchen”, as shown in Figure 1M.
  • a zone group may be given a unique name selected by a user.
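The naming options described above can be expressed as a short helper: a user-selected name takes priority, and otherwise the names of the member zones are combined. The function `default_group_name` and the choice of " + " as the separator (taken from the “Dining + Kitchen” example) are illustrative assumptions.

```python
def default_group_name(zone_names, custom_name=None):
    """Name a zone group from a user choice or from its member zones."""
    if custom_name:
        # A unique name selected by the user overrides the default.
        return custom_name
    return " + ".join(zone_names)
```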
  • Certain data may be stored in a memory of a playback device (e.g., the memory 112b of Figure 1C) as one or more state variables that are periodically updated and used to describe the state of a playback zone, the playback device(s), and/or a zone group associated therewith.
  • the memory may also include the data associated with the state of the other devices of the media system, and shared from time to time among the devices so that one or more of the devices have the most recent data associated with the system.
  • the memory may store instances of various variable types associated with the states. Variable instances may be stored with identifiers (e.g., tags) corresponding to type.
  • certain identifiers may be a first type “a1” to identify playback device(s) of a zone, a second type “b1” to identify playback device(s) that may be bonded in the zone, and a third type “c1” to identify a zone group to which the zone may belong.
  • identifiers associated with the second bedroom 101c may indicate that the playback device is the only playback device of the Zone C and not in a zone group.
  • Identifiers associated with the Den may indicate that the Den is not grouped with other zones but includes bonded playback devices 110h-110k.
  • Identifiers associated with the Dining Room may indicate that the Dining Room is part of the Dining + Kitchen zone group 108b and that devices 110b and 110d are grouped (FIG. 1L). Identifiers associated with the Kitchen may indicate the same or similar information by virtue of the Kitchen being part of the Dining + Kitchen zone group 108b. Other example zone variables and identifiers are described below.
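A minimal sketch of how tagged variable instances might be read follows. It assumes the three identifier types are stored as dictionary keys, with lists of device labels under “a1” and “b1” and a zone group label (or `None`) under “c1”; this storage layout and the `describe_zone` function are hypothetical.

```python
def describe_zone(variables):
    """Summarize a zone from its typed identifiers (hypothetical layout)."""
    devices = variables.get("a1", [])   # playback device(s) of the zone
    bonded = variables.get("b1", [])    # device(s) bonded within the zone
    group = variables.get("c1")         # zone group the zone belongs to
    return {
        "device_count": len(devices),
        "has_bonded_devices": len(bonded) > 0,
        "in_zone_group": group is not None,
    }


# Values below mirror the Second Bedroom and Den examples in the text.
second_bedroom = {"a1": ["110g"], "b1": [], "c1": None}
den = {"a1": ["110h", "110i", "110j", "110k"],
       "b1": ["110h", "110i", "110j", "110k"],
       "c1": None}
```

With this layout, the Second Bedroom is reported as a single ungrouped device, while the Den is reported as ungrouped but containing bonded devices.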
  • the memory may store variables or identifiers representing other associations of zones and zone groups, such as identifiers associated with Areas, as shown in Figure 1M.
  • An area may involve a cluster of zone groups and/or zones not within a zone group.
  • Figure 1M shows an Upper Area 109a including Zones A-D and I, and a Lower Area 109b including Zones E-I.
  • an Area may be used to invoke a cluster of zone groups and/or zones that share one or more zones and/or zone groups of another cluster. In another aspect, this differs from a zone group, which does not share a zone with another zone group. Further examples of techniques for implementing Areas may be found, for example, in U.S. Patent No.
  • the media playback system 100 may not implement Areas, in which case the system may not store variables associated with Areas.
  • Figure 2A is a front isometric view of a playback device 210 configured in accordance with aspects of the disclosed technology.
  • Figure 2B is a front isometric view of the playback device 210 without a grille 216e.
  • Figure 2C is an exploded view of the playback device 210.
  • the playback device 210 comprises a housing 216 that includes an upper portion 216a, a right or first side portion 216b, a lower portion, a left or second side portion 216d, the grille 216e, and a rear portion 216f.
  • a plurality of fasteners 216g attaches a frame 216h to the housing 216.
  • A cavity 216j (Figure 2C) in the housing 216 is configured to receive the frame 216h and electronics 212.
  • the frame 216h is configured to carry a plurality of transducers 214 (identified individually in Figure 2B as transducers 214a-f).
  • The electronics 212 (e.g., the electronics 112 of Figure 1C) are configured to receive audio content from an audio source and send electrical signals corresponding to the audio content to the transducers 214 for playback.
  • the transducers 214 are configured to receive the electrical signals from the electronics 112, and further configured to convert the received electrical signals into audible sound during playback.
  • the transducers 214a-c e.g., tweeters
  • the transducers 214d-f e.g., mid-woofers, woofers, midrange speakers
  • the playback device 210 includes a number of transducers different than those illustrated in Figures 2A-2C.
  • the playback device 210 can include fewer than six transducers (e.g., one, two, three). In other embodiments, however, the playback device 210 includes more than six transducers (e.g., nine, ten). Moreover, in some embodiments, all or a portion of the transducers 214 are configured to operate as a phased array to desirably adjust (e.g., narrow or widen) a radiation pattern of the transducers 214, thereby altering a user’s perception of the sound emitted from the playback device 210.
  • a filter is axially aligned with the transducer 214b.
  • the filter can be configured to desirably attenuate a predetermined range of frequencies that the transducer 214b outputs to improve sound quality and a perceived sound stage output collectively by the transducers 214.
  • the playback device 210 omits the filter.
  • the playback device 210 includes one or more additional filters aligned with the transducers 214b and/or at least another of the transducers 214.
  • Figures 3A and 3B are front and right isometric side views, respectively, of an NMD 320 configured in accordance with embodiments of the disclosed technology.
  • Figure 3C is an exploded view of the NMD 320.
  • Figure 3D is an enlarged view of a portion of Figure 3B including a user interface 313 of the NMD 320.
  • the NMD 320 includes a housing 316 comprising an upper portion 316a, a lower portion 316b and an intermediate portion 316c (e.g., a grille).
  • A plurality of ports, holes or apertures 316d in the upper portion 316a allow sound to pass through to one or more microphones 315 (Figure 3C) positioned within the housing 316.
  • the one or more microphones 315 are configured to receive sound via the apertures 316d and produce electrical signals based on the received sound.
  • A frame 316e (Figure 3C) of the housing 316 surrounds cavities 316f and 316g configured to house, respectively, a first transducer 314a (e.g., a tweeter) and a second transducer 314b (e.g., a mid-woofer, a midrange speaker, a woofer).
  • the NMD 320 includes a single transducer, or more than two (e.g., two, five, six) transducers. In certain embodiments, the NMD 320 omits the transducers 314a and 314b altogether.
  • Electronics 312 (Figure 3C) includes components configured to drive the transducers 314a and 314b, and further configured to analyze audio data corresponding to the electrical signals produced by the one or more microphones 315.
  • the electronics 312 comprises many or all of the components of the electronics 112 described above with respect to Figure 1C.
  • The electronics 312 includes components described above with respect to Figure 1F such as, for example, the one or more processors 112a, the memory 112b, the software components 112c, the network interface 112d, etc.
  • the electronics 312 includes additional suitable components (e.g., proximity or other sensors).
  • the user interface 313 includes a plurality of control surfaces (e.g., buttons, knobs, capacitive surfaces) including a first control surface 313a (e.g., a previous control), a second control surface 313b (e.g., a next control), and a third control surface 313c (e.g., a play and/or pause control) that can be adjusted by a user 323.
  • A fourth control surface 313d is configured to receive touch input corresponding to activation and deactivation of the one or more microphones 315.
  • a first indicator 313e e.g., one or more light emitting diodes (LEDs) or another suitable illuminator
  • a second indicator 313f e.g., one or more LEDs
  • the user interface 313 includes additional or fewer control surfaces and illuminators.
  • the user interface 313 includes the first indicator 313e, omitting the second indicator 313f
  • the NMD 320 comprises a playback device and a control device
  • the user interface 313 comprises the user interface of the control device.
  • The NMD 320 is configured to receive voice commands from one or more adjacent users via the one or more microphones 315.
  • the one or more microphones 315 can acquire, capture, or record sound in a vicinity (e.g., a region within 10m or less of the NMD 320) and transmit electrical signals corresponding to the recorded sound to the electronics 312.
  • the electronics 312 can process the electrical signals and can analyze the resulting audio data to determine a presence of one or more voice commands (e.g., one or more activation words).
  • The NMD 320 is configured to transmit a portion of the recorded audio data to another device and/or a remote server (e.g., one or more of the computing devices 106 of Figure 1B) for further analysis.
  • the remote server can analyze the audio data, determine an appropriate action based on the voice command, and transmit a message to the NMD 320 to perform the appropriate action.
  • a user may speak “Sonos, play Michael Jackson.”
  • The NMD 320 can, via the one or more microphones 315, record the user’s voice utterance, determine the presence of a voice command, and transmit the audio data having the voice command to a remote server (e.g., one or more of the remote computing devices 106 of Figure 1B, one or more servers of a VAS and/or another suitable service).
  • the remote server can analyze the audio data and determine an action corresponding to the command.
  • the remote server can then transmit a command to the NMD 320 to perform the determined action (e.g., play back audio content related to Michael Jackson).
  • the NMD 320 can receive the command and play back the audio content related to Michael Jackson from a media content source.
  • Suitable content sources can include a device or storage communicatively coupled to the NMD 320 via a LAN (e.g., the network 104 of Figure 1B), a remote server (e.g., one or more of the remote computing devices 106 of Figure 1B), etc.
  • the NMD 320 determines and/or performs one or more actions corresponding to the one or more voice commands without intervention or involvement of an external device, computer, or server.
  • FIG. 3E is a functional block diagram showing additional features of the NMD 320 in accordance with aspects of the disclosure.
  • The NMD 320 includes components configured to facilitate voice command capture including voice activity detector component(s) 312k, beam former components 312l, acoustic echo cancellation (AEC) and/or self-sound suppression components 312m, activation word detector components 312n, and voice/speech conversion components 312o (e.g., voice-to-text and text-to-voice).
  • the foregoing components 312k-312o are shown as separate components. In some embodiments, however, one or more of the components 312k-312o are subcomponents of the processors 112a.
  • The beamforming and self-sound suppression components 312l and 312m are configured to detect an audio signal and determine aspects of voice input represented in the detected audio signal, such as the direction, amplitude, frequency spectrum, etc.
  • The voice activity detector components 312k are operably coupled with the beamforming and AEC components 312l and 312m and are configured to determine a direction and/or directions from which voice activity is likely to have occurred in the detected audio signal.
  • Potential speech directions can be identified by monitoring metrics which distinguish speech from other sounds. Such metrics can include, for example, energy within the speech band relative to background noise and entropy within the speech band, which is a measure of spectral structure. As those of ordinary skill in the art will appreciate, speech typically has a lower entropy than most common background noise.
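  • The entropy metric mentioned above can be illustrated with a normalized spectral entropy computed over a band of frequency bins. This is a sketch under stated assumptions rather than the disclosed detector: the 300-3400 Hz band limits, the 8 kHz sample rate, the 64-sample frame, and the function names are all chosen here for illustration, and a pure tone stands in for a spectrally structured signal such as speech.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Power of each non-negative frequency bin, via a direct DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def band_entropy(frame, sample_rate, lo_hz=300.0, hi_hz=3400.0):
    """Normalized spectral entropy (0..1) of the bins inside [lo_hz, hi_hz]."""
    n = len(frame)
    power = power_spectrum(frame)
    band = [p for k, p in enumerate(power)
            if lo_hz <= k * sample_rate / n <= hi_hz]
    total = sum(band)
    if total == 0 or len(band) < 2:
        return 0.0
    probs = [p / total for p in band if p > 0]
    return -sum(p * math.log2(p) for p in probs) / math.log2(len(band))


RATE, N = 8000, 64  # assumed sample rate (Hz) and frame length
# Structured signal: a single 1 kHz tone concentrates energy in one bin.
tone = [math.sin(2 * math.pi * 1000 * t / RATE) for t in range(N)]
# Unstructured signal: seeded uniform noise spreads energy across bins.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(N)]

tone_entropy = band_entropy(tone, RATE)
noise_entropy = band_entropy(noise, RATE)
print(tone_entropy < noise_entropy)
```

  • A low value indicates energy concentrated in a few bins (spectral structure), while a value near 1 indicates energy spread evenly across the band, as with broadband noise.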
  • the activation word detector components 312n are configured to monitor and analyze received audio to determine if any activation words (e.g., wake words) are present in the received audio.
  • the activation word detector components 312n may analyze the received audio using an activation word detection algorithm. If the activation word detector 312n detects an activation word, the NMD 320 may process voice input contained in the received audio.
  • Example activation word detection algorithms accept audio as input and provide an indication of whether an activation word is present in the audio.
  • Many first- and third-party activation word detection algorithms are known and commercially available. For instance, operators of a voice service may make their algorithm available for use in third-party devices. Alternatively, an algorithm may be trained to detect certain activation words.
  • the activation word detector 312n runs multiple activation word detection algorithms on the received audio simultaneously (or substantially simultaneously).
  • different voice services (e.g., AMAZON’S ALEXA, APPLE’S SIRI, or MICROSOFT’S CORTANA)
  • the activation word detector 312n may run the received audio through the activation word detection algorithm for each supported voice service in parallel.
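  • Running one detection algorithm per supported voice service in parallel can be sketched with a thread pool. Everything named here is hypothetical: real activation word detectors are trained models, whereas `make_stub_detector` simply looks for a marker byte string, and the service names are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def make_stub_detector(marker: bytes):
    """Stand-in for a real detector: reports whether `marker` occurs in the audio."""
    def detect(audio: bytes) -> bool:
        return marker in audio
    return detect


def detect_activation_words(audio: bytes, detectors: dict) -> list:
    """Run every detector on the same audio concurrently; return names that fired."""
    with ThreadPoolExecutor(max_workers=max(1, len(detectors))) as pool:
        futures = {name: pool.submit(fn, audio) for name, fn in detectors.items()}
        return [name for name, fut in futures.items() if fut.result()]


# Placeholder services and markers, chosen only for this sketch.
detectors = {
    "service_a": make_stub_detector(b"alpha"),
    "service_b": make_stub_detector(b"beta"),
}
fired = detect_activation_words(b"...alpha...", detectors)
print(fired)
```

  • Each detector sees the same received audio, so adding support for another voice service amounts to registering one more callable.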
  • the speech/text conversion components 312o may facilitate processing by converting speech in the voice input to text.
  • the electronics 312 can include voice recognition software that is trained to a particular user or a particular set of users associated with a household. Such voice recognition software may implement voice-processing algorithms that are tuned to specific voice profile(s). Tuning to specific voice profiles may require less computationally intensive algorithms than traditional voice activity services, which typically sample from a broad base of users and diverse requests that are not targeted to media playback systems.
  • Figure 3F is a schematic diagram of an example voice input 328 captured by the NMD 320 in accordance with aspects of the disclosure.
  • the voice input 328 can include an activation word portion 328a and a voice utterance portion 328b.
  • the activation word 328a can be a known activation word, such as “Alexa,” which is associated with AMAZON’S ALEXA. In other embodiments, however, the voice input 328 may not include an activation word.
  • a network microphone device may output an audible and/or visible response upon detection of the activation word portion 328a. In addition or alternately, an NMD may output an audible and/or visible response after processing a voice input and/or a series of voice inputs.
  • The voice utterance portion 328b may include, for example, one or more spoken commands (identified individually as a first command 328c and a second command 328e) and one or more spoken keywords (identified individually as a first keyword 328d and a second keyword 328f).
  • the first command 328c can be a command to play music, such as a specific song, album, playlist, etc.
  • The keywords may be one or more words identifying one or more zones in which the music is to be played, such as the Living Room and the Dining Room shown in Figure 1A.
  • the voice utterance portion 328b can include other information, such as detected pauses (e.g., periods of non-speech) between words spoken by a user, as shown in Figure 3F.
  • The pauses may demarcate the locations of separate commands, keywords, or other information spoken by the user within the voice utterance portion 328b.
  • the media playback system 100 is configured to temporarily reduce the volume of audio content that it is playing while detecting the activation word portion 328a.
  • the media playback system 100 may restore the volume after processing the voice input 328, as shown in Figure 3F.
  • Such a process can be referred to as ducking, examples of which are disclosed in U.S. Patent No. 10,499,146, incorporated by reference above.
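  • The ducking behavior described above (reduce the volume when the activation word is detected, restore it after the voice input is processed) can be sketched as a small state holder. The class name, method names, and the 0.3 duck factor are assumptions made for this illustration only.

```python
class DuckingController:
    """Temporarily lowers playback volume around a voice input (illustrative)."""

    def __init__(self, volume: int, duck_factor: float = 0.3):
        self.volume = volume
        self.duck_factor = duck_factor  # assumed fraction of the original volume
        self._saved = None              # volume to restore, if currently ducked

    def on_activation_word(self) -> None:
        """Duck once; repeated detections must not overwrite the saved volume."""
        if self._saved is None:
            self._saved = self.volume
            self.volume = round(self.volume * self.duck_factor)

    def on_voice_input_processed(self) -> None:
        """Restore the volume that was in effect before ducking."""
        if self._saved is not None:
            self.volume = self._saved
            self._saved = None


controller = DuckingController(volume=50)
controller.on_activation_word()
ducked_volume = controller.volume
controller.on_voice_input_processed()
restored_volume = controller.volume
print(ducked_volume, restored_volume)
```

  • Saving the pre-duck volume only on the first detection keeps a second activation word from locking the system at the reduced level.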
  • Figures 4A-4D are schematic diagrams of a control device 430 (e.g., the control device 130a of Figure 1H, a smartphone, a tablet, a dedicated control device, an IoT device, and/or another suitable device) showing corresponding user interface displays in various states of operation.
  • A first user interface display 431a (Figure 4A) includes a display name 433a (i.e., “Rooms”).
  • a selected group region 433b displays audio content information (e.g., artist name, track name, album art) of audio content played back in the selected group and/or zone.
  • Group regions 433c and 433d display the corresponding group and/or zone name, and audio content information of audio content played back or next in a playback queue of the respective group or zone.
  • An audio content region 433e includes information related to audio content in the selected group and/or zone (i.e., the group and/or zone indicated in the selected group region 433b).
  • a lower display region 433f is configured to receive touch input to display one or more other user interface displays.
  • The control device 430 can be configured to output a second user interface display 431b (Figure 4B) comprising a plurality of music services 433g (e.g., Spotify, Radio by TuneIn, Apple Music, Pandora, Amazon, TV, local music, line-in) through which the user can browse and from which the user can select media content for play back via one or more playback devices (e.g., one of the playback devices 110 of Figure 1A).
  • The control device 430 can be configured to output a third user interface display 431c (Figure 4C).
  • a first media content region 433h can include graphical representations (e.g., album art) corresponding to individual albums, stations, or playlists.
  • a second media content region 433i can include graphical representations (e.g., album art) corresponding to individual songs, tracks, or other media content.
  • The control device 430 can be configured to begin play back of audio content corresponding to the graphical representation 433j and output a fourth user interface display 431d that includes an enlarged version of the graphical representation 433j, media content information 433k (e.g., track name, artist, album), transport controls 433m (e.g., play, previous, next, pause, volume), and an indication 433n of the currently selected group and/or zone name.
  • FIG. 5 is a schematic diagram of a control device 530 (e.g., a laptop computer, a desktop computer).
  • the control device 530 includes transducers 534, a microphone 535, and a camera 536.
  • a user interface 531 includes a transport control region 533a, a playback status region 533c, a playback zone region 533b, a playback queue region 533d, and a media content source region 533e.
  • the transport control region comprises one or more controls for controlling media playback including, for example, volume, previous, play/pause, next, repeat, shuffle, track position, crossfade, equalization, etc.
  • the audio content source region 533e includes a listing of one or more media content sources from which a user can select media items for play back and/or adding to a playback queue.
  • The playback zone region 533b can include representations of playback zones within the media playback system 100 (Figures 1A and 1B).
  • the graphical representations of playback zones may be selectable to bring up additional selectable icons to manage or configure the playback zones in the media playback system, such as a creation of bonded zones, creation of zone groups, separation of zone groups, renaming of zone groups, etc.
  • a “group” icon is provided within each of the graphical representations of playback zones.
  • the “group” icon provided within a graphical representation of a particular zone may be selectable to bring up options to select one or more other zones in the media playback system to be grouped with the particular zone.
  • playback devices in the zones that have been grouped with the particular zone can be configured to play audio content in synchrony with the playback device(s) in the particular zone.
  • a “group” icon may be provided within a graphical representation of a zone group.
  • the “group” icon may be selectable to bring up options to deselect one or more zones in the zone group to be removed from the zone group.
  • the control device 530 includes other interactions and implementations for grouping and ungrouping zones via the user interface 531.
  • the representations of playback zones in the playback zone region 533b can be dynamically updated as playback zone or zone group configurations are modified.
  • the playback status region 533c includes graphical representations of audio content that is presently being played, previously played, or scheduled to play next in the selected playback zone or zone group.
  • the selected playback zone or zone group may be visually distinguished on the user interface, such as within the playback zone region 533b and/or the playback queue region 533d.
  • the graphical representations may include track title, artist name, album name, album year, track length, and other relevant information that may be useful for the user to know when controlling the media playback system 100 via the user interface 531.
  • the playback queue region 533d includes graphical representations of audio content in a playback queue associated with the selected playback zone or zone group.
  • each playback zone or zone group may be associated with a playback queue containing information corresponding to zero or more audio items for playback by the playback zone or zone group.
  • each audio item in the playback queue may comprise a uniform resource identifier (URI), a uniform resource locator (URL) or some other identifier that may be used by a playback device in the playback zone or zone group to find and/or retrieve the audio item from a local audio content source or a networked audio content source, possibly for playback by the playback device.
  • a playlist can be added to a playback queue, in which information corresponding to each audio item in the playlist may be added to the playback queue.
  • audio items in a playback queue may be saved as a playlist.
  • a playback queue may be empty, or populated but “not in use” when the playback zone or zone group is playing continuously streaming audio content, such as Internet radio that may continue to play until otherwise stopped, rather than discrete audio items that have playback durations.
  • a playback queue can include Internet radio and/or other streaming audio content items and be “in use” when the playback zone or zone group is playing those items.
  • playback queues associated with the affected playback zones or zone groups may be cleared or re-associated. For example, if a first playback zone including a first playback queue is grouped with a second playback zone including a second playback queue, the established zone group may have an associated playback queue that is initially empty, that contains audio items from the first playback queue (such as if the second playback zone was added to the first playback zone), that contains audio items from the second playback queue (such as if the first playback zone was added to the second playback zone), or a combination of audio items from both the first and second playback queues.
  • the resulting first playback zone may be re-associated with the previous first playback queue, or be associated with a new playback queue that is empty or contains audio items from the playback queue associated with the established zone group before the established zone group was ungrouped.
  • the resulting second playback zone may be re-associated with the previous second playback queue, or be associated with a new playback queue that is empty, or contains audio items from the playback queue associated with the established zone group before the established zone group was ungrouped.
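  • The alternatives listed above for the queue of a newly established zone group (initially empty, taken from the first zone, taken from the second zone, or a combination) can be sketched as a single function with a policy argument. The policy names and the example URIs are hypothetical and not drawn from the disclosure.

```python
def queue_for_new_group(first_queue, second_queue, policy):
    """Return the playback queue for a zone group formed from two zones.

    policy (names assumed for this sketch):
      "empty"    - the group starts with an empty queue
      "first"    - second zone was added to the first: keep the first queue
      "second"   - first zone was added to the second: keep the second queue
      "combined" - items from both queues, first zone's items first
    """
    if policy == "empty":
        return []
    if policy == "first":
        return list(first_queue)
    if policy == "second":
        return list(second_queue)
    if policy == "combined":
        return list(first_queue) + list(second_queue)
    raise ValueError(f"unknown policy: {policy!r}")


# Each audio item is represented by an identifier such as a URI (placeholders).
first = ["x-example:track/1", "x-example:track/2"]
second = ["x-example:track/9"]

combined = queue_for_new_group(first, second, "combined")
print(combined)
```

  • Returning copies rather than the original lists leaves each zone's previous queue intact, so it can be re-associated with the zone if the group is later ungrouped.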
  • Figure 6 is a message flow diagram illustrating data exchanges between devices of the media playback system 100 (Figures 1A-1M).
  • the media playback system 100 receives an indication of selected media content (e.g., one or more songs, albums, playlists, podcasts, videos, stations) via the control device 130a.
  • The selected media content can comprise, for example, media items stored locally on one or more devices (e.g., the audio source 105 of Figure 1C) connected to the media playback system and/or media items stored on one or more media service servers (one or more of the remote computing devices 106 of Figure 1B).
  • The control device 130a transmits a message 651a to the playback device 110a (Figures 1A-1C) to add the selected media content to a playback queue on the playback device 110a.
  • The playback device 110a receives the message 651a and adds the selected media content to the playback queue for play back.
  • the control device 130a receives input corresponding to a command to play back the selected media content.
  • the control device 130a transmits a message 651b to the playback device 110a causing the playback device 110a to play back the selected media content.
  • the playback device 110a transmits a message 651c to the computing device 106a requesting the selected media content.
  • The computing device 106a, in response to receiving the message 651c, transmits a message 651d comprising data (e.g., audio data, video data, a URL, a URI) corresponding to the requested media content.
  • The playback device 110a receives the message 651d with the data corresponding to the requested media content and plays back the associated media content.
  • the playback device 110a optionally causes one or more other devices to play back the selected media content.
  • The playback device 110a is one of a bonded zone of two or more players (Figure 1M).
  • the playback device 110a can receive the selected media content and transmit all or a portion of the media content to other devices in the bonded zone.
  • the playback device 110a is a coordinator of a group and is configured to transmit and receive timing information from one or more other devices in the group.
  • the other one or more devices in the group can receive the selected media content from the computing device 106a, and begin playback of the selected media content in response to a message from the playback device 110a such that all of the devices in the group play back the selected media content in synchrony.
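  • The exchange of messages 651a-651d described above can be traced with a toy in-process simulation. This is only an illustrative sketch: the class names, method names, and catalog contents are hypothetical, and no real network transport or Sonos API is represented.

```python
class MediaServer:
    """Stands in for the computing device 106a."""

    def __init__(self, catalog):
        self.catalog = catalog

    def handle_request(self, item_id):
        # Receives message 651c, answers with message 651d carrying the data.
        return {"msg": "651d", "data": self.catalog[item_id]}


class PlaybackDevice:
    """Stands in for the playback device 110a."""

    def __init__(self, server):
        self.server = server
        self.queue = []
        self.log = []  # order in which messages were handled

    def add_to_queue(self, item_id):
        # Message 651a from the control device: add content to the queue.
        self.log.append("651a")
        self.queue.append(item_id)

    def play(self):
        # Message 651b from the control device: play back the queued content.
        self.log.append("651b")
        played = []
        for item_id in self.queue:
            self.log.append("651c")  # request the content from the server
            reply = self.server.handle_request(item_id)
            self.log.append(reply["msg"])
            played.append(reply["data"])
        return played


server = MediaServer({"song-1": "x-example:audio/song-1"})  # placeholder data
device = PlaybackDevice(server)
device.add_to_queue("song-1")
played = device.play()
print(device.log)
```

  • The log makes the ordering explicit: content is queued first, playback is commanded second, and only then does the playback device fetch the data from the server.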
  • aspects and embodiments are directed to playback devices, or other devices in the media playback system 100, that are equipped with a dedicated auxiliary high frequency transducer to enable functionality associated with the acoustic transmission and reception of reference audio signals for a variety of purposes and applications.
  • a playback device 710 which may be any of the playback devices 110 or NMDs 120 discussed above.
  • The playback device 710 is a subwoofer, such as the playback device 110i discussed above.
  • the playback device 710 includes a housing 716 that houses various electronic components of the playback device, as discussed above with reference to Figures 2A-C.
  • the housing 716 defines an interior cavity 716a that includes a plurality of interior sides 716b; however, in other examples, the housing 716 may have a different shape and/or configuration.
  • the playback device 710 includes one or more acoustic sensors or microphones 715 configured to detect acoustic signals, as discussed further below.
  • the playback device 710 includes four microphones 715 positioned on an upper surface 716c of the housing 716; however, in other examples, the playback device 710 may include more than or fewer than four microphones, which may be positioned on any one or more surfaces of the housing 716.
  • the playback device 710 may include numerous components and functionality, as discussed above with reference to Figures 2A-3F, that are not shown in Figure 7.
  • the playback device 710 includes one or more low frequency audio transducers 114 that are coupled to audio playback circuitry to allow the playback device 710 to play back one or more channels of audio content, as discussed above.
  • the playback device 710 does not include a high frequency audio transducer (e.g., a tweeter) coupled to the audio playback circuitry.
  • the playback device 710 includes one or more auxiliary transducers 700.
  • the auxiliary transducer(s) 700 are not used for playback of audio content and as such are not coupled to the audio playback circuitry.
  • the auxiliary transducer(s) 700 can be coupled to circuitry that allows the auxiliary transducer(s) 700 to be configured to output/transmit acoustic signals that include a reference audio signal that can be used for presence detection, room detection, distance determination, and other purposes.
  • the auxiliary transducer(s) 700 are piezoelectric transducers or MEMS transducers.
  • the auxiliary transducer(s) 700 may be placed along any one or more surfaces of the housing 716 of the playback device 710.
  • the playback device 710 may be a multiple orientation device (e.g., one that can be placed with different surfaces of the housing 716 on the floor or another supporting platform, such as a shelf or table, for example, and faced in different directions).
  • the surface of the housing 716 carrying the auxiliary transducer 700 may be blocked because that surface is placed on the supporting surface or against an obstacle, such as a wall or piece of furniture.
  • the playback device 710 is provided with multiple auxiliary transducers 700 that are arranged along two or more surfaces of the housing 716.
  • one or more auxiliary transducers 700 are placed along one or more sides 716b of the internal cavity 716a, as illustrated in FIG. 7.
  • the playback device 710 can include any number of auxiliary transducers 700 in any of numerous arrangements along any one or more surfaces of the housing 716.
  • the playback device 710 is configured to, using the auxiliary transducer(s) 700, output one or more acoustic signals, such as acoustic chirp signals.
  • acoustic chirp is intended to refer to a high frequency (e.g., above 18 kHz) acoustic signal that can be transmitted by the auxiliary transducers 700, or by other high frequency audio transducers (e.g., tweeters) in other playback devices, and which can be detected by the microphone(s) 715.
  • the acoustic signals are ultrasonic signals (e.g., greater than 20 kHz) or near-ultrasonic (e.g., 18-20 kHz). Acoustic signals in such frequencies may avoid propagation of the acoustic signal outside the proximity of the emitting playback device. Further, such signals may also avoid distracting users, since these frequency ranges are above the normal audible range for most people.
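  • A near-ultrasonic signal of the kind described above can be synthesized digitally before being sent to a transducer. The sketch below assumes a linear frequency sweep from 18 kHz to 20 kHz, a 48 kHz sample rate, and a 10 ms duration; the text defines an acoustic chirp only as a high frequency signal, so the swept waveform and all parameter values are assumptions of this example.

```python
import math


def linear_chirp(f0_hz, f1_hz, duration_s, sample_rate):
    """Samples of a sine whose frequency sweeps linearly from f0_hz to f1_hz."""
    if max(f0_hz, f1_hz) >= sample_rate / 2:
        raise ValueError("chirp frequencies must stay below the Nyquist frequency")
    n = round(duration_s * sample_rate)
    rate = (f1_hz - f0_hz) / duration_s  # sweep rate in Hz per second
    samples = []
    for i in range(n):
        t = i / sample_rate
        # Phase is the integral of the instantaneous frequency f0 + rate * t.
        phase = 2 * math.pi * (f0_hz * t + 0.5 * rate * t * t)
        samples.append(math.sin(phase))
    return samples


chirp = linear_chirp(18_000, 20_000, duration_s=0.01, sample_rate=48_000)
print(len(chirp))
```

  • The Nyquist check matters in practice: a 20 kHz component cannot be represented at sample rates of 40 kHz or below, which is one reason a sample rate such as 48 kHz is assumed here.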
  • FIG. 8 is a block diagram of circuitry showing some components of an example of the playback device 710.
  • the circuitry 800 includes an audio digital signal processing module 802 that provides signal paths for both playback of audio content and transmission of the acoustic chirp signals.
  • the circuitry 800 includes an audio source 804 that provides a playback signal including one or more channels of audio content to be played by the playback device 710 using one or more audio transducer(s) 814 (e.g., a speaker).
  • the audio source 804 represents I/O circuitry and electronics configured to receive audio signals, via a wired or wireless connection, that contain the one or more channels of audio content.
  • the audio source 804 provides the playback signal 806, via front-end circuitry represented by mixer 808, to a playback digital signal processing module 812 that is part of the audio digital signal processing module 802.
  • the playback digital signal processing module 812 performs any necessary decoding or other signal processing to prepare the playback signal for transmission by the audio transducer(s) 814.
  • The processed playback signal is provided, via baseband electronics represented by mixer 818, to a digital-to-analog converter (DAC) 822.
  • the DAC 822 converts the playback signal to an analog signal, which is then amplified by an amplifier 824 and provided to the audio transducer(s) 814 for acoustic transmission.
  • the audio playback circuitry discussed above that allows the playback device 710 to play back one or more channels of audio content can include the audio source 804, the playback digital signal processing module 812, the DAC 822, and the amplifier 824.
  • the audio playback circuitry may also include playback-path components included in the mixers 808 and 818.
  • the circuitry 800 includes a waveform, signal, and/or chirp generator 826 that produces a reference audio signal 828.
  • the reference audio signal 828 can include a single tone or a plurality of tones that may encode unique identifying information corresponding to the playback device 710.
  • the reference audio signal 828 is provided, via chirp-path components in the electronics represented by the mixer 808, to a chirp amplifier 832 in the chirp signal path of the audio digital signal processing module 802.
  • the reference audio signal 828 is passed, via chirp-path components in the electronics represented by the mixer 818, to a DAC 834 where it is converted to an analog acoustic signal (e.g., an ultrasonic signal as discussed above).
  • the resulting acoustic chirp signal is amplified by an amplifier 836 and provided to the auxiliary transducer 700 for acoustic transmission, as discussed above.
  • the auxiliary transducer 700 is configured to emit the acoustic chirp signals and is coupled to electronics and components in the playback device 710 that enable that functionality. However, the auxiliary transducer 700 is decoupled from the audio playback circuitry. By providing the two signal paths and the dedicated auxiliary transducer 700 separate from the audio transducer(s) 814 used for playing audio content, in examples, the playback device 710 can be configured to emit the acoustic chirp signals during playback of one or more channels of audio content.
  • the acoustic chirp signals can be in a frequency range that is generally inaudible to human ears, and therefore output of the acoustic chirp signals may not interfere with a user’s listening experience. In this manner, various functionality associated with the use of the acoustic chirp signals can be performed during a playback session, without disrupting the session or distracting users.
  • transmission and reception of identifiable acoustic chirp signals are used in techniques for identifying the presence of nearby playback devices.
  • an initiating playback device 110, also referred to herein as a receiving device, requests that other playback devices in the media playback system 100 emit an identifiable acoustic chirp signal, such that the receiving device can identify nearby playback devices based on the characteristics of the detected acoustic chirp signal(s).
  • Figure 9 is a schematic diagram illustrating an audio-based identification technique using acoustic chirps that include a reference audio signal.
  • the media playback system includes a receiving playback device 110a, along with other playback devices 110b, 110c and the playback device 710.
  • Each playback device 710, 110b, 110c emits a respective acoustic chirp signal 902a, 902b, 902c, that includes a respective reference audio signal for that playback device.
  • the reference audio signals differ for each playback device.
  • the reference audio signals may be unique to each playback device in existence or may be unique to each playback device within a playback system.
  • Each reference audio signal can be represented by a time-frequency representation having identifiable acoustic characteristics or patterns (such as one or more tones of particular frequencies or symbols) over time.
  • a time-frequency representation indicates how the constituent frequencies of the reference audio signal vary over time.
  • a time-frequency representation is therefore a view of an audio signal represented over both time and frequency.
  • the time-frequency representation of the reference audio signal can be unique to each playback device.
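  • As a minimal sketch of such a time-frequency representation, the following splits a signal into fixed-length frames and measures the power at a few candidate tone frequencies in each frame using the Goertzel recurrence. The sample rate, frame length, candidate frequencies, and all function names are illustrative assumptions and are not taken from this description.

```python
import math

def goertzel_power(frame, sample_rate, freq):
    """Power of `frame` at `freq`, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def time_frequency(samples, sample_rate, freqs, frame_len):
    """Rows are consecutive time frames; columns are the candidate frequencies."""
    rows = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rows.append([goertzel_power(frame, sample_rate, f) for f in freqs])
    return rows

def dominant_tones(tf, freqs):
    """Strongest candidate frequency in each time frame."""
    return [freqs[max(range(len(freqs)), key=row.__getitem__)] for row in tf]

# Hypothetical parameters: 48 kHz sampling, 10 ms frames, three near-ultrasonic tones.
SAMPLE_RATE = 48_000
FRAME_LEN = 480
FREQS = [19_000, 20_000, 21_000]

def tone(freq, n):
    return [math.sin(2.0 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]

# A made-up three-symbol pattern standing in for one device's reference audio signal.
signal = tone(19_000, FRAME_LEN) + tone(21_000, FRAME_LEN) + tone(20_000, FRAME_LEN)
tf = time_frequency(signal, SAMPLE_RATE, FREQS, FRAME_LEN)
pattern = dominant_tones(tf, FREQS)
```

  • Here `pattern` recovers the order in which the tones were emitted, which is the kind of per-device pattern a receiving device could compare against known reference audio signals. The sketch assumes the frames are already aligned with the tone boundaries.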
  • the reference audio signal may include an identifier or a code for the playback device emitting the reference audio signal.
  • Each encoded identifier may be different and encoded as a set of tones, for example.
  • Each tone or symbol can be in the form of a pulse where the tone has a duration, envelope length, and a guard interval.
  • the duration of a particular tone can be the time between the beginning and end of the pulse (e.g., 5-15 milliseconds), and the envelope length can be the length of time that pulse takes to reach maximum magnitude from zero (e.g., 1-10 milliseconds).
  • the guard interval i.e. an interval of time
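  • The pulse shape described above can be sketched as follows. This is only an illustration: the linear ramp, the symmetric fall time, the 48 kHz sample rate, and the treatment of the guard interval as silence appended after the pulse are all assumptions, and `tone_pulse` is a hypothetical helper name.

```python
import math

def tone_pulse(freq_hz, duration_s, envelope_s, guard_s, sample_rate=48_000):
    """One tone symbol: ramp up, hold, ramp down, then a silent guard interval."""
    n_pulse = round(duration_s * sample_rate)
    n_ramp = round(envelope_s * sample_rate)
    n_guard = round(guard_s * sample_rate)
    if 2 * n_ramp > n_pulse:
        raise ValueError("rise and fall ramps must fit inside the pulse")
    samples = []
    for i in range(n_pulse):
        # Linear rise from zero to full magnitude over n_ramp samples,
        # mirrored at the end of the pulse (the mirrored fall is an assumption).
        rise = i / n_ramp if n_ramp else 1.0
        fall = (n_pulse - 1 - i) / n_ramp if n_ramp else 1.0
        gain = min(1.0, rise, fall)
        samples.append(gain * math.sin(2.0 * math.pi * freq_hz * i / sample_rate))
    # Assumed guard interval: silence before the next symbol may start.
    samples.extend([0.0] * n_guard)
    return samples

# A 10 ms, 20 kHz tone with a 2 ms envelope and a 5 ms guard interval.
pulse = tone_pulse(20_000, duration_s=0.010, envelope_s=0.002, guard_s=0.005)
```

  • The example values fall inside the ranges given above (5-15 milliseconds for the duration and 1-10 milliseconds for the envelope length); the guard interval length is arbitrary.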
  • the identifier of the playback device may be mapped to a pseudorandom code in a code division multiple access (CDMA) modulation scheme.
  • at least part of the reference audio signals is generated via a sequence of operations executed by a processor, such as a sequence that invokes a pseudorandom generator.
  • the reference audio signals may be based on Gold codes, Walsh/Hadamard codes, etc.
  • the reference audio signals are generated using a linear feedback shift register.
  • at least part of the reference audio signals is manually generated.
  • the reference audio signals may be configured to have low or minimal cross-correlation with other reference audio signals. Hamming correlation may be used to compare reference audio signals to determine their similarity or cross-correlation, in certain examples.
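  • A minimal sketch of generating a pseudorandom code with a linear feedback shift register and checking its correlation properties is shown below. The 5-stage register, the tap positions, the seed, and the function names are illustrative assumptions. The sketch uses ordinary periodic correlation of ±1 chips, not the Hamming correlation mentioned above, and it does not construct Gold or Walsh/Hadamard codes.

```python
def lfsr_sequence(taps, seed, length):
    """Fibonacci LFSR over GF(2): a[n+k] is the XOR of a[n+t] for t in taps."""
    state = list(seed)
    out = []
    for _ in range(length):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return out

def periodic_correlation(a, b, shift):
    """Correlation of two equal-length binary codes mapped to +1/-1 chips."""
    n = len(a)
    return sum((1 - 2 * a[i]) * (1 - 2 * b[(i + shift) % n]) for i in range(n))

# Hypothetical 5-stage registers; both recurrences have the maximal period 2**5 - 1.
PERIOD = 31
code_a = lfsr_sequence(taps=(0, 2), seed=[1, 0, 0, 0, 0], length=PERIOD)  # a[n+5] = a[n] ^ a[n+2]
code_b = lfsr_sequence(taps=(0, 3), seed=[1, 0, 0, 0, 0], length=PERIOD)  # a[n+5] = a[n] ^ a[n+3]

# Off-peak autocorrelation values of code_a (every nonzero cyclic shift).
auto_sidelobes = {periodic_correlation(code_a, code_a, s) for s in range(1, PERIOD)}
# Largest correlation magnitude between the two different codes at any shift.
peak_cross = max(abs(periodic_correlation(code_a, code_b, s)) for s in range(PERIOD))
```

  • For a maximal-length sequence the autocorrelation is 31 at zero shift and -1 at every other shift, so a receiver can locate the code sharply in time. `peak_cross` gives one way to judge how easily two candidate reference codes could be confused with each other.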
  • the receiving device 110a can identify the particular playback device 110b, 110c, 710 as the source of the reference audio signal. In examples, to identify the closest/nearest playback device, the receiving device 110a may compare the detected acoustic chirp signals.
  • the receiving device 110a may compare various metrics such as sound pressure levels and/or signal-to-noise ratios of the detected acoustic signals to identify the “loudest” acoustic signal (e.g., based on detected sound pressure level), which may be assumed to have been emitted by the playback device that is physically nearest to the receiving device 110a.
  • the receiving device 110a may list or otherwise rank playback devices by relative signal strength (e.g., SNR).
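  • The ranking step can be sketched as below, assuming the receiving device has already separated the detected chirps and estimated a signal power and a noise power for each one. The `Detection` type, its field names, and the power values are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Detection:
    device_id: str       # identifier decoded from the reference audio signal
    signal_power: float  # estimated power of that device's chirp (linear units)
    noise_power: float   # estimated background noise power (linear units)

    @property
    def snr_db(self) -> float:
        return 10.0 * math.log10(self.signal_power / self.noise_power)

def rank_by_snr(detections):
    """Strongest detection first; the first entry is assumed to be the nearest device."""
    return sorted(detections, key=lambda d: d.snr_db, reverse=True)

# Made-up measurements for the three emitting devices of Figure 9.
detections = [
    Detection("110b", signal_power=4.0e-4, noise_power=1.0e-6),
    Detection("110c", signal_power=9.0e-5, noise_power=1.0e-6),
    Detection("710", signal_power=2.5e-3, noise_power=1.0e-6),
]
ranked = rank_by_snr(detections)
nearest = ranked[0].device_id
```

  • Treating the strongest signal as the nearest device only makes sense when the emitting devices output their chirps at comparable levels.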
  • the playback devices 110b, 110c, 710 can be configured to emit the acoustic chirp signals at the same or substantially the same volume level.
  • the instructions to emit the acoustic chirp signals include instructions to change to a certain volume level (e.g., decibel, volume level setting). Since different playback devices have different types of transducers and/or amplifiers, the volume level for each playback device emitting the signal may vary based on the type of device. Alternatively, the playback devices may be preconfigured to emit audio signals at the certain volume level taking into account these differences.
  • the devices can be configured to employ frequency division multiplexing or other techniques to separate the acoustic chirp signals.
  • the playback device 710 may employ acoustic chirp signals, using the auxiliary transducer(s) 700 and microphone(s) 715, to determine one or more dimensions of a room 1001 in which the playback device 710 is located.
  • providing a sub-woofer with an auxiliary transducer 700 and microphone 715 may offer some benefits over other types of playback devices with respect to determining room dimensions. For example, in most configurations of a media playback system, a sub-woofer is placed on the floor 1011a of the room 1001 in a default orientation.
  • the height(s) of the auxiliary transducer(s) 700 and microphone(s) 715 are known (based on known dimensions of the playback device 710 and placement of the auxiliary transducer(s) 700 and microphone(s) 715 within the playback device 710), which can offer advantages when determining distance to walls 1011b of the room 1001 and/or to other playback devices within the room 1001.
  • the playback device 710 is a sub-woofer located on the floor 1011a of the room 1001
  • the playback device 710 can be used to determine the ceiling height in the room 1001, which can be relevant information used for configuring various playback characteristics (e.g., playback delays of surround and/or height channels) of the media playback system 100.
  • the playback device 710 can emit an acoustic chirp signal 902 towards the ceiling 1011c using one or more auxiliary transducers 700, and detect, using one or more microphones 715, a reflection of the acoustic chirp signal from the ceiling 1011c. Using time-of-flight or other calculations, the playback device 710 may thus determine the distance between the playback device 710 and the ceiling 1011c, and therefore the height of the ceiling 1011c relative to the floor 1011a.
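  • A minimal time-of-flight sketch of the ceiling-height estimate follows. It assumes that the auxiliary transducer and the microphone sit at the same known height above the floor, that the chirp travels straight up and back, and that the speed of sound is about 343 m/s (dry air near 20 °C). The function name and the example values are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees Celsius

def ceiling_height_m(round_trip_s, transducer_height_m, speed=SPEED_OF_SOUND_M_S):
    """Ceiling height above the floor from the chirp's round-trip time."""
    # The chirp covers the transducer-to-ceiling distance twice (up and back).
    one_way_m = speed * round_trip_s / 2.0
    # Add the known height of the transducer/microphone above the floor.
    return transducer_height_m + one_way_m

# Example: reflection detected 12.5 ms after emission, transducer 0.35 m above the floor.
height = ceiling_height_m(round_trip_s=0.0125, transducer_height_m=0.35)
```

  • With these example numbers the one-way distance is 343 × 0.0125 / 2 ≈ 2.14 m, giving a ceiling height of roughly 2.49 m.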
  • knowing the absolute distance between playback devices can facilitate tuning of many different player setups, including, for example, stereo pairs, home theater (HT) setups, and non-HT bonded zones with a sub-woofer and one or more other playback devices. Allowing this distance calculation to be made automatically may eliminate the need for a coarse setting in the HT configuration setup, in which a user indicates an approximate range of distance (e.g., less than 2 feet, 2-10 feet, or more than 10 feet) between certain devices, such as between the HT primary device (e.g., a soundbar) and one or more satellite devices.
  • PCT/US2022/077233 (which is hereby incorporated herein by reference), for example, describes using automatic distance calculation between two devices using a speed-of-light modality (e.g., ultrawideband) and/or a speed-of-sound modality. While ultrawideband (UWB) is a suitable tool for this application, the approach uses additional hardware installed on both devices. Acoustic chirp technology, on the other hand, does not require additional hardware on devices that are equipped with a tweeter or a dedicated auxiliary transducer 700 as described above.
  • the WI-FI clock may not be sufficiently synchronized between the two devices to accurately calculate a signal transit time between emission at one device and detection at the other. For example, every 1 millisecond that the timing on these two devices is out of sync may translate to an error in the distance calculation of about 30 centimeters.
  • aspects and embodiments provide a method for absolute distance determination between two playback devices 1110a, 1110b that does not rely on having a synchronized clock between the two devices.
  • Each of the two playback devices 1110a, 1110b is equipped with at least one acoustic chirp-capable transmitter 1114 and at least one microphone 1115.
  • the acoustic chirp-capable transmitter 1114 may be a high frequency audio transducer, such as a tweeter or the auxiliary transducer 700 discussed above.
  • the two playback devices 1110a, 1110b are capable of transmitting and detecting acoustic chirp signals 1102a, 1102b (collectively 1102).
  • the playback devices 1110a, 1110b may be any of the playback devices 110, NMDs 120, or playback device 710 discussed above.
  • Figure 12 is a sequence diagram illustrating an example of a distance determination process that can be performed by the playback devices 1110a, 1110b to determine the distance between the two devices.
  • the first playback device 1110a transmits an acoustic chirp signal 1102a to the second playback device 1110b.
  • the second playback device 1110b transmits an acoustic chirp signal 1102b to the first playback device 1110a.
  • the two playback devices 1110a, 1110b transmit their respective acoustic chirp signals 1102 at what they believe is the same time according to their respective internal clocks.
  • the clocks of the two playback devices 1110a, 1110b do not need to be synchronized. Any latency or error between the two clocks is removed in the distance calculation, as discussed further below.
  • the playback devices can be configured to use slightly different frequency bands or other techniques to separate the signals, as discussed above.
  • the microphones 1115 on the playback devices 1110a, 1110b are configured to be listening for the acoustic chirp signal 1102 emitted by the other playback device.
  • the second playback device 1110b detects the acoustic chirp signal 1102a emitted by the first playback device 1110a.
  • the first playback device 1110a detects the acoustic chirp signal 1102b emitted by the second playback device 1110b.
  • the time of flight of the acoustic chirp signal 1102a from the first playback device 1110a to the second playback device 1110b is the same as the time of flight of the acoustic chirp signal 1102b from the second playback device 1110b to the first playback device 1110a.
  • each playback device 1110a, 1110b calculates the time between when it emitted its own acoustic chirp signal 1102 and when it detected the acoustic chirp signal from the other playback device. Thus, each playback device 1110a, 1110b calculates an estimated time of flight of the acoustic chirp signals 1102 between the two playback devices 1110a, 1110b. It will be appreciated that the two playback devices 1110a, 1110b do not necessarily perform their respective calculations at 1210 at the same time (due to differences in their internal clocks, for example). The two playback devices 1110a, 1110b exchange their time of flight estimations at 1212.
  • one or both playback devices 1110a, 1110b can then compare their own timing calculation (obtained at 1210) with the timing calculation received from the other playback device at 1212 to compute the distance between the two playback devices 1110a, 1110b.
  • the actual time of flight of the acoustic chirp signals 1102 between the two playback devices 1110a, 1110b is the mean of the two calculated times (the time of flight calculated by the first playback device 1110a and the time of flight calculated by the second playback device 1110b). Because it is known that the two calculated times of flight should be equal, averaging the two removes the latency or offset difference introduced by the clocks on the individual playback devices not being synchronized.
  • the distance between the two playback devices 1110a, 1110b can be calculated based on the speed of sound in air and the actual time of flight of the acoustic chirp signals 1102 between the two playback devices 1110a, 1110b.
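  • The averaging step can be sketched as follows. If the second device actually emits its chirp a time δ after the first device, the first device measures τ + δ between its own emission and its detection of the other chirp, while the second device measures τ - δ, where τ is the true time of flight. The mean of the two measurements is therefore τ. The speed of sound (343 m/s), the function name, and the simulated values are assumptions made for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees Celsius

def distance_from_two_way(delta_a_s, delta_b_s, speed=SPEED_OF_SOUND_M_S):
    """Distance from the two locally measured emit-to-detect intervals."""
    # Averaging cancels the unknown offset between the two emission times.
    time_of_flight_s = (delta_a_s + delta_b_s) / 2.0
    return speed * time_of_flight_s

# Simulated scenario: devices 3 m apart, second device emits 1 ms late.
true_distance_m = 3.0
true_tof_s = true_distance_m / SPEED_OF_SOUND_M_S
emit_offset_s = 0.001

delta_a = true_tof_s + emit_offset_s  # measured by the first playback device
delta_b = true_tof_s - emit_offset_s  # measured by the second playback device

naive_one_way_m = SPEED_OF_SOUND_M_S * delta_a         # uses one measurement only
averaged_m = distance_from_two_way(delta_a, delta_b)   # uses both measurements
```

  • In this simulation the single-sided estimate is off by 0.343 m for the 1 ms offset, while the averaged estimate returns the 3 m separation. A measured interval can be negative when the emission offset exceeds the time of flight; the average still cancels the offset in that case.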
  • both playback devices 1110a, 1110b are shown performing the distance calculation at 1214; however, in other examples only one playback device may perform the distance calculation.
  • both playback devices 1110a, 1110b can transmit their time of flight calculations to a third device in the media playback system 100 (e.g., another playback device 110, NMD 120, or controller device 130) that then calculates the distance between the two playback devices 1110a, 1110b as described above.
  • the true distance between the playback devices 1110a, 1110b can be determined without error introduced by lack of synchronization between the device clocks.
  • aspects and embodiments provide playback devices that can be equipped with dedicated auxiliary acoustic chirp-capable transducers, and methods using acoustic chirp signals that can be employed by these and other chirp-capable playback devices.
  • the dedicated auxiliary transducers 700 are MEMS transducers or piezoelectric transducers, rather than typical high-frequency audio transducers (e.g., tweeters) used in playback devices for the playback of audio content.
  • Using piezoelectric or MEMS transducers for the auxiliary transducers 700 may offer several benefits and advantages. For example, these transducers may be smaller and/or cheaper than typical tweeters.
  • For transmission of the acoustic chirp signals, it may not be necessary for the auxiliary transducers 700 to provide the same audio quality and/or range of output frequencies that may be desirable from a tweeter used for playing audio content. Accordingly, the use of a simpler, smaller, potentially less expensive auxiliary transducer can accomplish the desired functionality without consuming significant space in the playback device 710 and/or adding significant cost.
  • the use of piezoelectric transducers may provide advantages in that typically tweeters are highly directional, whereas piezoelectric transducers tend to be more omnidirectional. For example, this can allow presence detection and distance determination to be performed without significant constraints on the orientations or relative positions of the playback devices.
  • audio playback devices that have one or more audio transducers configured for audible sound output and one or more auxiliary transducers configured for audio signal transmission.
  • the techniques described herein can be applied to many other types of devices, including devices that do not otherwise output sound directly (i.e., lack an audio transducer specifically tailored for audible sound output) or even devices that lack an audio generation or playback capability.
  • Example devices that may benefit from these techniques include electronic devices such as audio devices (e.g., amplifiers), networking devices (e.g., routers, signal replicators, etc.), internet-of-things (IoT) and/or other smart devices (lamps, printers, thermostats, etc.), televisions or other display devices (e.g., projectors), streaming media devices, set-top boxes, gaming devices, computing devices (e.g., servers, mobile devices such as smartphones, tablets, etc.), or other suitable devices not listed here.
  • Figure 13A is an isometric view of an example device 1310 equipped with one or more dedicated, signal-transmitting transducer(s) 1300.
  • the transducer 1300 is similar or identical to the auxiliary transducer 700 described above with respect to Figures 7 and 8.
  • the device 1310 is a network device comprising an audio amplifier, but otherwise lacking other transducers whose primary use is to output audio sound such as, for instance, the transducer 814 described above with respect to Figure 8.
  • An enclosure or housing 1316 houses, surrounds, and/or otherwise carries various electronic components of the device, such as one or more of the components discussed above with reference to Figures 2A-C, and 8.
  • the device 1310 can include one or more audio inputs and/or outputs such as interfaces 1320.
  • the interfaces 1320 can be configured to communicatively couple and/or electrically connect the device 1310 with one or more devices, loudspeakers, etc.
  • the interfaces 1320 provide output signals (e.g., via an audio plug, an audio adapter, and/or an audio cable, etc.) configured to drive one or more passive loudspeakers for playback of audio content processed via the device 1310.
  • the interfaces 1320 facilitate transmission of control signals and/or media data to one or more corresponding playback devices comprising one or more active loudspeakers.
  • the interfaces 1320 provide data to one or more devices that may or may not include audio transducers.
  • the interfaces are configured to receive data, either exclusively or in combination with outputting data.
  • Figure 13B is a block diagram of an audio chain, electronics or circuitry 1312 that is configured to be included or housed in the housing 1316 (Figure 13A), similar to the circuitry 800 (Figure 8).
  • the transducer 1300 is configured to emit and/or transmit ultrasonic or near ultrasonic signals such as chirp signals.
  • the transducer 1300 can comprise, for instance, a dynamic loudspeaker, a piezoelectric transducer, and/or a micro-electromechanical systems (MEMS) transducer configured to transmit sound in a frequency range suitable for data transmission (e.g., 18 kHz or higher). In other examples, however, another suitable transducer type may be used.
  • the transducer 1300 is capable of outputting sound in audible frequencies (e.g., one or more frequencies between about 20 Hz and 20 kHz).
  • the output of the amplifier 824 is transmitted to one or more external devices, transducers, etc. via the interfaces 1320, rather than the audio transducer(s) 814 (Figure 8).
  • the transducer 1300 can be positioned at any suitable location in and/or on the device. As those of ordinary skill in the art will appreciate, however, the transducer 1300 will typically be disposed somewhere on or near an exterior surface of the housing 1316 to facilitate fluid coupling with air outside the device 1310. In some examples, the transducer 1300 is placed inside and/or within the housing 1316 and fluidly coupled with outside air via a port or vent (not shown). In some instances, the transducer can be placed on/near the center of the device 1310, for example to facilitate a more even reception of the acoustic signal around the device.
  • the transducer 1300 is placed on/near the top of the device 1310, for example to reduce the risk that the signal is blocked by the surface where the device is placed and/or by any objects such as furniture surrounding the device.
  • the auxiliary transducer(s) can be placed on/near any of the walls of the housing 1316. Other placement arrangements are possible.
  • the device 1310 is not configured itself to output audio content, but instead, processes received media data and transmits the processed data in the form of a digital bitstream or an analog audio signal (or both) to one or more playback devices via the interfaces 1320.
  • the device 1310 may not include any transducer configured to output audible signals and/or configured to play back audio content.
  • These audible signals can be used, for instance, to inform a user of procedures that the device is undertaking or alert the user of an error condition or other issue requiring attention (e.g., excessive temperature, humidity, etc. or perhaps another malfunction).
  • the device can issue an earcon to notify the user that the device is being set up, and/or that there is information being exchanged between devices.
  • the transducer 1300 itself is capable of playing back one or more tones in audible frequencies, without needing the substantial electronics or amplification that a standard tweeter or other transducer may require.
  • the transducer 1300 can be configured to play back the one or more tones in audible frequencies as indications to a user that the device is undergoing some kind of processing (e.g., setup via chirp).
  • the transducer 1300 allows the device to transmit data (in the form of sound signals) and output audible alerts or notifications through the use of a single transducer, rather than requiring dedicated transducers for both functions. Doing so can obviate the need for a separate tweeter (or other suitable audio transducer) that would be used only once or twice (e.g., during setup) or for a minimal percentage of the time (e.g., less than 1%) that the device 1310 operates.
  • using the transducer 1300 for at least the dual purposes described above can reduce cost and free up space needed for a dedicated tweeter.
  • the signals issued by the auxiliary transducers 700 can be used in many applications such as during setup or configuration procedures undertaken by the device.
  • Example setup procedures where the auxiliary transducers 700 can be used to transfer setup information such as a PIN or other data are described in U.S. Pat. Pub. No. 2022/0104015, filed September 24, 2021, titled “Intelligent Setup for Playback Devices,” which is incorporated herein by reference in its entirety.
  • the device may include two or more transducers 1300.
  • Figure 14, for example, shows an example network device 1410 comprising two or more of the transducers 1300 disposed on multiple faces or portions of the device housing 1316.
  • the device 1410 includes two or more of the transducers 1300 disposed near opposite ends of the housing 1316 (e.g., near the front and the back, or near opposite sides of the device). In this way, the signals output by the device can be received from different distances and/or positions around the device.
  • the device 1410 can include circuitry and/or arbitration logic to arbitrate between the two or more transducers 1300.
  • the device 1410 can include one or more sensors 1415 (e.g., one or more microphones, cameras, infrared sensors, depth or motion sensors, and/or other suitable sensors) configured to detect the presence of a user/device nearer to one side of the device 1310.
  • at least one of the transducers 1300 emits a sensor signal (e.g., a predetermined waveform) whose reflections are received via at least one or more of the sensors 1415.
  • the arbitration logic can receive the sensor signal and determine that the transducer 1300 closer to the back of the device is the one to output the chirp or other sound signal.
  • the two or more transducers 1300 all transmit the same chirp/sound signal and there is no arbitration logic to arbitrate between them.
  • Figures 15A and 15B illustrate schematic diagrams of a device 1511 configured to communicate via audio signals 1590 and 1592 with a first playback device 1510 (Figure 15A) or a user device (Figure 15B) such as the control device 130 described in more detail above with respect to Figure 1H.
  • the device 1511 may communicate with two or more playback devices, such as the first playback device 1510 and a second playback device 1510’.
  • the device 1511 may be configured to receive the audio signal(s) 1590 from the first playback device 1510 and correspondingly transmit the audio signal(s) 1592 toward the second playback device 1510’ (or vice versa).
  • the device 1511 includes electronics or circuitry 1512 (Figure 15C) comprising an audio source 1504 (e.g., an audio storage including one or more sound files, earcons, etc.) that may or may not be output via the audio transducer 1300.
  • the circuitry 1512 omits the audio source 1504 and the audio transducer 1300 is used primarily as an audio signal output based on signals generated by the chirp generator 826 shown in Figure 15C and described above in more detail with respect to Figure 8.
  • the device 1511 lacks a network interface (such as the network interface 112d of Figure 1C) and/or is otherwise incapable of network communication.
  • One benefit of this arrangement is the ability to “air-gap” the device 1511, effectively making it disconnected from a network such as a LAN or WAN, thereby severely hampering, or perhaps completely preventing, unauthorized attempts to access the device.
  • including the one or more audio transducer(s) 1300 with the device 1511 provides the additional benefit of confirming a particular room presence, particularly when used in conjunction with a playback device such as the first playback device 1510 and/or the second playback device 1510’.
  • the device 1511 is configured as a security device such as a digital wallet or a so-called “smart wallet,” which can be configured to digitally store currency (e.g., cryptocurrency), etc.
  • the communication method necessarily requires that a user be in the same room as (or at least within earshot or audio range of) the device 1511. Accordingly, the present technology can be implemented in security devices to significantly reduce “hacking” or unauthorized access by bad actors (e.g., hackers) from remote locations.
  • an audio parameter such as a room acoustic characteristic(s) is used to further verify a user’s presence in a particular space before access is granted (via the signal(s) 1590 and/or 1592) to the device 1511.
  • the device 1511 receives, via the sensors 1415 (e.g., microphones), one or more tones emitted by at least one of the first playback device 1510 and/or the second playback device 1510’ to confirm a proper location (or vice versa) to obtain a room signature or “fingerprint” to thereby confirm an authorized room or environment.
  • the device 1511 may confirm an authorized location by detecting, via the one or more sensors 1415, a particular acoustic signature of the environment without any tones output by the playback devices. In certain examples, the device 1511 may determine authorized access by detecting (e.g., via the one or more sensors 1415) playback of a particular song or audio via the first playback device 1510 and/or second playback device 1510’.
  • Figure 16 illustrates one example in which a vehicle 1610 (e.g., automobile, bus, boat, airplane, truck) or another suitable housing includes one or more auxiliary transducers 1600 on an exterior surface of the housing.
  • the vehicle 1610 may not be capable of using the one or more audio transducers 1614 to communicate using audio signals with other external devices, vehicles, etc. since the housing (i.e., cabin) substantially prevents much (or all) of the high frequency acoustic energy from being transmitted outside the vehicle 1610.
  • the one or more auxiliary transducers 1600 disposed on the outside of the vehicle 1610 and fluidly coupled to the air therearound can enable audio communication with an external device (not shown) to facilitate vehicle setup, confirm vehicle presence, enable or disable vehicle access (e.g., door lock or unlock) and/or communicate other data associated with the vehicle.
  • reference herein to “embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one example embodiment of an invention.
  • the appearances of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
  • the embodiments described herein, explicitly and implicitly understood by one skilled in the art, can be combined with other embodiments.
  • At least one of the elements in at least one example is hereby expressly defined to include a tangible, non-transitory medium such as a memory, DVD, CD, Blu-ray, and so on, storing the software and/or firmware.
  • Example 1 provides a playback device comprising a first audio transducer, audio playback circuitry coupled to the first audio transducer, a second audio transducer uncoupled from the audio playback circuitry, one or more processors, and at least one non-transitory computer readable medium storing program instructions executable by the one or more processors to control the playback device to play back one or more channels of audio content via the audio playback circuitry and the first audio transducer, and acoustically transmit, via the second audio transducer, a reference audio signal including an identifier that identifies the playback device.
  • Example 2 includes the playback device of Example 1, wherein the second audio transducer is a micro-electromechanical system (MEMS) transducer.
  • Example 3 includes the playback device of Example 1, wherein the second audio transducer is a piezoelectric transducer.
  • Example 4 includes the playback device of any one of Examples 1-3, further comprising a microphone configured to detect an acoustic signal.
  • Example 5 includes the playback device of Example 4, wherein the at least one non-transitory computer readable medium further includes program instructions executable by the one or more processors to control the playback device to, based on the acoustic signal, detect a presence of at least one other playback device.
  • Example 6 includes the playback device of Example 4, wherein the acoustic signal comprises a superposition of a plurality of audio signals acoustically transmitted by a plurality of respective playback devices, and wherein the at least one non-transitory computer readable medium further includes program instructions executable by the one or more processors to control the playback device to determine a time-frequency representation of the acoustic signal, based on the time-frequency representation of the acoustic signal, determine that the acoustic signal includes a first audio signal transmitted by a first playback device and a second audio signal transmitted by a second playback device, and based on magnitudes of the first and second audio signals, determine that the first playback device is positioned closer than the second playback device to the playback device.
  • Example 7 includes the playback device of Example 6, wherein the at least one non-transitory computer readable medium further includes program instructions executable by the one or more processors to control the playback device to obtain data associating playback devices with respective reference audio signals each having a predefined time-frequency representation.
  • Example 8 includes the playback device of Example 4, wherein the at least one non-transitory computer readable medium further includes program instructions executable by the one or more processors to control the playback device to, based on the acoustic signal, determine an estimated distance between the playback device and a surface of an environment in which the playback device is located.
  • Example 9 includes the playback device of Example 8, wherein the surface is a ceiling.
  • Example 10 includes the playback device of Example 4, wherein the at least one non-transitory computer readable medium further includes program instructions executable by the one or more processors to control the playback device to acoustically transmit the reference audio signal at a predetermined time, detect, via the microphone, the acoustic signal transmitted by at least one other playback device, determine a first time difference between the predetermined time and a time of detection of the acoustic signal, acquire information including a second time difference between a time of transmission of the acoustic signal by the at least one other playback device and a time of detection of the reference audio signal at the at least one other playback device, determine an average of the first and second time differences, and determine an estimated distance between the playback device and the at least one other playback device based on the average of the first and second time differences.
  • Example 11 includes the playback device of one of Examples 1-10, wherein the first audio transducer is configured to emit first sound waves in a first frequency range, and wherein the second audio transducer is configured to emit second sound waves in a second frequency range higher in frequency than the first frequency range.
  • Example 12 includes the playback device of Example 11, wherein the first frequency range includes frequencies below about 1 kHz, and wherein the second frequency range includes frequencies above about 18 kHz, optionally in a range of about 18 kHz - 20 kHz.
  • Example 13 includes the playback device of any one of Examples 1-12, wherein the reference audio signal comprises a sequence of tones.
  • Example 14 includes the playback device of any one of Examples 1-13, wherein the playback device is a subwoofer.
  • Example 15 provides a playback device comprising a first audio transducer configured to produce a first acoustic output in a first frequency range, the first acoustic output corresponding to playback of at least one channel of audio content, a second audio transducer configured to produce a second acoustic output in a second frequency range higher in frequency than the first frequency range, one or more processors, and at least one non-transitory computer readable medium storing program instructions executable by the one or more processors to control the playback device to play back the at least one channel of audio content via the first audio transducer, encode data into a reference audio signal comprising a sequence of tones encoding the data, wherein the data includes an identifier that identifies the playback device, and acoustically transmit, via the second audio transducer, the reference audio signal, wherein the reference audio signal corresponds to the second acoustic output.
  • Example 17 includes the playback device of Example 15, wherein the second audio transducer is a piezoelectric transducer.
  • Example 18 includes the playback device of any one of Examples 15-17, wherein the playback device is a subwoofer.
  • Example 19 includes the playback device of any one of Examples 15-18, further comprising a microphone configured to detect an acoustic signal.
  • Example 20 includes the playback device of Example 19, wherein the at least one non-transitory computer readable medium further includes program instructions executable by the one or more processors to control the playback device to, based on the acoustic signal, detect a presence of at least one other playback device.
  • Example 21 includes the playback device of Example 20, wherein the at least one non-transitory computer readable medium further includes program instructions executable by the one or more processors to control the playback device to acoustically transmit the reference audio signal at a predetermined time, detect, via the microphone, the acoustic signal transmitted by the at least one other playback device, determine a first time difference between the predetermined time and a time of detection of the acoustic signal, acquire information including a second time difference between a time of transmission of the acoustic signal by the at least one other playback device and a time of detection of the reference audio signal at the at least one other playback device, determine an average of the first and second time differences, and determine an estimated distance between the playback device and the at least one other playback device based on the average of the first and second time differences.
  • Example 22 includes the playback device of Example 19, wherein the at least one non-transitory computer readable medium further includes program instructions executable by the one or more processors to control the playback device to, based on the acoustic signal, determine an estimated distance between the playback device and a surface of an environment in which the playback device is located.
  • Example 23 includes the playback device of Example 22, wherein the surface is a ceiling.
  • Example 24 provides a subwoofer playback device comprising a first audio transducer configured to produce a first acoustic output in a first frequency range, the first acoustic output corresponding to playback of at least one channel of audio content, at least one auxiliary transducer configured to produce a second acoustic output in a second frequency range higher in frequency than the first frequency range, wherein the second acoustic output corresponds to a reference audio signal comprising a sequence of tones and including an identifier that identifies the subwoofer playback device, and at least one microphone configured to detect an acoustic signal.
  • Example 25 includes the subwoofer playback device of Example 24, wherein the at least one auxiliary transducer comprises at least one piezoelectric transducer.
  • Example 26 includes the subwoofer playback device of Example 24, wherein the at least one auxiliary transducer comprises at least one MEMS transducer.
  • Example 27 includes the subwoofer playback device of any one of Examples 24-26, further comprising a housing having multiple surfaces, wherein the at least one auxiliary transducer includes a plurality of auxiliary transducers arranged along two or more surfaces of the multiple surfaces of the housing.
  • Example 28 includes the subwoofer playback device of Example 27, wherein the at least one microphone is arranged on a third surface of the multiple surfaces of the housing.
  • Example 29 includes the subwoofer playback device of Example 28, wherein the two or more surfaces of the multiple surfaces of the housing are arranged orthogonal to the third surface.
  • Example 30 includes the subwoofer playback device of any one of Examples 24-29, further comprising one or more processors, and at least one non-transitory computer readable medium storing program instructions executable by the one or more processors to control the playback device to acoustically transmit the reference audio signal via the at least one auxiliary transducer at a predetermined time, determine a first time difference between the predetermined time and a time of detection of the acoustic signal, acquire information including a second time difference between a time of transmission of the acoustic signal by at least one other playback device and a time of detection of the reference audio signal at the at least one other playback device, determine an average of the first and second time differences, and determine an estimated distance between the subwoofer playback device and the at least one other playback device based on the average of the first and second time differences.
  • Example 31 includes the subwoofer playback device of any one of Examples 24-29, further comprising one or more processors, and at least one non-transitory computer readable medium storing program instructions executable by the one or more processors to control the playback device to, based on the acoustic signal, detect a presence of at least one other playback device.
  • Example 32 includes the subwoofer playback device of any one of Examples 24-29, further comprising one or more processors, and at least one non-transitory computer readable medium storing program instructions executable by the one or more processors to control the playback device to, based on the acoustic signal, determine an estimated distance between the playback device and a surface of an environment in which the playback device is located.
  • Example 33 includes the subwoofer playback device of Example 32, wherein the surface is a ceiling.
  • Example 34 includes the subwoofer playback device of any one of Examples 24-33, wherein the first frequency range includes frequencies below approximately 1 kHz.
  • Example 35 includes the subwoofer playback device of any one of Examples 24-34, wherein the second frequency range includes frequencies above approximately 18 kHz, optionally in a range of approximately 18 kHz - 20 kHz.
  • Example 36 provides a method of determining a distance between first and second playback devices, the method comprising instructing the first playback device to emit a first acoustic signal at a predetermined time as measured by a clock of the first playback device, instructing the second playback device to emit a second acoustic signal at the predetermined time as measured by a clock of the second playback device, determining a first time difference between the predetermined time as measured by the clock of the first playback device and a time of detection of the second acoustic signal at the first playback device, determining a second time difference between the predetermined time as measured by the clock of the second playback device and a time of detection of the first acoustic signal at the second playback device, calculating an average of the first and second time differences, and determining the distance between the first and second playback devices based on the average.
  • Example 37 provides a playback device configured to implement the method of Example 36.
  • Example 38 provides a first playback device comprising a network interface, an audio transducer, one or more processors, and at least one non-transitory computer readable medium storing program instructions executable by the one or more processors to control the first playback device to filter, mix, amplify, and/or decode input audio content into one or more channels of audio content; cause, via a second playback device, playback of one or more channels of output audio content; and acoustically transmit, via the audio transducer, a reference audio signal including an identifier that identifies the first playback device.
  • Example 39 includes the first playback device of Example 38, wherein the at least one non-transitory computer readable medium includes program instructions executable by the one or more processors to control the first playback device to transmit, via the network interface, the one or more channels of output audio content.
  • Example 40 includes the first playback device of Example 38, wherein the first playback device comprises one or more hardware interfaces, and wherein causing, via the second playback device, playback of one or more channels of output audio content comprises outputting and/or sending the one or more channels of output audio content via the one or more hardware interfaces.
  • Example 41 includes the first playback device of Example 38, wherein the first playback device comprises an analog media.
  • Example 42 provides a network device comprising a housing, one or more audio interfaces for coupling to one or more external playback devices, an audio amplifier disposed within the housing and coupled to the one or more audio interfaces, a transducer disposed at least partially within the housing and configured to emit an acoustic data signal, and one or more processors disposed within the housing. The network device further comprises at least one non-transitory computer readable medium disposed within the housing and storing program instructions executable by the one or more processors to control the network device to acoustically transmit, via the transducer, the acoustic data signal, wherein the acoustic data signal comprises at least one of a reference signal including an identifier that identifies the network device, or an audible notification signal.
  • Example 43 includes the network device of Example 42, wherein the transducer is a micro-electromechanical system (MEMS) transducer.
  • Example 44 includes the network device of Example 42, wherein the transducer is a piezoelectric transducer.
  • Example 45 includes the network device of any one of Examples 42-44, further comprising a chirp generator coupled to the transducer and configured to produce the acoustic data signal.
  • Example 46 includes the network device of Example 45, wherein the reference signal is an ultrasonic signal.
  • Example 47 includes the network device of any one of Examples 42-46, wherein the transducer is positioned proximate an external surface of the housing.
  • Example 48 includes the network device of Example 47, wherein the transducer is fluidly coupled with an external environment of the network device.
  • Example 49 includes the network device of any one of Examples 42-48, wherein the acoustic data signal includes the audible notification signal, and wherein the program instructions comprise program instructions executable by the one or more processors to control the network device to acoustically transmit the audible notification signal based on the network device undergoing a configuration procedure.
  • Example 50 includes the network device of any one of Examples 42-48, wherein the acoustic data signal includes the audible notification signal, and wherein the program instructions comprise program instructions executable by the one or more processors to control the network device to acoustically transmit the audible notification signal based on the network device experiencing an error condition.
  • Example 51 includes the network device of any one of Examples 42-50, wherein the transducer is a first transducer, the network device further comprising a second transducer at least partially disposed within the housing and configured to emit the acoustic data signal.
  • Example 52 includes the network device of Example 51, wherein the first and second transducers are disposed proximate opposite ends of the housing.
  • Example 53 includes the network device of Example 51, wherein the first transducer is disposed on a first face of the housing and the second transducer is disposed on a second face of the housing.
  • Example 54 includes the network device of any one of Examples 42-53, wherein the housing is a vehicle.
  • Example 55 includes the network device of any one of Examples 51-53, wherein at least one of the first and second transducers is configured to emit a sensor signal, and wherein the network device further comprises at least one sensor configured to detect a reflection of the sensor signal.
  • Example 56 includes the network device of Example 55, wherein the at least one sensor includes a microphone.
  • Example 57 includes the network device of any one of Examples 42-53, wherein the reference audio signal comprises a sequence of tones.
  • Example 58 includes the network device of any one of Examples 42-53 or 55-57, further comprising audio processing circuitry coupled to the audio amplifier, wherein the program instructions comprise program instructions executable by the one or more processors to control the network device to process media data using the audio processing circuitry to produce an output signal, amplify the output signal with the audio amplifier to produce an amplified output signal, and transmit the amplified output signal to the one or more external playback devices via the one or more audio interfaces.
  • Example 58 includes the network device of Example 57, wherein the one or more external playback devices comprise at least one passive speaker.
  • Example 59 includes the network device of any one of Examples 42-53 or 55-58, further comprising a microphone configured to detect an acoustic signal.
  • Example 60 includes the network device of Example 59, wherein the at least one non-transitory computer readable medium further includes program instructions executable by the one or more processors to control the network device to, based on the acoustic signal, detect a presence of at least one playback device.
  • Example 61 includes the network device of Example 60, wherein the acoustic signal comprises a superposition of a plurality of audio signals acoustically transmitted by a plurality of respective playback devices, and wherein the at least one non-transitory computer readable medium further includes program instructions executable by the one or more processors to control the network device to determine a time-frequency representation of the acoustic signal, based on the time-frequency representation of the acoustic signal, determine that the acoustic signal includes a first audio signal transmitted by a first playback device and a second audio signal transmitted by a second playback device, and based on magnitudes of the first and second audio signals, determine that the first playback device is positioned closer than the second playback device to the network device.
  • Example 62 includes the network device of Example 61, wherein the at least one non-transitory computer readable medium further includes program instructions executable by the one or more processors to control the network device to obtain data associating playback devices with respective reference audio signals each having a predefined time-frequency representation.
  • Example 63 includes the network device of Example 59, wherein the at least one non-transitory computer readable medium further includes program instructions executable by the one or more processors to control the network device to acoustically transmit the reference signal at a predetermined time, detect, via the microphone, the acoustic signal transmitted by at least one playback device, determine a first time difference between the predetermined time and a time of detection of the acoustic signal, acquire information including a second time difference between a time of transmission of the acoustic signal by the at least one playback device and a time of detection of the reference signal at the at least one playback device, determine an average of the first and second time differences, and determine an estimated distance between the network device and the at least one playback device based on the average of the first and second time differences.
  • Example 64 includes the network device of Example 59, wherein the at least one non-transitory computer readable medium further includes program instructions executable by the one or more processors to control the network device to determine that the network device is in an authorized location based on the acoustic signal.
  • Example 65 includes the network device of Example 64, wherein the acoustic signal comprises a sequence of tones.
  • Example 66 includes the network device of Example 64, wherein the acoustic signal comprises specified audio content.
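
The distance estimate recited in Examples 10, 21, 30, 36 and 63 averages two time differences, each measured against a device's own clock. The following Python sketch illustrates that arithmetic under two assumptions that do not appear in the examples: the two clocks differ by a fixed offset, and the averaged time difference is converted to a distance by multiplying by the speed of sound (about 343 m/s in air at 20 °C). The function names are hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at roughly 20 degrees C


def simulate_time_differences(true_distance_m, clock_offset_s,
                              speed=SPEED_OF_SOUND_M_PER_S):
    """Model the two measured time differences for a known geometry.

    Assumes the second device's clock runs `clock_offset_s` behind the
    first device's clock, and that both emit at the same nominal time.
    """
    time_of_flight_s = true_distance_m / speed
    # The first device hears the second device's signal late by the offset.
    first_diff_s = time_of_flight_s + clock_offset_s
    # The second device hears the first device's signal early by the offset.
    second_diff_s = time_of_flight_s - clock_offset_s
    return first_diff_s, second_diff_s


def estimate_distance_m(first_diff_s, second_diff_s,
                        speed=SPEED_OF_SOUND_M_PER_S):
    """Average the two time differences and scale by the speed of sound."""
    average_s = (first_diff_s + second_diff_s) / 2.0
    if average_s < 0:
        raise ValueError("average time difference must be non-negative")
    return speed * average_s


if __name__ == "__main__":
    first, second = simulate_time_differences(3.43, clock_offset_s=0.005)
    print(estimate_distance_m(first, second))
```

Under the fixed-offset assumption, the offset enters the two time differences with opposite signs, so it cancels in the average and only the acoustic time of flight remains.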

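Examples 13, 15 and 24 describe a reference audio signal made of a sequence of tones that carries an identifier, and Examples 12 and 35 place the signal at roughly 18 kHz to 20 kHz. The examples do not specify a modulation scheme, so the Python sketch below is only one hypothetical possibility. It assumes an 8-bit identifier, four tone frequencies (two bits per tone), a 48 kHz sample rate and 10 ms tones, and it recovers the identifier with the Goertzel algorithm, a standard single-frequency power estimator.

```python
import math

SAMPLE_RATE_HZ = 48_000                   # assumption, not from the examples
SAMPLES_PER_TONE = 480                    # assumption: 10 ms per tone
TONE_FREQS_HZ = (18_000.0, 18_500.0, 19_000.0, 19_500.0)  # assumed 2-bit symbols
TONES_PER_ID = 4                          # 4 tones x 2 bits = 8-bit identifier


def encode_identifier(identifier):
    """Return samples for a tone sequence encoding an 8-bit identifier."""
    if not 0 <= identifier <= 0xFF:
        raise ValueError("identifier must fit in 8 bits")
    samples = []
    for i in range(TONES_PER_ID):
        shift = 2 * (TONES_PER_ID - 1 - i)      # most significant bits first
        symbol = (identifier >> shift) & 0b11
        freq = TONE_FREQS_HZ[symbol]
        samples.extend(
            math.sin(2.0 * math.pi * freq * n / SAMPLE_RATE_HZ)
            for n in range(SAMPLES_PER_TONE)
        )
    return samples


def goertzel_power(samples, freq_hz, sample_rate_hz):
    """Estimate signal power at a single frequency (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def decode_identifier(samples):
    """Pick the strongest candidate tone in each window and rebuild the ID."""
    identifier = 0
    last_start = len(samples) - SAMPLES_PER_TONE
    for start in range(0, last_start + 1, SAMPLES_PER_TONE):
        window = samples[start:start + SAMPLES_PER_TONE]
        powers = [goertzel_power(window, f, SAMPLE_RATE_HZ)
                  for f in TONE_FREQS_HZ]
        symbol = max(range(len(powers)), key=powers.__getitem__)
        identifier = (identifier << 2) | symbol
    return identifier


if __name__ == "__main__":
    print(hex(decode_identifier(encode_identifier(0xA7))))
```

The assumed tone spacing of 500 Hz is a whole multiple of the 100 Hz resolution of a 480-sample window at 48 kHz, so in this noise-free sketch each tone contributes essentially no power at the other three candidate frequencies.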
Landscapes

  • Physics & Mathematics
  • Engineering & Computer Science
  • Acoustics & Sound
  • Signal Processing
  • Health & Medical Sciences
  • Otolaryngology
  • General Health & Medical Sciences
  • Circuit For Audible Band Transducer
EP24713302.8A 2023-02-24 2024-02-23 Playback devices with dedicated high-frequency transducers Pending EP4670366A1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363486831P 2023-02-24 2023-02-24
PCT/US2024/017126 WO2024178362A1 (en) 2023-02-24 2024-02-23 Playback devices with dedicated high-frequency transducers

Publications (1)

Publication Number Publication Date
EP4670366A1 (de) 2025-12-31

Family

ID=90368221

Family Applications (1)

Application Number Title Priority Date Filing Date
EP24713302.8A Pending EP4670366A1 (de) 2023-02-24 2024-02-23 Playback devices with dedicated high-frequency transducers

Country Status (2)

Country Link
EP (1) EP4670366A1 (de)
WO (1) WO2024178362A1 (de)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8234395B2 (en) 2003-07-28 2012-07-31 Sonos, Inc. System and method for synchronizing operations among a plurality of independently clocked digital data processing devices
US8483853B1 (en) 2006-09-12 2013-07-09 Sonos, Inc. Controlling and manipulating groupings in a multi-zone media system
US10743101B2 (en) 2016-02-22 2020-08-11 Sonos, Inc. Content mixing
US10712997B2 (en) 2016-10-17 2020-07-14 Sonos, Inc. Room association based on name
US10299039B2 (en) * 2017-06-02 2019-05-21 Apple Inc. Audio adaptation to room
US11988784B2 (en) 2020-08-31 2024-05-21 Sonos, Inc. Detecting an audio signal with a microphone to determine presence of a playback device
EP4672764A3 (de) 2020-09-25 2026-01-21 Sonos, Inc. Intelligent setup for playback devices

Also Published As

Publication number Publication date
WO2024178362A1 (en) 2024-08-29

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20250924

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20260128