WO2024212126A1 - Dynamic buffering in a multi-device audio playback environment - Google Patents
- Publication number: WO2024212126A1 (PCT Application No. PCT/CN2023/087797)
- Authority: WIPO (PCT)
Classifications
- H04L65/80—Network arrangements, protocols or services for supporting real-time applications in data packet communication; Responding to QoS
- H04L65/612—Network streaming of media packets for supporting one-way streaming services, e.g., Internet radio, for unicast
- H04N21/43615—Interfacing a Home Network, e.g., for connecting the client to a plurality of peripherals
- H04N21/4392—Processing of audio elementary streams involving audio buffer management
- H04N21/44245—Monitoring the upstream path of the transmission network, e.g., its availability, bandwidth
- H04N21/8113—Monomedia components thereof involving special audio data, e.g., comprising music, such as a song in MP3 format
Definitions
- The various embodiments relate generally to audio systems and, more specifically, to techniques for dynamic buffering in a network-connected audio playback environment having multiple audio output devices.
- Such systems typically include a primary device, which may or may not include an audio output device, and a plurality of satellite devices including respective audio output devices that simultaneously play back audio, such as multi-channel audio.
- The primary device is responsible for retrieving audio data from one or more media sources and distributing the audio data, in the form of audio packets, to the plurality of satellite devices for simultaneous playback. For example, the primary device transmits multi-channel audio, in the form of audio packets, over a network to respective satellite devices for playback.
- Network jitter is a variation in the delay of packet arrival at the plurality of satellite devices that play back audio.
- Network jitter can result in the loss of audio data transmitted over the network by the primary device, which adversely impacts the playback quality of the satellite devices. Therefore, conventional network-connected audio systems have attempted various approaches to mitigate the negative impact that network jitter has on the quality of audio playback.
- Network congestion occurs when the amount of data transmitted over a network exceeds the bandwidth capacity of the network, which results in decreased transmission speeds across the network and the delayed reception of packets.
- To reduce network congestion, some systems have implemented complex data compression algorithms to reduce the size of packets that are transmitted over the network.
- At least one drawback to this approach is the large amount of processing power needed to perform the sophisticated mathematical operations used to compress and decompress the data packets.
- In such systems, a large portion of a respective satellite device’s processing resources is consumed by decompressing the data packets. Accordingly, a limited amount of the satellite device’s processing resources is left to process the audio data for playback, which results in a decrease in the quality of audio played back by the satellite device.
- At least one drawback to increasing the buffer size of a satellite device is the associated increase in the latency of data processing by the satellite device. For example, when a buffer in a satellite device queues a large number of audio packets, the latency, or delay, associated with processing a respective audio packet at the beginning of the queue is increased. At least another drawback to increasing the buffer size of a respective satellite device in a network-connected audio system is that not all satellite devices in the network-connected audio system have the same buffer size. Thus, when the buffer size of one satellite device in the network-connected audio system is increased, there is a resultant mismatch between the respective buffer sizes of satellite devices in the network-connected audio system.
- Various embodiments of the present disclosure set forth a computer-implemented method for determining a buffer size in a network-connected audio system.
- The method includes requesting one or more audio output devices to provide latency data and receiving, from the one or more audio output devices, a plurality of data samples, where each data sample in the plurality of data samples includes a latency value.
- The method further includes selecting the largest latency values from the latency values included in the plurality of data samples, determining an aggregate latency value based on the largest latency values, determining the buffer size based on the aggregate latency value, and distributing the buffer size to the one or more audio output devices.
- The buffer size is usable by the one or more audio output devices to configure a respective buffer.
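- As a concrete illustration of this flow, the following is a minimal sketch of one possible primary-device implementation. The device methods (request_latency_samples, set_buffer_size), the top-N selection, and the audio parameters are all assumptions made for illustration; the selection, aggregation, and sizing details are elaborated in the embodiments below.

```python
import statistics

def determine_and_distribute_buffer_size(devices, keep_largest=100, sigmas=3):
    # Request latency data from each satellite audio output device.
    latencies_ms = []
    for device in devices:
        # request_latency_samples() is a hypothetical API that returns the
        # latency values (in ms) measured by a satellite for received messages.
        latencies_ms.extend(device.request_latency_samples())

    # Select the largest latency values (sort descending, keep the top N).
    largest = sorted(latencies_ms, reverse=True)[:keep_largest]

    # Determine an aggregate latency value from the largest values (here,
    # mean plus a multiple of the standard deviation, as described below).
    aggregate_ms = statistics.mean(largest) + sigmas * statistics.pstdev(largest)

    # Determine the buffer size from the aggregate latency, using example
    # audio parameters (48 kHz, stereo, 16-bit).
    rate_hz, channels, bitdepth_bits = 48_000, 2, 16
    buffer_bytes = int(rate_hz * channels * (bitdepth_bits // 8)
                       * aggregate_ms / 1000)

    # Distribute the same buffer size to every audio output device so that
    # all buffers remain matched.
    for device in devices:
        device.set_buffer_size(buffer_bytes)  # hypothetical API
    return buffer_bytes
```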
- At least one technical advantage of the disclosed techniques relative to the prior art is that, with the disclosed techniques, the buffer size of a satellite device in a network-connected audio system can be dynamically adjusted based on the network conditions.
- For example, the buffer size of a satellite device can be decreased, which reduces the processing latency of the satellite device, without an increased risk of the satellite device running out of audio packets to play back.
- Conversely, the buffer size of a satellite device can be increased as needed to store additional audio packets for playback.
- Another technical advantage is that the buffer size of all satellite devices in the network-connected audio system can be synchronized, which eliminates the adverse effects on audio playback synchrony that are attributed to satellite devices with mismatched buffer sizes.
- Further, the disclosed techniques do not rely on complex, resource-intensive algorithms that strain the processing capabilities of satellite devices. Instead, the disclosed techniques use only a small portion of a satellite device’s processing resources, thereby allowing more of the satellite device’s processing power to be used for audio processing. As a result, listeners can enjoy a higher-quality listening experience relative to conventional techniques.
- FIG. 1 illustrates a network-connected audio system, according to one or more aspects of the various embodiments;
- FIG. 2 is a block diagram of the primary computing device of FIG. 1, according to one or more aspects of the various embodiments;
- FIG. 3 is a block diagram of the satellite audio output device of FIG. 1, according to one or more aspects of the various embodiments;
- FIG. 4 illustrates a normal distribution for latency data, according to one or more aspects of the various embodiments;
- FIG. 5 illustrates a normal distribution for the largest, descending-sorted latency data, according to one or more aspects of the various embodiments;
- FIG. 6 is a block diagram of an audio buffer application that may be implemented by the primary computing device of FIG. 1, according to one or more aspects of the various embodiments;
- FIG. 7 is a flow chart of method steps for determining a buffer size in a network-connected audio system, according to one or more aspects of the various embodiments; and
- FIG. 8 is a flow chart of method steps for adjusting a buffer size in a satellite audio output device, according to one or more aspects of the various embodiments.
- FIG. 1 illustrates a network-connected audio system 100 configured to implement one or more aspects of the various embodiments.
- Network-connected audio system 100 includes, without limitation, a primary computing device 102 and multiple satellite audio output devices 110-1 through 110-N.
- Primary computing device 102 is communicatively coupled to satellite audio output devices 110 via audio device network 112.
- In some embodiments, audio device network 112 is a wireless network, such as a Wi-Fi network, an ad-hoc Wi-Fi network, a Bluetooth network, and/or the like, over which primary computing device 102 and satellite audio output devices 110 communicate.
- In other embodiments, audio device network 112 is a wired network. Communications in audio device network 112 can use standard protocols (e.g., Bluetooth, Wi-Fi) or proprietary protocols (e.g., a proprietary protocol associated with a specific manufacturer).
- For example, primary computing device 102 can communicate, via audio device network 112, with satellite audio output devices 110 using standard protocols or proprietary protocols.
- Similarly, satellite audio output devices 110 can communicate, via audio device network 112, with other satellite audio output devices 110 using standard protocols or proprietary protocols.
- Network-connected audio system 100 further includes one or more media content services 120 that are communicatively coupled to primary computing device 102 via one or more networks 122.
- Media content service(s) 120 includes one or more computerized services configured to provide (e.g., distribute) media content to devices (e.g., to primary computing device 102).
- Media or media content, as used herein, includes, without limitation, audio content (e.g., spoken and/or musical audio content, audio content files, streaming audio content, the audio track of a video, and/or the like) and/or video content.
- Examples of media content service(s) 120 include, without limitation, Spotify, Apple Music, Pandora, YouTube Music, Tidal, and/or other audio content streaming services.
- In some embodiments, media content service(s) 120 include, without limitation, media content streaming services such as digital media content sellers, media servers (local and/or remote), YouTube, Netflix, HBO, and/or other media content streaming services. More generally, media content service(s) 120 include one or more computer systems (e.g., a server, a cloud computing system, a networked computing system, a distributed computing system, etc.) for storing and distributing media content.
- Primary computing device 102 can communicatively couple with media content service(s) 120 via network(s) 122 to download and/or stream media content from media content service(s) 120.
- Network(s) 122 can be any technically feasible type of communications network that allows data to be exchanged between primary computing device 102 and other systems or devices, such as media content service(s) 120, a server, a cloud computing system, or other networked computing devices or systems.
- For example, network(s) 122 can include a wide area network (WAN), a local area network (LAN), a wireless network (e.g., a Wi-Fi network, a cellular data network, an ad-hoc network), and/or the Internet, among others.
- Communications in network(s) 122 can use standard protocols (e.g., Bluetooth, Wi-Fi) or proprietary protocols (e.g., a proprietary protocol associated with a specific manufacturer).
- For example, primary computing device 102 can communicate, via network(s) 122, with media content service(s) 120 using standard protocols or proprietary protocols.
- In some embodiments, network(s) 122 include audio device network 112.
- In other embodiments, audio device network 112 is separate from network(s) 122.
- In some embodiments, satellite audio output devices 110 can communicatively couple with media content service(s) 120 via network(s) 122.
- FIG. 2 illustrates a block diagram of primary computing device 102 that can be implemented in conjunction with network-connected audio system 100, according to one or more aspects of the various embodiments.
- Primary computing device 102 manages the playback of audio in network-connected audio system 100. For example, primary computing device 102 receives, via network(s) 122, audio content from media content service(s) 120, and transmits, via audio device network 112, the audio content to one or more satellite audio output devices 110 for playback. As will be described in more detail below, primary computing device 102 can transmit one or more additional messages to satellite audio output devices 110 for managing audio playback in network-connected audio system 100. For example, primary computing device 102 can transmit one or more messages to satellite audio output devices 110 that instruct satellite audio output devices 110 to adjust the size of their audio buffers.
- Primary computing device 102 includes, without limitation, one or more processing units 202, network interface 210, input/output (I/O) devices interface 212, input device(s) 220, output device(s) 222, system storage 230, and system memory 232.
- Primary computing device 102 further includes an interconnect 240 that is configured to facilitate transmission of data, such as programming instructions and application data, between processing unit(s) 202, network interface 210, I/O devices interface 212, system storage 230, and system memory 232.
- Processing unit(s) 202 can be any technically feasible processing device configured to process data and execute program instructions.
- For example, processing unit(s) 202 could include one or more central processing units (CPUs), digital signal processors (DSPs), graphics processing units (GPUs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), microprocessors, microcontrollers, other types of processing units, and/or a combination of different processing units.
- Processing unit(s) 202 can further include a real-time clock (RTC) (not shown) according to which processing unit(s) 202 maintains an estimate of the current time.
- Processing unit(s) 202 is configured to retrieve and execute programming instructions, such as audio playback application 242 and audio buffer application 250, stored in system memory 232. Similarly, processing unit(s) 202 is configured to store application data (e.g., software libraries) in and retrieve application data from system memory 232.
- Primary computing device 102 can connect with audio device network 112 and/or network(s) 122 via network interface 210.
- For example, primary computing device 102 connects, via network interface 210, to audio device network 112 to communicate with satellite audio output devices 110.
- Similarly, primary computing device 102 connects, via network interface 210, to network(s) 122 to communicate with media content service(s) 120.
- Network interface 210 is hardware, software, or a combination of hardware and software, that is configured to connect to and interface with audio device network 112 and/or network(s) 122.
- Network interface 210 facilitates communication with other devices or systems via one or more standard and/or proprietary protocols (e.g., Bluetooth, a proprietary protocol associated with a specific manufacturer, etc.).
- I/O devices interface 212 is configured to receive input data from input device(s) 220 and transmit the input data to processing unit(s) 202 via the interconnect 240.
- For example, input device(s) 220 can include one or more buttons, a keyboard, a mouse, a graphical user interface, a touchscreen display, and/or other input devices.
- The I/O devices interface 212 is further configured to receive output data from processing unit(s) 202 via the interconnect 240 and transmit the output data to the output device(s) 222.
- For example, output device(s) 222 can include one or more of a display device, a touchscreen display, a graphical user interface, and/or other output devices.
- In some embodiments, primary computing device 102 is an audio output device.
- In such embodiments, output device(s) 222 include one or more loudspeakers 252 configured to play back audio content.
- In some embodiments, one or more satellite audio output devices 110 are coupled to primary computing device 102 via I/O devices interface 212.
- System storage 230 can include non-volatile storage for applications, software modules, and data, and can include fixed or removable disk drives, flash memory devices, CD-ROM, DVD-ROM, Blu-Ray, HD-DVD, or other magnetic, optical, or solid-state storage devices, and/or the like.
- System storage 230 can be fully or partially located in a remote storage system, referred to herein as “the cloud,” and accessed through connections such as audio device network 112 and/or network(s) 122.
- System storage 230 is configured to store non-volatile data such as files (e.g., audio files, video files, subtitles, application files, software libraries, etc.).
- System memory 232 can include a random access memory (RAM) module, a flash memory unit, or any other type of memory unit or combination thereof.
- Processing unit(s) 202, network interface 210, and I/O devices interface 212 are configured to read data from and write data to system memory 232.
- System memory 232 includes various software programs and modules (e.g., an operating system, one or more applications) that can be executed by processing unit(s) 202 and application data (e.g., data loaded from system storage 230) associated with said software programs.
- System memory 232 includes audio playback application 242 and audio buffer application 250. In some embodiments, audio playback application 242 and audio buffer application 250 are combined into a single application.
- When executed by processing unit(s) 202, audio playback application 242 manages playback of audio in network-connected audio system 100. For example, audio playback application 242 retrieves, via network(s) 122, audio content from media content service(s) 120 and distributes, via audio device network 112, the audio content to satellite audio output devices 110 for playback. In some embodiments, audio playback application 242 synchronizes, using one or more suitable methods, playback of audio across satellite audio output devices 110. In some embodiments, audio playback application 242 also transmits playback timing information and/or other information, such as audio channel assignments or volume controls, associated with audio playback to satellite audio output devices 110.
- When executed by processing unit(s) 202, audio buffer application 250 manages the buffer size of audio buffers included in satellite audio output devices 110.
- In some embodiments, audio buffer application 250 determines, based on one or more conditions of audio device network 112, a dynamic buffer size for the audio buffers included in satellite audio output devices 110. For example, audio buffer application 250 determines the dynamic buffer size based on latency data associated with the latencies, or transit delays, of messages transmitted by primary computing device 102 to satellite audio output devices 110 over audio device network 112. After determining the dynamic buffer size, audio buffer application 250 distributes, via audio device network 112, the dynamic buffer size to satellite audio output devices 110.
- Audio buffer application 250 also manages the retrieval of latency data used for determining the dynamic buffer size of audio buffers included in satellite audio output devices 110.
- In some embodiments, audio buffer application 250 requests, via audio device network 112, that satellite audio output devices 110 provide latency data to audio buffer application 250.
- In response to such a request, a respective satellite audio output device 110 transmits its latency data to audio buffer application 250, and audio buffer application 250 then stores the latency data as latency data 262 in database(s) 260 in system memory 232.
- Latency data received from a respective satellite audio output device 110 includes one or more latency data samples associated with the latencies, or transit delays, of messages transmitted by primary computing device 102 to the respective satellite audio output device 110.
- Each latency data sample is associated with a particular message transmitted by primary computing device 102 and received by satellite audio output device 110. Further, each latency data sample includes a latency value that is indicative of a transit delay of the message.
- The transit delay of a message is the time difference between the time at which the message was transmitted by primary computing device 102 and the time at which the message was received by a respective satellite audio output device 110.
- In some embodiments, a latency data sample further includes a time stamp that indicates a time at which the latency data sample was generated by satellite audio output device 110 and/or a time at which the message was received by satellite audio output device 110.
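- For illustration only, a latency data sample as described above could be represented as a small record; the field names below are hypothetical and simply mirror the elements just listed.

```python
from dataclasses import dataclass

@dataclass
class LatencySample:
    # One latency data sample for a single message (illustrative names).
    device_id: str       # which satellite audio output device generated it
    latency_ms: float    # latency value: the transit delay of the message
    received_at: float   # time the message was received by the satellite
    generated_at: float  # time the sample itself was generated
```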
- In some embodiments, audio buffer application 250 requests satellite audio output devices 110 to provide their respective latency data on a periodic basis (e.g., every minute, every 5 minutes, every half hour, every hour, etc.). In some embodiments, audio buffer application 250 requests satellite audio output devices 110 to provide their respective latency data on an ad-hoc basis. In some embodiments, audio buffer application 250 requests satellite audio output devices 110 to provide their respective latency data in response to a trigger event.
- For example, audio buffer application 250 can request that satellite audio output devices 110 provide their latency data in response to detecting that an amount of traffic in audio device network 112 exceeds a threshold, in response to detecting that an amount of traffic in audio device network 112 has decreased below a threshold, in response to detecting that an amount of time taken for a satellite audio output device 110 to respond to primary computing device 102 exceeds a threshold, and/or in response to some other trigger event.
- In some embodiments, audio buffer application 250 determines latencies associated with messages transmitted by primary computing device 102 to satellite audio output devices 110 without requesting that the satellite audio output devices 110 provide their respective latency data. In such embodiments, audio buffer application 250 determines the latency associated with any message transmitted by primary computing device 102 to a satellite audio output device 110 over audio device network 112. For example, audio buffer application 250 can timestamp each message transmitted by primary computing device 102 to a satellite audio output device 110 with the time at which the message is transmitted. In such an example, the satellite audio output device 110 receives the timestamped message from primary computing device 102 and transmits a response message to primary computing device 102.
- In such an example, the response message includes a first time stamp indicative of the time at which primary computing device 102 transmitted the message and a second time stamp indicative of the time at which the satellite audio output device 110 received the message transmitted by primary computing device 102.
- Audio buffer application 250 can determine, based on the difference between the first and second time stamps included in the response message transmitted by satellite audio output device 110, the latency of a message transmitted by primary computing device 102 to satellite audio output device 110. Audio buffer application 250 can then store the determined message latency as latency data 262 in database(s) 260.
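- A minimal sketch of this timestamp exchange, assuming (as noted below for FIG. 3) that the two devices' real-time clocks are synchronized; the dictionary-based message format is hypothetical.

```python
import time

def stamp_outgoing(message: dict) -> dict:
    # Primary side: record the transmit time on the outgoing message.
    message["tx_time"] = time.time()
    return message

def build_response(received: dict) -> dict:
    # Satellite side: echo the first (transmit) time stamp and add a
    # second time stamp recording when the message arrived.
    return {"first_ts": received["tx_time"], "second_ts": time.time()}

def latency_seconds(response: dict) -> float:
    # Primary side: the message latency is the difference between the
    # second and first time stamps in the response message.
    return response["second_ts"] - response["first_ts"]
```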
- In some embodiments, a respective satellite audio output device 110 transmits a response message, which includes a first time stamp indicative of the time a message was transmitted by primary computing device 102 and a second time stamp indicative of the time the message was received by satellite audio output device 110, in response to every message received from primary computing device 102.
- In such embodiments, audio buffer application 250 determines, based on the response messages received from satellite audio output device 110, a latency for every message transmitted by primary computing device 102.
- In other embodiments, a respective satellite audio output device 110 transmits such a response message in response to only some of the messages received from primary computing device 102.
- In such embodiments, audio buffer application 250 determines the respective latencies of only those messages transmitted by primary computing device 102 for which satellite audio output device 110 transmitted a response message.
- In some embodiments, satellite audio output device 110 transmits response messages, which include the first and second time stamps, in response to receiving certain types of messages (e.g., control messages or messages containing audio packets) from primary computing device 102.
- In such embodiments, audio buffer application 250 determines the respective latencies of those certain types of messages transmitted by primary computing device 102 for which satellite audio output device 110 transmitted a response message.
- In some embodiments, the satellite audio output device 110 determines the latency of the message received from primary computing device 102 and transmits a response message including the message latency to primary computing device 102. Accordingly, in such embodiments, audio buffer application 250 determines the message latency to be the latency value included in the response message.
- In some embodiments, audio buffer application 250 determines the latency of a message transmitted by a satellite audio output device 110 over audio device network 112.
- In such embodiments, a message transmitted by satellite audio output device 110 includes a time stamp indicative of the time at which the message was transmitted. For example, audio buffer application 250 can determine the latency of a message transmitted by a satellite audio output device 110 based on the difference between the time at which the message was transmitted by the satellite audio output device 110 and the time at which audio buffer application 250 receives the message.
- In some embodiments, system memory 232 further includes one or more databases 260 that are loaded from system storage 230 into system memory 232.
- Database(s) 260 includes application data, user data, media content, and/or other data that is associated with one or more applications that can be executed by processing unit(s) 202.
- Database(s) 260 further includes latency data 262.
- Latency data 262 includes latency data samples received from satellite audio output devices 110.
- Audio buffer application 250 receives one or more latency data samples from satellite audio output devices 110 and stores the received latency data samples as latency data 262 in database(s) 260. This latency data 262 is then used by audio buffer application 250 to determine the dynamic buffer size for satellite audio output devices 110.
- FIG. 3 illustrates a block diagram of a satellite audio output device 110 that can be implemented in conjunction with network-connected audio system 100, according to one or more aspects of the various embodiments.
- Satellite audio output device 110 can be used to implement any of satellite audio output devices 110-1 through 110-N of FIG. 1.
- In some embodiments, satellite audio output device 110 is used to implement primary computing device 102 of FIG. 1.
- In such embodiments, satellite audio output device 110 is additionally configured to perform one or more of the actions described herein as being performed by primary computing device 102.
- Satellite audio output device 110 plays back audio content received, via audio device network 112, from primary computing device 102.
- For example, primary computing device 102 transmits messages including one or more packets of audio content to satellite audio output device 110, and satellite audio output device 110 plays back, or outputs, the audio content.
- Satellite audio output device 110 outputs the audio content in a synchronized or near-synchronized manner with one or more other satellite audio output devices 110 and/or primary computing device 102 coupled to network-connected audio system 100.
- For example, when satellite audio output device 110 is used to implement satellite audio output device 110-1 of FIG. 1, satellite audio output device 110-1 outputs audio content in a synchronized or near-synchronized manner with one or more of satellite audio output devices 110-2 through 110-N and/or primary computing device 102.
- In some embodiments, satellite audio output device 110 outputs one or more individual channels of audio content received from primary computing device 102.
- Satellite audio output device 110 includes, without limitation, one or more processing units 302, network interface 310, I/O devices interface 312, input device(s) 320, loudspeaker(s) 322, output device(s) 330, system memory 332, and audio processing circuit 340.
- Satellite audio output device 110 further includes an interconnect 342 that is configured to facilitate transmission of data, such as programming instructions and application data, between processing unit(s) 302, network interface 310, I/O devices interface 312, system memory 332, and audio processing circuit 340.
- Processing unit(s) 302 can be any technically feasible processing device configured to process data and execute program instructions.
- For example, processing unit(s) 302 could include one or more CPUs, DSPs, GPUs, ASICs, FPGAs, microprocessors, microcontrollers, other types of processing units, and/or a combination of different processing units.
- Processing unit(s) 302 can further include an RTC (not shown) according to which processing unit(s) 302 maintains an estimate of the current time. The estimate of the current time can be expressed in UTC, although any other standard of time measurement can also be used.
- In some embodiments, the RTC included in processing unit(s) 302 is synchronized with the RTC included in processing unit(s) 202 of primary computing device 102.
- Processing unit(s) 302 is configured to retrieve and execute programming instructions, such as audio playback application 350, latency application 352, and audio buffer application 360, stored in system memory 332. Similarly, processing unit(s) 302 is configured to store application data (e.g., software libraries) in and retrieve application data from system memory 332.
- Satellite audio output device 110 can connect with audio device network 112 via network interface 310.
- For example, satellite audio output device 110 connects, via network interface 310, to audio device network 112 to communicate with primary computing device 102 and/or other satellite audio output devices 110.
- Network interface 310 is hardware, software, or a combination of hardware and software, that is configured to connect to and interface with audio device network 112.
- Network interface 310 facilitates communication with other devices or systems via one or more standard and/or proprietary protocols (e.g., Bluetooth, a proprietary protocol associated with a specific manufacturer, etc.).
- In some embodiments, satellite audio output device 110 also connects with network(s) 122 via network interface 310.
- I/O devices interface 312 is configured to receive input data from input device(s) 320 and transmit the input data to processing unit(s) 302 via the interconnect 342.
- For example, input device(s) 320 can include one or more buttons, knobs, a keyboard, a mouse, a graphical user interface, a touchscreen display, and/or other input devices.
- The I/O devices interface 312 is further configured to receive output data from processing unit(s) 302 and/or audio processing circuit 340 via the interconnect 342 and transmit the output data to loudspeaker(s) 322 and/or other output device(s) 330.
- Loudspeaker(s) 322 is configured to play back, or output, audio content received from primary computing device 102.
- For example, audio processing circuit 340 processes audio content received from primary computing device 102 and transmits, via interconnect 342 and I/O devices interface 312, the processed audio content to loudspeaker(s) 322 for playback.
- Other output device(s) 330 can include one or more light emitting diode (LED) indicators, a display device, a touchscreen display, a graphical user interface, and/or other output devices.
- System memory 332 can include a random access memory (RAM) module, a flash memory unit, or any other type of memory unit or combination thereof.
- Processing unit(s) 302, network interface 310, and I/O devices interface 312 are configured to read data from and write data to system memory 332.
- System memory 332 includes various software programs and modules (e.g., an operating system, one or more applications) that can be executed by processing unit(s) 302 and application data (e.g., data loaded from system storage) associated with said software programs.
- System memory 332 includes audio playback application 350, latency application 352, audio buffer application 360, and an audio buffer 362.
- In some embodiments, one or more of audio playback application 350, latency application 352, and audio buffer application 360 are combined into a single application.
- In some embodiments, satellite audio output device 110 further includes system storage.
- The system storage includes non-volatile storage for applications, software modules, and data.
- The system storage can include fixed or removable disk drives, flash memory devices, CD-ROM, DVD-ROM, Blu-Ray, HD-DVD, or other magnetic, optical, or solid-state storage devices, and/or the like.
- In some embodiments, the system storage is fully or partially located in a remote storage system, referred to herein as “the cloud,” and accessed through connections such as audio device network 112 and/or network(s) 122.
- The system storage is configured to store non-volatile data such as files (e.g., audio files, video files, subtitles, application files, software libraries, etc.).
- Audio playback application 350 manages playback of audio by satellite audio output device 110.
- Managing playback of audio by satellite audio output device 110 can include, without limitation, managing receipt of audio content from primary computing device 102, storing received audio content in audio buffer 362 before playback, providing audio content stored in audio buffer 362 to audio processing circuit 340 for processing to prepare the audio content for playback, and/or managing delivery of processed audio content output by audio processing circuit 340 to loudspeaker(s) 322 via interconnect 342 and/or I/O devices interface 312.
- In some embodiments, audio playback application 350 stores received audio content in audio buffer 362 and/or retrieves audio content stored in audio buffer 362 for processing based on playback timing information and/or other information associated with playback of the audio content that is received from primary computing device 102.
- Similarly, audio playback application 350 manages the processing, via audio processing circuit 340, of audio content and/or the delivery of processed audio content from audio processing circuit 340 to loudspeaker(s) 322 based on playback timing information and/or other information associated with playback of the audio content that is received from primary computing device 102.
- Audio processing circuit 340 can be any technically feasible processing circuit configured to process audio content for playback by loudspeaker(s) 322.
- For example, audio processing circuit 340 can include one or more DSPs, one or more digital-to-analog converters (DACs), one or more filters, and/or an audio amplifier.
- Audio processing circuit 340 processes audio content stored in audio buffer 362 for playback by loudspeaker(s) 322.
- For example, audio processing circuit 340 processes audio content stored in audio buffer 362 and delivers, via interconnect 342 and I/O devices interface 312, the processed audio content to loudspeaker(s) 322 for playback.
- Processing audio content can include, without limitation, converting, via a DAC, audio content stored in a digital format to an analog format, applying one or more filters to the audio content, and/or amplifying the audio content.
- Latency application 352 determines the latency, or transit delay, of a message (or packet) that satellite audio output device 110 receives, via audio device network 112, from primary computing device 102. For example, when primary computing device 102 transmits a message, such as a message including one or more packets of audio content, primary computing device 102 timestamps the message with the time at which the message is transmitted by primary computing device 102. When satellite audio output device 110 receives the message transmitted by primary computing device 102, latency application 352 timestamps the received message with the time at which the message was received by satellite audio output device 110. Then, latency application 352 determines the latency, or transit delay time, of the message transmitted by primary computing device 102 based on the transmission time of the message and the reception time of the message.
- In some embodiments, latency application 352 determines the latency of the message to be the time difference between the time at which the message was transmitted by primary computing device 102 and the time at which the message was received by satellite audio output device 110. For example, in such embodiments, if a message is transmitted by primary computing device 102 at 10:01:29.100 and received by satellite audio output device 110 at 10:01:29.250, latency application 352 determines that the latency, or transit delay, of the message is 150 milliseconds (ms). In some embodiments, latency application 352 further considers an amount of time taken to process a message received from primary computing device 102 when determining the latency of the message.
- In such embodiments, latency application 352 adds the processing time to the determined time difference between the time at which the message was transmitted by primary computing device 102 and the time at which the message was received by satellite audio output device 110. Referring to the above example, if it is assumed that a message transmitted by primary computing device 102 at 10:01:29.100 and received by satellite audio output device 110 at 10:01:29.250 takes an additional 5 ms to be processed by satellite audio output device 110, latency application 352 determines that the latency of the message is 155 ms.
- An amount of time taken to process a message can include one or more of: a time taken by processing unit(s) 302 to process the message, a time taken by audio playback application 350 to process the message, a time taken by latency application 352 to process the message, a time taken for the audio content included in the message to move through audio buffer 362, a time taken by audio processing circuit 340 to process the audio content included in the message, a time difference between receipt of the message and playback, by loudspeaker(s) 322, of audio content included in the message, and/or some other amount of time associated with processing the message at satellite audio output device 110.
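- The worked example above can be reproduced with a short helper; the time-of-day string format is an assumption made for illustration, and the 5 ms figure is the processing time from the text.

```python
from datetime import datetime

def message_latency_ms(tx: str, rx: str, processing_ms: float = 0.0) -> float:
    # Transit delay between the transmit and receive times, optionally
    # including time spent processing the message at the satellite.
    fmt = "%H:%M:%S.%f"
    transit = datetime.strptime(rx, fmt) - datetime.strptime(tx, fmt)
    return transit.total_seconds() * 1000.0 + processing_ms

print(message_latency_ms("10:01:29.100", "10:01:29.250"))       # 150.0
print(message_latency_ms("10:01:29.100", "10:01:29.250", 5.0))  # 155.0
```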
- In some embodiments, latency application 352 determines a latency for every message received from primary computing device 102. In some embodiments, latency application 352 determines the latencies of only some of the messages received from primary computing device 102. In some embodiments, latency application 352 determines the latencies of messages received from primary computing device 102 on a periodic basis. For example, latency application 352 determines the latency of every tenth message received from primary computing device 102, every hundredth message, every thousandth message, and/or determines the latencies of messages received from primary computing device 102 at some other interval. In some embodiments, latency application 352 determines the latencies of messages received from primary computing device 102 on an ad-hoc basis.
- After determining the latency of a message received from primary computing device 102, latency application 352 generates a latency data sample including the determined latency value and stores the latency data sample in system memory 332 and/or some other system storage of satellite audio output device 110.
- In some embodiments, a latency data sample further includes a time stamp indicative of a time at which the latency data sample was generated and/or indicative of a time at which latency application 352 received the message associated with the latency data sample.
- One or more latency data samples generated by latency application 352 are stored in system memory 332 and/or some other system storage of satellite audio output device 110.
- In response to receiving a request for latency data from primary computing device 102, latency application 352 transmits the one or more latency data samples to primary computing device 102.
- In some embodiments, latency application 352 transmits the one or more latency data samples to primary computing device 102 without receiving a request for latency data from primary computing device 102.
- For example, latency application 352 transmits one or more latency data samples to primary computing device 102 on a periodic basis (e.g., every minute, every 5 minutes, every half hour, every hour, etc.).
- In some embodiments, latency application 352 transmits one or more latency data samples to primary computing device 102 in response to a trigger event. For example, latency application 352 transmits one or more latency data samples to primary computing device 102 in response to detecting that an amount of traffic in audio device network 112 exceeds a threshold, in response to detecting that an amount of traffic in audio device network 112 has decreased below a threshold, in response to determining that a latency of a message received from primary computing device 102 exceeds a threshold, and/or in response to some other trigger event. In some embodiments, latency application 352 deletes the one or more latency data samples from system memory 332 and/or some other system storage of satellite audio output device 110 after transmitting the one or more latency data samples to primary computing device 102.
- In some embodiments, satellite audio output device 110 transmits a response message in response to some or all messages received from primary computing device 102, even if the received messages do not explicitly request latency data.
- For example, in response to receiving a message from primary computing device 102 that includes a time stamp indicative of the time at which the message was transmitted by primary computing device 102, latency application 352 transmits a response message to primary computing device 102.
- In some examples, the response message includes a first time stamp indicative of the time at which primary computing device 102 transmitted the message and a second time stamp indicative of the time at which latency application 352 received the message from primary computing device 102.
- In such examples, audio buffer application 250 determines the latency of the message based on the first and second time stamps included in the response message. In other examples, latency application 352 determines the latency of the message transmitted by primary computing device 102 based on the time stamp indicative of the time at which primary computing device 102 transmitted the message. In such examples, the response message transmitted by latency application 352 includes the determined message latency.
- When executed by processing unit(s) 302, audio buffer application 360 manages the size of audio buffer 362. Audio buffer application 360 configures the size of audio buffer 362 based on the dynamic buffer size determined by audio buffer application 250 of primary computing device 102. For example, audio buffer application 360 receives, via audio device network 112, a message including a dynamic buffer size from audio buffer application 250 of primary computing device 102 and then configures the size of audio buffer 362 to be the dynamic buffer size.
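- On the satellite side, applying a received dynamic buffer size could look like the following sketch; the message format and the deque-backed buffer (with capacity counted in audio packets rather than bytes) are assumptions made for illustration.

```python
from collections import deque

class AudioBuffer:
    # Illustrative audio buffer 362 whose capacity (here, a maximum
    # number of queued audio packets) can be reconfigured at runtime.
    def __init__(self, capacity: int):
        self.packets = deque(maxlen=capacity)

    def resize(self, capacity: int):
        # Rebuild the queue with the new capacity, keeping the most
        # recently buffered packets if the buffer is shrinking.
        self.packets = deque(self.packets, maxlen=capacity)

def on_buffer_size_message(buffer: AudioBuffer, message: dict) -> None:
    # Configure audio buffer 362 to the dynamic buffer size chosen by
    # audio buffer application 250 on the primary computing device.
    buffer.resize(message["buffer_size"])
```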
- Audio buffer 362 stores packets, or samples, of audio content received from primary computing device 102 before the packets of audio content are processed by audio processing circuit 340 and/or played back by loudspeaker(s) 322.
- In some embodiments, the size of audio buffer 362 can be increased to store more packets of audio content or decreased to store fewer packets of audio content. As described above, it is difficult to anticipate an appropriate buffer size because network delays and/or jitter in audio device network 112 can vary over time. For example, if the size of audio buffer 362 is too small, satellite audio output device 110 can run out of audio packets to play back when the delays and/or jitter in audio device network 112 are relatively high.
- Audio buffer application 250 of primary computing device 102 manages the buffer size of each respective audio buffer 362 included in a satellite audio output device 110 coupled to network-connected audio system 100.
- For example, audio buffer application 250 of primary computing device 102 manages the size of each respective audio buffer 362 included in satellite audio output devices 110-1 through 110-N.
- In some embodiments, audio buffer application 250 manages the size of the respective audio buffers 362 included in satellite audio output devices 110 based on conditions, such as jitter and/or delays, in audio device network 112.
- For example, audio buffer application 250 determines a dynamic buffer size for the audio buffers 362 included in satellite audio output devices 110 based on latencies associated with the transmission of messages from primary computing device 102 to satellite audio output devices 110 and, after determining the dynamic buffer size, distributes the dynamic buffer size to satellite audio output devices 110.
- Audio buffer application 250 determines the dynamic buffer size based on latency data 262, which includes a plurality of latency data samples received from satellite audio output devices 110.
- Each latency data sample is associated with a particular message that was transmitted by primary computing device 102 and received by a respective satellite audio output device 110.
- A latency data sample includes a latency value indicative of the latency, or transit delay, associated with transmission of a message from primary computing device 102 to satellite audio output device 110 over audio device network 112.
- In some embodiments, the latency value associated with transmission of the message from primary computing device 102 to satellite audio output device 110 is equal to the difference between the time at which primary computing device 102 transmitted the message and the time at which satellite audio output device 110 received the message.
- In other embodiments, the latency value associated with transmission of the message from primary computing device 102 to satellite audio output device 110 additionally includes time associated with processing the message at satellite audio output device 110.
- In some embodiments, each latency data sample can include an indication as to which satellite audio output device 110 (e.g., satellite audio output device 110-1, satellite audio output device 110-2, etc.) generated the latency data sample, a time stamp indicative of a time at which the latency data sample was generated, and/or a time stamp indicative of the time at which the message associated with the latency value was received by satellite audio output device 110.
- In some embodiments, audio buffer application 250 determines the dynamic buffer size for audio buffers 362 included in satellite audio output devices 110 based on an aggregate latency value that is determined from the plurality of latency data samples. In some embodiments, audio buffer application 250 determines the dynamic buffer size as a function of one or more of: a sampling rate at which audio packets stored in audio buffer 362 are sampled, or output, by satellite audio output devices 110; a number of audio channels in the audio content being played back by satellite audio output devices 110; a bitdepth of the audio packets stored in audio buffer 362; and/or an aggregate latency value that is determined from the plurality of latency data samples.
- For example, audio buffer application 250 can determine the dynamic buffer size as:
- Buffer Size = Rate × Channels × Bitdepth × Latency_agg
- where Rate is the sampling rate (e.g., 44.1 kilohertz (kHz), 48 kHz, etc.) at which audio packets stored in an audio buffer 362 are sampled and/or output by satellite audio output device 110; Channels is the number of audio channels included in the audio content being played back by satellite audio output devices 110 (e.g., 2 channels for stereo audio, 5.1 channels for surround sound audio, etc.); Bitdepth is the number of bits (e.g., 10 bits, 16 bits, 24 bits, 32 bits, etc.) captured in each sample of the audio packets; and Latency_agg is an aggregate latency (e.g., 100 ms, 150 ms, etc.) determined from the plurality of latency data samples.
- Audio buffer application 250 can determine the number of audio channels in the audio content being played back by satellite audio output devices 110 from audio playback application 242, which manages the playback of audio in network-connected audio system 100. In addition, audio buffer application 250 can determine the sampling rate and/or bitdepth of audio content being played back by satellite audio output devices 110 based on one or more of information provided by audio playback application 242, device specifications of satellite audio output devices 110, and/or parameters of the audio content being played back by satellite audio output devices 110.
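- As a worked example of the formula above (a sketch with illustrative parameter values, not values taken from the disclosure): a 48 kHz, 2-channel, 16-bit stream with a 150 ms aggregate latency yields 48,000 × 2 × 16 × 0.150 = 230,400 bits, or 28,800 bytes.

```python
def dynamic_buffer_size_bytes(rate_hz: float, channels: float,
                              bitdepth_bits: int, latency_agg_ms: float) -> int:
    # Buffer Size = Rate x Channels x Bitdepth x Latency_agg, converted
    # from bits to bytes.
    bits = rate_hz * channels * bitdepth_bits * (latency_agg_ms / 1000.0)
    return int(bits / 8)

print(dynamic_buffer_size_bytes(48_000, 2, 16, 150.0))  # 28800
```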
- Audio buffer application 250 determines the aggregate latency value for calculating the dynamic buffer size based on the latency values included in the plurality of latency data samples. In some embodiments, audio buffer application 250 determines the aggregate latency value to be the mean of the latency values included in the plurality of latency data samples. In such embodiments, audio buffer application 250 determines the aggregate latency value by dividing the sum of the latency values included in the plurality of latency data samples by the number of latency data samples. However, when audio buffer application 250 simply determines the aggregate latency value to be the mean of the latency values, a dynamic buffer size that is determined based on this aggregate latency value can be too small or too large.
- an aggregate latency value that is determined to be the mean of the latency values could significantly differ from frequently occurring latency values in the latency data samples that are relatively large ( e.g., greater than 100 ms) .
- audio buffer application 250 could determine, based on the aggregate latency value, a dynamic buffer size that is too small to account for the large frequently occurring latencies, or transit delays, of messages transmitted from primary computing device 102 to satellite audio output devices 110 over audio device network 112.
- a significant amount of audio data could be lost during playback of media content by network-connected audio system 100.
- Losing 1% of audio data during playback of a 90-minute movie, for example, would mean that 54 seconds of audio data is lost during playback of the movie.
- Such significant losses of audio data during the playback of media content are unacceptable.
- an aggregate latency value that is determined to be the mean of the latency values could differ significantly from relatively small latency values (e.g., less than 10 ms) that are representative of the majority of the latency values in the latency data samples.
- audio buffer application 250 could determine, based on the aggregate latency value, a dynamic buffer size that is larger than necessary. Accordingly, when audio buffers 362 included in satellite audio output devices are configured with a size that is too large, audio content played back by satellite audio output devices 110 could become unsynchronized as a result of the added processing time attributed to the large buffer size.
- audio buffer application 250 determines the aggregate latency value based on the mean of the latency values included in the plurality of latency data samples and a factor of the standard deviation from the mean of the latency values included in the plurality of latency data samples. In such embodiments, audio buffer application 250 determines the aggregate latency value to be the sum of the mean of the latency values and a multiple of the standard deviation from the mean. For example, if it is assumed that the mean of the latency values is 10 ms and the standard deviation from the mean is 20 ms, audio buffer application 250 determines the aggregate latency value to be the sum of 10 ms and a multiple of 20 ms.
- For example, if audio buffer application 250 applies the 2-sigma rule when determining the aggregate latency value, audio buffer application 250 determines the aggregate latency value to be 50 ms. Continuing with this example, if the audio buffer application 250 applies the 3-sigma rule when determining the aggregate latency value, audio buffer application 250 determines the aggregate latency value to be 70 ms.
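- A minimal sketch of this mean-plus-standard-deviations computation (the use of Python's statistics module and of the population standard deviation are illustrative assumptions):

    import statistics

    def aggregate_latency_sigma(latencies_ms, k):
        # aggregate = mean + k standard deviations (e.g., k = 2 or k = 3)
        return statistics.mean(latencies_ms) + k * statistics.pstdev(latencies_ms)

    # With a mean of 10 ms and a standard deviation of 20 ms, k = 2 yields
    # 50 ms and k = 3 yields 70 ms, matching the example above.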
- FIG. 4 illustrates an example normal distribution 400 of latency values included in latency data samples. These latency data samples could be, for example, generated by satellite audio output devices 110 and used by audio buffer application 250 to determine a dynamic buffer size for audio buffers 362 included in satellite audio output devices 110. Persons skilled in the art will understand that the latency data samples included in normal distribution 400 are just one non-limiting example of sample latency data that could be generated by satellite audio output devices 110 and/or used by audio buffer application 250 to determine a dynamic buffer size for the audio buffers 362 included in satellite audio output devices 110.
- the mean latency value of the latency data samples is approximately 6.01 ms and one standard deviation from the mean latency value is approximately 20.26 ms.
- the 3-sigma rule is applied to the normal distribution 400 to determine that a latency value of 66.78 ms is 3 standard deviations from the mean latency value of 6.01 ms. That is, applying the 3-sigma rule to the normal distribution 400 includes determining the latency value that deviates from the mean latency value by 3 standard deviations. In this example, it will be assumed that audio buffer application 250 determines the aggregate latency value to be 66.78 ms.
- When audio buffer application 250 determines a dynamic buffer size, for example using Equation 1, based on the aggregate latency value of 66.78 ms and distributes the determined dynamic buffer size to satellite audio output devices 110, the satellite audio output devices 110 lose approximately 1.8% of audio data during playback of media content. Losing 1.8% of audio data during playback of a 90-minute movie is equivalent to losing approximately 102 seconds of audio throughout playback of the movie. Losing 102 seconds of audio data, as determined in this example, is unacceptable for a high-quality listening experience.
- the significant loss of audio data during playback of the media content can be attributed to jitter and/or delays in audio device network 112 that, at times, resulted in message latencies that were significantly larger (e.g., greater than 100 ms) than the aggregate latency value used to determine the dynamic buffer size.
- audio buffer application 250 determines the aggregate latency value by using a weighted average function that more heavily weights the larger latency values included in the latency data samples. In such embodiments, audio buffer application 250 sorts the latency values included in latency data samples in descending order and determines, using a weighted average function, the aggregate latency value based on the N largest latency values included in the descending-sorted list of latency values.
- audio buffer application 250 determines the number N of largest latency values based on one or more of the total number of latency data samples generated by satellite audio output devices 110, the number of satellite audio output devices 110 coupled to network-connected audio system 100, a detected amount of jitter and/or congestion in the audio device network 112, and/or some other parameter of network-connected audio system 100.
- audio buffer application 250 determines N as a function of the total number D of devices ( e.g., combined number of primary computing device 102 and satellite audio output devices 110) coupled to network-connected audio system 100.
- audio buffer application 250 can determine N by multiplying the total number D of devices by a scalar value (e.g., 5, 10, 20, etc. ) .
- N can be any value, such as but not limited to 5, 10, 50, 200, 500, 1000, etc., determined by audio buffer application 250.
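- As a minimal illustration of the device-count heuristic, assuming a simple linear scaling (the function name and the default scalar of 10 are illustrative only):

    def choose_n(total_devices_d, scalar=10):
        # N scales with the total number D of devices in the system
        return total_devices_d * scalar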
- audio buffer application 250 can determine the aggregate latency value, which is used for determining the dynamic buffer size, using Equation 2 as follows:
- Latency agg = W 1 × Latency 1 + W 2 × Latency 2 + ... + W N × Latency N (Equation 2)
- Equation 2 is a weighted average formula in which a respective weight value W i is multiplied by, or assigned to, a corresponding latency value Latency i . Furthermore, the sum of the respective weight values W i is equal to 1.
- a respective weight value W i is proportional to the size of the corresponding latency value Latency i by which the weight value W i is multiplied. For example, audio buffer application 250 assigns the largest weight value W i to the largest latency value Latency i and assigns the smallest weight value W i to the smallest latency value Latency i . In some embodiments, audio buffer application 250 randomly generates and/or randomly assigns weight values W i to the latency values Latency i included in the N largest latency values used to calculate the aggregate latency value. In some embodiments, audio buffer application 250 assigns a different weight value W i to each latency value Latency i included in the N largest latency values used to calculate the aggregate latency value. In some embodiments, audio buffer application 250 assigns the same weight value W i to more than one latency value Latency i .
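- One hedged sketch of the proportional-weighting option described above, assuming the weights are simply latency values normalized to a unit sum so that Equation 2 remains a weighted average (the function names are illustrative):

    def proportional_weights(latencies_desc_ms):
        # each weight is proportional to its latency value; weights sum to 1
        total = sum(latencies_desc_ms)
        return [lat / total for lat in latencies_desc_ms]

    def equation2(latencies_desc_ms, weights):
        # Equation 2: weighted average of the N largest latency values
        return sum(w * lat for w, lat in zip(weights, latencies_desc_ms))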
- audio buffer application 250 assigns a first weight value W 1 to the largest latency value Latency 1 included in the N largest latency values used to calculate the aggregate latency value and assigns a second weight value W n to each of the remaining latency values Latency i included in the N largest latency values used to calculate the aggregate latency value. In such embodiments, audio buffer application 250 can determine the first weight value W 1 using Equation 3, which is a function of the following variables:
- Latency mean is the mean value of the N largest latency values used to calculate the aggregate latency value
- Latency std is the standard deviation from the mean value of the N largest latency values used to calculate the aggregate latency value
- Latency max is the largest latency value included the N largest latency values used to calculate the aggregate latency value.
- W 1 was determined to have a value of 0.636.
- 0.636 is just one non-limiting example of a value for the first weight W 1 .
- the value of the first weight W 1 is less than 0.5, greater than 0.5, within a range from 0.6 to 0.7, less than 0.75, or some other value.
- Equation 3 also includes a variable a, which is a number of standard deviations from the mean of the N largest latency values.
- a is determined based on the size of the standard deviation Latency std and/or based on the number N of latency values being used to determine the aggregate latency value.
- a is chosen in accordance with one or more settings of audio buffer application 250.
- the value of a can be any number that is desired, such as but not limited to, 0.5, 1, 1.5, 2, 3, 4, or some other number.
- In embodiments in which audio buffer application 250 assigns a first weight value W 1 to the largest latency value Latency max included in the N largest latency values used to calculate the aggregate latency value and assigns a second weight value W n to each of the remaining latency values included in the N largest latency values, audio buffer application 250 can determine the second weight value W n using Equation 4 as follows:
- W n = (1 - W 1) / (N - 1) (Equation 4)
- where:
- W 1 is the first weight value assigned to the largest latency value Latency max included in the N largest latency values used to calculate the aggregate latency value.
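- The following sketch combines Equations 2 and 4 for this two-weight scheme; W 1 is passed in as a precomputed value because Equation 3 is not reproduced here, and the function name is an illustrative assumption:

    def aggregate_two_weight(latencies_desc_ms, w1):
        # latencies_desc_ms holds the N largest latency values, largest first;
        # requires at least two values
        n = len(latencies_desc_ms)
        # Equation 4: the remaining N - 1 values share the second weight
        wn = (1.0 - w1) / (n - 1)
        # Equation 2: W 1 on the largest value, W n on each remaining value
        return w1 * latencies_desc_ms[0] + wn * sum(latencies_desc_ms[1:])

    # e.g., aggregate_two_weight(sorted(latencies, reverse=True)[:n], w1=0.636)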
- FIG. 5 illustrates an example normal distribution 500 of the N largest descending-sorted latency values included in latency data samples.
- the latency data samples illustrated in FIG. 5 could be, for example, generated by satellite audio output devices 110 and used by audio buffer application 250 to determine a dynamic buffer size of the audio buffers 362 included in satellite audio output devices 110.
- Persons skilled in the art will understand that latency data samples included in normal distribution 500 are just one non-limiting example of sample latency data that could be generated by satellite audio output devices 110 and/or used by audio buffer application 250 to determine a dynamic buffer size for the audio buffers 362 included in satellite audio output devices 110.
- the mean latency value of the N largest latency values included in the latency data samples is approximately 134.32 ms and one standard deviation from the mean latency of the N largest latency values is approximately 41.72 ms.
- the 2-sigma rule is applied to the normal distribution 500 to determine that a latency value of 217.76 ms is 2 standard deviations from the mean latency value of 134.32 ms. That is, applying the 2-sigma rule to the normal distribution 500 includes determining the latency value that deviates from the mean latency of the N largest latency values by 2 standard deviations.
- audio buffer application 250 determines the aggregate latency value, using Equations 2-4, based on the N largest latency values of the latency data samples shown in normal distribution 500. Moreover, in this example, it will be assumed that audio buffer application 250 uses a value of 2 for the variable a included in Equation 3. After determining the aggregate latency value in this example, audio buffer application 250 determines a dynamic buffer size, for example using Equation 1, based on the aggregate latency value and distributes the dynamic buffer size to satellite audio output devices 110. When satellite audio output devices 110 playback media content with audio buffers 362 configured with the dynamic buffer size determined in this example, the satellite audio output devices only lose approximately 0.04% of audio data during playback of media content.
- Losing 0.04% of audio data during playback of a 90-minute movie is equivalent to losing only 2 seconds of audio data throughout playback of the movie.
- losing only 2 seconds of audio data is a significant improvement and is acceptable for a high-quality listening experience.
- satellite audio output devices 110 operating with audio buffers 362 of the dynamic buffer size can synchronously playback audio content without losing significant amounts of audio data.
- conditions in audio device network 112 can change with time.
- a first amount of jitter and/or delays in audio device network 112 at a first point in time can be different than a second amount of jitter and/or delays in audio device network 112 at a second point in time. Therefore, when audio buffer application 250 determines the dynamic buffer size based on latency data samples generated by satellite audio output devices 110, older latency data samples generated by satellite audio output devices 110 (e.g., latency data samples generated 5 or more minutes ago) might no longer be relevant to current conditions of audio device network 112.
- when determining an aggregate latency value and/or the dynamic buffer size for audio buffers 362 included in satellite audio output devices 110, audio buffer application 250 does not use latency data samples that are older than a time threshold (e.g., 30 seconds, 1 minute, 5 minutes, 10 minutes, etc. ) . In some embodiments, audio buffer application 250 also discards latency data samples that are older than the time threshold. For example, audio buffer application 250 deletes latency data samples that are older than the time threshold from latency data 262 stored in database (s) 260.
- For example, latency data samples that were generated by satellite audio output devices 110 at a past time (e.g., 5 minutes or more in the past) might reflect network conditions that no longer exist and should not be used by audio buffer application 250 to determine an aggregate latency value and/or the dynamic buffer size.
- when determining an aggregate latency value and/or the dynamic buffer size for audio buffers 362 included in satellite audio output devices 110, audio buffer application 250 discards and does not use latency data samples that are older than the time threshold (e.g., 5 minutes) to avoid determining a dynamic buffer size that is suited to the previous (e.g., 5 minutes prior) congested condition of audio device network 112.
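- A sketch of this age-based pruning, under the assumption that each stored sample is a dictionary carrying the time stamp applied when the sample was received (the threshold value is an example):

    import time

    def prune_stale_samples(samples, threshold_s=300.0):
        # drop latency data samples older than the time threshold (e.g., 5 minutes)
        now = time.time()
        return [s for s in samples if now - s["timestamp"] <= threshold_s]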
- audio buffer application 250 frequently updates, or determines a new value of, the dynamic buffer size for audio buffers 362 included in satellite audio output devices 110.
- audio buffer application 250 updates the dynamic buffer size on a periodic basis ( e.g., every 30 seconds, every minute, every 5 minutes, every half hour, every hour, etc. ) .
- audio buffer application 250 updates the dynamic buffer size on an ad-hoc basis.
- audio buffer application 250 updates the dynamic buffer size in response to a trigger event.
- audio buffer application 250 can update the dynamic buffer size in response to detecting that an amount of traffic in audio device network 112 exceeds a threshold, in response to detecting that an amount of traffic in audio device network 112 has decreased below a threshold, in response to detecting that an amount of time taken for a satellite audio output device 110 to respond to primary computing device 102 exceeds a threshold, and/or in response to some other trigger event.
- When audio buffer application 250 updates, or determines a new value of, the dynamic buffer size, audio buffer application 250 uses the most recent latency data samples generated by and received from satellite audio output devices 110.
- FIG. 6 is a block diagram of audio buffer application 250, according to one or more aspects of the various embodiments.
- audio buffer application 250 maintains a list of descending-sorted latency data samples 602.
- Each entry, or latency data sample, in the list of descending-sorted latency data samples 602 includes a respective entry number, a latency value, and a time stamp.
- the time stamp is indicative of the time at which the respective latency data sample was generated by a satellite audio output device 110.
- the time stamp is indicative of the time at which the respective latency data sample was received and/or processed by audio buffer application 250.
- audio buffer application 250 receives latency data samples from satellite audio output devices 110 and stores the received latency data samples in database (s) 260 as latency data 262. In some embodiments, audio buffer application 250 timestamps a respective latency data sample upon receiving the latency data sample.
- the latency data samples might be stored in a random order that is not sorted based on the size of latency values included in the latency data samples. Accordingly, when audio buffer application 250 determines an aggregate latency value and/or a dynamic buffer size, audio buffer application 250 retrieves the unsorted latency data samples from database (s) 260 and sorts the latency data samples in descending order according to the size of the latency values included in the latency data samples.
- the first entry in the list of descending-sorted latency data samples 602 has a latency value of 375 ms, which is the largest latency value included in the latency data samples. Further, each entry in the list of descending-sorted latency data samples 602 after the first entry has a smaller latency value.
- audio buffer application 250 continuously updates the list of descending-sorted latency data samples 602. In some embodiments, audio buffer application 250 updates the list of descending-sorted latency data samples 602 on a periodic basis ( e.g., every 30 seconds, every minute, every 5 minutes, every half hour, every hour, etc. ) . In some embodiments, audio buffer application 250 updates the list of descending-sorted latency data samples 602 on an ad-hoc basis. In some embodiments, audio buffer application 250 updates the list of descending-sorted latency data samples 602 when audio buffer application 250 updates the dynamic buffer size.
- audio buffer application 250 updates the list of descending-sorted latency data samples 602 when audio buffer application 250 receives one or more new latency data samples from satellite audio output devices 110. In some embodiments, audio buffer application 250 updates the list of descending-sorted latency data samples 602 in response to a trigger event. For example, audio buffer application 250 can update the list of descending-sorted latency data samples 602 in response to detecting that an amount of traffic in audio device network 112 exceeds a threshold, in response to detecting that an amount of traffic in audio device network 112 has decreased below a threshold, in response to detecting that an amount of time taken for a satellite audio output device 110 to respond to primary computing device 102 exceeds a threshold, and/or in response to some other trigger event.
- Updating the list of descending-sorted latency data samples 602 includes adding new latency data samples (e.g., new latency data samples received from satellite audio output devices 110) to and removing old latency data samples (e.g., latency data samples older than a time threshold) from the list of descending-sorted latency data samples 602.
- audio buffer application 250 adds a new latency data sample to the list of descending-sorted latency data samples 602 as the latency data sample is received from a satellite audio output device 110 and/or after storing the latency data sample in database (s) 260.
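- For example, a newly received sample can be placed directly into the descending-sorted list with a binary search; the dictionary-based sample representation is an assumption carried over from the sketches above:

    import bisect

    def insert_descending(sorted_samples, new_sample):
        # sorted_samples is ordered by latency value, largest first; bisecting
        # on negated latency values preserves the descending order
        keys = [-s["latency_ms"] for s in sorted_samples]
        idx = bisect.bisect_left(keys, -new_sample["latency_ms"])
        sorted_samples.insert(idx, new_sample)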
- audio buffer application 250 keeps track of the current time 610, which is determined in accordance with the RTC of processing unit (s) 202 and maintains a time stamp threshold 612.
- the time stamp threshold 612 is 5 minutes.
- When the difference between the current time 610 and the time stamp of a latency data sample exceeds the time stamp threshold 612, audio buffer application 250 removes the latency data sample from the list of descending-sorted latency data samples 602 and/or deletes the latency data sample from database (s) 260.
- Updating the list of descending-sorted latency data samples 602 further includes determining and updating a mean 620 of the N largest latency values included in the list of descending-sorted latency data samples 602 and a standard deviation 622 from the mean of the N largest latency values included in the list of descending-sorted latency data samples 602.
- the N largest latency values in the list of descending-sorted latency data samples 602 are the respective latency values included in the first through Nth entries in the list of descending-sorted latency data samples 602.
- Audio buffer application 250 determines the mean 620 of the N largest latency values included in the list of descending-sorted latency data samples 602 and then determines the standard deviation 622 from the mean 620 of the N largest latency values included in the list of descending-sorted latency data samples 602.
- the mean 620 of the N largest latency values is 134.32 ms and a standard deviation 622 from the mean 620 of the N largest latency values is 41.72 ms.
- 134.32 ms is just one non-limiting example of the mean 620 of the N largest latency values included in the list of descending-sorted latency data samples 602.
- 41.72 ms is just one non-limiting example of the standard deviation 622 from the mean 620 of the N largest latency values included in the list of descending-sorted latency data samples 602.
- audio buffer application 250 further determines and maintains the aggregate latency value 630 used for determining the dynamic buffer size. For example, audio buffer application 250 determines the aggregate latency value 630 using Equations 2-4 described herein. In the illustrated example of FIG. 6, the aggregate latency value is approximately 200 ms. In some embodiments, audio buffer application 250 determines a new, or updated, aggregate latency value 630 each time the list of descending-sorted latency data samples 602 is updated. The dashed box in FIG. 6 indicates which of the latency data samples included in the list of descending-sorted latency data samples 602 are used by audio buffer application 250 to determine the aggregate latency value 630 and/or a new dynamic buffer size. In the illustrated example, the first N entries in the list of descending-sorted latency data samples 602 are contained in the dashed box.
- FIG. 7 is a flow chart of method steps for determining a buffer size in a network-connected audio system, according to one or more aspects of the various embodiments.
- Although the method steps are described with respect to the systems and examples of FIGs. 1-6, persons skilled in the art will understand that any system configured to perform the method steps, in any order, falls within the scope of the various embodiments.
- a method 700 begins at step 702, where an audio buffer application 250 executing on processing unit (s) 202 of a primary computing device 102 requests satellite audio output devices 110 to provide the audio buffer application 250 with latency data.
- the latency data includes, for example, latency data samples generated by the satellite audio output devices 110.
- each latency data sample includes a latency value that is associated with a message transmitted by primary computing device 102 and received by a satellite audio output device 110.
- the latency value is, for example, the time difference between the time at which the message was transmitted by primary computing device 102 and the time at which the message was received by the satellite audio output device 110.
- the latency value additionally includes the time taken by satellite audio output device 110 to process the message.
- each latency data sample also includes a time stamp indicative of the time at which the respective latency data sample was generated and/or a time stamp associated with the time at which the message was received by satellite audio output device 110.
- At step 704, audio buffer application 250 receives one or more responses including latency data from satellite audio output devices 110.
- audio buffer application 250 timestamps the responses with respective times at which the responses were received by audio buffer application 250.
- At step 706, audio buffer application 250 sorts, in descending order, the latency values included in the latency data received at step 704. For example, as described above, each latency data sample included in the latency data received at step 704 includes a respective latency value.
- audio buffer application 250 sorts the latency data samples in descending order based on the respective sizes of the latency values included in the latency data samples.
- At step 708, audio buffer application 250 selects the N largest latency values included in the latency values that were sorted in descending order at step 706.
- selecting the N largest latency values includes determining a mean of the N largest latency values and selecting the latency values included in the N largest latency values that are within a threshold number of standard deviations from the mean of the N largest latency values.
- At step 710, audio buffer application 250 determines an aggregate latency value based on the N largest latency values selected at step 708.
- determining the aggregate latency value includes applying a weighted average formula to the N largest latency values selected at step 708, where the larger of the latency values included in the selected N largest latency values are weighted more heavily than the smaller of the latency values included in the selected N largest latency values.
- audio buffer application 250 can use Equations 2-4 to determine the aggregate latency value based on the N largest latency values selected at step 708.
- audio buffer application 250 determines a buffer size, such as the dynamic buffer size, based on the aggregate latency value determined at step 710. For example, audio buffer application 250 determines the buffer size using Equation 1 and/or some other technique described herein.
- audio buffer application 250 distributes the buffer size to satellite audio output devices 110. In some examples, distributing the buffer size includes transmitting, via audio device network 112, one or more messages including the buffer size to satellite audio output devices 110. In some examples, the one or more messages including the buffer size further include instructions to each satellite audio output device 110 to configure the size of its audio buffer 362 to be the buffer size.
- audio buffer application 250 discards outdated latency data.
- discarding outdated latency data includes deleting latency data when a time stamp associated with the latency data, such as a time stamp applied by audio buffer application 250 to the latency data that is indicative of the time at which audio buffer application 250 received the latency data, is older than a time threshold.
- discarding outdated latency data includes deleting latency data when the age of the latency data, measured from the time at which the latency data was generated by a satellite audio output device 110, exceeds a time threshold.
- the method 700 then returns to step 702 at which audio buffer application 250 requests satellite audio output devices 110 to provide audio buffer application 250 with new, or updated, latency data. In this manner, the method 700 dynamically determines the buffer size of audio output devices playing back audio in a network-connected audio system.
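- Taken together, the steps of method 700 can be summarized in a single sketch; the device interface (request_latency_data, set_buffer_size), the sample representation, and all parameter values are assumptions standing in for the operations described above:

    def method_700_iteration(devices, n, w1, rate_hz=48_000, channels=2, bitdepth=16):
        # steps 702-704: request and receive latency data samples
        samples = [s for d in devices for s in d.request_latency_data()]
        # step 706: sort the latency values in descending order
        samples.sort(key=lambda s: s["latency_ms"], reverse=True)
        # step 708: select the N largest latency values
        largest = [s["latency_ms"] for s in samples[:n]]
        # step 710: weighted-average aggregate latency (Equations 2 and 4)
        wn = (1.0 - w1) / (n - 1)
        aggregate_s = (w1 * largest[0] + wn * sum(largest[1:])) / 1000.0
        # buffer size from the aggregate latency (Equation 1), then distribute
        buffer_size = int(rate_hz * channels * (bitdepth / 8) * aggregate_s)
        for d in devices:
            d.set_buffer_size(buffer_size)
        return buffer_size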
- FIG. 8 is a flow chart of method steps for adjusting a buffer size in a satellite audio output device, according to one or more aspects of the various embodiments. Although the method steps are described with respect to the systems and examples of FIGs. 1-6, persons skilled in the art will understand that any system configured to perform the method steps, in any order, falls within the scope of the various embodiments.
- a method 800 begins at step 802, where a latency application 352 executing on processing unit (s) 302 of a satellite audio output device 110 receives a timestamped message from primary computing device 102.
- the timestamped message includes a time stamp indicative of the time at which primary computing device 102 transmitted the message.
- the timestamped message further includes audio content, such as one or more audio packets, to be played back by satellite audio output device 110.
- At step 804, latency application 352 determines a latency, or transit delay, of the timestamped message received at step 802 and stores the latency of the timestamped message as a latency data sample.
- determining the latency of the timestamped message received at step 802 includes determining a time difference between a time at which the timestamped message was transmitted by primary computing device 102 and a time at which latency application 352 received the timestamped message.
- determining the latency of the timestamped message further includes adding the amount of time taken by satellite audio output device 110 to process the timestamped message to the difference between the time at which the timestamped message was transmitted by primary computing device 102 and the time at which the timestamped message was received by latency application 352.
- storing the latency of the timestamped message as a latency data sample includes generating a latency data sample including the latency of the timestamped message and storing the latency data sample in system memory 332 or some other system storage of the satellite audio output device 110.
- latency application 352 timestamps the latency data sample with a time at which the timestamped message was received by latency application 352 and/or with a time at which the latency data sample was generated by latency application 352.
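- A sketch of the latency computation and timestamping of steps 802-804, assuming the message carries the primary device's transmit time and that the devices share a synchronized clock (the field names are illustrative):

    def make_latency_sample(message, received_at_s, processing_s=0.0):
        # latency = (receive time - transmit time) + optional processing time
        latency_s = (received_at_s - message["sent_at"]) + processing_s
        # the sample is timestamped so that stale samples can be pruned later
        return {"latency_ms": latency_s * 1000.0, "timestamp": received_at_s}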
- latency application 352 receives a request for latency data from primary computing device 102.
- the request for latency data includes a request for latency data samples, such as the latency data sample generated at step 804, stored in system memory 332 and/or some other system storage of satellite audio output device 110.
- At step 808, latency application 352 transmits one or more latency data samples stored in system memory 332 and/or some other system storage to primary computing device 102. In some examples, latency application 352 transmits all of the latency data samples stored in system memory 332 and/or some other system storage to primary computing device 102. In other examples, latency application 352 transmits only some of the stored latency data samples.
- latency application 352 deletes the stored latency data samples from system memory 332 and/or some other system storage after transmitting the latency data samples to primary computing device 102 at step 808.
- satellite audio output device 110 receives a message including a buffer size from primary computing device 102. For example, audio buffer application 360 executing on processing unit (s) 302 of satellite audio output device 110 receives the message including the buffer size from primary computing device 102.
- the message further includes an instruction to configure the size of audio buffer 362 of satellite audio output device 110 based on the buffer size included in the message.
- audio buffer application 360 adjusts the size of audio buffer 362 based on the buffer size included in the message received from primary computing device 102.
- adjusting the size of audio buffer 362 based on the buffer size included in the message includes setting, or configuring, the size of audio buffer 362 to be the buffer size included in the message received from primary computing device 102.
- the method 800 then returns to step 802 at which latency application 352 receives another timestamped message from primary computing device 102. In this manner, the method 800 dynamically adjusts the size of a buffer included in an audio output device based on latency data generated by one or more audio output devices in a network-connected audio system.
- a primary computing device and multiple satellite audio output devices are coupled together to form a network-connected audio system.
- the primary computing device requests the multiple satellite audio output devices to provide latency data.
- Latency data includes latency data samples generated by the multiple satellite audio output devices.
- Each latency data sample includes a respective latency value that is indicative of a time difference between the time at which the primary computing device transmits a message and the time at which a satellite audio output device receives the message.
- the disclosed techniques determine an aggregate latency value based on the latency data samples received from the multiple satellite audio output devices.
- the primary computing device determines the aggregate latency value using a weighted average function that more heavily weights the larger latency values included in the latency data samples.
- the disclosed techniques further include determining, based on the aggregate latency value, a buffer size of the audio buffers included in the multiple satellite audio output devices.
- the disclosed techniques further include distributing the buffer size to the multiple satellite audio output devices such that the multiple satellite audio output devices configure the size of their respective audio buffers based on the distributed buffer size.
- the primary computing device determines updated values of the buffer size and distributes the updated values of the buffer size to the multiple audio output devices.
- At least one technical advantage of the disclosed techniques relative to the prior art is that, with the disclosed techniques, the buffer size of a satellite device in a network-connected audio system can be dynamically adjusted based on the network conditions.
- the buffer size of a satellite device can be decreased without an increased risk of the satellite device running out of audio packets to playback while also decreasing the processing latency of the satellite device.
- the buffer size of a satellite device can be increased as needed to store additional audio packets for playback.
- Another technical advantage is that the buffer size of all satellite devices in the network-connected audio system can be synchronized, which eliminates the adverse effects on audio playback synchrony that are attributed to satellite devices with mismatched buffer sizes.
- the disclosed techniques do not implement the use of complex, resource intensive algorithms that strain the processing capabilities of satellite devices. Instead, the disclosed techniques use only a small portion of a satellite device’s processing resources thereby allowing more of the satellite device’s processing power to be used for audio processing. As a result, listeners can enjoy a higher quality listening experience relative to conventional techniques.
- a computer-implemented method for determining a buffer size in a network-connected audio system comprises requesting one or more audio output devices to provide latency data; receiving, from the one or more audio output devices, a plurality of latency data samples, where each latency data sample in the plurality of latency data samples includes a latency value; selecting largest latency values from the latency values included in the plurality of latency data samples; determining an aggregate latency value based on the largest latency values; determining the buffer size based on the aggregate latency value; and distributing the buffer size to the one or more audio output devices, wherein the buffer size is usable by the one or more audio output devices to configure a respective buffer.
- each latency value is indicative of a difference between a time at which a message was transmitted by a computing device and a time at which the message was received by an audio output device.
- selecting the largest latency values from the latency values included in the plurality of latency data samples further comprises sorting the latency values in descending order; and selecting a number of largest latency values from the descending-sorted latency values.
- determining the aggregate latency value based on the largest latency values further comprises determining a weighted average of the largest latency values; and determining the aggregate latency value to be the weighted average of the largest latency values.
- determining the weighted average of the largest latency values further comprises assigning a first weight value to a largest latency value included in the largest latency values; and assigning a second weight value to other latency values included in the largest latency values, the second weight value being less than the first weight value.
- a first latency data sample included in the plurality of latency data samples includes a time stamp indicative of a time at which the first latency data sample was generated.
- a latency value includes an amount of time taken by an audio output device to process a message that was transmitted by a computing device.
- one or more non-transitory computer-readable storage media including instructions that, when executed by one or more processors at a first computing device, cause the one or more processors to perform steps of requesting one or more audio output devices to provide latency data; receiving, from the one or more audio output devices, a plurality of latency data samples, where each latency data sample in the plurality of latency data samples includes a latency value; selecting largest latency values from the latency values included in the plurality of latency data samples; determining an aggregate latency value based on the largest latency values; determining a buffer size based on the aggregate latency value; and distributing the buffer size to the one or more audio output devices, wherein the buffer size is usable by the one or more audio output devices to configure a respective buffer.
- each latency value is indicative of a difference between a time at which a message was transmitted by a computing device and a time at which the message was received by an audio output device.
- step of selecting the largest latency values from the latency values included in the plurality of latency data samples further comprises sorting the latency values in descending order; and selecting a number of largest latency values from the descending-sorted latency values.
- step of determining the aggregate latency value based on the largest latency values further comprises determining a weighted average of the largest latency values; and determining the aggregate latency value to be the weighted average of the largest latency values.
- step of determining the weighted average of the largest latency values further comprises assigning a first weight value to a largest latency value included in the largest latency values; and assigning a second weight value to other latency values included in the largest latency values, the second weight value being less than the first weight value.
- any of clauses 11-16 wherein the steps further comprise determining a mean value of the largest latency values; determining a value of a standard deviation of the largest latency values; and determining the first weight value based on the mean value, the value of the standard deviation from the mean value, and the largest latency value included in the largest latency values.
- a first data sample included in the plurality of latency data samples includes a time stamp indicative of a time at which the first data sample was generated; and the steps further comprise in response to determining that the time at which the first data sample was generated is older than a threshold, discarding the first data sample.
- a computing device comprises a memory storing an application; and one or more processors that, when executing the application, are configured to request one or more audio output devices to provide latency data; receive, from the one or more audio output devices, a plurality of latency data samples, where each latency data sample in the plurality of latency data samples includes a latency value; select largest latency values from the latency values included in the plurality of latency data samples; determine an aggregate latency value based on the largest latency values; determine a buffer size based on the aggregate latency value; and distribute the buffer size to the one or more audio output devices, wherein the buffer size is usable by the one or more audio output devices to configure a respective buffer.
- each latency value is indicative of a difference between a time at which a message was transmitted by a computing device and a time at which the message was received by an audio output device.
- aspects of the present embodiments may be embodied as a system, method or computer program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc. ) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “module, ” a “system, ” or a “computer. ” In addition, any hardware and/or software technique, process, function, component, engine, module, or system described in the present disclosure may be implemented as a circuit or set of circuits. Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium (s) having computer readable program code embodied thereon.
- the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
- a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
- a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function (s) .
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Databases & Information Systems (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The present disclosure includes computer-implemented techniques for determining a buffer size in a network-connected audio system. The techniques include requesting one or more audio output devices to provide latency data and receiving, from the one or more audio output devices, a plurality of data samples, where each data sample in the plurality of data samples includes a latency value. The techniques further include selecting largest latency values from the latency values included in the plurality of data samples, determining an aggregate latency value based on the largest latency values, determining the buffer size based on the aggregate latency value, and distributing the buffer size to the one or more audio output devices. The buffer size is usable by the one or more audio output devices to configure a respective buffer.
Description
Field of the Various Embodiments
The various embodiments relate generally to audio systems, and more specifically, to techniques for dynamic buffering in a network-connected audio playback environment having multiple audio output devices.
Description of the Related Art
Improvements in network connectivity have facilitated the proliferation of complex network-connected audio systems having multiple audio output devices. Such systems typically include a primary device, which may or may not include an audio output device, and a plurality of satellite devices including respective audio output devices that simultaneously playback audio, such as multi-channel audio. The primary device is responsible for retrieving audio data from one or more media sources and distributing the audio data in the form of audio packets to the plurality of satellite devices for simultaneous playback. For example, the primary device transmits, over a network, multi-channel audio in the form of audio packets to respective satellite devices for playback.
However, the connections between devices in a network-connected system, whether wired or wireless, are susceptible to network jitter and/or other connectivity issues. Network jitter is a variation in the delay of packet arrival to the plurality of satellite devices that playback audio. Network jitter can result in the loss of audio data transmitted over the network by the primary device, which adversely impacts the playback quality of the satellite devices. Therefore, conventional network-connected audio systems have attempted various approaches to mitigate the negative impact that network jitter has on the quality of audio playback.
One of the primary causes of network jitter is network congestion, which occurs when an amount of data transmitted over a network exceeds the bandwidth capacity of the network and results in decreased transmission speeds across the network and the delayed reception of packets. To address the negative effects of jitter and congestion in network-connected audio systems, some systems have implemented complex data compression algorithms to reduce the size of packets that are transmitted over the network. However, at
least one drawback to this approach is the large amount of processing power needed to perform the sophisticated mathematical operations used to compress and decompress the data packets. As many of the satellite devices in network-connected audio systems have limited processing power, a large portion of a respective satellite device’s processing resources are consumed by decompressing the data packets. Accordingly, a limited amount of the satellite device’s processing resources are left to process the audio data for playback, which results in a decrease in the quality of audio played back by the satellite device.
It is often difficult to anticipate the buffer size that will be needed as network delays and jitter can vary significantly for different networks. Thus, a predetermined default buffer size will not work for all situations. In addition, it is often difficult for a user to determine the appropriate buffer size for a network. These concerns can be addressed by increasing the buffer size of the satellite devices that playback the audio. By increasing the buffer size of a respective satellite device, the satellite device can store more audio packets thereby sustaining audio playback for longer periods of time while network delays occur. That is, storing a larger amount of audio packets in the buffer prevents the satellite device from running out of audio packets to playback even if network delays cause subsequent audio packets transmitted by the primary device to arrive later than expected.
However, at least one drawback to increasing the buffer size of a satellite device is the associated increase in the latency of data processing by the satellite device. For example, when a buffer in a satellite device queues a large number of audio packets, the latency, or delay, associated with processing a respective audio packet at the beginning of the queue is increased. At least another drawback to increasing the buffer size of a respective satellite device in a network-connected audio system is that not all satellite devices in the network-connected audio system have the same buffer size. Thus, when the buffer size of one satellite device in the network-connected audio system is increased, there is a resultant mismatch between the respective buffer sizes of satellite devices in the network-connected audio system. When respective buffer sizes of satellite devices in a network-connected audio system are mismatched, satellite devices with larger buffers take longer to process and playback audio packets than satellite devices with smaller buffers, thereby disrupting the synchronization of audio playback in the network-connected system. Unsynchronized audio playback in a network-connected audio system, such as a multi-channel audio system, decreases the quality of the listening experience.
As the foregoing illustrates, what is needed are more effective techniques for buffering audio data for playback by a network-connected audio system having multiple audio output devices.
SUMMARY
Various embodiments of the present disclosure set forth a computer-implemented method for determining a buffer size in a network-connected audio system. The method includes requesting one or more audio output devices to provide latency data and receiving, from the one or more audio output devices, a plurality of data samples, where each data sample in the plurality of data samples includes a latency value. The method further includes selecting largest latency values from the latency values included in the plurality of data samples, determining an aggregate latency value based on the largest latency values, determining the buffer size based on the aggregate latency value, and distributing the buffer size to the one or more audio output devices. The buffer size is usable by the one or more audio output devices to configure a respective buffer.
Further embodiments provide, among other things, one or more non-transitory computer-readable media and systems configured to implement the method set forth above.
At least one technical advantage of the disclosed techniques relative to the prior art is that, with the disclosed techniques, the buffer size of a satellite device in a network-connected audio system can be dynamically adjusted based on the network conditions. Thus, when network conditions are light on traffic, the buffer size of a satellite device can be decreased without an increased risk of the satellite device running out of audio packets to playback while also decreasing the processing latency of the satellite device. In contrast, when the network is congested, the buffer size of a satellite device can be increased as needed to store additional audio packets for playback. Another technical advantage is that the buffer size of all satellite devices in the network-connected audio system can be synchronized, which eliminates the adverse effects on audio playback synchrony that are attributed to satellite devices with mismatched buffer sizes. Furthermore, another technical advantage is that the disclosed techniques do not implement the use of complex, resource intensive algorithms that strain the processing capabilities of satellite devices. Instead, the disclosed techniques use only a small portion of a satellite device’s processing resources thereby allowing more of the satellite device’s processing power to be used for audio processing. As a result, listeners can
enjoy a higher quality listening experience relative to conventional techniques. These technical advantages provide one or more technological improvements over prior art approaches.
So that the manner in which the above recited features of the various embodiments can be understood in detail, a more particular description of the inventive concepts, briefly summarized above, may be had by reference to various embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of the inventive concepts and are therefore not to be considered limiting of scope in any way, and that there are other equally effective embodiments.
FIG. 1 illustrates a network-connected audio system, according to one or more aspects of the various embodiments;
FIG. 2 is a block diagram of the primary computing device of FIG. 1, according to one or more aspects of the various embodiments;
FIG. 3 is a block diagram of the satellite audio output device of FIG. 1, according to one or more aspects of the various embodiments;
FIG. 4 illustrates a normal distribution for latency data, according to one or more aspects of the various embodiments;
FIG. 5 illustrates a normal distribution for the largest descending-sorted latency data, according to one or more aspects of the various embodiments;
FIG. 6 is a block diagram of an audio buffer application that may be implemented by the primary computing device of FIG. 1, according to one or more aspects of the various embodiments;
FIG. 7 is a flow chart of method steps for determining a buffer size in a network-connected audio system, according to one or more aspects of the various embodiments; and
FIG. 8 is a flow chart of method steps for adjusting a buffer size in a satellite audio output device, according to one or more aspects of the various embodiments.
In the following description, numerous specific details are set forth to provide a more thorough understanding of the various embodiments. However, it will be apparent to one skilled in the art that the inventive concepts may be practiced without one or more of these specific details.
FIG. 1 illustrates a network-connected audio system 100 configured to implement one or more aspects of the various embodiments. As shown, network-connected audio system 100 includes, without limitation, a primary computing device 102 and multiple satellite audio output devices 110-1 through 110-N. Primary computing device 102 is communicatively coupled to satellite audio output devices 110 via audio device network 112.
In some embodiments, audio device network 112 is a wireless network, such as a Wi-Fi network, an ad-hoc Wi-Fi network, a Bluetooth network, and/or the like, over which primary computing device 102 and satellite audio output devices 110 communicate. In other embodiments, audio device network 112 is a wired network. Communications in audio device network 112 can use standard protocols (e.g., Bluetooth, Wi-Fi) or proprietary protocols (e.g., a proprietary protocol associated with a specific manufacturer) . For example, primary computing device 102 can communicate, via audio device network 112, with satellite audio output devices 110 using standard protocols or proprietary protocols. As another example, satellite audio output devices 110 can communicate, via audio device network 112, with other satellite audio output devices 110 using standard protocols or proprietary protocols.
Network-connected audio system 100 further includes one or more media content services 120 that are communicatively coupled to primary computing device 102 via one or more networks 122. Media content service (s) 120 includes one or more computerized services configured to provide (e.g., distribute) media content to devices (e.g., to primary computing device 102) . Media or media content, as used herein, includes, without limitation, audio content (e.g., spoken and/or musical audio content, audio content files, streaming audio content, audio track of a video, and/or the like) and/or video content. Examples of media content service (s) 120 include, without limitation, Spotify, Apple Music, Pandora, YouTube Music, Tidal, and/or other audio content streaming services. Some other examples of media content service (s) 120 include, without limitation, media content streaming services such as digital media content sellers, media servers (local and/or remote) , YouTube, Netflix, HBO,
and/or other media content streaming services. More generally, media content service (s) 120 include one or more computer systems (e.g., a server, a cloud computing system, a networked computing system, a distributed computing system, etc. ) for storing and distributing media content.
As described above, primary computing device 102 can communicatively couple with media content service (s) 120 via network (s) 122 to download and/or stream media content from media content service (s) 120. Network (s) 122 can be any technically feasible type of communications network that allows data to be exchanged between primary computing device 102 and other systems or devices, such as media content service (s) 120, a server, a cloud computing system, or other networked computing devices or systems. For example, network (s) 122 can include a wide area network (WAN) , a local area network (LAN) , a wireless network (e.g., a Wi-Fi network, a cellular data network, an ad-hoc network) , and/or the Internet, among others. Communications in network (s) 122 can use standard protocols (e.g., Bluetooth, Wi-Fi) or proprietary protocols (e.g., a proprietary protocol associated with a specific manufacturer) . For example, primary computing device 102 can communicate, via network (s) 122, with media content service (s) 120 using standard protocols or proprietary protocols. In some embodiments, network (s) 122 include audio device network 112. In other embodiments, audio device network 112 is separate from network (s) 122. Although not shown, in some embodiments, satellite audio output devices 110 can communicatively couple with media content service (s) 120 via network (s) 122.
FIG. 2 illustrates a block diagram of primary computing device 102 that can be implemented in conjunction with network-connected audio system 100, according to one or more aspects of the various embodiments. Primary computing device 102 manages the playback of audio in network-connected audio system 100. For example, primary computing device 102 receives, via network (s) 122, audio content from media content service (s) 120, and transmits, via audio device network 112, the audio content to one or more satellite audio output devices 110 for playback. As will be described in more detail below, primary computing device 102 can transmit one or more additional messages to satellite audio output devices 110 for managing audio playback in network-connected audio system 100. For example, primary computing device 102 can transmit one or more messages to satellite audio output devices 110 that instruct satellite audio output devices 110 to adjust the size of their audio buffers.
As shown, primary computing device 102 includes, without limitation, one or more processing units 202, network interface 210, input/output (I/O) devices interface 212, input device (s) 220, output device (s) 222, system storage 230, and system memory 232. Primary computing device 102 further includes an interconnect 240 that is configured to facilitate transmission of data, such as programming instructions and application data, between processing unit (s) 202, network interface 210, I/O devices interface 212, system storage 230, and system memory 232.
Processing unit (s) 202 can be any technically feasible processing device configured to process data and execute program instructions. For example, processing unit (s) 202 could include one or more central processing units (CPUs) , digital signal processors (DSPs) , graphics processing units (GPUs) , application-specific integrated circuits (ASICs) , field-programmable gate arrays (FPGAs) , microprocessors, microcontrollers, other types of processing units, and/or a combination of different processing units. Processing unit (s) 202 can further include a real-time clock (RTC) (not shown) according to which processing unit (s) 202 maintains an estimate of the current time. The estimate of the current time can be expressed in Coordinated Universal Time (UTC) , although any other standard of time measurement can also be used. Processing unit (s) 202 is configured to retrieve and execute programming instructions, such as audio playback application 242 and audio buffer application 250, stored in system memory 232. Similarly, processing unit (s) 202 is configured to store application data (e.g., software libraries) and retrieve application data from the system memory 232.
Primary computing device 102 can connect with audio device network 112 and/or network (s) 122 via network interface 210. For example, primary computing device 102 connects, via network interface 210, to audio device network 112 to communicate with satellite audio output devices 110. As another example, primary computing device 102 connects, via network interface 210, to network (s) 122 to communicate with media content service (s) 120. In some embodiments, network interface 210 is hardware, software, or a combination of hardware and software, that is configured to connect to and interface with audio device network 112 and/or network (s) 122. In some embodiments, network interface 210 facilitates communication with other devices or systems via one or more standard and/or proprietary protocols (e.g., Bluetooth, a proprietary protocol associated with a specific manufacturer, etc. ) .
I/O devices interface 212 is configured to receive input data from input device (s) 220 and transmit the input data to processing unit (s) 202 via the interconnect 240. For example, input device (s) 220 can include one or more buttons, a keyboard, a mouse, a graphical user interface, a touchscreen display, and/or other input devices. The I/O devices interface 212 is further configured to receive output data from processing unit (s) 202 via the interconnect 240 and transmit the output data to the output device (s) 222. For example, output device (s) 222 can include one or more of a display device, a touchscreen display, a graphical user interface, and/or other output devices. In some embodiments, primary computing device 102 is an audio output device. In such embodiments, output device (s) 222 include one or more loudspeakers 252 configured to playback audio content. In some embodiments, one or more satellite audio output devices 110 are coupled to primary computing device 102 via I/O devices interface 212.
System storage 230 can include non-volatile storage for applications, software modules, and data, and can include fixed or removable disk drives, flash memory devices, and CD-ROM, DVD-ROM, Blu-Ray, HD-DVD, or other magnetic, optical, solid state storage devices, and/or the like. System storage 230 can be fully or partially located in a remote storage system, referred to herein as “the cloud, ” and accessed through connections such as audio device network 112 and/or network (s) 122. System storage 230 is configured to store non-volatile data such as files (e.g., audio files, video files, subtitles, application files, software libraries, etc. ) .
System memory 232 can include a random access memory (RAM) module, a flash memory unit, or any other type of memory unit or combination thereof. One or more of processing unit (s) 202, network interface 210, and I/O devices interface 212, are configured to read data from and write data to system memory 232. System memory 232 includes various software programs and modules (e.g., an operating system, one or more applications) that can be executed by processing unit (s) 202 and application data (e.g., data loaded from system storage 230) associated with said software programs. For example, as will be described in more detail below, system memory 232 includes audio playback application 242 and audio buffer application 250. In some embodiments, audio playback application 242 and audio buffer application 250 are combined into a single application.
When executed by processing unit (s) 202, audio playback application 242 manages playback of audio in network-connected audio system 100. For example, audio playback
application 242 retrieves, via network (s) 122, audio content from media content service (s) 120 and distributes, via audio device network 112, the audio content to satellite audio output devices 110 for playback. In some embodiments, audio playback application 242 synchronizes, using one or more suitable methods, playback of audio across satellite audio output devices 110. In some embodiments, audio playback application 242 also transmits playback timing information and/or other information, such as audio channel assignments or volume controls, associated with audio playback to satellite audio output devices 110.
As will be described in more detail below, when executed by processing unit (s) 202, audio buffer application 250 manages the buffer size of audio buffers included in satellite audio output devices 110. In operation, audio buffer application 250 determines, based on one or more conditions of audio device network 112, a dynamic buffer size for the audio buffers included in satellite audio output devices 110. For example, audio buffer application 250 determines the dynamic buffer size based on latency data associated with the latencies, or transit delays, of messages transmitted by primary computing device 102 to satellite audio output devices 110 over audio device network 112. After determining the dynamic buffer size, audio buffer application 250 distributes, via audio device network 112, the dynamic buffer size to satellite audio output devices 110.
Audio buffer application 250 also manages the retrieval of latency data used for determining the dynamic buffer size of audio buffers included in satellite audio output devices 110. As will be described in more detail below, audio buffer application 250 requests, via audio device network 112, satellite audio output devices 110 to provide latency data to audio buffer application 250. In response to receiving the request from audio buffer application 250, a respective satellite audio output device 110 transmits its latency data to audio buffer application 250, and audio buffer application 250 then stores the latency data as latency data 262 in database (s) 260 in system memory 232. Latency data received from a respective satellite audio output device 110 includes one or more latency data samples associated with the latencies, or transit delays, of messages transmitted by primary computing device 102 to the respective satellite audio output device 110. Each latency data sample is associated with a particular message transmitted by primary computing device 102 and received by satellite audio output device 110. Further, each latency data sample includes a latency value that is indicative of a transit delay of the message. The transit delay of a message is a time difference between a time at which the message was transmitted by primary computing device 102 and a
time at which the message was received by a respective satellite audio output device 110. A latency data sample can further include a time stamp that indicates a time at which the latency data sample was generated by satellite audio output device 110 and/or a time at which the message was received by satellite audio output device 110.
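By way of non-limiting illustration, a latency data sample of the kind described above could be represented as a simple record such as the following sketch. The field names are hypothetical and do not correspond to any claimed implementation.

```python
from dataclasses import dataclass

@dataclass
class LatencySample:
    """One latency data sample (illustrative sketch; field names are hypothetical)."""
    latency_ms: float     # latency value: transit delay of the associated message, in ms
    generated_at: float   # time stamp when the sample was generated (e.g., UTC epoch seconds)
    device_id: str        # identifier of the satellite audio output device that produced it
```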
In some embodiments, audio buffer application 250 requests satellite audio output devices 110 to provide their respective latency data on a periodic basis (e.g., every minute, every 5 minutes, every half hour, every hour, etc. ) . In some embodiments, audio buffer application 250 requests satellite audio output devices 110 to provide their respective latency data on an ad-hoc basis. In some embodiments, audio buffer application 250 requests satellite audio output devices 110 to provide their respective latency data in response to a trigger event. For example, audio buffer application 250 can request satellite audio output devices 110 to provide their latency data in response to detecting that an amount of traffic in audio device network 112 exceeds a threshold, in response to detecting that an amount of traffic in audio device network 112 has decreased below a threshold, in response to detecting that an amount of time taken for a satellite audio output device 110 to respond to primary computing device 102 exceeds a threshold, and/or in response to some other trigger event.
In some embodiments, audio buffer application 250 determines latencies associated with messages transmitted by primary computing device 102 to satellite audio output devices 110 without requesting the satellite audio output devices 110 to provide their respective latency data. In such embodiments, audio buffer application 250 determines the latency associated with any message transmitted by primary computing device 102 to a satellite audio output device 110 over audio device network 112. For example, audio buffer application 250 can timestamp, with the time at which the message is transmitted, each message transmitted by primary computing device 102 to a satellite audio output device 110. In such an example, the satellite audio output device 110 receives the timestamped message from primary computing device 102 and transmits a response message to primary computing device 102. The response message includes a first time stamp indicative of the time at which the primary computing device 102 transmitted the message and a second time stamp indicative of the time at which the satellite audio output device 110 received the message transmitted by primary computing device 102. Accordingly, audio buffer application 250 can determine, based on the difference between the first and second time stamps included in the response message transmitted by satellite audio output device 110, the latency of a message transmitted by primary computing
device 102 to satellite audio output device 110. The audio buffer application 250 can then store the determined message latency as latency data 262 in database (s) 260.
In some embodiments, a respective satellite audio output device 110 transmits a response message, which includes a first timestamp indicative of the time a message was transmitted by primary computing device 102 and a second timestamp indicative of the time the message was received by satellite audio output device 110, to every message received from primary computing device 102. Accordingly, in such embodiments, audio buffer application 250 determines, based on a response message received from satellite audio output device 110, a latency of every message transmitted by primary computing device 102. In some embodiments, a respective satellite audio output device 110 transmits a response message, which includes a first timestamp indicative of the time a message was transmitted by primary computing device 102 and a second timestamp indicative of the time the message was received by satellite audio output device 110, to only some of the messages received from primary computing device 102. In such embodiments, audio buffer application 250 determines the respective latencies of messages transmitted by primary computing device 102 for which satellite audio output device 110 transmitted a response message. In other embodiments, satellite audio output device 110 transmits response messages, which include the first and second time stamps, in response to receiving certain types of messages (e.g., control messages or messages containing audio packets) from primary computing device 102. In such embodiments, audio buffer application 250 determines the respective latencies of the certain types of messages transmitted by primary computing device 102 for which satellite audio output device 110 transmitted a response message. In some embodiments, the satellite audio output device 110 determines the latency of the message received from primary computing device 102 and transmits a response message including the message latency to primary computing device 102. Accordingly, in such embodiments, audio buffer application 250 determines the message latency to be the latency value included in the response message. In some embodiments, audio buffer application 250 determines the latency of a message transmitted by a satellite audio output device 110 over audio device network 112. In such embodiments, a message transmitted by satellite audio output device 110 includes a timestamp indicative of the time at which the message was transmitted. For example, audio buffer application 250 can determine the latency of a message transmitted by a satellite audio output device 110 based on the difference between the time at which the message was transmitted by the satellite audio output device 110 and the time at which the audio buffer application 250
receives the message.
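A minimal sketch of how a message latency could be derived on the primary computing device side from the two time stamps carried in a response message is shown below. It assumes synchronized clocks and millisecond-resolution time stamps; the function and field names are illustrative only and do not reflect any claimed implementation.

```python
def latency_from_response(response: dict) -> float:
    """Derive a message latency (in ms) from a response message carrying the
    transmit-time and receive-time time stamps described above."""
    first_timestamp = response["transmitted_at_ms"]   # when the primary transmitted the message
    second_timestamp = response["received_at_ms"]     # when the satellite received the message
    return second_timestamp - first_timestamp         # transit delay in milliseconds
```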
As described above, in some embodiments, system memory 232 further includes one or more databases 260 that are loaded from system storage 230 into system memory 232. Database (s) 260 includes application data, user data, media content, and/or other data that is associated with one or more applications that can be executed by processing unit (s) 202. In the illustrated example, database (s) 260 further includes latency data 262. Latency data 262 includes latency data samples received from satellite audio output devices 110. Audio buffer application 250 receives one or more latency data samples from satellite audio output devices 110 and stores the received one or more latency data samples as latency data 262 in database (s) 260. This latency data 262 is then used by audio buffer application 250 to determine the dynamic buffer size for satellite audio output devices 110.
FIG. 3 illustrates a block diagram of a satellite audio output device 110 that can be implemented in conjunction with network-connected audio system 100, according to one or more aspects of the various embodiments. For example, satellite audio output device 110 is used to implement any satellite audio output device 110-1 through 110-N of FIG. 1. In some embodiments, satellite audio output device 110 is used to implement primary computing device 102 of FIG. 1. In such embodiments, satellite audio output device 110 is additionally configured to perform one or more of the actions described herein as being performed by primary computing device 102.
In operation, satellite audio output device 110 plays back audio content received, via audio device network 112, from primary computing device 102. For example, primary computing device 102 transmits messages including one or more packets of audio content to satellite audio output device 110 and satellite audio output device 110 plays back, or outputs, the audio content. Satellite audio output device 110 outputs the audio content in a synchronized or near-synchronized manner with one or more other satellite audio output devices 110 and/or primary computing device 102 coupled to network-connected audio system 100. For example, if satellite audio output device 110 is used to implement satellite audio output device 110-1 of FIG. 1, satellite audio output device 110-1 outputs audio content in a synchronized or near-synchronized manner with one or more of satellite audio output devices 110-2 through 110-N and/or primary computing device 102. In some embodiments, satellite audio output device 110 outputs one or more individual channels of audio content received from primary computing device 102.
As shown, satellite audio output device 110 includes, without limitation, one or more processing units 302, network interface 310, I/O devices interface 312, input device (s) 320, loudspeaker (s) 322, output device (s) 330, system memory 332, and audio processing circuit 340. Satellite audio output device 110 further includes an interconnect 342 that is configured to facilitate transmission of data, such as programming instructions and application data, between processing unit (s) 302, network interface 310, I/O devices interface 312, system memory 332, and audio processing circuit 340.
Processing unit (s) 302 can be any technically feasible processing device configured to process data and execute program instructions. For example, processing unit (s) 302 could include one or more CPUs, DSPs, GPUs, ASICs, FPGAs, microprocessors, microcontrollers, other types of processing units, and/or a combination of different processing units. Processing unit (s) 302 can further include an RTC (not shown) according to which processing unit (s) 302 maintains an estimate of the current time. The estimate of the current time can be expressed in UTC, although any other standard of time measurement can also be used. In some embodiments, the RTC included in processing unit (s) 302 is synchronized with the RTC included in processing unit (s) 202 of primary computing device 102. Processing unit (s) 302 is configured to retrieve and execute programming instructions, such as audio playback application 350, latency application 352, and audio buffer application 360, stored in system memory 332. Similarly, processing unit (s) 302 is configured to store application data (e.g., software libraries) and retrieve application data from the system memory 332.
Satellite audio output device 110 can connect with audio device network 112 via network interface 310. For example, satellite audio output device 110 connects, via network interface 310, to audio device network 112 to communicate with primary computing device 102 and/or other satellite audio output devices 110. In some embodiments, network interface 310 is hardware, software, or a combination of hardware and software, that is configured to connect to and interface with audio device network 112. In some embodiments, network interface 310 facilitates communication with other devices or systems via one or more standard and/or proprietary protocols (e.g., Bluetooth, a proprietary protocol associated with a specific manufacturer, etc. ) . In some embodiments, satellite audio output device 110 also connects with network (s) 122 via network interface 310.
I/O devices interface 312 is configured to receive input data from input device (s) 320 and transmit the input data to processing unit (s) 302 via the interconnect 342. For
example, input device (s) 320 can include one or more buttons, knobs, a keyboard, a mouse, a graphical user interface, a touchscreen display, and/or other input devices. The I/O devices interface 312 is further configured to receive output data from processing unit (s) 302 and/or audio processing circuit 340 via the interconnect 342 and transmit the output data to loudspeaker (s) 322 and/or other output device (s) 330. Loudspeaker (s) 322 is configured to playback, or output, audio content received from primary computing device 102. For example, audio processing circuit 340 processes audio content received from primary computing device 102 and transmits, via interconnect 342 and I/O devices interface 312, the processed audio content to loudspeaker (s) 322 for playback. Other output device (s) 330 can include one or more of light emitting diode (LED) indicators, a display device, a touchscreen display, a graphical user interface, and/or other output devices.
System memory 332 can include a random access memory (RAM) module, a flash memory unit, or any other type of memory unit or combination thereof. One or more of processing unit (s) 302, network interface 310, and I/O devices interface 312, are configured to read data from and write data to system memory 332. System memory 332 includes various software programs and modules (e.g., an operating system, one or more applications) that can be executed by processing unit (s) 302 and application data (e.g., data loaded from local system storage) associated with said software programs. For example, as will be described in more detail below, system memory 332 includes audio playback application 350, latency application 352, audio buffer application 360, and an audio buffer 362. In some embodiments, one or more of audio playback application 350, latency application 352, and audio buffer application 360 are combined into a single application.
In some embodiments, satellite audio output device 110 further includes system storage. In such embodiments, the system storage includes non-volatile storage for applications, software modules, and data. The system storage can include fixed or removable disk drives, flash memory devices, and CD-ROM, DVD-ROM, Blu-Ray, HD-DVD, or other magnetic, optical, solid state storage devices, and/or the like. In some instances, the system storage is fully or partially located in a remote storage system, referred to herein as “the cloud, ” and accessed through connections such as audio device network 112 and/or network (s) 122. The system storage is configured to store non-volatile data such as files (e.g., audio files, video files, subtitles, application files, software libraries, etc. ) .
When executed by processing unit (s) 302, audio playback application 350 manages
playback of audio by satellite audio output device 110. Managing playback of audio by satellite audio output device 110 can include, without limitation, managing receipt of audio content from primary computing device 102, storing received audio content in audio buffer 362 before playback, providing audio content stored in audio buffer 362 to audio processing circuit 340 for processing the audio content to prepare the audio content for playback, and/or managing delivery of processed audio content output by audio processing circuit 340 to loudspeaker (s) 322 via interconnect 342 and/or I/O devices interface 312. For example, audio playback application 350 stores received audio content in audio buffer 362 and/or retrieves audio content stored in audio buffer 362 for processing based on playback timing information and/or other information associated with playback of the audio content that is received from primary computing device 102. As another example, audio playback application 350 manages the processing, via audio processing circuit 340, of audio content and/or the delivery of processed audio content from audio processing circuit 340 to loudspeaker (s) 322 based on playback timing information and/or other information associated with playback of the audio content that is received from primary computing device 102.
Audio processing circuit 340 can be any technically feasible processing circuit configured to process audio content for playback by loudspeaker (s) 322. For example, audio processing circuit 340 can include one or more DSPs, one or more digital-to-analog converters (DACs) , one or more filters, and/or an audio amplifier. In operation, audio processing circuit 340 processes audio content stored in audio buffer 362 for playback by loudspeaker (s) 322. For example, based on one or more instructions from audio playback application 350, audio processing circuit 340 processes audio content stored in audio buffer 362 and delivers, via interconnect 342 and I/O devices interface 312, processed audio content to loudspeaker (s) 322 for playback. Processing audio content can include, without limitation, converting, via a DAC, audio content stored in a digital format to an analog format, applying one or more filters to the audio content, and/or amplifying the audio content.
When executed by processing unit (s) 302, latency application 352 determines the latency, or transit delay, of a message (or packet) that satellite audio output device 110 receives, via audio device network 112, from primary computing device 102. For example, when primary computing device 102 transmits a message, such as a message including one or more packets of audio content, primary computing device 102 timestamps the message with a time at which the message is transmitted by primary computing device 102. When satellite
audio output device 110 receives the message transmitted by primary computing device 102, latency application 352 timestamps the received message with a time at which the message was received by satellite audio output device 110. Then, latency application 352 determines a latency, or transit delay time, of the message transmitted by primary computing device 102 based on the transmission time of the message and the reception time of the message.
In some embodiments, latency application 352 determines the latency of the message to be the time difference between the time at which the message was transmitted by primary computing device 102 and the time at which the message was received by satellite audio output device 110. For example, in such embodiments, if a message is transmitted by primary computing device 102 at 10:01:29.100 and received by satellite audio output device 110 at 10:01:29.250, latency application 352 determines that the latency, or transit delay, of the message is 150 milliseconds (ms) . In some embodiments, latency application 352 further considers an amount of time taken to process a message received from primary computing device 102 when determining a latency of the message. In such embodiments, latency application 352 adds the processing time to the determined time difference between the time at which the message was transmitted by primary computing device 102 and the time at which the message was received by satellite audio output device 110. Referring to the above example, if it is assumed that a message transmitted by primary computing device 102 at 10:01:29.100 and received by satellite audio output device 110 at 10:01:29.250 takes an additional 5 ms to be processed by satellite audio output device 110, latency application 352 determines that the latency of the message is 155 ms. An amount of time taken to process a message can include one or more of a time taken by processing unit (s) 302 to process the message, a time taken by audio playback application 350 to process the message, a time taken by latency application 352 to process the message, a time taken for the audio content included in the message to move through audio buffer 362, a time taken by audio processing circuit 340 to process the audio content included in the message, a time difference between receipt of the message and playback, by loudspeaker (s) 322, of audio content included in the message, and/or some other amount of time associated with processing the message at the satellite audio output device 110.
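Using the numbers from the example above, the satellite-side computation can be sketched as follows. The millisecond representation of the clock times is an assumption made for brevity.

```python
# Worked example from the text: message transmitted at 10:01:29.100 and
# received at 10:01:29.250, with an assumed 5 ms of on-device processing time.
transmit_ms = 29_100                                  # 10:01:29.100, expressed in ms
receive_ms = 29_250                                   # 10:01:29.250, expressed in ms

transit_delay_ms = receive_ms - transmit_ms           # 150 ms
processing_time_ms = 5                                # assumed per the example
latency_ms = transit_delay_ms + processing_time_ms    # 155 ms
```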
In some embodiments, latency application 352 determines a latency of every message received from primary computing device 102. In some embodiments, latency application 352 determines the latencies of only some of the messages received from primary
computing device 102. In some embodiments, latency application 352 determines the latencies of messages received from primary computing device 102 on a periodic basis. For example, latency application 352 determines the latency of every tenth message received from primary computing device 102, every hundredth message received from primary computing device 102, every thousandth message received from primary computing device 102, and/or determines latencies of messages received from primary computing device 102 at some other interval. In some embodiments, latency application 352 determines the latencies of the messages received from primary computing device 102 on an ad-hoc basis.
After determining the latency of a message received from primary computing device 102, latency application 352 generates a latency data sample including the determined latency value and stores the latency data sample in system memory 332 and/or some other system storage of satellite audio output device 110. In some embodiments, a latency data sample further includes a time stamp indicative of a time at which the latency data sample was generated and/or indicative of a time at which latency application 352 received the message associated with the latency data sample.
At a given time, one or more latency data samples generated by latency application 352 are stored in system memory 332 and/or some other system storage of satellite audio output device 110. In some embodiments, in response to receiving a request for latency data from primary computing device 102, latency application 352 transmits the one or more latency data samples to primary computing device 102. In some embodiments, latency application 352 transmits the one or more latency data samples to primary computing device 102 without receiving a request for latency data from primary computing device 102. In such embodiments, latency application 352 transmits one or more latency data samples to primary computing device 102 on a periodic basis (e.g., every minute, every 5 minutes, every half hour, every hour, etc. ) and/or an ad-hoc basis. In some embodiments, latency application 352 transmits one or more latency data samples to primary computing device 102 in response to a trigger event. For example, latency application 352 transmits one or more latency data samples to primary computing device 102 in response to detecting that an amount of traffic in audio device network 112 exceeds a threshold, in response to detecting that an amount of traffic in audio device network 112 has decreased below a threshold, in response to determining that a latency of a message received from primary computing device 102 exceeds a threshold, and/or in response to some other trigger event. In some embodiments, latency
application 352 deletes the one or more latency data samples from system memory 332 and/or some other system storage of satellite audio output device 110 after transmitting the one or more latency data samples to primary computing device 102.
As described above with respect to audio buffer application 250, in some embodiments, satellite audio output device 110 transmits a response message to some or all messages received from primary computing device 102, even if the received messages do not explicitly request latency data. In such embodiments, in response to receiving a message from primary computing device 102 that includes a time stamp indicative of the time at which the message was transmitted by primary computing device 102, latency application 352 transmits a response message to primary computing device 102. In some examples, the response message includes a first time stamp indicative of the time at which the primary computing device 102 transmitted the message and a second time stamp indicative of the time at which latency application 352 received the message from primary computing device 102. In such examples, audio buffer application 250 determines the latency of the message based on the first and second time stamps included in the response message. In other examples, latency application 352 determines the latency of the message transmitted by primary computing device 102 based on the time stamp indicative of the time at which primary computing device 102 transmitted the message. In such examples, the response message transmitted by latency application 352 includes the determined message latency.
When executed by processing unit (s) 302, audio buffer application 360 manages the size of audio buffer 362. Audio buffer application 360 configures the size of audio buffer 362 based on the dynamic buffer size determined by audio buffer application 250 of primary computing device 102. For example, audio buffer application 360 receives, via audio device network 112, a message including a dynamic buffer size from audio buffer application 250 of primary computing device 102 and then configures the size of audio buffer 362 to be the dynamic buffer size.
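One possible sketch of how audio buffer application 360 could apply a received dynamic buffer size is shown below. The message field name and the policy of retaining the most recently received packets are assumptions for illustration, not the claimed implementation.

```python
from collections import deque

def apply_dynamic_buffer_size(audio_buffer: deque, message: dict) -> deque:
    """Reconfigure audio buffer 362 to hold the dynamic buffer size received
    from primary computing device 102 (illustrative sketch)."""
    new_size = message["dynamic_buffer_size"]      # hypothetical field name
    # Rebuild the buffer with the new capacity, keeping the newest packets.
    return deque(audio_buffer, maxlen=new_size)
```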
Audio buffer 362 stores packets, or samples, of audio content received from primary computing device 102 before the packets of audio content are processed by audio processing circuit 340 and/or played back by loudspeaker (s) 322. The size of audio buffer 362 can be increased to store more packets of audio content or decreased to store fewer packets of audio content. As described above, it is difficult to anticipate an appropriate buffer size as network delays and/or jitter in audio device network 112 can vary over time. For example, if
the size of audio buffer 362 is too small, satellite audio output device 110 can run out of audio packets to play back when the delays and/or jitter in audio device network 112 are relatively high. In contrast, if the size of audio buffer 362 is too large, the time taken by satellite audio output device 110 to process and play back audio packets stored in audio buffer 362 increases. As a consequence, this increase in processing time can cause the playback of satellite audio output device 110 to fall out of sync with the audio playback of other satellite audio output devices 110 and/or primary computing device 102 coupled to network-connected audio system 100.
Accordingly, to prevent the above-described drawbacks associated with making the size of audio buffer 362 too small or too large, audio buffer application 250 of primary computing device 102 manages the buffer size of each respective audio buffer 362 included in a satellite audio output device 110 coupled to network-connected audio system 100. For example, with respect to the illustrated example of FIG. 1, audio buffer application 250 of primary computing device 102 manages the size of each respective audio buffer 362 included in satellite audio output devices 110-1 through 110-N.
As described above, audio buffer application 250 manages the size of respective audio buffers 362 included in satellite audio output devices 110 based on conditions, such as jitter and/or delays, in audio device network 112. In particular, audio buffer application 250 determines a dynamic buffer size for the audio buffers 362 included in satellite audio output devices 110 based on latencies associated with the transmission of messages from primary computing device 102 to satellite audio output devices 110 and, after determining the dynamic buffer size, distributes the dynamic buffer size to satellite audio output devices 110. For example, audio buffer application 250 determines the dynamic buffer size based on latency data 262, which includes a plurality of latency data samples received from satellite audio output devices 110.
As described above, each latency data sample is associated with a particular message that was transmitted by primary computing device 102 and received by a respective satellite audio output device 110. A latency data sample includes a latency value indicative of the latency, or transit delay, associated with transmission of a message from primary computing device 102 to satellite audio output device 110 over audio device network 112. As described above, in some embodiments, the latency value associated with transmission of the message from primary computing device 102 to satellite audio output device 110 is equal to the difference between the time at which primary computing device 102 transmitted the message and the time at which satellite audio output device 110 received the message. In other embodiments, as described above, the latency value associated with transmission of the message from primary computing device 102 to satellite audio output device 110 additionally includes time associated with processing the message at the satellite audio output device 110. Furthermore, each latency data sample can include an indication as to which satellite audio output device 110 (e.g., satellite audio output device 110-1, satellite audio output device 110-2, etc. ) generated the latency data sample, a time stamp indicative of a time at which the latency data sample was generated, and/or a time stamp indicative of the time at which the message associated with the latency data sample was received by satellite audio output device 110.
In some embodiments, audio buffer application 250 determines the dynamic buffer size for audio buffers 362 included in satellite audio output devices 110 based on an aggregate latency value that is determined from the plurality of latency data samples. In some embodiments, audio buffer application 250 determines the dynamic audio buffer size as a function of one or more of a sampling rate at which audio packets stored in audio buffer 362 are sampled, or output, by satellite audio output devices 110, a number of audio channels in the audio content being played back by the satellite audio output devices 110, a bitdepth of the audio packets stored in audio buffer 362, and/or an aggregate latency value that is determined from the plurality of latency data samples.
For example, audio buffer application 250 can determine the dynamic buffer size of audio buffers 362 included in satellite audio output devices 110 using Equation 1 as follows:
Buffer Size = Rate*Channels*Bitdepth*Latencyagg Equation 1
where Rate is the sampling rate (e.g., 44.1 kilohertz (kHz) , 48 kHz, etc. ) at which audio packets stored in an audio buffer 362 are sampled and/or output by satellite audio output device 110, Channels is a number of audio channels included in the audio content being played back by satellite audio output devices 110 (e.g., 2 channels for stereo audio, 5.1 channels for surround sound audio, etc. ) , Bitdepth is a number of bits (e.g., 10 bits, 16 bits, 24 bits, 32 bits, etc. ) captured in each sample of the audio packets, and Latencyagg is an aggregate latency (e.g., 100 ms, 150 ms, etc. ) determined from the plurality of latency data samples.
Audio buffer application 250 can determine the number of audio channels in the audio content being played back by satellite audio output devices 110 from audio playback application 242, which manages the playback of audio in network-connected audio system 100. In addition, audio buffer application 250 can determine the sampling rate and/or bitdepth of audio content being played back by satellite audio output devices 110 based on one or more of information provided by audio playback application 242, device specifications of satellite audio output devices 110, and/or parameters of the audio content being played back by satellite audio output devices 110.
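As a hedged sketch, Equation 1 can be evaluated as follows, assuming the rate is expressed in samples per second, the aggregate latency in milliseconds, and the bit depth in bits per sample, so that the result is a size in bits. The conversion to bytes is an assumption and is not taken from the text.

```python
def dynamic_buffer_size_bits(rate_hz: int, channels: int, bit_depth: int,
                             aggregate_latency_ms: float) -> int:
    """Equation 1: Buffer Size = Rate * Channels * Bitdepth * Latency_agg."""
    latency_s = aggregate_latency_ms / 1000.0        # convert ms to seconds
    return int(rate_hz * channels * bit_depth * latency_s)

# Example: 48 kHz stereo audio at 16 bits per sample with a 100 ms aggregate latency.
size_bits = dynamic_buffer_size_bits(48_000, 2, 16, 100.0)   # 153,600 bits
size_bytes = size_bits // 8                                  # 19,200 bytes
```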
Audio buffer application 250 determines the aggregate latency value for calculating the dynamic buffer size based on latency values included in the plurality of latency data samples. In some embodiments, audio buffer application 250 determines the aggregate latency value to be the mean of the latency values included in the plurality of latency data samples. Thus, in such embodiments, audio buffer application 250 determines the aggregate latency value by dividing the sum of latency values included in the plurality of latency data samples by the number of latency data samples. However, for instances in which audio buffer application 250 simply determines that the aggregate latency value is the mean of latency values included in the plurality of latency data samples, a dynamic buffer size that is determined based on this aggregate latency value can result in a buffer size that is too small and/or too large.
For example, for instances in which a majority of the latency values of the latency data samples are relatively small (e.g., less than 10 ms) , an aggregate latency value that is determined to be the mean of the latency values could significantly differ from frequently occurring latency values in the latency data samples that are relatively large (e.g., greater than 100 ms) . Thus, with respect to this example aggregate latency value, audio buffer application 250 could determine, based on the aggregate latency value, a dynamic buffer size that is too small to account for the large frequently occurring latencies, or transit delays, of messages transmitted from primary computing device 102 to satellite audio output devices 110 over audio device network 112. Accordingly, a significant amount of audio data (e.g., greater than 1%) could be lost during playback of media content by network-connected audio system 100. Losing 1% of audio data during playback of a 90-minute movie, for example, would mean that 54 seconds of audio data is lost during playback of the movie. Such significant losses of audio data during the playback of media content are unacceptable.
As another example, for instances in which the latency data samples include outliers having relatively very large latency values (e.g., greater than 500 ms) , an aggregate latency value that is determined to be the mean of the latency values could differ significantly from relatively small latency values (e.g., less than 10 ms) that are representative of the majority of the latency values in the latency data samples. Thus, with respect to this example aggregate latency value, audio buffer application 250 could determine, based on the aggregate latency value, a dynamic buffer size that is larger than necessary. Accordingly, when audio buffers 362 included in satellite audio output devices 110 are configured with a size that is too large, audio content played back by satellite audio output devices 110 could become unsynchronized as a result of the added processing time attributed to the large buffer size.
In some embodiments, audio buffer application 250 determines the aggregate latency value based on the mean of the latency values included in the plurality of latency data samples and a factor of the standard deviation from the mean of the latency values included in the plurality of latency data samples. In such embodiments, audio buffer application 250 determines the aggregate latency value to be the sum of the mean of the latency values and a multiple of the standard deviation from the mean. For example, if it is assumed that the mean of the latency values is 10 ms and the standard deviation from the mean is 20 ms, audio buffer application 250 determines the aggregate latency value to be the sum of 10 ms and a multiple of 20 ms. In this example, if the audio buffer application 250 applies the 2-sigma rule when determining the aggregate latency value, audio buffer application 250 determines the aggregate latency value to be 50 ms. Continuing with this example, if the audio buffer application 250 applies the 3-sigma rule when determining the aggregate latency value, audio buffer application 250 determines the aggregate latency value to be 70 ms.
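The mean-plus-multiple-of-standard-deviation aggregation described above can be sketched as follows; the function name is illustrative only.

```python
import statistics

def aggregate_latency_sigma(latency_values_ms: list[float], sigmas: float) -> float:
    """Aggregate latency = mean + sigmas * standard deviation of the latency values."""
    mean = statistics.mean(latency_values_ms)
    std = statistics.stdev(latency_values_ms)   # sample standard deviation
    return mean + sigmas * std

# With a mean of 10 ms and a standard deviation of 20 ms, as assumed in the example
# above, sigmas=2 yields 10 + 2*20 = 50 ms and sigmas=3 yields 10 + 3*20 = 70 ms.
```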
FIG. 4 illustrates an example normal distribution 400 of latency values included in latency data samples. These latency data samples could be, for example, generated by satellite audio output devices 110 and used by audio buffer application 250 to determine a dynamic buffer size for audio buffers 362 included in satellite audio output devices 110. Persons skilled in the art will understand that the latency data samples included in normal distribution 400 are just one non-limiting example of sample latency data that could be generated by satellite audio output devices 110 and/or used by audio buffer application 250 to determine a dynamic buffer size for the audio buffers 362 included in satellite audio output devices 110.
As shown in the normal distribution 400, the mean latency value of the latency data
samples is approximately 6.01 ms and one standard deviation from the mean latency value is approximately 20.26 ms. Furthermore, in the illustrated example of FIG. 4, the 3-sigma rule is applied to the normal distribution 400 to determine that a latency value of 66.78 ms is 3 standard deviations from the mean latency value of 6.01 ms. That is, applying the 3-sigma rule to the normal distribution 400 includes determining the latency value that deviates from the mean latency value by 3 standard deviations. In this example, it will be assumed that audio buffer application 250 determines the aggregate latency value to be 66.78 ms.
Continuing with this example, when audio buffer application 250 determines a dynamic buffer size, for example using Equation 1, based on the aggregate latency value that is 66.78 ms and distributes the determined dynamic buffer size to satellite audio output devices 110, the satellite audio output devices 110 lose approximately 1.8% of audio data during playback of media content. Losing 1.8% of audio data during playback of a 90-minute movie is equivalent to losing 102 seconds of audio throughout playback of the movie. Losing 102 seconds of audio data, as determined in this example, is unacceptable for a high-quality listening experience. In this example, the significant loss of audio data during playback of the media content can be attributed to jitter and/or delays in audio device network 112 that resulted in, at times, message latencies that were significantly larger (e.g., greater than 100 ms) than the aggregate latency value used to determine the dynamic buffer size.
Accordingly, to better account for large latency values included in latency data samples when determining a dynamic buffer size, in some embodiments, audio buffer application 250 determines the aggregate latency value by using a weighted average function that more heavily weights the larger latency values included in the latency data samples. In such embodiments, audio buffer application 250 sorts the latency values included in latency data samples in descending order and determines, using a weighted average function, the aggregate latency value based on the N largest latency values included in the descending-sorted list of latency values.
In some embodiments, audio buffer application 250 determines the number N of largest latency values based on one or more of the total number of latency data samples generated by satellite audio output devices 110, the number of satellite audio output devices 110 coupled to network-connected audio system 100, a detected amount of jitter and/or congestion in the audio device network 112, and/or some other parameter of network-connected audio system 100. In some embodiments, audio buffer application 250 determines
N as a function of the total number D of devices (e.g., combined number of primary computing device 102 and satellite audio output devices 110) coupled to network-connected audio system 100. For example, audio buffer application 250 can determine N by multiplying the total number D of devices by a scalar value (e.g., 5, 10, 20, etc. ) . In other embodiments, N can be any value, such as but not limited to 5, 10, 50, 200, 500, 1000, etc., determined by audio buffer application 250.
For example, audio buffer application 250 can determine the aggregate latency value, which is used for determining the dynamic buffer size, using Equation 2 as follows:

Latencyagg = W1*Latency1 + W2*Latency2 + … + WN*LatencyN Equation 2

where N is the number of largest latency values used to calculate the aggregate latency value and Wi is the weight assigned to a respective latency value Latencyi included in the N largest latency values used to calculate the aggregate latency value. As shown, Equation 2 is a weighted average formula in which a respective weight value Wi is multiplied by, or assigned to, a corresponding latency value Latencyi. Furthermore, the sum of the respective weight values Wi is equal to 1.
In some embodiments, a respective weight value Wi is proportional to the size of the corresponding latency value Latencyi by which the weight value Wi is multiplied. For example, audio buffer application 250 assigns the largest weight value Wi to the largest latency value Latencyi and assigns the smallest weight value Wi to the smallest latency value Latencyi. In some embodiments, audio buffer application 250 randomly generates and/or randomly assigns weight values Wi to the latency values Latencyi included in the N largest latency values used to calculate the aggregate latency value. In some embodiments, audio buffer application 250 assigns a different weight value Wi to each latency value Latencyi included in the N largest latency values used to calculate the aggregate latency value. In some embodiments, audio buffer application 250 assigns the same weight value Wi to more than one latency value Latencyi.
In some embodiments, audio buffer application 250 assigns a first weight value W1 to the largest latency value Latency1 included in the N largest latency values used to calculate
the aggregate latency value and assigns a second weight value Wn to each of the remaining latency values Latencyi included in the N largest latency values used to calculate the aggregate latency value. For example, audio buffer application 250 can determine the weight value W1 assigned to the largest latency value Latencymax included in the N latency values using Equation 3 as follows:
W1 = (Latencymean + a*Latencystd) / Latencymax Equation 3
where Latencymean is the mean value of the N largest latency values used to calculate the aggregate latency value, Latencystd is the standard deviation from the mean value of the N largest latency values used to calculate the aggregate latency value, and Latencymax is the largest latency value included in the N largest latency values used to calculate the aggregate latency value. In one non-limiting example, W1 was determined to have a value of 0.636. However, persons skilled in the art will understand that 0.636 is just one non-limiting example of a value for the first weight W1. In other non-limiting examples, the value of the first weight W1 is less than 0.5, greater than 0.5, within a range from 0.6 to 0.7, less than 0.75, or some other value. Equation 3 also includes a variable a, which is a number of standard deviations from the mean of the N largest latency values. In some embodiments, a is determined based on the size of the standard deviation Latencystd and/or based on the number N of latency values being used to determine the aggregate latency value. In some embodiments, a is chosen in accordance with one or more settings of audio buffer application 250. The value of a can be any number that is desired, such as but not limited to, 0.5, 1, 1.5, 2, 3, 4, or some other number.
Referring back to the embodiments in which audio buffer application 250 assigns a first weight value W1 to the largest latency value Latencymax included in the N largest latency values used to calculate the aggregate latency value and assigns a second weight value Wn to each of the remaining latency values included in the N largest latency values used to calculate the aggregate latency value, audio buffer application 250 can determine the second weight value Wn assigned to the remaining latency values included in the N latency values using Equation 4 as follows:

Wn = (1 - W1) / (N - 1) Equation 4
where W1 is the first weight value assigned to the largest latency value Latencymax
included in the N largest latency values used to calculate the aggregate latency value.
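Putting Equations 2 through 4 together, the weighted aggregation can be sketched end to end as follows: sort the latency values in descending order, keep the N largest, compute W1 from Equation 3, assign the remaining weights per Equation 4 (reconstructed above from the requirement that the weights sum to 1), and form the weighted average of Equation 2. All names and the choice of a are illustrative only.

```python
import statistics

def weighted_aggregate_latency(latency_values_ms: list[float], n: int, a: float) -> float:
    """Aggregate latency per Equations 2-4, weighting the largest latency most heavily.
    Assumes n >= 2 and at least n latency values are available."""
    largest = sorted(latency_values_ms, reverse=True)[:n]   # N largest values, descending
    latency_mean = statistics.mean(largest)
    latency_std = statistics.stdev(largest)
    latency_max = largest[0]

    w1 = (latency_mean + a * latency_std) / latency_max     # Equation 3
    wn = (1.0 - w1) / (n - 1)                               # Equation 4
    return w1 * latency_max + wn * sum(largest[1:])         # Equation 2
```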
FIG. 5 illustrates an example normal distribution 500 of the N largest descending-sorted latency values included in latency data samples. The latency data samples illustrated in FIG. 5 could be, for example, generated by satellite audio output devices 110 and used by audio buffer application 250 to determine a dynamic buffer size of the audio buffers 362 included in satellite audio output devices 110. Persons skilled in the art will understand that latency data samples included in normal distribution 500 are just one non-limiting example of sample latency data that could be generated by satellite audio output devices 110 and/or used by audio buffer application 250 to determine a dynamic buffer size for the audio buffers 362 included in satellite audio output devices 110.
As shown in the normal distribution 500, the mean latency value of the N largest latency values included in the latency data samples is approximately 134.32 ms and one standard deviation from the mean latency of the N largest latency values is approximately 41.72 ms. Furthermore, in the illustrated example of FIG. 5, the 2-sigma rule is applied to the normal distribution 500 to determine that a latency value of 217.76 ms is 2 standard deviations from the mean latency value of 134.32 ms. That is, applying the 2-sigma rule to the normal distribution 500 includes determining the latency value that deviates from the mean latency of the N largest latency values by 2 standard deviations.
In this example, it will be assumed that audio buffer application 250 determines the aggregate latency value, using Equations 2-4, based on the N largest latency values of the latency data samples shown in normal distribution 500. Moreover, in this example, it will be assumed that audio buffer application 250 uses a value of 2 for the variable a included in Equation 3. After determining the aggregate latency value in this example, audio buffer application 250 determines a dynamic buffer size, for example using Equation 1, based on the aggregate latency value and distributes the dynamic buffer size to satellite audio output devices 110. When satellite audio output devices 110 play back media content with audio buffers 362 configured with the dynamic buffer size determined in this example, the satellite audio output devices 110 only lose approximately 0.04% of audio data during playback of media content. Losing 0.04% of audio data during playback of a 90-minute movie is equivalent to losing only 2 seconds of audio data throughout playback of the movie. When compared to the example of FIG. 4, in which 102 seconds of audio data was lost during playback of a 90-minute movie, losing only 2 seconds of audio data is a significant improvement and is
acceptable for a high-quality listening experience. Accordingly, when a dynamic buffer size is determined based on an aggregate latency value that was determined by more heavily weighting larger latency values included in latency data samples generated by satellite audio output devices 110, satellite audio output devices 110 operating with audio buffers 362 of the dynamic buffer size can synchronously playback audio content without losing significant amounts of audio data.
During operation of network-connected audio system 100, conditions in audio device network 112 can change with time. Thus, a first amount of jitter and/or delays in audio device network 112 at a first point in time can be different than a second amount of jitter and/or delays in audio device network 112 at a second point in time. Therefore, when audio buffer application 250 determines the dynamic buffer size based on latency data samples generated by satellite audio output devices 110, older latency data samples generated by satellite audio output devices 110 (e.g., latency data samples generated 5 or more minutes ago) might no longer be relevant to current conditions of audio device network 112. Accordingly, when determining an aggregate latency value and/or the dynamic buffer size for audio buffers 362 included in satellite audio output devices 110, audio buffer application 250 does not use latency data samples that are older than a time threshold (e.g., 30 seconds, 1 minute, 5 minutes, 10 minutes, etc. ) . In some embodiments, audio buffer application 250 also discards latency data samples that are older than the time threshold. For example, audio buffer application 250 deletes latency data samples that are older than the time threshold from latency data 262 stored in database (s) 260.
As an example, if there are currently small amounts of jitter and/or delays in audio device network 112, latency data samples that were generated by satellite audio output devices 110 at a past time (e.g., 5 minutes or more in the past) at which there was a large amount of jitter in audio device network 112 are no longer reflective of the current condition of audio device network 112, and thus, should not be used by audio buffer application 250 to determine an aggregate latency value and/or the dynamic buffer size. Accordingly, in this example, when determining an aggregate latency value and/or the dynamic buffer size for audio buffers 362 included in satellite audio output devices 110, audio buffer application 250 discards and does not use latency data samples that are older than the time threshold (e.g., 5 minutes) to avoid determining a dynamic buffer size that is suited to the previous (e.g., 5 minutes prior) congested condition of audio device network 112.
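A minimal sketch of this staleness filter, assuming each latency data sample is represented as a dictionary with a "timestamp" field in seconds (a representation chosen here for illustration, not specified in this document):

```python
import time

def discard_stale_samples(samples, threshold_s=5 * 60, now=None):
    # Keep only samples whose age is within the time threshold; the
    # 5-minute default mirrors the example above.
    now = time.time() if now is None else now
    return [s for s in samples if now - s["timestamp"] <= threshold_s]
```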
Furthermore, as the condition of audio device network 112 can change over time, audio buffer application 250 frequently updates, or determines a new value of, the dynamic buffer size for audio buffers 362 included in satellite audio output devices 110. In some embodiments, audio buffer application 250 updates the dynamic buffer size on a periodic basis (e.g., every 30 seconds, every minute, every 5 minutes, every half hour, every hour, etc.). In some embodiments, audio buffer application 250 updates the dynamic buffer size on an ad-hoc basis. In some embodiments, audio buffer application 250 updates the dynamic buffer size in response to a trigger event. For example, audio buffer application 250 can update the dynamic buffer size in response to detecting that an amount of traffic in audio device network 112 exceeds a threshold, in response to detecting that an amount of traffic in audio device network 112 has decreased below a threshold, in response to detecting that an amount of time taken for a satellite audio output device 110 to respond to primary computing device 102 exceeds a threshold, and/or in response to some other trigger event. When audio buffer application 250 updates, or determines a new value of, the dynamic buffer size, audio buffer application 250 uses the most recent latency data samples generated by and received from satellite audio output devices 110.
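As a sketch of one possible update policy combining the periodic and trigger-based embodiments described above (all parameter names and thresholds here are illustrative placeholders):

```python
def should_update(now_s, last_update_s, period_s,
                  traffic, traffic_high, traffic_low,
                  response_s, response_threshold_s):
    # Update periodically, or when traffic exceeds or falls below a
    # threshold, or when a satellite's response time exceeds a threshold.
    return (now_s - last_update_s >= period_s
            or traffic > traffic_high
            or traffic < traffic_low
            or response_s > response_threshold_s)
```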
FIG. 6 is a block diagram of audio buffer application 250, according to various embodiments of the present invention. As shown, audio buffer application 250 maintains a list of descending-sorted latency data samples 602. Each entry, or latency data sample, in the list of descending-sorted latency data samples 602 includes a respective entry number, a latency value, and a time stamp. In some embodiments, the time stamp is indicative of the time at which the respective latency data sample was generated by a satellite audio output device 110. In other embodiments, the time stamp is indicative of the time at which the respective latency data sample was received and/or processed by audio buffer application 250.
As described above, audio buffer application 250 receives latency data samples from satellite audio output devices 110 and stores the received latency data samples in database(s) 260 as latency data 262. In some embodiments, audio buffer application 250 timestamps a respective latency data sample upon receiving the latency data sample. When the latency data samples are stored in database(s) 260, the latency data samples might be stored in a random order that is not sorted based on the size of latency values included in the latency data samples. Accordingly, when audio buffer application 250 determines an aggregate latency value and/or a dynamic buffer size, audio buffer application 250 retrieves the unsorted latency data samples from database(s) 260 and sorts the latency data samples in
descending order according to the size of the latency values included in the latency data samples. As shown in the example of FIG. 6, the first entry in the list of descending-sorted latency data samples 602 has a latency value of 375 ms, which is the largest latency value included in the latency data samples. Further, each entry in the list of descending-sorted latency data samples 602 after the first entry has a smaller latency value.
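In Python, the descending sort could look like the following sketch (the "latency_ms" field name is an assumption for illustration):

```python
def sort_samples_descending(samples):
    # Largest latency value first, as in the list 602 of FIG. 6.
    return sorted(samples, key=lambda s: s["latency_ms"], reverse=True)
```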
In some embodiments, audio buffer application 250 continuously updates the list of descending-sorted latency data samples 602. In some embodiments, audio buffer application 250 updates the list of descending-sorted latency data samples 602 on a periodic basis (e.g., every 30 seconds, every minute, every 5 minutes, every half hour, every hour, etc.). In some embodiments, audio buffer application 250 updates the list of descending-sorted latency data samples 602 on an ad-hoc basis. In some embodiments, audio buffer application 250 updates the list of descending-sorted latency data samples 602 when audio buffer application 250 updates the dynamic buffer size. In some embodiments, audio buffer application 250 updates the list of descending-sorted latency data samples 602 when audio buffer application 250 receives one or more new latency data samples from satellite audio output devices 110. In some embodiments, audio buffer application 250 updates the list of descending-sorted latency data samples 602 in response to a trigger event. For example, audio buffer application 250 can update the list of descending-sorted latency data samples 602 in response to detecting that an amount of traffic in audio device network 112 exceeds a threshold, in response to detecting that an amount of traffic in audio device network 112 has decreased below a threshold, in response to detecting that an amount of time taken for a satellite audio output device 110 to respond to primary computing device 102 exceeds a threshold, and/or in response to some other trigger event.
Updating the list of descending-sorted latency data samples 602 includes adding new latency data samples (e.g., new latency data samples received from satellite audio output devices 110) to and removing old latency data samples (e.g., latency data samples older than a time threshold) from the list of descending-sorted latency data samples 602. For example, audio buffer application 250 adds a new latency data sample to the list of descending-sorted latency data samples 602 as the latency data sample is received from a satellite audio output device 110 and/or after storing the latency data sample in database(s) 260.
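A combined sketch of one update pass over the list, again assuming dictionary samples with illustrative "latency_ms" and "timestamp" fields:

```python
def update_sample_list(sorted_samples, new_samples, now_s, threshold_s):
    # Drop samples older than the time threshold, add newly received
    # samples, and restore descending order by latency value.
    kept = [s for s in sorted_samples if now_s - s["timestamp"] <= threshold_s]
    kept.extend(new_samples)
    kept.sort(key=lambda s: s["latency_ms"], reverse=True)
    return kept
```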
As shown in FIG. 6, audio buffer application 250 keeps track of the current time 610, which is determined in accordance with the RTC of processing unit(s) 202, and maintains a time stamp threshold 612. In the illustrated example, the time stamp threshold 612 is 5 minutes. However, persons skilled in the art will understand that 5 minutes is just one non-limiting example of a value of the time stamp threshold 612. When a difference between the time stamp of a latency data sample and the current time 610 exceeds the time stamp threshold 612, audio buffer application 250 removes the latency data sample from the list of descending-sorted latency data samples 602 and/or deletes the latency data sample from database(s) 260.
Updating the list of descending-sorted latency data samples 602 further includes determining and updating a mean 620 of the N largest latency values included in the list of descending-sorted latency data samples 602 and a standard deviation 622 from the mean of the N largest latency values included in the list of descending-sorted latency data samples 602. The N largest latency values in the list of descending-sorted latency data samples 602 are the respective latency values included in the first through Nth entries in the list of descending-sorted latency data samples 602. Audio buffer application 250 determines the mean 620 of the N largest latency values included in the list of descending-sorted latency data samples 602 and then determines the standard deviation 622 from the mean 620 of the N largest latency values included in the list of descending-sorted latency data samples 602. As shown, in the illustrated example, the mean 620 of the N largest latency values is 134.32 ms and the standard deviation 622 from the mean 620 of the N largest latency values is 41.72 ms. These example values are consistent with the example values described above with respect to FIG. 5. However, persons skilled in the art will understand that 134.32 ms is just one non-limiting example of the mean 620 of the N largest latency values included in the list of descending-sorted latency data samples 602. Similarly, persons skilled in the art will understand that 41.72 ms is just one non-limiting example of the standard deviation 622 from the mean 620 of the N largest latency values included in the list of descending-sorted latency data samples 602.
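These two statistics can be computed directly from the first N entries of the sorted list, as in the following sketch (population standard deviation is assumed):

```python
from statistics import mean, pstdev

def n_largest_stats(sorted_samples, n):
    # The first N entries of the descending-sorted list hold the N
    # largest latency values (mean 620 and standard deviation 622).
    values = [s["latency_ms"] for s in sorted_samples[:n]]
    return mean(values), pstdev(values)
```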
As shown in FIG. 6, audio buffer application 250 further determines and maintains the aggregate latency value 630 used for determining the dynamic buffer size. For example, audio buffer application 250 determines the aggregate latency value 630 using Equations 2-4 described herein. In the illustrated example of FIG. 6, the aggregate latency value is approximately 200 ms. In some embodiments, audio buffer application 250 determines a new, or updated, aggregate latency value 630 each time the list of descending-sorted latency data samples 602 is updated. The dashed box in FIG. 6 indicates which of the latency data samples included in the list of descending-sorted latency data samples 602 are used by audio buffer application 250 to determine the aggregate latency value 630 and/or a new dynamic buffer size. In the illustrated example, the first N entries in the list of descending-sorted latency data samples 602 are contained in the dashed box.
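Equations 2-4 are presented earlier in this document and are not reproduced here; purely as a stand-in that satisfies the stated property of weighting larger latency values more heavily, a weighted average could look like this sketch, where the exponent a loosely plays the role of the variable a mentioned above:

```python
def aggregate_latency(n_largest_ms, a=2):
    # Hypothetical weighting: each latency value is weighted by itself
    # raised to the power `a`, so larger values dominate the average.
    weights = [v ** a for v in n_largest_ms]
    return sum(w * v for w, v in zip(weights, n_largest_ms)) / sum(weights)
```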
FIG. 7 is a flow chart of method steps for determining a buffer size in a network-connected audio system, according to one or more aspects of the various embodiments. Although the method steps are described with respect to the systems and examples of FIGs. 1-6, persons skilled in the art will understand that any system configured to perform the method steps, in any order, falls within the scope of the various embodiments.
As shown, a method 700 begins at step 702, where an audio buffer application 250 executing on processing unit(s) 202 of a primary computing device 102 requests satellite audio output devices 110 to provide the audio buffer application 250 with latency data. The latency data includes, for example, latency data samples generated by the satellite audio output devices 110. As described above, each latency data sample includes a latency value that is associated with a message transmitted by primary computing device 102 and received by a satellite audio output device 110. The latency value is, for example, the time difference between the time at which the message was transmitted by primary computing device 102 and the time at which the message was received by the satellite audio output device 110. In some examples, the latency value additionally includes the time taken by satellite audio output device 110 to process the message. In some examples, each latency data sample also includes a time stamp indicative of the time at which the respective latency data sample was generated and/or a time stamp associated with the time at which the message was received by satellite audio output device 110.
At step 704, audio buffer application 250 receives one or more responses including latency data from satellite audio output devices 110. In some examples, audio buffer application 250 timestamps the responses with respective times at which the responses were received by audio buffer application 250. At step 706, audio buffer application 250 sorts, in descending order, the latency values included in the latency data received at step 704. For example, as described above, each latency data sample included in the latency data received at step 704 includes a respective latency value. Thus, as described above with respect to FIG. 6, audio buffer application 250 sorts the latency data samples in descending order based on the respective sizes of the latency values included in the latency data samples.
At step 708, audio buffer application 250 selects the N largest latency values included in the latency values that were sorted in descending order at step 706. In some examples, selecting the N largest latency values includes determining a mean of the N largest latency values and selecting the latency values included in the N largest latency values that are within a threshold number of standard deviations from the mean of the N largest latency values.
At step 710, audio buffer application 250 determines an aggregate latency value based on the N largest latency values selected at step 708. In some examples, determining the aggregate latency value includes applying a weighted average formula to the N largest latency values selected at step 708, where the larger of the latency values included in the selected N largest latency values are weighted more heavily than the smaller of the latency values included in the selected N largest latency values. For example, audio buffer application 250 can use Equations 2-4 to determine the aggregate latency value based on the N largest latency values selected at step 708.
At step 712, audio buffer application 250 determines a buffer size, such as the dynamic buffer size, based on the aggregate latency value determined at step 710. For example, audio buffer application 250 determines the buffer size using Equation 1 and/or some other technique described herein. At step 714, audio buffer application 250 distributes the buffer size to satellite audio output devices 110. In some examples, distributing the buffer size includes transmitting, via audio device network 112, one or more messages including the buffer size to satellite audio output devices 110. In some examples, the one or more messages including the buffer size further include instructions to each satellite audio output device 110 to configure the size of its audio buffer 362 to be the buffer size.
At step 716, audio buffer application 250 discards outdated latency data. In some examples, discarding outdated latency data includes deleting latency data when a time stamp associated with the latency data, such as a time stamp applied by audio buffer application 250 to the latency data that is indicative of the time at which audio buffer application 250 received the latency data, is older than a time threshold. In some examples, discarding outdated latency data includes deleting latency data when the age of the latency data, measured from the time at which the latency data was generated by a satellite audio output device 110, exceeds a time threshold.
The method 700 then returns to step 702 at which audio buffer application 250
requests satellite audio output devices 110 to provide audio buffer application 250 with new, or updated, latency data. In this manner, the method 700 dynamically determines the buffer size of audio output devices playing back audio in a network-connected audio system.
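One pass of method 700 could be sketched as follows; request_latency_data and set_buffer_size are hypothetical device proxies, and the aggregate and buffer-size computations are stand-ins for Equations 2-4 and Equation 1, which are not reproduced in this section:

```python
def method_700_once(satellites, n, a=2, min_buffer_ms=50.0):
    # Steps 702-704: request and collect latency data samples.
    samples = [s for dev in satellites for s in dev.request_latency_data()]
    # Step 706: sort latency values in descending order.
    samples.sort(key=lambda s: s["latency_ms"], reverse=True)
    # Step 708: select the N largest latency values.
    n_largest = [s["latency_ms"] for s in samples[:n]]
    # Step 710: aggregate latency (stand-in for Equations 2-4).
    weights = [v ** a for v in n_largest]
    aggregate = sum(w * v for w, v in zip(weights, n_largest)) / sum(weights)
    # Step 712: buffer size from the aggregate latency (stand-in for Equation 1).
    buffer_size_ms = max(aggregate, min_buffer_ms)
    # Step 714: distribute the buffer size to the satellite devices.
    for dev in satellites:
        dev.set_buffer_size(buffer_size_ms)
    return buffer_size_ms
```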
FIG. 8 is a flow chart of method steps for adjusting a buffer size in a satellite audio output device, according to one or more aspects of the various embodiments. Although the method steps are described with respect to the systems and examples of FIGs. 1-6, persons skilled in the art will understand that any system configured to perform the method steps, in any order, falls within the scope of the various embodiments.
As shown, a method 800 begins at step 802, where a latency application 352 executing on processing unit(s) 302 of a satellite audio output device 110 receives a timestamped message from primary computing device 102. In some examples, the timestamped message includes a time stamp indicative of the time at which primary computing device 102 transmitted the message. In some examples, the timestamped message further includes audio content, such as one or more audio packets, to be played back by satellite audio output device 110.
At step 804, latency application 352 determines a latency, or transit delay, of the timestamped message received at step 802 and stores the latency of the timestamped message as a latency data sample. In some examples, determining the latency of the timestamped message received at step 802 includes determining a time difference between a time at which the timestamped message was transmitted by primary computing device 102 and a time at which latency application 352 received the timestamped message. In some examples, determining the latency of the timestamped message further includes adding the amount of time taken by satellite audio output device 110 to process the timestamped message to the difference between the time at which the timestamped message was transmitted by primary computing device 102 and the time at which the timestamped message was received by latency application 352. In some examples, storing the latency of the timestamped message as a latency data sample includes generating a latency data sample including the latency of the timestamped message and storing the latency data sample in system memory 332 or some other system storage of the satellite audio output device 110. In some examples, latency application 352 timestamps the latency data sample with a time at which the timestamped message was received by latency application 352 and/or with a time at which the latency data sample was generated by latency application 352.
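A sketch of this computation on the satellite side, assuming the message carries its transmit time in an illustrative "tx_time" field and that the device clocks are synchronized (a prerequisite for comparing time stamps across devices):

```python
import time

def make_latency_sample(message, processing_time_s=0.0):
    # Transit delay plus optional processing time, stored with the
    # receive time as the sample's time stamp.
    rx_time = time.time()
    latency_ms = ((rx_time - message["tx_time"]) + processing_time_s) * 1000.0
    return {"latency_ms": latency_ms, "timestamp": rx_time}
```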
At step 806, latency application 352 receives a request for latency data from primary computing device 102. For example, the request for latency data includes a request for latency data samples, such as the latency data sample generated at step 804, stored in system memory 332 and/or some other system storage of satellite audio output device 110.
At step 808, in response to receiving the request for latency data from primary computing device 102, latency application 352 transmits one or more latency data samples stored in system memory 332 and/or some other system storage to primary computing device 102. In some examples, latency application 352 transmits all of the latency data samples stored in system memory 332 and/or some other system storage to primary computing device 102. In other examples, latency application 352 transmits only some of the stored latency data samples.
At step 810, latency application 352 deletes the stored latency data samples from system memory 332 and/or some other system storage after transmitting the latency data samples to primary computing device 102 at step 808.
At step 812, satellite audio output device 110 receives a message including a buffer size from primary computing device 102. For example, audio buffer application 360 executing on processing unit(s) 302 of satellite audio output device 110 receives a message including a buffer size from primary computing device 102. In some examples, the message further includes an instruction to configure the size of audio buffer 362 of satellite audio output device 110 based on the buffer size included in the message.
At step 814, in response to receiving the message including the buffer size from primary computing device 102, audio buffer application 360 adjusts the size of audio buffer 362 based on the buffer size included in the message received from primary computing device 102. In some examples, adjusting the size of audio buffer 362 based on the buffer size included in the message includes setting, or configuring, the size of audio buffer 362 to be the buffer size included in the message received from primary computing device 102.
The method 800 then returns to step 802 at which latency application 352 receives another timestamped message from primary computing device 102. In this manner, the method 800 dynamically adjusts the size of a buffer included in an audio output device based on latency data generated by one or more audio output devices in a network-connected audio system.
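The satellite-side handling of steps 802-814 could be dispatched as in the following sketch; the message fields and satellite attributes are illustrative placeholders, and make_latency_sample is the sketch shown earlier:

```python
def handle_message(satellite, message):
    if "tx_time" in message:
        # Steps 802-804: compute and store a latency data sample.
        satellite.samples.append(make_latency_sample(message))
    elif message.get("type") == "latency_request":
        # Steps 806-810: return stored samples on request, then delete them.
        satellite.send(list(satellite.samples))
        satellite.samples.clear()
    elif "buffer_size" in message:
        # Steps 812-814: adjust the audio buffer to the distributed size.
        satellite.audio_buffer.resize(message["buffer_size"])
```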
In sum, a primary computing device and multiple satellite audio output devices are coupled together to form a network-connected audio system. The primary computing device requests the multiple satellite audio output devices to provide latency data. Latency data includes latency data samples generated by the multiple satellite audio output devices. Each latency data sample includes a respective latency value that is indicative of a time difference between the time at which the primary computing device transmits a message and the time at which a satellite audio output device receives the message. The disclosed techniques determine an aggregate latency value based on the latency data samples received from the multiple satellite audio output devices. In some examples, the primary computing device determines the aggregate latency value using a weighted average function that more heavily weights the larger latency values included in the latency data samples. The disclosed techniques further include determining, based on the aggregate latency value, a buffer size of the audio buffers included in the multiple satellite audio output devices. The disclosed techniques further include distributing the buffer size to the multiple satellite audio output devices such that the multiple satellite audio output devices configure the size of their respective audio buffers based on the distributed buffer size. As network conditions of the network-connected audio system change over time, the primary computing device determines updated values of the buffer size and distributes the updated values of the buffer size to the multiple audio output devices.
At least one technical advantage of the disclosed techniques relative to the prior art is that, with the disclosed techniques, the buffer size of a satellite device in a network-connected audio system can be dynamically adjusted based on the network conditions. Thus, when network traffic is light, the buffer size of a satellite device can be decreased without an increased risk of the satellite device running out of audio packets to play back, while also decreasing the processing latency of the satellite device. In contrast, when the network is congested, the buffer size of a satellite device can be increased as needed to store additional audio packets for playback. Another technical advantage is that the buffer size of all satellite devices in the network-connected audio system can be synchronized, which eliminates the adverse effects on audio playback synchrony that are attributed to satellite devices with mismatched buffer sizes. Furthermore, another technical advantage is that the disclosed techniques do not rely on complex, resource-intensive algorithms that strain the processing capabilities of satellite devices. Instead, the disclosed techniques use only a small portion of a satellite device’s processing resources, thereby allowing more of the
satellite device’s processing power to be used for audio processing. As a result, listeners can enjoy a higher quality listening experience relative to conventional techniques. These technical advantages provide one or more technological improvements over prior art approaches.
1. In some embodiments, a computer-implemented method for determining a buffer size in a network-connected audio system comprises requesting one or more audio output devices to provide latency data; receiving, from the one or more audio output devices, a plurality of latency data samples, where each latency data sample in the plurality of latency data samples includes a latency value; selecting largest latency values from the latency values included in the plurality of latency data samples; determining an aggregate latency value based on the largest latency values; determining the buffer size based on the aggregate latency value; and distributing the buffer size to the one or more audio output devices, wherein the buffer size is usable by the one or more audio output devices to configure a respective buffer.
2. The computer-implemented method of clause 1, wherein each latency value is indicative of a difference between a time at which a message was transmitted by a computing device and a time at which the message was received by an audio output device.
3. The computer-implemented method of clauses 1 or 2, wherein selecting the largest latency values from the latency values included in the plurality of latency data samples further comprises sorting the latency values in descending order; and selecting a number of largest latency values from the descending-sorted latency values.
4. The computer-implemented method of any of clauses 1-3, wherein the number of largest latency values is based on a number of audio output devices coupled to the network-connected audio system.
5. The computer-implemented method of any of clauses 1-4, wherein determining the aggregate latency value based on the largest latency values further comprises determining a weighted average of the largest latency values; and determining the aggregate latency value to be the weighted average of the largest latency values.
6. The computer-implemented method of any of clauses 1-5, wherein determining the weighted average of the largest latency values further comprises assigning a first weight value to a largest latency value included in the largest latency values; and assigning a second
weight value to other latency values included in the largest latency values, the second weight value being less than the first weight value.
7. The computer-implemented method of any of clauses 1-6, further comprising determining a mean value of the largest latency values; determining a value of a standard deviation of the largest latency values; and determining the first weight value based on the mean value, the value of the standard deviation from the mean value, and the largest latency value included in the largest latency values.
8. The computer-implemented method of any of clauses 1-7, wherein a first latency data sample included in the plurality of latency data samples includes a time stamp indicative of a time at which the first latency data sample was generated.
9. The computer-implemented method of any of clauses 1-8, further comprising, in response to determining that the time at which the first latency data sample was generated is older than a threshold, discarding the first latency data sample.
10. The computer-implemented method of any of clauses 1-9, wherein a latency value includes an amount of time taken by an audio output device to process a message that was transmitted by a computing device.
11. In some embodiments, one or more non-transitory computer-readable storage media including instructions that, when executed by one or more processors at a first computing device, cause the one or more processors to perform steps of requesting one or more audio output devices to provide latency data; receiving, from the one or more audio output devices, a plurality of latency data samples, where each latency data sample in the plurality of latency data samples includes a latency value; selecting largest latency values from the latency values included in the plurality of latency data samples; determining an aggregate latency value based on the largest latency values; determining a buffer size based on the aggregate latency value; and distributing the buffer size to the one or more audio output devices, wherein the buffer size is usable by the one or more audio output devices to configure a respective buffer.
12. The one or more non-transitory computer-readable media of clause 11, wherein each latency value is indicative of a difference between a time at which a message was transmitted by a computing device and a time at which the message was received by an audio
output device.
13. The one or more non-transitory computer-readable media of clauses 11 or 12, wherein the step of selecting the largest latency values from the latency values included in the plurality of latency data samples further comprises sorting the latency values in descending order; and selecting a number of largest latency values from the descending-sorted latency values.
14. The one or more non-transitory computer-readable media of any of clauses 11-13, wherein the number of largest latency values is based on a number of audio output devices coupled to the first computing device.
15. The one or more non-transitory computer-readable media of any of clauses 11-14, wherein the step of determining the aggregate latency value based on the largest latency values further comprises determining a weighted average of the largest latency values; and determining the aggregate latency value to be the weighted average of the largest latency values.
16. The one or more non-transitory computer-readable media of any of clauses 11-15, wherein the step of determining the weighted average of the largest latency values further comprises assigning a first weight value to a largest latency value included in the largest latency values; and assigning a second weight value to other latency values included in the largest latency values, the second weight value being less than the first weight value.
17. The one or more non-transitory computer-readable media of any of clauses 11-16, wherein the steps further comprise determining a mean value of the largest latency values; determining a value of a standard deviation of the largest latency values; and determining the first weight value based on the mean value, the value of the standard deviation from the mean value, and the largest latency value included in the largest latency values.
18. The one or more non-transitory computer-readable media of any of clauses 11-17, wherein a first data sample included in the plurality of latency data samples includes a time stamp indicative of a time at which the first data sample was generated; and the steps further comprise in response to determining that the time at which the first data sample was generated is older than a threshold, discarding the first data sample.
19. In some embodiments, a computing device comprises a memory storing an application; and one or more processors that, when executing the application, are configured to request one or more audio output devices to provide latency data; receive, from the one or more audio output devices, a plurality of latency data samples, where each latency data sample in the plurality of latency data samples includes a latency value; select largest latency values from the latency values included in the plurality of latency data samples; determine an aggregate latency value based on the largest latency values; determine a buffer size based on the aggregate latency value; and distribute the buffer size to the one or more audio output devices, wherein the buffer size is usable by the one or more audio output devices to configure a respective buffer.
20. The computing device of clause 19, wherein each latency value is indicative of a difference between a time at which a message was transmitted by a computing device and a time at which the message was received by an audio output device.
Any and all combinations of any of the claim elements recited in any of the claims and/or any elements described in this application, in any fashion, fall within the contemplated scope of the present disclosure and protection.
The descriptions of the various embodiments have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.
Aspects of the present embodiments may be embodied as a system, method or computer program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “module,” a “system,” or a “computer.” In addition, any hardware and/or software technique, process, function, component, engine, module, or system described in the present disclosure may be implemented as a circuit or set of circuits. Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine. The instructions, when executed via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such processors may be, without limitation, general purpose processors, special-purpose processors, application-specific processors, or field-programmable gate arrays.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks
may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
While the preceding is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
Claims (20)
- A computer-implemented method for determining a buffer size in a network-connected audio system, the method comprising:
requesting one or more audio output devices to provide latency data;
receiving, from the one or more audio output devices, a plurality of latency data samples, where each latency data sample in the plurality of latency data samples includes a latency value;
selecting largest latency values from the latency values included in the plurality of latency data samples;
determining an aggregate latency value based on the largest latency values;
determining the buffer size based on the aggregate latency value; and
distributing the buffer size to the one or more audio output devices, wherein the buffer size is usable by the one or more audio output devices to configure a respective buffer.
- The computer-implemented method of claim 1, wherein each latency value is indicative of a difference between a time at which a message was transmitted by a computing device and a time at which the message was received by an audio output device.
- The computer-implemented method of claim 1, wherein selecting the largest latency values from the latency values included in the plurality of latency data samples further comprises:
sorting the latency values in descending order; and
selecting a number of largest latency values from the descending-sorted latency values.
- The computer-implemented method of claim 3, wherein the number of largest latency values is based on a number of audio output devices coupled to the network-connected audio system.
- The computer-implemented method of claim 1, wherein determining the aggregate latency value based on the largest latency values further comprises:
determining a weighted average of the largest latency values; and
determining the aggregate latency value to be the weighted average of the largest latency values.
- The computer-implemented method of claim 5, wherein determining the weighted average of the largest latency values further comprises:
assigning a first weight value to a largest latency value included in the largest latency values; and
assigning a second weight value to other latency values included in the largest latency values, the second weight value being less than the first weight value.
- The computer-implemented method of claim 6, further comprising:
determining a mean value of the largest latency values;
determining a value of a standard deviation of the largest latency values; and
determining the first weight value based on the mean value, the value of the standard deviation from the mean value, and the largest latency value included in the largest latency values.
- The computer-implemented method of claim 1, wherein a first latency data sample included in the plurality of latency data samples includes a time stamp indicative of a time at which the first latency data sample was generated.
- The computer-implemented method of claim 8, further comprising, in response to determining that the time at which the first latency data sample was generated is older than a threshold, discarding the first latency data sample.
- The computer-implemented method of claim 1, wherein a latency value includes an amount of time taken by an audio output device to process a message that was transmitted by a computing device.
- One or more non-transitory computer-readable storage media including instructions that, when executed by one or more processors at a first computing device, cause the one or more processors to perform steps of:
requesting one or more audio output devices to provide latency data;
receiving, from the one or more audio output devices, a plurality of latency data samples, where each latency data sample in the plurality of latency data samples includes a latency value;
selecting largest latency values from the latency values included in the plurality of latency data samples;
determining an aggregate latency value based on the largest latency values;
determining a buffer size based on the aggregate latency value; and
distributing the buffer size to the one or more audio output devices, wherein the buffer size is usable by the one or more audio output devices to configure a respective buffer.
- The one or more non-transitory computer-readable storage media of claim 11, wherein each latency value is indicative of a difference between a time at which a message was transmitted by a computing device and a time at which the message was received by an audio output device.
- The one or more non-transitory computer-readable storage media of claim 11, wherein the step of selecting the largest latency values from the latency values included in the plurality of latency data samples further comprises:
sorting the latency values in descending order; and
selecting a number of largest latency values from the descending-sorted latency values.
- The one or more non-transitory computer-readable storage media of claim 13, wherein the number of largest latency values is based on a number of audio output devices coupled to the first computing device.
- The one or more non-transitory computer-readable storage media of claim 11, wherein the step of determining the aggregate latency value based on the largest latency values further comprises:
determining a weighted average of the largest latency values; and
determining the aggregate latency value to be the weighted average of the largest latency values.
- The one or more non-transitory computer-readable storage media of claim 15, wherein the step of determining the weighted average of the largest latency values further comprises:
assigning a first weight value to a largest latency value included in the largest latency values; and
assigning a second weight value to other latency values included in the largest latency values, the second weight value being less than the first weight value.
- The one or more non-transitory computer-readable storage media of claim 16, wherein the steps further comprise:
determining a mean value of the largest latency values;
determining a value of a standard deviation of the largest latency values; and
determining the first weight value based on the mean value, the value of the standard deviation from the mean value, and the largest latency value included in the largest latency values.
- The one or more non-transitory computer-readable storage media of claim 11, wherein:
a first data sample included in the plurality of latency data samples includes a time stamp indicative of a time at which the first data sample was generated; and
the steps further comprise, in response to determining that the time at which the first data sample was generated is older than a threshold, discarding the first data sample.
- A computing device comprising:
a memory storing an application; and
one or more processors that, when executing the application, are configured to:
request one or more audio output devices to provide latency data;
receive, from the one or more audio output devices, a plurality of latency data samples, where each latency data sample in the plurality of latency data samples includes a latency value;
select largest latency values from the latency values included in the plurality of latency data samples;
determine an aggregate latency value based on the largest latency values;
determine a buffer size based on the aggregate latency value; and
distribute the buffer size to the one or more audio output devices, wherein the buffer size is usable by the one or more audio output devices to configure a respective buffer.
- The computing device of claim 19, wherein each latency value is indicative of a difference between a time at which a message was transmitted by a computing device and a time at which the message was received by an audio output device.