CA2311095A1

CA2311095A1 - Dynamic traffic shaping of a multi-channel audio stream over the internet

Info

Publication number: CA2311095A1
Application number: CA002311095A
Authority: CA
Inventors: Aoxiang Xu; Yixia Si; Stephane Doutriaux
Original assignee: MEDIASTACK AUDIONET Inc
Current assignee: MEDIASTACK AUDIONET Inc
Priority date: 2000-06-09
Filing date: 2000-06-09
Publication date: 2001-12-09

Description

DYNAMIC TRAFFIC SHAPING OF A
MULTI-CHANNEL AUDIO STREAM OVER THE INTERNET
Field of the invention
The present invention concerns a method and system for dynamic traffic shaping of a multi-channel audio stream over a packet-based network.
Background of the invention
Streaming digital music over a packet-based network such as the Internet is available today through solutions such as RealNetworks' RealPlayer® and Microsoft's Windows Media Player. Unfortunately, the quality of current music formats on the Internet is relatively low. For example, the popular MP3 format uses a rather low-end compression scheme whose quality is inferior to CD quality. Furthermore, none of the current Internet formats supports multi-channel audio, which will play an important part in the future of audio.
The limiting factor in delivering high quality audio over the Internet is typically the available bandwidth of the underlying networks. This limit is especially felt by users whose last-mile connection to the Internet is a slow telephone modem. The recent emergence of residential broadband technologies such as DSL and cable modems provides an opportunity to implement systems for streaming high quality, high bandwidth, multi-channel audio over the Internet. However, even though transfer speeds are increased considerably, data congestion still occurs since the broadband connections are shared by many customers using multiple transfer methods such as web surfing and file downloading, to access enriched content.
The present invention concerns a real-time, dynamic traffic-shaping method based on the concept of layered coding congestion control. In a preferred embodiment, the method is tailored for delivering audio encoded in Dolby Digital® 5.1 format, the most popular high-quality multi-channel audio format, as used in DVDs. This method optimizes transmission over "best effort" packet networks such as the Internet.
DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION
ABBREVIATIONS
For the purpose of the present document, the following abbreviations are adopted:
• AOD -- Audio On Demand
• kbps -- Kilo-bits per second
• Mbps -- Mega-bits per second
• RAID -- Redundant Array of Independent Disks
• RTT -- Round Trip Time
WHY DOLBY DIGITAL®?
The Dolby Digital® 5.1 audio coding scheme (also known as AC-3) was developed by Dolby Laboratories to provide high-quality, multi-channel digital audio to the consumer market. The AC-3 algorithm provides audio in 5.1 channels (left, center, right, left and right surround, and a 1/10 bandwidth subwoofer), and is used in a large majority of home-theatre setups. By applying psycho-acoustic digital signal-processing algorithms, Dolby Digital® is able to achieve an excellent compression ratio while still preserving almost all of the important features of the original audio stream. In most current applications, the AC-3 coding scheme compresses a PCM representation requiring more than 5 Mbps (6 channels x 48 kHz x 18 bit) into a 384 kbps bit stream. Due to its excellent sound quality and multi-channel nature, it has been adopted by high definition TV (HDTV) and digital versatile disk (DVD) for their audio systems. This audio format is highly superior to anything currently available on the Internet.
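By way of illustration only, the PCM figure quoted above can be checked with a few lines of arithmetic. The sketch assumes the 384 kbps default DVD rate stated later in this description; the variable names are illustrative and not part of the original text.

```python
# Uncompressed PCM bit rate for the 5.1 example in the text.
CHANNELS = 6              # 5 full-band channels plus the subwoofer channel
SAMPLE_RATE_HZ = 48_000   # 48 kHz
BITS_PER_SAMPLE = 18

pcm_bps = CHANNELS * SAMPLE_RATE_HZ * BITS_PER_SAMPLE

# Assumption: the default AC-3 rate on DVDs given in this description.
ac3_bps = 384_000

compression_ratio = pcm_bps / ac3_bps

print(f"PCM bit rate: {pcm_bps / 1e6:.3f} Mbps")
print(f"Compression ratio: {compression_ratio:.1f}:1")
```

Under these assumptions the uncompressed stream is a little over 5 Mbps, which is consistent with the "more than 5 Mbps" stated above.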

Recently, another multi-channel audio format known as AAC (MPEG-2 Advanced Audio Coding) has captured great attention within the circle of professional audio engineers. AAC provides greater sound fidelity than Dolby Digital® with an even better compression ratio. Although preliminary contact with some influential members of the Audio Engineering Society suggests that widespread adoption of AAC is several years away, the present invention can be adapted to support AAC.
Dolby statistics for May 2000 show that there have been over 45,389,120 products incorporating Dolby Digital~ sold. Due to the widespread acceptance of Dolby Digital~, it will likely remain the leading multi-channel audio format for the consumer market, for years to come.
LAYERED CODING BASED CONGESTION CONTROL
This section introduces layered coding congestion control, which is one aspect of the dynamic traffic-shaping method of the invention.
When the traffic on the Internet exceeds the available bandwidth, buffers overflow on the intermediate routers. Most modern routers drop packets from the tail of the queue when a buffer overflow happens. Congestion control is the algorithm that reduces the traffic in response to congestion in the network. By reducing the traffic, congestion can be quickly relieved. More importantly, all user traffic can share the Internet more fairly.
The classical congestion control algorithms are the ones deployed in TCP. The idea is that the TCP sender backs off and slows down its rate of sending packets in case of network congestion, which is indicated by packet loss. Although it is a classical algorithm, this kind of congestion control is not suitable for streaming audio data, since it causes discontinuity in the music stream and therefore degrades the audio experience tremendously. WinAmp and many other popular MP3 players support streaming over the Internet. However, users of these systems frequently experience music breaks when the network is busy, since these players use TCP as their transmission protocol. To counter unpleasant breaks in the music, many systems impose a buffering delay at the start of the transmission.
Layered coding, also known as hierarchical coding or embedded coding, was first developed for packet audio transmission. In layered coding, a signal is separated into subsignals of varying importance so that they can be coded and transmitted separately. In fact, the data format of packetized voice, as specified by ITU-T G.764, is arranged in layers. This technique was also extended to video coding and transmission. In layered coding, network congestion will always first affect the subsignals of low importance. Thus, hierarchical coding offers a way of achieving error control by preventing loss of perceptually important information.
The most important idea of layered coding is that, by reducing sampling resolution, it is able to reduce the bandwidth usage of the media stream while keeping the frame rate constant during congestion. It is significantly different from the congestion control mechanism used in TCP, in which the transmission rate (i.e. the frame rate for audio and video) is reduced during periods of congestion. Layered coding relieves network congestion by reducing the quality of the transmitted media, while preserving the frame rate at which it is sent. Therefore, by applying layered coding, users are able to receive a reasonable, continuous music stream while sharing the bandwidth efficiently in case of network congestion.
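The principle described above can be sketched as follows. This is a minimal illustration, not the method of the invention: the layer names and per-layer rates are hypothetical, and the frame rate is assumed fixed so that only the number of layers sent per frame changes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    kbps: int  # bandwidth this layer adds at the fixed frame rate


# Hypothetical layers, most important first.
LAYERS = [
    Layer("base", 128),
    Layer("enhancement-1", 128),
    Layer("enhancement-2", 128),
]


def layers_to_send(available_kbps: int) -> List[str]:
    """Keep layers in order of importance until the bandwidth budget is used up."""
    sent, used = [], 0
    for layer in LAYERS:
        if used + layer.kbps > available_kbps:
            break  # this layer and all less important ones are dropped
        sent.append(layer.name)
        used += layer.kbps
    return sent


print(layers_to_send(400))  # all three layers fit
print(layers_to_send(300))  # the least important layer is dropped
```

Every frame is still sent on schedule; congestion only lowers the amount of detail carried per frame, which is the contrast with TCP-style back-off drawn in the text.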
Although many applications have implemented layered coding congestion control, the exact implementation is dependent on many factors such as: type of data (audio or video), timing constraints, bandwidth constraints and quality constraints.
Among them, the most important factor is the exact type of media that is transmitted. For example, the present invention is sufficiently flexible to apply its dynamic traffic shaping to AC-3, AAC and MP3. Although the concept is the same, the implementation in each case is significantly different from the others.

As mentioned earlier, the default rate of AC-3 as coded on DVDs is 384 kbps. However, it is possible to generate AC-3 with either higher or lower bit rates in order to obtain different audio quality. For example, the Dolby Encoder DP569 is able to generate an AC-3 stream at 640 kbps, 320 kbps, 256 kbps and 128 kbps. This forms one of the foundations of the present invention.
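As a simple illustration of how several encodings of the same track can be exploited, the sketch below picks the highest of the four bit rates named above that fits a given bandwidth estimate. The function name and the fallback to the lowest rate are assumptions made for this example only.

```python
# Bit rates (kbps) named in the text for the DP569 encoder, highest quality first.
AC3_RATES_KBPS = [640, 320, 256, 128]


def best_rate(available_kbps: float) -> int:
    """Return the highest encoded rate that does not exceed the available bandwidth."""
    for rate in AC3_RATES_KBPS:
        if rate <= available_kbps:
            return rate
    # Assumption: when even the lowest rate does not fit, keep sending it
    # rather than stopping the stream.
    return AC3_RATES_KBPS[-1]


print(best_rate(500))   # 320
print(best_rate(1000))  # 640
```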
[Figure-1: block diagram. Server: Music Storage, Traffic Shaping Engine, Transmission Engine. Client: Transmission Engine. Labelled signals: Bit Rate Control Info; Loss Rate, Jitter, RTT.]
Figure-1 shows the block diagram of a client-server based music delivery system that implements the dynamic traffic-shaping system and method of the present invention. There are three important components in the server: music storage, traffic-shaping engine and packet-transmission engine. The music storage is a large RAID device that stores all the music tracks. All the music tracks are preferably encoded in a proprietary nac3 (net AC-3) format, but other formats are equally applicable. The traffic-shaping engine is the software module that implements the core algorithm. It has a well-defined interface to exchange information with other modules. In order to make a decision, it takes standard parameters that describe the network performance: number of users connected, packet loss rate, RTT and average arrival jitter at the receiving end. To achieve the best result, all three feedback parameters (packet loss rate, RTT and jitter) are required, although it is possible to operate with only the packet loss rate.
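The kind of interface described above might look like the following sketch. It is not the engine's actual interface, which this description does not give: the class name, field names and threshold values are all hypothetical. The point illustrated is that packet loss rate is mandatory while RTT and jitter are optional refinements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NetworkStats:
    loss_rate: float                   # fraction of packets lost, 0.0 to 1.0
    rtt_ms: Optional[float] = None     # optional: round trip time
    jitter_ms: Optional[float] = None  # optional: average arrival jitter
    users: int = 1                     # connected users; not used by this simple check


# Hypothetical thresholds chosen only for the example.
LOSS_LIMIT = 0.02
RTT_LIMIT_MS = 300.0
JITTER_LIMIT_MS = 50.0


def is_congested(stats: NetworkStats) -> bool:
    """Report congestion if any supplied measurement exceeds its threshold."""
    if stats.loss_rate > LOSS_LIMIT:
        return True
    if stats.rtt_ms is not None and stats.rtt_ms > RTT_LIMIT_MS:
        return True
    if stats.jitter_ms is not None and stats.jitter_ms > JITTER_LIMIT_MS:
        return True
    return False


# Works with the loss rate alone, or with all three measurements.
print(is_congested(NetworkStats(loss_rate=0.05)))
print(is_congested(NetworkStats(loss_rate=0.01, rtt_ms=100.0, jitter_ms=10.0)))
```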

The packet-transmission engine is a software module that is responsible for transmitting the audio data to the end user. The traffic-shaping engine is completely independent of the packet-transmission engine. This allows the method of the present invention to be easily integrated into existing software. In section 5, a sample implementation that integrates MediaStack's algorithm into the RealAudio® system is presented.
The traffic-shaping engine evaluates the network congestion based on the parameters mentioned in the previous section. If there is no congestion, it tells the packet-transmission engine to transmit the AC-3 stream with the highest quality from the music track. In case of congestion, the algorithm computes the proper bit rate of the AC-3 stream that should be transmitted in order to achieve the best throughput for all the connected users, and communicates this value to the transmission engine. The transmission engine then dynamically switches to transmitting the AC-3 data at the new bit rate. Once the network starts to recover, the traffic-shaping engine starts to increase the bit rate of the AC-3 streams. Of course, there are a variety of different strategies available, although in a preferred embodiment, a nature format preferably is used.
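One of the many possible strategies mentioned above is sketched below purely as an example. It assumes a one-step-at-a-time policy over the four bit rates named earlier in this description; the text does not say that the invention computes the rate this way, and the class and method names are hypothetical.

```python
# Available AC-3 bit rates in kbps, lowest to highest.
RATES_KBPS = [128, 256, 320, 640]


class RateController:
    """Step down one rate on congestion, step up one rate on recovery."""

    def __init__(self) -> None:
        # Start at the highest quality, as the text does when there is no congestion.
        self.index = len(RATES_KBPS) - 1

    def update(self, congested: bool) -> int:
        if congested:
            self.index = max(self.index - 1, 0)
        else:
            self.index = min(self.index + 1, len(RATES_KBPS) - 1)
        return RATES_KBPS[self.index]


controller = RateController()
for congested in [False, True, True, False]:
    print(controller.update(congested))
```

The returned value plays the role of the bit rate control information passed to the transmission engine.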

The AC-3 data file format is not optimized for streaming over the Internet, as it was originally designed for local fixed media storage like DVD. For transfer on digital audio links, AC-3 data is embedded within a transmission protocol. For example, AC-3 is embedded in AES/EBU for communicating between professional audio equipment, and it is embedded in S/PDIF for consumer products such as DVD players and home amplifiers. Most of the music tracks in the AC-3 format contain extra information that is added to facilitate the embedding in either AES/EBU or S/PDIF, increasing the file size unnecessarily. Transmitting such tracks over a network would therefore be very inefficient.

In order to effectively stream AC-3 audio over the Internet, a network-ready file format (.nac3) that is customized for dynamic traffic shaping has been developed. A nac3 file is encoded with several AC-3 streams of the same music track at different bit rates. A file header describes the exact composition of individual blocks. The different AC-3 streams are encoded in such a way as to facilitate the dynamic switching between different streams in real time, as required during network transmissions.
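The actual layout of the nac3 header is not given in this description. Purely to illustrate the idea of a header that lists the streams contained in a file, the sketch below uses an invented layout (a 4-byte tag, a stream count, and one 16-bit bit rate per stream); none of these fields should be read as the real format.

```python
import struct

# Hypothetical tag; the real nac3 header layout is not specified in the text.
MAGIC = b"NAC3"


def pack_header(rates_kbps):
    """Pack: 4-byte tag, 1-byte stream count, one unsigned 16-bit rate per stream."""
    return struct.pack(f">4sB{len(rates_kbps)}H", MAGIC, len(rates_kbps), *rates_kbps)


def unpack_header(data):
    """Return the list of stream bit rates described by the header."""
    magic, count = struct.unpack_from(">4sB", data, 0)
    if magic != MAGIC:
        raise ValueError("not a nac3 header (in this hypothetical layout)")
    return list(struct.unpack_from(f">{count}H", data, 5))


header = pack_header([640, 320, 256, 128])
print(len(header), unpack_header(header))
```

A server reading such a header would know which bit rates are available for a track before choosing one to transmit.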
SAMPLE IMPLEMENTATION
The modular design of the traffic-shaping engine of the present invention allows it to be integrated into many different audio products. To date, MediaStack has developed a proprietary transmission system and has also implemented its traffic-shaping algorithm for AC-3 within a set of RealNetworks' RealSystem® G2 plugins. This algorithm could also be integrated into Microsoft's Windows Media Player.

RealNetworks' RealSystem® is based on the COM binary standard. It provides a media streaming platform that allows custom-developed server and client plugins for streaming new audio file formats.
In order to deliver AC-3 data from the RealServer® to the RealPlayer®, special AC-3 file format and rendering plugins have been implemented. As shown in Figure-2, the AC-3 file format plugin is used by the RealServer® to convert AC-3 frames into a stream of Real Media packets. The AC-3 rendering plugin is used by the RealPlayer® to repack the received Real Media packets into AC-3 frames.
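The conversion between frames and packets described above can be pictured with the following generic sketch. It does not use the RealSystem plugin API, whose details are not given here; the tuple layout (frame number, sequence number, packet count, payload) and the function names are assumptions made for the example.

```python
def frame_to_packets(frame_no: int, frame: bytes, payload_size: int):
    """Server side: split one frame into (frame_no, seq, total, payload) packets."""
    chunks = [frame[i:i + payload_size] for i in range(0, len(frame), payload_size)]
    return [(frame_no, seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def packets_to_frame(packets):
    """Client side: rebuild the frame, or return None if any packet is missing."""
    if not packets:
        return None
    total = packets[0][2]
    by_seq = {seq: chunk for _, seq, _, chunk in packets}
    if len(by_seq) != total:
        return None
    return b"".join(by_seq[i] for i in range(total))


frame = bytes(range(10))
packets = frame_to_packets(0, frame, 4)
print(len(packets))                       # number of packets for this frame
print(packets_to_frame(packets) == frame)
```

A missing packet is reported as `None` here, which is the kind of event a rendering plugin could count toward a packet loss measurement.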

[Figure-2: Real Server with the AC3 File Format Plugin sends streaming packets to Real Player with the AC3 Rendering Plugin, which outputs AC3 frames.]
ADAPTIVE BIT RATE STREAMING
When streaming AC-3 data through the RealSystem® platform, adaptive bit rate is achieved by using the feedback channel, as illustrated in Figure-3.
[Figure-3: labels recoverable from the drawing: Music Storage; Real Server; Streaming Packets; Real Player; S/PDIF compatible AC-3 frame; Back Channel (Packet Loss, Jitter, RTT); Transmission Engine; Traffic Shaping Engine; Transmission Engine; AC3 File Format Plugin; AC3 Rendering Plugin.]
Figure-4 illustrates the dynamic traffic-shaping architecture implemented in the RealSystem®. While repacking the Real Media packets into AC-3 frames with S/PDIF compatible headers, the AC-3 rendering plugin continuously monitors the net traffic parameters, such as packet loss, jitter and RTT. When these parameters indicate that the net traffic has risen to a dangerous level, the rendering plugin sends a message to the file format plugin through the RealSystem® back channel. With this message, the rendering plugin passes the net traffic parameters to the file format plugin.
As soon as the traffic shaping engine in the file format plugin receives the net traffic parameters from the rendering plugin, it processes these parameters and sends a bit rate control message to the transmission engine to adjust the streaming bit rate.
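The message exchange described above can be sketched as follows. The real back-channel message format is not described in this document, so the JSON encoding, field names and thresholds below are hypothetical; the server-side reaction shown (moving to the next lower bit rate) is likewise only one possible policy.

```python
import json

# Hypothetical thresholds for the client-side monitor.
THRESHOLDS = {"packet_loss": 0.02, "jitter_ms": 50.0, "rtt_ms": 300.0}


def build_feedback(measured: dict):
    """Client side: return a message if any parameter exceeds its threshold, else None."""
    exceeded = [k for k, limit in THRESHOLDS.items() if measured.get(k, 0.0) > limit]
    if not exceeded:
        return None
    return json.dumps({"type": "net_traffic", "params": measured}, sort_keys=True)


def handle_feedback(message: str, current_kbps: int, rates=(640, 320, 256, 128)) -> int:
    """Server side: on receiving feedback, choose the next lower available rate."""
    json.loads(message)  # parse to confirm the message is well formed
    lower = [r for r in rates if r < current_kbps]
    return max(lower) if lower else current_kbps


msg = build_feedback({"packet_loss": 0.05, "jitter_ms": 12.0, "rtt_ms": 90.0})
if msg is not None:
    print(handle_feedback(msg, current_kbps=640))
```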
[Figure-4: sequence diagram between Player, Rendering plugin, Server and FileFormat plugin. Recoverable step labels: Play; Connect to AC-3 Server; Locate Music AC-3 Storage; Load AC-3 Rendering Plugin; Init Transmission Engine; Load AC-3 FileFormat Plugin; Init Transmission Engine; Init Traffic Shaping Engine; Data Transmission; Adjust Streaming Bit Rate; Back Channel (Packet Loss, Jitter, RTT).]
CONCLUSION
Through advanced knowledge of network protocols and digital multi-channel audio encoding, a new adaptable method for efficiently transmitting Dolby Digital® audio through congested packet networks such as the Internet is proposed. The methods and systems of the present invention have been integrated into two systems for this purpose, and have been tested in varying network congestion conditions. Preliminary testing indicates good performance in transmitting DVD-quality audio through home DSL connections, as well as through cable-based systems.

Although the present invention has been explained hereinabove by way of a preferred embodiment thereof, it should be pointed out that any modifications to this preferred embodiment within the scope of the appended claims are not deemed to alter or change the nature and scope of the present invention.