WO2013147709A1 - Method for transmitting a digital signal, method for receiving a digital signal, transmission arrangement and communication device - Google Patents
Method for transmitting a digital signal, method for receiving a digital signal, transmission arrangement and communication device
- Publication number
- WO2013147709A1 (PCT/SG2013/000124)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- bit
- stream
- digital signal
- spectrum
- frequency
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/233—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234327—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by decomposing into layers, e.g. base layer and one or more enhancement layers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/236—Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
- H04N21/23614—Multiplexing of additional data and video streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/24—Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
- H04N21/2402—Monitoring of the downstream path of the transmission network, e.g. bandwidth available
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/266—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
- H04N21/2662—Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/434—Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
- H04N21/4348—Demultiplexing of additional data and video streams
Definitions
- Embodiments of the invention generally relate to methods for transmitting a digital signal, methods for receiving a digital signal, transmission arrangements and communication devices.
- an audio signal may be compressed to save memory space and transmission bandwidth.
- There are two types of compression formats: non-scalable compression and scalable compression.
- Non-scalable compression is a compression format that is dedicated to a specified transmission bit rate. It provides the most efficient compression in terms of rate-distortion performance. According to this compression, after the compression, the bit rate of, for example, each audio frame of an audio signal is fixed. This can be considered to be not network friendly since the network has to provide a fixed transmission bit rate. Further, to fulfil different requirements for different clients, the audio content has to be compressed into several non-scalable copies. This is illustrated in figure 1.
- Figure 1 shows a communication system
- the communication system includes a non-scalable encoder 101 which provides a plurality of copies of the same audio signal (e.g. the same audio file) with different compression rates (and hence bit rates) to a server 103.
- the server 103 may choose, depending on the quality desired by a plurality of client devices 104, one of the copies 102 for each client 104 and transmit it to the client 104 via a communication network 105, e.g. the Internet.
- server 103 stores the various copies which may be seen to waste storage space of the server 103.
- FIG. 2 shows a communication system 200.
- the communication system includes a scalable encoder 201 which provides an audio signal (e.g. an audio file) in a single compressed version 202 to a server 203.
- the server 203 can truncate the compressed version 202 of the audio signal according to the desired bit-rates of client devices 204 and send, to each client device 204, a truncated version 205 (i.e. a truncated bit-stream) of the audio file over a communication network 206, e.g. the Internet.
- the audio signal is in the form of a scalable audio bit-stream that can be truncated to adapt the bit rate of the audio signal, for example to the actual network condition.
- the server 203 can respond to the requirement from each end user and provide the end users with the most suitable bit-stream by truncating the original scalable audio.
- although the scalable compression format has this benefit, it does require extra bits to enable the scalability. Due to the introduction of these extra bits, the compression efficiency is reduced.
- the scalable compression format generally provides more flexibility in bit rate adaptation. It is typically more suitable to be applied in real time streaming applications when compared with the non-scalable compression formats.
- the scaling dimension and the coding priority order can significantly influence the bit-stream quality.
- the coding priority order in the scalable bit-stream is typically fixed after the encoding process and cannot be changed in the truncation process. This reduces the flexibility of the scalable format to provide high quality output.
- An example is the scalable audio coding scheme MPEG-4 Scalable to Lossless Coding (SLS).
- In SLS, a core MPEG-4 AAC layer generates an AAC compliant bit-stream and a lossless enhanced layer generates a scalable bit-stream which scans and codes the residual bit-plane symbols from the Most Significant Bit (MSB) to the Least Significant Bit (LSB). Both bit-streams are embedded in the final SLS bit-stream.
- the resulting format is illustrated in figure 3.
- Figure 3 shows an audio signal in encoded format 301 and truncated format 302.
- the encoded audio signal 301 includes core layer information 304 and enhancement layer information 305 wherein the enhancement layer information is in the form of a bit stream starting from the most significant bits 306 of the residual bit-plane symbols and ending with the least significant bits 307 which are further followed by low energy scale factor band information 308.
- the server 103 truncates bits from the enhancement layer bit-stream 305 from its end for each frame, which includes the least significant information part, as defined by the encoder. The more significant information part, which is located, relatively, in the front of the bit-stream 305, is kept (as illustrated in the truncated format 302).
- the coding priority order of the encoded information in the enhancement layer bit-stream 305 is defined by the bit plane scanning order.
- although the perceptual priority is considered and reflected in the bit-plane scanning order, it is pre-defined in the encoding process and cannot be changed after the scalable bit-stream 305 is generated, which leads to a fixed Rate-Distortion (R-D) relationship, typically optimized for the lossless case.
- a method for transmitting a digital signal comprising generating a representation of the digital signal in the frequency domain, wherein the representation comprises, for each of a plurality of sub-spectrums of the spectrum of the digital signal, at least one frequency coefficient; generating, for each sub-spectrum, a bit-stream from the at least one frequency coefficient of the sub-spectrum; determining, for each bit-stream, a truncation point; truncating each bit-stream according to the truncation point determined for the bit-stream and transmitting data according to the truncated bit-streams.
- a method for receiving a digital signal comprising receiving data including a plurality of bit-streams; generating, from each bit-stream, at least one frequency coefficient; aggregating the frequency coefficients to a frequency domain representation of a digital signal; and transforming the frequency domain representation to the time domain to reconstruct the digital signal.
- a transmission arrangement and a communication device according to the method for transmitting a digital signal and a method for receiving a digital signal are provided.
- Figure 1 shows a communication system
- Figure 2 shows a communication system.
- Figure 3 shows an audio signal in encoded format and truncated format.
- Figure 4 shows a flow diagram.
- Figure 5 shows a transmission arrangement.
- Figure 6 shows a flow diagram.
- Figure 7 shows a communication device.
- Figure 8 shows a communication arrangement
- Figure 9 shows a bit plane representation for an audio signal and a bit-stream.
- Figure 10 shows an encoder.
- Figure 11 shows a truncator.
- Figure 12 illustrates a truncation process.
- Figure 13 shows a receiver
- Figure 14 shows a performance diagram.
- a scalable coding scheme is provided that can provide two dimensional compression scalability: frequency scalability and SNR scalability.
- a method for transmitting a digital signal according to one embodiment is illustrated in figure 4.
- Figure 4 shows a flow diagram 400.
- the flow diagram 400 illustrates a method for transmitting a digital signal.
- a representation of the digital signal in the frequency domain is generated, wherein the representation comprises, for each of a plurality of sub-spectrums of the spectrum of the digital signal, at least one frequency coefficient.
- a bit-stream is generated from the at least one frequency coefficient of the sub-spectrum.
- each bit-stream is truncated according to the truncation point determined for the bit-stream.
- data according to the truncated bit-streams (e.g. data including the information from the truncated bit-streams) are transmitted.
- the frequency spectrum is segmented and for each spectrum segment (or sub-spectrum) an individual bit-stream (e.g. enhancement layer bit-stream) is generated and individually truncated (which may include cutting the bit-stream completely, i.e. omitting the bit-stream).
- bit-streams for certain sub-spectrums may be truncated more strongly than bit-streams for other sub-spectrums.
- the coding priority order in a bit-stream can be seen to be flexibly changed in the truncation process.
- a scalable encoder generates a plurality of bit-streams, with each bit-stream being associated with a subset of spectral information of the signal and the bit-streams may be ("intelligently") truncated, e.g. by a server independently, e.g. by completely dropping out some bit-streams (to provide frequency scalability) or truncating those bit-streams (to provide SNR scalability).
- the coding scheme described above provides additional frequency scalability.
- the coding scheme described above provides both frequency and SNR scalability simultaneously.
- the method may further comprise dividing the spectrum into the plurality of sub-spectrums.
- the digital signal is for example a media signal.
- the digital signal is an audio signal or a video signal.
- generating a bit-stream from the at least one frequency coefficient of the sub-spectrum includes bit-plane scanning the at least one frequency coefficient of the sub-spectrum.
- the method may further comprise multiplexing the truncated bit-streams to an overall bit-stream wherein transmitting data according to the truncated bit-streams includes transmitting the overall bit-stream.
- the method may further comprise transmitting, for each truncated bit-stream, information about the truncation point.
- the truncation point specifies at which point a bit-stream is truncated and may for example be expressed by a bit position (e.g. the last bit that is not truncated) or a length of the truncated bit-stream.
- determining the truncation point includes determining a truncation length and truncating the bit-stream according to the truncation point includes truncating the bit-stream according to the truncation length.
- determining a truncation point for a bit-stream includes deciding whether the bit-stream should be omitted from the data and, if it is decided that the bit-stream should be omitted, setting the truncation point such that truncating the bit-stream according to the truncation point leads to the bit-stream being omitted from the data. For example, omitting a bit-stream may be expressed by a truncation length of zero.
- The method illustrated in figure 4 is for example carried out by a transmission arrangement as illustrated in figure 5.
- Figure 5 shows a transmission arrangement 500.
- the transmission arrangement 500 comprises an encoder 501 configured to generate a representation of the digital signal in the frequency domain, wherein the representation comprises, for each of a plurality of sub-spectrums of the spectrum of the digital signal, at least one frequency coefficient and to generate, for each sub-spectrum, a bit-stream from the at least one frequency coefficient of the sub-spectrum.
- the transmission arrangement 500 comprises a truncator 502 configured to determine, for each bit-stream, a truncation point and to truncate each bit-stream according to the truncation point determined for the bit-stream.
- the transmission arrangement 500 further comprises a transmitter 503 configured to transmit data according to the truncated bit-streams.
- the transmitted truncated bit streams are for example processed according to a method as illustrated in figure 6.
- Figure 6 shows a flow diagram 600.
- the flow diagram 600 illustrates a method for receiving a digital signal.
- data including a plurality of bit-streams is received.
- at least one frequency coefficient is determined from each bit-stream,
- the frequency coefficients are aggregated to a frequency domain representation of a digital signal.
- the frequency domain representation is transformed to the time domain to reconstruct the digital signal.
- the data includes an overall bit-stream and the method comprises demultiplexing the overall bit-stream to generate the plurality of bit-streams.
- The method illustrated in figure 6 is for example carried out by a communication device as illustrated in figure 7.
- Figure 7 shows a communication device 700.
- the communication device 700 comprises a receiver 701 configured to receive data including a plurality of bit-streams.
- the communication device 700 further comprises a decoder 702 configured to generate, from each bit-stream, at least one frequency coefficient, aggregate the frequency coefficients to a frequency domain representation of a digital signal and transform the frequency domain representation to the time domain to reconstruct the digital signal. It should be noted that embodiments described in context of the method illustrated in figure 4 are analogously valid for the transmission arrangement 500, the method illustrated in figure 6 and the communication device 700 and vice versa.
- the components of the transmission arrangement 500 and the communication device may for example be implemented by one or more circuits.
- a “circuit” may be understood as any kind of a logic implementing entity, which may be special purpose circuitry or a processor executing software stored in a memory, firmware, or any combination thereof.
- a “circuit” may be a hard-wired logic circuit or a programmable logic circuit such as a programmable processor, e.g. a microprocessor (e.g. a Complex Instruction Set Computer (CISC) processor or a Reduced Instruction Set Computer (RISC) processor).
- a “circuit” may also be a processor executing software, e.g. any kind of computer program, e.g. a computer program using a virtual machine code such as e.g. Java. Any other kind of implementation of the respective functions which will be described in more detail below may also be understood as a "circuit”.
- Figure 8 shows a communication arrangement 800.
- the communication arrangement 800 includes a signal source 801, a transmission arrangement 802 which includes an encoder 803, a truncator 804 and a transmitter 805, a communication network 806 and a receiving communication device 807.
- the signal source provides an audio signal 808 (or more generally a digital signal representing media content, e.g. an audio signal or a video signal) to the transmission arrangement 802.
- the transmission arrangement 802 may be implemented using a single device, e.g. a server computer, or a plurality of components.
- the transmission arrangement 802 generates an encoded and truncated audio signal 809 from the audio signal 808 and transmits it via the communication network 806 to the receiving communication device 807.
- the encoder 803 divides each audio spectral frame into m segments with each one covering a specified frequency range. Then the encoder 803 carries out a scalable encoding process for each segment. For each segment, the encoder 803 packs the produced coding information into a segmental bit-stream in the order of segmental side information followed by segmental data information (from the most perceptually important information to the least important information). This is illustrated in figure 9.
- Figure 9 shows a bit plane representation 901 for an audio signal and a bit-stream 907.
- the bit plane representation includes a frequency coefficient for each of a plurality of frequencies (e.g. for each of a plurality of scale factor bands) for one audio frame.
- the frequency coefficients are grouped according to a segmentation of the whole frequency range into sub-spectrums 902.
- the frequency coefficients for example represent a residual audio signal (e.g. after core layer encoding).
- a segmental bit-stream 903 is generated by bit-plane scanning the frequency coefficients.
- Each segmental bit-stream 903 includes side information 904, followed by the bits of the frequency coefficients in the order from most significant bits 905 to least significant bits 906.
- the bits of all segmental bit-streams 903, in this example concatenated in the order of the frequency segments, form the overall (encoded) bit-stream 907.
- the final overall bit-stream 907 in this example contains the segmental bit-streams 903 in an order of low frequency segment to high frequency segment.
- This bit-stream structure allows frequency scalability and SNR scalability since segmental bit-streams 903 (e.g. high frequency segmental bit-streams) may be omitted (i.e. cut off from the bit stream 907) and segmental bit-streams 903 may be truncated individually at different lengths.
- the encoder 803 provides the final overall bit-stream 907 to the transmitter 805 which transmits it over the network 806 to the receiving communication device 807.
- Figure 10 shows an encoder 1000.
- the encoder 1000 comprises a time/frequency transform block 1001, an optional psychoacoustic analysis block 1002, a spectrum segmentation block 1003, a scalable encoder 1004 and a bit-stream multiplexer 1005.
- the various components and blocks may be implemented by one or more circuits as described above.
- the transform block 1001 converts the audio input 1006 into its spectral representation, (e.g. using integer MDCT (Modified Discrete Cosine Transform)).
- the psychoacoustic analysis block 1002 may optionally perform a psychoacoustic analysis on the audio input 1006. For example, under low network bandwidth scenario, to achieve better perceptual quality, a general psychoacoustic model of the signal can be generated which can be used to guide the subsequent scalable encoding process so that the perceptually important information can be encoded with higher priority. However, under other scenarios, the psychoacoustic analysis can be omitted to reduce the encoder complexity.
- the spectrum of the transformed signal is divided into several segments by the spectrum segmentation block 1003 (e.g. into spectrum segments 902 as explained with reference to figure 9), indexed from the lowest frequency of the spectrum of the transformed signal to the highest frequency of the transformed spectrum.
- the segmentation boundaries could be fixed (e.g. the spectrum may be divided evenly) or adaptive to the psychoacoustic analysis results.
- the spectrum segmentation may be adaptive to the signal sampling rate via pre-defined lookup tables. Exemplary values for a typical 48 kHz audio sampling rate are listed in table 1 below.
- Table 1: An example of segmentation boundaries for 48 kHz audio
- the corresponding boundary information is for example provided to the truncator 804 as part of metadata 1007.
- the truncator 804 and the transmitter 805 are part of a server while the encoder 803 is separate from the server and the encoder 803 provides the metadata 1007 to the server.
- the transformed digital signal (i.e. the frequency coefficients) is grouped according to the segmentation of the spectrum.
- These groups of frequency coefficients are independently encoded from each other by the scalable encoder 1004 to generate segmental bit-streams (e.g. corresponding to the segmental bit-streams 903).
- the encoding process can for example be carried out like the bit-plane coding adopted in SLS, where the spectral coefficients are converted to binary symbols, the latter are sequentially scanned and encoded from the MSB to LSB, from low frequency to high frequency, as long as the coefficient to be encoded is significant for the present coding bit-plane.
- the encoding process can be guided by some external information, like the perceptually enhanced bit-plane coding where the spectral coefficients are firstly pre- processed based on the corresponding psychoacoustic information, and then the processed results are coded following SLS bit-plane coding technique.
- the bit-stream multiplexer 1005 packs the segmental bit-streams in the order of segmental index, and the resulting final overall scalable bit-stream 1008, associated with the possible metadata 1007 (like the segmental boundary information), is sent to the truncator 804 (e.g. to a server including truncator 804 and transmitter 805 and which has received a request for the audio signal from the receiving communication device 807).
- the information for one audio frame corresponds to m independent segmental bit-streams 903.
- side information 904 and perceptually important bits are located in the front of segmental bit-stream 903, and less important bits are located at the rear of segmental bit-stream 903.
- Figure 11 shows a truncator 1100.
- When the truncator 1100 (or e.g. a server housing the truncator 1100) receives a streaming requirement, according to the available network bandwidth, it can drop (omit) some segments from the scalable bit-stream 1101 received from the encoder 803 (along with metadata 1102) to realize frequency scalability, and truncate any segmental bit-stream to realize SNR scalability. The result is a truncated scalable bit-stream 1103. The truncation is illustrated in figure 12.
- Figure 12 illustrates a truncation process
- the scalable bit-stream 1201 includes m segmental bit-streams 1202.
- the truncator 1203 for example receives information about the bandwidth available in the communication network 806 for communication with the receiving communication device 807 and the metadata 1204 from the encoder 803 and individually truncates the segmental bit-streams 1202 or even omits segmental bit-streams 1202 such that the truncated bit-stream 1205 includes n truncated segmental bit-streams 1206.
- the (optional) metadata 1204 can for example include the segmentation boundary information if the spectral segmentation is adaptive.
- the bandwidth (i.e. the bit-rate, for example) of the whole input bit-stream 1101 is denoted B0; under a given available network bandwidth B, the truncator may decide that only a subset of the segmental bit-streams 1202 is kept, namely those corresponding to low frequency segments.
- the number n of the segmental bit-streams 1202 that are kept may for example be set in proportion to the ratio between B0 and B.
- the truncator 1100 drops the remaining m - n high frequency segments to realize frequency scalability.
- the remaining low frequency segments are further truncated, e.g. subject to the constraint that the length summation of truncated segmental bit-streams 1206 should approximately be equal to the available bit budget for each audio frame.
- the individual segmental truncation length can, for example, be simply decided based on the relative segmental bandwidth information, e.g. according to mathematical expressions in terms of the following quantities (one possible form of such expressions is sketched after this list):
- B denotes the average bit budget for each frame, which is proportional to the network bandwidth and frame length, and inversely proportional to the signal sampling frequency;
- ℓi is the truncated bit-stream length of the i-th truncated segmental bit-stream 1206 (which may be zero if the bit-stream is omitted);
- pi is the corresponding length of segmental side information and Wi is the corresponding segmental bandwidth, which may be fixed or may be obtained by the truncator from the metadata information 1204.
- the truncation information (i.e. the information on how the segmental bit-streams 1202 have been truncated, e.g. the length of the truncated segmental bit-streams 1206) is transmitted to the receiving communication device 807 as additional side information.
- the segmental side information 904 as illustrated in figure 9 is updated after the truncation.
- the side information may be directly represented by the truncated lengths, or by their corresponding bit-plane scanning numbers for more efficiency.
- the length of segmental side information pi is updated to include the truncation information before the truncation process is carried out.
- the truncator 804 provides the truncated scalable bit-stream 1103 to the transmitter 805 which transmits it to the receiving communication device 807 via the network 806.
- FIG. 13 shows a receiver 1300.
- the receiver 1300 includes a bit-stream demultiplexer 1301, a scalable decoder 1302 and a frequency to time transform block 1303 which may be implemented by one or more circuits as described above.
- the receiver 1300 receives the truncated scalable bit-stream 1304.
- the demultiplexer 1301 demultiplexes the truncated scalable bit-stream 1304 into the truncated segmental bit-streams 1206.
- the scalable decoder 1302 independently decodes the truncated segmental bit-streams 1206 to generate decoded spectral information (i.e. reconstructed frequency coefficients).
- the F/T transform block 1303 transforms the decoded spectral information back to the time domain to reconstruct the audio signal in the time domain and outputs the reconstructed audio signal 1305.
- the frequency-to-time domain transformation of the transform block 1303 is complementary to the time-to-frequency transformation performed by the transform block 1001 on the encoding side.
- Figure 14 shows a performance diagram 1400.
- Bit rate increases from left to right along a bit rate axis 1401 and quality of the reconstructed signals increases from bottom to top along a quality axis 1402.
- the streaming quality of a non-scalable coding scheme is shown as a staircase curve 1403 since the server can only store limited copies of non-scalable audio.
- a dash-dotted curve 1404 illustrates the performance of a traditional scalable coding scheme (such as SLS). This can only reach sub-optimal streaming quality due to the high initial side information (located at the beginning of a scalable bit-stream) and fixed scanning order.
- the performance of the flexible scalable scheme as described above is illustrated by a solid curve 1405. It can make up for the shortcomings of the traditional scalable coding scheme due to its two dimensional scalable capability.
- the flexible scalable scheme as described above has fine-grain scalability merit and might be more advantageous at the low bit-rate scenario.
- the flexible scalable scheme as described above can provide better streaming quality at low, or even at middle, bit-rate.
- the initial side information is divided into several parts and dispatched to different positions within a scalable bit-stream, so much less side information is left at the beginning of a scalable bit-stream.
- with the intelligent truncator, a more optimal coding priority order for a given bit-rate can be provided by taking advantage of the trade-off between SNR and frequency scalability. Both factors lead to streaming quality improvement at low to middle bit-rate.
- the flexible scalable scheme as described above can provide almost the same high quality as a traditional scalable coding scheme.
- Various embodiments as for example described above introduce a scalable audio coding scheme that can provide 2-D compression scalability: frequency scalability and SNR scalability. It allows the coding priority order in a bit-stream to be flexibly changed in the truncation process, characterized by (i) Spectral segmentation (ii) Intelligent truncation. The spectral segmentation introduces 2-D compression scalability, and the Intelligent truncation provides an R-D optimized output under a given network bandwidth.
- a scalable audio coding scheme for providing 2-D compression scalability comprising the steps of (i) transforming the input signal into frequency domain to generate the signal spectrum; (ii) dividing the signal spectrum into several segments; (iii) scalably encoding the spectral information of the signal of each segment to generate a plurality of segmental bit-streams; and (iv) truncating the segmental bit-streams to generate truncated bit-streams.
- the scheme may further comprise scalably decoding the truncated bit-streams to generate decoded spectral information and transforming the decoded spectrum into the time domain.
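The expressions themselves appear in the patent as formula images and are not reproduced in the text above. A plausible form consistent with the stated constraints (per-segment lengths proportional to the segmental bandwidths, total length within the per-frame bit budget) is, under these assumptions,

$$
n = \min\!\left(m,\ \left\lceil m \cdot \frac{B}{B_0} \right\rceil\right),
\qquad
\ell_i = p_i + \frac{W_i}{\sum_{j=1}^{n} W_j}\left(B - \sum_{j=1}^{n} p_j\right), \quad i = 1,\dots,n,
$$

with $\ell_i = 0$ for the omitted segments $i = n+1,\dots,m$. The exact expressions used in the patent may differ.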
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
A method for transmitting a digital signal is provided comprising generating a representation of the digital signal in the frequency domain, wherein the representation comprises, for each of a plurality of sub-spectrums of the spectrum of the digital signal, at least one frequency coefficient; generating, for each sub-spectrum, a bit-stream from the at least one frequency coefficient of the sub-spectrum; determining, for each bit-stream, a truncation point; truncating each bit-stream according to the truncation point determined for the bit-stream and transmitting data according to the truncated bit-streams.
Description
METHOD FOR TRANSMITTING A DIGITAL SIGNAL, METHOD FOR RECEIVING A DIGITAL SIGNAL, TRANSMISSION ARRANGEMENT AND
COMMUNICATION DEVICE Embodiments of the invention generally relate to methods for transmitting a digital signal, methods for receiving a digital signal, transmission arrangements and communication devices.
To efficiently store and transmit audio content (or generally media content, e.g. video), an audio signal (or generally a media signal) may be compressed to save memory space and transmission bandwidth. There are two types of compression formats: non-scalable compression and scalable compression. Non-scalable compression is a compression format that is dedicated to a specified transmission bit rate. It provides the most efficient compression in terms of rate-distortion performance. According to this compression, after the compression, the bit rate of, for example, each audio frame of an audio signal is fixed. This can be considered to be not network friendly since the network has to provide a fixed transmission bit rate. Further, to fulfil different requirements for different clients, the audio content has to be compressed into several non-scalable copies. This is illustrated in figure 1.
Figure 1 shows a communication system.
The communication system includes a non-scalable encoder 101 which provides a plurality of copies of the same audio signal (e.g. the same audio file) with different compression rates (and hence bit rates) to a server 103. The server 103 may choose, depending on the quality desired by a plurality of client devices 104, one of the copies 102 for each client 104 and transmit it to the client 104 via a communication network 105, e.g. the Internet.
This requires that the server 103 stores the various copies which may be seen to waste storage space of the server 103.
On the other hand, scalable compression can be seen as a network friendly coding format. It is illustrated in figure 2.
Figure 2 shows a communication system 200.
The communication system includes a scalable encoder 201 which provides an audio signal (e.g. an audio file) in a single compressed version 202 to a server 203. The server 203 can truncate the compressed version 202 of the audio signal according to the desired bit-rates of client devices 204 and send, to each client device 204, a truncated version 205 (i.e. a truncated bit-stream) of the audio file over a communication network 206, e.g. the Internet.
In the compressed version 202, the audio signal is in the form of a scalable audio bit-stream that can be truncated to adapt the bit rate of the audio signal, for example to the actual network condition. Accordingly, the server 203 can respond to the requirement from each end user and provide the end users with the most suitable bit-stream by truncating the original scalable audio. Although the scalable compression format has this benefit, it does require extra bits to enable the scalability. Due to the introduction of these extra bits, the compression efficiency is reduced.
The scalable compression format generally provides more flexibility in bit rate adaptation. It is typically more suitable to be applied in real time streaming applications when compared with the non-scalable compression formats. When the scalable format is used in streaming applications, the scaling dimension and the coding priority order can significantly influence the bit-stream quality. In the scalable format, the coding priority order in the scalable bit-stream is typically fixed after the encoding process and cannot be changed in the truncation process. This reduces the flexibility of the scalable format to provide high quality output. An example is the scalable audio coding scheme MPEG-4 Scalable to Lossless Coding (SLS). In SLS, a core MPEG-4 AAC layer generates an AAC compliant bit-stream and a lossless enhanced layer generates a scalable bit-stream which scans and codes the residual bit-plane symbols from the Most Significant Bit (MSB) to the Least Significant Bit (LSB). Both bit-streams are embedded in the final SLS bit-stream. The resulting format is illustrated in figure 3.
Figure 3 shows an audio signal in encoded format 301 and truncated format 302.
For each frame 303, the encoded audio signal 301 includes core layer information 304 and enhancement layer information 305 wherein the enhancement layer information is in the form of a bit stream starting from the most significant bits 306 of the residual bit-plane symbols and ending with the least significant bits 307 which are further followed by low energy scale factor band information 308. When the audio signal is to be delivered to an end user (e.g. a client device 104), the server 103 truncates bits from the enhancement layer bit-stream 305 from its end for each frame, which includes the least significant information part, as defined by the encoder. The more significant information part, which is located, relatively, in the front of the bit-stream 305, is kept (as illustrated in the truncated format 302).
In this case, the coding priority order of the encoded information in the enhancement layer bit-stream 305 is defined by the bit plane scanning order. Although the perceptual priority is considered and reflected in the bit-plane scanning order, it is pre-defined in the encoding process and cannot be changed after the scalable bit-stream 305 is generated, which leads to a fixed Rate-Distortion (R-D) relationship, typically optimized for the lossless case. This fixed R-D relationship means that the output quality of the SLS bit-stream cannot be optimized for every specified bit rate within a wide range.
Another scalable audio coding scheme is Bit-Sliced Arithmetic Coding (BSAC), which is used in MPEG-4 AAC. Its scalable bit-stream has a layered structure including one base layer and multiple enhancement layers with a 1 kbps enhancement step size. Like SLS, the coding priority order is fixed during its encoding process and cannot be changed during the truncation. Thus, only SNR (signal-to-noise ratio) scalability can be provided by truncating enhancement layers when the bit-rate is insufficient.
Another scalable audio coding scheme is Scalable Sampling Rate coding (SSR), which was first published with AAC in 1997. It is a hierarchical sampling-rate-scalable coding scheme and provides sampling rate scalability with a very simple bit-stream splitter to remove upper frequency bands. However, it does not provide SNR scalability.
A higher flexibility of scalability is desirable.
According to one embodiment, a method for transmitting a digital signal is provided comprising generating a representation of the digital signal in the frequency domain, wherein the representation comprises, for each of a plurality of sub-spectrums of the spectrum of the digital signal, at least one frequency coefficient; generating, for each sub-spectrum, a bit-stream from the at least one frequency coefficient of the sub-spectrum; determining, for each bit-stream, a truncation point; truncating each bit-stream according to the truncation point determined for the bit-stream and transmitting data according to the truncated bit-streams.

According to a further embodiment, a method for receiving a digital signal is provided comprising receiving data including a plurality of bit-streams; generating, from each bit-stream, at least one frequency coefficient; aggregating the frequency coefficients to a frequency domain representation of a digital signal; and transforming the frequency domain representation to the time domain to reconstruct the digital signal.
Further, a transmission arrangement and a communication device according to the method for transmitting a digital signal and a method for receiving a digital signal are provided.
In the drawings, like reference characters generally refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention. In the following description, various embodiments of the invention are described with reference to the drawings, in which:
Figure 1 shows a communication system.
Figure 2 shows a communication system.
Figure 3 shows an audio signal in encoded format and truncated format. Figure 4 shows a flow diagram. Figure 5 shows a transmission arrangement. Figure 6 shows a flow diagram. Figure 7 shows a communication device.
Figure 8 shows a communication arrangement.
Figure 9 shows a bit plane representation for an audio signal and a bit-stream. Figure 10 shows an encoder. Figure 11 shows a truncator. Figure 12 illustrates a truncation process.
Figure 13 shows a receiver.
Figure 14 shows a performance diagram. According to one embodiment, a scalable coding scheme is provided that can provide two dimensional compression scalability: frequency scalability and SNR scalability. A method for transmitting a digital signal according to one embodiment is illustrated in figure 4.
Figure 4 shows a flow diagram 400.
The flow diagram 400 illustrates a method for transmitting a digital signal.
In 401, a representation of the digital signal in the frequency domain is generated, wherein the representation comprises, for each of a plurality of sub-spectrums of the spectrum of the digital signal, at least one frequency coefficient.
In 402, for each sub-spectrum, a bit-stream is generated from the at least one frequency coefficient of the sub-spectrum.
In 403, for each bit-stream, a truncation point is determined.
In 404, each bit-stream is truncated according to the truncation point determined for the bit-stream.
In 405, data according to the truncated bit-streams (e.g. data including the information from the truncated bit-streams) are transmitted.
In other words, the frequency spectrum is segmented and for each spectrum segment (or sub-spectrum) an individual bit-stream (e.g. enhancement layer bit-stream) is generated and individually truncated (which may include cutting the bit-stream completely, i.e.
omitting the bit-stream). Thus, in addition to the scalability in terms of bit rate, a scalability in terms of frequency is achieved since, for example, bit-streams for certain sub-spectrums may be truncated more strongly than bit-streams for other sub-spectrums.
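As a rough illustration of this flow, the following Python sketch walks through the steps of figure 4 for a single frame. All names are illustrative and not taken from the patent; the transform, the per-segment scalable encoder and the truncation policy are passed in as placeholder callables.

```python
from typing import Callable, List, Sequence, Tuple

def transmit_frame(
    frame: Sequence[float],
    boundaries: List[Tuple[int, int]],                     # sub-spectrum coefficient index ranges
    transform: Callable[[Sequence[float]], List[float]],   # time -> frequency (e.g. an MDCT)
    encode_segment: Callable[[List[float]], bytes],        # per-segment scalable encoder
    truncation_point: Callable[[int, bytes], int],         # kept length per stream (0 = omit)
    send: Callable[[List[bytes]], None],
) -> None:
    """Sketch of the method of figure 4; all callables are hypothetical placeholders."""
    coeffs = transform(frame)                              # 401: frequency-domain representation
    segments = [coeffs[lo:hi] for lo, hi in boundaries]    # sub-spectrums of the spectrum
    streams = [encode_segment(seg) for seg in segments]    # 402: one bit-stream per sub-spectrum
    points = [truncation_point(i, s) for i, s in enumerate(streams)]  # 403: truncation points
    truncated = [s[:p] for s, p in zip(streams, points)]   # 404: truncate (length 0 omits a stream)
    send(truncated)                                        # 405: transmit the truncated bit-streams
```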
According to one embodiment, the coding priority order in a bit-stream can be seen to be flexibly changed in the truncation process. For example, a scalable encoder generates a plurality of bit-streams, with each bit-stream being associated with a subset of spectral information of the signal and the bit-streams may be ("intelligently") truncated, e.g. by a server independently, e.g. by completely dropping out some bit-streams (to provide frequency scalability) or truncating those bit-streams (to provide SNR scalability).
Compared to SLS and BSAC, the coding scheme described above provides additional frequency scalability. In terms of scalability capability, the coding scheme described above provides both frequency and SNR scalability simultaneously.
By introducing the two-dimensional scalability, the trade-off between the two scalability dimensions can be adjusted at the truncator, which allows generating an R-D optimized output under a given network bandwidth. The method may further comprise dividing the spectrum into the plurality of sub-spectrums.
The digital signal is for example a media signal. For example, the digital signal is an audio signal or a video signal.
According to one embodiment, generating a bit-stream from the at least one frequency coefficient of the sub-spectrum includes bit-plane scanning the at least one frequency coefficient of the sub-spectrum.
The method may further comprise multiplexing the truncated bit-streams to an overall bit-stream wherein transmitting data according to the truncated bit-streams includes transmitting the overall bit-stream. The method may further comprise transmitting, for each truncated bit-stream, information about the truncation point. The truncation point specifies at which point a bit-stream is truncated and may for example be expressed by a bit position (e.g. the last bit that is not truncated) or a length of the truncated bit-stream. According to one embodiment, determining the truncation point includes determining a truncation length and truncating the bit-stream according to the truncation point includes truncating the bit-stream according to the truncation length.
The data are for example transmitted to a communication device via a communication channel and the truncation point is for example determined based on the capacity of the communication channel.
According to one embodiment, determining a truncation point for a bit-stream includes deciding whether the bit-stream should be omitted from the data and, if it is decided that the bit-stream should be omitted, setting the truncation point such that truncating the bit-stream according to the truncation point leads to the bit-stream being omitted from the data. For example, omitting a bit-stream may be expressed by a truncation length of zero.
The method illustrated in figure 4 is for example carried out by a transmission arrangement as illustrated in figure 5. Figure 5 shows a transmission arrangement 500.
The transmission arrangement 500 comprises an encoder 501 configured to generate a representation of the digital signal in the frequency domain, wherein the representation comprises, for each of a plurality of sub-spectrums of the spectrum of the digital signal, at least one frequency coefficient and to generate, for each sub-spectrum, a bit-stream from the at least one frequency coefficient of the sub-spectrum.
Further, the transmission arrangement 500 comprises a truncator 502 configured to determine, for each bit-stream, a truncation point and to truncate each bit-stream according to the truncation point determined for the bit-stream.
The transmission arrangement 500 further comprises a transmitter 503 configured to transmit data according to the truncated bit-streams. The transmitted truncated bit streams are for example processed according to a method as illustrated in figure 6.
Figure 6 shows a flow diagram 600.
The flow diagram 600 illustrates a method for receiving a digital signal.
In 601, data including a plurality of bit-streams is received.
In 602, at least one frequency coefficient is determined from each bit-stream.
In 603, the frequency coefficients are aggregated to a frequency domain representation of a digital signal.
In 604, the frequency domain representation is transformed to the time domain to reconstruct the digital signal.
For example, the data includes an overall bit-stream and the method comprises
demultiplexing the overall bit-stream to generate the plurality of bit-streams.
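A corresponding receive-side sketch, with equally illustrative names, assumes the overall bit-stream has already been demultiplexed into its segmental bit-streams (601):

```python
from typing import Callable, List

def receive_frame(
    streams: List[bytes],                                     # the plurality of (truncated) bit-streams
    decode_segment: Callable[[bytes], List[float]],           # per-segment scalable decoder
    inverse_transform: Callable[[List[float]], List[float]],  # frequency -> time (e.g. inverse MDCT)
) -> List[float]:
    """Sketch of the receive method of figure 6; the callables are hypothetical placeholders."""
    coeffs: List[float] = []
    for stream in streams:
        coeffs.extend(decode_segment(stream))   # 602: frequency coefficients from each bit-stream
    # 603: aggregation - here a simple concatenation in segment (low to high frequency) order
    return inverse_transform(coeffs)            # 604: transform back to the time domain
```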
The method illustrated in figure 6 is for example carried out by a communication device as illustrated in figure 7. Figure 7 shows a communication device 700.
The communication device 700 comprises a receiver 701 configured to receive data including a plurality of bit-streams. The communication device 700 further comprises a decoder 702 configured to generate, from each bit-stream, at least one frequency coefficient, aggregate the frequency coefficients to a frequency domain representation of a digital signal and transform the frequency domain representation to the time domain to reconstruct the digital signal. It should be noted that embodiments described in context of the method illustrated in figure 4 are analogously valid for the transmission arrangement 500, the method illustrated in figure 6 and the communication device 700 and vice versa.
The components of the transmission arrangement 500 and the communication device (e.g. the encoder, the truncator, the decoder, the transmitter and the receiver) may for example be implemented by one or more circuits. A "circuit" may be understood as any kind of a logic implementing entity, which may be special purpose circuitry or a processor executing software stored in a memory, firmware, or any combination thereof. Thus a "circuit" may
be a hard-wired logic circuit or a programmable logic circuit such as a programmable processor, e.g. a microprocessor (e.g. a Complex Instruction Set Computer (CISC) processor or a Reduced Instruction Set Computer (RISC) processor). A "circuit" may also be a processor executing software, e.g. any kind of computer program, e.g. a computer program using a virtual machine code such as e.g. Java. Any other kind of implementation of the respective functions which will be described in more detail below may also be understood as a "circuit".
In the following, examples are described in greater detail.
Figure 8 shows a communication arrangement 800.
The communication arrangement 800 includes a signal source 801, a transmission arrangement 802 which includes an encoder 803, a truncator 804 and a transmitter 805, a communication network 806 and a receiving communication device 807.
The signal source provides an audio signal 808 (or more generally a digital signal representing media content, e.g. an audio signal or a video signal) to the transmission arrangement 802. The transmission arrangement 802 may be implemented using a single device, e.g. a server computer, or a plurality of components.
The transmission arrangement 802 generates an encoded and truncated audio signal 809 from the audio signal 808 and transmits it via the communication network 806 to the receiving communication device 807.
In this example, before the actual encoding process, the encoder 803 divides each audio spectral frame into m segments with each one covering a specified frequency range. Then the encoder 803 carries out a scalable encoding process for each segment. For each segment, the encoder 803 packs the produced coding information into a segmental bit-stream in the order of segmental side information followed by segmental data information (from the most perceptually important information to the least important information). This is illustrated in figure 9.
Figure 9 shows a bit plane representation 901 for an audio signal and a bit-stream 907.
The bit plane representation includes a frequency coefficient for each of a plurality of frequencies (e.g. for each of a plurality of scale factor bands) for one audio frame. The frequency coefficients are grouped according to a segmentation of the whole frequency range into sub-spectrums 902. The frequency coefficients for example represent a residual audio signal (e.g. after core layer encoding).
For each sub-spectrum, a segmental bit-stream 903 is generated by bit-plane scanning the frequency coefficients. Each segmental bit-stream 903 includes side information 904, followed by the bits of the frequency coefficients in the order from most significant bits 905 to least significant bits 906.
The bits of all segmental bit-streams 903, in this example concatenated in the order of the frequency segments, form the overall (encoded) bit-stream 907.
The final overall bit-stream 907 in this example contains the segmental bit-streams 903 in an order of low frequency segment to high frequency segment. This bit-stream structure allows frequency scalability and SNR scalability since segmental bit-streams 903 (e.g. high frequency segmental bit-streams) may be omitted (i.e. cut off from the bit stream 907) and segmental bit-streams 903 may be truncated individually at different lengths. The encoder 803 provides the final overall bit-stream 907 to the transmitter 805 which transmits it over the network 806 to the receiving communication device 807.
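The following sketch illustrates this packing of one frame. Each segmental bit-stream is represented as a pair of side information and data bits ordered from the most to the least significant bit-planes; the one-byte length prefix is a hypothetical framing added here for readability and is not the syntax defined by the patent.

```python
from typing import List, Tuple

def pack_frame(segmental_streams: List[Tuple[bytes, bytes]]) -> bytes:
    """Concatenate segmental bit-streams from low to high frequency, as in figure 9."""
    out = bytearray()
    for side_info, data_bits in segmental_streams:
        out.append(len(side_info))   # hypothetical framing so a parser can locate each segment
        out += side_info             # segmental side information 904
        out += data_bits             # bits from MSB 905 down to LSB 906
    return bytes(out)
```

Because each segment is self-contained, a later stage can shorten or drop individual segments without re-encoding, which is what enables the combined frequency and SNR scalability.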
In the following, examples for the encoder 803, the truncator 804 and a decoder included in the receiving communication device 807 are described with reference to figures 10, 11, 12 and 13.
Figure 10 shows an encoder 1000.
The encoder 1000 comprises a time/frequency transform block 1001, an optional psychoacoustic analysis block 1002, a spectrum segmentation block 1003, a scalable encoder 1004 and a bit-stream multiplexer 1005. The various components and blocks may be implemented by one or more circuits as described above.
Given an audio input 1006 (i.e. a digital audio signal), the transform block 1001 converts the audio input 1006 into its spectral representation (e.g. using integer MDCT (Modified Discrete Cosine Transform)). The psychoacoustic analysis block 1002 may optionally perform a psychoacoustic analysis on the audio input 1006. For example, under a low network bandwidth scenario, to achieve better perceptual quality, a general psychoacoustic model of the signal can be generated which can be used to guide the subsequent scalable encoding process so that the perceptually important information can be encoded with higher priority. However, under other scenarios, the psychoacoustic analysis can be omitted to reduce the encoder complexity.
After the transformation by the transform block 1001, the spectrum of the transformed signal is divided into several segments by the spectrum segmentation block 1003 (e.g. into spectrum segments 902 as explained with reference to figure 9), indexed from the lowest frequency of the spectrum to the highest frequency. The segmentation boundaries can be fixed (e.g. the spectrum may be divided evenly) or adaptive to the psychoacoustic analysis results. For example, the spectrum segmentation may be adapted to the signal sampling rate via pre-defined lookup tables. Exemplary values for a typical 48 kHz audio sampling rate are listed in table 1 below.
Table 1: An example of segmentation boundaries for 48 kHz audio
For the case of adaptive segmentation of the spectrum, the corresponding boundary information is for example provided to the truncator 804 as part of metadata 1007. For example, the truncator 804 and the transmitter 805 are part of a server while the encoder 803 is separate from the server and provides the metadata 1007 to the server. According to the segmentation of the spectrum, the frequency coefficients of the transformed digital signal are grouped.
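As an illustration of this grouping step, the following Python sketch splits one frame of frequency coefficients into contiguous sub-spectrums using a fixed, even segmentation. The boundary values of Table 1 are not reproduced here; the even split is only one of the fixed options the text mentions, and all names are illustrative.

```python
# Sketch: grouping frequency coefficients of one frame by a fixed, even
# segmentation of the spectrum. Adaptive boundaries (e.g. from a lookup
# table indexed by the sampling rate) would replace np.linspace below.
import numpy as np

def segment_spectrum(coeffs, num_segments):
    """Split one frame of frequency coefficients into contiguous sub-spectrums."""
    boundaries = np.linspace(0, len(coeffs), num_segments + 1, dtype=int)
    return [coeffs[boundaries[i]:boundaries[i + 1]] for i in range(num_segments)]

if __name__ == "__main__":
    frame = np.arange(1024)             # stand-in for one transformed frame
    groups = segment_spectrum(frame, num_segments=8)
    print([len(g) for g in groups])     # eight groups of 128 coefficients each
```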
These groups of frequency coefficients are encoded independently of each other by the scalable encoder 1004 to generate segmental bit-streams (e.g. corresponding to the segmental bit-streams 903). The encoding process can for example be carried out like the bit-plane coding adopted in SLS, where the spectral coefficients are converted to binary symbols which are sequentially scanned and encoded from MSB to LSB and from low frequency to high frequency, as long as the coefficient to be encoded is significant for the present coding bit-plane.
Alternatively, the encoding process can be guided by external information, as in perceptually enhanced bit-plane coding, where the spectral coefficients are first pre-processed based on the corresponding psychoacoustic information and the processed results are then coded following the SLS bit-plane coding technique.
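The following Python sketch illustrates the basic idea of such a bit-plane scan for one segment in a strongly simplified form: it scans every coefficient on every plane from MSB to LSB and appends sign bits at the end, whereas SLS additionally applies a per-plane significance test and entropy-codes the scanned symbols; those steps are omitted here and all names are illustrative.

```python
# Sketch: simplified bit-plane scan of one sub-spectrum, highest plane (MSB)
# first, low frequency to high frequency within each plane. The significance
# test and entropy coding used by SLS are intentionally left out.

def bitplane_scan(coeffs):
    """Return (max_plane, bits) for the integer coefficients of one segment."""
    mags = [abs(c) for c in coeffs]
    max_plane = max(mags).bit_length() - 1 if any(mags) else 0
    bits = []
    for plane in range(max_plane, -1, -1):   # MSB plane first
        for m in mags:                       # low to high frequency
            bits.append((m >> plane) & 1)
    signs = [0 if c >= 0 else 1 for c in coeffs]
    return max_plane, bits + signs

if __name__ == "__main__":
    plane, stream = bitplane_scan([5, -3, 0, 7])
    print(plane, stream)    # planes 2..0 of the magnitudes, then the signs
```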
Lastly, the bit-stream multiplexer 1005 packs the segmental bit-streams in the order of the segmental index, and the resulting final overall scalable bit-stream 1008, together with possible metadata 1007 (such as the segmental boundary information), is sent to the truncator 804 (e.g. to a server which includes the truncator 804 and the transmitter 805 and which has received a request for the audio signal from the receiving communication device 807).
As illustrated in figure 9, the information for one audio frame corresponds to m independent segmental bit-streams 903. For each segmental bit-stream 903, the side information 904 and the perceptually important bits are located at the front of the segmental bit-stream 903, and the less important bits are located at the rear of the segmental bit-stream 903.
Figure 11 shows a truncator 1100.
When the truncator 1100 (or e.g. a server housing the truncator 1100) receives a streaming request, it can, according to the available network bandwidth, drop (omit) some segments from the scalable bit-stream 1101 received from the encoder 803 (along with metadata 1102) to realize frequency scalability, and truncate any segmental bit-stream to realize SNR scalability. The result is a truncated scalable bit-stream 1103. The truncation is illustrated in figure 12.
Figure 12 illustrates a truncation process.
As explained above, for each audio frame, the scalable bit-stream 1201 includes m segmental bit-streams 1202. The truncator 1203 for example receives the metadata 1204 from the encoder 803 as well as information about the bandwidth available in the communication network 806 for communication with the receiving communication device 807, and it individually truncates the segmental bit-streams 1202 or even omits segmental bit-streams 1202 such that the truncated bit-stream 1205 includes n truncated segmental bit-streams 1206.
The (optional) metadata 1204 can for example include the segmentation boundary information if the spectral segmentation is adaptive.
As an example, it is assumed that the bandwidth (i.e. the bit-rate, for example) of the whole input bit-stream 1101 is B0. Under a given available network bandwidth B, the truncator decides that only a subset of the segmental bit-streams 1202 is kept, namely those corresponding to the low frequency segments. The number n of segmental bit-streams 1202 that are kept may for example be set to be proportional to the ratio between B and B0.
Accordingly, the truncator 1100 drops the remaining m - n high frequency segments to realize frequency scalability. To realize SNR scalability, the remaining low frequency segments are further truncated, e.g. subject to the constraint that the sum of the lengths of the truncated segmental bit-streams 1206 should approximately equal the available bit budget for each audio frame. The individual segmental truncation lengths can, for example, simply be decided based on the relative segmental bandwidth information, e.g. according to the following mathematical expression:
Σ_{i=1}^{n} (τ_i + p_i) = B
where B denotes the average bit budget for each frame, which is proportional to the network bandwidth and frame length, and inversely proportional to the signal sampling frequency; τ_i is the truncated bit-stream length of the ith truncated segmental bit-stream 1206 (which may be zero if the bit-stream is omitted); p_i is the corresponding length of segmental side information and w_i is the corresponding segmental bandwidth, which may be fixed or may be obtained by the truncator from the metadata 1204.
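The following Python sketch is one possible reading of this truncation step (an interpretation, not the patent's exact expressions): it keeps a number of low-frequency segments roughly proportional to B/B0 and then distributes the remaining data-bit budget over the kept segments in proportion to their bandwidths w_i, so that the truncated lengths together with the side-information lengths approximately meet the per-frame budget B. All names are illustrative.

```python
# Sketch of the truncation step, reconstructed from the description above.
# segment_streams hold only the data bits; side information is accounted for
# separately via side_lengths (p_i). This is an interpretation, not the exact
# rule of the embodiment.
import math

def truncate_segments(segment_streams, side_lengths, bandwidths, budget_b, total_rate_b0):
    """Truncate the segmental data bits of one frame to fit the budget B."""
    m = len(segment_streams)
    # frequency scalability: keep a number of segments proportional to B / B0
    n = max(1, min(m, math.ceil(m * budget_b / total_rate_b0)))
    kept = segment_streams[:n]
    w_sum = sum(bandwidths[:n])
    data_budget = max(0, budget_b - sum(side_lengths[:n]))
    truncated = []
    for stream, w in zip(kept, bandwidths[:n]):
        tau = int(data_budget * w / w_sum)   # SNR scalability per segment
        truncated.append(stream[:tau])
    return truncated

if __name__ == "__main__":
    streams = [[1] * 400, [1] * 300, [1] * 300]   # dummy data bits per segment
    out = truncate_segments(streams, side_lengths=[16, 16, 16],
                            bandwidths=[4000, 8000, 12000],
                            budget_b=512, total_rate_b0=1048)
    print([len(s) for s in out])   # two kept segments with truncated lengths
```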
The truncation information (i.e. the information on how the segmental bit-streams 1202 have been truncated, e.g. the lengths of the truncated segmental bit-streams 1206) is transmitted to the receiving communication device 807 as additional side information. For example, the segmental side information 904 as illustrated in figure 9 is updated after the truncation. The side information may be directly represented by the truncated lengths τ_i or, for more efficiency, by the corresponding bit-plane scanning numbers. In one embodiment, no matter which representation method is adopted, the length of segmental side information p_i is updated to include the truncation information before the truncation process is carried out. The truncator 804 provides the truncated scalable bit-stream 1103 to the transmitter 805 which transmits it to the receiving communication device 807 via the network 806.
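As a purely illustrative example of how the truncation information could be carried in the updated side information, the following sketch stores a truncated length as a fixed-width bit field; the actual side-information syntax (length field versus bit-plane scanning numbers) is not specified here, and the field width is an assumption.

```python
# Sketch: writing a truncated segmental length into a fixed-width bit field,
# as one hypothetical way of updating the segmental side information.

def encode_truncation_info(truncated_length, field_width=16):
    """Represent a truncated segmental length as field_width bits, MSB first."""
    if truncated_length >= (1 << field_width):
        raise ValueError("length does not fit into the side-information field")
    return [(truncated_length >> b) & 1 for b in range(field_width - 1, -1, -1)]

if __name__ == "__main__":
    print(encode_truncation_info(160))   # 16-bit field holding the value 160
```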
Figure 13 shows a receiver 1300. The receiver 1300 includes a bit-stream demultiplexer 1301, a scalable decoder 1302 and a frequency-to-time transform block 1303, which may be implemented by one or more circuits as described above.
The receiver 1300 receives the truncated scalable bit-stream 1304.
The demultiplexer 1301 demultiplexes the truncated scalable bit-stream 1304 into the truncated segmental bit-streams 1206.
The scalable decoder 1302 independently decodes the truncated segmental bit-streams 1206 to generate decoded spectral information (i.e. reconstructed frequency coefficients). The F/T transform block 1303 transforms the decoded spectral information back to the time domain to reconstruct the audio signal and outputs the reconstructed audio signal 1305. The frequency-to-time domain transformation of the transform block 1303 is complementary to the time-to-frequency transformation performed by the transform block 1001 on the encoding side.
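The following Python sketch shows the decoder-side counterpart of the simplified bit-plane scan given earlier: bits lost to truncation are treated as zero, so the coefficient magnitudes are reconstructed at reduced precision. Sign handling and the inverse transform back to the time domain are omitted, and all names are illustrative.

```python
# Sketch: rebuilding coefficient magnitudes of one segment from an MSB-first
# bit-plane stream. Truncated (missing) bits simply read as zero, which is
# what lets the reconstruction degrade gracefully with the truncation depth.

def bitplane_reconstruct(bits, num_coeffs, max_plane):
    """Reconstruct magnitudes from a possibly truncated bit-plane stream."""
    mags = [0] * num_coeffs
    idx = 0
    for plane in range(max_plane, -1, -1):
        for k in range(num_coeffs):
            if idx < len(bits):              # bits beyond the truncation point -> 0
                mags[k] |= bits[idx] << plane
            idx += 1
    return mags

if __name__ == "__main__":
    # 4 coefficients, planes 2..0, stream truncated after the first two planes
    truncated_bits = [1, 0, 0, 1, 0, 1, 0, 1]
    print(bitplane_reconstruct(truncated_bits, num_coeffs=4, max_plane=2))   # [4, 2, 0, 6]
```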
The overall rate-distortion performances of various coding schemes are illustrated in figure 14.
Figure 14 shows a performance diagram 1400.
Bit rate increases from left to right along a bit rate axis 1401 and quality of the reconstructed signals increases from bottom to top along a quality axis 1402.
The streaming quality of a non-scalable coding scheme is shown as a staircase curve 1403 since the server can only store a limited number of copies of non-scalable audio. A dash-dotted curve 1404 illustrates the performance of a traditional scalable coding scheme (such as SLS), which can only reach sub-optimal streaming quality due to the large initial side information (located at the beginning of a scalable bit-stream) and the fixed scanning order. The performance of the flexible scalable scheme as described above is illustrated by a solid curve 1405. It can make up for the shortcomings of the traditional scalable coding scheme owing to its two-dimensional scalability.
Compared to the non-scalable scheme, the flexible scalable scheme as described above offers fine-grain scalability and may be more advantageous in low bit-rate scenarios.
Compared to the traditional scalable coding scheme, the flexible scalable scheme as described above can provide better streaming quality at low, or even at middle, bit-rates. By introducing frequency scalability, the initial side information is divided into several parts and dispatched to different positions within a scalable bit-stream, so much less side information is left at the beginning of the bit-stream. In addition, by introducing the intelligent truncator, a more optimal coding priority order for a given bit-rate can be provided by exploiting the trade-off between SNR and frequency scalability. Both factors lead to a streaming quality improvement at low to middle bit-rates. At high bit-rates, due to the insignificant difference in the amount of side information, the flexible scalable scheme as described above can provide almost the same high quality as the traditional scalable coding scheme.
Various embodiments as for example described above introduce a scalable audio coding scheme that can provide 2-D compression scalability: frequency scalability and SNR scalability. It allows the coding priority order in a bit-stream to be flexibly changed in the truncation process, characterized by (i) spectral segmentation and (ii) intelligent truncation. The spectral segmentation introduces the 2-D compression scalability, and the intelligent truncation provides an R-D optimized output under a given network bandwidth.
In summary, a scalable audio coding scheme for providing 2-D compression scalability is provided according to an embodiment comprising the steps of (i) transforming the input signal into frequency domain to generate the signal spectrum; (ii) dividing the signal spectrum into several segments; (iii) scalably encoding the spectral information of the signal of each segment to generate a plurality of segmental bit-streams; and (iv) truncating the segmental bit-streams to generate truncated bit-streams.
On the decoding side, the scheme may further comprise scalably decoding the truncated bit-streams to generate decoded spectral information and transforming the decoded spectrum into the time domain. While specific aspects have been described, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the aspects of this disclosure as defined by the appended claims. The scope is thus indicated by the appended claims and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced.
Claims
1. A method for transmitting a digital signal comprising:
generating a representation of the digital signal in the frequency domain, wherein the representation comprises, for each of a plurality of sub-spectrums of the spectrum of the digital signal, at least one frequency coefficient;
generating, for each sub-spectrum, a bit-stream from the at least one frequency coefficient of the sub-spectrum;
determining, for each bit-stream, a truncation point;
truncating each bit-stream according to the truncation point determined for the bit-stream; and
transmitting data according to the truncated bit-streams.
2. The method according to claim 1 , further comprising dividing the spectrum into the plurality of sub-spectrums.
3. The method according to claim 1 or 2, wherein the digital signal is a media signal.
4. The method according to any one of claims 1 to 3, wherein the digital signal is an audio signal or a video signal.
5. The method according to any one of claims 1 to 4, wherein generating a bit-stream from the at least one frequency coefficient of the sub-spectrum includes bit-plane scanning the at least one frequency coefficient of the sub-spectrum.
6. The method according to any one of claims 1 to 5, further comprising multiplexing the truncated bit-streams to an overall bit-stream wherein transmitting data according to the truncated bit-streams includes transmitting the overall bit-stream.
7. The method according to any one of claims 1 to 6, further comprising transmitting, for each truncated bit-stream, information about the truncation point.
8. The method according to any one of claims 1 to 7, wherein determining the truncation point includes determining a truncation length and truncating the bit-stream according to the truncation point includes truncating the bit-stream according to the truncation length.
9. The method according to any one of claims 1 to 8, wherein the data are transmitted to a communication device via a communication channel and the truncation point is determined based on the capacity of the communication channel.
10. The method according to any one of claims 1 to 9, wherein determining a
truncation point for a bit-stream includes deciding whether the bit-stream should be omitted from the data and, if it is decided that the bit-stream should be omitted, setting the truncation point such that truncating the bit-stream according to the truncation point leads to the bit-stream being omitted from the data.
11. A transmission arrangement comprising
an encoder configured to generate a representation of a digital signal in the frequency domain, wherein the representation comprises, for each of a plurality of sub-spectrums of the spectrum of the digital signal, at least one frequency coefficient, and to generate, for each sub-spectrum, a bit-stream from the at least one frequency coefficient of the sub-spectrum;
a truncator configured to determine, for each bit-stream, a truncation point and to truncate each bit-stream according to the truncation point determined for the bit-stream; and
a transmitter configured to transmit data according to the truncated bit-streams.
12. A method for receiving a digital signal comprising:
receiving data including a plurality of bit-streams;
generating, from each bit-stream, at least one frequency coefficient;
aggregating the frequency coefficients to a frequency domain representation of a digital signal; and
transforming the frequency domain representation to the time domain to reconstruct the digital signal.
13. The method according to claim 12, wherein the data includes an overall bit-stream and the method comprises demultiplexing the overall bit-stream to generate the plurality of bit-streams.
14. The method according to claim 13, wherein the data includes, for each bit-stream, information about a truncation point of the bit-stream and demultiplexing the overall bit-stream is based on the information about the truncation points.
15. A communication device comprising:
a receiver configured to receive data including a plurality of bit-streams; and a decoder configured to generate, from each bit-stream, at least one frequency coefficient, aggregate the frequency coefficients to a frequency domain
representation of a digital signal and transform the frequency domain representation to the time domain to reconstruct the digital signal.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG201202261 | 2012-03-28 | ||
SG201202261-2 | 2012-03-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013147709A1 (en) | 2013-10-03 |
Family
ID=49260798
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/SG2013/000124 WO2013147709A1 (en) | 2012-03-28 | 2013-03-28 | Method for transmitting a digital signal, method for receiving a digital signal, transmission arrangement and communication device |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2013147709A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000049570A1 (en) * | 1999-02-19 | 2000-08-24 | Unisearch Limited | Method for visual optimisation of embedded block codes to exploit visual masking phenomena |
EP1730725B1 (en) * | 2004-01-23 | 2009-12-09 | Microsoft Corporation | Efficient coding of digital audio spectral data using spectral similarity |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13769937; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 13769937; Country of ref document: EP; Kind code of ref document: A1