EP2926556A1 - Compressed data stream transmission using rate control - Google Patents

Compressed data stream transmission using rate control

Info

Publication number
EP2926556A1
Authority
EP
European Patent Office
Prior art keywords
rate control
fill level
rate
control parameter
data stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP12791806.8A
Other languages
German (de)
French (fr)
Inventor
Joachim Keinert
Michael Schöberl
Marcus Wetzel
Siegfried FÖSSEL
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Original Assignee
Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV filed Critical Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Publication of EP2926556A1 publication Critical patent/EP2926556A1/en
Withdrawn legal-status Critical Current

Links

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/146: Data rate or code amount at the encoder output
    • H04N 19/152: Data rate or code amount at the encoder output by measuring the fullness of the transmission buffer
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/115: Selection of the code volume for a coding unit prior to coding
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/124: Quantisation

Definitions

  • the present application is concerned with the transmission of a compressed data stream using rate control, applicable, for example, in low delay applications and with respect to any type of data such as any media data, such as video or audio data, measurement signals, etc.
  • Compression is a well-known technique in order to reduce the amount of data necessary to represent a picture or video scene.
  • Various algorithms exist, such as JPEG, JPEG 2000, WebP, H.264, MPEG-2, VC-2, or VC-5. While their application is straightforward when simply desiring to reduce the size of a file, real-time streaming introduces several additional constraints:
  • the requested latency might be very small.
  • the image compression needs to start operating on a part of the image data. There is typically not enough time to store the full image. This would introduce a delay. Also, the compressed data needs to be sent directly after compression and there is no time to wait for the compression to be finished.
  • Fig. 1 shows an encoder 10 and a decoder 20 with the encoder 10 being configured to generate, by encoding/compression, a compressed data stream 30 from an information signal entering at a data input 12 and send same via a transmission channel 40 to the decoder 20, which in turn is for decoding/decompressing the compressed data stream 30.
  • the transmission channel 40 is subject to transmission loss 42 and has a maximum data rate r_c,max(t).
  • the encoder 10 has a rate control 44, a compression core 46 and a smoothing buffer or entropy output buffer 48, while decoder 20 has an inverse smoothing buffer or decoder input buffer 50 connected therewith so that compression core 46 and decoder 20 are connected to each other via a serial connection of buffer 48, transmission channel 40 and buffer 50.
  • the rate control 44 is informed of the maximum transmission rate as depicted by arrow 52, inspects the output rate of the compressed data stream 30 as output by compression core 46 and controls rate control parameters 54 of compression core 46 accordingly.
  • the rate control parameters influence the reconstruction quality at which the compression core 46 compresses data such as video or audio data into the compressed data stream 30.
  • a rate control parameter may be a quantization parameter.
  • a major purpose is to control the compression parameters in such a way that the coded video stream 30 meets a certain mean target rate r_enc,mean(t) and never exceeds the peak data rate r_c,max(t). While some applications set r_enc,mean(t) = r_c,max(t), others use different values for the two magnitudes. Both of them might be either constant or vary over time.
  • Designing the transmission channel in such a way that it can sustain the peak data rates is sometimes simply not possible. This occurs, for instance, when a predefined transport medium such as Gigabit Ethernet shall be used. In this case, the peak data rate can hardly be influenced, apart from using multiple links.
  • setting the codec in such a way that it never exceeds the peak data rate, without taking special considerations, will result in bad image quality. This can easily be seen by means of Fig. 2. Without modifying the rate distribution of the image, the only possibility is to reduce the image quality further until the occurring peak data rate is smaller than r_max. This, however, means that most of the time the channel is used at only a low fraction of its capacity.
  • a FIFO buffer can be added, that smoothes the rate distribution.
  • both the costs for implementing the buffer memory and the latency of the solution increase with the FIFO size. This is because the FIFO delays the transmission of the bytes required to decode a certain row to a later time.
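The relation between FIFO size and added latency can be sketched as follows. This is an illustrative calculation only, with assumed numbers: a smoothing FIFO that is drained at the channel peak rate adds, in the worst case, roughly its size divided by that rate as extra delay.

```python
def fifo_worst_case_latency_s(fifo_size_bits: float, peak_rate_bits_per_s: float) -> float:
    """Worst-case extra delay introduced by a full smoothing FIFO:
    a byte written when the FIFO is full leaves only after the whole
    FIFO content has been drained at the peak channel rate."""
    return fifo_size_bits / peak_rate_bits_per_s

# e.g. a 1 Mbit FIFO drained at 1 Gbit/s adds up to 1 ms of latency
latency = fifo_worst_case_latency_s(1e6, 1e9)
```

This is why the text stresses keeping the FIFO small: halving the buffer directly halves the worst-case delay contribution.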
  • the challenge is hence to provide a solution that only requires a small FIFO buffer, uses the transmission channel in an efficient way, and achieves good coding quality.
  • the underlying reason is situated in the rate control feedback loop depicted in Fig. 1. It results from the fact that the rate control needs to set the compression parameters for the encoder in such a way that the encoder smoothing buffer never overflows, because this would result in a drop of data, and hence a corrupted image. Furthermore, underflows might penalize achievable image quality.
  • the rate control needs to control the core encoder based on the buffer fill level.
  • by the time the rate control can react, the smoothing buffer might already have over- or underflowed. This, of course, is particularly likely when the smoothing buffer is small, because the time to react is also small.
  • the encoder repeatedly encodes the same amount of data using varying compression parameters until it complies with all requirements, such as a maximum coded size. This, however, requires a lot of computation and additional buffer memory, because the data needs to be processed several times.
  • a decoder for decoding a compressed data stream comprises a decoding stage configured to decode the compressed data stream depending on a rate control parameter, and a rate control configured to log a fill level of a virtual buffer for buffering the compressed data stream and to adjust the rate control parameter depending on the fill level.
  • the present invention is based on the finding that logging the fill level of a virtual (encoder output) buffer at both the encoding and decoding sides is able to provide a common basis for adjusting the rate control parameter so that the latter does not have to be explicitly signaled to the decoder.
  • This has two consequences: first of all, transmission rate for transmitting the rate control parameter is saved. Additionally, due to the fact that the rate control parameter adjustment does not involve any transmission rate penalties, the granularity at which this adjustment may be performed may be set to a very fine granularity such as, for example, down to individual transform coefficients.
  • the feedback loop for adjusting the rate control parameter is very short and is able to react very quickly, so that the rate of the compressed data stream may be very quickly adapted to the needs imposed by the transmission channel and the physical buffer size at the encoder and decoder may be kept small. Due to the tightly wound feedback loop and the possibility to keep the buffer size small, the latency may be made small as well.
  • Fig. 1 shows a typical transmission scenario from an encoder to a decoder where the encoder completely assumes responsibility for performing the rate control so as to obey a maximum transmission rate; Fig. 2 shows a graph of an exemplary distribution of peak rates of the compressed data stream over a sequence of image rows of an image in case of video/image compression; Fig. 3 shows a block diagram of a transmission scenario including an encoder and a decoder according to an embodiment; Fig. 4 shows a possible implementation of the encoding stage of the encoder as a multi-channel encoding stage along with packetization measures in accordance with an embodiment; Fig. 5 shows a possible implementation of a decoding stage fitting the embodiment of Fig. 4;
  • Fig. 6 shows a block diagram of an implementation of the encoder and a possible implementation of the decoder within a transmission scenario of the embodiment of Fig. 3 when applying the codec realized by Figs. 4 and 5; Fig. 7 shows schematically a possibility to derive a common time basis at encoder and decoder within the multi-channel coding scheme of Figs. 4 and 5;
  • a further figure shows schematically a structuring of a compressed data stream of a time-varying signal into a sequence of frames and a subdivision of the frames into frame fragments associated with certain portions of a spectral and/or spatial domain of the coded time-varying signal, which is here exemplarily a video; another figure schematically shows the association between frame fragments and portions of a spatial and/or spectral domain of a spectrally and/or spatially sampled information signal in accordance with an example where a hierarchical wavelet transform is used for coding, as in the case of the codec of Figs. 4 and 5;
  • Fig. 11 shows a block diagram of an encoder as an extension of the embodiment of
  • Fig. 12 shows a flow diagram of the rate control's logging of the fill level in accordance with an embodiment.
  • the description begins with Fig. 3, which shows a transmission scenario between an encoder and a decoder, based on which general concepts are discussed and an overview of the following embodiments is provided.
  • Fig. 3 shows an encoder as comprising an encoding stage 70 and a rate control 72 and a decoder as comprising a decoding stage 74 and a rate control 76.
  • encoder and decoder are connected via a transmission channel 78.
  • an encoder buffer 80 is connected between encoder and transmission channel 78 and an input buffer 82 is connected between decoder and transmission channel 78. Both of them are drawn with dashed lines to illustrate that same may be internal components of encoder and decoder respectively, or external components.
  • the encoding stage's 70 output is connected to the input of encoder buffer 80, while the encoding stage's input receives the signal to be compressed and transmitted, such as a temporal sequence of samplings of an information signal or a measurement signal, or a media signal such as an audio or video signal or the like. While the latter signal 84 is undistorted, this is not necessarily the case with respect to the reconstructed signal 86 at the decoding stage's 74 output.
  • encoding stage 70 and decoding stage 74 are configured to be controllable as far as their compression or reconstruction quality in encoding/decoding the compressed data stream is concerned.
  • encoding and decoding stages 70 and 74 are controllable by a rate control parameter 88 so as to change the way the signal 84 is compressed and reconstructed.
  • the reconstruction quality could monotonically depend on the rate control parameter 88, for example.
  • the rate control parameter is selected such that a function Q(p) of the reconstruction quality Q at which the decoding stage 74 decodes the compressed data stream 100 and encoding stage 70 encodes same, respectively, in dependency on the rate control parameter p 88 varies. It could, for example, substantially show a monotonic tendency.
  • Substantially means, for example, that the monotonic tendency is potentially fulfilled after applying a moving average filtering on Q(p) such as using an averaging over a window of a length of one quarter of the domain of Q(p).
  • a reconstruction quality may decrease with an increasing rate control parameter and vice versa.
  • the rate control parameter comprises quantization values at which the values coded into the compressed data stream 78 are quantized/dequantized. Accordingly, changing the rate control parameter 88 changes the reconstruction quality and concurrently the compression rate, as a lower transmitted reconstruction quality necessitates fewer bits.
  • the rate control parameter 88 is adjusted at the encoder and decoder by the rate controls 72 and 76 in synchrony without the necessity to signal the rate control parameter adjustment from encoder to decoder. Rather, both rate controls 72 and 76 log a fill level of a virtual (encoder) buffer for buffering the compressed data stream 78 and adjust the rate control parameter 88 depending on the fill level. In order to strictly maintain the synchrony between rate controls 72 and 76 as far as the fill level of the corresponding virtual buffer 90 is concerned, both rely on parameters commonly accessible for rate control 72 and 76, respectively.
  • the maximum supported bit rate of the transmission channel 78 may be used as a read rate 92 by both rate control 72 and rate control 76 as the actual transmission rate at encoder and decoder may differ.
  • the not encoded data stream itself serves as a common time basis for logging the fill level of the virtual buffer 90 by rate controls 72 and 76.
  • a respective update 94 of the respective virtual buffer's 90 fill level 98 is caused, using the bit length 96 of the respective fragment as a write rate for increasing the fill level, and using a number of bits corresponding to a predetermined fraction of the read rate 92 for decreasing the fill level at the respective update instant (time tick) 94.
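The update rule just described can be sketched as a simple counter, which is all the virtual buffer requires. This is an illustrative sketch, not the patent's literal implementation; the class and parameter names are assumptions.

```python
class VirtualBuffer:
    """Counter-only model of the virtual (encoder output) buffer 90:
    no data is stored, only the fill level 98 is logged."""

    def __init__(self, bits_read_per_tick: int):
        self.fill_level = 0                             # fill level 98, in bits
        self.bits_read_per_tick = bits_read_per_tick    # fraction of read rate 92

    def update(self, fragment_bits: int) -> int:
        """One update instant 94: write the fragment's bit length 96,
        then drain one tick's worth of the read rate, never below zero."""
        self.fill_level += fragment_bits
        self.fill_level = max(0, self.fill_level - self.bits_read_per_tick)
        return self.fill_level
```

Because both rate controls 72 and 76 feed the same fragment bit lengths and the same read rate into this rule, their logged fill levels stay identical without any signaling.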
  • the frame fragments may correspond, for example, to individual input pixels or transform coefficient levels or transform coefficient level groups, such as trees of spatially corresponding coefficients, i.e. groups consisting of one coefficient in the spatially coarsest sub-band including all its associated (spatially co-located) descendants in the other sub-bands, as described in the following embodiments, although alternatives are of course also feasible.
  • rate controls 72 and 76 are then able to adjust the rate control parameter 88 synchronously to each other so as to prevent the virtual buffer 90 from depleting or overflowing.
  • rate controls 72 and 76 use the same feedback or fill level as an input to the rate control parameter mapping function. Details in this regard are also derivable from the embodiments outlined further below.
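One possible shape of such a fill-level-to-parameter mapping function is sketched below. The patent does not prescribe a concrete function; this monotonic example (fuller buffer means coarser quantization) is purely illustrative, and the bounds are assumed values.

```python
def quantization_index(fill_level: int, buffer_size: int, q_max: int = 15) -> int:
    """Map the virtual buffer fill level to a rate control parameter:
    0 (finest quantization) when empty, q_max (coarsest) when full."""
    fill_level = max(0, min(fill_level, buffer_size))   # clamp to valid range
    return (fill_level * q_max) // buffer_size
```

Since encoder and decoder evaluate the identical function on the identical logged fill level, they obtain the identical rate control parameter for every fragment.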
  • the embodiment of Fig. 3 enables the adjustment of rate control parameters without the need of explicitly signaling the adjustment to the decoding side. Implicit signaling on the basis of the buffer fill level is used instead.
  • the granularity at which the rate control parameters are adjusted may be as fine as possible, such as fine enough to coincide with the frame fragment borders 94 within the compressed data stream.
  • Fig. 4 shows the architecture of an encoding stage in accordance with an embodiment.
  • the encoding stage 70 is for encoding/compressing a video 84 into the compressed data stream 100 to be sent via smoothing buffer 80 over the transmission channel shown in Fig. 3.
  • the encoding stage 70 may, as shown in Fig. 4, comprise a color transformer 102 which receives the video input 84 and performs a color space transform onto the video input 84.
  • the color transformer 102 is optional and may be left out.
  • the input video data 84 might be transformed into a target color space such as RGB or a luma and chroma representation.
  • the video encoder 70 comprises a wavelet transformer 103 in order to decompose the pictures of the video input 84 into several spectral substreams, as known to a person skilled in the art.
  • the pictures of the video input 84 are hierarchically decomposed into four substreams per hierarchical level: a two-dimensionally low-pass filtered version of the respective picture of the video 84, a two-dimensionally high-pass filtered version of the respective picture, and two versions of the respective picture high-pass filtered in row direction and low-pass filtered in the column direction (and vice versa), with all four versions being spatially subsampled in order to account for the reduction of the conveyed spectral content.
  • the two-dimensionally low-pass filtered version is then subject to the next hierarchical decomposition, if any, so that altogether the pictures of the video input 84 are decomposed into (3n + 1) subbands with n denoting the number of hierarchical levels of the wavelet transform of the wavelet transformer 103.
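The subband bookkeeping described above can be made concrete with a short sketch. This is an illustrative helper (names assumed, not from the patent) for the dyadic decomposition in which each level produces LH, HL and HH bands and only the LL band is decomposed further, yielding 3n + 1 subbands for n levels.

```python
def subband_count(levels: int) -> int:
    """Number of subbands of an n-level hierarchical wavelet transform."""
    return 3 * levels + 1

def subband_shapes(height: int, width: int, levels: int):
    """Spatial sizes of the subbands of a dyadic 2-D wavelet transform;
    every level halves both dimensions (rounding up) and only the
    low-pass (LL) version is decomposed further."""
    shapes = []
    for _ in range(levels):
        height, width = (height + 1) // 2, (width + 1) // 2
        shapes += [("LH", height, width), ("HL", height, width), ("HH", height, width)]
    shapes.append(("LL", height, width))    # coarsest low-pass band remains
    return shapes
```

For a 256 x 256 picture and n = 3 levels this yields ten subbands, with the coarsest LL band at 32 x 32 samples.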
  • the wavelet transformer 103 may decompose the input video data 84 into several channels using, exemplarily, a hierarchical wavelet transform. Each of the wavelet transforms generates several subbands representing a channel. Some of the channels might be decorrelated using, for instance, spatial prediction.
  • the encoding stage 70 may comprise a respective spatial inter-channel and/or intra-channel predictor 104.
  • each picture of the video input 84 has been translated into transform coefficients, either predictively so that same actually represent transform coefficient differences, or without prediction in which case same are the actual transform coefficients.
  • the encoding stage 70 comprises a quantization stage 106 for applying a quantization onto the transform coefficients of the different channels.
  • the individual quantization factors are generated by a quantization factor computer 107 of the encoding stage 70 depending on a quantization index which is, in accordance with the present example, a rate control parameter 88 whose adjustment/computation is the task of the rate controls introduced in Fig. 3 and described in more detail hereinafter.
  • the coefficients, or - due to their quantization - coefficient levels, are entropy coded, using for instance a Golomb encoder 108 or alternatively a zero run-length encoding unit 110 which encodes runs of zeroes.
  • both lossless compression schemes 108 and 110 could be provided within encoding stage 70 with the ability to switch between them, as illustrated by a multiplexer 112 and a demultiplexer 114 between which the entropy coder 108 and the zero run-length encoder 110 are connected in parallel as shown in Fig. 4.
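The zero run-length idea used by unit 110 can be illustrated with a toy model. This is not the codec's actual bitstream syntax; it merely shows the principle that runs of zero coefficient levels collapse into a single (zero, run length) symbol while nonzero levels pass through.

```python
def zero_rle_encode(levels):
    """Replace each run of zero levels by a (0, run_length) pair."""
    out, run = [], 0
    for v in levels:
        if v == 0:
            run += 1                        # extend the current zero run
        else:
            if run:
                out.append((0, run))        # flush the pending zero run
                run = 0
            out.append(v)                   # nonzero levels pass through
    if run:
        out.append((0, run))                # flush a trailing zero run
    return out

def zero_rle_decode(symbols):
    """Inverse operation: expand (0, run_length) pairs back into zeros."""
    out = []
    for s in symbols:
        if isinstance(s, tuple):
            out += [0] * s[1]
        else:
            out.append(s)
    return out
```

Wavelet subbands after quantization are dominated by zeros, which is why such a scheme can be an attractive alternative to plain Golomb coding for some channels.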
  • the multiplexing between these two units 108 and 110 would have to be done in such a way that proper decoding is possible, i.e. the switching instants would have to be explicitly or implicitly signaled to the decoding side.
  • the lossless coded data is then collected per subband into a respective small collector FIFO 116.
  • a packetizer and multiplexer module 118 reads all the data from the different input channels, prepends a packet header in front of the data, and sends it at high speed to the smoothing buffer 80, whose memory is used to smooth data rates that possibly show bursty behavior.
  • the decoding stage 74 may be constructed in an inverse manner: the compressed video stream 100 enters decoder 74 via input buffer 82 where a depacketizer and multiplexer 120 demultiplexes the inbound compressed data stream 100 into the aforementioned subbands.
  • Small FIFOs 122 are provided, one for each subband, and forward the respective coded subband stream to the lossless decoding stage illustratively constructed as an entropy decoder 124 and a zero run-length decoder 126 connected in parallel between a multiplexer 128 and demultiplexer 130.
  • the resulting transform coefficient levels of the respective subband are then dequantized at a dequantizer 132, which receives the quantization factor from a quantization factor computer 134 operating exactly the same as quantization factor computer 107 of the encoding side so as to compute the quantization factors from the quantization index received from the rate control 76.
  • the optional predictor 134 then predictively reconstructs the transform coefficients of the subbands which are then subject to the inverse transform in inverse transformer 136, such as, in the present case, to an inverse hierarchical wavelet transform.
  • the resulting video picture samples may then be subject to an inverse color transform in an inverse color transformer 138, whereupon a reconstruction 140 of the original video input 84 results. The following builds on the codec model just described with respect to Figs. 4 and 5.
  • Fig. 6 exemplarily shows an implementation of the transmission scenario of Fig. 3 including an encoder and decoder in case of relying upon the exemplary video codec described with respect to Figs. 4 and 5. As far as possible, the reference signs are reused.
  • Fig. 6 represents an extension of the embodiment of Fig. 3 to multichannel coding/decoding.
  • all of the details presented hereinafter with respect to Fig. 6 relating to the actual rate control parameter adjustment and virtual buffer logging are also applicable to single-channel encoding/decoding embodiments with or without packetizing.
  • encoder 71 and decoder 75 additionally comprise summation units 142 and 144, respectively, in order to sum up the number of bits of the individual channels/subbands which together form the payload data of the coded data stream of encoder 70 and decoder 74, respectively.
  • encoder 71 and decoder 75 additionally comprise a virtual output rate controller 146 and 148, respectively.
  • this controller is optional as will become clear from the description brought forward hereinafter.
  • the virtual output rate controller 146 of encoder 71 could be provided in order to instruct the packetizer 118 to include into the just-mentioned packet headers information relating to the virtual buffer logging, such as information on the virtual buffer read rate 92 to be used in logging the virtual buffer, so that the decoder's rate control may use the same read rate 92.
  • the read rate could be fixed and agreed between encoder and decoder beforehand.
  • packetizer 118 informs rate control 72 of the size of the packet headers which additionally contribute to the rate to be transmitted by the transmission channel 78 and depacketizer 120 acts the same, i.e. it informs the rate control 76 of the sizes of the packet headers within the packets transferred via the transmission channel 78 within the coded data stream.
  • the rate controls 72 and 76 update (increase) the fill state 98 accordingly.
  • Fig. 6 shows the rate controls 72 and 76 as internally comprising a quantization computer applying the aforementioned fill level to rate control parameter mapping function so as to perform the adjustment of the rate control parameter 88 depending on the fill level 98, with the computers being denoted by 150 and 152, respectively.
  • the introduction of the virtual buffer 90 is used to control the rate control parameter, such as the quantization, based on, for example, the aforementioned predefined mapping function and thus allows synchronizing the encoder and decoder without any overhead in the transmitted bitstream.
  • Fig. 6 shows a core encoder 70 that possibly divides an input image into multiple channels. They might for instance correspond to wavelet subbands. The number of channels can be any integer number larger than zero.
  • the optional packetizer 118 is responsible for dividing the data stream into packets with possibly variable length, prepending headers containing the necessary control information, and multiplexing the different channels into a single bitstream. These packets are stored in a physical output buffer 80.
  • the physical output buffer 80 is connected to transmission channel 78.
  • for the transmission channel 78, a peak data rate r_c,max(t) is defined which it can sustain in any case.
  • the content of the physical output buffer 80 can be transferred at this speed via the transmission channel 78, and hence to the decoder 75.
  • the rate generated by the encoder 70 is controlled by influencing the corresponding core encoder parameters, i.e. the rate control parameters 88. They might for instance represent quantization values belonging to the different channels, or truncation parameters when using bit plane entropy coding as in JPEG 2000.
  • the physical output buffer 80, also called smoothing FIFO, is responsible for balancing the changes in the rate produced by the core encoder 70. Its data is transferred to the transmission channel 78 with the channel peak data rate. Whenever the core encoder 70 temporarily generates a rate that is larger than the channel peak rate, the smoothing FIFO 80 will fill. If the core encoder 70 temporarily generates a rate that is smaller than the channel peak rate, the physical output buffer 80 will deplete. If the period during which the core encoder rate exceeds the transmission channel rate lasts too long, the smoothing FIFO 80 will overflow. In other words, data will get lost and the decoder 75 can no longer recover the encoded images.
  • it is hence the task of the encoder 71 to avoid such a FIFO overflow under any circumstances. To this end, it follows the basic strategy of reducing the quality and hence the rate of the encoded video sequence if the buffer 80 is filling up. On the other hand, if the FIFO 80 is getting empty, the encoder 71 can spend more bits to achieve higher quality and exploit the available transmission channel bandwidth. The reduction and increase in quality, respectively, are achieved by adjusting the rate control parameter 88.
  • for successful decoding, the decoder 75 needs to know, for every coefficient, which rate control parameter 88 the encoder 71 has used. However, since the encoder 71 can change these parameters for each coefficient in order to enable a very quick reaction to changed image contents, traditional signaling of the rate control parameters 88 within the codestream would result in a huge overhead. In order to avoid this drawback, an implicit signaling mechanism based on the buffer fill level is used.
  • let l_n^p be, in a first step, the fill level of the physical buffer 80 for step n. Then a function can be defined that determines the next rate control parameter Q_n based on the physical buffer fill level l_n^p and the current state σ_n, where this state can, for instance, encompass the current rate control parameters: Q_n = f(l_n^p, σ_n).
  • encoder 71 determines the rate control parameters Q_n based on the previous equation using the current buffer fill level l_n. If the decoder 75 knows the corresponding buffer fill level as well, it can compute the same rate control parameters in order to correctly decode the sample values.
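The synchrony argument can be demonstrated with a toy simulation, using assumed numbers and an assumed mapping: each side keeps its own fill-level counter and applies the identical function, so both derive the same parameter sequence for the same fragment bit lengths, with nothing signaled.

```python
def run_side(fragment_bits, bits_per_tick=64, buf_size=512, q_max=7):
    """Simulate one side (encoder or decoder) of the implicit signaling:
    log the virtual buffer fill level per fragment and map it to a
    rate control parameter with a shared, deterministic rule."""
    fill, params = 0, []
    for bits in fragment_bits:
        fill = max(0, fill + bits - bits_per_tick)            # log fill level
        params.append(min(q_max, (fill * q_max) // buf_size)) # derive parameter
    return params

fragments = [80, 120, 40, 200, 10]   # bit lengths observed by BOTH sides
encoder_params = run_side(fragments)
decoder_params = run_side(fragments) # the decoder mirrors the computation
```

Running the same rule twice is exactly the point: `encoder_params` and `decoder_params` are identical, so the decoder recovers every per-coefficient parameter without any explicit transmission.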
  • however, it will be very hard for the decoder 75 to derive the exact value of l_n^p.
  • the reason is that the rate by which the physical buffer can be transferred to the transmission channel 78 might slightly vary because of the properties of the employed transmission protocol, clock domain crossing issues, or temporal congestion of the transmission channel 78.
  • since the packetizer 118 depicted in Fig. 6 typically contains a buffer memory, the time between the change of a coding parameter and its actual impact on the output of the packetizer 118 might be large. As a consequence, it is difficult to control the buffer fill level, making the application susceptible to buffer overflows.
  • the virtual buffer 90 is introduced. It is directly connected to the output of the encoding stage 70 in order to enable a short latency between a possible change of the coding parameters 88 and the impact on the virtual buffer's 90 fill level 98. Furthermore, whenever the packetizer 118 inserts a header into the stream of coded bytes, this is included in the virtual buffer calculation as well.
  • the virtual buffer 90 does not have any associated memory. In other words, it cannot store any value. Instead, it simply monitors how many input values need to be stored, and how many values can be transferred to the transmission channel 78. The implementation of this virtual buffer 90 hence only requires one or two counters that keep track of the number of bits or bytes that were generated so far.
  • the rate control 72 determines the rate control parameters 88 in order to avoid any overflow of the virtual buffer 90. Consequently, by properly setting the size of the physical buffer 80, the encoder 71 can guarantee that the physical buffer 80 never overflows.
  • the virtual FIFO 90 cannot be read with the current transmission channel bit rate r_c(t) (see Fig. 1). This is because the latter can vary over time, and these variations are not known by the decoder 75. Instead, the transfer of the data to the transmission channel 78 is modeled by assuming that the transmission channel 78 reads the data with the maximum supported bit rate r_c,max(t). If this value is constant and known by the decoder 75, it can serve as a common basis for the encoder 71 and decoder 75 to compute the fill level 98 of the virtual buffer 90.
  • if r_c,max(t) is smaller than the maximum channel capacity, it is even possible to tolerate some variations in the transmission bit rate, as long as the mean rate is greater than or equal to r_c,max(t).
  • the physical buffer needs to be increased accordingly.
  • if r_c,max(t) is not constant, the decoder 75 needs to be informed about every change. Since, however, changes in the rate should be far less frequent than changes in the rate control parameters 88, this remains far more efficient than explicitly signaling the rate control parameters (or some derived values such as the number of bit planes).
  • Fig. 7 depicts a fictive scenario where an input image is split into four channels. Each of the rectangles corresponds to a coefficient that needs to be quantized, entropy-coded and "stored" in the virtual buffer.
  • the splitter corresponding to entities 102, 103, 104 divides the input pixels into channels. This might be as simple as just separating the color components, or as complex as filtering operations such as a wavelet transform delivering several subbands. Since, however, the outputs generated by the splitter depend on the same data input, and because of the data dependencies inherent to the algorithms applied within the splitter, it is possible to organize the split coefficients into a common virtual time base. As depicted in Fig. 7, not every channel needs to output a coefficient for every virtual clock tick shown by black arrows. Instead, more or less regular patterns can occur. The only important requirement is that the pattern is static, such that the decoder 75 can reliably reproduce it.
  • the patterns occurring at the output of the splitter typically show some form of regularity, or can be forced to have one. Consequently, they can be easily described by corresponding formulas.
  • the data flow for the virtual encoder buffer 90 can now easily be described. For each virtual clock tick, the bits generated by the entropy coder 70 are "written" into the virtual buffer 90. Furthermore, b_c,max(t) bits are removed from the virtual buffer 90 (as long as the latter contains a sufficient number of bits). This models the transfer to the transmission channel 78. Knowing the temporal resolution of the common virtual time base, b_c,max(t) can be transformed into a corresponding rate r_c,max(t).
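The per-tick data flow just described can be sketched as a small function. This is a sketch under the assumption of a constant drain b_c,max per tick; the parameter names are illustrative, not from the patent:

```python
def fill_levels(bits_per_tick, b_c_max):
    """Fill level of the virtual buffer after each virtual clock tick.

    bits_per_tick: bits emitted by the entropy coder at each tick.
    b_c_max:       bits the channel drains per tick (maximum supported rate).
    Both encoder and decoder can evaluate this from the coded stream alone,
    which keeps their fill-level computations in sync."""
    fill, levels = 0, []
    for b in bits_per_tick:
        fill += b                    # "write" the coded bits of this tick
        fill -= min(b_c_max, fill)   # drain at the maximum supported rate
        levels.append(fill)
    return levels
```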
  • the decoder side can perform the same computation.
  • the decoder 75 can reconstruct for every coefficient the corresponding fill level 98 of the virtual encoder buffer 90, as long as it obeys the same temporal pattern of the different channels. Typically, this is easily possible because of data dependency constraints.
  • the data 84 is again exemplarily shown to be a video, i.e. a sequence of pictures 170.
  • the pictures 170 are accordingly temporally arranged along a presentation time axis 172 at a temporal pitch Δt, i.e. the picture rate Δt⁻¹.
  • the pictures 170 are, for example, individually coded into a respective frame 174.
  • a compressed data stream 100 is subdivided into a sequence of frames 174.
  • Each frame comprises, in turn, a sequence of frame fragments 176 which may, preferably, be contained within the compressed data stream 100 in a self-contained form, i.e. individually form a respective contiguous portion of the compressed data stream 100.
  • Each of the frame fragments is associated, as indicated by arrows 178, with a corresponding portion 180 of the respective picture 170.
  • the frame fragments 176 may, for example, be a collection of all wavelet transform coefficients relating to one contiguous portion 180 of the picture 170.
  • the individual frame fragments 176 are individual transform coefficients of such a hierarchical wavelet transform, in which case each transform coefficient would have associated therewith a spatial extension as well as a spectral extension, i.e. would have associated therewith a spatial/spectral portion of the spatial and spectral decomposition formed by the hierarchical wavelet transform. Besides such payload fragments, some fragments could relate to rather general side information such as headers.
  • the rate control 72 and 76 is then configured to distribute the maximum supported bitrate among the one or more frame fragments 176 within each frame in relation to the size of the corresponding portions 180 to obtain fractions of the maximum supported bitrate, and to decrease the fill level 98 in logging the fill level 98 frame-fragment-wise using the rate fractions.
  • assume that the frames 174 exclusively comprise payload frame fragments which are associatable with non-overlapping portions of the corresponding picture which together completely cover the picture, that the size of a corresponding portion 180 is a, and that A is the size of the complete picture 170. Then, for each frame 174, Δt·r_c,max bits could be spent, and accordingly a fraction a/A thereof, i.e. Δt·r_c,max·a/A bits, would represent a target bit amount to be spent for each frame fragment 176 corresponding to its respective portion 180, for example.
  • the rate control 72 and 76 would increase the fill level 98 by the actual amount of bits which the frame fragment 176 consumes within the compressed data stream 100, and decrease the fill level 98 by the just mentioned target size Δt·r_c,max·a/A.
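The fragment-wise fill-level update just described can be sketched in a few lines. This is an illustrative sketch; the parameter names are assumptions, not identifiers from the patent:

```python
def log_fragment(fill, actual_bits, frag_area, pic_area, dt, r_c_max):
    """Update the virtual-buffer fill level for one frame fragment.

    The fragment's target bit budget is its share of the per-frame budget
    dt * r_c_max, weighted by the fragment's portion of the picture area
    (frag_area / pic_area). The fill level rises by the bits actually
    spent and falls by the target amount."""
    target_bits = dt * r_c_max * frag_area / pic_area
    return fill + actual_bits - target_bits
```

A fragment that consumes exactly its target leaves the fill level unchanged; overspending fragments raise it, which in turn drives the rate control toward stronger quantization.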
  • in the case of an audio signal, each frame 174 would, for example, convey the transform coefficients of a spectral decomposition of a time interval of the audio signal, wherein the time intervals may overlap.
  • the compressed data stream 100 may comprise a sequence of frames 174 which, generally speaking, spectrally and/or spatially decompose/sample an information signal at a sequence of associated time stamps.
  • Each frame is structured into frame fragments, each of which is associatable with a corresponding portion of the spectral and/or spatial domain.
  • each sample in the spatially coarsest subband of the aforementioned hierarchical wavelet transform has associated therewith a further (2^(⌈(sb−1)/3⌉−1))² transform coefficients in each of the other subbands sb = 2...(3n+1) of higher spatial resolution, and together all of these 1 + Σ_{sb=2...(3n+1)} (2^(⌈(sb−1)/3⌉−1))² transform coefficients form a tree of spatially corresponding coefficients and are associatable with a corresponding spatial tile of the picture of the video, with n denoting the number of decomposition levels. For each of the tiles corresponding to one of the transform coefficients of the coarsest subband, i.e.
  • the same number of bits may be used as a target bit amount to be spent (in average) for all tree collections, i.e. the same rate fraction.
  • Other bits possibly contained within the frames 174 may simply be accounted for by subtracting a predetermined amount from the available bit length for the frames 174 as determined based on the maximum supported bitrate, before sharing among the frame fragments to obtain the target bit amounts or mean bitrates for the individual tree collections.
  • the rate control 72 and 76 decreases the fill level 98 of the virtual buffer 90 by the same amount derived as described by dividing the maximum supported bitrate minus some amount for side information data, if any, by the number of such frame fragments.
  • sb 1 of the hierarchical wavelet transform.
  • transform coefficients of coarser subbands of the hierarchical wavelet transform are generally - in the sense of their information importance - more important than transform coefficients of spatially finer subbands. Accordingly, a greater fraction of the bit length available for frames 174, as offered in average by the maximum supported bit rate for transmitting the number of all wavelet coefficients, could be distributed to the transform coefficients of spatially coarser subbands than to transform coefficients of spatially finer subbands. The distribution would be done in the same manner at the encoding and decoding side. See, for example, Fig. 9.
  • Fig. 9 shows exemplarily a decomposition of a monochromatic picture using a hierarchical wavelet transform 180.
  • the spatially coarsest sub-band 182 had p×q wavelet coefficients.
  • the number of all wavelet coefficients in all sub-bands would be 8p × 8q = 64pq, resulting, in average - and with the total header amount subsumed in a constant h - in a target or mean bit amount per coefficient of (Δt·r_c,max − h)/(64pq). Relating to the number of coefficients only, merely 1/64 of the whole available bit amount per frame would thus be allotted to the coefficients of the spatially coarsest sub-band 182.
  • each coefficient could form a frame fragment, the encoding/decoding of which would trigger the virtual buffer fill level logging using the mean bit amount per coefficient of (Δt·r_c,max − h)/(64pq) and the actual bit consumption, respectively. That is, the logging would be triggered at a fast rate. In a different embodiment, however, this fraction would be increased for coefficients of the lower spatial frequency sub-bands.
  • the sub-division of the frame into frame fragments could be performed level-wise: one fragment 187a would encompass all coefficients of sub-band 182, one fragment 187b would encompass all coefficients of sub-bands 2 to 4, one fragment 187c would encompass all coefficients of sub-bands 5 to 7 and one fragment 187d would encompass all coefficients of sub-bands 8 to 10.
  • a ratio of 1.4 : 1.2 : 1.1 : 1 could, for example, be used for allotting the available bitrate per frame (Δt·r_c,max − h) to the sub-bands of levels 1 to 4, i.e. the frame fragments 187a-d, respectively. Even a 1 : 1 : 1 : 1 ratio would prefer the wavelet coefficients of level 1, i.e.
  • the frame fragments could be the result of sub-dividing the coefficients spatially and level-wise: the p×q "trees of spatially corresponding coefficients" one of which is exemplarily shown in Fig.
  • the reaction needs to be relatively quick in order to allow for small buffer sizes without causing a buffer overflow.
  • it should control the rate control parameters 88 in such a way that the image quality is maximized.
  • the usage of the physical buffer 80 should balance peaks in the coded bit rate where necessary.
  • unnecessary fluctuations in the rate control parameter 88 will degrade the image quality as the local quality in the image 180 changes abruptly.
  • the rate-distortion curve is, for example, convex and monotonically decreasing. Distortion values add up. This reduces the problem of rate control to finding one quantization parameter such that the overall rate is not exceeded. Unfortunately, for implementations with low complexity, this is prohibitive. Neither is it possible to buffer complete images during encoding before actually sending them out, nor to apply repetitive encoding. Furthermore, the inaccuracy of model based approaches prevents the usage of very small buffers.
  • Fig. 10 depicts the corresponding quantization graph. On the horizontal axis, it shows the fill level in %, on the vertical axis the so-called quantization index.
  • a high value for the quantization index means strong quantization whereas low values result in little influence of the quantization up to, for example, a lossless coding mode.
  • This quantization index needs to be translated into a quantization factor for every channel. Note that neither the quantization index nor the quantization values are limited to integer values.
  • the different curves correspond to different working points. A working point basically defines a desired target quantization that avoids abrupt changes in the coding parameters. Each curve is designed in such a way that it prevents overflow of the virtual buffer.
  • the quantization value for a full buffer is so large that all coefficients will be quantized to zero.
  • the buffer 80 will hence definitively deplete, because the coded bit rate will be smaller than the peak channel bit rate. How to select a working point is described in the following.
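A quantization curve of the kind shown in Fig. 10 can be sketched as a fill-level-to-index mapping. The patent only requires that the curve is flat around the working point and reaches a maximal quantization (zeroing all coefficients) for a full buffer; the linear ramp and all parameter names below are illustrative assumptions:

```python
def quantization_index(fill_pct, working_point, q_max=16):
    """Map the virtual-buffer fill level (in %) to a quantization index.

    Below the working point, (almost) no quantization is applied; towards
    a full buffer the index ramps up to q_max, which quantizes all
    coefficients to zero and hence guarantees buffer depletion."""
    if fill_pct >= 100.0:
        return q_max                      # everything quantized to zero
    if fill_pct <= working_point:
        return 0.0                        # little or no quantization
    # linear ramp between the working point and a full buffer
    return q_max * (fill_pct - working_point) / (100.0 - working_point)
```

Different working points yield the family of curves of Fig. 10; neither the index nor the derived quantization factors need to be integers.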
  • the encoder's task is now to determine the optimal graph, in a way that the buffer level 98 stays near the optimum buffer level. For inhomogeneous images, this will obviously not be possible. For image regions which are easy to compress, the buffer level 98 will reduce, while for difficult image parts, the buffer fill level 98 will increase. Nevertheless, the encoder 71 can try to put the mean buffer fill level into the zone where the quantization curves are flat. To decide whether the optimum working point is active, the mean buffer fill level or the mean quantization value can be monitored by rate controls 72 and 76. Furthermore, it can be considered whether the buffer 90 has filled or emptied at the end of the image compared to the start of the image.
  • the image can be divided into different quantization slices, each having its own working point.
  • a simple extension of the rate control mechanism is sufficient as illustrated in Fig. 11.
  • the first virtual buffer 90 is as discussed previously. Its purpose is to avoid any overflow of the physical output buffer. Consequently, the virtual output rate 92 is set to the maximum rate that the physical channel can provide.
  • Fig. 11 additionally shows that the rate control 72 also comprises a computer 204 for computing a rate control parameter candidate 206 on the basis of the fill level 208 of the second virtual buffer 200, and a maximum selector 210 which chooses the maximum of the rate control parameter candidate 206 and the respective candidate 212 output by computer 150 on the basis of fill level 98 of the first virtual buffer 90, with the resulting maximum representing the actually adjusted rate control parameter 88.
  • a virtual output rate controller 214 is shown to be optionally present in order to control or vary the mean output rate used as the read rate 202 in logging the fill level 208 of the second virtual buffer 200. Explicit signaling may be used in order to keep the synchrony to the decoding side.
  • working point computers 216 and 218 may be present in order to control the working point of computer 150 and 204, respectively, as described above, by surveying a running average of the fill level 98 and 208, respectively, and/or a running average of the respective rate control parameter candidate 206 and 212, respectively.
  • the rate control parameter may be adjusted so as to correspond to the maximum of the rate control parameter candidates in case increasing the rate control parameter is intended to decrease the resulting coding bit rate as it was exemplarily the case in the afore-described figures. However, same may be adjusted to a minimum of the rate control parameter candidates in case decreasing the rate control parameter is intended to decrease the resulting coding bit rate.
  • the second virtual buffer 200 is for controlling the mean data rate. Consequently, the output rate 202 of this buffer is set to the desired mean data rate, of course smaller than the peak data rate.
  • the size of the virtual buffer 200 basically defines the admissible variance in the mean data rate between different images. A very small buffer will allow only very little variation around the mean data rate. Hence, it is unlikely to exploit the available channel peak data rate. A large buffer, on the other hand, allows the compressed image size to vary significantly between different images. A reasonable choice might hence be to specify the virtual buffer's 200 size between a half and a full compressed frame size. Note that in contrast to the peak virtual buffer, the mean virtual buffer does not need to have a corresponding physical memory.
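The combination of the two virtual buffers can be sketched as follows. Each buffer's fill level is mapped through its own quantization curve, and the maximum selector picks the stronger quantization; the function signature and the example curves are illustrative assumptions:

```python
def adjust_rate_control(peak_fill, mean_fill, peak_curve, mean_curve):
    """Combine the candidates derived from the peak virtual buffer and the
    mean virtual buffer into the adjusted rate control parameter.

    Taking the maximum is correct when a larger rate control parameter
    means stronger quantization, i.e. a lower coded bit rate; otherwise
    the minimum would be taken instead."""
    return max(peak_curve(peak_fill), mean_curve(mean_fill))
```

For example, with two simple linear curves, a nearly full peak buffer dominates a nearly empty mean buffer, so the peak constraint always wins when it is the binding one.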
  • a rough estimation of the fill level of the virtual buffer 90 is not sufficient.
  • the overhead of the headers and control sequences has to be considered, too.
  • the amount of overhead data which is inserted by the packetizer 118 is added to the virtual buffer 90 at a specific point in time. This point has to be tracked exactly, because the decoder 75 has to be able to add the same amount of overhead at the same point in time. More precisely, the amount of overhead which is added to the bit stream by the packetizer 118 in the form of packet headers has to be added to the virtual buffer 90 at the encoder as well as at the decoder at the same point in time, respectively at the same processed pixel. Otherwise, encoder and decoder could get out of sync.
  • the challenge consists hence in telling the decoder at what time it needs to add packet header overhead with the precision of the correct virtual time slice, which can be for example one coefficient.
  • for explicit synchronization, the encoder can insert into the stream the number of coefficients already processed by the quantizer since the last header insertion. This information permits the decoder to basically count the dequantized coefficients and add the overhead at the right time to the virtual buffer. However, this solution creates an overhead.
  • Implicit synchronization of the packet header insertion requires that the decoder knows the rule set when the encoder adds a packet overhead to the virtual buffer. By applying the same rule set, the decoder can hence add the overhead at the right moment, so that the virtual buffer fill level 98 is the same for all processed coefficients.
  • the encoder decides to build a packet when one of the channel FIFOs 116 in front of the packetizer 118 shown in Fig. 6 exceeds a given level of fullness. It will then collect the data from all input FIFOs 116, combine them into a packet and add a corresponding header. At this moment in time, this packet header will also be added to the virtual buffer 90. In other words, the quantization factor for the coefficient quantized next will be based on a modified virtual buffer fill level 98. Let us call this coefficient x_n. It is hence the task of the decoder 75 to add the packet header to the virtual buffer 90 within the decoder 75 such that it impacts exactly coefficient x_n during decoding as well.
  • the decoder 75 needs to figure out which coefficients are actually contained in a received packet. By knowing the pipeline structure of the encoder 71, it can hence derive how many coefficients are still stuck in the encoder pipeline. In other words, it is possible to compute the coefficient that will first be impacted by the addition of the packet overhead to the virtual buffer. Note that the packet header overhead only concerns the coefficients that are encoded after the coefficients within the currently created packet.
  • coefficients coded with zero-run-length coding represent a special case in the above mentioned scenario. It might indeed happen that some of the coefficients are decoded with the wrong quantization index, if the packet header insertion occurred in the middle of a zero-run. This represents a serious problem, if rounding is performed during dequantization instead of quantization.
  • Fig. 12 shows a mode of operation of the rate controls of Figs. 3 and 6 in accordance with an embodiment involving a packetizer 118, wherein Fig. 12 concentrates on the logging of the fill level which is then used by the computers within rate control 72 and 76 to vary or adjust the rate control parameter 88 accordingly as described above. It should be kept in mind that the functionality of the rate controls of Figs. 3 and 6, as described next with respect to Fig. 12, merely represents one possible embodiment.
  • the method starts with a setting 250 of the rate control parameter 88 to a default value.
  • the default value may be known to the decoding side or rate control 76 by definition or by explicit signaling beforehand.
  • the current fill level of the virtual buffer needs to be properly initialized such as set to an empty state which may be known to the decoding side or rate control 76 by definition or by explicit signaling beforehand.
  • the rate control waits for a new frame fragment being identified in the coded data stream in step 252 with the identification being performed by the encoding and decoding stages 70 and 74, respectively.
  • the identification is triggered whenever an encoding of a certain fragment has been finished, or whenever same has been decoded far enough in order to be able to identify the fragment borders of the fragment.
  • an update procedure 254 is started by the rate control 72 and 76, respectively. In particular, it is first checked in step 254 whether the current frame fragment is a header or a payload frame fragment as discussed previously with respect to Fig. 9.
  • the rate control determines the number n of frame fragments contained in the current packet in step 256, wherein reference is made to the above discussion as to how this determination could be performed. In effect, this step defers the fill level increase by the header length to the future, namely to when the n-th payload frame fragment following the current header will be identified.
  • the current update 254 is finished and the waiting for the next frame fragment 252 is started.
  • the adjustment is performed so that within each frame fragment, the rate control parameter is constant, and so that same is selected/computed based on the fill level 98 as not yet having been updated by the bit length of the frame fragment in logging the fill level.
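The fragment-wise logging with deferred header accounting can be sketched as a loop over identified fragments. This is a simplified sketch: a fixed per-fragment target and one deferral count per header are illustrative assumptions, not limitations of the patent:

```python
def run_fill_logging(fragments, target_bits, packet_payload_count):
    """Log the virtual-buffer fill level fragment by fragment.

    fragments: sequence of (kind, n_bits) with kind 'header' or 'payload'.
    A header's bit length is deferred until the n-th following payload
    fragment (n = packet_payload_count), mirroring the deferral of the
    fill-level increase described for step 256, so that encoder and
    decoder update the fill level at the same fragment."""
    fill = 0
    pending = []          # [payloads_still_to_come, header_bits]
    levels = []
    for kind, n_bits in fragments:
        if kind == 'header':
            pending.append([packet_payload_count, n_bits])
            continue
        # payload fragment: actual bits minus the target share
        fill += n_bits - target_bits
        for entry in pending:
            entry[0] -= 1
        for entry in [e for e in pending if e[0] == 0]:
            fill += entry[1]          # header bits become due now
            pending.remove(entry)
        levels.append(fill)
    return levels
```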
  • rate control for streaming applications is a complex undertaking, because it has to consider the peak data rate of the transmission channel, the desired mean data rate for encoding, the size of the smoothing buffer, as well as the peak data rate produced by the core encoder. None of the known solutions provides a satisfactory result, in particular when very low latency and small buffer sizes are required. In summary, they are either computationally expensive, do not control peak and mean data rates, are not able to react sufficiently fast to avoid frame drops, or their signaling overhead is large.
  • the above embodiments provided a novel way for controlling the compression rate of a video (or audio) encoder using feedback control. Compared to schemes using iterative loops in order to find the correct compression parameters, this avoids the redundant encoding of the same data. In other words, the computation effort is reduced. While such a strategy can also be found in applications using for instance H.264, the presented rate control permits changing the quantization with every processed coefficient, without adding any overhead to the resulting codestream. This is because it is not necessary to explicitly signal the quantization change. Thus, compared to approaches based on H.264, the described rate control
  • permits very fast reaction when changes in the image content cause abrupt changes in the required coding rate. This is particularly useful when using very small buffer sizes, since in this case abrupt rate changes could result in frame drops or rate exceedance.
  • Our rate control module can guarantee to not provoke any frame drop or rate exceedance.
  • the rate control is able to control both the mean and the peak data rate.
  • the above embodiments ensure that the peak data rate of the channel is never exceeded.
  • it can be used for both constant and varying target compression rates.
  • they avoid the building of complex models for predicting the bit rate beforehand, thus allowing flexible codec modifications.
  • they do not require slicing the coefficients into bits and processing them individually, thus avoiding high computation effort.
  • aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
  • Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
  • embodiments of the invention can be implemented in hardware or in software.
  • the implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
  • Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
  • embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer.
  • the program code may for example be stored on a machine readable carrier.
  • other embodiments of the invention comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
  • an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
  • a further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
  • the data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
  • a further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein.
  • the data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
  • a further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
  • a further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
  • a further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver.
  • the receiver may, for example, be a computer, a mobile device, a memory device or the like.
  • the apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
  • a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein.
  • a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein.
  • the methods are preferably performed by any hardware apparatus.

Abstract

Logging the fill level of a virtual (encoder output) buffer at both the encoding and decoding sides is used to provide a common basis for adjusting the rate control parameter so that the latter does not have to be explicitly signaled to the decoder.

Description

Compressed Data Stream Transmission using Rate Control
Description
The present application is concerned with the transmission of a compressed data stream using rate control, applicable, for example, in low delay applications and with respect to any type of data such as any media data, such as video or audio data, measurement signals, etc.
Compression is a well-known technique in order to reduce the amount of data necessary to represent a picture or video scene. Various algorithms exist such as JPEG, JPEG 2000, WebP, H.264, MPEG-2, VC-2, or VC-5. While their application is straight forward when simply desiring to reduce the size of a file, real-time streaming introduces several additional constraints:
• The channel used for transmission only supports a precise maximum bit rate.
• The channel might introduce data losses.
• Depending on the application, the requested latency might be very small.
For achieving a low latency, the image compression needs to start operating on a part of the image data. There is typically not enough time to store the full image. This would introduce a delay. Also, the compressed data needs to be sent directly after compression and there is no time to wait for the compression to be finished.
Furthermore, for applications demanding little implementation complexity, the following requirements need to be taken into account:
• The admissible buffer size is limited.
• The necessary computation effort should be limited in order to enable cheap implementation.
Because of the limited transmission bit rate, a rate control is necessary as depicted in Fig. 1. Fig. 1 shows an encoder 10 and a decoder 20 with the encoder 10 being configured to generate, by encoding/compression, a compressed data stream 30 from an information signal entering at a data input 12 and send same via a transmission channel 40 to the decoder 20, which in turn is for decoding/decompressing the compressed data stream 30. The transmission channel 40 is subject to transmission loss 42 and has a maximum data rate r_c,max(t). The encoder 10 has a rate control 44, a compression core 46 and a smoothing buffer or entropy output buffer 48, while decoder 20 has an inverse smoothing buffer or decoder input buffer 50 connected therewith so that compression core 46 and decoder 20 are connected to each other via a serial connection of buffer 48, transmission channel 40 and buffer 50. The rate control 44 is informed of the maximum transmission rate as depicted by arrow 52, inspects the output rate of the compressed data stream 30 as output by compression core 46 and controls rate control parameters 54 of compression core 46 accordingly. The rate control parameters influence the reconstruction quality at which the compression core 46 compresses data such as video or audio data into the compressed data stream 30. For example, such a rate control parameter may be a quantization parameter.
A major purpose is to control the compression parameters in such a way that the coded video stream 30 meets a certain mean target rate r_enc,mean(t) and never exceeds the peak data rate r_c,max(t). While some applications set r_enc,mean(t) = r_c,max(t), others use different values for the two magnitudes. Both of them might be either constant, or can vary over time.
Meeting the peak data rate is complicated by the fact that, for constant quality throughout an image, the rate is not equally distributed within this image. This is because some rows of an image might be easier to encode than others, leading to a distribution of the encoded bytes as exemplified in Fig. 2.
In order to solve this problem, two principal different solutions exist:
1. Designing the transmission channel 40 in such a way that it can sustain all possible peak data rates.
2. Adding a FIFO buffer with a well-defined size that smoothes the bit rates, together with an adaptation of the compression parameters to avoid FIFO overflow.
Designing the transmission channel in such a way that it can sustain the peak data rates is sometimes simply not possible. This occurs, for instance, when a predefined transport medium such as Gigabit Ethernet shall be used. In this case, the peak data rate can hardly be influenced, apart from using multiple links. On the other hand, setting the codec in such a way that it never exceeds the peak data rate without taking special considerations will result in bad image quality. This can be easily seen by means of Fig. 2. Without modifying the rate distribution of the image, the only possibility is to further reduce the image quality until the occurring peak data rate is smaller than r_max. This, however, means that most of the time the channel is only used at a low fraction of its capacity. Furthermore, the costs for implementing a transmission channel with sufficient peak data rate might simply be too high. In order to avoid these drawbacks, a FIFO buffer can be added that smoothes the rate distribution. The larger the FIFO size, the fewer modifications on the codec parameters are required. On the other hand, both the costs for implementing the buffer memory and the latency of the solution increase with the FIFO size. This is because the FIFO delays the transmission of bytes required to decode a certain row to a later time.
The challenge is hence to provide a solution that only requires a small FIFO buffer, uses the transmission channel in an efficient way, and achieves good coding quality. This makes the rate control rather challenging. The underlying reason lies in the rate control feedback loop depicted in Fig. 1. It results from the fact that the rate control needs to set the compression parameters for the encoder in such a way that the encoder smoothing buffer never overflows, because this would result in a drop of data, and hence a corrupted image. Furthermore, underflows might penalize the achievable image quality. In other words, the rate control needs to control the core encoder based on the buffer fill level. However, since the core encoder might contain complex logic, it possibly takes some time until the effects of parameter changes become visible at the output. In the meantime, the smoothing buffer might already have over- or underflowed. This, of course, is particularly likely when the smoothing buffer is small, because the time to react is also small.
To solve this issue, existing solutions typically employ one of the two following strategies:
1. The encoder repeatedly encodes the same amount of data using varying compression parameters until it complies with all requirements such as a maximum coded size. This, however, requires a large amount of computation and additional buffer memory, because data needs to be processed several times.
2. Using a model-based approach that permits predicting the amount of coded data based on the compression parameters without actually performing the encoding itself. While this reduces the computational complexity, the prediction is inaccurate. This is particularly true if the encoder contains complex operations such as a combination of different entropy encoding mechanisms, whose results are difficult to foresee. This inaccuracy, however, can cause the rate control to take a wrong decision, causing an over- or underflow of the smoothing buffer. Again, the probability of this incident increases when the smoothing buffer gets small, as required for low complexity compression. Thus, it is an object of the present invention to provide a concept which allows a transmission of a compressed data stream using rate control so that real-time streaming is possible at lower latency and/or at a better rate/distortion ratio. This object is achieved by the subject matter of the independent claims.
In accordance with an embodiment of the present invention, a decoder for decoding a compressed data stream comprises a decoding stage configured to decode the compressed data stream depending on a rate control parameter, and a rate control configured to log a fill level of a virtual buffer for buffering the compressed data stream and to adjust the rate control parameter depending on the fill level.
The present invention is based on the finding that logging the fill level of a virtual (encoder output) buffer at both the encoding and decoding sides is able to provide a common basis for adjusting the rate control parameter so that the latter does not have to be explicitly signaled to the decoder. This, in turn, has two consequences: first of all, transmission rate for transmitting the rate control parameter is saved. Additionally, due to the fact that the rate control parameter adjustment does not involve any transmission rate penalties, the granularity at which this adjustment may be performed may be set to a very fine granularity such as, for example, down to individual transform coefficients. Consequently, the feedback loop for adjusting the rate control parameter is very short and is able to react very quickly, so that the rate of the compressed data stream may be very quickly adapted to the needs imposed by the transmission channel and the physical buffer size at the encoder and decoder may be kept small. Due to the tightly wound feedback loop and the possibility to keep the buffer size small, the latency may be made small as well.
Advantageous implementations of the present invention are the subject of the dependent claims. Preferred embodiments of the present application are described below with respect to the figures, among which:
Fig. 1 shows a typical transmission scenario from an encoder to a decoder where the encoder completely assumes responsibility for performing the rate control so as to obey a maximum transmission rate;
Fig. 2 shows a graph of an exemplary distribution of peak rates of the compressed data stream over a sequence of image rows of an image in case of video/image compression;
Fig. 3 shows a block diagram of a transmission scenario including an encoder and a decoder according to an embodiment;
Fig. 4 shows a possible implementation of the encoding stage of the encoder as a multi-channel encoding stage along with packetization measures in accordance with an embodiment;
Fig. 5 shows a possible implementation of a decoding stage fitting to the embodiment of Fig. 4;
Fig. 6 shows a block diagram of an implementation of the encoder and a possible implementation of the decoder within a transmission scenario of the embodiment of Fig. 3 when applying the codec realized by Figs. 4 and 5;
Fig. 7 shows schematically a possibility to derive a common time basis at encoder and decoder within the multi-channel coding scheme of Figs. 4 and 5;
Fig. 8 shows schematically a structuring of a compressed data stream of a time-varying signal into a sequence of frames and a subdivision of the frames into frame fragments associated to certain portions out of a spectral and/or spatial domain of the coded time-varying signal which is, here, exemplarily a video;
Fig. 9 schematically shows the association between frame fragments and portions of a spatial and/or spectral domain of a spectrally and/or spatially sampled information signal in accordance with an example where a hierarchical wavelet transform is used for coding as in the case of the codec of Figs. 4 and 5;
Fig. 10 shows a graph of an example of a parameterizable fill level to rate control parameter mapping function, here exemplarily mapping the fill level of the virtual buffer to a quantization index as a rate control parameter, wherein the bold drawn line defines possible working points;
Fig. 11 shows a block diagram of an encoder as an extension of the embodiment of Fig. 6 to the control of both maximum bit rate constraints and obeyance of mean data rate constraints; and
Fig. 12 shows a flow diagram of the rate controls to log the fill level in accordance with an embodiment.
Before some embodiments of the present invention are described in more detail below, reference is made to Fig. 3 showing a transmission scenario between an encoder and a decoder, based on which general concepts are discussed and an overview of the following embodiments is provided.
In particular, Fig. 3 shows an encoder as comprising an encoding stage 70 and a rate control 72 and a decoder as comprising a decoding stage 74 and a rate control 76. As shown, encoder and decoder are connected via a transmission channel 78. To be more precise, an encoder buffer 80 is connected between encoder and transmission channel 78 and an input buffer 82 is connected between decoder and transmission channel 78. Both of them are drawn with dashed lines to illustrate that same may be internal components of encoder and decoder respectively, or external components.
While the encoding stage's 70 output is connected to the input of encoder buffer 80, its input receives the signal to be compressed and transmitted, such as a temporal sequence of samplings of an information signal or a measurement signal, or a media signal such as an audio or video signal or the like. While the latter signal 84 is undistorted, this is not necessarily the case with respect to the reconstructed signal 86 at the decoding stage's 74 output. To be more precise, encoding stage 70 and decoding stage 74 are configured to be controllable as far as their compression or reconstruction quality in encoding/decoding the compressed data stream is concerned. In particular, encoding and decoding stages 70 and 74 are controllable by a rate control parameter 88 so as to change the way the signal 84 is compressed and reconstructed. In particular, the reconstruction quality could monotonically depend on the rate control parameter 88, for example. In other words, the rate control parameter is selected such that a function Q(p) of the reconstruction quality Q, at which the decoding stage 74 decodes the compressed data stream 100 and the encoding stage 70 encodes same, respectively, varies in dependency on the rate control parameter p 88. It could, for example, substantially show a monotonic tendency. Substantially means, for example, that the monotonic tendency is potentially fulfilled after applying a moving average filtering on Q(p), such as averaging over a window of a length of one quarter of the domain of Q(p). The same applies to the function R(p) of the coding rate R at which the decoding stage 74 decodes the compressed data stream 100 and the encoding stage 70 encodes same, respectively, in dependency on the rate control parameter p 88. It varies with p and could, for example, substantially show a monotonic tendency. The reconstruction quality may decrease with an increasing rate control parameter and vice versa.
For example, the rate control parameter comprises quantization values at which values coded into the compressed data stream are quantized/dequantized. Accordingly, changing the rate control parameter 88 changes the reconstruction quality and concurrently the compression rate, as a lower transmitted reconstruction quality necessitates less data rate in the compressed data stream.
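The trade-off just described can be made concrete with a toy scalar quantizer. The coefficient values, the quantization steps, and the crude rate measure below are invented for illustration and are not part of the application.

```python
# Illustrative sketch: a scalar quantizer as the rate control parameter.
# Raising the quantization step lowers both the rate and the quality.

def quantize(coeffs, step):
    return [int(round(c / step)) for c in coeffs]

def dequantize(levels, step):
    return [l * step for l in levels]

def rate_bits(levels):
    # crude rate proxy: bits needed for the magnitude plus a sign bit
    return sum(abs(l).bit_length() + 1 for l in levels)

coeffs = [13, -7, 42, 0, -3, 25, 1, -18]
for step in (1, 4, 16):
    levels = quantize(coeffs, step)
    recon = dequantize(levels, step)
    err = max(abs(c - r) for c, r in zip(coeffs, recon))
    print(step, rate_bits(levels), err)   # rate falls, error grows with step
```

With step 1 the reconstruction is exact at the highest rate; with step 16 the rate roughly halves while the maximum reconstruction error grows, mirroring the monotonic tendencies of Q(p) and R(p) described above.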
In accordance with the embodiments described further below, the rate control parameter 88 is adjusted at the encoder and decoder by the rate controls 72 and 76 in synchrony, without the necessity to signal the rate control parameter adjustment from encoder to decoder. Rather, both rate controls 72 and 76 log a fill level of a virtual (encoder) buffer for buffering the compressed data stream and adjust the rate control parameter 88 depending on the fill level. In order to strictly maintain the synchrony between rate controls 72 and 76 as far as the fill level of the corresponding virtual buffer 90 is concerned, both rely on parameters commonly accessible to rate controls 72 and 76, respectively. For example, the maximum supported bit rate of the transmission channel 78 may be used as a read rate 92 by both rate control 72 and rate control 76, as the actual transmission rate at encoder and decoder may differ. The uncoded data stream itself serves as a common time basis for logging the fill level of the virtual buffer 90 by rate controls 72 and 76. Each time a respective fragment is detected by encoding and decoding stages 70 and 74, respectively, a respective update 94 of the respective virtual buffer's 90 fill level 98 is caused, using the bit length 96 of the respective fragment as a write rate for increasing the fill level, and using a number of bits corresponding to a predetermined fraction of the read rate 92 for decreasing the fill level at the respective update instant (time click) 94. As described further below, the frame fragments may correspond, for example, to individual input pixels or transform coefficient levels or transform coefficient level groups such as trees of spatially corresponding coefficients, i.e. groups consisting of one coefficient in the spatially coarsest sub-band including all its associated (spatially co-located) descendants in the other sub-bands, as described in the following embodiments, although alternatives are of course also feasible. For further details, reference is made to the embodiments described herein below.
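The fill-level logging described above can be sketched as a counter with no payload memory: per update instant, the fragment's bit length is added and a fixed share of the agreed read rate is subtracted. The class name, the read rate and the fragment sizes below are illustrative assumptions, not taken from the application.

```python
# Sketch of the virtual buffer logging described above (illustrative only):
# at each update instant the fill level grows by the coded fragment's bit
# length and shrinks by a fixed share of the agreed read rate.

class VirtualBuffer:
    """No payload memory: just a fill-level counter, as described in the text."""

    def __init__(self, read_bits_per_update):
        self.read_bits = read_bits_per_update
        self.fill = 0

    def update(self, fragment_bits):
        self.fill += fragment_bits                      # write: coded fragment size
        self.fill = max(0, self.fill - self.read_bits)  # read: modeled channel drain
        return self.fill

vb = VirtualBuffer(read_bits_per_update=100)
for frag in (80, 150, 90, 200, 40):
    print(vb.update(frag))
```

Since encoder and decoder both observe the same fragment bit lengths and agree on the read rate, running this update on both sides yields identical fill levels without any side information in the bitstream.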
Based on the fill level 98 of the virtual buffer 90, rate controls 72 and 76 are then able to adjust the rate control parameter 88 synchronously to each other so as to prevent the virtual buffer 90 from depleting or overflowing. To this end, rate controls 72 and 76 use the same fill level as an input to the same fill level to rate control parameter mapping function. Details in this regard are also derivable from the embodiments outlined further below. Thus, the embodiment of Fig. 3 enables the adjustment of rate control parameters without the need of explicitly signaling the adjustment to the decoding side. Implicit signaling on the basis of the buffer fill level is used instead. The granularity at which the rate control parameters are adjusted may be as fine as possible, such as fine enough to coincide with the frame fragment borders 94 within the compressed data stream.
As already announced above, in the following, more specific embodiments and implementation details are described. Such details concern the type of compression realized by the pair of encoding and decoding stages, the rate control and the logging of the fill level of the virtual buffer and so forth. These details are transferrable onto the embodiment of Fig. 3 individually. Before proceeding with the description of Fig. 4, however, it should be noted that although Fig. 3, and later on Fig. 5, show a transmission scenario including an encoder and decoder, both entities, encoder and decoder, are separate components which are not necessarily, but very likely, built into separate devices. In this regard, it should be noted that the decoder, for example, may be implemented in hardware in the form of an ASIC or a programmable hardware, or in software. The decoder comprising decoding stage 74 and rate control 76 may be contained in a cellular phone, a portable computer or the like.
With respect to Fig. 4, a possible codec model underlying encoding and decoding stages 70 and 74 is described. It is to be emphasized that neither the encoder and decoder shown in Fig. 3 nor the encoder and decoder shown in Fig. 5 are restricted to this type of codec model. The description of Fig. 4 is simply meant to ease the understanding of the more detailed embodiments described hereinafter. Fig. 4 shows the architecture of an encoding stage in accordance with an embodiment. In particular, in accordance with Fig. 4, the encoding stage 70 is for encoding/compressing a video 84 into the compressed data stream 100 to be sent via smoothing buffer 80 over the transmission channel shown in Fig. 3. The encoding stage 70 may, as shown in Fig. 4, comprise the following internal components: a color transformer 102 which receives the video input 84 and performs a color space transform on the video input 84. The color transformer 102 is optional and may be left out. For example, the input video data 84 might be transformed into a target color space such as RGB or a luma and chroma representation. The video encoder 70 comprises a wavelet transformer 103 in order to decompose the pictures of the video input 84 into several spectral substreams, as known to a person skilled in the art. Briefly speaking, the pictures of the video input 84 are hierarchically decomposed into four substreams per hierarchical level: a two-dimensionally low-pass filtered version of the respective picture of the video 84, a two-dimensionally high-pass filtered version of the respective picture, and two versions of the respective picture high-pass filtered in the row direction and low-pass filtered in the column direction (and vice versa), with all four versions being spatially subsampled in order to account for the reduction of the conveyed spectral content.
The two-dimensionally low-pass filtered version is then subject to the next hierarchical decomposition, if any, so that altogether the pictures of the video input 84 are decomposed into (3n + 1) subbands, with n denoting the number of hierarchical levels of the wavelet transform of the wavelet transformer 103. In other words, the wavelet transformer 103 may decompose the input video data 84 into several channels using, exemplarily, a hierarchical wavelet transform. Each of the wavelet transforms generates several subbands representing a channel. Some of the channels might be decorrelated using, for instance, spatial prediction. For the latter spatial prediction, the encoding stage 70 may comprise a respective spatial inter-channel and/or intra-channel predictor 104.
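The (3n + 1) subband count follows because only the low-pass band is re-split at each level. The sketch below illustrates this, together with a one-level 2D Haar split as a minimal stand-in for the wavelet transform; the Haar filter choice is an assumption for illustration, the application does not prescribe a particular wavelet.

```python
# Illustrative check of the subband count: an n-level dyadic wavelet
# decomposition re-splits only the low-pass (LL) band, so each level adds
# three new subbands on top of the final LL band: 3n + 1 in total.

def subband_count(levels):
    return 3 * levels + 1

# One-level 2D Haar split of a tiny image into LL, HL, LH, HH (sketch;
# assumes even width and height).
def haar_split(img):
    h, w = len(img), len(img[0])
    ll, hl, lh, hh = [], [], [], []
    for y in range(0, h, 2):
        ll_row, hl_row, lh_row, hh_row = [], [], [], []
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            ll_row.append((a + b + c + d) / 4)  # 2-D low-pass
            hl_row.append((a - b + c - d) / 4)  # high-pass in row direction
            lh_row.append((a + b - c - d) / 4)  # high-pass in column direction
            hh_row.append((a - b - c + d) / 4)  # 2-D high-pass
        ll.append(ll_row)
        hl.append(hl_row)
        lh.append(lh_row)
        hh.append(hh_row)
    return ll, hl, lh, hh

print(subband_count(3))  # three hierarchical levels -> 10 subbands
```

Each returned band has half the width and height of the input, matching the spatial subsampling mentioned above.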
At the end, each picture of the video input 84 has been translated into transform coefficients, either predictively so that same actually represent transform coefficient differences, or without prediction in which case same are the actual transform coefficients.
The encoding stage 70 comprises a quantization stage 106 for applying a quantization to the transform coefficients of the different channels. The individual quantization factors are generated by a quantization factor computer 107 of the encoding stage 70 depending on a quantization index which is, in accordance with the present example, a rate control parameter 88 whose adjustment/computation is the task of the rate controls introduced in Fig. 3 and described in more detail hereinafter. Next, the coefficients, or - due to their quantization - coefficient levels, are entropy coded, using for instance a Golomb encoder 108 or alternatively a zero run-length encoding unit 110 which encodes runs of zeroes. In Fig. 4, it is shown that both lossless compression schemes 108 and 110 could be provided within encoding stage 70 with the ability to switch therebetween, as illustrated by a multiplexer 112 and a demultiplexer 114 between which the entropy coder 108 and the zero run-length encoder 110 are connected in parallel as shown in Fig. 4. The multiplexing between these two units 108 and 110 would have to be done in such a way that proper decoding is possible, i.e. the switching instances would have to be explicitly or implicitly signaled to the decoding side. The lossless coded data is then collected per subband into a respective small collector FIFO 116. If one of the collector FIFOs 116 reaches a certain level of fullness, a packetizer and multiplexer module 118 reads all the data from the different input channels, prepends a packet header in front of the data, and sends it at high speed to the smoothing buffer 80, the memory of which is used to smooth the data rates, which possibly show bursty behavior.
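To illustrate the kind of lossless coding performed by the entropy coder 108, the sketch below implements a Rice code, the power-of-two special case of Golomb coding. The application does not specify this exact variant or the parameter k; both are illustrative assumptions here.

```python
# Sketch of a Rice code (power-of-two Golomb code), in the spirit of the
# Golomb entropy coder named above. Parameter k is illustrative.

def rice_encode(value, k):
    """Encode a non-negative integer: unary quotient, then k remainder bits."""
    q, r = value >> k, value & ((1 << k) - 1)
    return "1" * q + "0" + format(r, "b").zfill(k)

def rice_decode(bits, k):
    q = bits.index("0")                       # unary part ends at the first 0
    r = int(bits[q + 1:q + 1 + k] or "0", 2)  # k-bit binary remainder
    return (q << k) | r

for v in (0, 3, 9, 17):
    code = rice_encode(v, k=2)
    print(v, code, rice_decode(code, k=2))
```

Small coefficient levels, which dominate after quantization, receive short codewords, which is why such codes suit the quantized subband data well.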
As shown in Fig. 5, the decoding stage 74 may be constructed in an inverse manner: the compressed video stream 100 enters decoder 74 via input buffer 82 where a depacketizer and multiplexer 120 demultiplexes the inbound compressed data stream 100 into the aforementioned subbands. Small FIFOs 122 are provided, one for each subband, and forward the respective coded subband stream to the lossless decoding stage illustratively constructed as an entropy decoder 124 and a zero run-length decoder 126 connected in parallel between a multiplexer 128 and demultiplexer 130. The resulting transform coefficient levels of the respective subband are then dequantized at a dequantizer 132, which receives the quantization factor from a quantization factor computer 134 operating exactly the same as quantization factor computer 107 of the encoding side so as to compute the quantization factors from the quantization index received from the rate control 76. The optional predictor 134 then predictively reconstructs the transform coefficients of the subbands which are then subject to the inverse transform in inverse transformer 136, such as, in the present case, to an inverse hierarchical wavelet transform. The resulting video picture samples may then be subject to an inverse color transform in an inverse color transformer 138 whereupon a reconstruction 140 of the original video input 84 results. Based on the codec model just described with respect to Figs. 4 and 5, the next sections will describe the rate control algorithm underlying rate control 72 and 76 in more detail. Again, care should be taken not to accidentally restrict the present invention to the codec model exemplarily described with respect to Figs. 4 and 5. Fig. 6 exemplarily shows an implementation of the transmission scenario of Fig. 3 including an encoder and decoder in case of relying upon the exemplary video codec described with respect to Figs. 4 and 5. As far as possible, the reference signs are reused.
Generally, the embodiment of Fig. 6 represents an extension of the embodiment of Fig. 3 to multi-channel coding/decoding. However, all of the details presented hereinafter with respect to Fig. 6 relating to the actual rate control parameter adjustment and virtual buffer logging are also applicable to single-channel encoding/decoding embodiments with or without packetizing. In addition to the elements already known from the previous figures, the following elements are included in this scenario of Fig. 6. For example, encoder 71 and decoder 75 additionally comprise summation units 142 and 144, respectively, in order to sum up the number of bits of the individual channels/subbands which together form the payload data of the coded data stream of encoding stage 70 and decoding stage 74, respectively. The summation states are used in order to update the fill level 98 of the virtual buffers 90 as explained herein below. Further, encoder 71 and decoder 75 additionally comprise a virtual output rate controller 146 and 148, respectively. However, this controller is optional, as will become clear from the description brought forward hereinafter. Briefly speaking, the virtual output rate controller 146 of encoder 71 could be provided in order to instruct the packetizer 118 to include into the just mentioned packet headers information relating to the virtual buffer logging, such as information on the virtual buffer read rate 92 to be used in logging the virtual buffer, so that the decoder's rate control may use the same read rate 92. Alternatively, the read rate could be fixed and agreed between encoder and decoder beforehand. Further, packetizer 118 informs rate control 72 of the size of the packet headers, which additionally contribute to the rate to be transmitted via the transmission channel 78, and depacketizer 120 acts the same, i.e. it informs the rate control 76 of the sizes of the packet headers within the packets transferred via the transmission channel 78 within the coded data stream. The rate controls 72 and 76 update (increase) the fill level 98 accordingly.
Additionally, Fig. 6 shows the rate controls 72 and 76 as internally comprising a quantization computer applying the aforementioned fill level to rate control parameter mapping function so as to perform the adjustment of the rate control parameter 88 depending on the fill level 98, with the computers being denoted by 150 and 152, respectively.
After having described the structure of the encoder and the structure of the decoder of Fig. 6, the functionality is described hereinafter. The main intention of the rate controls at the encoder and decoder has already been outlined with respect to Fig. 3: small buffer sizes or ultra-low latency require the possibility to quickly react to rate changes caused by variations in the image content. In the best case, every coefficient can be quantized with a different value. This permits fine parameter changes, avoiding abrupt large modifications of the quantization value, which would lead to coding artifacts. In traditional schemes, such an approach would cause an enormous overhead, since every quantization change needs to be signaled. H.264, for instance, restricts quantization changes to the macroblock level, such that only one quantization value per macroblock needs to be transmitted. JPEG 2000, on the other hand, uses a bit plane entropy coder that simplifies discarding data (as one method of quantization); however, the resulting number of bits has to be signaled in the bitstream for each block. This is achieved with a sophisticated method to signal the resulting control values by means of so-called tag-trees. Both, however, are computationally expensive and incur signalization overhead. The introduction of the virtual buffer 90 is used to control the rate control parameter, such as the quantization, based on, for example, the aforementioned predefined mapping function, and thus allows synchronizing the encoder and decoder without any overhead in the transmitted bitstream. Fig. 6 shows a core encoder 70 that possibly divides an input image into multiple channels. They might for instance correspond to wavelet subbands. The number of channels can be any integer number larger than zero. The optional packetizer 118 is responsible for dividing the data stream into packets with possibly variable length, prepending headers containing the necessary control information, and multiplexing the different channels into a single bitstream. These packets are stored in a physical output buffer 80. The physical output buffer 80 is connected to transmission channel 78. A peak data rate r_c,max(t) is defined that the transmission channel can sustain for sure. In other words, the content of the physical output buffer 80 can be transferred at this speed to and via the transmission channel 78, and hence to the decoder 75.
The peak data rate can be smaller than the actual transmission channel capacity. Having, for instance, a switched Gigabit Ethernet connection, r_c,max(t) = 700 Mbit/s would be a valid value. The rate generated by the encoder 70 is controlled by influencing the corresponding core encoder parameters, i.e. the rate control parameters 88. They might for instance represent quantization values belonging to the different channels, or truncation parameters when using bit plane entropy coding as in JPEG 2000.
The physical output buffer 80, also called smoothing FIFO, is responsible for balancing the changes in the rate produced by core encoder 70. Its data is transferred to the transmission channel 78 at the channel peak data rate. Whenever the core encoder 70 temporarily generates a rate that is larger than the channel peak rate, the smoothing FIFO 80 will fill. If the core encoder 70 temporarily generates a rate that is smaller than the channel peak rate, the physical output buffer 80 will deplete. If the period during which the core encoder rate exceeds the transmission channel rate persists for too long, the smoothing FIFO 80 will overflow. In other words, data will get lost and the decoder 75 cannot recover the encoded images anymore.
It is hence the task of the encoder 71 to avoid such a FIFO overflow under any circumstances. To this end, it follows the basic strategy of reducing the quality, and hence the rate, of the encoded video sequence if the buffer 80 is filling up. On the other hand, if the FIFO 80 is getting empty, the encoder 71 can spend more bits to achieve higher quality and exploit the available transmission channel bandwidth. The reduction and increase in quality, respectively, are achieved by adjusting the rate control parameter 88.
For successful decoding, the decoder 75 needs to know for every coefficient which rate control parameter 88 the encoder 71 has used. However, since the encoder 71 can change these parameters for each coefficient in order to enable a very quick reaction to changed image contents, traditional signaling of the rate control parameters 88 within the codestream would result in a huge overhead. In order to avoid this drawback, an implicit signaling mechanism based on a buffer fill level l_n is used.
Let l_n^p be, in a first step, the fill level of the physical buffer 80 for step n. Then, a function can be defined that determines the next rate control parameter q_n based on the physical buffer fill level l_n^p and the current state s_n. This state can, for instance, encompass the current rate control parameters:

q_n = f_q(s_n, l_n^p)

In other words, for every coefficient x_n to process, encoder 71 determines the rate control parameters q_n based on the previous equation using the current buffer fill level l_n^p. If the decoder 75 knows the corresponding buffer fill level as well, it can compute the same rate control parameters in order to correctly decode the sample values.
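The synchronization argument can be sketched as follows: encoder and decoder apply identical update rules to identical fill levels, so the parameter sequence never needs to be transmitted. The concrete mapping and state update functions below are illustrative assumptions; the application leaves them open.

```python
# Sketch of the recurrence above (f_q and f_s are illustrative choices):
# identical update rules applied to identical fill levels on both sides
# yield identical rate control parameters without any signaling.

def f_q(state, fill_level):
    # fill-level-to-parameter mapping: quantize more coarsely as the buffer fills
    return max(state, fill_level // 1000)

def f_s(state, fill_level):
    # illustrative state update: remember the last mapped parameter
    return fill_level // 1000

def run_rate_control(fill_levels):
    state, params = 0, []
    for l in fill_levels:
        params.append(f_q(state, l))   # q_n = f_q(s_n, l_n)
        state = f_s(state, l)          # s_{n+1} = f_s(s_n, l_n)
    return params

fills = [200, 1500, 2600, 900, 3100]   # identical on both sides by construction
assert run_rate_control(fills) == run_rate_control(fills)  # encoder == decoder
print(run_rate_control(fills))
```

The only requirement is that both sides start from the same initial state and observe the same fill-level sequence, which is exactly what the virtual buffer provides.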
Unfortunately, it will be very hard for the decoder 75 to derive the exact value of l_n^p. The reason is that the rate at which the physical buffer content can be transferred to the transmission channel 78 might slightly vary because of the properties of the employed transmission protocol, clock domain crossing issues, or temporal congestion of the transmission channel 78. Furthermore, since the packetizer 118 depicted in Fig. 6 typically contains a buffer memory, the time between the change of a coding parameter and its actual impact on the output of the packetizer 118 might be large. As a consequence, it is difficult to control the buffer fill level, making the application susceptible to buffer overflows.
To solve these issues, the virtual buffer 90 is introduced. It is directly connected to the output of the encoding stage 70 in order to enable a short latency between a possible change of the coding parameters 88 and its impact on the virtual buffer's 90 fill level 98. Furthermore, whenever the packetizer 118 inserts a header into the stream of coded bytes, this is included in the virtual buffer calculation as well. The virtual buffer 90 does not have any associated memory. In other words, it cannot store any value. Instead, it simply monitors how many input values would need to be stored, and how many values can be transferred to the transmission channel 78. The implementation of this virtual buffer 90 hence only requires one or two counters that keep track of the number of bits or bytes that were generated so far. Based on this virtual fill level l_n^v, the rate control 72 then determines the rate control parameters 88 in order to avoid any overflow of the virtual buffer 90. Consequently, by properly setting the physical buffer 80 size, the encoder 71 can guarantee that the physical buffer 80 never overflows:

q_n = f_q(s_n, l_n^v)    (1)

s_{n+1} = f_s(s_n, l_n^v)    (2)
In order to allow the decoder 75 to recompute the fill level encountered in the encoder 71 when coding coefficient x_n, the virtual FIFO 90 cannot be read with the current transmission channel bit rate r_c(t) (see Fig. 1). This is because the latter can vary over time, and these variations are not known by the decoder 75. Instead, the transfer of the data to the transmission channel 78 is modeled by assuming that the transmission channel 78 reads the data at the maximum supported bit rate r_c,max(t). If this value is constant and known by the decoder 75, it can serve as a common basis for the encoder 71 and decoder 75 to compute the fill level 98 of the virtual buffer 90. In case the value r_c,max(t) is smaller than the maximum channel capacity, it is even possible to tolerate some variations in the transmission bit rate, as long as the mean rate is greater than or equal to r_c,max(t). The physical buffer needs to be increased accordingly. For situations where r_c,max(t) is not constant, the decoder 75 needs to be informed about every change. Since, however, changes in the rate should be far less frequent than changes in the rate control parameters 88, this is still an efficient way for signaling changes in the rate control parameters (or some derived values such as the number of bit planes). While such an approach in principle permits synchronizing the fill levels 98 of the virtual buffers 90 in the encoder 71 and decoder 75, it is very sensitive to clock skews if r_c,max(t) were specified in bits per second. To this end, a virtual clock depending on the input pixels may be defined. This is the subject of the following description.
In order to allow proper synchronization between encoder 71 and decoder 75, it is necessary to precisely specify when data elements are removed from the virtual buffer 90. Defining a rate in bits per second is not sufficient because of possible clock skews. Since we need to determine the correct virtual buffer fill level 98 for each processed pixel, defining the rate in bits per input pixel is straightforward. Since, however, the encoder 71 might split the input image into different channels using for instance a wavelet transform, a careful system design is necessary. Fig. 7 depicts a fictive scenario where an input image is split into four channels. Each of the rectangles corresponds to a coefficient that needs to be quantized, entropy-coded and "stored" in the virtual buffer. The splitter corresponding to entities 102, 103, 104 divides the input pixels into channels. This might be as simple as just separating the color components, or more complex filtering operations such as a wavelet transform delivering several subbands. Since, however, the outputs generated by the splitter depend on the same data input, and because of the data dependencies inherent to the algorithms applied within the splitter, it is possible to organize the split coefficients into a common virtual time base. As depicted in Fig. 7, not every channel needs to output a coefficient for every virtual clock tick shown by the black arrows. Instead, more or less regular patterns can occur. The only important requirement is that the pattern is static, such that the decoder 75 can reliably reproduce it. Note that the patterns occurring at the output of the splitter typically show some form of regularity, or can be forced to have one. Consequently, they can easily be described by corresponding formulas. With this in mind, the data flow for the virtual encoder buffer 90 can now easily be described. For each virtual clock tick, the bits generated by the entropy coder are "written" into the virtual buffer 90.
Furthermore, bc,max(t) bits are removed from the virtual buffer 90 (as long as the latter contains a sufficient number of bits). This models the transfer to the transmission channel 78. Knowing the temporal resolution of the common virtual time base, bc,max(t) can be transformed into a corresponding rate rc,max(t).
The decoder side can perform the same computation. By these means, the decoder 75 can reconstruct for every coefficient the corresponding fill level 98 of the virtual encoder buffer 90, as long as it obeys the same temporal pattern of the different channels. Typically, this is easily possible because of data dependency constraints.
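The per-tick bookkeeping described above can be sketched as follows. This is an illustrative model only: the class name, the integer bit counts and the drain rule are assumptions, not part of the specification; the point is that encoder and decoder running the identical tick sequence reconstruct the identical fill level 98.

```python
class VirtualBuffer:
    """Hypothetical sketch of the virtual buffer 90 bookkeeping."""

    def __init__(self, b_c_max):
        self.b_c_max = b_c_max   # bits removable per virtual clock tick
        self.fill_level = 0      # fill level 98, in bits

    def tick(self, bits_written):
        """One virtual clock tick: 'write' the entropy-coded bits,
        then 'remove' up to b_c_max bits, modeling channel 78."""
        self.fill_level += bits_written
        # Remove b_c_max bits, but only as many as the buffer holds.
        self.fill_level -= min(self.fill_level, self.b_c_max)
        return self.fill_level
```

Running the same tick sequence on both sides, with the same static coefficient pattern, yields bit-identical fill levels without any explicit signaling.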
Finally, it has to be noted that in an actual hardware implementation the operations might be organized in the form of a pipeline. This pipeline structure should be taken into account in the decoder 75 as well in order to avoid desynchronization between the two codecs.
With respect to Fig. 8, the just mentioned provision of a common time basis between encoder and decoder is described again. The data 84 is again exemplarily shown to be a video, i.e. a sequence of pictures 170. The pictures 170 are accordingly temporally arranged along a presentation time axis 172 at a temporal pitch Δt, i.e. at the picture rate 1/Δt. In the coded data stream 100, the pictures 170 are, for example, individually coded into a respective frame 174. In other words, the compressed data stream 100 is subdivided into a sequence of frames 174. Each frame comprises, in turn, a sequence of frame fragments 176 which may, preferably, be contained within the compressed data stream 100 in a self-contained form, i.e. individually form a respective contiguous portion of the compressed data stream 100. Each of the frame fragments 176 is associated (cf. association 178) with a corresponding portion 180 of the respective picture 170. In the case of a hierarchical wavelet transform, the frame fragments 176 may, for example, be a collection of all wavelet transform coefficients relating to one contiguous portion 180 of the picture 170. It would also be feasible, however, that the individual frame fragments 176 are individual transform coefficients of such a hierarchical wavelet transform, in which case each transform coefficient would have associated therewith a spatial extension as well as a spectral extension, i.e. would have associated therewith a spatial/spectral portion of the spatial and spectral decomposition formed by the hierarchical wavelet transform. Besides such fragments, some fragments could relate to rather general information such as side information, for example headers.
The rate control 72 and 76, respectively, is then configured to distribute the maximum supported bitrate among the one or more frame fragments 176 within each frame in relation to the size of the corresponding portions 180 to obtain fractions of the maximum supported bitrate and to decrease the fill level 98 in logging the fill level 98 frame-fragment-wise using the rate fractions. Assuming, for example, that the frames 174 exclusively comprise payload frame fragments which are associatable with non-overlapping - but together completely covering the picture - portions of the corresponding picture, and that the size of a corresponding portion 180 would be a, and A would be the size of the complete picture 170, then, for each frame 174, Δt · rc,max bits could be spent and accordingly, a/A thereof would represent a target bit amount to be spent for each frame fragment 176 corresponding to its respective portion 180, for example. The non-overlapping portions 180 may be equally-sized or may vary in size. In the latter case, the target bit-consumption for the frame fragment associated with portion i would be ai/A, with ai being the size of portion i, A = Σ(0≤i<N) ai, and N being the number of portions 180.
As soon as such a frame fragment 176 has been processed by encoding stage 70 and decoding stage 74, respectively, the rate control 72 and 76, respectively, would increase the fill level 98 by the actual amount of bits which the frame fragment 176 consumes within the compressed data stream 100, and decrease the fill level 98 by the just mentioned target size Δt · rc,max · a/A.
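The fragment-wise fill-level update just described can be written compactly. The function name and argument names are illustrative; the arithmetic is exactly the increase by the actual bit consumption and the decrease by the target size Δt · rc,max · a/A.

```python
def log_fragment(fill_level, fragment_bits, dt, r_c_max, a, A):
    """Illustrative frame-fragment update of fill level 98.

    fragment_bits : actual bits the fragment occupies in the stream
    dt * r_c_max  : bit budget for one whole frame 174
    a / A         : fraction of the picture covered by the fragment's portion
    """
    fill_level += fragment_bits            # bits actually produced
    fill_level -= dt * r_c_max * (a / A)   # target bits for this fragment
    return fill_level
```

If every fragment consumed exactly its target, the fill level would stay constant; easy regions drive it down, difficult regions drive it up.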
It should be mentioned that the concept of uniformly spreading the target transmission rate derived as described from the maximum transmission rate across the spatio/spectral domain in the case of a video may easily be transferred onto other types of signals such as audio signals and their spectrogram or the like. In the case of an audio signal, each frame 174 would, for example, convey the transform coefficients of a spectral decomposition of a time interval of the audio signal, wherein the time intervals may overlap.
In other words, the compressed data stream 100 may comprise a sequence of frames 174 which, generally speaking, spectrally and/or spatially decompose/sample an information signal at a sequence of associated time stamps. Each frame is structured into frame fragments, each of which is associatable with a corresponding portion of the spectral and/or spatial domain. For example, each sample in the spatially coarsest subband of the aforementioned hierarchical wavelet transform has associated therewith a further Σ(2^(⌈(sb−1)/3⌉−1))² transform coefficients in the other subbands sb = 2...(3n+1) of higher spatial resolution, and together all of these 1 + Σ(2^(⌈(sb−1)/3⌉−1))² (sum over sb = 2...(3n+1)) transform coefficients form a tree of spatially corresponding coefficients and are associatable with a corresponding spatial tile of the picture of the video. For each of the tiles corresponding to one of the transform coefficients of the coarsest subband, i.e. for each such tree collection of spatially corresponding transform coefficients, the same number of bits may be used as a target bit amount to be spent (in average) for all tree collections, i.e. the same rate fraction. Other bits possibly contained within the frames 174 may simply be accounted for by subtracting a predetermined amount from the available bit length for the frames 174 as determined based on the maximum supported bitrate, before sharing among the frame fragments to obtain the target bit amounts or mean bitrates for the individual tree collections. That is, each time such a frame fragment of associated wavelet coefficients has been coded/decoded, the rate control 72 and 76, respectively, decreases the fill level 98 of the virtual buffer 90 by the same amount, derived as described by dividing the maximum supported bitrate, minus some amount for side information data, if any, by the number of such frame fragments.
In case of using the described tree collections, this number of such frame fragments is equal to the number of transform coefficients of the coarsest subband (sb = 1) of the hierarchical wavelet transform. However, other decompositions of the transform coefficients would also be feasible.
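The per-tree coefficient count 1 + Σ(2^(⌈(sb−1)/3⌉−1))² can be checked numerically. The function below is a sketch under the numbering convention used here (subbands sb = 1...3n+1 for an n-level transform); for n = 3 it reproduces the 64 coefficients per tree used in the example of Fig. 9.

```python
import math

def coefficients_per_tree(n):
    """Total coefficients in one tree of spatially corresponding
    coefficients for an n-level hierarchical wavelet transform
    with subbands sb = 1..(3n+1); sb = 1 is the coarsest."""
    return 1 + sum((2 ** (math.ceil((sb - 1) / 3) - 1)) ** 2
                   for sb in range(2, 3 * n + 2))
```

The number of such trees equals the number of coarsest-subband coefficients, p×q in the example, so p×q trees of 64 coefficients cover all 64pq coefficients of a three-level transform.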
However, it would also be feasible to define the granularity of the frame fragments even more finely. In case of the hierarchical wavelet transform, transform coefficients of coarser subbands of the hierarchical wavelet transform are generally - in the sense of their information importance - more important than transform coefficients of spatially finer subbands. Accordingly, a greater fraction of the available bit length for frames 174 as, in average, offered by the maximum supported bit rate for transmitting the number of all wavelet coefficients, could be distributed to the transform coefficients of spatially coarser subbands than to transform coefficients of spatially finer subbands. The distribution would be done in the same manner at the encoding and decoding side. See, for example, Fig. 9. Fig. 9 shows exemplarily a decomposition of a monochromatic picture using a hierarchical wavelet transform 180. Here, exemplarily, three levels (n=3) have been used, resulting in 10 subbands. One coefficient of the spatially coarsest subband 182 (sb=1) is exemplarily highlighted by hatching and its associated transform coefficients in the spatially finer subbands are indicated as well, all together forming a tree of locally corresponding coefficients as denoted above. Imagine, for example, the spatially coarsest sub-band 182 had p×q wavelet coefficients. Then, the number of all wavelet coefficients in all sub-bands would be 8p times 8q, resulting in, in average - and with subsuming all header payload amount in a constant h - a target or mean bit amount per coefficient of (Δt · rc,max − h)/(64pq). Relating to the number of coefficients only, merely 1/64 of the whole available bit amount per frame would thus be allotted to the coefficients of the spatially coarsest sub-band 182.
Thus, each coefficient could form a frame fragment, the encoding/decoding of which would trigger the virtual buffer fill level logging using the mean bit amount per coefficient of (Δt · rc,max − h)/(64pq) and the actual bit consumption, respectively. That is, the times at which logging would be triggered would occur at a fast rate. In a different embodiment, however, this fraction would be increased for coefficients of the lower spatial frequency sub-bands. For example, the sub-division of the frame into frame fragments could be performed level-wise: one fragment 187a would encompass all coefficients of sub-band 182, one fragment 187b would encompass all coefficients of sub-bands 2 to 4, one fragment 187c would encompass all coefficients of sub-bands 5 to 7 and one fragment 187d would encompass all coefficients of sub-bands 8 to 10. A ratio of 1.4:1.2:1.1:1 could, for example, be used for allotting the available bitrate per frame (Δt · rc,max − h) to the sub-bands of levels 1 to 4, i.e. the frame fragments 187a-d, respectively. Even a 1:1:1:1 ratio would prefer the wavelet coefficients of level 1, i.e. sub-band 182, over the coefficients of level 4, including sub-band 184, because the 3·16·p·q wavelet coefficients of level 4 would have to share among themselves the same average bitrate as the p·q coefficients of level 1. However, this would be justified, as each coefficient in level 1 relates to a greater areal size 186b of image 188, compared to the areal size 186a which the coefficients of level 4 relate to. In even another embodiment, the frame fragments could be the result of sub-dividing the coefficients spatially and level-wise: the p×q "trees of spatially corresponding coefficients", one of which is exemplarily shown in Fig. 9, could each be further sub-divided into four quarters, each quarter containing the tree's coefficients contained within the levels 1, 2, 3 and 4, respectively.
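The level-wise allotment with the example ratio 1.4:1.2:1.1:1 can be sketched as a simple proportional split of the per-frame budget (Δt · rc,max − h). The function name and the default ratio tuple are illustrative; any static ratio known to both codecs would serve, since encoder and decoder must perform the identical split.

```python
def allot_budget(frame_budget, ratios=(1.4, 1.2, 1.1, 1.0)):
    """Split the per-frame bit budget among the level-wise frame
    fragments 187a-d according to the example ratio (illustrative)."""
    total = sum(ratios)
    return [frame_budget * r / total for r in ratios]
```

With the default ratio, level 1 (fragment 187a) receives the largest share even though it holds by far the fewest coefficients, which matches the intended preference for coarse subbands.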
Just for comparison, in the embodiment above, to each of the p×q "trees of spatially corresponding coefficients", (Δt · rc,max − h)/(pq) of the available bit amount for one frame was allotted. Now, further aspects and further embodiments are described.
Controlling the peak data rate is of major importance in order to avoid frame drops. Whenever the data rate produced by the encoder exceeds the peak data rate of the channel 78 and the smoothing FIFO 80 overflows, data gets lost, and the decoder 75 cannot recover the encoded image any more. To this end, the previous sections have described the architectural principles in order to avoid these difficulties. The concept, however, will only work if the function f() in Equation (1) is designed in such a way that
• it reduces the coded bit rate when the virtual smoothing buffer 90 gets full
• it increases the coded bit rate when the virtual smoothing buffer 90 gets empty
Furthermore, the reaction needs to be relatively quick in order to allow for small buffer sizes without causing a buffer overflow. On the other hand, it should control the rate control parameters 88 in such a way that the image quality is maximized. To this end, the usage of the physical buffer 80 should balance peaks in the coded bit rate where necessary. In general, unnecessary fluctuations in the rate control parameter 88 will degrade the image quality as the local quality in the image 180 changes abruptly.
In the following, a corresponding solution to this problem will be discussed by elaborating a strategy for refining Equations (1)-(2). Note, however, that other strategies as well as other functions can be used as well in order to optimize the codec as required by the envisaged applications.

Low Complexity Rate Control
In order to maximize the achievable image quality, well-known coding algorithms such as JPEG 2000 or H.264 perform a rate distortion optimization. To this end, the image is divided into subregions such as macroblocks (H.264) or codeblocks (JPEG 2000). An optimization algorithm then decides for every subregion how much information to include in the compressed file such that the overall distortion is minimized while obeying the rate bound. However, such a rate distortion optimization typically comes along with a significant amount of computation effort. Furthermore, for a limited amount of available buffer memory, corresponding heuristics are required.
To reduce the implementation costs, a simpler rate control strategy can be strived for that assumes every image subregion (within a channel) to be equal. Hence, ideally all image regions should be quantized identically, assuming the rate-distortion curve is, for example, convex and monotonically decreasing, and that distortion values add up. This reduces the problem of rate control to finding one quantization parameter such that the overall rate is not exceeded. Unfortunately, for implementations with low complexity, this is prohibitive. Neither is it possible to buffer complete images during encoding before actually sending them out, nor to apply repetitive encoding. Furthermore, the inaccuracy of model-based approaches prevents the usage of very small buffers.
Consequently, the effective quantization should be controlled based on the fill level 98 of virtual buffer 90. Fig. 10 depicts the corresponding quantization graph. On the horizontal axis, it shows the fill level in %, on the vertical axis the so-called quantization index. A high value for the quantization index means strong quantization, whereas low values result in little influence of the quantization, up to, for example, a lossless coding mode. This quantization index needs to be translated into a quantization factor for every channel. Note that neither the quantization index nor the quantization values are limited to integer values. The different curves correspond to different working points. A working point basically defines a desired target quantization that avoids abrupt changes in the coding parameters. Each curve is designed in such a way that it prevents overflow of the virtual buffer. To this end, the quantization value for a full buffer is so large that all coefficients will be quantized to zero. The buffer 80 will hence definitively deplete, because the coded bit rate will be smaller than the peak channel bit rate. The way how to select a working point is described in the following.
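One possible shape for such a quantization curve can be sketched as follows. The patent fixes no specific curve, so the flat zone, the knee position and the maximum index below are pure assumptions; the sketch only illustrates the required properties: flat at the working point for moderate fill levels, and rising to a maximum index at 100 % fill so that a full buffer forces all coefficients to zero and must deplete.

```python
def quantization_index(fill_percent, working_point, q_max=63, knee=60.0):
    """Map virtual-buffer fill level (0..100 %) to a quantization index.

    Flat at `working_point` below the knee, then a linear ramp up to
    q_max at 100 % fill.  Shape and parameters are illustrative only.
    """
    if fill_percent <= knee:
        return float(working_point)
    # Linear ramp from the working point up to the maximum index.
    t = (fill_percent - knee) / (100.0 - knee)
    return working_point + t * (q_max - working_point)
```

Different working points simply shift the flat zone up or down, which is how the family of curves in Fig. 10 arises.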
Selection of the Working Point
The encoder's task is now to determine the optimal graph, in a way that the buffer fill level 98 is near the optimum buffer fill level. For inhomogeneous images, this will obviously not be possible. For image regions which are easy to compress, the buffer fill level 98 will reduce, while for difficult image parts, the buffer fill level 98 will increase. Nevertheless, the encoder 71 can try to put the mean buffer fill level into the zone where the quantization curves are flat. To decide whether the optimum working point is active, the mean buffer fill level or the mean quantization value can be monitored by rate controls 72 and 76. Furthermore, it can be considered whether the buffer 90 has filled or depleted at the end of the image compared to the start of the image.
While the solution discussed above avoids a frame drop because of violating the maximum rate, processing very inhomogeneous images is still suboptimal. The reason is that when choosing one working point for the whole image, regions that are simple to compress will cause the buffer fill level to be very low, while it is large for difficult image regions. Both the low and the high fill level regions, however, are to be avoided, because small changes in the image lead to significant changes in the quantization, and hence visible artifacts.
To solve this problem, the image can be divided into different quantization slices, each having its own working point.
The solution proposed so far permits the precise control of the instantaneous bit rate generated by the video coder. In other words, the encoder is controlled in such a way that a constant bit rate codestream is generated, never exceeding the channel peak data rate. On the other hand, controlling the mean data rate is not directly possible. However, as discussed above, such an approach offers the advantage of reduced buffer sizes. To this end, a variable bit rate codestream needs to be generated, that
• meets a certain mean data rate, but
• does not exceed a certain peak data rate.
While the peak data rate can be precisely controlled with the approaches discussed so far, it is not possible to generate a data stream whose mean bit rate is smaller than the peak data rate.
To solve these drawbacks, a simple extension of the rate control mechanism is sufficient, as illustrated in Fig. 11. To this end, a second virtual buffer 200 is introduced. The first virtual buffer 90 is as discussed previously. Its purpose is to avoid any overflow of the physical output buffer. Consequently, the virtual output rate 92 is set to the maximum rate that the physical channel can provide. Fig. 11 additionally shows that the rate control 72 also comprises a computer 204 for computing a rate control parameter candidate 206 on the basis of the fill level 208 of the second virtual buffer 200, and a maximum selector 210 which chooses the maximum of the rate control parameter candidate 206 and the respective candidate 212 output by computer 150 on the basis of fill level 98 of the first virtual buffer 90, with the resulting maximum representing the actually adjusted rate control parameter 88. It should be kept in mind that the rate control 76 on the decoding side looks the same, even though Fig. 11 concentrates on the encoder only. Further, a virtual output rate controller 214 is shown to be optionally present in order to control or vary the mean output rate used as the read rate 202 in logging the fill level 208 of the second virtual buffer 200. Explicit signaling may be used in order to keep the synchrony to the decoding side. Further, it is exemplarily shown that working point computers 216 and 218 may be present in order to control the working point of computers 150 and 204, respectively, as described above, by surveying a running average of the fill level 98 and 208, respectively, and/or a running average of the respective rate control parameter candidate 212 and 206, respectively.
It should be noted that, as described, the rate control parameter may be adjusted so as to correspond to the maximum of the rate control parameter candidates in case increasing the rate control parameter is intended to decrease the resulting coding bit rate as it was exemplarily the case in the afore-described figures. However, same may be adjusted to a minimum of the rate control parameter candidates in case decreasing the rate control parameter is intended to decrease the resulting coding bit rate.
The second virtual buffer 200 is for controlling the mean data rate. Consequently, the output rate 202 of this buffer is set to the desired mean data rate, which is of course smaller than the peak data rate. The size of the virtual buffer 200 basically defines the admissible variance in the mean data rate between different images. A very small buffer will allow only very little variation around the mean data rate. Hence, it is unlikely that the available channel peak data rate can be exploited. A large buffer, on the other hand, allows the compressed image size to vary significantly between different images. A reasonable choice might hence be to specify the virtual buffer's 200 size between a half and a full compressed frame size. Note that, in contrast to the peak virtual buffer, the mean virtual buffer does not need to have a corresponding physical memory.
The actual quantization index then simply equals the maximum of the one determined by the peak data rate control and the one determined by the mean data rate control. In other words, if the data rate is so large that the physical output buffer risks overflowing, the peak data rate control will increase the quantization. Otherwise, the mean data rate control will basically adjust the necessary quantization. Coupling the computation of the working point for the peak and mean data rate control avoids that the peak virtual buffer always uses working point q_wp = 0. One possibility consists for instance in using the same working point. Alternatively, it can be computed separately, but the integration is only active during times where the corresponding virtual buffer decides the actual quantization.
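The dual-buffer arrangement of Fig. 11 can be sketched as follows. The class name, the max(0, ...) drain rule and the shared candidate curve are assumptions made for a compact illustration; the essential point is the maximum selector 210 combining the two candidates.

```python
class DualRateControl:
    """Sketch of Fig. 11: a peak-rate virtual buffer drained at the
    channel peak rate and a mean-rate virtual buffer drained at the
    desired mean rate; the adjusted rate control parameter is the
    maximum of the two candidates.  Names and curves are illustrative."""

    def __init__(self, peak_rate, mean_rate, curve):
        self.peak_fill = 0           # fill level 98 of virtual buffer 90
        self.mean_fill = 0           # fill level 208 of virtual buffer 200
        self.peak_rate = peak_rate   # read rate 92
        self.mean_rate = mean_rate   # read rate 202
        self.curve = curve           # fill level -> candidate parameter

    def update(self, coded_bits):
        self.peak_fill = max(0, self.peak_fill + coded_bits - self.peak_rate)
        self.mean_fill = max(0, self.mean_fill + coded_bits - self.mean_rate)
        # Maximum selector 210: the stronger quantization wins.
        return max(self.curve(self.peak_fill), self.curve(self.mean_fill))
```

Since the mean rate is smaller than the peak rate, the mean-rate buffer normally fills faster and decides the quantization; the peak-rate candidate only takes over when the physical output buffer actually risks overflowing.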
In the following, it is discussed how to account for header information generated by the packetizer 118.
To match a given data rate exactly, the rough estimation of the fill level of buffer 90 is not sufficient. The overhead of the headers and control sequences has to be considered, too. For this reason, the amount of overhead data which is inserted by the packetizer 118 is added to the virtual buffer 90 at a specific period in time. This period has to be tracked exactly, because the decoder 75 has to be able to add the same amount of overhead at the same period in time. More precisely, the amount of overhead which is added to the bit stream by the packetizer 118 in the form of packet headers has to be added to the virtual buffer 90 at the encoder as well as at the decoder at the same period in time, i.e. at the same processed pixel. Otherwise, encoder and decoder could get out of sync.
The challenge consists hence in telling the decoder at what time it needs to add packet header overhead with the precision of the correct virtual time slice, which can be for example one coefficient. To this end, several solutions are possible:
• Adding an additional Golomb escape word. This, however, possibly affects the coding efficiency in a serious manner.
• Each time inserting a packet header, the encoder can add the number of coefficients already processed by the quantizer since the last header insertion. This information permits the decoder to basically count the dequantized coefficients and add the overhead at the right time to the virtual buffer. However, this solution creates an overhead.
• Addition of packet overhead can be synchronized implicitly in the encoder and decoder as described in the following.
Implicit synchronization of the packet header insertion requires that the decoder knows the rule set when the encoder adds a packet overhead to the virtual buffer. By applying the same rule set, the decoder can hence add the overhead at the right moment, so that the virtual buffer fill level 98 is the same for all processed coefficients.
To be more concrete, assume that the encoder decides to build a packet when one of the channel FIFOs 116 in front of the packetizer 118 shown in Fig. 6 exceeds a given level of fullness. It will then collect the data from all input FIFOs 116, combine them into a packet and add a corresponding header. At this moment in time, this packet header will also be added to the virtual buffer 90. In other words, the quantization factor for the coefficient quantized next will be based on a modified virtual buffer fill level 98. Let us call this coefficient x_n. It is hence the task of the decoder 75 to add the packet header to the virtual buffer 90 within the decoder 75 such that it impacts exactly coefficient x_n during decoding as well.
To this end, the decoder 75 needs to figure out which coefficients are actually contained in a received packet. By knowing the pipeline structure of the encoder 71, it can hence derive how many coefficients are still stuck in the encoder pipeline. In other words, it is possible to compute the coefficient that will first be impacted by the addition of the packet overhead to the virtual buffer. Note that the packet header overhead only concerns the coefficients that are encoded after the coefficients within the currently created packet.
While this seems to be straightforward, it has to be taken into account that a packet might contain incomplete coefficients because of entropy encoding. This is due to the variable length codes, including possible zero run-length encoding. These bits have to be grouped into bytes or words in order to simplify the processing in the rest of the pipeline. Among others, this permits specifying packet lengths in bytes or words, instead of bits, significantly reducing the overhead.
This, however, can result in only some of the bits required to decode a coefficient in the entropy decoder being contained in the currently decoded packet. Similarly, it might happen that several coefficients already quantized are still stuck in the entropy coder, because the latter groups several coefficients until having completed a byte or word.
The following procedure presents a solution taking these difficulties into account:
1. Determine the number of bits associated with each channel within the currently processed packet.
2. Count the number of entropy coded bits which are getting decoded for each channel.
3. If the number of decoded bits equals or exceeds the number of bits available for the channel, the pixel which caused the packet creation is reached. At this point the packet header overhead has to be added to the virtual buffer 90. Care has to be taken about delays, which are dependent on the encoder pipeline structure.
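The three steps above can be sketched as a small helper. The interface is hypothetical (the patent does not define one): given the bits each channel owns inside the current packet and the stream of per-channel decode events, it returns the index of the event at which the packet-header overhead must be added to the virtual buffer 90.

```python
def header_insertion_point(bits_per_channel, decoded_bit_events):
    """Sketch of steps 1-3 (illustrative interface).

    bits_per_channel   : dict channel -> bits in the current packet (step 1)
    decoded_bit_events : iterable of (channel, bits) in decode order
    Returns the event index at which the header overhead is due, or
    None if the packet end is not reached within the given events.
    """
    remaining = dict(bits_per_channel)          # step 1
    for i, (channel, bits) in enumerate(decoded_bit_events):
        remaining[channel] -= bits              # step 2: count decoded bits
        if remaining[channel] <= 0:             # step 3: boundary reached
            return i
    return None
```

A real implementation would additionally compensate the encoder-pipeline delay mentioned in step 3 before applying the overhead.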
At this point it is important to mention that coefficients coded with zero-run-length coding represent a special case in the above mentioned scenario. It might indeed happen that some of the coefficients are decoded with the wrong quantization index, if the packet header insertion occurred in the middle of a zero-run. This represents a serious problem, if rounding is performed during dequantization instead of quantization.
To solve this issue, two different approaches are possible:
• Application of rounding during quantization. In this case, dequantization of a zero value will deliver a zero as result, independent of the quantization factor.
• Strictly process the different channels in parallel and ensure that the channel that triggered the insertion of a packetizer header does not contain any incomplete zero run. This means that coefficients with the same virtual time base are processed simultaneously. Since the channel issuing the insertion of packet headers definitively contains no incomplete zero run, it can be computed for the other channels which coefficients are not influenced by the insertion of the header overhead. This has the nice side-effect that it is only necessary to count the bits for the channel that triggered the insertion of the packet header overhead.
Summarizing the above description of the functionality of the rate controls 72 and 76 of Fig. 6, reference is made to Fig. 12, showing a mode of operation of the rate controls of Fig. 3 and 6 in accordance with an embodiment involving a packetizer 118. Fig. 12 concentrates on the logging of the fill level, which is then used by the computers within rate controls 72 and 76 to vary or adjust the rate control parameter 88 accordingly, as described above. It should be kept in mind that the functionality of the rate controls of Fig. 3 and 6, as described next with respect to Fig. 12, is performed in a way triggered by the base en/decoding stages 70, 74, namely at times at which the en/decoding stage 70/74 has finalized processing a next frame fragment in line up to an extent sufficient to recognize the size it occupies in the bitstream.
The method starts with a setting 250 of the rate control parameter 88 to a default value. The default value may be known to the decoding side or rate control 76 by definition or by explicit signaling beforehand. Similarly, the current fill level of the virtual buffer needs to be properly initialized such as set to an empty state which may be known to the decoding side or rate control 76 by definition or by explicit signaling beforehand.
After that, the rate control waits for a new frame fragment being identified in the coded data stream in step 252 with the identification being performed by the encoding and decoding stages 70 and 74, respectively. For example, the identification is triggered whenever an encoding of a certain fragment has been finished, or whenever same has been decoded far enough in order to be able to identify the fragment borders of the fragment. As soon as an identification has been realized, an update procedure 254 is started by the rate control 72 and 76, respectively. In particular, it is first checked in step 254 whether the current frame fragment is a header or a payload frame fragment as discussed previously with respect to Fig. 9. If it is a header, the rate control determines the number n of frame fragments contained in the current packet in step 256 wherein reference is made to the above discussion as to how this determination could be performed. In effect, this step is to defer the fill level increase by the header length to the future, namely to when the nth payload frame fragment following the current header will be identified. After step 256, the current update 254 is finished and the waiting for the next frame fragment 252 is started.
If, however, the current frame fragment is determined in step 254 to be no header, the number n is decreased by 1 in step 258 and it is checked in step 260 whether n = 0, i.e. whether the end of the packet to which the header determined in step 256 belongs has been reached. If yes, the fill level 98 is increased by the header size in step 262. After step 260 or 262, respectively, the fill level 98 is increased by the frame fragment size of the current frame fragment and decreased by the rate fraction associated with the frame fragment, as discussed above with respect to Fig. 9, in steps 264 and 266, whereupon the update 254 is finished and the waiting procedure 252 for the next frame fragment is started again.
It should be noted that several amendments could be performed relative to the embodiment of Fig. 12. For example, the order among the steps 260, 262 on the one hand and steps 264 and 266 on the other hand may be changed. Moreover, in case of single-channel coding and the absence of a packetizer, only steps 252, 264 and 266 would have to be present.
In adjusting the rate control parameter 88 depending on the fill level 98, the adjustment is performed so that within each frame fragment, the rate control parameter is constant, and so that same is selected/computed based on the fill level 98 as not yet having been updated by the bit length of the frame fragment in logging the fill level.
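The update procedure 254 of Fig. 12 can be summarized in code. The dictionary-based state and fragment descriptions are illustrative stand-ins for the hardware signals; the step numbers in the comments refer to Fig. 12.

```python
def update_on_fragment(state, fragment):
    """One pass of update procedure 254 (illustrative sketch).

    state    : dict with 'fill' (fill level 98), 'pending' (payload
               fragments left until the deferred header increase) and
               'header_size'.
    fragment : dict with 'is_header' and 'size'; payload fragments also
               carry 'rate_fraction', headers carry 'n_fragments'.
    """
    if fragment['is_header']:
        # Step 256: defer the header-size increase until the n-th
        # payload fragment following this header has been identified.
        state['pending'] = fragment['n_fragments']
        state['header_size'] = fragment['size']
        return state
    state['pending'] -= 1                        # step 258
    if state['pending'] == 0:                    # step 260
        state['fill'] += state['header_size']    # step 262
    state['fill'] += fragment['size']            # step 264
    state['fill'] -= fragment['rate_fraction']   # step 266
    return state
```

As noted above, for single-channel coding without a packetizer only the payload branch (steps 264 and 266) would remain.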
Thus, summarizing the above embodiments, they solve the problems identified in the introductory section of the present specification. As discussed, rate control for streaming applications is a complex undertaking, because it has to consider the peak data rate of the transmission channel, the desired mean data rate for encoding, the size of the smoothing buffer, as well as the peak data rate produced by the core encoder. None of the known solutions provides a satisfactory result, in particular when very low latency and small buffer sizes are required. In summary, they are either computationally expensive, do not control peak and mean data rates, are not able to react sufficiently fast to avoid frame drops, or their signaling overhead is large.
The above embodiments provide a novel way of controlling the compression rate of a video (or audio) encoder using feedback control. Compared to schemes using iterative loops in order to find the correct compression parameters, this avoids the redundant encoding of the same data. In other words, the computation effort is reduced. While such a strategy can also be found in applications using for instance H.264, the presented rate control permits changing the quantization with every processed coefficient, without adding any overhead to the resulting codestream. This is because it is not necessary to explicitly signal the quantization change. Thus, compared to approaches based on H.264, the described rate control
• reduces the overhead for signaling the quantization, and
• permits a very fast reaction when changes in the image content cause abrupt changes in the required coding rate. This is particularly useful when using very small buffer sizes, since in this case abrupt rate changes could result in frame drops or in the rate being exceeded. The presented rate control module guarantees that no frame drops or rate exceedances are provoked.
• The very small buffer sizes are beneficial to achieve ultra-low latency.
• For applications that are less sensitive to latency, high image quality can be achieved by providing a larger buffer. Its capacity can be fully used to balance changes in the coding rate, increasing the quantization only when the buffer almost overflows. This is beneficial for close-to-lossless compression. This trade-off between quality and hardware complexity can be achieved without any change in the algorithm or implementation. Hence, the algorithm is universally applicable, as the proposed scheme adapts to a different buffer size.
• The rate control is able to control both the mean and the peak data rate. By these means it is possible to trade the FIFO size against the channel capacity. Suppose, for instance, that a mean target bit rate of 500 Mbit/s is required. Then it is possible to use a small FIFO and a channel offering 500 Mbit/s, but this might require many coding parameter changes, possibly causing artifacts. Alternatively, a larger FIFO can be used. Or, the FIFO size can be reduced at the expense of having a channel with a peak data rate higher than 500 Mbit/s.
Furthermore, the above embodiments ensure that the peak data rate of the channel is never exceeded. In addition, the rate control can be used for both constant and varying target compression rates. Moreover, the embodiments avoid building complex models for predicting the bit rate beforehand, thus allowing flexible codec modifications. Finally, they do not require slicing the coefficients into bits and processing them individually, thus avoiding high computation effort.
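As a hedged illustration of how the mean and the peak data rate can be controlled at once (this sketch and all names in it are assumptions for illustration, not the embodiments' implementation), two virtual buffers may be logged in parallel: one drained at the channel's peak bit rate and one at the desired mean bit rate, with the coarser of the two parameter candidates winning so that neither constraint is violated:

```python
# Two virtual buffers logged in parallel; each yields a rate control
# parameter candidate from its own fill level, and the maximum (coarsest)
# candidate is used, enforcing both the peak and the mean rate constraint.

def candidate(fill_level, capacity, max_param=15):
    """Fill-level-to-parameter mapping for one virtual buffer (assumed linear)."""
    ratio = max(0.0, min(1.0, fill_level / capacity))
    return round(ratio * max_param)

def dual_rate_control(bits, peak_fill, mean_fill,
                      peak_drain, mean_drain,
                      peak_cap=2048, mean_cap=8192):
    """Book one coded fragment of `bits` bits into both virtual buffers
    and return (parameter, new_peak_fill, new_mean_fill)."""
    peak_fill = max(0, peak_fill + bits - peak_drain)   # drained at peak rate
    mean_fill = max(0, mean_fill + bits - mean_drain)   # drained at mean rate
    param = max(candidate(peak_fill, peak_cap),
                candidate(mean_fill, mean_cap))
    return param, peak_fill, mean_fill
```

Enlarging `peak_cap` in this sketch corresponds to trading a larger FIFO against a lower peak channel rate, as discussed above.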
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like, for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer. A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
The above described embodiments are merely illustrative for the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

Claims

1. Decoder for decoding a compressed data stream (100), comprising a decoding stage (74) configured to decode the compressed data stream (100) depending on a rate control parameter (88); and a rate control (76) configured to log a fill level (98) of a virtual buffer (90) for buffering the compressed data stream (100) and to adjust the rate control parameter (88) depending on the fill level (98).
2. Decoder according to claim 1, wherein the rate control parameter is selected such that a function Q(p) of a reconstruction quality Q at which the decoding stage (74) decodes the compressed data stream (100) varies in dependency on the rate control parameter p (88).
3. Decoder according to claim 1 or 2, wherein the rate control parameter (88) comprises quantization values and the decoding stage (74) is configured to dequantize values in the compressed data stream (100) using the quantization values.
4. Decoder according to any of claims 1 to 3, wherein the compressed data stream (100) represents
- a temporal sequence of samplings of an information signal or a measurement signal,
- a media signal,
- an audio signal, or
- a video signal.
5. Decoder according to any of claims 1 to 4, wherein the rate control (76) is configured to use a desired or admissible maximum supported bitrate of a transmission channel (78) as a read rate (92) in logging the fill level (98).
6. Decoder according to claim 5, wherein the decoder further comprises a physical buffer configured to buffer the compressed bit stream prior to its decoding in the decoding stage and to be fed via the transmission channel at an actual, possibly time-varying transmission rate, wherein the rate control is configured such that the maximum supported bitrate of the transmission channel is, at least for a predetermined time interval, independent from any possible temporal variation of the actual, possibly time-varying transmission rate.
7. Decoder according to claim 5 or 6, wherein the rate control is configured to receive a parameter indicating the maximum supported bitrate from an encoder generating the compressed data stream or from a transmission site providing the transmission channel.
8. Decoder according to any of claims 5 to 7, wherein the rate control is configured to intermittently change the maximum supported bitrate.
9. Decoder according to any of claims 5 to 8, wherein the compressed data stream (100) represents a temporal sampling of a spectrally and/or spatially sampled information signal and is subdivided into a sequence of frames relating to a sequence of time stamps, each frame comprising one or more frame fragments, each frame fragment being associated with a corresponding portion of a spectral and/or spatial domain of the spectrally and/or spatially sampled information signal, wherein the rate control is configured to distribute the maximum supported bitrate among the one or more frame fragments in relation to the size of the corresponding portions in the spectral and/or spatial domain to obtain rate fractions of the maximum supported bitrate and to decrease the fill level in logging the fill level frame-fragment-wise using the rate fractions.
10. Decoder according to claim 9, wherein the frame fragments are individual transform coefficients or syntax portions relating to picture blocks.
11. Decoder according to claim 9 or 10, wherein the rate control is configured to perform the decrease of the fill level triggered by the decoding stage whenever a respective frame fragment has been processed by the decoding stage.
12. Decoder according to any of claims 5 to 11, wherein the rate control is configured to increase the fill level in logging the fill level frame-fragment-wise according to an actual bit length of the respective frame fragments in the compressed data stream.
13. Decoder according to any of the previous claims, wherein the rate control (76) is configured to update the fill level (98) in logging the fill level (98) at times occurring at a rate at which the decoding stage (74) processes the compressed data stream (100).
14. Decoder according to any of the previous claims, wherein the rate control (76) is configured to take headers in the compressed data stream into account in logging the fill level (98).
15. Decoder according to any of the previous claims, wherein the rate control parameter is selected such that a reconstruction quality at which the decoding stage (74) decodes the compressed data stream (100) decreases with an increase of the rate control parameter and increases with a decrease of the rate control parameter, and wherein the rate control (76) is configured to increase the rate control parameter (88) responsive to the fill level (98) approaching a full state of the virtual buffer (90) and to decrease the rate control parameter (88) responsive to the fill level (98) approaching an empty state of the virtual buffer (90).
16. Decoder according to any of the previous claims, wherein the rate control is configured to perform the adjustment of the rate control parameter depending on the fill level using a fill-level-to-rate-control-parameter mapping function which is parameterizable using a working point parameter, wherein the rate control is configured to control the working point parameter depending on the fill level and the rate control parameter.
17. Decoder according to any of the previous claims, wherein the rate control is configured to additionally log a further fill level of a further virtual buffer for buffering the data stream, wherein the rate control is configured to use a maximum supported bitrate as a read rate in logging the fill level of the virtual buffer and a mean data rate as a read rate in logging the further fill level of the further virtual buffer, wherein the rate control is configured to adjust the rate control parameter depending on the fill level and the further fill level.
18. Decoder according to claim 17, wherein the rate control is configured to individually determine rate control parameter candidates based on the fill level and the further fill level, respectively, and to adjust the rate control parameter so as to correspond to a maximum of the rate control parameter candidates or so as to correspond to a minimum of the rate control parameter candidates.
19. Encoder for encoding a compressed data stream (100), comprising an encoding stage (70) configured to generate, by encoding, the compressed data stream (100) depending on a rate control parameter (88); and a rate control (72) configured to log a fill level (98) of a virtual buffer (90) for buffering the compressed data stream (100) and to adjust the rate control parameter (88) depending on the fill level (98), wherein the encoder is configured to implicitly signal the rate control parameter using the fill level (98), rather than explicitly signaling same within the compressed data stream.
20. Encoder according to claim 19, wherein the rate control parameter is selected such that a function Q(p) of a reconstruction quality Q at which the encoding stage (70) encodes the compressed data stream (100) varies in dependency on the rate control parameter p (88).
21. Encoder according to claim 19 or 20, wherein the rate control parameter (88) comprises quantization values and the encoding stage (70) is configured to quantize values in the compressed data stream (100) using the quantization values.
22. Encoder according to any of claims 19 to 21, wherein the compressed data stream (100) represents
- a temporal sequence of samplings of an information signal or a measurement signal,
- a media signal,
- an audio signal, or
- a video signal.
23. Encoder according to any of claims 19 to 22, wherein the rate control (72) is configured to use a desired or admissible maximum supported bitrate of a transmission channel (78) as a read rate (92) in logging the fill level (98).
24. Encoder according to claim 23, wherein the encoder further comprises a physical buffer configured to buffer the compressed bit stream prior to its transmission over a transmission channel to a decoder and to be emptied via the transmission channel at an actual, possibly time-varying transmission rate, wherein the rate control is configured such that the maximum supported bitrate of the transmission channel is, at least for a predetermined time interval, independent from any possible temporal variation of the actual, possibly time-varying transmission rate.
25. Encoder according to claim 23 or 24, wherein the rate control is configured to send a parameter indicating the maximum supported bitrate to the decoder or to receive the parameter from a transmission site providing the transmission channel.
26. Encoder according to any of claims 19 to 25, wherein the rate control is configured to intermittently change the maximum supported bitrate.
27. Encoder according to any of claims 19 to 26, wherein the compressed data stream (100) represents a temporal sampling of a spectrally and/or spatially sampled information signal and is subdivided into a sequence of frames relating to a sequence of time stamps, each frame comprising one or more frame fragments, each frame fragment being associated with a corresponding portion of a spectral and/or spatial domain of the spectrally and/or spatially sampled information signal, wherein the rate control is configured to distribute the maximum supported bitrate among the one or more frame fragments in relation to the size of the corresponding portions in the spectral and/or spatial domain to obtain rate fractions of the maximum supported bitrate and to decrease the fill level in logging the fill level frame-fragment-wise using the rate fractions.
28. Encoder according to claim 27, wherein the frame fragments are individual transform coefficients or syntax portions relating to picture blocks.
29. Encoder according to claim 27 or 28, wherein the rate control is configured to perform the decrease of the fill level triggered by the encoding stage whenever a respective frame fragment has been encoded by the encoding stage.
30. Encoder according to any of claims 19 to 29, wherein the rate control is configured to increase the fill level in logging the fill level frame-fragment-wise according to an actual bit length of the respective frame fragments in the compressed data stream.
31. Encoder according to any of claims 19 to 30, wherein the rate control (72) is configured to update the fill level (98) in logging the fill level (98) at times occurring at a rate at which the encoding stage (70) encodes the compressed data stream (100).
32. Encoder according to any of claims 19 to 31, wherein the rate control (72) is configured to take headers in the compressed data stream into account in logging the fill level (98).
33. Encoder according to any of claims 19 to 32, wherein the rate control parameter is selected such that a reconstruction quality at which the encoding stage (70) encodes the compressed data stream (100) decreases with an increase of the rate control parameter and increases with a decrease of the rate control parameter, and wherein the rate control (72) is configured to increase the rate control parameter (88) responsive to the fill level (98) approaching a full state of the virtual buffer (90) and to decrease the rate control parameter (88) responsive to the fill level (98) approaching an empty state of the virtual buffer (90).
34. Encoder according to any of claims 19 to 33, wherein the rate control is configured to perform the adjustment of the rate control parameter depending on the fill level using a fill-level-to-rate-control-parameter mapping function which is parameterizable using a working point parameter, wherein the rate control is configured to control the working point parameter depending on the fill level and the rate control parameter.
35. Encoder according to any of claims 19 to 34, wherein the rate control is configured to additionally log a further fill level of a further virtual buffer for buffering the data stream, wherein the rate control is configured to use a maximum supported bitrate as a read rate in logging the fill level of the virtual buffer and a mean data rate as a read rate in logging the further fill level of the further virtual buffer, wherein the rate control is configured to adjust the rate control parameter depending on the fill level and the further fill level.
36. Encoder according to claim 35, wherein the rate control is configured to individually determine rate control parameter candidates based on the fill level and the further fill level, respectively, and to adjust the rate control parameter so as to correspond to a maximum of the rate control parameter candidates or so as to correspond to a minimum of the rate control parameter candidates.
37. Method for decoding a compressed data stream (100), comprising decoding the compressed data stream (100) depending on a rate control parameter (88); logging a fill level (98) of a virtual buffer (90) for buffering the compressed data stream (100); and adjusting the rate control parameter (88) depending on the fill level (98).
38. Method for encoding a compressed data stream (100), comprising generating, by encoding, the compressed data stream (100) depending on a rate control parameter (88); logging a fill level (98) of a virtual buffer (90) for buffering the compressed data stream (100); and adjusting the rate control parameter (88) depending on the fill level (98), wherein the rate control parameter is implicitly signaled using the fill level (98), rather than being explicitly signaled within the compressed data stream.
39. Computer program having a program code for performing, when running on a computer, a method according to claim 37 or 38.
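The fill-level-to-rate-control-parameter mapping function parameterizable via a working point parameter, as recited in the decoder and encoder claims above, could be sketched as follows. This is a purely illustrative sketch: the shape of the mapping, the first-order adaptation rule, and all names are assumptions, as the claims only require that the working point be controlled depending on the fill level and the rate control parameter.

```python
# Illustrative parameterizable mapping: the fill-level-to-parameter curve is
# shifted by a working point, and the working point is slowly adapted from
# the observed fill level and the currently used parameter.

def mapped_param(fill_level, working_point, capacity=4096, max_param=15):
    """Map the fill level to a parameter around the current working point."""
    offset = (fill_level - capacity / 2) / capacity * max_param
    return max(0, min(max_param, round(working_point + offset)))

def update_working_point(working_point, fill_level, param,
                         capacity=4096, gain=0.05):
    """Drift the working point toward the currently used parameter,
    with a correction term from the buffer occupancy (assumed rule)."""
    error = (fill_level - capacity / 2) / capacity
    return working_point + gain * (param - working_point + error)
```

At a half-full buffer the working point is returned unchanged, so in steady state the mapping is centered on the parameter actually in use.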
EP12791806.8A 2012-11-30 2012-11-30 Compressed data stream transmission using rate control Withdrawn EP2926556A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2012/074086 WO2014082680A1 (en) 2012-11-30 2012-11-30 Compressed data stream transmission using rate control

Publications (1)

Publication Number Publication Date
EP2926556A1 true EP2926556A1 (en) 2015-10-07

Family

ID=47257848

Family Applications (1)

Application Number Title Priority Date Filing Date
EP12791806.8A Withdrawn EP2926556A1 (en) 2012-11-30 2012-11-30 Compressed data stream transmission using rate control

Country Status (2)

Country Link
EP (1) EP2926556A1 (en)
WO (1) WO2014082680A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020084474A1 (en) 2018-10-22 2020-04-30 Beijing Bytedance Network Technology Co., Ltd. Gradient computation in bi-directional optical flow
CN116527882A (en) 2018-11-07 2023-08-01 寰发股份有限公司 Video block encoding or decoding method and apparatus using current picture reference encoding scheme
EP3857879A4 (en) 2018-11-12 2022-03-16 Beijing Bytedance Network Technology Co., Ltd. Simplification of combined inter-intra prediction
JP7241870B2 (en) 2018-11-20 2023-03-17 北京字節跳動網絡技術有限公司 Difference calculation based on partial position
EP3915259A4 (en) 2019-03-06 2022-03-30 Beijing Bytedance Network Technology Co., Ltd. Usage of converted uni-prediction candidate

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US5677969A (en) * 1995-02-23 1997-10-14 Motorola, Inc. Method, rate controller, and system for preventing overflow and underflow of a decoder buffer in a video compression system
US5686963A (en) * 1995-12-26 1997-11-11 C-Cube Microsystems Method for performing rate control in a video encoder which provides a bit budget for each frame while employing virtual buffers and virtual buffer verifiers

Non-Patent Citations (2)

Title
None *
See also references of WO2014082680A1 *

Also Published As

Publication number Publication date
WO2014082680A1 (en) 2014-06-05


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20150626

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAX Request for extension of the european patent (deleted)
17Q First examination report despatched

Effective date: 20180917

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20190328