GB2485576A - Video compression using quantisation based on pixel tile bandwidth and transform coefficients - Google Patents


Info

Publication number
GB2485576A
GB2485576A (application GB1019602.0A / GB201019602A)
Authority
GB
United Kingdom
Prior art keywords
pixel
tile
coefficients
bandwidth
transform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1019602.0A
Other versions
GB2485576B (en)
GB201019602D0 (en)
Inventor
William Stoye
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DisplayLink UK Ltd
Original Assignee
DisplayLink UK Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DisplayLink UK Ltd filed Critical DisplayLink UK Ltd
Priority to GB1019602.0A priority Critical patent/GB2485576B/en
Publication of GB201019602D0 publication Critical patent/GB201019602D0/en
Priority to PCT/GB2011/001619 priority patent/WO2012066292A1/en
Priority to EP11801804.3A priority patent/EP2641399A1/en
Publication of GB2485576A publication Critical patent/GB2485576A/en
Application granted
Publication of GB2485576B publication Critical patent/GB2485576B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    All classifications fall under H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04N PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals:
    • H04N19/18 Adaptive coding characterised by the coding unit, the unit being a set of transform coefficients
    • H04N19/124 Quantisation
    • H04N19/126 Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N19/149 Data rate or code amount at the encoder output, estimated by means of a model, e.g. mathematical model or statistical model
    • H04N19/176 Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/60 Transform coding
    • H04N19/63 Transform coding using sub-band based transform, e.g. wavelets

Abstract

A system and method of image compression comprising receiving a frame of pixel tiles / blocks, determining the available bandwidth of each pixel block and, for each colour channel of each pixel block, transforming the pixel data to create a series of coefficients and performing a quantisation of the series of coefficients using a quantisation level selected according to the bandwidth and coefficient size. Bandwidth may be expressed as a bit rate per pixel. The available bandwidth may be varied for different tiles of the same frame. The transform may comprise a Haar transform or a Discrete Wavelet Transform. Entropy coding of the quantised series of coefficients may be performed. Selecting the quantisation level may comprise mapping the coefficient size to one of a plurality of predetermined levels and using a quantisation level pre-assigned to the mapped level. When bandwidth is restricted, this flexible compression method allows different types of images to be compressed.

Description

DESCRIPTION
VIDEO COMPRESSION
This invention relates to a method of compressing a frame of pixel tiles.
The compression of video data is a large and wide-ranging technical field. In general, as display devices such as televisions and computer monitors have increased in size and resolution, and as the number of sources of video has increased through the expansion of television channels and Internet sites, the importance of saving bandwidth by compressing video has correspondingly increased. Well-known technologies such as JPEG and MPEG provide compression schemes that are in wide use throughout various industries, particularly television broadcast and computing.
These compression technologies operate on the principle that there are large temporal and spatial redundancies within video images that can be exploited to remove significant amounts of information without degrading the quality of the end user's experience of the resulting image.
For example, a colour image may have twenty-four bits of information per pixel, being eight bits each for three colour channels of red, green and blue. Using conventional compression techniques, this information can be reduced to two bits per pixel without the quality of the final image overly suffering. This can be achieved by dividing the image into rectangular blocks (or tiles), where each block is then subjected to a mathematical transform (such as the Discrete Cosine Transform) to produce a series of coefficients.
These coefficients are then quantized (effectively divided by predetermined numbers) and the resulting compressed image data can be transmitted. At the receiving end, the data is decompressed by performing reverse quantization and reversing the transform to reconstruct the original block. Other steps may also occur in the process, such as entropy encoding, to further reduce the amount of data that is actually transmitted.
Compression technologies that are based around the principle of transforming tiles and then quantizing the resulting coefficients are highly effective at reducing the amount of video data that then has to be transmitted.
However, they are not necessarily as flexible as is desirable in the specific situation in which they are used. It is known that certain types of images compress much better than others, and techniques that are appropriate for photographic images do not work as well with desktop images, and vice versa. When bandwidth is restricted and different types of images need to be compressed, a highly flexible approach to the compression is desirable.
It is therefore an object of the invention to improve upon the known art.
According to a first aspect of the present invention, there is provided a method of compressing a frame of pixel tiles comprising receiving a frame of pixel tiles, determining a bandwidth available for each pixel tile, and for each colour channel of each pixel tile performing a transform of the pixel data to create a series of coefficients, selecting a quantization level according to a function of the determined bandwidth and the size of the coefficients, and performing a quantization of the series of coefficients using the selected quantization level.
According to a second aspect of the present invention, there is provided a device for compressing a frame of pixel tiles comprising an encoder arranged to receive a frame of pixel tiles, determine a bandwidth available for each pixel tile, and for each colour channel of each pixel tile perform a transform of the pixel data to create a series of coefficients, select a quantization level according to a function of the determined bandwidth and the size of the coefficients, and perform a quantization of the series of coefficients using the selected quantization level.
Owing to the invention, it is possible to provide a method for compressing a video image into a compressed tile format whereby the quantization level is selected based on the distribution of the transform coefficient values in order to meet an approximate target compressed image size. The quantization level for each video tile must be provided for the quantization stage of compression. The quantization level corresponds approximately to an image quality level, and straightforward execution of the compression algorithm would require that the desired quality level be an input to the compression process. However, at a given quality level the compressed image size may vary wildly depending on the input image.
Preferably, the step of selecting a quantization level according to a function of the determined bandwidth and the size of coefficients comprises mapping the coefficient size to one of a plurality of predetermined stages and using a quantization level pre-assigned to the mapped stage. Once the coefficients have been obtained from the transform, their size is determined, which may be an absolute measure or may be based on the number of significant bits that are present, for example. This size is then used to select a quantization level. Predetermined stages (eight, for example) that are specific to the determined bandwidth can be used to map the coefficient size to a specific stage which has a quantization level pre-assigned to it. In this way the quantization level is chosen. Effectively a matrix of predetermined quantization levels is used, which could be 8x8, with eight different bandwidths on one axis and eight different coefficient size ranges on the other axis.
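By way of illustration only, such a stage-mapping matrix might be sketched as follows; every boundary value and quantization level here is a hypothetical placeholder, not a value taken from this specification:

    # Hypothetical 8x8 matrix of quantization levels: bandwidth bands on one
    # axis, coefficient-size stages on the other. All numbers are placeholders.
    BANDWIDTH_BANDS = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0, 8.0]  # bits/pixel thresholds
    SIZE_STAGES = [2, 4, 6, 8, 10, 12, 14, 16]                  # significant-bit thresholds

    # Lower bandwidth and larger coefficients both push towards a coarser
    # (larger) quantization shift.
    QUANT_LEVELS = [[max(0, stage - band + 1) for stage in range(8)]
                    for band in range(8)]

    def select_quantization_level(bits_per_pixel, coeff_sig_bits):
        """Map the tile's bandwidth budget and coefficient size onto the matrix."""
        band = next((i for i, b in enumerate(BANDWIDTH_BANDS) if bits_per_pixel <= b), 7)
        stage = next((i for i, s in enumerate(SIZE_STAGES) if coeff_sig_bits <= s), 7)
        return QUANT_LEVELS[band][stage]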
In a server providing compressed screens to many remote clients, the aggregate of compressed image size is likely to be the correct determining factor in deciding the quality level to use, due to finite network bandwidth. This leads to a chicken-and-egg problem in deciding the quality/quantization level to use. This becomes a problem when users display images which compress less well than expected. Typically this is caused by very noisy images, such as a screen full of tiny writing or rapidly changing or very busy video imagery. For example, an image may start out as twenty-four bits/pixel (eight each of red, green and blue). If quantization settings are used which compress a smooth image to one bit/pixel, the same settings only get to two bits/pixel for a noisy image. As a general rule, at more extreme quantization (= lower quality) the difference is greater than at high quality. It is therefore advantageous to compress to a requested bits/pixel level, rather than to a requested quality.
One solution to this is to rely on statistical multiplexing, hoping that if some users require the compression of problem images, others may be less demanding. This will work on many occasions but can sometimes break down. For instance, if many users are watching the same multicast video, or in a teaching context are all requested to perform the same actions on their terminals, then this statistical assumption breaks down. The best solution to this problem of quality against bandwidth is to determine a budget of network bandwidth (i.e. compressed image size) for each compressed tile, dependent on the total amount of compression activity required in any given phase of activity in the server.
The quantization level for a video tile is determined after the transform stage of compression and before quantization. The input to the transform for a 64x64 video tile is 4096 pixel values, each expressed as three colour components (Y, Cr, Cb), a total of 12288 values. The output of the transform is 4096 coefficient values. Coefficient values can be quantized (= reduced in size) with little effect on human perception of the image, which is why the transform stage is performed. The relevant statistics can be collected in parallel with the transform stage of compression, as each transform result is produced.
Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is a schematic diagram showing the processing of a video frame,
Figure 2 is a schematic diagram of components of a computer,
Figures 3 and 4 are schematic diagrams of components of an encoder,
Figure 5 is a schematic diagram illustrating the transform of a tile of a video frame, and
Figure 6 is a flowchart of a method of compressing a video frame.
Figure 1 shows a frame 10 comprised of tiles 12. The frame is compressed through the process steps S1 colour transform, S2 tile transform, S3 quantization and S4 entropy coding. This type of compression is used, for example, when a server is running virtual machines or client sessions for remote client devices. For example, a single server may have twenty remote clients connected, and the server must provide twenty images, each image representing the image of the client session that must be displayed at the respective client device. Owing to the limits of current connection technologies this type of server-client system will only work if the outgoing video data is highly compressed from the original size.
To compress a frame of video data that is expressed in conventional RGB format it is desirable to perform a colour transform to the Y, Cb, Cr domain. There is then performed a tile transform using, for example, a Haar transform or a 5/3 Discrete Wavelet Transform (DWT). The Haar transform converts an n x n tile into low frequency data, an (n/2) x (n/2) tile in which each pixel is the average of four input pixels, and high frequency data, n² - (n/2)² additional values showing local pixel deltas from the low frequency values.
This can be repeated multiple times on a large square, re-visiting the low frequency data at each iteration.
The DWT performs exactly the same process, but its low frequency data is a 5-tap-sinc-like-FIR-filtered sub-sampled (n/2) x (n/2) tile, with taps (-1 2 6 2 -1), and its high frequency data is n² - (n/2)² additional values showing local pixel deltas from the average of two or four adjacent pixels. Generally, for photographic data the DWT will typically produce far more tiny numbers than Haar. For synthetic images the Haar transform will usually do a better job, because a sharp edge between two fixed values will create several non-zero coefficients with the DWT.
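As a concrete illustration of the Haar case described above, one level of the transform on an n x n tile might look as follows; this is a simplified, non-normative sketch (integer rounding and the exact reversible formulation used by a real codec are glossed over):

    def haar_level(tile):
        """One level of a Haar-style 2D transform on an n x n tile (n even):
        an (n/2) x (n/2) low-frequency tile of 2x2 averages plus three
        (n/2) x (n/2) detail planes, i.e. n^2 - (n/2)^2 high-frequency
        values in total."""
        n = len(tile)
        half = n // 2
        low = [[0] * half for _ in range(half)]
        dh = [[0] * half for _ in range(half)]   # horizontal detail
        dv = [[0] * half for _ in range(half)]   # vertical detail
        dd = [[0] * half for _ in range(half)]   # diagonal detail
        for y in range(half):
            for x in range(half):
                a = tile[2 * y][2 * x]
                b = tile[2 * y][2 * x + 1]
                c = tile[2 * y + 1][2 * x]
                d = tile[2 * y + 1][2 * x + 1]
                low[y][x] = (a + b + c + d) // 4   # average of four input pixels
                dh[y][x] = a - b + c - d
                dv[y][x] = a + b - c - d
                dd[y][x] = a - b - c + d
        return low, dh, dv, dd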
The compression uses a tile size of 64x64 pixels. The transform turns a tile into a series of coefficients, which are then quantized and entropy coded. The compression uses an entropy coding scheme called Run Length Golomb Rice (RLGR).
This has no tables but uses two adaptive parameters to scale the coded values depending on the distribution of input values. In relation to the tile transform, the DWT does not break down into neat 2x2 groups like Haar but is performed in strips (horizontally and vertically) on a large tile. This is explained in more detail below with reference to Figure 5.
Figure 2 illustrates schematically some of the components of a server 14, which has a central processing unit (CPU) 16, a graphics processing unit (GPU) 18 and a PCI encoder 20. Obviously other components of the server 14 would also be present, such as memory and other interfaces, but these have been omitted for clarity. The GPU 18 controls the output of a local display device connected to the computer and can be used in the compression of images if required. The PCI encoder 20 is the principal component for compressing the multiple video streams received from the CPU 16.
The hardware encoder 20 takes un-coded video tiles as input and produces coded data messages as output. It is important for the encoder 20 to do all of the steps (colour transform, tile transform, quantization and entropy coding) because otherwise the IO bandwidth is increased. The encoder 20 does not need to hold an entire screen image at one time. Because of this the encoder 20 can be built on an FPGA which does not need any external storage (DDR). None of the processing steps need vast storage; e.g. a 64x64 tile requires 12KB in RGB form. Holding more tiles would increase system throughput, but the store need not be huge. In hardware an RLGR entropy encoder or decoder will be very small, because it has no tables.
Pixels to be encoded are delivered optimally over PCIe from the GPU 18 to the encoder 20, by the encoder 20 doing a PCI Read or the GPU 18 doing a PCI Write. Each PCIe lane delivers 4Gbps. If pixels are packed as 32 bits then this makes 125Mps (Mega pixels/second). For an FPGA-based system, 125MHz is achievable; 250MHz is possible. This means an n-lane PCIe interface will feed pixels at n pixels/clock (125MHz) or n/2 pixels/clock (250MHz) into the encoder. Desirable values of n are 4, 8 and 16. There is value in keeping up with the bulk arrival rate of pixels in the encoder 20. Otherwise a storage buffer has to be added to the start of the encoder 20, which adds little value and increases encode latency.
A full high-definition video at 30Hz is 60Mpixels/second, so an n-lane PCIe interface can drive 2 x n screens at this update rate. In practical commercial systems it would be reasonable to support more screens than this, but this is a higher level product specification choice. At this level it is more relevant to think of the number of worst-case sessions that can be supported. Also, note there may be other bottlenecks in the system, such as the CPU, the GPU or any LAN interface that is present.
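The lane arithmetic quoted above can be checked directly; the inputs below are only the figures stated in the text (4Gbps per lane, 32-bit packed pixels, 60Mpixels/second for full high-definition at 30Hz):

    # Back-of-envelope check of the PCIe feed-rate figures quoted above.
    LANE_GBPS = 4        # stated PCIe lane rate
    PIXEL_BITS = 32      # pixels packed as 32 bits
    HD30_MPIXELS_S = 60  # full high-definition at 30Hz

    def mpixels_per_second(lanes):
        return lanes * LANE_GBPS * 1000 / PIXEL_BITS   # 125 Mpixels/s per lane

    def screens_supported(lanes):
        return mpixels_per_second(lanes) / HD30_MPIXELS_S   # roughly 2 x n screens

    print(mpixels_per_second(1))   # 125.0, matching the stated 125Mps
    print(screens_supported(4))    # ~8.3, i.e. about 2 x n as stated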
The encode stages are as follows. Firstly there is a colour transform, which can be done in parallel on as many pixels as required. Inputs are 8 bits; the output width is to be determined but is perhaps 10-11 bits. It is possible that Y will require more bits than Cr, Cb. If necessary it is possible to add clip logic to control this; the whole protocol is slightly lossy so this is acceptable if done carefully. From here on the three colour channels are split apart and dealt with in parallel.
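The specification leaves the exact transform matrix and output width open, so the following fixed-point RGB to Y, Cb, Cr conversion, using common full-range BT.601-style integer weights, is purely illustrative:

    def rgb_to_ycbcr(r, g, b):
        """Illustrative integer colour transform for 8-bit inputs; the weights
        are common BT.601-style approximations, not values from this patent.
        Y stays in roughly 0..255; Cb and Cr are signed, so downstream stages
        need the extra headroom ('perhaps 10-11 bits') mentioned above."""
        y = (77 * r + 150 * g + 29 * b) >> 8      # 77 + 150 + 29 = 256
        cb = (-43 * r - 85 * g + 128 * b) >> 8    # weights sum to zero
        cr = (128 * r - 107 * g - 21 * b) >> 8    # weights sum to zero
        return y, cb, cr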
The next stage is the tile transform. The entire transform consists of what amounts to straightforward 3-tap and 5-tap FIR filters, so it is possible to achieve whatever level of parallel operation is required. The coefficients are fixed and are simple integer values, so no multiplies are required.
The encoder 20 includes a stage 1 vertical filter, which must keep up with the arriving pixel rate; a 5x64 pixel delay line is required. Assuming 30 bits/pixel this makes 9600 bits regardless of calculations/cycle. This is done as flip-flops because of the bandwidth required. Similarly, a stage 1 horizontal filter is needed, which must keep up with the arriving pixel rate, over five adjacent pixels. At this point ¾ of the pixels go straight to the entropy coder; ¼ (32x32) require stage 2 wavelet transformation. These naturally arrive at half the overall pixel arrival rate, so it is possible to halve the width of any processing.
The encoder 20 also includes a stage 2 vertical filter (5x32 pixel delay line required, using 4800 flip-flops) and a stage 2 horizontal filter, over five adjacent pixels. At this point another ¾ of the pixels go to the entropy coder; ¼ (16x16) require stage 3 wavelet transformation. These arrive at a quarter of the overall pixel arrival rate. The stage 3 vertical filter, with a 5x16 pixel delay line required, has another 2400 flip-flops. There is also a stage 3 horizontal filter, over five adjacent pixels. Of these, another ¼ (8x8) go through a final delta-coding stage. These are effectively the DC values of 8x8 sub-portions of the whole tile.
Quantization is done as a simple shift, followed by rounding. This should be done as soon as possible after the final filtering stage as it allows data path widths to be reduced. The values must be fed to the entropy coder in the correct order. About ¾ of the tile can feed directly from the tile filter into the entropy coder, but ¼ of the tile (the 1024 pixels that require stage 2/stage 3 processing) has to be stored while the rest are entropy coded. This store is SRAM, though it is not large (4KB). Its bandwidth must be at least half the pixel arrival rate.
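The shift-and-round quantization described above is simple enough to show directly; the symmetric handling of negative coefficients here is one reasonable choice, not necessarily the one used in the hardware:

    def quantize(coeff, shift):
        """Quantize a signed coefficient by an arithmetic shift with
        round-to-nearest; 'shift' is the sub-band's quantization level."""
        if shift == 0:
            return coeff
        rounding = 1 << (shift - 1)
        if coeff >= 0:
            return (coeff + rounding) >> shift
        return -((-coeff + rounding) >> shift)

    def dequantize(q, shift):
        """Decoder-side inverse: the bits shifted out are lost for good,
        which is the lossy step of the scheme."""
        return q << shift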
Entropy coding is performed separately on each of Y, Cr, Cb. The entropy coder is adaptive. The number of bits used to encode a value depends on two parameters kP and kRP, and these adapt as each value is encoded, depending on the size of each passing value and the current values of the parameters. This means that it is not possible to do entropy coding in parallel on values in a stream, although fortunately each channel of the tile is a separate stream.
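The full RLGR coder is not reproduced here; the sketch below only illustrates the general idea of a Rice code whose parameter adapts to the passing values, with a scaled working parameter (analogous in spirit to kP/kRP) so that adaptation moves in fractional steps. All constants are hypothetical and this is not the RLGR algorithm itself:

    def rice_encode(value, k):
        """Rice-code one non-negative value: unary quotient, then k remainder bits."""
        q = value >> k
        bits = "1" * q + "0"
        if k:
            bits += format(value & ((1 << k) - 1), "0{}b".format(k))
        return bits

    def encode_stream(values):
        """Toy adaptive loop: kp is k scaled by 8 (a hypothetical factor), so
        k rises quickly on large values and decays slowly on small ones."""
        kp = 8  # start with k = 1
        out = []
        for v in values:
            k = kp >> 3
            out.append(rice_encode(v, k))
            if v >> k:                  # quotient non-zero: value was large
                kp = min(kp + (v >> k), 8 * 15)
            elif kp > 0:                # value fitted easily: shrink k a little
                kp -= 1
        return "".join(out)

    # Example: small values keep k low; a burst of large values raises it.
    print(encode_stream([0, 1, 0, 2, 40, 37, 3]))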
The speed of the encoder 20 is very important. It is simple to add, subtract and compare values, but if only one per cycle can be handled then the encoder 20 cannot output more than one pixel per cycle for a tile. Ideally the encoder 20 will perform 2 or 4 per cycle. On an FPGA, 4 per cycle at 125MHz is easier than 2 per cycle at 250MHz.
Ideally the entropy coding can be performed at the input pixel rate. For at least part of the tile processing, the values can arrive this fast. If they are processed slower than this then there is a need for more store internally; the encoder 20 can make the CPU 16 wait longer for each tile; and there may need to be more replications of the entropy coding circuit in order to keep the whole system busy, i.e. to allow the host to feed data into the encoder 20 at the full interface rate. The result of entropy coding should be built up in three separate memories, as their final size is unknown. There will need to be multiple copies of these in order to keep the whole system busy.
Once the process is complete a separate command from the CPU 16 will cause the encoder 20 to write the output packet to a chosen address over PCIe. In typical use, the output will be much smaller than the input; compression by x5...x10 ought to be achievable. The CPU 16 gives one command to encode a tile. When the encoding is complete it is told the size of the result. It then gives a separate command to write the output over PCI, as where to write it may be affected by its size; for example, packing into transport frames may be needed. There will need to be several copies of the output buffer in order to allow the encoder to stay busy while this happens.
An alternative approach is to process multiple tiles in parallel. This removes some tricky cases for the register transfer language (RTL). However, as a general rule this will increase the size of the encoder 20 because there is more storage required: an initial memory pool to absorb PCIe input; additional copies of the filter delay lines; more output buffers. The CPU 16 gets a very low latency service. Encoding is complete shortly after the tile transfer has completed (little more than ¼ of a tile transfer time). The CPU 16 only needs 2-3 tiles in flight in order to keep the encoder fully busy.
If the output of the compression of the tile is still very big, because this tile has a lot of detail, then the CPU 16 has the option to re-compress with stronger quantization. An improvement to the encoder 20 is to do the transform, sum the unquantized coefficients, and choose the quantization level in an intelligent manner. The cost of this is that the entire tile (about 16KB) must be stored after the tile transform, and the latency seen by the CPU 16 is increased.
The encoder 20 can use an input interleave. On input it is possible to load several (say 4) horizontally adjacent tiles at a time. The advantage of this is that PCI transfers are 1024 bytes (256 pixels) rather than 256 bytes (for one tile), so that use of the PCI bus is more efficient. A store performs "de-interleave", which is analogous to the interleaving of RS codewords in a modem. To de-interleave 4 tiles only needs a store which can exactly contain 4 tiles (e.g. 4 x 12KB) with a total bandwidth of twice the PCIe arrival rate, as sketched below. The store can be organised with its own internal interleaving to get the required bandwidth with single-port RAMs. This is easiest if input and output are synchronous. This store can be organised so that any number up to the maximum can be loaded.
This is carried out before colour transform because colour transform causes some bit growth.
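A toy sketch of this de-interleave, under the stated assumptions of four horizontally adjacent 64x64 tiles arriving as one 256-pixel-wide raster strip (all names hypothetical):

    # Route each pixel of the interleaved strip into the tile it belongs to.
    TILE_W, TILE_H, TILES = 64, 64, 4
    store = [[[0] * TILE_W for _ in range(TILE_H)] for _ in range(TILES)]

    def accept_strip_pixel(x, y, value):
        """x in 0..255, y in 0..63: tile index is x // 64, local x is x % 64."""
        store[x // TILE_W][y][x % TILE_W] = value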
Figures 3 and 4 show more detail of the encoder 20. A PCI interface 22 connects to the CPU 16 over a PCI bus. The interface logic 22 connects to a set of control registers 24. Downstream of the interface 22 is a colour transform unit 26 and the tile transform logic 28. The tile transform unit 28 outputs to a tile store 30. The tile store 30 connects to a quantization stage 32, and downstream of the quantization stage 32 is an entropy encoder 34, which connects to an output store 36.
In relation to the Discrete Wavelet Transform, the diagram is scaled by the factor n, the number of PCIe lanes. Desirable values for n are 4, 8 and 16. This part of the device processes the input data at "line rate", i.e. however fast it arrives over PCIe. A tile is completed and stored in the tile store 30 in little more than the PCIe transfer time. In relation to the entropy coder 34, shown in Figure 4, the second part of the encoding process goes more slowly, because in large configurations the entropy coding cannot keep up with the rate of data arrival. This is scaled by the factor n2, the number of steps/cycle of RLGR entropy coding that can be achieved. Likely values for n2 are 1, 2, 4 or 8, depending on clock speed and technology.
In relation to the tile store 30, the number of "tiles in flight" needed to keep the engine fully busy is somewhere between n/n2 and 2xn/n2, depending on how often the CPU 16 checks whether a tile has been completed.
It would be possible to do quantization before the tile store 30. The value of doing it afterwards is that it allows the hardware to determine a quantization level, based on some statistical measure of the values that enter the tile store. If there are many large values (meaning that the current tile is hard to compress) then it may be desirable to increase the quantization level, so that the result is usefully compact. Quantization of the DC values (top left 8x8 pixels, LL3 channel) is done before delta coding.
The entropy coder output must be stored again in an output store 36, as shown in Figure 4, because at this point its size is not known, so for all but the first channel the encoder 20 does not know where to put it. The 12KB value suggested above means "tile not compressed at all" and is assumed to be a worst case. Some upper bound must be chosen, above which the compression has failed (i.e. the tile must be recompressed with greater quantization). The tile store 30 and the output store 36 could be the same memory system or could be separate, depending on bandwidth requirements. The best structure (and interleave organisation etc.) will vary depending on n and n2.
The complete tile is copied over PCIe to the intended destination address in host memory. To make best use of the interface 22 this should drive all n PCIe lanes at full rate. When this happens there is no input traffic, so this is an advantage for having a combined tile store 30 and output store 36. Tile input (over PCIe) and tile output (over PCIe) are not active simultaneously. This also makes it desirable to have the tile store 30 and the output store 36 as the same memory.
The three two-dimensional DWT filters have identical logic within certain parameters. The bits/value might not be the same for each channel and/or level (e.g. Y might need more than Cr, Cb). The values/cycle are scaled to fit the required throughput within a tile, and the vertical spacing is 64/32/16 depending on level. No multipliers are needed because the coefficients are all simple integers. Figure 5 illustrates the mechanics of the tile transform process carried out by the tile transform logic 28. The individual colour components of the tile 12 are each processed three times with a DWT. Each pass of the DWT is carried out firstly vertically and then horizontally through the tile.
Logically the CPU 16 must perform two operations for each tile. Firstly, the CPU 16 must start a compress operation. The CPU 16 must tell the GPU 18 to write to the encoder 20 (or tell the encoder 20 to read from the GPU 18).
There are a few parameters such as quantization level and RLGR1/3 selection. An output tile ID (index in the tile store and output store) should also be selected.
The second operation is that once compression is complete, data in the output store must be identifiable by the tile ID quoted in the first operation. The CPU 16 has to check pass/fail, read the output size, decide where the data is required and tell the encoder 20 to write to the required destination. The intended output address could be specified at the start, and this reduces the CPU involvement in each tile, but this only works if the address of each tile does not rely on the compressed size of the previous tile.
A preferred embodiment is for the CPU 16 to provide a scatter list of output (location, size) entries. Each entry is tagged with whether a tile can be split over the end of the entry. This allows a set of network buffers to be described, where each buffer might be split up due to logical/physical translation. The encoder 20 has to indicate which entries have been used.
Within a buffer there has to be allowed enough space for a TS_RFX_TILE block header. This is 19 bytes including (x, y) coordinates, length fields, quantization table fields, then the bit-packed data.
When under high system load, higher compression factors have to be used. Assuming a working average compression factor of six, this leads to a 4 bits/pixel output. One full high-definition screen times 30Hz = 60M pixels/second = 240Mbps. Gigabit Ethernet ports will quickly get saturated by not many screens. Therefore, when all users are watching movies the encoder 20 has to compress by far more than a factor of six. Two bits/pixel at 15 fps is probably near the low end before users start to notice a degradation in quality. Reducing the refresh rate saves work all round, and reducing the quality increases the asymmetry between encoder input and encoder output. Therefore content-based quantization selection within the encoder 20 is a very useful solution to provide compression flexibility. The encoder 20 will compress by the "right" amount far more often than when the CPU 16 has to guess the quantization level, and re-encoding is an expensive operation.
High performance PCIe throughput is essential for a high performance encoder product. Each PCIe lane can support uncompressed data for two full high-definition 30Hz updating screens, as a theoretical maximum. So, a 16xPCIe GPU and a 16xPCIe encoder, with everything else perfect, cannot achieve more than 32 such screens. The current structure of GPU interfacing in Windows 7 does not allow movement of Windows 7 screen data direct between the GPU 18 and the encoder 20. At the very least the data would move twice over the PCIe bus: once moved by a DirectX10 primitive which copies a texture from the GPU 18 to system memory, and then again as the encoder 20 issues PCIe reads to that texture.
If the GPU 18 has 16xPCIe and the CPU 16 does a good job of this copy then this reduces the theoretical maximum bandwidth; in particular, a high performance encoder with 16xPCIe lanes would likely be halved in its throughput. Therefore the known path is for the encoder 20 to perform PCIe READs from texture memory in store. It is desirable that the encoder 20 would accept PCIe WRITE operations containing the pixel data.
The encoder 20 provides content-sensitive quantization. The value of doing quantization after the tile store is that it allows the hardware to determine a quantization level, based on a statistical measure of the coefficient values that enter the tile store. If there are many large values (which implies that this tile is hard to compress) then it is desirable to increase the quantization level, so that the result is usefully compact.
For the CPU 16, the high level decision is to compress to available bandwidth. Based on a gross measure of current activity, each tile is given a compressed size target at the start of the compress operation. The coefficients exist in ten sub-bands, ranging from low frequency (most important) to high frequency (can be quantized most) data. The encode process can quantize each sub-band separately, so that each tile has ten quantization values sent with it.
As the transform coefficients are stored the encoder 20 collects statistics about them. The most desirable statistic to collect is the number of significant bits for a range of quality settings, all in parallel. When the tile is ready for entropy coding the encoder 20 picks the quality setting where the number of significant bits does not exceed the desired encoded tile size. The entropy coder will not result in precisely this many bits but on average it will be close enough to be useful.
A simple area of the screen will not require much space even when coded at maximum possible quality. So, there is little waste and complexity in one area of the screen will not compromise quality in another. There are other strategies which the CPU 16 can use to balance quality in different areas, or to mend tiles which were sent at low quality and where there is now spare bandwidth available.
There is a further adaptation that can be used when the computer 14 is under heavy load. The GPU 18 subsamples the entire screen in store. The subsampled screen is then passed to the encoder 20 in 32x32 tiles, reducing the initial transfer time by ¾. The encoder 20 operates to encode these as if the high-frequency coefficients are all zeros. Alternatively, the transfer happens as 64x64 but the encoder 20 then appears to produce 4 encoded tiles (in a 2x2 square).
This reduces PCIe load a great deal for cases where the encoder 20 is going to quantize the high frequency coefficients out of existence anyway.
The fundamental part of the compression algorithm is shown in Figure 6. The method of compressing the frame of pixel tiles comprises, firstly, step 6.1, which comprises receiving a frame of pixel tiles, and secondly step 6.2, which comprises determining a bandwidth available for each pixel tile; then, for each colour channel of each pixel tile, steps 6.3, 6.4 and 6.5 are repeated. This method is executed by the encoder 20, receiving the frame of pixels from either the CPU 16 or the GPU 18, with the information about the bandwidth being supplied by the CPU 16. Preferably, the available bandwidth for each pixel tile is expressed as a bit rate per pixel.
Step 6.3 comprises performing a transform of the pixel data to create a series of coefficients, step 6.4 comprises selecting a quantization level according to a function of the determined bandwidth and the size of the coefficients, and step 6.5 comprises performing a quantization of the series of coefficients using the selected quantization level. In this way the tiles of the frame are compressed, using a quantization level that is selected intelligently in real-time as the compression is carried out. Essentially a desired size for the post-quantization data is known and the quantization level is selected to achieve that desired size, taking into account the coefficients that have resulted from the transform step.
Step 6.4 uses estimates of the final entropy-coded size mapped to a range of different quantization settings. The estimates need not be exact in order to be useful, as meeting an approximate target for the encoded size of the tile is sufficient to meet design objectives concerning bandwidth management. The tile is quantized using a range of quantization values, with different values for each channel and each sub-band. A number of estimation methods are possible, ranging in complexity.
In one simple embodiment, the quantization settings for the tile are approximated using a single "quality" metric which is expressed as an expected bits/pixel value. These are determined by exhaustive search over a chosen set of reference images. A small finite set of quality metric settings is chosen, giving "minimum compression", "maximum compression", and a range of values between; eight values would be sufficient. Each quality setting provides a quantization level for each sub-band. For each quality metric setting a statistic is gathered over the coefficients. In a hardware implementation these can be gathered in parallel. The statistic is the sum, over all the coefficients, of (number of significant bits in the coefficient minus the quantization shift value for this quality metric, for this sub-band), or 1 if this difference is <= 0. Having gathered this metric, the quality level is selected where the statistic most closely meets the desired target tile size in bits.
The statistic can be improved by giving special consideration to zero coefficients, where the entropy coder in use can code runs of zeros in less than one bit. When a run of zeros appears within a sub-band, after a fixed number of zeros their size is taken to be 0 rather than 1. Making this change after six consecutive zeros gives reasonable results, but the optimum will depend on the precise entropy coding system in use.
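Putting the last two paragraphs together, the statistic and the selection step might be sketched as follows; the per-quality shift tables are assumed inputs (their values are not given in the text), and the run threshold of six follows the text:

    def sig_bits(c):
        """Number of significant bits in a coefficient's magnitude."""
        return abs(c).bit_length()

    def estimate_coded_bits(subbands, shifts, zero_run_threshold=6):
        """Statistic described above: for each coefficient, count its significant
        bits minus the sub-band's quantization shift, or 1 if that is <= 0;
        except that zeros beyond a run of zero_run_threshold count as 0 bits.
        'subbands' maps sub-band id -> coefficient list; 'shifts' maps id -> shift."""
        total = 0
        for sb, coeffs in subbands.items():
            shift = shifts[sb]
            zero_run = 0
            for c in coeffs:
                bits = sig_bits(c) - shift
                if bits <= 0:
                    zero_run += 1
                    total += 0 if zero_run > zero_run_threshold else 1
                else:
                    zero_run = 0
                    total += bits
        return total

    def pick_quality(subbands, quality_tables, target_bits):
        """'quality_tables' is assumed to be ordered from least to most quantized;
        the first setting whose estimate fits the tile's bit budget is chosen,
        falling back to the strongest quantization if none fits."""
        for shifts in quality_tables:
            if estimate_coded_bits(subbands, shifts) <= target_bits:
                return shifts
        return quality_tables[-1]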

Claims (12)

  1. A method of compressing a frame of pixel tiles comprising: o receiving a frame of pixel tiles, o determining a bandwidth available for each pixel tile, and o for each colour channel of each pixel tile: § performing a transform of the pixel data to create a series of coefficients, § selecting a quantization level according to a function of the determined bandwidth and the size of the coefficients, and § performing a quantization of the series of coefficients using the selected quantization level.
  2. A method according to claim 1, wherein the available bandwidth for each pixel tile is expressed as a bit rate per pixel.
  3. A method according to claim 1 or 2, and further comprising varying the bandwidth available for different tiles of the same pixel frame.
  4. A method according to claim 1, 2 or 3, wherein the transform comprises a Haar transform or a Discrete Wavelet Transform.
  5. A method according to any preceding claim, and further comprising performing entropy coding of the quantized series of coefficients.
  6. A method according to any preceding claim, wherein the step of selecting a quantization level according to a function of the determined bandwidth and the size of coefficients comprises mapping the coefficient size to one of a plurality of predetermined levels and using a quantization level pre-assigned to the mapped level.
  7. A device for compressing a frame of pixel tiles comprising an encoder arranged to: o receive a frame of pixel tiles, o determine a bandwidth available for each pixel tile, and o for each colour channel of each pixel tile: § perform a transform of the pixel data to create a series of coefficients, § select a quantization level according to a function of the determined bandwidth and the size of the coefficients, and § perform a quantization of the series of coefficients using the selected quantization level.
  8. A device according to claim 7, wherein the available bandwidth for each pixel tile is expressed as a bit rate per pixel.
  9. A device according to claim 7 or 8, wherein the encoder is further arranged to vary the bandwidth available for different tiles of the same pixel frame.
  10. A device according to claim 7, 8 or 9, wherein the transform comprises a Haar transform or a Discrete Wavelet Transform.
  11. A device according to any one of claims 7 to 10, wherein the encoder is further arranged to perform entropy coding of the quantized series of coefficients.
  12. A device according to any one of claims 7 to 11, wherein the encoder is arranged, when selecting a quantization level according to a function of the determined bandwidth and the size of coefficients, to map the coefficient size to one of a plurality of predetermined stages and to use a quantization level pre-assigned to the mapped stage.
GB1019602.0A 2010-11-19 2010-11-19 Video compression Active GB2485576B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB1019602.0A GB2485576B (en) 2010-11-19 2010-11-19 Video compression
PCT/GB2011/001619 WO2012066292A1 (en) 2010-11-19 2011-11-18 Video compression
EP11801804.3A EP2641399A1 (en) 2010-11-19 2011-11-18 Video compression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1019602.0A GB2485576B (en) 2010-11-19 2010-11-19 Video compression

Publications (3)

Publication Number Publication Date
GB201019602D0 GB201019602D0 (en) 2010-12-29
GB2485576A true GB2485576A (en) 2012-05-23
GB2485576B GB2485576B (en) 2013-06-26

Family

ID=43431693

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1019602.0A Active GB2485576B (en) 2010-11-19 2010-11-19 Video compression

Country Status (3)

Country Link
EP (1) EP2641399A1 (en)
GB (1) GB2485576B (en)
WO (1) WO2012066292A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019077303A1 (en) * 2017-10-16 2019-04-25 Displaylink (Uk) Limited Encoding and transmission of display data
EP3734589A1 (en) * 2012-12-21 2020-11-04 Displaylink (UK) Limited Management of memory for storing display data

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3472806A4 (en) 2016-06-17 2020-02-26 Immersive Robotics Pty Ltd Image compression method and apparatus
AU2018217434C1 (en) 2017-02-08 2023-04-27 Immersive Robotics Pty Ltd Displaying content to users in a multiplayer venue
CN107205150B (en) * 2017-07-14 2019-05-24 西安万像电子科技有限公司 Coding method and device
AU2018372561B2 (en) 2017-11-21 2023-01-05 Immersive Robotics Pty Ltd Image compression for digital reality
AU2018373495B2 (en) 2017-11-21 2023-01-05 Immersive Robotics Pty Ltd Frequency component selection for image compression

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5629780A (en) * 1994-12-19 1997-05-13 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Image data compression having minimum perceptual error

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1125566C (en) * 1996-09-06 2003-10-22 索尼公司 Method and device for encoding data
US5991454A (en) * 1997-10-06 1999-11-23 Lockheed Martin Coporation Data compression for TDOA/DD location system
US6301392B1 (en) * 1998-09-03 2001-10-09 Intel Corporation Efficient methodology to select the quantization threshold parameters in a DWT-based image compression scheme in order to score a predefined minimum number of images into a fixed size secondary storage
US6549674B1 (en) * 2000-10-12 2003-04-15 Picsurf, Inc. Image compression based on tiled wavelet-like transform using edge and non-edge filters
US7916961B2 (en) * 2005-09-06 2011-03-29 Megachips Corporation Compression encoder, compression encoding method and program


Also Published As

Publication number Publication date
WO2012066292A1 (en) 2012-05-24
GB2485576B (en) 2013-06-26
EP2641399A1 (en) 2013-09-25
GB201019602D0 (en) 2010-12-29


Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)

Free format text: REGISTERED BETWEEN 20121004 AND 20121010