WO2000058911A9

WO2000058911A9 - Advanced memory reduction system for image processing systems

Info

Publication number: WO2000058911A9
Application number: PCT/US2000/008684
Authority: WO
Inventors: Bruce K Holmer; Stanley Craig Nelson
Original assignee: Teralogic Inc; Bruce K Holmer; Stanley Craig Nelson
Priority date: 1999-03-31
Filing date: 2000-03-30
Publication date: 2002-04-04
Also published as: WO2000058911A3; WO2000058911A2; AU4061500A

Abstract

An image processing system receives discrete cosine transform coefficients derived from decompressed MPEG-2 frames including I, P and B frames and applies a scalable tree encoding to the transformed coefficients. In this state, the I, P and B frames can be stored quickly and efficiently. When the frames are needed for further processing, the system decodes the coefficients and applies an inverse DCT to obtain the I, P and B frames. Then, the I, P and B frames may be passed to the MPEG-2 motion compensation algorithm for recovery of the original image frames and subsequent display.

Description

ADVANCED MEMORY REDUCTION SYSTEM FOR IMAGE PROCESSING SYSTEMS

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention is directed to a system and method for reducing memory consumption in image processing systems. More specifically, it relates to a system for reducing the amount of memory required to store decoded graphic image frames compressed using the MPEG-2 image compression standard and similar techniques.

2. Description of Related Art

MPEG-2, formally described in International Standards Organization document ISO/IEC 13818 (incorporated herein by reference), is an international standard for compressing moving images such as video and their associated audio information. Video or other moving picture information to be compressed is, if necessary, transformed into the YUV (luminance-chrominance) color space, and divided into macro-blocks each consisting of, e.g., a 2x2 array of 8x8 blocks of Y components and an 8x8 block each of Cb and Cr components (alternatively, other formats may be used). At the macro-block level, redundancy in sequential image frames is exploited by applying a motion compensation algorithm in which some image frames are directly encoded as intra-frames (I-frames) and motion vectors are used to represent information present in a previous frame (for predictive frames or "P-frames") or in either a previous or subsequent frame (for bidirectional frames or "B-frames"). The frames are encoded at the block level using a discrete cosine transform (DCT). The transformed blocks are then quantized and Huffman encoded to compress the data.

The hierarchy of I, P and B frames is particularly important to the MPEG-2 scheme. Since the I frames are calculated independently of other frames, they can be used as arbitrary selection points for fine-grain accessing of the image sequence after it begins. In contrast, since the P and B frames require decoding of I frames and possibly P-frames before they can be accessed, they aren't as readily employed for random accessing of the image sequence.

The decompression process is essentially the reverse of the compression process. Compressed MPEG-2 data is Huffman decoded, transformed by an inverse DCT, and the original image frames are recovered by a complementary motion compensation algorithm. If necessary, the image is transformed from the YUV color space to an appropriate space such as RGB space.

Although the MPEG-2 standard provides excellent compression results for image data so that it can be transmitted or stored efficiently, its encoding and decoding processes can be quite memory-intensive. For example, to decode an MPEG-2 B frame, a memory must be capable of containing two frames, e.g., an I and P frame, as well as the B frame. Depending on the format of the source image data, the memory may need to be as large as 12 MB. Thus, while the MPEG-2 standard may alleviate data transmission and storage demands, it at least partially does so by increasing the burden on the MPEG decoder.

One attempt to solve this problem is described in European Patent Application Publication Number 0,778,709 A1, which uses adaptive differential pulse code modulation (ADPCM) to recompress the I and P frames after they are obtained from MPEG-2 decompression and stores the ADPCM-encoded data in memory. Then, the ADPCM data is read from the memory and the I and P blocks are recovered. The B blocks are then generated using a macroblock to raster scan conversion, and the I, P and B frames are passed to the MPEG-2 motion compensation algorithm.

Although this technique does realize some reduction in memory requirements, it can achieve only a 2:1 reduction in data size. Further, it complicates the recompression process because luminance and chrominance data are compressed and decompressed by different amounts, thereby requiring different actions depending on whether the data is luma data or chroma data. Further, this prior art technique places additional constraints on the system by requiring a separate memory for storing data required by the decompression process.

Also, United States Patent No. 5,748,116, entitled "System and Method for Nested Split Coding of Sparse Data Sets" (Chui & Yi) which is hereby incorporated by reference, describes data compression techniques which are suitable for processing image data; however, techniques described therein perform best with sparse data sets and don't handle relatively dense data sets arising from the small blocks typically employed in the MPEG-2 standard.

BRIEF SUMMARY OF THE INVENTION

A data processing method is described. In one embodiment, the method comprises partitioning of coefficients into groups so that coefficients expected to have the same magnitude on average are grouped together and encoding grouped coefficients into a compressed bit stream.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and further aspects of the present invention will become readily apparent after reading the following detailed description in conjunction with the appended drawings in which:

FIGURE 1 is a block diagram of one embodiment of an image processor;

FIGURE 2 shows an exemplary ordering of DCT coefficients prior to scalable encoding;

FIGURES 3A and 3B are a flowchart showing one embodiment of a DCT tree encoding process;

FIGURE 4 is a block diagram of one embodiment of a tree encoder;

FIGURE 5 is a block diagram of one embodiment of an AC encoder in the tree encoder of FIG. 4;

FIGURE 6 is a flowchart showing one embodiment of a DCT tree decoding process;

FIGURE 7 is a block diagram of one embodiment of a tree decoder implementing a DCT coefficient procedure;

FIGURE 8 is a block diagram of one embodiment of a bit classifier in the tree decoder of FIG. 7; and

FIGURE 9 shows an alternative ordering of DCT coefficients prior to scalable encoding.

DETAILED DESCRIPTION

A method and apparatus for performing encoding and/or decoding is described. In the following description, numerous details are set forth. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.

A system and method are described which can perform encoding and decoding of data, particularly MPEG-2 image data, while, in one embodiment, requiring a relatively small amount of memory space. One or more embodiments of the system and method may perform encoding and decoding of image data with little or no loss in final image quality. One or more embodiments of the system and method may perform encoding and decoding of image data while preserving the fine-grain positional accessibility of the final image sequence as well as fine-grain temporal accessibility even in encoded form and/or at various fixed compression rates. The system and method may perform encoding and decoding of image data using a uniform algorithm for data in each component of the image data color space. The decoding of image data may be performed in such a way so as to not be processor-intensive nor significantly increase the complexity or cost of the system.

In one embodiment, an image processing system receives discrete cosine transform coefficients derived from decompressed MPEG-2 I, P and B frames and applies a scalable tree encoding to the transformed coefficients. In this state, the I, P and B frames can be stored in a compact form and sections thereof can be quickly and efficiently retrieved. When the frames are needed for further processing, the system decodes the coefficients and applies an inverse DCT to obtain the I, P and B frames. Then, the I, P and B frames may be passed to the MPEG-2 motion compensation algorithm to generate I and B frames. I, P and B frames may be decompressed and passed on for recovery of the original image frames and subsequent display.

The basic structure of one embodiment of an MPEG-2 decoder is shown in FIG. 1, in which a memory 10 storing graphics data is writeable via a write bus 12 and independently and simultaneously readable via two read buses 14 (of course, the use of this bus structure is simply an implementation choice, and other architectures such as a single shared bus may be used). The write bus 12 writes data from an Advanced Memory Reduction (AMR) encoder 16 into the memory 10. One of the read buses 14 provides data from the memory 10 to an AMR decoder 18, and the other read bus 14 provides data from the memory 10 to another AMR decoder 20. Two AMR decoders are used so that while one decoder is used to drive a display processor 22, the other can be used by MPEG core engine 24 to decode MPEG I and P frames stored in memory 10 for use in generating MPEG P and B frames as will be familiar to those skilled in the art.

AMR encoder 16 intercepts frames sent from the MPEG core engine 24 to the system write bus 12 to compress the data before putting it on the write bus 12 for storage by memory controller 10, and the AMR decoder 18 intercepts compressed data on the read bus 14 which is destined for the MPEG core engine 24 and decodes it to generate blocks suitable for the MPEG core engine 24 before passing them thereto. Similarly, AMR decoder 20 receives data from the memory controller 10 destined for display processor 22 and decodes the data to generate lines before providing them to the display processor 22.

The MPEG core engine generates I, P and B frames in a manner well-known in the art. In one embodiment, these frames are divided into 8x2 blocks of bytes, transformed by a DCT into 8x2 blocks, and the 8x2 blocks of DCT data are output to the AMR encoder 16. Note that the frames may be divided into other block sizes in alternative embodiments. As is known in the art, the DCT coefficients can be used to model the original image data set, where each coefficient is associated with a corresponding frequency term of the model. An example of an 8x2 block of luminance data is shown in TABLE I below.

TABLE I

The result of a DCT transform of the above luminance data is shown in TABLE II.

TABLE II

Now, the DCT coefficients are assembled into a histogram-like structure as shown in TABLE III, in which the DCT coefficients are linearly arrayed along the abscissa of the histogram and the binary representation of each coefficient extends along the ordinate of the histogram, with the least significant bit of each being lowermost and leading zeroes being omitted. As can be seen from a comparison of TABLES II and III, the DCT coefficients are not merely transferred to the histogram in left-to-right, top-to-bottom order; rather, a transformational mapping is applied as in FIG. 2. Thus, the first four coefficients in the upper row of the DCT coefficient array occupy the first four columns of the binary coefficient array, followed by the first four coefficients in the lower row of the DCT coefficient array. These are followed by the fifth coefficients in the upper and lower rows of the DCT coefficient array, the sixth coefficients in the upper and lower rows, etc. The reason for this mapping will be explained shortly.

TABLE III
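The mapping of FIG. 2 described above can be sketched as follows. This is a minimal illustration, not the code of APPENDIX I: the function name `reorder_8x2` and the `GROUPS` index ranges are hypothetical, and the assumption that Groups 1-4 occupy consecutive positions after the DC coefficient (three coefficients in Group 1, four in each of the others) is inferred from the description rather than taken from the tables.

```python
def reorder_8x2(upper, lower):
    """Map an 8x2 block of DCT coefficients to the 16-entry encoding order.

    `upper` and `lower` are the two rows of the DCT coefficient array,
    each holding eight values, left to right.
    """
    if len(upper) != 8 or len(lower) != 8:
        raise ValueError("each row must hold eight coefficients")
    # First four of the upper row, then first four of the lower row.
    ordered = list(upper[:4]) + list(lower[:4])
    # Then the fifth, sixth, seventh and eighth columns, upper before lower.
    for col in range(4, 8):
        ordered += [upper[col], lower[col]]
    return ordered


# Assumed (hypothetical) index ranges of Groups 1-4 in the ordered array;
# index 0 is the DC coefficient and is not part of any group.
GROUPS = [(1, 4), (4, 8), (8, 12), (12, 16)]
```

For example, numbering the upper row 0-7 and the lower row 8-15, the ordered array begins with 0, 1, 2, 3, 8, 9, 10, 11 and continues with 4, 12, 5, 13, and so on.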

Using this ordered set of uncompressed parameters, encoding begins on a bit plane basis. Given the arrangement of data, a simplistic compression scheme would be to simply output the binary digits from the binary coefficient array in a raster scan fashion, e.g., all digits in the 512-place position, all digits in the 256-place position, etc., until the maximum number of bytes for the desired compression ratio have been output. For example, using an 8x2 block and a 4:1 compression ratio permitting a four byte output, only the 512-place digits and the 256-place digits would be output. This technique, however, doesn't exploit the fact that the DC coefficient is usually the largest of the group and the other coefficients generally have several leading zeroes. One embodiment of a DCT tree encoding process that makes use of these facts is as follows.
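For comparison with the tree encoding, the simplistic raster scheme just mentioned can be sketched as below. The function name `raster_bitplanes` is hypothetical, and the sketch ignores sign bits, which the passage above does not address; it only shows why a four-byte budget covers exactly the two highest bit planes of sixteen coefficients.

```python
def raster_bitplanes(magnitudes, max_bits, n_planes=10):
    """Emit coefficient bits plane by plane, most significant plane first.

    `magnitudes` are non-negative integers (signs are ignored in this sketch);
    output stops as soon as `max_bits` bits have been produced.
    """
    bits = []
    for plane in range(n_planes - 1, -1, -1):   # 512-place, 256-place, ...
        for m in magnitudes:
            if len(bits) == max_bits:
                return bits
            bits.append((m >> plane) & 1)
    return bits
```

With sixteen coefficients and `max_bits=32` (four bytes), the loop returns after the 256-place plane, so every lower-order bit is lost regardless of how many leading zeroes the AC coefficients have.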

This embodiment is shown in FIG. 3A and FIG. 3B, and reference may also be made to the C source code of APPENDIX I. This embodiment may be understood to include two parts: a DC encoding process which codes all bits down to the first bit plane having non-zero AC coefficient bits; and an AC encoding process which codes all remaining bits beginning with the first non-zero AC plane. First, the non-DC DCT coefficients in the binary coefficient array are partitioned into four groups as shown in TABLE III above. Then, the scalable transform begins.

First, the most significant bit (here, the 512-place bit) of the DC coefficient is output (see Step S100 of FIG. 3A). Then, for each bit plane beginning with the 256-place plane (Step S102), the bit of the DC coefficient is output as is (Step S104) (the non-DC bits in the 512-place plane are not checked because they are necessarily "0" due to the nature of the DCT). Then, if all of the non-DC coefficient bits in a given plane are zero (Step S106), a single "0" is output to signify that all of the remaining bits in the plane are zero (Step S108). When a bit plane has a "1" bit in a non-DC coefficient, a "1" is output to end DC encoding and begin AC encoding (Step S110).

Then, for each of groups 1-4 (Steps S112 - S136), if no "1" has appeared in any coefficient in the group in a higher-order bit plane (Step S114) and there are no "1" bits in the group in the current bit plane (Step S116), a "0" flag is output to signify that all bits in the group are zero (Step S132). If, on the other hand, there are no "1" bits in higher-order bit planes for this group but there is a "1" bit in the current bit plane (Step S116), a "1" flag is output (Step S118).

If there are "1" bits in the group in the current bit plane and a "1" flag has been output (Step S118) or, also, if there were "1" bits in higher-order bit planes for this group (Step S114), then, beginning with the first coefficient in the group (Step S120), each bit is checked to see if it is "1" (Step S122). If so and it is the first "1" for that coefficient (Step S124), the "1" is output with its sign (Step S126) ("0" may be used to represent a positive sign and "1" may be used to represent a negative sign). If it is not the first "1" for that coefficient, the "1" is output without its sign (Step S128). If the bit is not "1", a "0" is output (Step S130). When all coefficients have been processed (Step S134), execution proceeds to the next group (Step S136). When all groups in the bit plane have been processed, execution proceeds to the next bit plane (Step S138) and its DC bit is output (Step S140), and the process repeats until all bit planes have been processed.
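The flowchart steps above can be sketched in software as follows. This is an illustrative reading of the description, not the C source of APPENDIX I: the name `encode_block`, the `GROUPS` index ranges (Group 1 holding three coefficients, the others four, in consecutive positions after the DC term) and the `n_planes` parameter are assumptions introduced here, and the DC coefficient is assumed to be non-negative.

```python
GROUPS = [(1, 4), (4, 8), (8, 12), (12, 16)]  # assumed index ranges of Groups 1-4


def encode_block(coeffs, n_planes=10):
    """Return the encoded bit list for 16 ordered coefficients (DC first)."""
    if len(coeffs) != 16 or not 0 <= coeffs[0] < (1 << n_planes):
        raise ValueError("expected 16 coefficients with a non-negative DC term")
    top = n_planes - 1
    if any(abs(c) >> top for c in coeffs[1:]):
        raise ValueError("AC magnitudes must be zero in the top bit plane")

    def bit(i, plane):
        return (abs(coeffs[i]) >> plane) & 1

    out = [bit(0, top)]                      # S100: most significant DC bit
    plane = top - 1
    # DC phase: code planes until one contains a non-zero AC bit.
    while plane >= 0:
        out.append(bit(0, plane))            # S104: DC bit as is
        if any(bit(i, plane) for i in range(1, 16)):
            out.append(1)                    # S110: switch to AC encoding
            break
        out.append(0)                        # S108: whole AC plane is zero
        plane -= 1
    # AC phase: code the remaining planes group by group.
    group_seen = [False] * 4
    coeff_seen = [False] * 16
    while plane >= 0:
        for g, (start, end) in enumerate(GROUPS):
            if not group_seen[g]:            # S114: no "1" in higher planes
                if not any(bit(i, plane) for i in range(start, end)):
                    out.append(0)            # S132: group is all zero
                    continue
                out.append(1)                # S118: first "1" in this group
                group_seen[g] = True
            for i in range(start, end):      # S120 - S134
                if bit(i, plane):
                    out.append(1)
                    if not coeff_seen[i]:    # S124/S126: first "1" carries sign
                        out.append(1 if coeffs[i] < 0 else 0)
                        coeff_seen[i] = True
                else:
                    out.append(0)            # S130
        plane -= 1                           # S138
        if plane >= 0:
            out.append(bit(0, plane))        # S140: DC bit of the next plane
    return out
```

As a small check using three bit planes instead of ten, a block whose DC coefficient is 4 and whose only non-zero AC coefficient is -1 in the second position of Group 1 encodes to 13 bits: `1 0 0 0 1` for the DC phase, `1 0 1 1 0` for Group 1 (flag, "0", "1" plus sign "1", "0"), and a "0" flag for each of Groups 2-4.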

Applying the technique to the above example, using TABLE III as input and beginning with the 512-place plane, its DC bit "1" is output (see TABLE IV below).

TABLE IV

Moving on to the 256-place plane, a "0" is output for the DC coefficient bit followed by a "0" indicating that all non-DC coefficients in the plane are "0". Moving on to the 128-place plane, a "0" is output for the DC coefficient followed by a "1" indicating the presence of at least one non-zero AC coefficient bit in the plane.

Then, each group is handled in turn. There are no non-zero coefficient bits in this plane in Group 1 (nor in higher-order bit planes), so a single "0" is output. Group 2 has a nonzero AC coefficient for the first time, so a "1" indicating the existence of the first non-zero AC coefficient bit for this group is output. Then, each bit in the group is output, followed by its sign if the bit is a "1" (sign bits are only output for the first non-zero bit in a coefficient, and since this bit plane is the first to have a non-zero coefficient for this group, any "1" occurring therein can be assumed to be the first). Groups 3 and 4 have no non-zero coefficients; thus, zeroes are output for them.

In the 64-place plane, a "1" is output for the DC coefficient and, taking Group 1, a "1" is output since that group includes a "1" for the first time, followed by the individual bits: "1" (followed by its sign, represented here by "0"), "0" and "0". The process is repeated for the remaining groups to produce "11000" for Group 2 and "0" for each of Groups 3 and 4, since they have no "1" bits.

Moving to the 32-place plane, the DC coefficient bit "0" is output, followed by the Group 1 and 2 bits as is (with the inclusion of sign bits as appropriate, of course), since "1" bits have previously occurred in those groups. The process continues as described above, and the result is shown in TABLE IV. As can be seen from TABLE IV, the result of the scalable transformation may be thought of as having a number of bit planes, where each bit plane includes an uncompressed section (the DC coefficient bit) and a compressed section (the AC coefficient bits).

The result of the encoding operation is a net increase from 128 bits (the 8x2 block of bytes) to 134 bits, a less than effective result if this were the end of the compression method. However, now that the DCT coefficients have been arranged and encoded to take advantage of their inherent redundancies, the data can be compressed by truncating the encoded bit stream to give an arbitrary compression ratio. For example, since the original data was 128 bits long, taking only the first 64 bits of the encoded stream (all encoded bits corresponding to the 512-place plane through the 16-place plane except for the last bit in the 16-place plane) yields a 2:1 compression ratio; similarly, taking only the first 32 bits (all encoded bits corresponding to the 512-place plane through the 64-place plane, as well as the encoded bits corresponding to Group 1 in the 32-place plane) yields a 4:1 compression ratio.
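The truncation step amounts to keeping a fixed-length prefix of the encoded stream. A minimal sketch follows; the function name `truncate_stream` and its parameters are hypothetical, and how a stream shorter than the budget is padded is not specified above, so the sketch simply returns it unchanged.

```python
def truncate_stream(bits, ratio, original_bits=128):
    """Keep only the leading bits allowed by the target compression ratio.

    An 8x2 block of bytes is 128 bits, so a 2:1 ratio keeps 64 bits and a
    4:1 ratio keeps 32 bits of the encoded stream.
    """
    if ratio <= 0 or original_bits % ratio:
        raise ValueError("ratio must divide the original block size evenly")
    return bits[: original_bits // ratio]
```

Because the budget depends only on the ratio and not on the block contents, every block occupies the same number of bits after truncation.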

It is possible to achieve the above-described goal of encoding image data with little or no loss in final image quality using this compression technique for two reasons which can be understood by inspecting the arrangement of coefficients in TABLES II and III. First and most importantly, the coefficients are grouped so that coefficients which tend to be of the same general power-of-two magnitude are placed together, i.e., all coefficients within a group tend to fall within the range $2^{N-K} < |C| < 2^{N}$, where C is a coefficient in the group, N is an arbitrary integer and K is a small integer, i.e., 1, 2 or 3; preferably, the smaller the better. That is, within a group the binary magnitudes of the coefficients tend to differ by only a few orders. This means that with reference to TABLE III, ideally there should be a homogenous block of zeroes (or blank spaces in the Table) above the bit plane where most of the coefficients in the group should have their first non-zero coefficient. Since the encoding scheme can encode each all-zero row of this homogenous block of zeroes with a single "0" bit (Step S108), substantial compression can be effected.

Second, it has been found that the ordering shown in TABLE III usually ensures that the coefficients contributing most to the image are most likely to survive the truncation step. In other words, TABLE III orders the DCT coefficients so that the coefficients which are most likely to make a larger contribution are ordered toward the beginning of each bit plane, and therefore are likely to have more of their bits included in the truncated bit stream. However, since most bit planes in the Table will either be fully encoded (in the example, the 512-place through 128-place planes for 4:1 compression and the 512-place through 64-place planes for 2:1 compression) or completely omitted (the 32-place through 1-place planes for 4:1 compression and the 16-place through 1-place planes for 2:1 compression), this ordering provides an advantage only in the plane in which truncation occurs; thus, its contribution to the overall effectiveness of the encoding process is less than that of the first point described above.

The grouping of coefficients and the ordering of the groups are not infallible for a given set of coefficients and are merely designed to meet the above requirements for the largest number of cases on average. For example, in the example given above, in no group do all coefficients of the group have the same power-of-two magnitude as described in connection with the first point above. Further, with respect to the second point, the first coefficient of Group 3 has been ordered behind coefficients which are smaller in magnitude. However, it has been found that in most cases the above grouping and ordering satisfactorily predicts which coefficients will have like magnitudes as well as the largest magnitudes among the DCT coefficients. This ordering of the coefficients, based on their expected magnitudes rather than their actual magnitudes in a particular block, will be referred to as ordering by the "expected magnitude" of the coefficients.

Further, it is possible to achieve the above goal of performing encoding of image data while preserving the fine-grain accessibility of the image sequence even in encoded form because the image data is compressed in units of 8x2 byte blocks at the same ratio and consequently can be indexed easily. Therefore, any byte in the original image can be located with a resolution of sixteen bytes. Finally, since the method is equally applicable to image data whether it be chroma or luma data, processing is uniform for all components of the image color space.
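Because every block compresses to the same size, locating the compressed block that holds a given byte reduces to integer arithmetic. The sketch below assumes, purely for illustration, that compressed blocks are stored contiguously in left-to-right, top-to-bottom block order; the function name `block_offset` and that storage layout are assumptions, not details given in the description.

```python
def block_offset(x, y, frame_width, ratio, block_w=8, block_h=2):
    """Return the byte offset of the compressed block containing pixel (x, y).

    Assumes blocks are laid out contiguously in raster order and that each
    uncompressed 8x2 block (sixteen bytes) shrinks to 16 / ratio bytes.
    """
    raw_bytes = block_w * block_h
    if frame_width % block_w or raw_bytes % ratio:
        raise ValueError("frame width and ratio must divide evenly")
    blocks_per_row = frame_width // block_w
    index = (y // block_h) * blocks_per_row + (x // block_w)
    return index * (raw_bytes // ratio)
```

For a 720-byte-wide component plane at 4:1, each block occupies four bytes, so the block covering (8, 0) starts at offset 4 and the first block of the second block row, covering (0, 2), starts at offset 360.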

It should be noted that at each step where a bit is examined, the system checks to see if it is the final bit in the stream. If so, execution proceeds to the next process.

Optional features may be added to provide further compression. For example, it was noted above that the grouping of the coefficients is designed so that coefficients which tend to differ in binary magnitude by only a few orders are in the same group. If the ordering is designed so that within the group, the coefficient most likely to be largest is at a known position, e.g., in the first position within the group, the coefficient next most likely to be largest is at another known position, etc., the encoding of the groups can be partially tokenized for additional compression. For example, suppose that coefficients are ordered within a given group so that the coefficient most likely to be the largest is first; the coefficient next most likely to be largest is second; etc. This information can be used to assign tokens to the most common combinations and an escape code for the remaining combinations.

Another alternative technique makes use of the fact that given a "1" flag indicating the existence of the first non-zero AC coefficient in any bit plane (e.g., in the 128-place bit plane in TABLE III) or a "1" flag indicating the existence of the first non-zero AC coefficient within a group (e.g., for Group 3 in the 64-place bit plane in TABLE III), a combination of all zeroes encoding the bit plane or the group, respectively, is not allowed. That is, suppose the encoding process produces the partial output shown in TABLE V below.

TABLE V

Now, since the initial "1" flag indicates there must be a non-zero AC coefficient in the 128-place bit plane and the only place it could possibly be is in Group 4, there is no need to provide a flag for Group 4 indicating that it has its first non-zero AC coefficient. The Group 4 bits can be encoded directly, and a bit is saved in the encoding.

Similarly, assume that the process produces the output shown in TABLE VI below. Since Group 2 has a flag indicating that it has a non-zero AC coefficient and three coefficient bits have been specified as "0", the remaining coefficient bit in the group must be a "1". Thus, it need not be encoded and only its sign bit need be included.

TABLE VI

One hardware implementation of the AMR encoder 16 preferably reflects the DC-AC encoding structure of the encoding process described above. As shown in FIG. 4, DCT coefficients are stored in an input buffer 200. Both the DC encoder 202 and the AC encoder 204 receive the coefficient bits one bit plane at a time. The DC encoder 202 essentially performs the operations shown in the first part of the flowchart of FIG. 3A, i.e., encoding the DC coefficient and AC coefficients up to the first bit plane which has a nonzero AC coefficient bit. At that point, controller 206 activates the AC encoder 204 to encode the bit planes starting with the one having the first nonzero AC coefficient bit. The outputs of the DC encoder 202 and the AC encoder 204 are fed to a code combiner 208 which performs bit manipulations as described below to provide a steady bit stream to output buffer 210. Once the bits are stored in the output buffer 210, they may be read and passed on to the memory 10 via the write bus 12.

If the AMR system is used in a real-time image processing environment, for example, in an HDTV-to-NTSC converter or the like, it is preferable that the system be implemented in hardware and provide at least a minimum number of encoded bits each cycle, where the minimum number is determined according to the speed, latencies and the like of the various components used in the image processor as will be readily apparent to those of skill in the art. For example, it may be desirable that the encoder write seven bits to the output buffer 210 each cycle. While this may not be a problem in the lower-order bit planes, where the coefficient bits are generally encoded in one-to-one correspondence, it may be difficult to satisfy this constraint in the highest-order bit planes, where an entire plane may collapse to only two bits (one bit for the DC coefficient bit and a zero bit indicating the entire AC coefficient plane is zero). Thus, after passing the most significant DC coefficient bit to the code combiner, the DC encoder 202 processes four bit planes at a time to ensure that even in the worst-case scenario where four contiguous planes have no non-zero AC coefficients, at least seven (actually, eight) bits can be generated. When the number of bits produced by the DC encoder 202 is not exactly seven (as is usually the case), the code combiner 208 outputs the first seven bits, advances the write pointer in the output buffer 210, and holds the remaining bits to be appended as the head of the next output group.
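The hold-and-append behaviour of the code combiner 208 can be modelled in software as a small accumulator. This is only a behavioural sketch under stated assumptions: the class name `CodeCombiner` and its `push` method are hypothetical, and the sketch emits every complete seven-bit word it can form per call, whereas the hardware described above writes one word per cycle.

```python
class CodeCombiner:
    """Collect variable-length code fragments and release fixed-width words."""

    def __init__(self, word_bits=7):
        self.word_bits = word_bits
        self.pending = []          # bits held over for the next output word

    def push(self, bits):
        """Append newly encoded bits and return all complete words."""
        self.pending.extend(bits)
        words = []
        while len(self.pending) >= self.word_bits:
            words.append(self.pending[: self.word_bits])
            del self.pending[: self.word_bits]
        return words
```

For instance, pushing three bits yields no word; pushing six more completes one seven-bit word and leaves two bits held as the head of the next word.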

When DC coding is complete, the controller 206 causes the code combiner 208 to begin using the output of the AC encoder 204. For each group in a bit plane, in addition to the coefficients themselves, the AC encoder uses at most five bits of state information describing the contents of previous planes: one bit denotes whether the group had a nonzero coefficient bit on any higher-order plane, and four bits (three in the case of Group 1) denote whether the corresponding coefficient within the group had a nonzero coefficient bit on any higher-order plane (the former may of course be derived by logical ORing together the bits of the latter to reduce the amount of state information that must be carried). As shown in FIG. 5, this information is received by four AC code generators 212 and four corresponding AC count generators 214. Each of the AC code generators 212 uses the corresponding coefficient data from the input buffer 200 and the above-described state information to encode its group, and each of the AC count generators 214 computes the number of bits in the group encoded by the corresponding AC code generator 212. This count information is used by a shift generator 216 to enable the code combiner 208 to align the bits of the encoded groups to assemble the bit stream outputted to the output buffer 210. As in the case of the DC encoder 202, the code combiner 208 holds the output of the AC encoder 204 until seven bits are available for writing to the output buffer 210 and holds the remainder for writing in a subsequent cycle.
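The work of one AC count generator 214 can be illustrated with a short function that computes how many bits a group contributes in a given bit plane from the coefficient bits and the per-coefficient state bits. The name `ac_group_count` is hypothetical; the group-level state bit is derived here by ORing the per-coefficient bits, as the paragraph above notes is possible.

```python
def ac_group_count(plane_bits, coeff_seen):
    """Return the number of encoded bits for one group in one bit plane.

    `plane_bits` holds the group's coefficient bits in the current plane;
    `coeff_seen[i]` is True if coefficient i had a "1" in a higher plane.
    """
    if len(plane_bits) != len(coeff_seen):
        raise ValueError("one state bit is required per coefficient")
    group_seen = any(coeff_seen)            # OR of the per-coefficient bits
    if not group_seen and not any(plane_bits):
        return 1                            # single "0" flag for the group
    count = 0 if group_seen else 1          # "1" flag on first non-zero plane
    count += len(plane_bits)                # one bit per coefficient
    # One sign bit for each coefficient showing its first "1" in this plane.
    count += sum(1 for b, seen in zip(plane_bits, coeff_seen) if b and not seen)
    return count
```

Group 1 in the 64-place plane of the worked example (flag "1", then "1" with its sign, "0", "0") gives a count of five, which is the value the shift generator 216 would need in order to align the next group's bits.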

The complementary decoding operation may also be understood to have two parts, a DC decoding process which decodes bits down to and including the first plane containing a non-zero AC coefficient bit, and an AC decoding process which handles the remaining bits. The DC decoding process begins by outputting the first bit in the bit stream as the most significant bit of the DC coefficient as shown at Step S300 in FIG. 6. Then, the decoding operation enters a loop in which the next bit is output as the DC coefficient bit for a given bit plane (Step S302) and, if the following bit is a zero (Step S304), indicating the entire AC portion of the bit plane is zero, fifteen zeroes are output to finish the plane (Step S306).

If, on the other hand, the bit following the DC coefficient bit is a one (Step S304), the plane is the first to contain a nonzero AC coefficient bit and execution moves from DC decoding to AC decoding. At this point, the current group is set to Group 1 (Step S310).

At the beginning of the AC decoding process, after setting the current bit plane and current group indications, the current group is checked to determine if any of its coefficients have had a nonzero bit in a higher-order bit plane (Step S312). If not, the process checks the next bit in the bit stream to see if it is a one, indicating that the group contains a nonzero bit for the first time (Step S314). If the bit is zero, there are no one bits in this group, a string of four zeroes is output (only three for Group 1), and the group is finished. If, on the other hand, the bit examined in Step S314 is a one, the group contains a one for the first time. Then, each bit in the group is examined (Step S318), and if it is a one, a one is output with the following bit as its sign (Step S320); if it is not a one, a zero is output (Step S322). This process is repeated for each bit in the group (Step S324).

If, on the other hand, a one has previously occurred as the bit for one of the coefficients in the group, each bit in the group is examined to see if it is a one (Step S326). If not, a zero is output (Step S328); if so, the bit is output with its sign if it is the first one to occur for that coefficient (Steps S330, S332) or it is output alone if it is not the first one for that coefficient (Step S334). The process is repeated for each bit in the group (Step S338).

The above process proceeds for each group in the current bit plane (Steps S340, S341) and for each bit plane (Steps S350, S351) until the end of the bit stream is reached. Since the bits representing the remainder of the complete DCT coefficient set have been truncated, if the decoder were to simply substitute zeroes for the missing bits to complete the coefficient set, a systematic bias toward lower-order low-magnitude coefficients would be created. Instead, the decoder assumes that the remaining bits for each non-zero coefficient are "1000. . .", which is the median between the two extremes of the missing bits for a coefficient being "1111. . ." and "0000. . .". Once the DCT coefficient set has been reconstructed in this way, the coefficients are passed to an inverse DCT process.
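The decoding flow of FIG. 6, together with the "1000..." fill for truncated coefficients, can be sketched as follows. As with any software illustration of the flowchart, this is an interpretation under assumptions: the name `decode_block`, the `GROUPS` index ranges, and the choice to discard a "1" bit whose sign bit was cut off by truncation are all introduced here and are not taken from the description.

```python
GROUPS = [(1, 4), (4, 8), (8, 12), (12, 16)]  # assumed index ranges of Groups 1-4


def decode_block(bits, n_planes=10):
    """Rebuild 16 ordered coefficients from a possibly truncated bit list."""
    pos = 0

    def take():
        nonlocal pos
        if pos >= len(bits):
            raise EOFError                   # end of the truncated stream
        pos += 1
        return bits[pos - 1]

    top = n_planes - 1
    mags, neg = [0] * 16, [False] * 16
    low = [n_planes] * 16                    # lowest plane decoded per coefficient
    group_seen = [False] * 4
    try:
        mags[0] |= take() << top             # S300: most significant DC bit
        low = [top] * 16                     # AC bits are zero in the top plane
        plane, in_ac = top - 1, False
        while plane >= 0:
            mags[0] |= take() << plane       # S302: DC bit for this plane
            low[0] = plane
            if not in_ac:
                if take() == 0:              # S304/S306: AC plane is all zero
                    low[1:] = [plane] * 15
                    plane -= 1
                    continue
                in_ac = True
            for g, (start, end) in enumerate(GROUPS):
                if not group_seen[g]:        # S312
                    if take() == 0:          # S314: group is all zero
                        low[start:end] = [plane] * (end - start)
                        continue
                    group_seen[g] = True
                for i in range(start, end):
                    if take():
                        if mags[i] == 0:     # first "1": next bit is the sign
                            neg[i] = bool(take())
                        mags[i] |= 1 << plane
                    low[i] = plane
            plane -= 1
    except EOFError:
        pass
    # Fill missing low-order bits of non-zero coefficients with "1000...".
    for i in range(16):
        if mags[i] and low[i] > 0:
            mags[i] |= 1 << (low[i] - 1)
    return [-m if n else m for m, n in zip(mags, neg)]
```

If only the single bit "1" survives truncation, the DC coefficient is known to lie between 512 and 1023, and the fill places it at 768 rather than at the lower bound of 512.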

Like the hardware implementation of the AMR encoder 16, the hardware implementation of the AMR decoders 18 and 20 may be constrained by speed requirements of the circuitry in which it is used. Assuming the decoders 18 and 20 need to decode seven bits per cycle, one hardware implementation of the AMR decoder 18 is shown in FIG. 7. Here, encoded bits stored in an input buffer 400 (read from the memory 10 via one of the read buses 14) are presented to a bit classifier 402. The bit classifier 402 receives the encoded bits seven at a time and, for each, provides a seven-bit representation thereof (described in greater detail below) to a set enable generator 404. The set enable generator 404 uses the seven-bit representation to selectively set bits in an output buffer 406 (initialized to all zeroes at the beginning of the block decode).

First, when the bit classifier 402 receives seven bits from the input buffer 400, it classifies each into one of thirty-six categories based on the bit itself and twenty state bits for a given bit plane (fifteen bits indicating whether the corresponding coefficient has had any nonzero bits in higher-order bit planes, four bits indicating whether the corresponding group has had any non-zero bits in higher-order bit planes, and one bit indicating whether any AC coefficient has had a nonzero bit in a higher-order bit plane). Categorization of each encoded bit is possible given the bit itself, the state information and the state of the previous bit, so a pipeline architecture as shown in FIG. 8 is used for the bit classifier 402. Here, the encoded bits from the input buffer 400 are provided at inputs 408a-408g of bit parsers 410a-410g, and the state information and classification information for the previous bit generated by state logic 412 and classification logic 414, respectively, is passed from stage to stage in the pipeline. One category classifies the bit as a DC coefficient bit; fifteen categories classify the bit as one of the fifteen AC coefficient bits; fifteen categories classify the bit as a sign of one of the fifteen AC coefficient bits. Additionally, one category is for the flag indicating that the entire AC plane is zero, one category is for the flag indicating that a bit plane is the first to contain a non-zero AC coefficient, and four categories are for each of the four flags indicating that all bits in the corresponding group are zero.

The classification of each bit is represented in one-hot form (the bit corresponding to the appropriate category is set and all others are zero) and converted into a seven-bit representation of the classification, where the first bit is the actual encoded bit, the next two bits denote whether the bit is a DC bit, AC bit, sign bit or flag bit, and the lower four bits denote the coefficient to which the bit belongs if it is an AC bit or sign bit. This seven-bit classification appears on the outputs 416a-416g of the bit parsers 410a-410g for use by the set enable generator 404. Based on the seven-bit representation of each of the encoded bits from the input buffer 400, the set enable generator 404 can reproduce the original DCT coefficient bits and store them in the output buffer 406. Additionally, after all encoded bits have been read from the input buffer 400, the set enable generator 404 generates a final coefficient to avoid low-magnitude bias as described above.
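The seven-bit classification may be pictured as a packed word, as in the following C sketch. The field positions follow the description above (the encoded bit first, then two type bits, then four coefficient bits); the numeric codes assigned to the four bit types, and the names bit_kind, pack_class, class_value, class_kind and class_coef, are assumptions made only for illustration.

```c
/* Assumed two-bit codes for the type field; the text does not give them. */
enum bit_kind { KIND_DC = 0, KIND_AC = 1, KIND_SIGN = 2, KIND_FLAG = 3 };

/* Pack: bit 6 = encoded bit, bits 5..4 = type, bits 3..0 = coefficient. */
static unsigned pack_class(unsigned value, enum bit_kind kind, unsigned coef)
{
    return ((value & 1u) << 6) | (((unsigned)kind & 3u) << 4) | (coef & 0xFu);
}

/* Accessors used by a consumer such as the set enable generator. */
static unsigned class_value(unsigned c)
{
    return (c >> 6) & 1u;
}

static enum bit_kind class_kind(unsigned c)
{
    return (enum bit_kind)((c >> 4) & 3u);
}

static unsigned class_coef(unsigned c)
{
    return c & 0xFu;
}
```

Under these assumed codes, a sign bit of value one belonging to AC coefficient 9 packs to binary 1101001; the four-bit coefficient field is sufficient because indices 1 through 15 identify the fifteen AC coefficients.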

Those skilled in the art will recognize that numerous variations on the above techniques are of course possible. For example, the technique is not limited to use with YUV image data in an MPEG-2 environment, and other color spaces, for example, RGB, monochrome or the like may be used. Further, the image data need not be processed in 8x2 byte blocks, and blocks of a different size may be selected depending on such criteria as compression efficiency, granularity of addressability, speed (the larger the block, the larger the number of intermediate values and operations) and the like (however, it is preferable that the blocks have dimensions which are powers of two to ensure easy addressability).

Moreover, the coefficients need not be partitioned in the DC-3-4-4-4 format used in the preferred embodiment, and other groupings such as DC-3-3-4-5 or DC-3-3-3-3-3 can be used instead. For example, FIG. 9 shows an alternative grouping of DC-3-3-4-5. This grouping was found to have approximately the same quality as the DC-3-4-4-4 partitioning. Whatever grouping is used, the coefficients should be assigned to the groups so that as many zeroes as possible uniformly appear in the high-order bit planes of the group.
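Whichever grouping is chosen, each of the fifteen AC coefficients must belong to exactly one group. The following C sketch shows one way such a table might be checked; it reuses the struct group layout of Appendix A, while the function name covers_all_ac is hypothetical and is not part of the original listing.

```c
/* Same layout as the group table in Appendix A. */
struct group {
    int num;       /* number of coefficients in the group */
    int el[10];    /* coefficient indices (0 is the DC term) */
};

/* Returns 1 if the groups cover AC indices 1..15 exactly once each. */
static int covers_all_ac(const struct group *g, int ngrp)
{
    int seen[16] = {0};
    int i, j, k;

    for (i = 0; i < ngrp; i++) {
        if (g[i].num < 0 || g[i].num > 10)
            return 0;                       /* size outside the el[] array */
        for (j = 0; j < g[i].num; j++) {
            k = g[i].el[j];
            if (k < 1 || k > 15 || seen[k])
                return 0;                   /* out of range or duplicated */
            seen[k] = 1;
        }
    }
    for (k = 1; k <= 15; k++)
        if (!seen[k])
            return 0;                       /* coefficient left unassigned */
    return 1;
}
```

The DC-3-4-4-4 table of Appendix A passes this check, as would any DC-3-3-4-5 or DC-3-3-3-3-3 table that assigns every AC index once.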

Also, although the embodiment above uses only a single encoder and decoder, multiple units may be employed to implement parallel processing of DCT coefficient blocks and increase the throughput of the device. In such cases, it is preferable that a buffer system be used with at least one surplus buffer that can be written to or read from by external circuitry while the other buffers are in use. Further, the same number of encoders and decoders need not be used, and a system according to the present invention may employ, e.g., three encoders and four decoders.

Moreover, a DCT need not be used to generate coefficients, and another transform such as projection of the data onto various orthonormal basis functions can be used instead. Preferably, the transform is one that will generate coefficients which can be generally characterized as to their relative contributions to the original data.

Still further, once the block of DCT coefficients is generated, not all of them need be used in the bit stream encoding process; for example, rather than using all coefficients from TABLE III above to generate TABLE IV, only the most significant coefficients, i.e., the eight coefficients in the left half of TABLE III, need be used. The loss of information content is of little consequence if the image is filtered and downsampled before display, for example, when converting an image from High Definition Television (HDTV) format to a regular NTSC, PAL or SECAM format television signal as mentioned above. Moreover, although mathematical modeling indicates that this variation has a higher overall error than the basic embodiment, it has been found that the resultant image is in fact qualitatively superior to a human viewer. This is believed to be because the coefficient truncation eliminates high frequency artifacts and noise which detract from the overall picture quality.

Alternatively or in conjunction with omission of all but the eight most significant coefficients, the system may reduce the amount of data stored by dividing some DCT coefficients by a power of two prior to encoding, an operation which may be implemented in hardware easily with shift registers.
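As a software analogue of the shift-register operation just mentioned, the following C sketch divides a coefficient by a power of two by shifting its magnitude. The names magnitude_of, scale_down and scale_up are hypothetical; the inverse operation scale_up is an assumption about how a decoder might restore the original scale and is not stated above. The shift is applied to the magnitude because right-shifting a negative signed value is implementation-defined in C.

```c
/* Magnitude of a signed coefficient as an unsigned value. */
static unsigned magnitude_of(int v)
{
    return (v < 0) ? 0u - (unsigned)v : (unsigned)v;
}

/* Divide a coefficient by 2^shift, discarding the low-order bits. */
static int scale_down(int coef, unsigned shift)
{
    unsigned mag = magnitude_of(coef) >> shift;
    return (coef < 0) ? -(int)mag : (int)mag;
}

/* Assumed decoder-side inverse: multiply by 2^shift (no overflow check). */
static int scale_up(int coef, unsigned shift)
{
    unsigned mag = magnitude_of(coef) << shift;
    return (coef < 0) ? -(int)mag : (int)mag;
}
```

With a shift of two, a coefficient of -13 is stored as -3 and restored as -12, so the bits discarded by the shift are not recovered.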

Also, compression ratios other than 2:1 and 4:1 can be implemented by truncating the encoded bit stream at the appropriate position. Further, rather than assuming truncated bits are the median of their possible values during the decoding process, the decoder may generate random bits as replacements for the truncated bits. Such variations as those mentioned above are intended to be within the spirit and scope of the present invention.
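The truncation position for a given compression ratio may be computed as in the following C sketch. It assumes an 8x2 block of 8-bit samples (128 bits before compression), which is consistent with the 64-bit default of MAXBIT in Appendix A for a 2:1 ratio; the names BLOCK_BITS, bit_budget and truncate_stream are hypothetical, and the stream is assumed to be a string of '0' and '1' characters.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_BITS (8 * 2 * 8)   /* 8x2 block of 8-bit samples */

/* Bits available per block for a ratio of num:den, or -1 if invalid. */
static int bit_budget(int num, int den)
{
    if (num <= 0 || den <= 0)
        return -1;
    return BLOCK_BITS * den / num;
}

/* Truncate an encoded '0'/'1' string in place; returns the kept length. */
static size_t truncate_stream(char *bits, size_t budget)
{
    size_t n = strlen(bits);

    if (n > budget) {
        bits[budget] = '\0';
        n = budget;
    }
    return n;
}
```

Under these assumptions, 2:1 and 4:1 correspond to budgets of 64 and 32 bits per block, and an intermediate ratio such as 8:3 corresponds to 48 bits.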

APPENDIX A

#include <stdio.h>
#include <math.h>

#define PI 3.1415926535897932384
#define MAXBIT 64

int maxbit = MAXBIT;

#define bit(n, a) (((a) >> (n)) & 1)
#define max(a, b) (((a) < (b)) ? (b) : (a))

int input_bit(char *x);
void encode(double x[16]);
void output_bit(int b);
void idct(double x[2][8], int *buff);

int output_ptr = 0;
char output_string[1024];

void Encode8x2(int *buff, char *fp)
{
    int i, j, k, n;
    int freq, time;
    double scale;
    double sum;
    double c[8][8], cc[2][2];
    double x[2][8], y[2][8];
    double xx[16];

    /* compute the coefficients for the 8 point DCT */
    for (freq = 0; freq < 8; freq++) {
        scale = (freq == 0) ? sqrt(0.125) : 0.5;
        for (time = 0; time < 8; time++) {
            c[freq][time] = scale * cos((PI / 8.0) * freq * (time + 0.5));
        }
    }

    /* coefficients for the 2 point DCT */
    cc[0][0] = sqrt(0.5);
    cc[0][1] = sqrt(0.5);
    cc[1][0] = sqrt(0.5);
    cc[1][1] = -sqrt(0.5);

    /* read in one 8x2 image block, do a 2-D DCT, and write the results */
    for (i = 0; i < 2; i++)
        for (j = 0; j < 8; j++)
            x[i][j] = buff[8*i+j];

    /* 8 point DCT along each row */
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 8; j++) {
            sum = 0.0;
            for (k = 0; k < 8; k++) sum += c[j][k] * x[i][k];
            y[i][j] = sum;
        }
    }

    /* 2 point DCT across the two rows */
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 8; j++) {
            sum = 0.0;
            for (k = 0; k < 2; k++) sum += cc[i][k] * y[k][j];
            x[i][j] = sum;
        }
    }

    for (i = 0; i < 2; i++)
        for (j = 0; j < 8; j++)
            xx[8*i+j] = x[i][j];

    encode(xx);
    for (i = 0; i < maxbit; i++) fp[i] = output_string[i];
    for (i = maxbit; i < maxbit + 10; i++) fp[i] = '0';
}

struct group {
    int num;
    int el[10];
};

#define NGRP 4

struct group grp[NGRP] = {
    {3, {1, 2, 3}},
    {4, {8, 9, 10, 11}},
    {4, {4, 12, 5, 13}},
    {4, {6, 14, 7, 15}}
};

double scaling[16] = {
    1.00, 1.00, 1.00, 1.00,
    1.00, 1.00, 1.00, 1.00
};

int encode_init_flag = 0;

void encode(double x[16])
{
    double dc_mag;
    int flag_nonzero;
    int tmp, num, pos, index;
    int i, j, k;
    int nbit, bi, addr, done;
    double mag;
    double magnitude[16];
    int sign[16];
    int flag[16];
    int grp_flag[NGRP];

    /* initialize maxbit and scaling array */
    if (encode_init_flag == 0) {
        encode_init_flag = 1;
        scanf(" %d", &maxbit);
        for (i = 0; i < 16; i++) scanf("%lf", &scaling[i]);

        fprintf(stderr, "maxbit: %d\nscaling: ", maxbit);
        for (i = 0; i < 16; i++) fprintf(stderr, " %lf", scaling[i]);
        fprintf(stderr, " \n");
    }

    /* apply scaling */
    for (i = 0; i < 16; i++) x[i] *= scaling[i];

    for (i = 0; i < 16; i++) {
        magnitude[i] = fabs(x[i]);
        sign[i] = (x[i] < 0.0);
        flag[i] = 0;
    }

    for (i = 0; i < NGRP; i++) grp_flag[i] = 0;

    /* special case DC term */
    dc_mag = magnitude[0];
    magnitude[0] = 0;

    /* start encoding */
    output_ptr = 0;
    nbit = 0;
    flag_nonzero = 0;

    /* output bit of DC term */
    output_bit(((int)(dc_mag / 512.0)) & 1);
    nbit++;

    for (mag = 256.0; nbit <= maxbit; mag /= 2.0) {
        // fprintf(stderr, "nbit: %3d mag: %f\n", nbit, mag);

        /* output bit of DC term */
        output_bit(((int)(dc_mag / mag)) & 1);
        nbit++;

        if (flag_nonzero == 0) {
            // first non-zero bit plane not found yet
            for (i = 1; i < 16; i++)
                if (magnitude[i] >= mag) flag_nonzero = 1;

            output_bit(flag_nonzero);
            nbit++;
        }
        if (flag_nonzero == 1) {
            for (i = 0; i < NGRP; i++) {
                // do each group one at a time
                if (grp_flag[i] == 0) {
                    for (j = 0; j < grp[i].num; j++) {
                        k = grp[i].el[j];
                        if (magnitude[k] >= mag) grp_flag[i] = 1;
                    }

                    output_bit(grp_flag[i]);
                    nbit++;
                }
                if (grp_flag[i] == 1) {
                    for (j = 0; j < grp[i].num; j++) {
                        k = grp[i].el[j];
                        if (flag[k] == 0 && magnitude[k] >= mag) {
                            output_bit(1);
                            output_bit(sign[k]);
                            nbit += 2;
                            flag[k] = 1;
                        } else {
                            output_bit(((int)(magnitude[k] / mag)) & 1);
                            nbit++;
                        }
                    }
                }
            }
        }
    }
}

void Decode8x2(int *buff, char *x)
{
    double dc_mag;
    int flag_nonzero;
    int tmp, num, pos, index;
    int i, j, k, z;
    int nbit, bi, addr, done;
    double mag;
    double magnitude[16];
    int sign[16];
    double zz[2][8];
    int flag[16];
    int grp_flag[NGRP];

    for (i = 0; i < 16; i++) {
        magnitude[i] = 0.0;
        sign[i] = 0;
        flag[i] = 0;
    }
    for (i = 0; i < NGRP; i++) grp_flag[i] = 0;

    /* special case DC term */
    dc_mag = 512.0;

    /* start decoding */

    /* input bit of DC term */
    dc_mag += 512.0 * (input_bit(x++) - 0.5);
    nbit = 1;
    flag_nonzero = 0;

    for (mag = 256.0; nbit <= maxbit; mag /= 2.0) {
        //printf("mag: %f nbit: %d string: %s\n", mag, nbit, x);

        /* input bit of DC term */
        nbit++;
        if (nbit > maxbit) goto finish;
        dc_mag += mag * (input_bit(x++) - 0.5);

        if (flag_nonzero == 0) {
            nbit++;
            if (nbit > maxbit) goto finish;
            flag_nonzero = input_bit(x++);
        }
        if (flag_nonzero == 1) {
            for (i = 0; i < NGRP; i++) {
                if (grp_flag[i] == 0) {
                    nbit++;
                    if (nbit > maxbit) goto finish;
                    grp_flag[i] = input_bit(x++);
                }
                if (grp_flag[i] == 1) {
                    for (j = 0; j < grp[i].num; j++) {
                        k = grp[i].el[j];
                        nbit++;
                        if (nbit > maxbit) goto finish;
                        z = input_bit(x++);
                        if (flag[k] == 0) {
                            if (z == 1) {
                                nbit++;
                                if (nbit > maxbit) goto finish;
                                flag[k] = 1;
                                magnitude[k] = 1.5 * mag;
                                sign[k] = input_bit(x++);
                            }
                        } else {
                            magnitude[k] += mag * (z - 0.5);
                        }
                    }
                }
            }
        }
    }

finish:
    magnitude[0] = dc_mag;
    for (i = 0; i < 2; i++)
        for (j = 0; j < 8; j++)
            zz[i][j] = (1.0 - 2.0 * sign[8*i+j]) * magnitude[8*i+j] / scaling[8*i+j];
    idct(zz, buff);
}

void idct(double x[2][8], int *buff)
{
    int i, j, k, n;
    int freq, time;
    double scale;
    double sum;
    double c[8][8], cc[2][2];
    double y[2][8];

    /* compute the coefficients for the 8 point DCT */
    for (freq = 0; freq < 8; freq++) {
        scale = (freq == 0) ? sqrt(0.125) : 0.5;
        for (time = 0; time < 8; time++) {
            c[freq][time] = scale * cos((PI / 8.0) * freq * (time + 0.5));
        }
    }
    cc[0][0] = sqrt(0.5);
    cc[0][1] = sqrt(0.5);
    cc[1][0] = sqrt(0.5);
    cc[1][1] = -sqrt(0.5);

    /* read in one 8x2 coefficient block, do a 2-D IDCT, and write the results */
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 8; j++) {
            sum = 0.0;
            for (k = 0; k < 2; k++) sum += cc[k][i] * x[k][j];
            y[i][j] = sum;
        }
    }
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 8; j++) {
            sum = 0.0;
            for (k = 0; k < 8; k++) sum += c[k][j] * y[i][k];
            x[i][j] = sum;
        }
    }

    /* round to nearest integer and clamp to the 8-bit range */
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 8; j++) {
            if (x[i][j] > 0) n = x[i][j] + 0.5;
            else n = x[i][j] - 0.5;
            if (n < 0) n = 0;
            if (n > 255) n = 255;
            buff[8*i+j] = n;
        }
    }
}

void output_bit(int b)
{
    if (b) output_string[output_ptr++] = '1';
    else output_string[output_ptr++] = '0';
}

int input_bit(char *x)
{
    return ((*x) == '1') ? 1 : 0;
}

Claims

WHAT IS CLAIMED IS:

1. A data processing method comprising: partitioning a plurality of coefficients into a plurality of groups so that coefficients expected to have the same magnitude on average are grouped together; and encoding grouped coefficients into a compressed bit stream.

2. The method defined in Claim 1 wherein partitioning the plurality of coefficients comprises ordering the plurality of coefficients so that coefficients expected to have a contribution greater than other coefficients in the plurality of coefficients are ordered earlier in the plurality of groups.

3. The method defined in Claim 1 further comprising assigning coefficients to the plurality of groups with an expectation of having zeros uniformly included in high-order bit planes of the group.

4. The method defined in Claim 1 wherein bit plane encoding the grouped coefficients comprises outputting a bit from a DC coefficient and one or more bits encoding bits of the AC coefficients for each bit plane.

5. The method defined in Claim 1 further comprising applying a Discrete Cosine Transform (DCT) to data representative of an image to generate the plurality of coefficients.

6. The method defined in Claim 5 wherein the uncompressed bit comprises a part of a DC coefficient of the plurality of coefficients resulting from application of the DCT.

7. The method defined in Claim 1 wherein the plurality of coefficients comprises an 8x2 byte block of coefficients.

8. The method defined in Claim 1 further comprising reducing an amount of bits in the compressed bit stream.

9. The method defined in Claim 1 wherein the step of encoding comprises limiting the compressed bit stream to a predetermined number of bits.

10. The method defined in Claim 1 wherein encoding comprises setting a flag to indicate that all bits in a portion of the bit plane comprise a first value.

11. The method defined in Claim 10 wherein the portion of the bit plane includes the bits of the AC coefficients.

12. The method defined in Claim 1 wherein encoding comprises: setting a flag to indicate that at least one of the bits in a portion of a bit plane comprises a first value; outputting the flag; and outputting one or more bits for each group in the portion of the bit plane.

13. The method defined in Claim 12 wherein outputting one or more bits for each group in the portion of the bit plane comprises: outputting one bit to represent the group if a bit having a first value has not been encountered for the group; and outputting a bit for each bit in the group when a bit having the first value has been encountered for the group, including outputting an indication of the sign for each coefficient immediately after the occurrence of a bit with a second value.

14. An apparatus for data processing comprising: means for partitioning a plurality of coefficients into a plurality of groups so that coefficients expected to have the same magnitude on average are grouped together; and means for encoding grouped coefficients into a compressed bit stream.

15. The apparatus defined in Claim 14 wherein the means for encoding comprises means for bit plane encoding the grouped coefficients into a compressed bit stream from a highest bit plane to bit planes lower than the highest bit plane.

16. The apparatus defined in Claim 14 wherein the compressed bit stream includes an uncompressed bit and one or more encoded bits from each bit plane included therein.

17. The apparatus defined in Claim 14 wherein the means for bit plane encoding comprises means for outputting a bit from a DC coefficient and one or more bits encoding bits of the AC coefficients for each bit plane.

18. The apparatus defined in Claim 14 further comprising means for applying a Discrete Cosine Transform (DCT) to data representative of an image to generate the plurality of coefficients.

19. The apparatus defined in Claim 18 wherein the uncompressed bit comprises a part of a DC coefficient of the plurality of coefficients resulting from application of the DCT.

20. The apparatus defined in Claim 14 wherein the plurality of coefficients comprises an 8x2 byte block of coefficients.

21. The apparatus defined in Claim 14 further comprising means for reducing an amount of the grouped coefficients.

22. The apparatus defined in Claim 14 further comprising means for limiting the compressed bit stream to a predetermined number of bits.

23. The apparatus defined in Claim 14 wherein the means for encoding comprises means for setting a flag to indicate that all bits in a portion of the bit plane comprise a first value.

24. The apparatus defined in Claim 23 wherein the portion of the bit plane includes the bits of the AC coefficients.

25. The apparatus defined in Claim 14 wherein the means for encoding comprises: means for setting a flag to indicate that at least one of the bits in a portion of a bit plane comprises a first value; means for outputting the flag; and means for outputting one or more bits for each group in the portion of the bit plane.

26. The apparatus defined in Claim 25 wherein the means for outputting one or more bits for each group in the portion of the bit plane comprises: means for outputting one bit to represent the group if a bit having a first value has not been encountered for the group; and means for outputting a bit for each bit in the group when a bit having the first value has been encountered for the group, including means for outputting an indication of the sign for each coefficient immediately after the occurrence of a bit with a second value.

27. A data processing method comprising: partitioning a plurality of coefficients into a plurality of groups so that coefficients expected to have a contribution greater than other coefficients in the plurality of coefficients are ordered earlier in the plurality of groups; and encoding grouped coefficients into a compressed bit stream.

28. The method in Claim 27 wherein encoding grouped coefficients into a compressed bit stream comprises bit plane encoding the grouped coefficients into a compressed bitstream from a highest bit plane to bit planes lower than the highest bit plane.

29. The method defined in Claim 27 wherein the compressed bit stream includes an uncompressed bit and one or more encoded bits from each bit plane included therein.

30. The method defined in Claim 27 wherein bit plane encoding the grouped coefficients comprises outputting a bit from a DC coefficient and one or more bits encoding bits of the AC coefficients for each bit plane.

31. The method defined in Claim 27 further comprising applying a Discrete Cosine Transform (DCT) to data representative of an image to generate the plurality of coefficients.

32. The method defined in Claim 31 wherein the uncompressed bit comprises a part of a DC coefficient of the plurality of coefficients resulting from application of the DCT.

33. The method defined in Claim 27 wherein the plurality of coefficients comprises an 8x2 byte block of coefficients.

34. The method defined in Claim 27 further comprising reducing an amount of bits in the compressed bit stream.

35. The method defined in Claim 27 wherein the step of encoding comprises limiting the compressed bit stream to a predetermined number of bits.

36. The method defined in Claim 27 wherein encoding comprises setting a flag to indicate that all bits in a portion of the bit plane comprise a first value.

37. The method defined in Claim 36 wherein the portion of the bit plane includes the bits of the AC coefficients.

38. The method defined in Claim 27 wherein encoding comprises: setting a flag to indicate that at least one of the bits in a portion of a bit plane comprises a first value; outputting the flag; and outputting one or more bits for each group in the portion of the bit plane.

39. The method defined in Claim 38 wherein outputting one or more bits for each group in the portion of the bit plane comprises: outputting one bit to represent the group if a bit having a first value has not been encountered for the group; and outputting a bit for each bit in the group when a bit having the first value has been encountered for the group, including outputting an indication of the sign for each coefficient immediately after the occurrence of a bit with a second value.

40. An apparatus for processing data comprising: means for partitioning a plurality of coefficients into a plurality of groups so that coefficients expected to have a contribution greater than other coefficients in the plurality of coefficients are ordered earlier in the plurality of groups; and means for encoding grouped coefficients into a compressed bit stream.

41. The apparatus defined in Claim 40 wherein the means for encoding comprises means for bit plane encoding the grouped coefficients into a compressed bit stream from a highest bit plane to bit planes lower than the highest bit plane.

42. The apparatus defined in Claim 40 wherein the compressed bit stream includes an uncompressed bit and one or more encoded bits from each bit plane included therein.

43. The apparatus defined in Claim 40 wherein the means for bit plane encoding comprises means for outputting a bit from a DC coefficient and one or more bits encoding bits of the AC coefficients for each bit plane.

44. The apparatus defined in Claim 40 further comprising means for applying a Discrete Cosine Transform (DCT) to data representative of an image to generate the plurality of coefficients.

45. The apparatus defined in Claim 44 wherein the uncompressed bit comprises a part of a DC coefficient of the plurality of coefficients resulting from application of the DCT.

46. The apparatus defined in Claim 40 wherein the plurality of coefficients comprises an 8x2 byte block of coefficients.

47. The apparatus defined in Claim 40 further comprising means for reducing an amount of the grouped coefficients.

48. The apparatus defined in Claim 40 further comprising means for limiting the compressed bit stream to a predetermined number of bits.

49. The apparatus defined in Claim 40 wherein the means for encoding comprises means for setting a flag to indicate that all bits in a portion of the bit plane comprise a first value.

50. The apparatus defined in Claim 49 wherein the portion of the bit plane includes the bits of the AC coefficients.

51. The apparatus defined in Claim 40 wherein the means for encoding comprises: means for setting a flag to indicate that at least one of the bits in a portion of a bit plane comprises a first value; means for outputting the flag; and means for outputting one or more bits for each group in the portion of the bit plane.

52. The apparatus defined in Claim 51 wherein the means for outputting one or more bits for each group in the portion of the bit plane comprises: means for outputting one bit to represent the group if a bit having a first value has not been encountered for the group; and means for outputting a bit for each bit in the group when a bit having the first value has been encountered for the group, including means for outputting an indication of the sign for each coefficient immediately after the occurrence of a bit with a second value.

53. A data processing method comprising: partitioning a plurality of coefficients into a plurality of groups; and bit plane encoding grouped coefficients into a compressed bit stream from a highest bit plane to bit planes lower than the highest bit plane, the compressed bit stream including an uncompressed bit and one or more encoded bits from each bit plane included therein.

54. The method defined in Claim 53 wherein partitioning the plurality of coefficients comprises ordering the plurality of coefficients so that coefficients expected to have a contribution greater than other coefficients in the plurality of coefficients are ordered earlier in the plurality of groups.

55. The method defined in Claim 53 further comprising assigning coefficients to the plurality of groups with an expectation of having zeros uniformly included in high-order bit planes of the group.

56. The method defined in Claim 53 wherein bit plane encoding the grouped coefficients comprises outputting a bit from a DC coefficient and one or more bits encoding bits of the AC coefficients for each bit plane.

57. The method defined in Claim 53 further comprising applying a Discrete Cosine Transform (DCT) to data representative of an image to generate the plurality of coefficients.

58. The method defined in Claim 57 wherein the uncompressed bit comprises a part of a DC coefficient of the plurality of coefficients resulting from application of the DCT.

59. The method defined in Claim 53 wherein the plurality of coefficients comprises an 8x2 byte block of coefficients.

60. The method defined in Claim 53 further comprising reducing an amount of bits in the compressed bit stream.

61. The method defined in Claim 53 wherein the step of encoding comprises limiting the compressed bit stream to a predetermined number of bits.

62. The method defined in Claim 53 wherein encoding comprises setting a flag to indicate that all bits in a portion of the bit plane comprise a first value.

63. The method defined in Claim 62 wherein the portion of the bit plane includes the bits of the AC coefficients.

64. The method defined in Claim 53 wherein encoding comprises: setting a flag to indicate that at least one of the bits in a portion of a bit plane comprises a first value; outputting the flag; and outputting one or more bits for each group in the portion of the bit plane.

65. The method defined in Claim 64 wherein outputting one or more bits for each group in the portion of the bit plane comprises: outputting one bit to represent the group if a bit having a first value has not been encountered for the group; and outputting a bit for each bit in the group when a bit having the first value has been encountered for the group, including outputting an indication of the sign for each coefficient immediately after the occurrence of a bit with a second value.

66. An apparatus for data processing comprising: means for partitioning a plurality of coefficients into a plurality of groups; and means for bit plane encoding grouped coefficients into a compressed bit stream from a highest bit plane to bit planes lower than the highest bit plane, the compressed bit stream including an uncompressed bit and one or more encoded bits from each bit plane included therein.

67. The apparatus defined in Claim 66 wherein the means for partitioning comprises means for ordering the plurality of coefficients so that coefficients expected to have a contribution greater than other coefficients in the plurality of coefficients are ordered earlier in the plurality of groups.

68. The apparatus defined in Claim 66 further comprising means for assigning coefficients to the plurality of groups with an expectation of having zeros uniformly included in high-order bit planes of the group.

69. The apparatus defined in Claim 66 wherein the means for bit plane encoding comprises means for outputting a bit from a DC coefficient and one or more bits encoding bits of the AC coefficients for each bit plane.

70. The apparatus defined in Claim 66 further comprising means for applying a Discrete Cosine Transform (DCT) to data representative of an image to generate the plurality of coefficients.

71. The apparatus defined in Claim 70 wherein the uncompressed bit comprises a part of a DC coefficient of the plurality of coefficients resulting from application of the DCT.

72. The apparatus defined in Claim 66 wherein the plurality of coefficients comprises an 8x2 byte block of coefficients.

73. The apparatus defined in Claim 66 further comprising means for reducing an amount of the grouped coefficients.

74. The apparatus defined in Claim 66 further comprising means for limiting the compressed bit stream to a predetermined number of bits.

75. The apparatus defined in Claim 66 wherein the means for encoding comprises means for setting a flag to indicate that all bits in a portion of the bit plane comprise a first value.

76. The apparatus defined in Claim 75 wherein the portion of the bit plane includes the bits of the AC coefficients.

77. The apparatus defined in Claim 66 wherein the means for encoding comprises: means for setting a flag to indicate that at least one of the bits in a portion of a bit plane comprises a first value; means for outputting the flag; and means for outputting one or more bits for each group in the portion of the bit plane.

78. The apparatus defined in Claim 77 wherein the means for outputting one or more bits for each group in the portion of the bit plane comprises: means for outputting one bit to represent the group if a bit having a first value has not been encountered for the group; and means for outputting a bit for each bit in the group when a bit having the first value has been encountered for the group, including means for outputting an indication of the sign for each coefficient immediately after the occurrence of a bit with a second value.