N-BIT VIDEO CODER AND METHOD OF EXTENDING AN 8-BIT MPEG
VIDEO CODER
BACKGROUND OF THE INVENTION
Field of the Invention
This invention relates to coding video data and more specifically to coding high resolution N-bit video data.
Description of the Related Art
The bandwidth requirements of video data require that the video be compressed and coded for transmission or storage. Fortunately, video data exhibits substantial intraframe spatial correlation and interframe temporal correlation that can be exploited to compress the data with minimal visual artifacts. Entropy coding achieves bit rate reduction by using the statistical properties of the video data and, in theory, is lossless. The current Motion Picture Expert Group (MPEG2) standard documented in ISO/IEC 13818-2:1996 entitled "Information Technology; Generic Coding of Moving Pictures and Associated Audio Information: Video" and the developing MPEG4 standard for coding digital video receive 8-bit per pixel data and compress it to a desired bit rate or at a desired compression rate. The basic approach as shown in detail in FIG. 1 is to perform a two-dimensional discrete cosine transform (DCT) on 8x8 pixel blocks to reduce the intraframe spatial redundancy. This technique is used on the first frame (an I-frame) to initiate coding and periodically thereafter to avoid error accumulation. The remaining P-frames are compressed by performing a motion-compensated prediction on adjacent frames to reduce temporal redundancy and then performing a two-dimensional DCT on 8x8 pixel blocks representing the prediction error in each frame to reduce the spatial redundancy. Entropy coding is then used to code the quantized DC and AC coefficients.
Specifically, each I-frame is coded by disabling the motion estimation 10 and motion compensation 12 functions (switches S1, S2 and S3) and passing the original video frame 14 to a transform coder that independently transforms 8x8 pixel blocks using a two-dimensional DCT 16. The DC and 63 AC coefficients are quantized 18 by dividing each value by a step-size Qp, which has a 5-bit range. The step-size determines the amount of compression; the larger Qp, the greater the compression but the greater the coding error. The quantized coefficients are inverse quantized 20, inverse transformed 22, and stored directly in the frame buffer 24. The reconstructed frame is clipped so that it lies in a range of [0,255].
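The quantization and clipping steps described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and truncation toward zero is an assumption; the text only states that each coefficient is divided by the step-size Qp and that the reconstruction is clipped to [0,255].

```python
def quantize(coeff: int, qp: int) -> int:
    """Divide a DCT coefficient by the step size Qp (assumed: truncate toward zero)."""
    if not 1 <= qp <= 31:
        raise ValueError("Qp is assumed to be a non-zero 5-bit value (1..31)")
    sign = -1 if coeff < 0 else 1
    return sign * (abs(coeff) // qp)


def dequantize(level: int, qp: int) -> int:
    """Inverse quantization: scale the quantized level back by Qp."""
    return level * qp


def clip(value: int, lo: int = 0, hi: int = 255) -> int:
    """Saturate a reconstructed value to the range [lo, hi]."""
    return max(lo, min(hi, value))


level = quantize(100, 8)        # quantized level
recon = dequantize(level, 8)    # reconstructed coefficient
error = 100 - recon             # coding error introduced by the step size
```

A larger Qp produces smaller levels (fewer bits) at the price of a larger reconstruction error, which is the trade-off the paragraph describes.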
The quantized I-frame coefficients are then coded 26 and output into a bitstream 28. If the DC value is less than a user specified threshold, the DC value is treated as the first AC coefficient. The AC coefficients are coded by reading them out in a zig-zag pattern, forming them into one or more symbols each comprised of three components (the number of zeros before the next non-zero coefficient, the value of the non-zero coefficient, and whether it is the last non-zero coefficient in the block), and Huffman coding the symbols. High quality statistics characterizing the symbol distribution are readily available for 8-bit video.
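As an illustration of the zig-zag readout and the three-component symbols, the following Python sketch forms (run, level, last) triples from a list of quantized AC coefficients. The function names are hypothetical, and the final Huffman coding step is omitted because it depends on the standard's tables.

```python
def zigzag_order(n: int = 8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk anti-diagonals; odd diagonals run top-to-bottom, even ones bottom-to-top.
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_level_last(ac):
    """Turn zig-zag ordered AC coefficients into (run, level, last) symbols."""
    nonzero = [i for i, v in enumerate(ac) if v != 0]
    if not nonzero:
        return []
    last_index = nonzero[-1]
    symbols, run = [], 0
    for i in range(last_index + 1):
        if ac[i] == 0:
            run += 1  # count zeros preceding the next non-zero coefficient
        else:
            symbols.append((run, ac[i], i == last_index))
            run = 0
    return symbols


ac = [5, 0, 0, -2, 1] + [0] * 58   # 63 AC coefficients, already in zig-zag order
symbols = run_level_last(ac)
```

Trailing zeros after the last non-zero coefficient generate no symbols, which is where much of the compression comes from.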
Otherwise, the user, based on his knowledge of the video characteristics, selects whether to code the DC value using a fixed code or to subject the DC value to a spatial prediction based upon the values of other DC coefficients in the current frame and use a hybrid fixed-variable length code. In the former case, the DC value is coded using an 8-bit fixed code that specifies values over a range of [0,255]. To enhance error resiliency by limiting the number of consecutive zeros in the bitstream, the value of 128, which would be represented as 10000000, is mapped to the value 255. Error resiliency is further enhanced by injecting start codes, e.g. twenty-three zeros followed by a number, in the bitstream when, for example, video transmission is first initiated, prior to each frame and possibly prior to certain blocks. Should the data become corrupted, the decoder can skip to the next start code and reacquire the coded video signal.
When the user enables intraframe DC prediction (MPEG provides several known techniques for DC prediction), the residual values are coded using a hybrid fixed-variable length code that specifies values over a range of [-255,255]. For example, the value 9 is coded using a fixed length code 1001 to represent the value and a variable length code (VLC) to represent the number of bits or "size" of the fixed length code. High quality statistics characterizing the size distribution are readily available for 8-bit video and provide better compression than simple fixed length codes.
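The split of a DC residual into a "size" and a fixed length code can be sketched as follows. The function name is hypothetical. The treatment of negative residuals follows the JPEG-style convention, which is an assumption here since the text only gives a positive example. The VLC that codes the size itself is not shown.

```python
def dc_size_and_bits(residual: int):
    """Return (size, fixed-length bit string) for a predicted DC residual."""
    size = abs(residual).bit_length()   # number of bits needed for the magnitude
    if size == 0:
        return 0, ""                    # a zero residual needs no value bits
    # Assumption: negative values are stored as residual + 2**size - 1 (JPEG-style).
    code = residual if residual > 0 else residual + (1 << size) - 1
    return size, format(code, f"0{size}b")


size, bits = dc_size_and_bits(9)
```

For the value 9 used in the text, the sketch yields a size of 4 and the fixed length code 1001. For 8-bit video the size never exceeds 8, because residuals lie in [-255,255].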
Each P-frame is encoded by enabling switches S1, S2 and S3 so that a motion compensated reconstructed frame is subtracted from the P-frame at summing node 30. The two-dimensional DCT 16 is performed on each 8x8 pixel block representing the prediction error in each frame, the coefficients are quantized 18 as before, although the value of Qp may be different, and passed to a P-frame entropy coder 32. The coefficients are inverse quantized 20, inverse transformed 22, added to the motion compensated reconstructed frame at summing node 34 and stored in the frame buffer. The prediction error frame is clipped at the output of the IDCT 22 to a range of [-255,255].
Motion estimation finds the best match in the previous frame for each microblock, i.e. four 8x8 pixel blocks, in the current frame, represents the match with a motion vector and determines the optimal mode for coding the microblock. First, if the prediction error frame is larger than the current frame, motion estimation and compensation are disabled and the I-frame coding technique is used for that microblock. Second, assuming prediction does provide coding gain, if the difference between the prediction error associated with a zero motion vector and the best motion vector is less than a decision threshold, the zero motion vector is selected. The motion vectors are variable length coded (VLC) 35 based on a statistical distribution of motion vectors for 8-bit video data and placed in the bitstream. Since the zero motion vector is the most likely, it is assigned the fewest bits; hence coding gain is achieved by selecting the zero motion vector when the prediction gain associated with the best motion vector is small. Third, assuming the best motion vector outperforms the zero motion vector, splitting the microblock into four 8x8 blocks may enhance coding gain. If the difference between the sum of the prediction errors of the four blocks and the single microblock is less than a decision threshold, the single motion vector is selected. More bits are required to transmit four motion vectors than one; hence the coding gain associated with the split must justify the extra cost. If either mode 2 or mode 3 is selected, motion compensation 12 uses the motion vector(s) to get the corresponding blocks from the reconstructed frame in frame buffer 24 and move them to the correct locations in the current frame. The motion compensated frame is then subtracted from the current frame at summing node 30. If mode 1 is selected, the predictive loop is disabled.
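The three-stage mode decision described above can be sketched in Python as follows. All names are hypothetical, and the error measures are assumed to be scalar costs (for example, sums of absolute differences) computed elsewhere. Comparing the best prediction error against an intra cost for the first test is likewise an assumption about how "larger than the current frame" is evaluated.

```python
def select_mode(intra_cost, err_zero_mv, err_best_mv, err_four_mv,
                thr_zero, thr_split):
    """Pick a coding mode for one microblock from precomputed error measures."""
    # Mode 1: prediction gives no gain, so code the microblock as in an I-frame.
    if err_best_mv > intra_cost:
        return "intra"
    # Mode 2: the best vector barely beats the zero vector, so use the cheaper zero vector.
    if err_zero_mv - err_best_mv < thr_zero:
        return "zero_mv"
    # Mode 3: split into four 8x8 blocks only if the gain justifies four vectors.
    if err_best_mv - err_four_mv < thr_split:
        return "one_mv"
    return "four_mv"


mode = select_mode(intra_cost=5000, err_zero_mv=1200, err_best_mv=1150,
                   err_four_mv=1100, thr_zero=100, thr_split=200)
```

With these example numbers the best vector improves on the zero vector by only 50, which is below the threshold of 100, so the zero motion vector is chosen.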
Conventional sensors, e.g. infrared, x-ray, etc., generate video data with 8-bit precision. Thus, the standard 8-bit MPEG video coder provides a good match. However, new technologies are providing sensor resolutions of 12-16 bits. The current approach is to conform the data to the 8-bit format via truncation or remapping. Although simple, this introduces significant mapping error and largely defeats the advantages provided by the higher precision sensors.
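The mapping error caused by truncation can be made concrete with a short Python sketch. It assumes truncation means discarding the N-8 least significant bits, which is one common reading of the term. The function names are hypothetical.

```python
def truncate_to_8(value: int, n_bits: int) -> int:
    """Conform an N-bit sample to 8 bits by dropping the low N-8 bits."""
    return value >> (n_bits - 8)


def restore_from_8(value8: int, n_bits: int) -> int:
    """Map an 8-bit sample back to the N-bit scale (the dropped bits are lost)."""
    return value8 << (n_bits - 8)


sample = 0xABC                               # a 12-bit sample (2748)
restored = restore_from_8(truncate_to_8(sample, 12), 12)
mapping_error = sample - restored

# Worst case over every possible 12-bit sample.
worst = max(v - restore_from_8(truncate_to_8(v, 12), 12) for v in range(1 << 12))
```

No number of bits spent coding the truncated 8-bit data can recover this error, since it is introduced before coding begins.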
To date, the MPEG standardization committee has not adopted nor encouraged the development of high resolution N-bit coders that are optimized for the particular bit rate. MPEG encourages participants to use the available tools and to incorporate them in new tools for cost reasons. Optimized N-bit video coders would represent a significant increase in cost. Optimized N-bit coders require a large amount of training data for each value of N to develop the required statistics for the VLC tables. Since the high precision sensors are an emerging technology, this data is not available.
A. Tanju Erdem et al, "Scalable extension of MPEG-2 for coding 10-bit video" SPIE Vol. 2186 Image and Video Compression 1994, pp. 245-256 describes an extension of the conventional 8-bit progressive coding algorithm to 10-bit data. Progressive coding is used to progressively send 8-bit data over a band limited channel to a decoder where it is progressively reconstructed on the display. As shown in FIG. 1 on page 252, the 10-bit data is mapped to an 8-bit format via downsampling and coded as the base layer. The reconstructed 8-bit frame is subtracted from the 10-bit frame, mapped to the 8-bit format and coded as the enhancement layer. Although this approach reduces the mapping error, the coding efficiency is poor. The enhancement layer is a high pass version of the 10-bit frame and thus exhibits very little spatial correlation. As a result, coding the enhancement layer requires a lot of bits. At rates just above 8 bits the loss of compression is not substantial, but at higher resolutions the coding inefficiency may be prohibitive.
SUMMARY OF THE INVENTION
In view of the above problems, the present invention provides a direct N-bit MPEG video coder that uses available MPEG tools and substantially eliminates mapping error without requiring N-bit statistics. This is accomplished by extending the existing 8-bit MPEG video coder to an N-bit video coder, which can be implemented in hardware for a particular value of N or in a general purpose microprocessor for arbitrary values of N. I-frame coding is modified by 1) extending the range of the quantized values of the non-predicted and predicted DC coefficients to $[0, 2^N - 1]$ and $[-(2^N - 1), 2^N - 1]$, respectively, 2) extending to N-bits the 8-bit VLC table used to code the number of bits required to represent the predicted DC value, and 3) when the code selected from the VLC table is longer than 8 bits, inserting a marker bit into the bitstream to avoid emulating start codes that are inserted into the bitstream to provide error resilience. P-frame coding is modified by scaling the decision thresholds used in the motion-compensated prediction by $NB = 2^{N-8}$. In both cases, the saturation levels for clipping the inverse transformed frames are extended to N-bit values. Furthermore, the number of bits used to represent the quantizer step-size is preferably set at N-3 bits to maintain the same resolution as the 8-bit MPEG video coder for 8-bit video. The N-bit coder may be incorporated in a progressive coding scheme to service multiple users having different SNR requirements and channel capacities or to code very high bitrate data such as 32-bit floating point data.
These and other features and advantages of the invention will be apparent to those skilled in the art from the following detailed description of preferred embodiments, taken together with the accompanying drawings, in which:
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1, as described above, is a block diagram illustrating the standard 8-bit MPEG video coder;
FIGs. 2a through 2c are flow charts illustrating the extension of the 8-bit MPEG video coder to an N-bit coder in accordance with the present invention;
FIG. 3 is a table of the extended N-bit variable length codes for the DC size values;
FIG. 4 is a block diagram of the N-bit coder;
FIGs. 5a and 5b are SNR performance plots comparing N-bit coding to truncated 8-bit coding for 12-bit data;
FIG. 6 illustrates the use of N-bit coding in an 8-bit video display application;
FIG. 7 is a block diagram of a progressive video coder that uses the N-bit coder;
FIG. 8 is a MSE performance plot of the progressive video coder; and
FIG. 9 is a block diagram of a multi-user application of the progressive video coder.
DETAILED DESCRIPTION OF THE INVENTION
The present invention provides a technique for extending the 8-bit MPEG video coder to an arbitrary N-bit coder. This is done using available MPEG tools and without requiring new statistics for the different N-bit values. The extension is accomplished by selectively extending a few critical parameters to N bits while leaving the others unchanged. The resulting N-bit coder provides substantial coding gain over either the data conforming or progressive coding techniques without departing from the goal of using available MPEG tools. Furthermore, the performance of the N-bit coder is only marginally inferior to optimized N-bit coders designed from scratch.
The extension of the 8-bit MPEG video coder to an N-bit coder is depicted in FIGs. 2a-2c. As shown in FIG. 2a, the modifications occur in three categories: modification of the syntax (step 40), extension of I-frame coding to N-bits (step 42) and extension of P-frame coding to N-bits (step 44). The syntax of the video coder is modified when the video coder is being used in a simulation for design or testing purposes or when the video coder is implemented in a general purpose microprocessor that can code arbitrary values of N. If the video coder is implemented in hardware for a particular value of N, the number of bits N and the number of bits for representing the step-sizes are hard coded.
The syntax is modified by inserting the following code into the VideoObjectLayer of the 8-bit MPEG video coder:
    not_8_bit
    if (not_8_bit) {
        bits_per_pixel
        bits_per_Qp
    }
The not_8_bit flag signifies whether the video data precision differs from 8 bits per pixel. If the flag is set, the video coder reads bits_per_pixel and bits_per_Qp. The user can set bits_per_Qp to any preferred value. However, a value of N-3 will maintain the same resolution as the 8-bit MPEG video coder for 8-bit data.
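A decoder-side reading of this syntax might look like the following Python sketch. The field widths (one bit for the flag, four bits for each of the two values) are hypothetical, since the text does not specify them. The defaults of 8 bits per pixel and a 5-bit Qp when the flag is clear follow the 8-bit coder described earlier. The function name is also hypothetical.

```python
def parse_precision(bits: str, pixel_field_width: int = 4, qp_field_width: int = 4):
    """Read not_8_bit and, if set, bits_per_pixel and bits_per_Qp from a bit string.

    The field widths are illustrative assumptions, not values taken from the text.
    Returns the parsed values and the number of bits consumed.
    """
    pos = 0
    not_8_bit = bits[pos] == "1"
    pos += 1
    if not not_8_bit:
        # Standard 8-bit video: 8 bits per pixel and a 5-bit step size.
        return {"bits_per_pixel": 8, "bits_per_qp": 5}, pos
    bits_per_pixel = int(bits[pos:pos + pixel_field_width], 2)
    pos += pixel_field_width
    bits_per_qp = int(bits[pos:pos + qp_field_width], 2)
    pos += qp_field_width
    return {"bits_per_pixel": bits_per_pixel, "bits_per_qp": bits_per_qp}, pos


# Flag set, N = 12 (1100), bits_per_Qp = N - 3 = 9 (1001).
header, consumed = parse_precision("1" + "1100" + "1001")
```

With N = 12, choosing bits_per_Qp = 9 keeps the ratio between the step-size range and the pixel range the same as in the 8-bit coder, where N - 3 = 5.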
As shown in FIG. 2b, I-frame coding is extended to N bits by first extending the range of the quantized values of the non-predicted and predicted DC coefficients to $[0, 2^N - 1]$ and $[-(2^N - 1), 2^N - 1]$, respectively (step 46). This is required to maintain the precision of the DC component of the video data. Second, the 8-bit VLC table used to code the number of bits required to represent the predicted DC value is extended to N-bits (step 48), as shown in detail in the VLC table 50 in FIG. 3. This increases the number of bits required to code the intraframe predicted DC values but greatly enhances the precision of the data. Since the N-bit VLC table is an extension of an 8-bit table optimized for 8-bit data, it is, most likely, suboptimal. However, the performance degradation appears to be minimal for values of N up to 16. Third, when the code selected from the VLC table for a particular DC value is longer than 8 bits, a marker bit, i.e. a "1", is inserted into the bitstream to avoid emulating the start codes that are inserted into the bitstream to provide error resilience (step 52). Lastly, the saturation level for clipping the output of the inverse DCT is scaled to N-bits to match the range of the video data (step 54).
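The range extension, the clipping and the marker bit can be sketched in Python as follows. The function names are hypothetical. Placing the marker bit immediately after the code is an assumption, because the text states only that a "1" is inserted when the code is longer than 8 bits. The extended VLC table of FIG. 3 is not reproduced.

```python
def dc_ranges(n_bits: int):
    """Return the (non-predicted, predicted) DC ranges for N-bit video."""
    peak = (1 << n_bits) - 1            # 2**N - 1
    return (0, peak), (-peak, peak)


def clip_to_n_bits(value: int, n_bits: int, signed: bool = False) -> int:
    """Saturate an inverse-DCT output to the N-bit range."""
    peak = (1 << n_bits) - 1
    lo = -peak if signed else 0
    return max(lo, min(peak, value))


def with_marker(code_bits: str) -> str:
    """Append a '1' marker bit when the code exceeds 8 bits (position assumed)."""
    if len(code_bits) > 8:
        return code_bits + "1"
    return code_bits


intra_range, predicted_range = dc_ranges(12)
```

For N = 8 the ranges reduce to [0,255] and [-255,255], and no code exceeds 8 bits, so the marker bit never appears and the behavior matches the 8-bit coder. The marker bit bounds the run of zeros that a long code can contribute, which is what keeps it from emulating a start code.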
As shown in FIG. 2c, P-frame coding is extended to N bits by scaling the decision thresholds used in the motion-compensated prediction by $NB = 2^{N-8}$ (step 56). The decision thresholds for modes 2 and 3 in the motion estimation are derived from 8-bit data and thus must be scaled up to N bits. Otherwise the thresholds will be too low, causing the coder to select the best motion vector too often in mode 2 and to select the four blocks over the single microblock too often in mode 3. The saturation level for clipping the output of the inverse DCT is also scaled to N-bits to match the range of the video data (step 58).
Applicant found that the 8-bit MPEG coder could be effectively extended to an N-bit MPEG video coder without modifying every parameter or extending every VLC table. In particular, the AC coefficients in both the I and P-frames and the motion vectors for N-bit data can be coded using the 8-bit VLC tables. The VLC tables may be extended to N bits if warranted by a particular application or the characteristics of the video data. Furthermore, when an N-bit coder is used, the value of 128 does not have to be mapped to the value 255 to avoid zero accumulation.
As shown in FIG. 4, the N-bit video coder 60 can be implemented in many different ways. First, the N-bit video coder can be simulated for design and testing purposes prior to finalizing the design and implementing it in hardware. At this stage, arbitrary bit rate coders can be simulated simply by changing the value of N. This is a very convenient design and testing feature. Second, the N-bit video coder can be implemented by programming a general purpose microprocessor. This is relatively complicated and expensive but allows the coder to handle arbitrary bit rate data. Current techniques must either map the data to 8 bits or use the inefficient progressive coding technique. Lastly, the N-bit video coder can be implemented in hardware that is specially designed for a specific bit rate and application. This simplifies the hardware and lowers cost.
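The threshold scaling used for P-frame coding above amounts to a single multiplication, sketched here in Python. The function name and the example threshold value are hypothetical.

```python
def scale_threshold(threshold_8bit: int, n_bits: int) -> int:
    """Scale a decision threshold derived from 8-bit data by NB = 2**(N - 8)."""
    if n_bits < 8:
        raise ValueError("the extension is described for N >= 8")
    return threshold_8bit << (n_bits - 8)


thr_12 = scale_threshold(100, 12)   # NB = 2**4 = 16
```

Prediction errors for N-bit data are roughly NB times larger than for 8-bit data, so an unscaled threshold would almost never be reached and the coder would pick the costlier modes too often.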
The relative performance of Applicant's direct 12-bit coder and the conventional truncation coding to 8 bits is shown in FIGs. 5a and 5b. As shown in FIG. 5a, for a desired bits/frame, direct 12-bit coding maintains a PSNR 62 that is consistently 2dB better than the truncation coding PSNR 64. As shown in FIG. 5b, the direct 12-bit coder PSNR 66 consistently outperforms truncation coding PSNR 68. Note that as the bits/frame increases, the truncation coding PSNR 68 flattens out. This technique cannot recapture the mapping or truncation error no matter how many bits are used. Conversely, the direct 12-bit coder PSNR 66 increases monotonically until the error between the reconstructed video at the decoder and the input video is virtually zero. The improvement in performance is mainly in the DC values of the reconstructed image, which are particularly critical to perceived visual quality, much more so than the AC values.
FIG. 6 illustrates one particularly useful application of the direct N-bit video coder. Current video display terminals 70 have only 8-bit display capability 72 and thus cannot display the N-bit video 74. However, the 8-bit range 72 can be slid back-and-forth in the N-bit range 74 to optimize the visual display. Typically, the 8-bit range 72 might be centered in the middle of the N-bit range 74, which typically corresponds to the most information. However, if the video is very dark, i.e. the msbs are zero, the 8-bit range can be slid towards the lsbs.
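Sliding the 8-bit display range within the N-bit range can be sketched as a shift followed by saturation. The function name is hypothetical, and saturating values that fall above the window (rather than wrapping them) is an assumption.

```python
def display_window(value: int, shift: int) -> int:
    """Map an N-bit sample to 8 bits using the window that starts at bit `shift`.

    shift = 0 selects the 8 least significant bits; shift = N - 8 selects the
    8 most significant bits. Values above the window saturate at 255 (assumed).
    """
    return min(value >> shift, 255)


bright = display_window(0xABC, shift=4)   # top 8 bits of a 12-bit sample
dark = display_window(0x0BC, shift=0)     # msbs are zero, so slide towards the lsbs
```

Because the decoder holds the full N-bit reconstruction, the window position can be changed at display time without recoding the video.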
Although one of the advantages of the direct N-bit coder is to avoid the coding inefficiency of progressive coding, in some situations it may be useful to combine N-bit and progressive coding. For example, if N>>8, for example 32-bit floating point data used to represent 3-D mesh data, the hardware and specifically the multipliers are very expensive. In addition, extending the 8-bit DC size VLC table to very large N may exhibit a significant degradation in performance as compared to the optimized N-bit coders. In a multi-user environment, a progressive N-bit coder that is tailored to the SNR requirements and channel capabilities of the users may improve overall coding efficiency.
As shown in FIG. 7, the M-bit video where M>>N is input to the N-bit progressive coder 80, which encodes each of the 8x8 pixel blocks in the base layer and then selectively encodes only those blocks in the enhancement layer(s) that require further coding to satisfy a target distortion. In the base layer, the video frame 82 is passed through summing node 84 and controller 86 to an N-bit quantizer 88 that maps the M-bit data to an N-bit integer value, introducing an error e(map). The quantizer also places the minimum and maximum values into the bitstream.
The N-bit integer values are passed to the N-bit coder 90 where they are encoded according to the extended I-frame and P-frame techniques described above, introducing an error e(code). The coded video data is placed into the bitstream and fed back to reconstruct the quantized video frame. This is accomplished by decoding the coded video data 92, inverse quantizing the decoded data 94, delaying the data by a single frame $Z^{-1}$ and storing it in memory 96. Initially the memory is empty so that in the base layer there is nothing added to the quantized frame at summing node 98.
In the first enhancement layer, the reconstructed frame stored in memory 96 is subtracted from the video frame. If e(map) + e(code) < tolerance, controller 86 terminates the progressive coding. This can be done by comparing the frame's average block error to the tolerance so that coding of the entire frame terminates at once. Alternately, each block can be terminated individually. This is more difficult to administer but provides better performance in that the SNR of each block is approximately constant.
Provided that the controller does not terminate coding, the prediction error block is again mapped from its M-bit value to an N-bit value by quantizer 88. Coding gain is achieved because the range of the error block is much less than that of the corresponding block in the original video frame. N-bit encoder 90 encodes the quantized values and the process repeats until controller 86 terminates all coding.
The goal in designing an N-bit progressive coder for a single user is to maximize the compression rate for a given SNR. The general approach is to try to find a value of N and step-size Qp so that the coding can be done in a single pass to satisfy the SNR requirement. If a single pass cannot be achieved or the value of N is too large to be cost efficient, then values of Qp for the first and second passes and the value of N are selected. Oftentimes, the maximum N may constitute overkill when multiple passes are used.
FIG. 8 illustrates a typical rate distortion curve 100 for a two-pass N-bit progressive coder. The diamonds depict different values of Qp in the base layer. The rightmost diamond 102 corresponds to the minimum step-size, hence the minimum error for the selected value of N. The circles depict different values of Qp in the enhancement layer where the minimum value of Qp was used in the base layer. The rate distortion curve illustrates the flexibility in setting the SNR by the N-bit progressive coder and the improvement in mapping error provided by the second pass.
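The progressive loop of FIG. 7 (quantize to N bits, reconstruct, subtract, repeat until the error is within tolerance) can be sketched in Python under simplifying assumptions. The N-bit coder 90 is treated as lossless, so e(code) is zero and only e(map) remains. Termination is decided for the whole frame from the average absolute error. The frame is a flat list of samples rather than 8x8 blocks. All names are hypothetical.

```python
def quantize_layer(residual, n_bits):
    """Map real-valued samples to N-bit integers; return them with min and step."""
    lo, hi = min(residual), max(residual)
    levels = (1 << n_bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    return [round((x - lo) / step) for x in residual], lo, step


def progressive_code(frame, n_bits, tolerance, max_layers=8):
    """Code a frame as a base layer plus enhancement layers until within tolerance."""
    recon = [0.0] * len(frame)          # the memory is initially empty
    layers = []
    for _ in range(max_layers):
        residual = [x - r for x, r in zip(frame, recon)]
        avg_error = sum(abs(e) for e in residual) / len(residual)
        if layers and avg_error < tolerance:
            break                       # the controller terminates coding
        q, lo, step = quantize_layer(residual, n_bits)
        layers.append((q, lo, step))    # the minimum and step go in the bitstream
        recon = [r + lo + qi * step for r, qi in zip(recon, q)]
    return layers, recon


frame = [0.0, 1000.3, 65535.9, 40000.5]
layers, recon = progressive_code(frame, n_bits=8, tolerance=1.0)
```

In this example the residual after the base layer spans a far smaller range than the original frame, so the second 8-bit pass uses a much finer effective step. This is the coding gain the text attributes to the enhancement layer.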
As shown in FIG. 9, the N-bit progressive coder 80 can also be used to efficiently service multiple users 104 having varying SNR requirements and channel capacities 106. The number of enhancement layers, the value of N and values for Qp are selected to reduce the overall bit rate required to provide the SNR required by each user. Although this may be an inefficient approach to service high SNR users, the overall bit rate can be reduced by sending relatively few bits in only the base layer to a large number of low SNR users.
While several illustrative embodiments of the invention have been shown and described, numerous variations and alternate embodiments will occur to those skilled in the art. For example, the 8-bit MPEG video coder may be updated and features added without changing the basic block DCT motion-compensated prediction architecture to which the invention applies. Such variations and alternate embodiments are contemplated, and can be made without departing from the spirit and scope of the invention as defined in the appended claims.