METHOD AND DEVICE FOR ENCODING/DECODING A DISPLACED FRAME DIFFERENCE
Field of the Invention
The present invention relates generally to video codecs, and more particularly to encoding of displaced frame differences for video codecs.
Background of the Invention
In the realm of very low bit rate video coding, it is very difficult to represent a video sequence with a small number of bits, and still preserve acceptable quality. Many coding techniques tend to represent each frame of encoded video completely, thus updating all pixels at each frame, and therefore use too many bits to meet a very low target bit rate. Generally, though, motion compensated prediction is used to reduce the amount of information needed to be coded for each frame. This approach is used in the ISO MPEG-1 and MPEG-2 standards, and in the ITU-T H.261 and H.263 standards. When motion compensated prediction is used between frames in a video sequence, the error in the prediction must be encoded to preserve the quality of the decoded video sequence. This prediction error is referred to as the Displaced Frame Difference, or DFD.
The DFD is generally a nonstationary, high-pass image which consists of error around the edges of moving objects where a motion estimation technique has failed to adequately represent the motion in the video scene. Often, the DFD will also contain large regions of homogeneous error information. This happens when new objects enter the
scene, or when objects are displaced by a large amount of motion between video frames. One common approach used to encode the DFD information is the use of a block-wise Discrete Cosine Transform (DCT), followed by entropy encoding of the coefficients of the transform. This approach is suitable when the pixels within each block are well modeled by a first order Markov random process. The DCT is close to the optimal transform (the Karhunen-Loeve Transform) in terms of energy compaction capabilities when the first order Markov model is met. However, when this model breaks down, the DCT can become less efficient than other techniques at coding the DFD image. One such alternative technique is to use an iterative expansion of the DFD over a dictionary of non-orthogonal Gabor, or modulated Gaussian functions, followed by coding of the expansion information. This approach can allow localization of the prediction error coding to only those pixels of the DFD which are perceptually most important. It can also reduce the required bit expenditure for the DFD over a comparable DCT based approach. Technology does not currently exist, however, for using the iterative Gabor expansion for video coding in an efficient and effective manner.
Thus, there is a need for a method, device and microprocessor that provide a computationally efficient iterative expansion of a displaced frame difference, DFD, image for the purposes of video compression.
Brief Description of the Drawings
FIG. 1 is a flow chart of a preferred embodiment of steps of a method in accordance with the present invention.
FIG. 2 is a graphical representation of the residual energy in the displaced frame difference as decomposed in accordance with the present invention.
FIG. 3 is a diagrammatic representation of regions from a predetermined segmentation, and blocks touching those regions in accordance with the present invention.
FIG. 4 is a block diagram of one preferred embodiment of a device in accordance with the present invention.
FIG. 5 is a schematic of one preferred embodiment of a microprocessor in accordance with the present invention.
FIG. 6 is a flow chart of a preferred embodiment of steps of a method in accordance with the present invention.
FIG. 7 is a flow chart of a preferred embodiment of steps of a method in accordance with the present invention.
FIG. 8 is a block diagram of a preferred embodiment of a device in accordance with the present invention.
Detailed Description of a Preferred Embodiment
The present invention provides a method, device and microprocessor for performing a computationally efficient iterative expansion of a displaced frame difference, DFD, image over a special dictionary of two-dimensional non-orthogonal modulated Gaussian, or Gabor functions for the purposes of video compression. The coefficients of this expansion are encoded here using techniques which combine to give an efficient representation of the data at very low bit rates. In addition, the expansion is directed to spend the
primary amount of available bits on a perceptually important area of the image being encoded. Finally, the method describes the encoding of chrominance information associated with the DFD.
FIG. 1, numeral 100, is a flow diagram of a preferred embodiment of a method for performing an iterative expansion of a DFD image over a special dictionary of modulated Gaussian functions in accordance with the present invention. This is an iterative process which encodes a projection of the luminance channel of the DFD onto a single basis function and computes a new residual DFD image, at each iteration. For the purposes of this description, the term DFD used here will always refer to the luminance channel of the current residual DFD at the present iteration. The initial residual DFD is set equal to the original DFD obtained from a predetermined motion estimation and compensation technique.
The first step in the method is dividing the DFD image into a plurality of blocks (102). These blocks permit the treatment of the DFD image in a localized manner. The sum of absolute values of the intensity at each pixel within each block is then computed (104). These sums are computed for use in estimating which local region of the DFD image has the highest intensity energy. The next step is weighting the blocks in the DFD according to a predetermined biasing scheme, selected from two options in a predetermined manner (106), which places an emphasis on a perceptually important region of the DFD (108). Where the DFD has been segmented into a plurality of regions according to a predetermined segmentation scheme, this information is used in selecting a perceptually important region (110). Otherwise, a centrally biased weighting scheme is used to represent the perceptually important area of the DFD as the center of the image (112).
The block within the DFD which has the highest weighted energy estimate is then chosen as the best block for concentrating the encoding (114). Using this best block as a starting spatial point, the DFD energy is
then encoded using a predetermined hierarchical Gabor function expansion.
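The block selection of steps (102) through (114) can be sketched as follows. This is a minimal illustration in Python rather than the implementation of the invention: the function names `block_abs_sums` and `best_block`, the representation of the DFD as a list of rows, and the default weight of 1.0 for blocks without an explicit weight are all assumptions introduced here.

```python
def block_abs_sums(dfd, block):
    """Sum of absolute pixel values for each block of a 2-D list `dfd`.

    Returns a dict keyed by (block_row, block_col).
    """
    rows, cols = len(dfd), len(dfd[0])
    sums = {}
    for by in range(0, rows, block):
        for bx in range(0, cols, block):
            sums[(by // block, bx // block)] = sum(
                abs(dfd[y][x])
                for y in range(by, min(by + block, rows))
                for x in range(bx, min(bx + block, cols))
            )
    return sums


def best_block(sums, weights):
    """Index of the block with the largest weighted energy estimate.

    `weights` stands in for the biasing scheme; missing entries default
    to 1.0 (an assumption made for this sketch).
    """
    return max(sums, key=lambda idx: weights.get(idx, 1.0) * sums[idx])
```

With uniform weights the block with the largest raw sum wins; a bias toward another block can change the choice when its weighted sum becomes larger.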
A predetermined dictionary of two-dimensional, normalized, real Gabor functions is used for this method (116). The analytical expression for the functions in this dictionary is given by
$$G_{\alpha,\beta}(i,j) = g_\alpha(i)\, g_\beta(j), \qquad i,j \in \{0,1,\ldots,N-1\}, \qquad \alpha,\beta \in B,$$
$$B = \mathrm{SET}(s,\xi,\phi),$$
where
$i \in \{0,1,\ldots,N-1\}$, N a predetermined positive integer, where N = 16, and
$$g(t) = \sqrt[4]{2}\, e^{-\pi t^{2}},$$
$K_\alpha$ is a normalizing constant chosen such that $\|g_\alpha(i)\| = 1$. The predetermined dictionary is completely specified by defining a set of parameters $B = \mathrm{SET}(s,\xi,\phi)$, which are to be used.
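A separable dictionary entry of this kind can be sketched in Python as shown below. The exact one-dimensional expression is an assumption: the sketch uses a Gaussian envelope 2^(1/4)·exp(-πt²) scaled by s, a cosine modulation with frequency xi and phase phi, and a centering offset of N/2 - 1, which is a common form for modulated Gaussian dictionaries but is not confirmed by the text. The names `gabor_1d` and `gabor_2d` and the parameter triples in the usage note are hypothetical.

```python
import math

N = 16  # support length stated in the text


def gabor_1d(s, xi, phi, n=N):
    """Unit-norm 1-D modulated Gaussian (assumed functional form)."""
    raw = []
    for i in range(n):
        t = i - n / 2 + 1  # assumed centering of the window
        envelope = 2 ** 0.25 * math.exp(-math.pi * (t / s) ** 2)
        raw.append(envelope * math.cos(2 * math.pi * xi * t / n + phi))
    norm = math.sqrt(sum(v * v for v in raw))  # normalizing constant is 1/norm
    return [v / norm for v in raw]


def gabor_2d(alpha, beta):
    """Separable 2-D function G[i][j] = g_alpha(i) * g_beta(j).

    `alpha` and `beta` are (s, xi, phi) triples.
    """
    ga, gb = gabor_1d(*alpha), gabor_1d(*beta)
    return [[a * b for b in gb] for a in ga]
```

For example, `gabor_2d((4.0, 1.0, 0.0), (2.0, 0.0, 0.0))` yields a 16x16 array with unit energy, because the product of two unit-norm one-dimensional functions is itself unit-norm.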
The hierarchical Gabor function expansion begins by picking a representative quadrant of the predetermined dictionary of two-dimensional Gabor functions (118). This representative quadrant is determined by finding the best matching Gabor basis function from four predetermined basis functions which represent each quadrant of the dictionary using a predetermined hierarchical pixel search. The best matching Gabor basis function within the representative quadrant of the dictionary is then found using a projection of the DFD signal onto each basis function with the hierarchical pixel search (120), and picking the function associated with the largest valued projection. The projections in steps (118) and (120) are
computed only over bases whose center points lie within the best DFD block chosen in step (114).
The predetermined hierarchical pixel search is used in finding the expansion of the DFD at each iteration in (118) and (120). In this search, the projection, or inner product, is computed for the current basis function centered at every nth pixel within the best block in the DFD. For example, if the best block of interest comprised the pixels with indices between 16 and 31 in the x direction, 16 and 31 in the y direction, and n=2, then the (x,y) centers for computing the projections would be:
(16,16), (16,18), (16,20), ..., (16,30);
(18,16), (18,18), (18,20), ..., (18,30);
...
(28,16), (28,18), (28,20), ..., (28,30);
(30,16), (30,18), (30,20), ..., (30,30).
The projection yielding the largest amplitude from these computations determines the best initial matching pixel position. Next, the pixels in a predetermined neighborhood of the best initial matching pixel position are used as centers of projections of the current Gabor basis function, and the best matching pixel position is found by choosing the position yielding the largest projection value. For example, if the best initial matching pixel position from the previous example was (18,20), and the predetermined neighborhood was +/-1 pixel, then the position search would additionally involve computing projections at the positions:
(17,19), (17,20), (17,21);
(18,19), (18,21);
(19,19), (19,20), (19,21).
The final projection for the current Gabor basis function is chosen as that which has the largest projection value after the neighborhood pixel search. Step (118) uses the hierarchical pixel search for each of the four representative Gabor basis functions. The basis function having maximum final projection among the four functions is chosen as the representative
matching Gabor basis function. In step (120), the remaining Gabor basis functions in the representative quadrant of the dictionary are searched using the same hierarchical pixel position search within the best block in the DFD. The Gabor basis function with the largest valued projection after this search is chosen as the current best matching Gabor atom. An atom comprises three key pieces of information which are encoded: the final best matching pixel position, the index of the chosen Gabor basis function in the dictionary, and the value of the corresponding projection coefficient.
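The two-stage position search described above can be sketched in Python as follows. The callable `project` is hypothetical: it stands for whatever routine returns the inner product of the current residual DFD with the current basis function centered at (x, y). Restricting the neighborhood stage to centers inside the block is a choice made for this sketch.

```python
def hierarchical_search(project, x0, x1, y0, y1, n=2, radius=1):
    """Coarse-to-fine search for the center with the largest |projection|.

    (x0..x1, y0..y1) are the inclusive pixel bounds of the best block,
    n is the coarse sampling step, radius the neighborhood half-width.
    """
    # Stage 1: evaluate the projection at every nth pixel of the block.
    coarse = [(x, y) for x in range(x0, x1 + 1, n)
                     for y in range(y0, y1 + 1, n)]
    start = max(coarse, key=lambda c: abs(project(*c)))

    # Stage 2: refine over the neighborhood of the best coarse position,
    # keeping only centers that stay inside the block.
    fine = [(start[0] + dx, start[1] + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)
            if x0 <= start[0] + dx <= x1 and y0 <= start[1] + dy <= y1]
    best = max(fine, key=lambda c: abs(project(*c)))
    return best, project(*best)
```

For a 16x16 block with n=2 the coarse stage evaluates 64 centers and the refinement at most 8 more, compared with 256 for an exhaustive search over the block.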
The value of the current projection coefficient is quantized with a uniform, fixed length mid-tread quantizer (122). The limits of this quantizer are defined by using the value of the projection coefficient associated with the atom chosen at the first iteration (124). The residual energy in the DFD has a monotonically decreasing nature with the iterative expansion described here. FIG. 2, numeral 200, shows this behavior graphically.
Because of this characteristic, the first projection coefficient gives the maximum value needing to be represented in the encoded data. The maximum output of the quantizer is the nearest integer value to the projection coefficient in the first iteration. The minimum output value of the quantizer is a predetermined percentage of the maximum output value. The maximum quantizer value is encoded using a predetermined fixed number of bits. For example, the use of 9 bits to encode the maximum value of the quantizer permits the quantization of projection coefficients having an absolute value less than 512, and requires clipping any which have a greater value than this to a value of 511. A suitable choice for setting the minimum quantizer value is 10% of the maximum quantizer value. Thus, after the first iteration of the hierarchical expansion, the projection coefficient quantizer is completely defined, and can be used on all subsequent iterations for the current video frame being encoded.
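One way to derive a quantizer from the first projection coefficient is sketched below in Python. Only the limits follow the text (maximum clipped to what 9 bits can hold, minimum at 10% of the maximum). The placement of uniformly spaced levels between those limits, the separate handling of the sign, the 5-bit index width, and the name `make_quantizer` are assumptions made for illustration. The first coefficient is assumed to be large enough that the rounded maximum is nonzero.

```python
def make_quantizer(first_coeff, max_bits=9, min_fraction=0.10, level_bits=5):
    """Build a uniform fixed-length quantizer from the first coefficient.

    Returns (q_max, q_min, quantize); quantize(c) gives the level index
    and the reconstructed value. Level layout is an assumption.
    """
    q_max = min(round(abs(first_coeff)), 2 ** max_bits - 1)  # clip to 511
    q_min = min_fraction * q_max
    step = (q_max - q_min) / (2 ** level_bits - 1)

    def quantize(c):
        mag = min(max(abs(c), q_min), q_max)   # clamp magnitude to limits
        index = round((mag - q_min) / step)    # nearest uniform level
        sign = -1 if c < 0 else 1
        return index, sign * (q_min + index * step)

    return q_max, q_min, quantize
```

Because later coefficients are not expected to exceed the first one in magnitude, the quantizer built at the first iteration can be reused unchanged for the rest of the frame.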
After quantization of the current projection coefficient, a new residual DFD is computed (126). This is the current residual DFD minus the quantized projection of that DFD onto the last atom. This can be expressed as
$$R^{n+1} = R^{n} - p\, G^{n}_{\alpha,\beta},$$
where $R^{n}$ represents the current residual DFD, $G^{n}_{\alpha,\beta}$ represents the current best matching basis function from the predetermined dictionary, and
$$p = Q\!\left[\langle R^{n}, G^{n}_{\alpha,\beta} \rangle\right],$$
which is the quantized projection of the current residual DFD onto the current best matching dictionary function, with $Q[\cdot]$ denoting the quantization of step (122).
After this new residual image is computed, the hierarchical expansion process is repeated, steps (102)-(126). In step (104), only those block sums which have been affected by the computation of the current residual DFD need to be recomputed. This iterative expansion continues until a predetermined number of iterations has occurred, or the value of the projection coefficient of the current atom falls below a predetermined threshold (128).
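The iteration and its two stopping conditions can be sketched on a toy one-dimensional signal. This Python example is an illustration only: it searches a small list of unit-norm atoms exhaustively instead of using the block selection and hierarchical search, and it uses rounding to the nearest integer as a stand-in quantizer. The names `dot` and `matching_pursuit` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, atoms, max_iters, threshold, quantize=round):
    """Greedy expansion: subtract the quantized projection each iteration.

    Stops after max_iters atoms, or when the quantized coefficient of the
    current atom falls below `threshold` in magnitude.
    """
    residual = list(signal)
    chosen = []
    for _ in range(max_iters):
        # Pick the atom with the largest absolute projection.
        k = max(range(len(atoms)), key=lambda a: abs(dot(residual, atoms[a])))
        p = quantize(dot(residual, atoms[k]))
        if abs(p) < threshold:
            break
        # New residual = current residual minus quantized projection.
        residual = [r - p * g for r, g in zip(residual, atoms[k])]
        chosen.append((k, p))
    return chosen, residual
```

Each entry of `chosen` corresponds to one atom: the index of the selected function and its quantized coefficient. In the full method the position of the atom would be recorded as well.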
After the last iteration, all of the information associated with the expansion must be encoded. This information can be described in terms of the atom, or selected matching basis function, at each iteration. The components of the atoms which need to be encoded are the position, quantized projection coefficients, and dictionary basis indices.
FIG. 6, numeral 600, shows a flow diagram of the encoding process for the atom information associated with the iterative expansion of the DFD image. The spatial position of each atom must be encoded so that the decoder can reconstruct the same residual. This can be accomplished by representing these positions with a differential code which is arithmetically
encoded (602), or encoded with a fixed length code (604), the latter being more robust when there is a possibility of errors during transmission of the coded data. In the case of arithmetic coding, a differential vector for each atom position, representing the distance from the last atom in the x and y directions, and from the origin for the first atom, is encoded. This approach allows for the encoding of these positions near the actual entropy of the set of symbols representing the positions because arithmetic coding can spend a non-integer number of bits on each symbol. One arithmetic code is used for the x component differential codewords, and one is used for the y component differential codewords. An adaptive model of the probability distribution of the symbols is used. The model can be updated with each encoded frame of video.
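The differential vectors that feed the arithmetic coder can be sketched in Python as follows. The arithmetic coder itself and its adaptive model are not shown, and the function names are hypothetical.

```python
def differential_vectors(positions):
    """Differences between successive atom positions.

    The first vector is measured from the origin (0, 0); every later
    vector is measured from the previous atom.
    """
    prev = (0, 0)
    out = []
    for x, y in positions:
        out.append((x - prev[0], y - prev[1]))
        prev = (x, y)
    return out


def undo_differential(vectors):
    """Decoder side: accumulate the differences back into positions."""
    x = y = 0
    out = []
    for dx, dy in vectors:
        x, y = x + dx, y + dy
        out.append((x, y))
    return out
```

The x components and y components of these vectors would then be passed to their respective arithmetic coders.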
In the case of fixed length codes for representing the position (604), the plurality of blocks used in step (102) is used to assist in encoding this data as well. For each block, one bit is encoded to specify the presence or absence of any atom positions in that block. For those blocks which have atoms, the positions of those atoms are encoded using a vector for each atom position, representing the distance from the x and y coordinates of the origin of the block. The components of each vector are encoded using a fixed number of bits determined by the block size. For example, for 32x32 blocks, 5 bit codes are required to fully represent the x and y components. A final bit is encoded after each encoded position to signify the presence or absence of more atoms within the block.
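The fixed-length scheme for one block can be sketched in Python as a bit string. The ordering of the x and y fields and the polarity of the flag bits (1 for "present" and "more atoms follow") are assumptions of this sketch, as is the function name `encode_block`.

```python
def encode_block(atom_positions, origin, block=32):
    """Bit string for one block of the fixed-length position code.

    Layout (assumed): one presence bit, then for each atom the x and y
    offsets from the block origin, followed by a 'more atoms' flag.
    """
    width = (block - 1).bit_length()  # 5 bits for a 32x32 block
    if not atom_positions:
        return "0"  # no atoms in this block
    bits = "1"
    for k, (x, y) in enumerate(atom_positions):
        dx, dy = x - origin[0], y - origin[1]
        bits += format(dx, f"0{width}b") + format(dy, f"0{width}b")
        bits += "1" if k < len(atom_positions) - 1 else "0"
    return bits
```

An empty block therefore costs one bit, and each atom in a 32x32 block costs 11 bits after the presence bit.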
The basis indices are encoded with two fixed-length codes (606). For example, if the dictionary is made up of 16 one-dimensional elements, 4 bits uniquely specify one address in the table. Each 2-D function is uniquely represented by the two 4-bit addresses associated with the two one-dimensional functions.
The projection coefficient associated with each encoded position is encoded with either an arithmetic (608) or a fixed length (610) coder as well.
The color channels of the DFD are generally of a much more homogeneous nature locally than the luminance channel. Thus, the Gabor function expansion in accordance with the present invention is generally not as efficient as the block DCT for the chrominance channels. The Gabor approach is better suited for thinner, "edge-like" characteristics that are found in the luminance channel, as opposed to the lowpass characteristics of the chrominance channels. For the chrominance channels of the DFD, in step (612), each 8x8 block is transformed into the DCT domain, and the coefficients of this transform are quantized and variable length encoded, using the appropriate syntax provided by the ITU-T Draft International Standard, H.263. The result is a complete encoded DFD which permits very low bit rates for encoding a motion compensated video sequence.
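The transform stage of the chrominance path can be sketched in Python with an orthonormal two-dimensional DCT-II applied to one 8x8 block. The quantization and variable length coding defined by H.263 are not reproduced here, and the function names are hypothetical.

```python
import math


def dct_1d(v):
    """Orthonormal 1-D DCT-II of a sequence."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back
```

A locally flat chrominance block concentrates its energy in the single DC coefficient, which illustrates why the block DCT suits the lowpass chrominance channels.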
For the biasing of the DFD selected in (106), the absence of a predetermined segmentation indicates the choice of a centrally-oriented bias (112). This involves weighting the pixels of the DFD according to the expression
where DFD(i,j) refers to the three channel image of the displaced frame difference, and the symbol ⌊ ⌋ indicates integer truncation. The constants appearing in the weighting function are given here for QCIF resolution video images, which have a support of 176x144 in the Y channel, and 88x72 in the Cr and Cb channels. These constants can be changed to accommodate any image format by appropriate scaling. In the QCIF resolution, a 16x16 macroblock contains 16x16 pixels from the Y channel, and 8x8 pixels from
each of the chrominance channels. As a result, the function above describes a window which maps one-to-one onto the pixels of the Y channel. The chrominance pixels in each macroblock are weighted by the same value as the corresponding pixels from the Y channel in that macroblock. The emphasis in this weighting is placed on macroblocks towards the center of the image, which is perceptually important.
In the presence of a predetermined segmentation, a region oriented bias is selected (110). The choice of which region receives the bias is accomplished through a ranking operation. For the pixels of the DFD within each region provided by the predetermined segmentation, the blocks which touch that region are marked. For example, FIG. 3, numeral 300, shows an example of two predetermined regions in an image, and the blocks touching those regions are highlighted accordingly. For each region, and for those accompanying marked blocks, the average absolute value of the centrally weighted DFD, expressed above, and the average absolute value of the centrally weighted motion vectors for the marked blocks are computed. There is a motion vector associated with each pixel in any motion compensated coding scheme, although, in a block based motion compensation approach, all vectors within one block are identical. This information is available from a predetermined motion estimation and compensation process. These values are normalized by the maximum value for each category: motion vectors in the x direction, dx, motion vectors in the y direction, dy, and DFD pixel intensity values over all the regions provided by the predetermined segmentation. The result is a score for each region, indexed by l, which allows the regions to be ranked by perceptual importance:
$$\mathrm{Score}(l) = \frac{\mathrm{Av}_l(DFD)}{\mathrm{Max}(DFD)} + \frac{\mathrm{Av}_l(dx)}{\mathrm{Max}(dx)} + \frac{\mathrm{Av}_l(dy)}{\mathrm{Max}(dy)},$$
where DFD(i,j) is defined above, and, in the same fashion,
and
Again, the weighting constants are given in terms of QCIF resolution images, but can be generalized to any resolution.
The region having the greatest score receives the highest rank, and serves as the perceptually important region for biasing the DFD. The pixel values in all blocks of the DFD which do not touch the highest ranking region are set to zero, thus limiting the expansion algorithm to encoding only atoms which lie within the chosen region of interest.
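The ranking step can be sketched in Python as shown below. The sketch assumes the per-region averages have already been computed over the marked blocks; taking each maximum as the largest regional average in its category, the fallback of 1.0 when a maximum is zero, and the names `region_scores` and `top_region` are assumptions introduced for illustration.

```python
def region_scores(stats):
    """Score each region from its normalized averages.

    `stats` maps a region id to (avg |DFD|, avg |dx|, avg |dy|),
    each taken over the blocks marked for that region.
    """
    # Normalizers: largest value in each category (1.0 avoids 0-division).
    maxima = [max(s[c] for s in stats.values()) or 1.0 for c in range(3)]
    return {
        region: sum(s[c] / maxima[c] for c in range(3))
        for region, s in stats.items()
    }


def top_region(stats):
    """Region with the greatest score, used for biasing the DFD."""
    scores = region_scores(stats)
    return max(scores, key=scores.get)
```

Blocks that do not touch the region returned by `top_region` would then be zeroed before the expansion runs.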
FIG. 4, numeral 400, is a block diagram of one preferred embodiment of a device for performing an iterative expansion of a DFD image over a special dictionary of modulated Gaussian functions in accordance with the present invention. The device comprises an estimation unit (402), a memory unit (404), a selection unit (406), a quantizer (408), a residual computation unit (410), a comparator (412), a controller (414), and an encoding unit
(416). The estimation unit (402) is used for determining which block in the current residual DFD has the highest energy. The memory unit (404), which is coupled to the selection unit (406), is used for storing the predetermined dictionary of Gabor functions and the representative quadrant functions. The selection unit (406), coupled to the estimation unit (402), applies the hierarchical expansion algorithm on the block of interest and selects the best matching atom. The quantizer (408), coupled to the selection unit, is used for quantizing the projection coefficient of the current atom. The residual computation unit (410), coupled to the quantizer, is then used for subtracting out the quantized projection of the current atom, and for computing a new current residual DFD. The comparator (412), which is coupled to the residual computation unit, is used for determining if the iteration termination conditions have been met. The controller (414), which is coupled to the comparator (412), the estimation unit (402), the selection unit (406), the quantizer (408), and the residual computation unit (410), is used for running the iteration and managing the data from the expansion for encoding. The device also contains an encoding unit (416) which is coupled to the controller (414), and encodes the position, basis indices and projection coefficient information of the entire expansion of the DFD.
FIG. 5, numeral 500, is a block diagram of one preferred embodiment of a microprocessor for performing an iterative expansion of a DFD image over a special dictionary of modulated Gaussian functions in accordance with the present invention. The microprocessor comprises an estimation unit (502), a memory unit (504), a selection unit (506), a quantizer (508), a residual computation unit (510), a comparator (512), a controller (514), and an encoding unit (516). The estimation unit (502) is used for determining which block in the current residual DFD has the highest energy. The memory unit (504), which is coupled to the selection unit (506), is used for storing the predetermined dictionary of Gabor functions and the representative quadrant functions. The selection unit (506), coupled to the estimation unit (502) and to the memory unit (504), applies the hierarchical expansion algorithm on the block of interest and selects the best matching atom. The quantizer (508), coupled to the selection unit, is used for quantizing the projection coefficient of the current atom. The residual computation unit (510), coupled to the quantizer, is then used for subtracting out the quantized projection of the current atom, and for computing a new current residual DFD. The comparator (512), which is coupled to the residual computation unit, is used for determining if the iteration termination conditions have been met. The controller (514), which is coupled to the comparator (512), the estimation unit (502), the selection unit (506), the quantizer (508), and the residual computation unit (510), is used for running the iteration and managing the data from the expansion for encoding. The microprocessor also contains an encoding unit (516) which is coupled to the controller (514), and encodes the position, basis indices and projection coefficient information of the entire expansion of the DFD.
FIG. 7, numeral 700, is a flow chart showing a preferred embodiment of steps of a method in accordance with the present invention. The method for encoding includes the steps of: A) utilizing (702) a predetermined center/central biased weighting scheme for weighting, for each iteration, block sums to provide a selected block; B) determining (704), for each iteration, a best atom having a center which lies within the selected block using a predetermined hierarchical Gabor function search technique wherein predetermined Gabor functions are utilized from a memory; and C) utilizing (706) an energy adaptive dynamic quantization of Gabor basis coefficients from the best atom of each iteration to provide a minimized bit representation of coefficients for a displaced frame difference. The method for decoding includes the steps of: A) utilizing (708) an inverse quantization of Gabor basis coefficients defined by the parameters in the energy adaptive dynamic quantization used in the encoder (706); B) projecting (710), for each decoded atom, the quantized Gabor basis coefficient of that atom onto the decoded Gabor basis function of that atom; and C) reconstructing (712) the quantized Gabor expansion of the displaced frame difference by summing all of the projections computed in step (710). The method is described with greater particularity above.
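The decoder steps (710) and (712) amount to scaling each decoded basis function by its coefficient and summing the results into an image. A minimal Python sketch follows. Storing each atom as (position, basis index, coefficient), treating the position as the top-left corner of the basis support, and discarding samples that fall outside the image are assumptions of this sketch, and `reconstruct` is a hypothetical name.

```python
def reconstruct(shape, atoms, basis):
    """Sum coefficient-scaled basis functions into a reconstructed DFD.

    shape: (rows, cols) of the output image.
    atoms: iterable of ((top, left), basis_index, coefficient).
    basis: mapping from basis index to a 2-D list of samples.
    """
    rows, cols = shape
    image = [[0.0] * cols for _ in range(rows)]
    for (top, left), index, coeff in atoms:
        for i, g_row in enumerate(basis[index]):
            for j, value in enumerate(g_row):
                y, x = top + i, left + j
                if 0 <= y < rows and 0 <= x < cols:  # skip out-of-frame samples
                    image[y][x] += coeff * value
    return image
```

Overlapping atoms simply add, so the order in which the decoded atoms are processed does not change the reconstructed DFD.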
FIG. 8, numeral 800, is a block diagram of a preferred embodiment of a device in accordance with the present invention. The device for encoding includes: A) a center/central biased estimator (802), coupled to
receive a displaced frame difference, for weighting, for each iteration, block sums utilizing a predetermined center/central biased weighting scheme to provide a selected block; B) a best atom selector (804), coupled to the center/central biased estimator and a memory unit (806) having at least stored predetermined Gabor functions, for determining, for each iteration, a best atom having a center which lies within the selected block using a predetermined hierarchical Gabor function search technique; and C) an energy adaptive dynamic quantization unit (808), coupled to the best atom selector (804), for utilizing an energy adaptive dynamic quantization of Gabor basis coefficients from the best atom of each iteration to provide a minimized bit representation of coefficients for a displaced frame difference. The device for decoding includes A) an inverse quantization unit (810) for decoding Gabor basis coefficients defined by the parameters in the energy adaptive dynamic quantization unit used in the encoder (808); B) a computation unit (812), coupled to a memory unit (814) having at least stored predetermined Gabor functions, for projecting, for each decoded atom, the quantized Gabor basis coefficient of that atom onto the decoded Gabor basis function of that atom; and C) a summation unit (816) coupled to the computation unit (812) for reconstructing the quantized Gabor expansion of the displaced frame difference by summing all of the projections computed by the computation unit (812). The operation of the device is described with greater particularity above.
The method and device may be selected to be embodied in at least one of: A) an application specific integrated circuit; B) a field programmable gate array; C) a microprocessor; and D) a computer-readable memory; arranged and configured to determine the first modified received signal having minimized distortion and interference in accordance with the scheme described in greater detail above.
Although exemplary embodiments are described above, it will be obvious to those skilled in the art that many alterations and modifications
may be made without departing from the invention. Accordingly, it is intended that all such alterations and modifications be included within the spirit and scope of the invention as defined in the appended claims.
We claim: