WO1999018735A1 - Picture masking and compositing in the frequency domain - Google Patents

Picture masking and compositing in the frequency domain

Info

Publication number
WO1999018735A1
Authority
WO
WIPO (PCT)
Prior art keywords
masking
dct
image signal
signal
frequency domain
Prior art date
Application number
PCT/US1998/020783
Other languages
French (fr)
Inventor
Ragnar H. Jonsson
Original Assignee
Thomson Consumer Electronics, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Consumer Electronics, Inc.
Priority to AU96801/98A
Publication of WO1999018735A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00: Details of television systems
    • H04N5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265: Mixing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/48: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using compressed domain processing techniques other than decoding, e.g. modification of transform coefficients, variable length coding [VLC] data or run-length data
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals

Definitions

  • the present invention relates to video processing systems, and, in particular, to apparatuses and methods for performing picture masking and compositing in the DCT domain.
  • Computer systems are frequently used to perform various types of video or image processing, such as picture masking and compositing.
  • in masking, a specified fraction of certain pixels of a first image are retained in a new image, according to a provided mask.
  • in compositing, pixels of two input images are combined or "blended" at a certain ratio, to form a new image.
  • Such masking and compositing are important operations, for example in commercial video or image processing.
  • effects such as chroma keying, wipe, and overlaying are based on compositing pictures from two video sources.
  • Masking and compositing are also frequently used in production of still images, for example, when generating magazine advertisements and posters.
  • Computer systems are also used for various data encoding purposes, such as video compression.
  • Because many video compression standards, including JPEG, MPEG-1, MPEG-2, H.261, and H.263, are based on the discrete cosine transform (DCT), it may be desirable to process compressed pictures in the DCT domain.
  • DCT discrete cosine transform
  • image processing techniques like masking and compositing are typically designed to operate in the spatial domain, not the frequency, or DCT, domain.
  • the input compressed video signals must be transformed into the spatial domain before being processed, and the processed signal must be transformed back into the DCT domain once more.
  • Such transformation to the spatial domain and back into the frequency domain can be very computationally expensive and, therefore, undesirable.
  • conventional "brute force" convolutions performed directly in the frequency domain are also extremely computationally expensive.
  • At least one image signal and a mask signal are received, wherein the image signal and mask signal are in the DCT domain.
  • Masking of the image signal is performed in the DCT domain, in accordance with the mask signal, by representing the masking in terms of the DCT basis functions, to provide an output image signal.
  • Fig. 1 shows a prior art spatial domain image processing system
  • Fig. 2 is a block diagram of a DCT domain image processing system, in accordance with a preferred embodiment of the present invention.
  • Fig. 3 depicts an exemplary processed image processed by the DCT domain image processing system of Fig. 2.
  • the technique of the present invention is based on representing the masking function in terms of the DCT basis functions and computing the masking as a weighted sum of the results of masking by the DCT basis functions.
  • spatial domain image processing system 100 includes three inverse DCT (IDCT) functional blocks 120, 121, 122, and a DCT functional block 130, as well as spatial domain processing functional block 110.
  • IDCT inverse DCT
  • each of these functional blocks may be implemented in hardware or software.
  • the IDCT and DCT operations of blocks 120, 121, 122, and 130, respectively, as well as the spatial domain processing of block 110 may be performed by a suitably programmed general-purpose or special-purpose microprocessor.
  • System 100 receives as input signals the mask signal and image signals x_0 and x_1, each of which is in the DCT domain.
  • image signals x_0 and x_1 may have been previously compressed with a process that utilizes the DCT.
  • System 100 outputs output image signal y, which represents the compositing of image signals x_0 and x_1 in accordance with the mask signal.
  • Output image signal y is also in the DCT domain. Since block 110 performs image processing in the spatial domain (e.g., with RGB or YUV spatial representations of image pixels), IDCT blocks 120, 121, and 122 are necessary in prior art systems to transform the input signals into the spatial domain. Once the (spatial domain) input signals are processed, the processed output signal must be transformed back into the DCT domain, to provide signal y.
  • spatial domain processing unit 110 it is trivial for spatial domain processing unit 110 to implement spatial masking in the spatial domain by using spatial windowing.
  • masking also referred to as windowing
  • w[m,n] is simply
  • windowing in the spatial domain is equivalent to convolution in the frequency domain.
  • the masking in (1) can, therefore, be implemented by DCT processing of DCT signals as
  • Y[k,l] = W[k,l] * X[k,l] (2)
  • X[k,l], Y[k,l], and W[k,l] are the frequency representations of x[m,n], y[m,n], and w[m,n], respectively, * is the convolution operator
  • m, n are the spatial domain indices
  • k, l are the DCT or frequency domain indices.
  • the approach in (2) is a "brute force" DCT domain processing implementation based on symmetric convolution.
  • a symmetric convolution is achieved by making a symmetric extension of two finite length signals and then convolving the extended signals together using circular convolution. If the frequency domain is the discrete Fourier transform (DFT) domain, the convolution in (2) is circular convolution. Further background on such techniques may be found in D.E. Dudgeon & R.M. Mersereau, Multidimensional Digital Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1984.
  • the convolution in (2) is a symmetric convolution. Further background on symmetric convolutions may be found in S.A. Martucci, Symmetric Convolution and the Discrete Sine and Cosine Transforms: Principles and Applications, PhD thesis, Georgia Institute of Technology, 1993. Spatial masking in the DCT domain can, therefore, be implemented by using symmetric convolution according to (2).
  • Masking can be used to implement compositing of two input pictures x_0[m,n] and x_1[m,n] according to
  • the convolution in (2) can be implemented as two separate one-dimensional (1-D) convolutions.
  • the convolution may provide a reasonable approach to masking, since it requires, for example, only 16 multiplications per sample for an 8x8 DCT.
  • the convolution approach to masking is not as feasible since, for example, masking for an 8x8 block DCT requires 64 multiplications and considerable data shuffling.
  • compressed pictures are processed in the DCT domain with a technique based on representing the masking function in terms of the DCT basis functions and computing the masking as a weighted sum of the results of masking by the DCT basis functions, as described in further detail below.
  • DCT domain processing makes it possible to reduce both the computational complexity and the latency of the processing, by eliminating the need for transforming signals from the DCT domain into the spatial domain and back.
  • the desired processing i.e. masking and compositing
  • system 200 comprises DCT domain processor 210, but does not comprise nor require the three IDCT transforms and one DCT transform used in spatial domain processing. Instead, DCT domain processor 210 operates in the DCT domain, and is thus able to provide processing efficiencies relative to spatial domain processing.
  • system 200 operates with respect to two-dimensional (2-D) type-II DCT of 8x8 blocks, such as is used by the image and video compression standards JPEG, MPEG-1, MPEG-2, H.261 and H.263.
  • the present invention may be utilized with other types of DCTs and other block sizes.
  • the 8x8 type-II DCT is given by
  • η is a frequency-dependent DCT normalization coefficient which depends on the values of DCT domain indices k, l. It should be noted that even though the 2-D DCT can be used to represent non-separable signals, the transform itself is separable, and the basis functions of the 2-D DCT are separable. DCT basis functions are discussed in further detail below, with reference to (15).
  • the 2-D DCT of each block can be implemented using matrix multiplications
  • V_k = diag(v_k[n]) (7)
  • non-separable masking cannot be expressed in a simple matrix multiplication form similar to (11).
  • a non-separable mask can be transformed by the 2-D DCT, which does have separable basis functions.
  • the IDCT of the DCT domain representation of the non-separable mask, W[k,l], is given by
  • the DCT domain windowing matrices \hat{V}_j can be evaluated according to (12).
  • a non-separable mask can be implemented as a weighted sum of separable functions
  • non-separable masks can be implemented as a weighted sum of separable masking operations (16).
  • the DCT basis functions form an orthogonal basis that can represent all discrete functions of length N.
  • the factor η normalizes the basis functions so that η[k] times the basis function in (15) (i.e., η[k] v_k[m]) forms an orthonormal (normalized orthogonal) basis for all functions of length N. Since the basis functions for the 2-D DCT are formed as the product of two 1-D basis functions v_k[m], v_l[n], the 2-D DCT basis functions are separable.
  • each matrix multiplication in (16) can be implemented using only one addition per sample and two multiplications (by √2) per 64 samples (for 8 x 8 DCT). If the DCT coefficients are obtained from decoding JPEG or MPEG streams, the multiplications by √2 can be incorporated into the quantization matrices, reducing the computational complexity to only one addition per sample for each matrix multiplication in (16). In addition, there is one multiplication and one addition per pixel for each term in the weighted sum in (16). Therefore, the computational complexity of implementing masking according to (16) is approximately one multiplication and three additions per pixel for each term that is evaluated. Additionally, when the weighting coefficient, W[k,l], is zero, the whole term can be dropped and no computation is needed for that term.
  • the DCT approach is used in compression systems, such as JPEG and MPEG, because for most signals the energy is concentrated into relatively few DCT coefficients.
  • this property is utilized to save computations by skipping all processing for weighting coefficients, W[k,l], equal to zero.
  • the savings can be made more substantial by dropping weighting coefficients close to zero.
  • the weight is zero or close to zero, terms can be dropped from the sum, which reduces the computational complexity.
  • the representation of the masking in terms of the weighted sum allows computational complexity to be reduced by skipping all processing for weighting coefficients W[k,l] equal to zero (or, in one embodiment, for all weighting coefficients W[k,l] less than a predetermined threshold).
  • a predetermined threshold for choosing which coefficients are dropped, the quality of the masking operation can be traded for computational complexity in a similar manner as quality is traded for bit rate in encoding.
  • the ability to trade off quality of the masking against computational complexity gives great flexibility in trading cost for quality. Accordingly, the frequency domain implementation of picture masking and compositing of the present invention can be very efficient.
  • the masking function is implemented in terms of the DCT basis functions.
  • any necessary scaling is first performed, and may be incorporated into the quantization matrix in an inverse quantization.
  • a weighted sum of the blocks masked in this fashion is then implemented.
  • the masked block at this point is re-normalized, in accordance with the scaling done previously. (As will be appreciated, the initial scaling and re-normalization scaling may be incorporated into the quantization matrix if the input signal is dequantized and the output signal is quantized.)
  • Compositing of two images can be implemented by use of masking, according to (3).
  • the following steps may be taken by a suitably programmed processor to implement the present invention, in one embodiment.
  • First examine every DCT coefficient of the mask, W[k,l], and if the coefficient is
  • the masking by the DCT basis functions can be implemented in terms of matrix multiplications as shown in (11) (and (16)). However, a more efficient implementation can be achieved by taking into account the regular structure of the windowing matrices as the example in
  • the frequency domain processing of the present invention requires less computation than both spatial domain processing and brute force DCT domain processing based on symmetric convolution.
  • the computational complexity involved in using the frequency domain processing of the present invention is approximately one to four multiplications per sample for most typical masking operations.
  • the complexity of spatial masking in the DCT domain can be limited to only three multiplications per sample without any noticeable degradation of the masking quality.
  • a single 2-D DCT takes about three multiplications per sample, and when implementing masking of JPEG or MPEG compressed pictures in the spatial domain, IDCTs must first be used to transform the DCT data into the spatial domain, and the DCT operation must then be used to transform the processed picture back into the DCT or frequency domain.
  • the present invention in one embodiment, requires about three times fewer multiplications per pixel than spatial domain processing, and about twenty times fewer multiplications than processing based on brute force convolution.
  • Image 300 contains a head-and-shoulder portion 312, which is overlaid over a flower garden background 310, and a transparent logo "SARNOFF" 315, which was inserted in the top right hand corner of image 300.
  • the picture compositing performed by system 200 to arrive at image 300 was performed, in one actual experiment, using only 1.8 multiplications per pixel.
  • the present invention is also potentially applicable to other frequency domains in which the masking function may be represented in terms of the frequency domain's basis functions and in which the masking can then be computed as a weighted sum of the results of masking by these basis functions.
  • the present invention may be applicable to other frequency domains such as the DFT and discrete sine transform (DST).
  • the present invention can be embodied in the form of computer-implemented processes and apparatuses for practicing those processes.
  • the present invention can also be embodied in the form of computer program code embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other computer-readable storage medium, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention.
  • the present invention can also be embodied in the form of computer program code, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention.
  • computer program code segments configure the microprocessor to create specific logic circuits.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)

Abstract

Processing image signals. At least one image signal and a mask signal are received, wherein the image signal and mask signal are in the discrete cosine transform (DCT) domain. Masking of the image signal is performed in the DCT domain, in accordance with the mask signal, by representing the masking in terms of the DCT basis functions, to provide an output image signal.

Description

PICTURE MASKING AND COMPOSITING IN THE FREQUENCY DOMAIN
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates to video processing systems, and, in particular, to apparatuses and methods for performing picture masking and compositing in the DCT domain.
Description of the Related Art
Computer systems are frequently used to perform various types of video or image processing, such as picture masking and compositing. In masking, a specified fraction of certain pixels of a first image are retained in a new image, according to a provided mask. In compositing, pixels of two input images are combined or "blended" at a certain ratio, to form a new image.
Such masking and compositing are important operations, for example in commercial video or image processing. For example, commonly used effects such as chroma keying, wipe, and overlaying are based on compositing pictures from two video sources. Masking and compositing are also frequently used in production of still images, for example, when generating magazine advertisements and posters.
Computer systems are also used for various data encoding purposes, such as video compression. Because many video compression standards (including JPEG, MPEG-1, MPEG-2, H.261, and H.263) are based on the discrete cosine transform (DCT), it may be desirable to process compressed pictures in the DCT domain. However, image processing techniques like masking and compositing are typically designed to operate in the spatial domain, not the frequency, or DCT, domain. Thus, if the image processing of compressed video signals is done in the spatial domain, the input compressed video signals must be transformed into the spatial domain before being processed, and the processed signal must be transformed back into the DCT domain once more. Such transformation to the spatial domain and back into the frequency domain can be very computationally expensive and, therefore, undesirable. Moreover, conventional "brute force" convolutions performed directly in the frequency domain are also extremely computationally expensive.
SUMMARY
In the present invention, at least one image signal and a mask signal are received, wherein the image signal and mask signal are in the DCT domain. Masking of the image signal is performed in the DCT domain, in accordance with the mask signal, by representing the masking in terms of the DCT basis functions, to provide an output image signal.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 shows a prior art spatial domain image processing system;
Fig. 2 is a block diagram of a DCT domain image processing system, in accordance with a preferred embodiment of the present invention; and
Fig. 3 depicts an exemplary processed image processed by the DCT domain image processing system of Fig. 2.
DESCRIPTION OF THE PREFERRED EMBODIMENT
As explained above, performing image processing techniques like masking and compositing in the spatial domain offers drawbacks when processing compressed video signals, which are in a frequency domain such as the DCT domain. Accordingly, in the present invention, there is provided an efficient method and associated apparatus for implementing picture masking and compositing in the DCT domain. As described in further detail below, the technique of the present invention is based on representing the masking function in terms of the DCT basis functions and computing the masking as a weighted sum of the results of masking by the DCT basis functions.
In the DCT domain, masking by the DCT basis functions has a relatively simple and efficient implementation. Because of the energy compaction property of the DCT, the weight of many of the functions is very small and can be dropped from the weighted sum without introducing noticeable artifacts. This leads to very efficient implementations for masking and compositing images in the DCT domain, typically requiring less than three multiplications per pixel. These and other features and advantages of the present invention are described in further detail below.
Spatial Domain Processing of DCT Images
Referring now to Fig. 1, there is shown a prior art spatial domain image processing system 100. As illustrated, spatial domain image processing system 100 includes three inverse DCT (IDCT) functional blocks 120, 121, 122, and a DCT functional block 130, as well as spatial domain processing functional block 110. As will be appreciated by those skilled in the art, each of these functional blocks may be implemented in hardware or software. For example, the IDCT and DCT operations of blocks 120, 121, 122, and 130, respectively, as well as the spatial domain processing of block 110, may be performed by a suitably programmed general-purpose or special-purpose microprocessor.
System 100 receives as input signals the mask signal and image signals x_0 and x_1, each of which is in the DCT domain. For example, image signals x_0 and x_1 may have been previously compressed with a process that utilizes the DCT. System 100 outputs output image signal y, which represents the compositing of image signals x_0 and x_1 in accordance with the mask signal. Output image signal y is also in the DCT domain. Since block 110 performs image processing in the spatial domain (e.g., with RGB or YUV spatial representations of image pixels), IDCT blocks 120, 121, and 122 are necessary in prior art systems to transform the input signals into the spatial domain. Once the (spatial domain) input signals are processed, the processed output signal must be transformed back into the DCT domain, to provide signal y.
As will be appreciated, it is trivial for spatial domain processing unit 110 to implement spatial masking in the spatial domain by using spatial windowing. For an input picture x[m,n], masking (also referred to as windowing) with the mask, or window, w[m,n], is simply
y[m,n] = w[m,n] \cdot x[m,n]    (1)
As will be appreciated, windowing in the spatial domain is equivalent to convolution in the frequency domain. The masking in (1) can, therefore, be implemented by DCT processing of DCT signals as
Y[k,l] = W[k,l] * X[k,l]    (2)
where X[k,l], Y[k,l], and W[k,l] are the frequency representations of x[m,n], y[m,n], and w[m,n], respectively, * is the convolution operator, m, n are the spatial domain indices, and k, l are the DCT or frequency domain indices. The approach in (2) is a "brute force" DCT domain processing implementation based on symmetric convolution. As will be appreciated, a symmetric convolution is achieved by making a symmetric extension of two finite length signals and then convolving the extended signals together using circular convolution. If the frequency domain is the discrete Fourier transform (DFT) domain, the convolution in (2) is circular convolution. Further background on such techniques may be found in D.E. Dudgeon & R.M. Mersereau, Multidimensional Digital Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1984. If the frequency domain is the DCT domain (or other discrete trigonometric transforms), the convolution in (2) is a symmetric convolution. Further background on symmetric convolutions may be found in S.A. Martucci, Symmetric Convolution and the Discrete Sine and Cosine Transforms: Principles and Applications, PhD thesis, Georgia Institute of Technology, 1993. Spatial masking in the DCT domain can, therefore, be implemented by using symmetric convolution according to (2).
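For the DFT case mentioned above, the equivalence between spatial windowing and frequency-domain convolution can be checked numerically. The following is a minimal 1-D sketch and not part of the original disclosure: the helper names dft and circular_convolve and the sample values are illustrative assumptions, and the 1/N scale factor follows from the unnormalized DFT convention chosen here. It does not implement the symmetric convolution that the DCT case requires.

```python
import cmath

def dft(x):
    """Unnormalized discrete Fourier transform of a 1-D sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

# Hypothetical 8-sample signal and window (mask).
x = [12.0, 7.0, 3.0, 9.0, 15.0, 4.0, 8.0, 1.0]
w = [1.0, 1.0, 0.75, 0.5, 0.25, 0.0, 0.0, 0.0]

# Left side: spectrum of the windowed signal w[n] * x[n].
windowed_spectrum = dft([wi * xi for wi, xi in zip(w, x)])

# Right side: circular convolution of the two spectra, scaled by 1/N.
convolved_spectrum = [v / len(x) for v in circular_convolve(dft(w), dft(x))]

# The two sides should agree up to floating-point rounding.
max_error = max(abs(a - b) for a, b in zip(windowed_spectrum, convolved_spectrum))
print(max_error)
```

Each output coefficient of the convolution touches every coefficient of the other spectrum, which is the reason the direct frequency-domain approach is described as costly.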
Masking can be used to implement compositing of two input pictures x_0[m,n] and x_1[m,n] according to
y[m,n] = x_1[m,n] + w[m,n] \cdot (x_0[m,n] - x_1[m,n])    (3)
In this case a mask value of one means that samples from x_0[m,n] are used, while a mask value of zero means that samples from x_1[m,n] are used. Mask values in the range from zero to one imply linear interpolation between the two signals x_0 and x_1. (Mask values outside the unit interval imply linear extrapolation of the two input pictures.)
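As a spatial-domain illustration of (1) and (3), the following minimal sketch applies a mask and composites two small pictures. The function names mask_image and composite and the 2x2 sample values are hypothetical and chosen only for illustration.

```python
def mask_image(w, x):
    """Equation (1): y[m][n] = w[m][n] * x[m][n]."""
    return [[wv * xv for wv, xv in zip(w_row, x_row)]
            for w_row, x_row in zip(w, x)]

def composite(w, x0, x1):
    """Equation (3): y = x1 + w * (x0 - x1), applied per pixel."""
    return [[b + wv * (a - b) for wv, a, b in zip(w_row, r0, r1)]
            for w_row, r0, r1 in zip(w, x0, x1)]

# Hypothetical 2x2 pictures: x0 is uniformly 200, x1 is uniformly 100.
x0 = [[200.0, 200.0], [200.0, 200.0]]
x1 = [[100.0, 100.0], [100.0, 100.0]]

# Mask of 1 selects x0, mask of 0 selects x1, values in between interpolate.
w = [[1.0, 0.0], [0.5, 0.25]]

masked = mask_image(w, x0)
blended = composite(w, x0, x1)
print(masked)
print(blended)
```

Note that compositing reduces to a single masking operation applied to the difference picture x0 - x1, followed by one addition per pixel.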
If the mask, w[m,n], is a separable signal, the convolution in (2) can be implemented as two separate one-dimensional (1-D) convolutions. As will be appreciated, a 2-D signal x[m,n] is separable if there exist two 1-D signals r[m] and s[n] such that x[m,n] = r[m]s[n] (i.e., it can be implemented as a cascade of horizontal and vertical DCTs). In the separable case, the convolution may provide a reasonable approach to masking, since it requires, for example, only 16 multiplications per sample for an 8x8 DCT. For non-separable masks, however, the convolution approach to masking is not as feasible since, for example, masking for an 8x8 block DCT requires 64 multiplications and considerable data shuffling.
Accordingly, both spatial domain processing of DCT images and the brute force DCT approach often require an undesirably high amount of computation. As video compression resulting in DCT images becomes more common, it becomes more desirable to do picture processing on compressed image data without completely decoding or decompressing the image data. Some techniques in this regard are discussed in further detail in B.C. Smith & L.A. Rowe, "Algorithms for Manipulating Compressed Images," IEEE Computer Graphics & Applications, pp. 34-42, September 1993; S-F Chang & D.G. Messerschmitt, "A New Approach to Decoding and Compositing Motion-Compensated DCT-Based Images," ICASSP-93, pp. V421-V424, 1993; and N. Merhav & V. Bhaskaran, "A Transform Domain Approach to Spatial Domain Image Scaling," ICASSP-96, pp. 2405-2409, 1996.
Frequency Domain Processing of DCT Images
In the present invention, compressed pictures are processed in the DCT domain with a technique based on representing the masking function in terms of the DCT basis functions and computing the masking as a weighted sum of the results of masking by the DCT basis functions, as described in further detail below. Such DCT domain processing makes it possible to reduce both the computational complexity and the latency of the processing, by eliminating the need for transforming signals from the DCT domain into the spatial domain and back. As will be appreciated, if the desired processing (i.e., masking and compositing) is done in the DCT domain, the three IDCT transforms 120, 121, 122, and the DCT transform 130, required in spatial domain processing of DCT images can be eliminated.
Referring now to Fig. 2, there is shown a block diagram of a DCT domain image processing system 200, in accordance with a preferred embodiment of the present invention. As shown, system 200 comprises DCT domain processor 210, but neither comprises nor requires the three IDCT transforms and one DCT transform used in spatial domain processing. Instead, DCT domain processor 210 operates in the DCT domain, and is thus able to provide processing efficiencies relative to spatial domain processing.
In one embodiment, system 200 operates with respect to two-dimensional (2-D) type-II DCT of 8x8 blocks, such as is used by the image and video compression standards JPEG, MPEG-1, MPEG-2, H.261 and H.263. As will be appreciated, however, in alternative embodiments the present invention may be utilized with other types of DCTs and other block sizes.
The 8x8 type-II DCT is given by
X[k,l] = \eta[k] \, \eta[l] \sum_{m=0}^{7} \sum_{n=0}^{7} x[m,n] \cos\left(\frac{\pi}{16}(2m+1)k\right) \cos\left(\frac{\pi}{16}(2n+1)l\right)    (4)
where η is a frequency-dependent DCT normalization coefficient which depends on the values of DCT domain indices k, l. It should be noted that even though the 2-D DCT can be used to represent non-separable signals, the transform itself is separable, and the basis functions of the 2-D DCT are separable. DCT basis functions are discussed in further detail below, with reference to (15).
The 2-D DCT of each block can be implemented using matrix multiplications
\hat{X} = C X C^T    (5)
where \hat{X} and X are matrix representations of X[k,l] and x[m,n], respectively, and C is the DCT transformation matrix (a unitary matrix, i.e., C C^T = I). For the 8x8 DCT, the matrices \hat{X}, X and C are all 8x8.
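The matrix form (5) can be sketched in a few lines. This is a minimal illustration under the assumption that the normalization η is the usual orthonormal one, sqrt(1/8) for index 0 and sqrt(2/8) otherwise, which is what makes C unitary; the helper names dct_matrix, matmul and transpose and the sample block are hypothetical.

```python
import math

N = 8

def dct_matrix(n=N):
    """Type-II DCT matrix C with orthonormal scaling (assumed), so C C^T = I."""
    rows = []
    for k in range(n):
        eta = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([eta * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                     for m in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two square lists of lists."""
    size = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(size)) for j in range(size)]
            for i in range(size)]

def transpose(a):
    return [list(col) for col in zip(*a)]

C = dct_matrix()
CT = transpose(C)

# Hypothetical 8x8 spatial block.
x = [[float((3 * m + 5 * n) % 17) for n in range(N)] for m in range(N)]

identity = matmul(C, CT)            # should be the identity matrix
X_hat = matmul(matmul(C, x), CT)    # forward 2-D DCT, as in (5)
x_back = matmul(matmul(CT, X_hat), C)  # inverse 2-D DCT

roundtrip_error = max(abs(x[m][n] - x_back[m][n])
                      for m in range(N) for n in range(N))
print(roundtrip_error)
```

Because C is unitary, the inverse transform is simply the same pair of multiplications with C and its transpose exchanged.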
Vertical spatial masking by any window v_k[m] and horizontal masking by any window v_l[n] can be implemented as the matrix multiplication
Y_{kl} = V_k X V_l    (6)
where
V_k = \mathrm{diag}(v_k[n])    (7)
Based on the fact that C is unitary, it can be derived that
\hat{Y}_{kl} = C Y_{kl} C^T    (8)
= C V_k X V_l C^T    (9)
= C V_k C^T C X C^T C V_l C^T    (10)
= \hat{V}_k \hat{X} \hat{V}_l    (11)
where
\hat{V}_j = C V_j C^T    (12)
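The chain of equalities leading to (11) can be verified numerically for arbitrary windows. The sketch below is illustrative only: the windows r and s, the sample block, and the helper names are assumptions, and the orthonormal scaling of C is assumed so that C is unitary.

```python
import math

N = 8

def dct_matrix(n=N):
    """Type-II DCT matrix with orthonormal scaling (assumed)."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
             for m in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(N)) for j in range(N)]
            for i in range(N)]

def diag(v):
    """Windowing matrix of (7): the window values on the main diagonal."""
    return [[v[i] if i == j else 0.0 for j in range(N)] for i in range(N)]

C = dct_matrix()
CT = [list(col) for col in zip(*C)]

def to_dct(a):
    """C a C^T, used for both pictures (5) and windowing matrices (12)."""
    return matmul(matmul(C, a), CT)

# Hypothetical windows: a vertical ramp and a horizontal half-plane wipe.
r = [m / 7.0 for m in range(N)]
s = [1.0 if n < 4 else 0.0 for n in range(N)]
x = [[float((7 * m + 3 * n) % 101) for n in range(N)] for m in range(N)]

V_r, V_s = diag(r), diag(s)

# Spatial masking (6), then DCT: the left side of the derivation.
masked = matmul(matmul(V_r, x), V_s)
lhs = to_dct(masked)

# Pure DCT-domain product (11): the right side of the derivation.
rhs = matmul(matmul(to_dct(V_r), to_dct(x)), to_dct(V_s))

max_error = max(abs(lhs[i][j] - rhs[i][j]) for i in range(N) for j in range(N))
print(max_error)
```

The step from (9) to (10) only inserts C^T C = I on either side of X, which is why the identity holds for any pair of windows, not just for DCT basis functions.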
It should be noted that non-separable masking cannot be expressed in a simple matrix multiplication form similar to (11). However, a non-separable mask can be transformed by the 2-D DCT, which does have separable basis functions. The IDCT of the DCT domain representation of the non-separable mask, W[k,l], is given by
v[m,n] = ∑ ∑ W[k,l] τ#] cos(^(2 + l)*)η[Z]cos(-^(2n + 1)1) (13) t=o ι=o 16 16
Substituting (13) into (1) gives
$$y[m,n] = \sum_{k=0}^{7} \sum_{l=0}^{7} \left\{ \left(W[k,l]\, \eta[k]\, \eta[l]\right)\, v_k[m]\, x[m,n]\, v_l[n] \right\} \qquad (14)$$
where
$$v_k[m] = \cos\left(\frac{\pi}{16}(2m+1)k\right). \qquad (15)$$
This implies that masking x[m,n] with w[m,n] is equivalent to a weighted sum of the masking of x[m,n] with the basis functions of the IDCT. It should be noted that the windowing in (14) is separable and can, therefore, be written in terms of the matrix multiplications in (6), where both $v_k[m]$ and $v_l[n]$ are given by (15) for $k, l \in \{0, 1, \ldots, 7\}$. Therefore, the DCT transform of the masked signal becomes
$$Y = \sum_{k=0}^{7} \sum_{l=0}^{7} \left(W[k,l]\, \eta[k]\, \eta[l]\right)\, \tilde{V}_k\, X\, \tilde{V}_l \qquad (16)$$
where the numerical values of the DCT domain windowing matrices, $\tilde{V}_j$, can be evaluated according to (12).
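The step from (14) to (16) relies only on the linearity of the DCT and on the separable result (11). As a sketch of that intermediate reasoning, writing $x$ for the spatial block, $X = C\,x\,C^T$ for its DCT, $V_k = \operatorname{diag}(v_k)$ for the spatial window matrix and $\tilde{V}_k = C\,V_k\,C^T$ for its DCT-domain counterpart:

```latex
\begin{align*}
y &= \sum_{k=0}^{7}\sum_{l=0}^{7} W[k,l]\,\eta[k]\,\eta[l]\; V_k\, x\, V_l
   && \text{(14) in matrix form, using (6)} \\
Y = C\, y\, C^T
  &= \sum_{k=0}^{7}\sum_{l=0}^{7} W[k,l]\,\eta[k]\,\eta[l]\; C\, V_k\, x\, V_l\, C^T
   && \text{linearity of the transform} \\
  &= \sum_{k=0}^{7}\sum_{l=0}^{7} W[k,l]\,\eta[k]\,\eta[l]\; \tilde{V}_k\, X\, \tilde{V}_l
   && \text{by (9)--(11)}
\end{align*}
```

Each of the 64 terms is therefore a separable masking operation applied directly to the DCT block, scaled by a single number.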
Thus, a non-separable mask can be implemented as a weighted sum of separable functions (13), and masking can be accomplished with separable functions using matrix notation (11). Therefore, non-separable masks can be implemented as a weighted sum of separable masking operations (16). The separable masking operations in (16), by the DCT basis functions, turn out to have a simple and efficient implementation.
As will be appreciated, the functions defined in (15) are the DCT basis functions (for the 1-D type-II DCT of size N=8). The DCT basis functions form an orthogonal basis that can represent all discrete functions of length N. The factor η normalizes the basis functions so that η[k] times the basis function in (15) (i.e., $\eta[k]\, v_k[m]$) forms an orthonormal (normalized orthogonal) basis for all functions of length N. Since the basis functions for the 2-D DCT are formed as the product of two 1-D basis functions $v_k[m]$, $v_l[n]$, the 2-D DCT basis functions are separable.
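The windowing matrices for the DCT basis functions, $\tilde{V}_j$, are sparse and have a very regular structure. For example, for j = 4 we have
$$\tilde{V}_4 = \frac{1}{2} \begin{bmatrix}
0 & 0 & 0 & 0 & \sqrt{2} & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
\sqrt{2} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & -1 \\
0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 & 0 & -1 & 0 & 0
\end{bmatrix} \qquad (17)$$
and the other windowing matrices have the same kind of structure.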
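The sparsity claimed for the windowing matrices can be examined by evaluating (12) numerically for each basis function. The sketch below is illustrative only: the function names are assumptions, and the expected pattern for j = 4 is the matrix of (17) with the common factor 1/2 removed.

```python
import math

N = 8

def eta(k):
    # Orthonormal DCT scaling, assumed so that C is unitary.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def basis(k, m):
    # v_k[m] = cos(pi/16 * (2m + 1) * k), equation (15).
    return math.cos(math.pi * (2 * m + 1) * k / (2 * N))

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(a):
    return [list(col) for col in zip(*a)]

C = [[eta(k) * basis(k, m) for m in range(N)] for k in range(N)]
CT = transpose(C)

def windowing_matrix(j):
    # Equation (12): C * diag(v_j) * C^T.
    v = [[basis(j, m) if m == r else 0.0 for m in range(N)] for r in range(N)]
    return matmul(matmul(C, v), CT)

R2 = math.sqrt(2.0)
expected_two_v4 = [
    [0, 0, 0, 0, R2, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 1],
    [R2, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, -1],
    [0, 0, 1, 0, 0, 0, -1, 0],
    [0, 0, 0, 1, 0, -1, 0, 0],
]

two_v4 = [[2.0 * e for e in row] for row in windowing_matrix(4)]
structure_error = max(abs(p - q)
                      for ra, rb in zip(two_v4, expected_two_v4)
                      for p, q in zip(ra, rb))

# Number of entries that are not numerically zero, for each j = 0..7.
nonzeros = [sum(1 for row in windowing_matrix(j) for e in row if abs(e) > 1e-9)
            for j in range(N)]
print(structure_error, nonzeros)
```

The count of nonzero entries stays small for every j because multiplying one cosine basis function by another yields at most two cosine basis functions, so each column of a windowing matrix has at most two nonzero entries.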
By incorporating the factor of 1/2 into the windowing function W[k,l], each matrix multiplication in (16) can be implemented using only one addition per sample and two multiplications (by √2) per 64 samples (for 8 x 8 DCT). If the DCT coefficients are obtained from decoding JPEG or MPEG streams, the multiplications by √2 can be incorporated into the quantization matrices, reducing the computational complexity to only one addition per sample for each matrix multiplication in (16). In addition, there is one multiplication and one addition per pixel for each term in the weighted sum in (16). Therefore, the computational complexity of implementing masking according to (16) is approximately one multiplication and three additions per pixel for each term that is evaluated. Additionally, when the weighting coefficient, W[k,l], is zero, the whole term can be dropped and no computation is needed for that term.
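To illustrate how one matrix multiplication in (16) reduces to additions, the sketch below left-multiplies a DCT block by the j = 4 windowing matrix with its factor 1/2 removed (assumed here to be folded into the weight, as the text suggests). The dense matrix is copied from (17); the function names and test data are hypothetical.

```python
import math

N = 8
R2 = math.sqrt(2.0)

# The j = 4 windowing matrix of (17), multiplied by 2.
TWO_V4 = [
    [0, 0, 0, 0, R2, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 1],
    [R2, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, -1],
    [0, 0, 1, 0, 0, 0, -1, 0],
    [0, 0, 0, 1, 0, -1, 0, 0],
]

def left_multiply_dense(x_dct):
    # Reference: full 8x8 matrix product.
    return [[sum(TWO_V4[i][t] * x_dct[t][j] for t in range(N))
             for j in range(N)] for i in range(N)]

def left_multiply_fast(x_dct):
    # Same product written row by row: every output row is either a scaled
    # copy of one input row or the sum or difference of two input rows.
    def add(a, b):
        return [p + q for p, q in zip(a, b)]

    def sub(a, b):
        return [p - q for p, q in zip(a, b)]

    def scale(a):
        return [R2 * p for p in a]

    r = x_dct
    return [
        scale(r[4]),
        add(r[3], r[5]),
        add(r[2], r[6]),
        add(r[1], r[7]),
        scale(r[0]),
        sub(r[1], r[7]),
        sub(r[2], r[6]),
        sub(r[3], r[5]),
    ]

# Hypothetical DCT block used only to compare the two routes.
x_dct = [[float((7 * i + 3 * j) % 13 - 6) for j in range(N)] for i in range(N)]
fast_error = max(abs(p - q)
                 for ra, rb in zip(left_multiply_dense(x_dct),
                                   left_multiply_fast(x_dct))
                 for p, q in zip(ra, rb))
print(fast_error)
```

In this form no output sample needs more than one addition, and the only multiplications are the √2 scalings of two rows, which is the part the text proposes to absorb into the quantization matrices.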
As will be appreciated, the DCT approach is used in compression systems, such as JPEG and MPEG, because for most signals the energy is concentrated into relatively few DCT coefficients. In the present invention, this property is utilized to save computations: the representation of the masking as a weighted sum allows all processing to be skipped for weighting coefficients W[k,l] equal to zero, since the corresponding terms drop out of the sum. Alternatively, the savings can be made more substantial by also dropping weighting coefficients close to zero (in one embodiment, all weighting coefficients W[k,l] less than a predetermined threshold). As will be appreciated, by adjusting the threshold for choosing which coefficients are dropped, the quality of the masking operation can be traded for computational complexity in a similar manner as quality is traded for bit rate in encoding. This ability gives great flexibility in trading cost for quality. Accordingly, the frequency domain implementation of picture masking and compositing of the present invention can be very efficient.
Thus, as will be appreciated by those skilled in the art, in the present invention, the masking function is implemented in terms of the DCT basis functions. As will be appreciated, any necessary scaling is first performed, and may be incorporated into the quantization matrix in an inverse quantization. Next, a weighted sum of the blocks masked in this fashion is computed. In one embodiment, the masked block at this point is re-normalized, in accordance with the scaling done previously. (As will be appreciated, the initial scaling and re-normalization scaling may be incorporated into the quantization matrix if the input signal is dequantized and the output signal is quantized.)
In one embodiment, all processing for weighting coefficients W[k,l] equal to zero is skipped (where, for an input picture x[m,n], w[m,n] is the window used to mask the input picture, and W[k,l] is the frequency representation of w[m,n]). In an alternative embodiment, because of the aforementioned energy compaction property of the DCT, all processing is skipped for weighting coefficients W[k,l] close to zero. In general, all processing is skipped for weighting coefficients W[k,l] less than or equal to a predetermined threshold value (where a threshold value of zero yields the former case). In one embodiment of the present invention, this threshold is selected in accordance with a desired tradeoff between the quality of the masking operation and computational complexity, where a higher threshold provides lower quality but greater savings in computational complexity, and vice-versa.
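The threshold tradeoff described above can be made concrete by measuring how well a mask is reproduced when small coefficients are dropped. The following is a rough sketch under stated assumptions: the soft diagonal mask, the threshold values, and the use of the coefficient magnitude |W[k,l]| in the comparison are illustrative choices, not values taken from the text.

```python
import math

N = 8

def eta(k):
    # Orthonormal DCT scaling, assumed so that C is unitary.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(a):
    return [list(col) for col in zip(*a)]

C = [[eta(k) * math.cos(math.pi * (2 * m + 1) * k / (2 * N)) for m in range(N)]
     for k in range(N)]
CT = transpose(C)

def dct2(block):
    return matmul(matmul(C, block), CT)

def idct2(coeffs):
    # Inverse of dct2 for a unitary C: x = C^T X C.
    return matmul(matmul(CT, coeffs), C)

# Hypothetical soft-edged diagonal mask with values between 0 and 1.
w = [[min(1.0, max(0.0, (m + n - 3) / 8.0)) for n in range(N)] for m in range(N)]
w_dct = dct2(w)

results = []
for threshold in (0.0, 0.05, 0.2, 0.5):
    # Keep only coefficients whose magnitude exceeds the threshold.
    kept = [[e if abs(e) > threshold else 0.0 for e in row] for row in w_dct]
    terms = sum(1 for row in kept for e in row if e != 0.0)
    approx = idct2(kept)
    rms = math.sqrt(sum((approx[m][n] - w[m][n]) ** 2
                        for m in range(N) for n in range(N)) / (N * N))
    results.append((threshold, terms, rms))

for threshold, terms, rms in results:
    print(threshold, terms, rms)
```

Each kept coefficient corresponds to one term of the weighted sum in (16), so the term count is a proxy for computation, while the RMS error of the reconstructed mask is a proxy for masking quality. Because the transform is orthonormal, raising the threshold can only lower the term count and raise the error.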
Compositing of two images can be implemented by use of masking, according to (3).
For a given block of the image (e.g., an 8-by-8 block), for example, the following steps may be taken by a suitably programmed processor to implement the present invention, in one embodiment. First, examine every DCT coefficient of the mask, W[k,l], and if the coefficient is "relevant" (i.e., either bigger than zero or bigger than a given threshold, depending on the embodiment), then do masking by the corresponding basis function, multiply each coefficient of the masked signal by the weighting coefficient, and add the result to the weighted sum for the block. This implements the weighted sum in (16).
The masking by the DCT basis functions can be implemented in terms of matrix multiplications as shown in (11) (and (16)). However, a more efficient implementation can be achieved by taking into account the regular structure of the windowing matrices as the example in (17) shows. Several of these more efficient implementations are discussed in the detailed description above.
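The per-block procedure described above (test each mask coefficient, mask by the corresponding basis functions, weight, accumulate) can be sketched as follows. This is an unoptimized illustration that uses plain matrix products rather than the structured form; the function name `mask_block_dct`, the test block, the hard-edged triangular mask, and the magnitude-based relevance test are all assumptions made for the example.

```python
import math

N = 8

def eta(k):
    # Orthonormal DCT scaling, assumed so that C is unitary.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def basis(k, m):
    # v_k[m] from equation (15).
    return math.cos(math.pi * (2 * m + 1) * k / (2 * N))

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(a):
    return [list(col) for col in zip(*a)]

C = [[eta(k) * basis(k, m) for m in range(N)] for k in range(N)]
CT = transpose(C)

def dct2(block):
    return matmul(matmul(C, block), CT)

# DCT-domain windowing matrices from (12), precomputed once for j = 0..7.
WINDOWS = [dct2([[basis(j, m) if m == r else 0.0 for m in range(N)]
                 for r in range(N)]) for j in range(N)]

def mask_block_dct(x_dct, w_dct, threshold=0.0):
    """Accumulate the weighted sum of (16), skipping small mask coefficients."""
    y_dct = [[0.0] * N for _ in range(N)]
    terms = 0
    for k in range(N):
        for l in range(N):
            weight = w_dct[k][l]
            if abs(weight) <= threshold:
                continue  # coefficient not "relevant": no work for this term
            masked = matmul(matmul(WINDOWS[k], x_dct), WINDOWS[l])
            scale = weight * eta(k) * eta(l)
            for i in range(N):
                for j in range(N):
                    y_dct[i][j] += scale * masked[i][j]
            terms += 1
    return y_dct, terms

# Hypothetical inputs: an arbitrary block and a non-separable triangular mask.
x = [[float((3 * m + 5 * n) % 11) for n in range(N)] for m in range(N)]
w = [[1.0 if m + n < N else 0.0 for n in range(N)] for m in range(N)]

reference = dct2([[w[m][n] * x[m][n] for n in range(N)] for m in range(N)])
exact, exact_terms = mask_block_dct(dct2(x), dct2(w))
approx, approx_terms = mask_block_dct(dct2(x), dct2(w), threshold=0.1)

exact_error = max(abs(p - q) for ra, rb in zip(exact, reference)
                  for p, q in zip(ra, rb))
print(exact_error, exact_terms, approx_terms)
```

With a zero threshold the result should agree with spatial-domain masking followed by a DCT up to rounding; with a nonzero threshold fewer terms are evaluated and the output becomes an approximation.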
For processing of original signals already in the DCT domain, the frequency domain processing of the present invention requires less computation than both spatial domain processing and brute force DCT domain processing based on symmetric convolution. Through empirical testing and modeling, the inventors have found that the computational complexity involved in using the frequency domain processing of the present invention is approximately one to four multiplications per sample for most typical masking operations. As will be appreciated, by using an algorithm similar to rate control algorithms, the complexity of spatial masking in the DCT domain can be limited to only three multiplications per sample without any noticeable degradation of the masking quality.
By contrast, a single 2-D DCT takes about three multiplications per sample. When masking of JPEG or MPEG compressed pictures is implemented in the spatial domain, IDCTs must first be used to transform the DCT data into the spatial domain, and a DCT must then be used to transform the processed picture back into the DCT (frequency) domain. Thus, when implementing picture compositing, at least two IDCTs (one for each input picture) and one DCT (for the composite picture) are needed, in addition to the spatial processing. Therefore, at least nine multiplications per sample are needed for implementing picture compositing in the spatial domain, which is approximately three times what is needed for the described embodiments of the DCT domain implementation of the present invention. In sum, for picture compositing of pictures in the DCT domain, the present invention, in one embodiment, requires about one third as many multiplications per pixel as spatial domain processing, and about one twentieth as many as processing based on brute force convolution.
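The comparison above amounts to the following tally, using only the figures quoted in the text (about three multiplications per sample per transform, and about three per sample for the DCT domain method); the symbols $M_{\text{spatial}}$ and $M_{\text{DCT}}$ are introduced here for illustration.

```latex
\begin{align*}
M_{\text{spatial}} &\ge \underbrace{2 \times 3}_{\text{two IDCTs}}
                      + \underbrace{1 \times 3}_{\text{one DCT}}
                    = 9 \ \text{multiplications per sample} \\
M_{\text{DCT}} &\approx 3 \ \text{multiplications per sample} \\
\frac{M_{\text{spatial}}}{M_{\text{DCT}}} &\approx \frac{9}{3} = 3
\end{align*}
```

The spatial-domain figure is a lower bound, since it excludes the multiplications needed for the masking itself.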
Referring now to Fig. 3, there is depicted an exemplary image 300 processed using DCT domain picture compositing performed in the frequency domain by image processing system 200. Image 300 contains a head-and-shoulder portion 312, which is overlaid over a flower garden background 310, and a transparent logo "SARNOFF" 315, which was inserted in the top right hand corner of image 300. The picture compositing performed by system 200 to arrive at image 300 was performed, in one actual experiment, using only 1.8 multiplications per pixel.
As will be appreciated, although the embodiments of the present invention described above are implemented with respect to the DCT frequency domain, the present invention is also potentially applicable to other frequency domains in which the masking function may be represented in terms of the frequency domain's basis functions and in which the masking can then be computed as a weighted sum of the results of masking by these basis functions. For example, the present invention may be applicable to other frequency domains such as the DFT and discrete sine transform (DST).
As will be understood, the present invention can be embodied in the form of computer- implemented processes and apparatuses for practicing those processes. The present invention can also be embodied in the form of computer program code embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other computer-readable storage medium, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. The present invention can also be embodied in the form of computer program code, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. When implemented on a general-purpose microprocessor, the computer program code segments configure the microprocessor to create specific logic circuits.
It will be understood that various changes in the details, materials, and arrangements of the parts which have been described and illustrated above in order to explain the nature of this invention may be made by those skilled in the art without departing from the principle and scope of the invention as recited in the following claims.

Claims

What is claimed is:
1. A method for processing image signals, comprising the steps of: (a) receiving at least one image signal and a mask signal, wherein the image signal and mask signal are in a frequency domain; and (b) performing masking of the image signal in the frequency domain, in accordance with the mask signal, by representing the masking in terms of the basis functions of the frequency domain, to provide an output image signal.
2. The method of claim 1, wherein the frequency domain is the discrete cosine transform (DCT) domain.
3. The method of claim 2, wherein step (b) comprises the steps of: (1) masking blocks of the image signal with the DCT basis functions to provide masked blocks; and (2) performing a weighted sum of the masked blocks.
4. The method of claim 2, wherein the image signal is divided into 8x8 blocks and the DCT is a two-dimensional type-II DCT.
5. The method of claim 1, wherein step (a) comprises the step of receiving first and second image signals, the method further comprising the step of compositing the first and second image signals in accordance with the mask signal.
6. The method of claim 1, wherein step (b) comprises the step of skipping all processing for weighting coefficients W[k,l] equal to zero, wherein the image signal is represented by x[m,n], the mask signal is represented by a window w[m,n], and W[k,l] is the frequency representation of w[m,n].
7. The method of claim 1, wherein step (b) comprises the step of skipping all processing for weighting coefficients W[k,l] less than or equal to a specified threshold, wherein the image signal is represented by x[m,n], the mask signal is represented by a window w[m,n], and W[k,l] is the frequency representation of w[m,n].
8. The method of claim 7, further comprising the step of selecting the threshold in accordance with a desired tradeoff between the quality of the masking of step (b) and the computational complexity required to perform the masking of step (b), wherein a higher threshold provides lower masking quality but smaller computational complexity, and vice-versa.
9. An apparatus for processing image signals, the apparatus comprising: (a) means for receiving at least one image signal and a mask signal, wherein the image signal and mask signal are in a frequency domain; and (b) means for performing masking of the image signal in the frequency domain, in accordance with the mask signal, by representing the masking in terms of the basis functions of the frequency domain, to provide an output image signal.
10. The apparatus of claim 9, wherein the frequency domain is the discrete cosine transform (DCT) domain.
11. The apparatus of claim 10, wherein means (b) comprises: (1) means for masking blocks of the image signal with the DCT basis functions to provide masked blocks; and (2) means for performing a weighted sum of the masked blocks.
12. The apparatus of claim 10, wherein the image signal is divided into 8x8 blocks and the DCT is a two-dimensional type-II DCT.
13. The apparatus of claim 9, wherein means (a) comprises means for receiving first and second image signals, the apparatus further comprising means for compositing the first and second image signals in accordance with the mask signal.
14. The apparatus of claim 9, wherein means (b) comprises means for skipping all processing for weighting coefficients W[k,l] equal to zero, wherein the image signal is represented by x[m,n], the mask signal is represented by a window w[m,n], and W[k,l] is the frequency representation of w[m,n].
15. The apparatus of claim 9, wherein means (b) comprises means for skipping all processing for weighting coefficients W[k,l] less than or equal to a specified threshold, wherein the image signal is represented by x[m,n], the mask signal is represented by a window w[m,n], and W[k,l] is the frequency representation of w[m,n].
16. The apparatus of claim 15, further comprising means for selecting the threshold in accordance with a desired tradeoff between the quality of the masking of step (b) and the computational complexity required to perform the masking of step (b), wherein a higher threshold provides lower masking quality but smaller computational complexity, and vice-versa.
17. A storage medium having stored thereon a plurality of instructions for processing image signals, wherein the plurality of instructions, when executed by a processor, cause the processor to perform the steps of: (a) receiving at least one image signal and a mask signal, wherein the image signal and mask signal are in a frequency domain; and (b) performing masking of the image signal in the frequency domain, in accordance with the mask signal, by representing the masking in terms of the basis functions of the frequency domain, to provide an output image signal.
18. The storage medium of claim 17, wherein: the frequency domain is the discrete cosine transform (DCT) domain; and step (b) comprises the steps of: (1) masking blocks of the image signal with the DCT basis functions to provide masked blocks; and (2) performing a weighted sum of the masked blocks.
19. The storage medium of claim 17, wherein step (b) comprises the step of skipping all processing for weighting coefficients W[k,l] less than or equal to a specified threshold, wherein the image signal is represented by x[m,n], the mask signal is represented by a window w[m,n], and W[k,l] is the frequency representation of w[m,n].
20. The storage medium of claim 19, further comprising the step of selecting the threshold in accordance with a desired tradeoff between the quality of the masking of step (b) and the computational complexity required to perform the masking of step (b), wherein a higher threshold provides lower masking quality but smaller computational complexity, and vice-versa.
PCT/US1998/020783 1997-10-07 1998-10-02 Picture masking and compositing in the frequency domain WO1999018735A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU96801/98A AU9680198A (en) 1997-10-07 1998-10-02 Picture masking and compositing in the frequency domain

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US6124697P 1997-10-07 1997-10-07
US60/061,246 1997-10-07
US327398A 1998-01-06 1998-01-06
US09/003,273 1998-01-06

Publications (1)

Publication Number Publication Date
WO1999018735A1 (en) 1999-04-15

Family

ID=26671565

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/020783 WO1999018735A1 (en) 1997-10-07 1998-10-02 Picture masking and compositing in the frequency domain

Country Status (2)

Country Link
AU (1) AU9680198A (en)
WO (1) WO1999018735A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0595218A1 (en) * 1992-10-26 1994-05-04 Nec Corporation Image sub-sampling apparatus and method
WO1994021079A1 (en) * 1993-03-11 1994-09-15 Regents Of The University Of California Method and apparatus for compositing compressed video data
WO1995033342A1 (en) * 1994-05-27 1995-12-07 Ictv Inc. Compressed digital video overlay controller and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SMITH B C ET AL: "ALGORITHMS FOR MANIPULATING COMPRESSED IMAGES", IEEE COMPUTER GRAPHICS AND APPLICATIONS, vol. 13, no. 5, 1 September 1993 (1993-09-01), pages 34 - 42, XP000562744 *
SMITH B C ET AL: "COMPRESSED DOMAIN PROCESSING OF JPEG-ENCODED IMAGES", REAL-TIME IMAGING, vol. 2, no. 1, February 1996 (1996-02-01), pages 3 - 17, XP000656168 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005091621A1 (en) * 2004-03-10 2005-09-29 Nokia Corporation Method and device for transform-domain video editing
US7599565B2 (en) 2004-03-10 2009-10-06 Nokia Corporation Method and device for transform-domain video editing
CN101529892B (en) * 2004-03-10 2012-07-25 诺基亚公司 Method and device for transform-domain video editing
EP2248343A1 (en) * 2008-02-01 2010-11-10 ActiveVideo Networks, Inc. Transition creation for encoded video in the transform domain
EP2248343A4 (en) * 2008-02-01 2011-02-02 Activevideo Networks Inc Transition creation for encoded video in the transform domain
US8149917B2 (en) 2008-02-01 2012-04-03 Activevideo Networks, Inc. Transition creation for encoded video in the transform domain
US10142628B1 (en) 2013-02-11 2018-11-27 Google Llc Hybrid transform in video codecs
US10462472B2 (en) 2013-02-11 2019-10-29 Google Llc Motion vector dependent spatial transformation in video coding
US9674530B1 (en) 2013-04-30 2017-06-06 Google Inc. Hybrid transforms in video coding
US9769499B2 (en) 2015-08-11 2017-09-19 Google Inc. Super-transform video coding
US10277905B2 (en) 2015-09-14 2019-04-30 Google Llc Transform selection for non-baseband signal coding
US9807423B1 (en) 2015-11-24 2017-10-31 Google Inc. Hybrid transform scheme for video coding
US11122297B2 (en) 2019-05-03 2021-09-14 Google Llc Using border-aligned block functions for image compression

Also Published As

Publication number Publication date
AU9680198A (en) 1999-04-27

Similar Documents

Publication Publication Date Title
EP0798927B1 (en) Fast DCT domain downsampling and inverse motion compensation
EP0781052B1 (en) Universal MPEG decoder with scalable picture size
DE69831961T2 (en) IMAGE OBJECT PROCESSING FOR OBJECT-BASED CODING SYSTEMS USING MASKS AND ROUNDED MEDIUM VALUES
US5703965A (en) Image compression/decompression based on mathematical transform, reduction/expansion, and image sharpening
KR101291869B1 (en) Noise and/or flicker reduction in video sequences using spatial and temporal processing
JP4515263B2 (en) Low Complexity Unification Transform for Video Coding
Shen et al. Inner-block operations on compressed images
US7489827B2 (en) Scaling of multi-dimensional data in a hybrid domain
US6067384A (en) Fast scaling of JPEG images
KR20010033772A (en) Fast dct domain downsampling
EP2300982A1 (en) Image/video quality enhancement and super-resolution using sparse transformations
US6125212A (en) Explicit DST-based filter operating in the DCT domain
WO1999018735A1 (en) Picture masking and compositing in the frequency domain
US6807310B1 (en) Transformation of image parts in different domains to obtain resultant image size different from initial image size
US6041079A (en) Field/frame conversion of DCT domain mixed field/frame mode macroblocks using 1-dimensional DCT/IDCT
US6304604B1 (en) Method and apparatus for configuring compressed data coefficients to minimize transpose operations
CA2336255A1 (en) Efficient down-scaling of dct compressed images
US6111989A (en) 1/4 size real time decoding of digital video
Dugad et al. A fast scheme for downsampling and upsampling in the DCT domain
US7099523B2 (en) Method and system for scaling a signal sample rate
US6671414B1 (en) Shift and/or merge of transformed data along one axis
EP1563679B1 (en) Method for resizing images using the inverse discrete cosine transform
Walker et al. The Transform and Data Compression Handbook
US6104838A (en) 1/16 size real time decoding of digital video
Shen et al. Scanline algorithms in compressed domain

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WA Withdrawal of international application
NENP Non-entry into the national phase

Ref country code: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: CA