MULTI-DIMENSIONAL DATA COMPRESSION
This invention relates to the field of data compression and more particularly, to a method and system for efficient compression of digital video data.
BACKGROUND OF THE INVENTION
One of the most significant trends affecting the efficiency of the Internet today is the movement toward full motion video and audio data across the Internet. As web sites continue to increase their multimedia content through the integration of audio, video and data, the ability of the web to effectively deliver this media to Internet end users will yield a congestion problem due to the architecture of the web. The significant increase in multimedia incorporated in web pages is due in part to developments in hardware and software that have allowed web page designers to efficiently create, design, access and utilize multimedia applications. These developments in multimedia content place significant demands on the network access functions of the Internet. There is ongoing development in improving network access functions, such as providing high speed links, not only throughout the Internet backbone but down to the local access to the user.

One way to reduce network traffic is to decrease the size of the data transferred across the network. This may be achieved in many ways; one such technique is the use of data compression and manipulation. Traditionally, image compression methods may be classified as those which reproduce the original data exactly, that is, "lossless compression", and those which trade a tolerable divergence from the original data for greater compression, that is, "lossy compression". Typically, lossless methods are unable to achieve a compression of much more than 70%. Therefore, where higher compression ratios are needed, lossy techniques have been developed. In general, the amount by which the original media source is reduced is referred to as the compression ratio.

Compression technologies have evolved over time to adapt to various user requirements. Historically, compression technology focused on telephony, where sound wave compression algorithms were developed and optimized.
These algorithms all implemented a one-dimensional (1D) transformation, which increased the 1D entropy of the data in the transformed domain to allow for efficient quantization and 1D data coding.
Compression technologies then focused on two-dimensional (2D) data such as images or pictures. At first, the 1D audio algorithms were applied to the line data of each image to build up a compressed image. Research then progressed to the point today where the 1D algorithms have been extended to implement a two-dimensional (2D) transformation, which increases the 2D entropy to allow for efficient quantization and 2D data coding. Currently, state of the art technology requires compression of moving pictures or video. In this area, research is focused on applying 2D image coding algorithms to the multitude of images (frames) which comprise video, and on applying motion compensation techniques to take advantage of the correlation between frame data. For example, United States Patent No. RE 36015, re-issued December 29, 1998, describes a video compression system which is based on the image data compression system developed by the Motion Picture Experts Group (MPEG), which uses various groups of field configurations to reduce the number of binary bits used to represent a frame composed of odd and even fields of video information.
In general, MPEG systems integrate a number of well known data compression techniques into a single system. These include motion compensated predictive coding, discrete cosine transformation (DCT), adaptive quantization and variable length coding (VLC). The motion compensated predictive coding scheme processes the video data in groups of frames in order to achieve relatively high levels of compression without allowing the performance of the system to be degraded by excessive error propagation. In these group-of-frames processing schemes, image frames are classified into one of three types: the intraframe (I-Frame), the predicted frame (P-Frame) and the bidirectional frame (B-Frame). A 2D DCT is applied to small regions, such as blocks of 8 x 8 pixels, to encode each of the I-Frames. The resulting data stream is quantized and encoded using a variable length code, such as an amplitude run length Huffman code, to produce the compressed output signal. As may be seen, this quantization technique still focuses on compressing single frames or images, which may not be the most effective means of compression for current multimedia requirements. Also, for low bit rate applications, MPEG suffers from 8 x 8 blocking artifacts known as tiling. Furthermore, these second-generation compression approaches have reduced the data requirements for video by as much as 100:1. Typically, these technologies are focused on the following approaches: wavelet algorithms and vector quantization.
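The I-Frame coding chain described above (a 2D DCT on an 8 x 8 pixel block followed by quantization) can be sketched as follows. The smooth test block and the uniform quantization step of 10 are illustrative assumptions only, and are not MPEG's actual quantization matrices:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    C = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    C[0, :] = np.sqrt(1.0 / n)
    return C

C = dct_matrix(8)
block = np.tile(np.arange(8, dtype=np.float64), (8, 1))  # smooth 8x8 test block

coeffs = C @ block @ C.T          # separable 2D DCT; energy packs near (0, 0)
step = 10.0                       # illustrative uniform quantization step
quantized = np.round(coeffs / step)
reconstructed = C.T @ (quantized * step) @ C

# For a smooth block most coefficients quantize to zero, which is what
# makes the subsequent run-length / Huffman coding effective.
print(np.count_nonzero(quantized), quantized.size)
```

The few surviving coefficients cluster in the top-left (low frequency) corner of the block, while the reconstruction stays close to the original: this is the energy compaction that the run-length code then exploits.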
The wavelet algorithms are implemented with efficient significance map coding such as EZW, and line detection with gradient vectors, depending on the application's final reconstructed resolution. The wavelet algorithms operate on the entire image and have efficient implementations due to finite impulse response (FIR) filter realizations. All wavelet algorithms decompose an image into coarser, smooth approximations by low pass digital filtering (convolution) on the image. In addition, the wavelet algorithms generate detailed approximations (error signals) by high pass digital filtering or convolution on the image. This decomposition process can be continued as far down the pyramid as a designer requires, where each step in the pyramid has a sample rate reduction of two. This technique is also known as spatial sample rate decimation or down sampling of the image, where the resolution is one half in the next sub-band of the pyramid, as shown schematically in figures 1 and 2.

In vector quantization (VQ), algorithms are used with efficient codebooks. The VQ algorithm codebooks are based on macroblocks (8 x 8 or 16 x 16) to compress image data. These algorithms also have efficient implementations. However, they suffer from blocking artifacts (tiling) at low bit rates (high compression ratios). The codebooks have a few codes to represent a multitude of bit patterns, where fewer bits are allocated to the bit patterns in a macroblock with the highest probability. The VQ technique is shown schematically in figure 3.
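The wavelet decomposition into a smooth approximation and detail signals, with a sample rate reduction of two, can be sketched with a one-level split. The Haar filter pair is an assumption chosen for brevity; the text does not fix a particular FIR filter:

```python
import numpy as np

def haar_split(a, axis):
    """Low-pass (average) and high-pass (difference) of adjacent samples,
    with the sample rate reduced by two along the given axis."""
    even = np.take(a, range(0, a.shape[axis], 2), axis=axis)
    odd = np.take(a, range(1, a.shape[axis], 2), axis=axis)
    return (even + odd) / 2.0, (even - odd) / 2.0

image = np.arange(64, dtype=np.float64).reshape(8, 8)  # smooth test image

lo, hi = haar_split(image, axis=1)     # filter and decimate the rows
ll, lh = haar_split(lo, axis=0)        # then the columns of each half
hl, hh = haar_split(hi, axis=0)

# ll is the coarse 4x4 approximation (next pyramid level); lh, hl, hh
# hold the detail (error) signals, nearly zero for smooth data.
print(ll.shape)
```

Repeating the split on `ll` continues the pyramid, halving the resolution of the smooth sub-band at each level as in figures 1 and 2.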
As discussed earlier, these current techniques are limited when applied to third generation compression requirements, that is, compression ratios approaching 1000:1. In particular, wavelet and vector quantization techniques as discussed above still focus on compressing single frames or images, which may not be the most effective approach for third generation compression requirements.
SUMMARY OF THE INVENTION
In accordance with this invention there is provided a method of compressing a data signal, the method comprising the steps of:
(a) selecting a sequence of image frames, the sequence comprising part of a video stream;
(b) applying a three dimensional transform to the selected sequence to produce a first transformed output; and
(c) encoding the transformed output to produce a compressed stream output.
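By way of illustration only, steps (a) to (c) can be sketched end to end, with numpy's 3D FFT standing in for the claimed three dimensional transform and simple coefficient thresholding standing in for the encoder. Both substitutions are assumptions for the sketch, not the claimed method itself:

```python
import numpy as np

rng = np.random.default_rng(0)
# (a) a sequence of 8 highly correlated 16x16 frames from a "video stream"
frames = rng.random((8, 16, 16)) * 0.1 + 0.5

# (b) three dimensional transform applied to the selected sequence
spectrum = np.fft.fftn(frames)

# (c) crude "encoding": keep only significant transform coefficients
threshold = 0.05 * np.abs(spectrum).max()
kept = np.where(np.abs(spectrum) >= threshold, spectrum, 0)

ratio = spectrum.size / max(np.count_nonzero(kept), 1)
recovered = np.fft.ifftn(kept).real
print(ratio > 1)
```

For highly correlated frames the transform energy concentrates in very few coefficients, so the sketch achieves a large ratio while the inverse transform still recovers the frames closely.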
BRIEF DESCRIPTION OF THE DRAWINGS
These and other features of the preferred embodiments of the invention will become more apparent in the following detailed description, in which reference is made to the appended drawings, wherein:
Figure 1 is a schematic diagram of a multi resolution wavelet compressor;
Figure 2 is a schematic diagram of a one-stage wavelet decoder;
Figure 3 is a schematic diagram showing a single frame vector quantization technique;
Figures 4(a) and (b) are schematic diagrams of a video frame sequence for use in the present invention;
Figure 4(c) is a schematic representation of a transformed sequence;
Figure 5 is a schematic diagram of a 3D wavelet dyadic sub-cube structure in accordance with the present invention;
Figure 6 is a graph showing compression ratio versus frame depth for different media types;
Figure 7 is a flow chart showing the operation of a 3D compression system; and
Figure 8 is a flow chart showing the operation of a 3D decompression system.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
In the following description, like numerals refer to like structures in the drawings.
Referring to figure 4(a), a schematic diagram of a sequence of digitized video frames is shown generally by numeral 40. The sequence comprises N frames 42, each temporally sampled by an amount Δt. Each frame is made up of a two dimensional matrix of pixels. In order to compress this video frame sequence, a three dimensional transform is applied to a three dimensional matrix of pixels defined in the sequence of frames, defined in 3D space by (x,y,t), to yield a 3D cubic structure in the transformed domain. For the case of a 3D Fourier or cosine transform, the center of the 3D structure shall be DC (for the case of spatial-to-spectral transformations). As one leaves the center of the cubic structure, the density will decrease, since the image data information in the 3D structure dictates the spectral distribution. This is shown graphically in figures 4(b) and 4(c). Because there is a high correlation over the space defined by the (x,y,t) dimensions, there will be very high entropy in the transformed domain, which will provide for compression ratios that can approach 1000:1. The 3D algorithm may use (Δx, Δy, Δt) spatial/frame data pixel values, where (Δx, Δy, Δt) are constant and the total number of frames (NΔt) used in the transformation shall be variable (i.e., the frame depth), depending on the scene data and the media type. In scene data the probability is high that adjacent pixels in a frame are the same. This also applies to neighbouring pixels in adjacent frames.
Referring back to figure 4(b), a sequence of frames to be transformed is indicated by label A. The three dimensional continuous Fourier transform, when applied to the object A that is defined by a function f of three independent variables x, y and z, is:

$$F(u,v,w) = \Im\{f(x,y,z)\} = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x,y,z)\, e^{-j2\pi(ux+vy+wz)}\, dx\, dy\, dz$$

Using Euler's formula, this may be expressed as:

$$F(u,v,w) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x,y,z)\left[\cos\{2\pi(ux+vy+wz)\} - j\sin\{2\pi(ux+vy+wz)\}\right] dx\, dy\, dz$$

or:

$$F(u,v,w) = R(u,v,w) + jI(u,v,w) = |F(u,v,w)|\, e^{j\phi(u,v,w)}$$

with:

$$R(u,v,w) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x,y,z)\cos\{2\pi(ux+vy+wz)\}\, dx\, dy\, dz$$

and:

$$I(u,v,w) = -\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x,y,z)\sin\{2\pi(ux+vy+wz)\}\, dx\, dy\, dz$$
The Fourier spectrum, phase and spectral density are then defined by the following equations:

$$|F(u,v,w)| = \left[\{R(u,v,w)\}^2 + \{I(u,v,w)\}^2\right]^{1/2} = \text{3D Fourier Spectrum}$$

$$\phi(u,v,w) = \tan^{-1}\!\left[\frac{I(u,v,w)}{R(u,v,w)}\right] = \text{3D Fourier Phase}$$

$$P(u,v,w) = \{R(u,v,w)\}^2 + \{I(u,v,w)\}^2 = \text{3D Spectral Density}$$

The transformation of the object A, which is represented by P(u,v,w), will define the three dimensional spectral information with DC located at the center P(0,0,0). As u, v, or w are changed, the spectral density also changes. In fact, the largest percentage of the energy within P(u,v,w) will be contained near the center P(0,0,0), with the density falling off dramatically (non-linearly) as u, v, w become non-zero.
For the case of the object A being a cubic structure, boundary conditions exist and the triple integration will result in a cubic structure with the spectral density being the greatest at the center of the cubic structure. Proceeding away from the center of the cubic structure, the spectral density will rapidly become smaller, approaching zero at the edges of the cubic structure. This is shown graphically in figure 4(c). The uniformity throughout the object A determines the rate at which the spectral density decreases from its maximum at the center of the cubic structure to zero at the edges. With a high level of uniformity throughout A, there will be a large correlation, or low entropy, in A. As a result, the 3D Spectral Density will result in high entropy. This means the rate of change of density from the center of the transformed object A will be very high.
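The concentration of spectral density at the center of the cubic structure can be checked numerically. The smooth Gaussian test volume below is an assumption chosen to mimic highly correlated data:

```python
import numpy as np

n = 16
x, y, z = np.meshgrid(*(np.linspace(0, 1, n),) * 3, indexing="ij")
# smooth (highly correlated) 3D test volume
volume = np.exp(-((x - 0.5)**2 + (y - 0.5)**2 + (z - 0.5)**2) * 8)

F = np.fft.fftshift(np.fft.fftn(volume))  # shift DC to the cube center
P = F.real**2 + F.imag**2                 # 3D spectral density P(u,v,w)

center = P[n // 2, n // 2, n // 2]        # density at the center (DC)
corner = P[0, 0, 0]                       # density at a cube corner
print(center > corner)
```

For this smooth volume the density at the center exceeds the density at the edges by many orders of magnitude, consistent with the fall-off described above.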
Rather than the continuous Fourier transform above, a three dimensional discrete Fourier transform may be applied to the object A. In this case, for a function f(x,y,z) that is sampled in the x, y and z dimensions by Δx, Δy and Δz, the transform is given by the following equation:

$$F(u,v,w) = \frac{1}{MNO}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\sum_{z=0}^{O-1} f(x,y,z)\, e^{-j2\pi\left(\frac{ux}{M}+\frac{vy}{N}+\frac{wz}{O}\right)}$$

where u = 0, 1, 2, ..., M-1; v = 0, 1, 2, ..., N-1; w = 0, 1, 2, ..., O-1.

Using Euler's formula, this may be expressed as:

$$F(u,v,w) = \frac{1}{MNO}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\sum_{z=0}^{O-1} f(x,y,z)\left[\cos\left\{2\pi\left(\frac{ux}{M}+\frac{vy}{N}+\frac{wz}{O}\right)\right\} - j\sin\left\{2\pi\left(\frac{ux}{M}+\frac{vy}{N}+\frac{wz}{O}\right)\right\}\right]$$

or:

$$F(u,v,w) = R(u,v,w) + jI(u,v,w) = |F(u,v,w)|\, e^{j\phi(u,v,w)}$$

with:

$$R(u,v,w) = \frac{1}{MNO}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\sum_{z=0}^{O-1} f(x,y,z)\cos\left\{2\pi\left(\frac{ux}{M}+\frac{vy}{N}+\frac{wz}{O}\right)\right\}$$

and:

$$I(u,v,w) = -\frac{1}{MNO}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\sum_{z=0}^{O-1} f(x,y,z)\sin\left\{2\pi\left(\frac{ux}{M}+\frac{vy}{N}+\frac{wz}{O}\right)\right\}$$

The spectral density is given by:

$$P(u,v,w) = \{R(u,v,w)\}^2 + \{I(u,v,w)\}^2$$
For the case where M = N = O, the numerical complexity of implementation is proportional to N³. There will be 2N³ trigonometric calculations, 2N³ real multiplications, and 2N³ real additions.
For video processing applications where Δx, Δy, and Δz correspond to the horizontal spatial sample rate, the vertical spatial sample rate, and the temporal sample rate respectively, the object A will be a cubic structure defined by the video input format, such as the Common Input Format (CIF). This results in a three dimensional (352,240,z) pixel array, where z varies with the frame rate and the scene change data. The discrete Fourier transform will result in a cubic structure (352,240,z) with the spectral density being the greatest at the center (352/2,240/2,z/2). Proceeding away from the center, the spectral density will decrease and approach zero at the edges. The pixel correlation throughout the array (352,240,z) determines the rate at which the spectral density decreases from its maximum at the center of the cubic structure to zero at the edges. Generally there is a high correlation of the temporal and spatial neighbors of a pixel within A. As a result, the 3D transformation will result in low correlation, or high entropy. This means the rate of change of density from the center of the transformed object A will be very high. The rate of change is dependent on the type of video being processed and the temporal dimension z defined by scene changes. By type of video is meant, for example, talk shows, high action movies, cartoons and the like. Thus, for different types of video, the spectral content will vary.
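One coefficient of the 3D discrete Fourier transform defined above can be evaluated directly with the naive triple sum and checked against numpy's fast transform. The tiny 4x4x4 volume is an assumption for the sketch; a real CIF sequence (352 x 240 x z) is far too large for the naive sum, which is exactly the per-coefficient cost noted above. Note that numpy places the 1/(MNO) normalization on the inverse transform, so both sides are rescaled here for comparison:

```python
import numpy as np

M = N = O = 4
rng = np.random.default_rng(1)
f = rng.random((M, N, O))

def dft_coeff(u, v, w):
    """Naive triple sum for one coefficient of the 3D DFT, with the
    1/(MNO) normalization on the forward transform as in the text."""
    total = 0j
    for x in range(M):
        for y in range(N):
            for z in range(O):
                total += f[x, y, z] * np.exp(
                    -2j * np.pi * (u * x / M + v * y / N + w * z / O))
    return total / (M * N * O)

F = np.fft.fftn(f) / (M * N * O)   # fast reference, rescaled to match
print(np.isclose(dft_coeff(1, 2, 3), F[1, 2, 3]))
```

Each coefficient costs M·N·O complex multiply-adds, so evaluating all M·N·O coefficients this way scales as N⁶ for a cube; FFT-based implementations reduce this dramatically.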
The transformed object A may be recovered by applying the appropriate inverse transform. The three dimensional inverse discrete Fourier transform is defined as follows:

$$\Im^{-1}\{F(u,v,w)\} = f(x,y,z) = \sum_{u=0}^{M-1}\sum_{v=0}^{N-1}\sum_{w=0}^{O-1} F(u,v,w)\, e^{j2\pi\left(\frac{ux}{M}+\frac{vy}{N}+\frac{wz}{O}\right)}$$

where x = 0, 1, 2, ..., M-1; y = 0, 1, 2, ..., N-1; z = 0, 1, 2, ..., O-1.

Using Euler's formula, this may be expressed as:

$$f(x,y,z) = \sum_{u=0}^{M-1}\sum_{v=0}^{N-1}\sum_{w=0}^{O-1} F(u,v,w)\left[\cos\left\{2\pi\left(\frac{ux}{M}+\frac{vy}{N}+\frac{wz}{O}\right)\right\} + j\sin\left\{2\pi\left(\frac{ux}{M}+\frac{vy}{N}+\frac{wz}{O}\right)\right\}\right]$$

or:

$$f(x,y,z) = r(x,y,z) + j\,i(x,y,z) = |f(x,y,z)|\, e^{j\theta(x,y,z)}$$

with:

$$r(x,y,z) = \sum_{u=0}^{M-1}\sum_{v=0}^{N-1}\sum_{w=0}^{O-1}\left[R(u,v,w)\cos\left\{2\pi\left(\frac{ux}{M}+\frac{vy}{N}+\frac{wz}{O}\right)\right\} - I(u,v,w)\sin\left\{2\pi\left(\frac{ux}{M}+\frac{vy}{N}+\frac{wz}{O}\right)\right\}\right]$$

and:

$$i(x,y,z) = \sum_{u=0}^{M-1}\sum_{v=0}^{N-1}\sum_{w=0}^{O-1}\left[R(u,v,w)\sin\left\{2\pi\left(\frac{ux}{M}+\frac{vy}{N}+\frac{wz}{O}\right)\right\} + I(u,v,w)\cos\left\{2\pi\left(\frac{ux}{M}+\frac{vy}{N}+\frac{wz}{O}\right)\right\}\right]$$

The amplitude of f(x,y,z) is given by:

$$|f(x,y,z)| = \left[\{r(x,y,z)\}^2 + \{i(x,y,z)\}^2\right]^{1/2}$$
For the case where M = N = O, the numerical complexity of implementation is proportional to N³. There will be 2N³ trigonometric calculations, 2N³ real multiplications, and 2N³ real additions.
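The forward and inverse transform pair described above can be verified with a simple round trip: applying the inverse 3D DFT to F(u,v,w) recovers f(x,y,z).

```python
import numpy as np

rng = np.random.default_rng(2)
f = rng.random((4, 4, 4))          # small test object A

F = np.fft.fftn(f)                 # forward 3D DFT
recovered = np.fft.ifftn(F)        # inverse 3D DFT

# For a real-valued input the imaginary part of the round trip is
# numerical noise only.
print(np.allclose(recovered.real, f))
```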
Those experienced in the art will see that the application of a three dimensional Discrete Cosine Transform to highly correlated video frames will yield optimal compaction in the density of the transformed video. The Discrete Sine Transform, as well as other transforms, can also be applied to the 3D structure defined by A.
Referring to figure 5, a schematic diagram of a 3D wavelet transform applied to the 3D matrix of pixels is shown generally by numeral 50. The illustration shows a 3D wavelet dyadic sub-cube tree structure. In general, a 3D wavelet and/or a fractal algorithm may also be applied to the 3D transformation process to yield a multiresolution sub-cube with a dyadic sub-cube tree structure, where a 3D Embedded Zerotree Wavelet (EZW) coding technique can be applied. In addition, an efficient DCT (FFT) expanded to 3D can be followed with entropy coding or code books for 3D spaces (i.e., 8x8x8, 16x16x16, etc.).
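One level of the dyadic sub-cube structure described above can be sketched with a separable split along each axis, turning an 8x8x8 cube into eight 4x4x4 sub-cubes (one smooth approximation plus seven detail cubes). The Haar filter pair is again an illustrative assumption:

```python
import numpy as np

def haar_axis(a, axis):
    """One-level Haar low/high split with decimation by two along one axis."""
    even = np.take(a, range(0, a.shape[axis], 2), axis=axis)
    odd = np.take(a, range(1, a.shape[axis], 2), axis=axis)
    return (even + odd) / 2.0, (even - odd) / 2.0

cube = np.random.default_rng(4).random((8, 8, 8))  # 3D matrix of pixels

subcubes = [cube]
for axis in range(3):              # split along x, then y, then t
    subcubes = [part for sc in subcubes for part in haar_axis(sc, axis)]

# Eight dyadic sub-cubes at half resolution in every dimension; the
# first is the smooth approximation, the root of the next tree level.
print(len(subcubes), subcubes[0].shape)
```

Recursing on the smooth sub-cube produces the dyadic tree of figure 5, across which zerotree (EZW-style) significance coding can then be applied.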
Referring to figure 6, a graph showing typical compression ratios estimated by the present invention for various types of media as a function of the frame depth (z) is shown generally by numeral 60.
The basic concept which is the subject of the present application may be used in extending conventional transforms to 3D fixed frame depth using such approaches as fractals, VQ, DCT and wavelets. Furthermore, optimizations can be realized as a result of the human visual system's response to contrast sensitivity and the adaptation range of the eye due to brightness levels. Optimal 3D coding techniques may also be derived by extending present 2D coding methods such as Huffman coding, arithmetic coding, and vector or surface quantization coding. Although the computational requirements for such approaches are expected to be high, an efficient 3D variable frame depth decoder may be implemented in hardware on a desktop PC. Such a variable frame depth decoder may also be implemented using a neural network or the like.
In addition, an algorithm may be used for determining the optimal frame depth on the fly for the 3D transformation, which is dependent on the video content of the frame-to-frame pixel correlation. For low frame-to-frame pixel correlation (or by SNR methods), a scene change is detectable and the length of the 3D matrix of pixels is determined on the fly. In this regard, the curves of compression ratio effectiveness versus frame depth for each media type, as shown in figure 6, may be developed to indicate the expected performance for applications ranging from high action movies and television broadcasts to white boarding. For lossless compression, 1D, 2D or 3D entropy coding can be used to achieve >70% compression. For lossy compression, a 3D pixel quantization mask is applied before entropy coding to achieve larger compression ratios.
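The on-the-fly frame depth determination described above can be sketched as follows: grow the 3D block until the frame-to-frame pixel correlation drops, signalling a scene change. The correlation threshold of 0.5 is an illustrative assumption; the text does not specify one:

```python
import numpy as np

rng = np.random.default_rng(3)
scene_a = rng.random((16, 16))
scene_b = rng.random((16, 16))
# Six near-identical frames of one scene, then a scene cut.
frames = [scene_a + rng.normal(0, 0.01, scene_a.shape) for _ in range(6)]
frames.append(scene_b)

def frame_depth(frames, threshold=0.5):
    """Frame depth for the 3D transform, grown until correlation drops."""
    depth = 1
    for prev, cur in zip(frames, frames[1:]):
        corr = np.corrcoef(prev.ravel(), cur.ravel())[0, 1]
        if corr < threshold:       # low correlation => scene change detected
            break
        depth += 1
    return depth

print(frame_depth(frames))
```

Here the correlated run of six frames is selected as one 3D block and the uncorrelated seventh frame starts the next block, matching the variable frame depth behaviour described for different media types.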
Referring to figure 7, a flow chart of the general steps implemented in a 3D compression system is shown generally by numeral 70. Similarly, figure 8 is a flow chart of the general steps implemented in a 3D decompression system, shown generally by numeral 80. The descriptions in each block therein are incorporated herein. Although the invention has been described with reference to certain specific embodiments, various modifications thereof will be apparent to those skilled in the art without departing from the spirit and scope of the invention as outlined in the claims appended hereto.