DESCRIPTION
METHOD FOR COMPRESSING AND DECOMPRESSING MOTION VIDEO
This application claims benefit of a prior filed copending U.S. provisional application, serial number 60/064,391, filed November 7, 1997.
TECHNICAL FIELD
This invention relates to apparatus and methods for compressing video data for transmission and for decompressing such data following reception, and, more particularly, to such apparatus and methods in which blocks of video frames are compressed, transmitted, and decompressed.
BACKGROUND ART
FIG. 1 is a flow chart of a conventional video compression process representing the current state of the art. Frames of moving video images are digitized or reduced to numerical form in a digitizing step 1. Then, in step 2, this data is stored in a dual-port memory, allowing the overlapped collection and processing of digitized data. In step 3, a determination is made whether to compress the digitized frame as a reference frame or a motion-compensated frame. The first step in compressing a reference frame is performance of a two-dimensional DCT (Discrete Cosine Transform) in step 4, with a transform function being applied to eight by eight sub-blocks of the reference frame. The transformed data is next quantized in step 5 through division by a fixed value representing the relative visibility of each DCT coefficient. The quantized data is entropy-coded in step 6, constructing a compressed reference frame 14 to be transmitted. The quantized data from step 5 is also dequantized in step 7 by scaling each quantized
DCT coefficient up by its relative visibility, and an Inverse Discrete Cosine Transform operation is performed on the dequantized data in step 8. The resulting reconstructed
reference frame 9 is identical to the reference frame that will be reconstructed in the decoder when the frame is received.
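The reference-frame path described above (2-D DCT, quantization by a visibility value, then dequantization and inverse DCT) can be sketched as follows. This is a minimal illustration, not the specification's implementation; the function names and the single uniform visibility value are assumptions.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix; row k holds the k-th cosine basis."""
    C = np.cos(np.pi * np.arange(n)[:, None] * (2 * np.arange(n) + 1) / (2 * n))
    C[0, :] *= np.sqrt(1.0 / n)
    C[1:, :] *= np.sqrt(2.0 / n)
    return C

def compress_block(block, q):
    """Forward 2-D DCT (step 4), then quantize by a visibility value q (step 5)."""
    C = dct_matrix(block.shape[0])
    return np.round((C @ block @ C.T) / q).astype(int)

def reconstruct_block(quantized, q):
    """Dequantize (step 7) and apply the inverse 2-D DCT (step 8)."""
    C = dct_matrix(quantized.shape[0])
    return C.T @ (quantized * q) @ C
```

Because the encoder reconstructs through the same dequantize/inverse-transform path as the decoder, `reconstruct_block(compress_block(f, q), q)` yields exactly the reference frame the decoder will see.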
Non-reference frames are submitted to a motion estimation process in step 10 that finds, block-by-block, the best match between the reconstructed reference frame 9 and the digitized frame. The difference between the matched blocks is taken in the process of motion compensation in step 11. The difference between the matched blocks is quantized by an arbitrary value in step 12, and a vector describing the best matched block and the quantized difference between the blocks is entropy coded in step 13, constructing a compressed frame 15 to be transmitted.

FIG. 2 is a flow chart of a conventional video decompression system operating in accordance with a current state of the prior art. In step 16, a decision is made whether compressed data 14, 15 represents a reference frame 14 or a motion-compensated frame 15. A reference frame 14 is decompressed by performing entropy decoding in step 17, resulting in a quantized reference frame. The quantized reference frame is dequantized in step 18 by scaling each DCT coefficient by a constant representing its relative visibility. An Inverse Discrete Cosine Transform operation is then performed in step 19 on the dequantized frame, returning the reconstructed reference frame 20, which is also displayed in step 25.
Motion-compensated frames 15 are entropy-decoded in step 21, returning motion compensation vectors and quantized error terms. The quantized error terms are dequantized in step 22, returning dequantized error terms. An Inverse Discrete Cosine Transform operation is performed in step 23 on the dequantized error terms, returning reconstructed error terms. The motion compensation vectors are used to select the matched block of the reference frame 20, which is added to the reconstructed error terms to perform the motion compensation process in step 24. The resulting motion-compensated frame 26 is then displayed.
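The decoder-side motion compensation of step 24 amounts to copying the matched reference block at the vector offset and adding the reconstructed error terms. A minimal sketch, with hypothetical names and an assumed one-vector-per-block layout:

```python
import numpy as np

def motion_compensate(reference, vectors, errors, block=8):
    """Rebuild a frame from motion vectors and decoded error terms (step 24).

    vectors[r][c] is the (dy, dx) offset of the best-matched block in the
    reference frame for the block at block-row r, block-column c."""
    h, w = reference.shape
    out = np.empty_like(reference)
    for r in range(0, h, block):
        for c in range(0, w, block):
            dy, dx = vectors[r // block][c // block]
            # Copy the matched reference block, then add the error terms.
            match = reference[r + dy : r + dy + block, c + dx : c + dx + block]
            out[r : r + block, c : c + block] = match + errors[r : r + block, c : c + block]
    return out
```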
Referring to FIGS. 1 and 2, a number of problems with this prior art process are encountered in step 11, in which the motion occurring between frames must be estimated to compensate for motion. This method requires searches between frames to find matches. Since such searches can require significant time and processing resources, what is needed is a method for compression and decompression which does not require such searches. Furthermore, what is needed is a method for reducing the overall sensitivity of the video system to noise and losses in the transmission process.
DISCLOSURE OF INVENTION
According to a first aspect of the present invention, there is provided a method for processing video data representing frames of motion video, including steps of applying a spatial frequency transform to data representing each frame within the video data and applying a temporal frequency transform to data representing sequential frames within a block of data representing a pre-determined number of the frames. On each frame, the segments to which the spatial transform is applied extend in horizontal and vertical directions, so that the spatial transform is performed in two-dimensional space. The temporal frequency transform is applied in a third dimension, time, to sequentially occurring frames within the block of data. Thus, the spatial and temporal frequency transforms are considered together to be a three-dimensional frequency transform, which forms a plurality of frequency coefficients representing frequencies occurring in the block of data.
The three-dimensional transform may be a discrete cosine transform (DCT) function, a wavelet transform, or another frequency-based mathematical transform. These types of transforms may be mixed, for example, with one type of transform being used for the spatial frequency transform, while another type of transform is used for the temporal frequency transform.
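Because the three-dimensional transform is separable, it can be sketched as a 1-D transform applied in turn along the vertical, horizontal, and temporal axes of the block. The following is an illustrative sketch only; the function names and the choice of an orthonormal DCT for all three axes are assumptions:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n."""
    C = np.cos(np.pi * np.arange(n)[:, None] * (2 * np.arange(n) + 1) / (2 * n))
    C[0] *= np.sqrt(1.0 / n)
    C[1:] *= np.sqrt(2.0 / n)
    return C

def dct3(block):
    """Separable 3-D DCT: the same 1-D DCT applied along each of the three
    axes (two spatial, one temporal) of a cubic block of frames."""
    n = block.shape[0]
    C = dct_matrix(n)
    out = block
    for axis in range(3):
        # Contract C against the chosen axis, then restore the axis order.
        out = np.moveaxis(np.tensordot(C, np.moveaxis(out, axis, 0), axes=1), 0, axis)
    return out
```

A temporally and spatially constant block concentrates all of its energy in the single zero-frequency coefficient, which is what makes the subsequent quantization effective.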
Video data is provided by an analog video source, which also provides a vertical blank interrupt signal. The video data is digitized in an analog to digital converter. Data representing individual video frames is moved into the transform process described above from a first two-port data buffer including a pair of buffer segments, with data representing one video frame being saved in one of the segments as data representing the previous video frame is read from the other segment. In this way, data is fed through alternating buffer segments, with the data storage and reading functions being initiated
in response to the vertical blank interrupt signal.
Data, in the form of frequency coefficients derived, within the transform process described above, from the block of data representing a pre-determined number of video frames is stored within a second two-port data buffer including a pair of buffer segments. Blocks of data are fed through alternating buffer segments, with data storage and reading functions being initiated in response to the occurrence of a pre-determined number of vertical blank interrupt signals, indicating that an entire block of video frames has been processed.
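The alternating two-segment buffering described above can be sketched as a ping-pong buffer toggled on each (simulated) vertical blank interrupt; the class and method names are hypothetical:

```python
class PingPongBuffer:
    """Two-segment buffer: new data is written into one segment while the
    previously filled segment is read, toggling on each interrupt."""

    def __init__(self):
        self.segments = [None, None]
        self.write_index = 0

    def on_vertical_blank(self, frame):
        # Hand back the previously stored segment for processing,
        # store the new frame, and swap segments.
        ready = self.segments[1 - self.write_index]
        self.segments[self.write_index] = frame
        self.write_index = 1 - self.write_index
        return ready
```

The second buffer of the specification behaves the same way, except that it toggles only after a pre-determined number of interrupts, once a whole block of frames has been processed.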
In accordance with another aspect of the present invention, there is provided a method for decompressing compressed video data representing frames of motion video.
This method includes steps of receiving the compressed video data from a transmission stream, converting the compressed video data into intermediate blocks of data, applying an inverse frequency transform to each of the intermediate blocks of data, and assembling uncompressed video data. Each intermediate block of data includes frequency coefficients representing a pre-determined number of frames of motion video. These frequency coefficients describe spatial variations in each frame within the intermediate block of data in vertical and horizontal directions and variations among sequential frames within the block of data. The inverse frequency transform provides an uncompressed block of video data representing spatial variations in horizontal and vertical directions on each of the pre-determined number of frames within the intermediate block of data.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a flow chart of a conventional video compression system operating in accordance with a current state of the prior art;
FIG. 2 is a flow chart of a conventional video decompression system for decompressing video data transmitted from the compression system of FIG. 1 in accordance with a current state of the prior art;
FIG. 3 is a flow chart of a video compression system operating in accordance with the present invention;
FIG. 4 is a flow chart of a video decompression system operating in accordance with the present invention for decompressing video data transmitted from the compression system of FIG. 3;
FIG. 5 is a block diagram of a process of capture and transformation of video data in the video compression system of FIG. 3;
FIG. 6 is a pictorial representation of the quantization of transformed video frames in the compression system of FIG. 3;
FIGS. 7, 8, and 9 are graphical representations of first, second, and third methods, respectively, for quantizing data in the video compression system of FIG. 3;
FIG. 10 is a flow chart of a method for entropy coding and transmission of transform-based video within the video compression system of FIG. 3;
FIGS. 11, 12, and 13 are block diagrams showing first, second, and third sections, respectively, of a video compression system built in accordance with the present invention and operating with the processes of FIG. 3; and
FIGS. 14 and 15 are block diagrams showing first and second sections, respectively, of a video decompression system built in accordance with the present invention and operating with the processes of FIG. 4.
BEST MODE FOR CARRYING OUT THE INVENTION
FIG. 3 is a flow chart of a video compression system operating in accordance with the present invention. Frames of moving video images are digitized or reduced to numerical form in a digitizing step 27. This data is stored in dual-port memory in step 28, allowing overlapped collection and processing of digitized data. A 3-D (three-dimensional) frequency transform is applied to a collection of frames in step 29. The transformed data is quantized in step 30 by reduction by a value representing the relative spatio-temporal visibility of each frequency coefficient. The quantized data is entropy-coded in step 31, constructing compressed frame data 32 to be transmitted. The three-dimensional transform may be a discrete cosine transform (DCT) function, a wavelet transform, or another frequency-based mathematical transform. These types of transforms may be mixed, for example, with one type of transform being used for the spatial frequency transform, while another type of transform is used for the temporal frequency transform.
FIG. 4 is a flow chart of a video decompression system operating in accordance with the present invention. The compressed frame data 32 is decompressed by performing entropy decoding in step 33, resulting in a quantized collection of frames, which is dequantized in step 34 by scaling each frequency coefficient by a constant representing its spatio-temporal visibility. An inverse frequency transform operation is then performed in step 35 on the dequantized collection of frames, returning the reconstructed collection of frames 36 for display.
FIG. 5 is a block diagram of a process of capture and transformation of video data in the video compression system of FIG. 3. The compression engine is driven by the output of a digitizing device 37, which reduces each frame of analog video to a digitized frame 38 of numerical values representing the contents of that frame. These frames are stored in a two-port data buffer 39, having a pair of buffer segments 39a, allowing simultaneous collection of uncompressed frames generated by the digitizer while processing cached frames 40 through the 3-D frequency transform 41, the output of which is a collection of 3-D frequency transformed frames 42. This frequency transform 41 is called three-dimensional (3-D) because transform functions are applied in two spatial dimensions, horizontal and vertical, and also in a temporal dimension among successive cached frames 40.
FIG. 6 is a pictorial representation of the 3-D frequency transformed frames 42, which are processed through a quantization engine 43 to form a collection of quantized, transformed frames 45. FIG. 7 is a graphical representation of a first method of quantization, which has been described in the prior art for use in spatial quantization. In this method, transformed coefficient values are reduced through division by a fixed value representing the relative visibility of each frequency coefficient. The fixed value in this example is q0. Transformed coefficients in the range (-q0/2 to q0/2) are quantized to the value 0; coefficients in the range (-3q0/2 to -q0/2) are quantized to the value -1; coefficients in the range (q0/2 to 3q0/2) are quantized to the value 1; etc.
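The first method is ordinary rounding division by the visibility value. A one-line sketch (the function name is an assumption):

```python
import math

def quantize_uniform(coeff, q0):
    """FIG. 7 quantizer: (-q0/2, q0/2) maps to 0, (q0/2, 3*q0/2) to 1,
    (-3*q0/2, -q0/2) to -1, and so on in uniform steps of q0."""
    return int(math.floor(coeff / q0 + 0.5))
```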
FIG. 8 is a graphical representation of a second method of quantization, which has also been described in the prior art for use in spatial quantization. In this method, two fixed values are used to quantize each frequency coefficient. The first of these values is used to reduce the frequency coefficient values centered around the origin, while the second of these values is used to reduce by division all other frequency coefficient values. In the example of FIG. 8, the central quantizing value is 3q0, while the fixed visibility value is q0, so that transformed coefficients in the range (-3q0/2 to 3q0/2) are quantized to 0. Coefficients in the range (-5q0/2 to -3q0/2) are quantized to -1; coefficients in the range (-7q0/2 to -5q0/2) are quantized to -2; coefficients in the range (3q0/2 to 5q0/2) are quantized to 1; coefficients in the range (5q0/2 to 7q0/2) are quantized to 2; etc.
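The second method widens only the central bin (a dead zone) and keeps uniform steps elsewhere. A sketch with hypothetical names, using the example's central value of 3*q0:

```python
import math

def quantize_deadzone(coeff, q0, center=3):
    """FIG. 8 quantizer: a dead zone of width center*q0 around the origin
    maps to 0; outside it, uniform steps of q0 as in the first method."""
    half = center * q0 / 2.0
    if abs(coeff) <= half:
        return 0
    # Count q0-wide steps beyond the edge of the dead zone.
    steps = math.floor((abs(coeff) - half) / q0) + 1
    return steps if coeff > 0 else -steps
```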
FIG. 9 is a graphical representation of a third method of quantization, which is novel. In this method, three fixed values are used to quantize each frequency coefficient. The first value is used to reduce the frequency coefficient values centered around the origin. Quantization of the other terms is done by exponential growth of the range of coefficients to be quantized, reflecting the logarithmic response of the human eye. The second value is used as a base value of the exponential, and the third value is used to determine the rate of growth of the exponential. In the example of FIG. 9, the central quantizing value is 3q0, while the fixed visibility value is q0 and the exponential growth rate is 1.4. Transformed coefficients in the range (-3q0/2 to 3q0/2) are quantized to the value 0; coefficients in the range (3q0/2 to 5q0/2) are quantized to the value 1; coefficients in the range (5q0/2 to 9q0/2) are quantized to the value 2; coefficients in the range (9q0/2 to 9q0) are quantized to the value 3; etc.
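A sketch of this exponential-bin quantizer follows. The exact way the growth rate widens successive bins is an assumption about the figure (each bin `rate` times wider than the previous one), the names are hypothetical, and the test uses a rate of 2 for exact arithmetic:

```python
def quantize_exponential(coeff, q0, center=3, rate=1.4):
    """FIG. 9-style quantizer (sketch): a dead zone of width center*q0
    around the origin maps to 0; beyond it, successive bins grow
    geometrically by `rate`, approximating the eye's logarithmic response."""
    half = center * q0 / 2.0
    mag = abs(coeff)
    if mag <= half:
        return 0
    level, upper, width = 0, half, q0
    while mag > upper:
        level += 1
        upper += width   # extend to the next bin boundary
        width *= rate    # each bin is `rate` times wider than the last
    return level if coeff > 0 else -level
```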
FIG. 10 is a flow chart showing a novel technique for the entropy coding and transmission of transform-based video. Each collection of quantized frames 45 is composed of a fixed number of frequency components. The frequencies to be transmitted are segregated into bands and processed through an entropy coder 46, which generates symbols from the frequency components, looks up the value to be encoded in a code table 47 for the band being compressed, and inserts the bit stream found into the packet 48 being constructed. Each band is associated with its own statistical encoding table 47 and
destination packet 48.
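The band-by-band coding of FIG. 10 can be illustrated with a toy per-band code table; a real coder would use Huffman or arithmetic codes, and the function names, the unary-style code, and the flat symbol layout here are all assumptions:

```python
from collections import Counter

def band_code_table(symbols):
    """Build a per-band prefix code: more frequent symbols get shorter
    bitstrings (a stand-in for the statistical encoding tables 47)."""
    ranked = [s for s, _ in Counter(symbols).most_common()]
    # Unary-style prefix codes: '0', '10', '110', ... shortest first.
    return {s: '1' * i + '0' for i, s in enumerate(ranked)}

def encode_band(symbols, table):
    """Concatenate each symbol's bitstring into the band's packet (48)."""
    return ''.join(table[s] for s in symbols)
```

Keeping a separate table per band lets each band's code match its own symbol statistics, which is the source of the compression gain the specification attributes to this scheme.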
FIGS. 11-13 are block diagrams showing sections of a video compression system embodying the current invention. Referring first to FIG. 11, a first section of the video compression system is driven by analog video source 49. Analog video data is digitized within an analog to digital converter 50 and captured within a two-port buffer 51, which includes a pair of buffer segments 51a. Frames are clocked into and out of the dual-port buffer 51 by means of a vertical blank interrupt signal driven from the analog video source 49 along line 51b. Eight by eight sub-blocks of each captured digitized video frame are then transformed through a two-dimensional transform engine 52, which may, for example, apply the 2-D frequency transform to form a transformed frame 53.
FIG. 12 is a block diagram of a second section of the video compression system embodying the current invention. In this second section, the transformed frame 53 from the first section of FIG. 11 is treated as a single component in a time domain, which is transformed into a frequency domain within a frequency transform circuit 54. A resulting eight by eight by eight transformed block is accumulated into a buffer 54a holding the data of eight frames transformed in this manner. As described in reference to FIG. 11, each of these eight frames has previously been transformed into eight by eight spatially differentiated segments. The buffer 54a forms half of a two-port memory 55. Data is driven into each of the buffers 54a in an alternating manner according to a signal on line 56, which occurs with every eighth vertical blank interrupt from the analog video source 49 (shown in FIG. 11). As data is driven into each buffer 54a, data from the other buffer 54a is quantized within a quantizer 57, using parameters dictated by a quantization table 58. Alternative methods of quantization have been discussed above in reference to FIGS. 7-9. The output of quantizer 57 is a quantized eight by eight by eight data block 59.
FIG. 13 is a block diagram of a third section of the video compression system embodying the current invention. The quantized data block 59 from FIG. 12 is divided by frequency within a band filter 60 into bands of frequency coefficients 61. The coefficients in each of these bands are independently collected as symbols 62, which are formed into streams of symbols 63 for encoding. These symbols are next looked up in
an entropy coding table 64 by an entropy encoder 65, which inserts bitstrings into data packets 66 which form a transmission data stream 67. The transmission data stream 67 may be sent, for example, to a remote system for decompression, or it may be sent to a storage device, such as a hardfile, within the computer system in which it is generated, to be stored for later use.
FIGS. 14 and 15 are block diagrams showing the operation of sections of a video decompression system embodying the current invention. Referring first to FIG. 14, which shows a first section of the video decompression system, data from the transmission stream 67, which may, for example, be sent from a remote system or from a storage device within the computer system performing the decompression, is collected into received data packets 68, which are separated from one another according to the band of frequencies represented within the individual packets. These separated packets are driven into entropy decoders 69, each of which decodes information within the packets by referring to an entropy coding table 70. The output of each entropy decoder 69 is a stream of decoded symbols 71, which is subsequently converted within a symbol decoder 72 into frequency coefficients. These frequency coefficients are subsequently accumulated into a cache 73 of eight by eight by eight blocks. As these individual blocks in the cache 73 are filled, they leave the cache 73 as completed eight by eight by eight blocks 74.
FIG. 15 is a block diagram showing a second section of the video decompression system embodying the current invention. In this section, each completed block 74 of quantized frequency coefficients is dequantized with a dequantizer 75, making reference to a quantization table 76, into blocks of dequantized frequency coefficients 77. In block
78, a three-dimensional inverse frequency transform is performed on each block of frequency coefficients 77, returning an eight by eight by eight uncompressed block of video data 78a. These blocks of data 78a are then cached within a jitter buffer 79. The contents of the jitter buffer 79 are fed, in an alternating manner, into the two sides of a two-port buffer 80 having a pair of buffer segments 80a. Each buffer segment 80a of the two-port
buffer 80 holds data representing eight video frames. Switches 81 clock data into and out of the buffers within two-port buffer 80 every eight frame times. As a block of frames is assembled into one side of the two-port buffer 80, data is driven from the other side into a digital to analog converter 82, in which the digital data is converted to an analog signal driving the analog video device 83, with one reconstructed frame of data being driven into the analog video device 83 upon receipt of each vertical blank interrupt therefrom.
The present invention has an advantage over the conventional method, described in reference to FIGS. 1 and 2, of not requiring searches, and of consequently being relatively fast and efficient. The conventional method has to look from one frame to another for best matches. The present invention also has an advantage of better reducing temporal redundancy than the conventional method, so that better compression ratios can be achieved. The present invention also has an advantage of having a simpler algorithm, so that it can be more easily implemented. The symbol encoding scheme and encoding tables are specific to each band of frequencies to be encoded, allowing greater compression ratios than those achieved by fixed encoding schemes. Furthermore, the algorithm of the present invention is symmetric, so that it is particularly suitable for real-time two-way communication, and the encoding scheme of the algorithm is insensitive to transmission noise.

While the invention has been described in its preferred form or embodiment with some degree of particularity, it is understood that this description has been given only by way of example and that numerous changes in the details of construction, fabrication, and use, including the combination and arrangement of parts and method steps, may be made without departing from the spirit and scope of the invention. For example, other quantization methods and other entropy encoding methods may be used within the present invention.