WO1997002705A1

WO1997002705A1 - Method and apparatus for hierarchical representation and compression of data

Info

Publication number: WO1997002705A1
Application number: PCT/AU1996/000426
Authority: WO
Inventors: Donald James Bone; Franklin George Horowitz; Jan Paul Veldkamp
Original assignee: Commonwealth Scientific And Industrial Research Organisation
Priority date: 1995-07-05
Filing date: 1996-07-05
Publication date: 1997-01-23
Also published as: AUPN399395A0

Abstract

The present invention relates to a method and apparatus for hierarchically representing data P (31). The data P (31) comprises an MxN array of datapoints, where M and N each are an integer greater than one. The data P (31) are sub-band coded to provide a plurality of coefficient sets (41, 51, ..., 81) where each coefficient set (41, 51, ..., 81) has a different scale (σ = 1, 2, ..., 5). The coefficient sets C (41, 51, ..., 81) are then interscale coded at least once to obtain a set of coefficient vectors and residual vectors R to represent the image data P (31). Preferably, the interscale coding involves primary interscale coding the plurality of coefficient sets C (41, 51, ..., 81) and secondary interscale coding the primary interscale coded plurality of coefficient sets (41, 51, ..., 81). Preferably, the sub-band coding involves base image coding that is applied at least twice. The base image coding can be selected from the group consisting of a Haar transform, a Fourier transform, and a Sample-Difference transform, for example. The set of coefficient vectors and the residual vectors are then quantised and encoded using run-length encoding and entropy encoding.

Description

METHOD AND APPARATUS FOR HIERARCHICAL REPRESENTATION AND COMPRESSION OF DATA

Field of the Invention

The present invention relates generally to data representation and compression/decompression, and more particularly, to methods and apparatuses for compressing and decompressing image data.

Background of the Invention

The representation and/or the compression of data, such as images or sound, can broadly be divided into two general classes. The first class, commonly called "lossless" compression, allows for an exact reconstruction of the original data from the compressed (or represented) data in the reconstruction or decompression process. Lossless compression normally proceeds by removal of real redundancy in the data. Two broad categories of lossless compression are entropy encoding and dictionary techniques. Entropy encoding techniques assign a variable bit-length code to each piece, or unit, of fixed-length data provided as input, dependent upon a statistical analysis of the input data. In broad terms, such techniques encode frequently occurring data using shorter variable-length codes, whereas infrequently occurring data are assigned longer variable-length codes. Examples of entropy encoding techniques include Huffman coding, Shannon-Fano coding, arithmetic coding, etc.
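By way of illustration only, the following minimal Python sketch shows how a Huffman code assigns shorter codes to more frequent symbols. The function name `huffman_code_lengths` and the tie-breaking scheme are assumptions made for this example and are not taken from the specification.

```python
import heapq
from collections import Counter

def huffman_code_lengths(data):
    """Return {symbol: code length in bits} for a Huffman code built from data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tie_breaker, {symbol: depth}); the unique
    # tie_breaker ensures the dictionaries are never compared.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

lengths = huffman_code_lengths("aaaabbc")
total_bits = sum(lengths[s] for s in "aaaabbc")
```

In this example the most frequent symbol receives a one-bit code and the two rarer symbols receive two-bit codes, so the seven symbols occupy 10 bits rather than the 14 bits needed by a fixed two-bit code.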

The second category of lossless compression techniques is referred to as "dictionary" or substitutional techniques. In dictionary techniques, a sequence of data is replaced by a token, or other appropriate type of indexing reference, that is smaller in size and refers to a previous occurrence of the current data sequence contained in the input data stream. Thus, previous occurrences of data sequences found in the input file, which can be represented by a token, or the like, act as a "dictionary" for subsequent, redundant occurrences of the same data sequences (or portions thereof). Examples of dictionary/substitutional techniques include the LZ77 and LZ78 techniques developed by Lempel and Ziv, and the LZW technique developed by Terry Welch. Still another conventional compression technique is run-length encoding, which is well-known to persons skilled in the art of compression methods.
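Run-length encoding, mentioned above, can be sketched in a few lines of Python. The function names `rle_encode` and `rle_decode` and the (value, count) pair representation are assumptions chosen for this illustration.

```python
def rle_encode(seq):
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs = []
    for x in seq:
        if runs and runs[-1][0] == x:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([x, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

encoded = rle_encode([0, 0, 0, 5, 5, 1])
```

The scheme is lossless: decoding the pairs reproduces the input exactly, and it is most effective on data with long runs, such as quantised coefficients that are mostly zero.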

In contrast to lossless techniques, a second class of techniques, known as "lossy" techniques, represents or compresses data in such a manner that, upon decompression or reconstruction, the decompressed or reconstructed data vary in relation to the original data. That is, information is lost which cannot be recovered. Lossy techniques are typically used in applications where it is not necessary to reconstruct the original data exactly, and a close approximation to the original data is, in most cases, suitable. Such techniques compress input by taking advantage of known perceptual limitations. For instance, lossy techniques are commonly used to compress/decompress image data. In particular, compression is achieved by exploiting limitations of the human eye in perceiving fine details of colour and contrast.

One well-known conventional lossy technique for compressing images is JPEG compression, defined by the Joint Photographic Experts Group (JPEG), which has been widely adopted for compressing still images. The JPEG compression technique takes advantage of limitations of the human eye to effect compression of an input still image. The JPEG compression and decompression techniques are generally illustrated in Figs. 1A and 1B. Input data 1 of a first signal, or image, space is provided as input to a Discrete Cosine Transform (DCT) block 4 to transform, or map, the input data 1 to a second signal space. It will be apparent to a person skilled in the art that the input data 1 comprises a still image consisting of a plurality of pixels organised essentially into a two-dimensional array. That is, the input data 1 consists of an image arranged as an MxM block shown in Fig. 2. As is well known in the art, the input data 1 is typically parsed into 8x8 blocks of data before being transformed into the second signal space by the DCT block 4. The input data 1 is parsed into a sequence of 8x8 pixel blocks 22, 23, 24, etc. The sequence 28 of blocks 22, 23, 24, etc. is then provided on a block-by-block basis to the DCT block 4. The coefficients 5 produced by the DCT block 4 contain the information of the input data 1 organised according to the frequency content of the input data 1.

Further, the input data 1 can be represented in any of the well-known image spaces such as greyscale space, RGB colour space, the YUV colour space, etc. The JPEG compression technique illustrated in Fig. 1A can comprise a colour space transformation block (not shown) which transforms the input data 1 from a first colour space to a second colour space before providing the input data 1 to the DCT block 4. The coefficients 5 produced by the DCT block 4 are provided to a coefficient quantisation block 6 (Fig. 1A) in which compression is effected by quantising the coefficients 5. Typically, a number of different quantisation methods are used to better take advantage of the characteristics of the input image 1 and the coefficients 5 produced by the DCT block 4 in relation to limitations of visual perception. The quantisation methods, for example, include Differential Pulse Code Modulation (DPCM), as well as other well-known quantisation methods.
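The transform and quantisation stages described above can be sketched, for illustration only, as follows. The sketch uses a direct orthonormal two-dimensional DCT-II on one 8x8 block and a single uniform quantisation step. The uniform step is a simplifying assumption: JPEG itself uses a table of per-coefficient step sizes, and the function names `dct_2d` and `quantise` are hypothetical.

```python
import math

N = 8  # block size used by the conventional technique described above

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct O(N^4) evaluation)."""
    def alpha(u):
        return math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def quantise(coeffs, step):
    """Uniform scalar quantisation (assumed here; not the JPEG tables)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

flat_block = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat_block)
quantised = quantise(coeffs, 16)
```

For a block of constant intensity, all of the energy falls into the single lowest-frequency coefficient and every other quantised coefficient is zero, which is what makes the subsequent lossless encoding stage effective.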

The quantised output 7 of the coefficient quantisation block 6 is then provided to a lossless encoding block 8. In the JPEG technique, lossless encoding is implemented using either Huffman coding or arithmetic coding. Thus, the lossless encoding block 8 attempts to further compress the quantised coefficients 7 to produce the compressed data 11 output by the JPEG compression technique.

The JPEG decompression technique is illustrated in Fig. 1B, in which the compressed data 11 is provided to a decoding block 13 which losslessly decodes the compressed data 11. The losslessly decoded output 14 is provided to an Inverse Discrete Cosine Transform (IDCT) block 15 which transforms or maps the decoded output 14 from the second signal space to the first signal space to produce the decompressed data 18. As is well-known in the art, the decoded output 14 is dequantised by a dequantisation block (not shown) to reconstruct the coefficients before being applied to the IDCT block 15. Further, in the event that a colour space transformation was applied to the input data 1 during compression, the output of the IDCT block 15 would analogously be provided to an inverse colour transformation block (not shown) to generate the decompressed data 18 which reconstructs the original input data 1.

While the JPEG compression technique utilises the Discrete Cosine Transform, there are a number of other well known block transforms that can be utilised in block 4 of Fig. 1A (and correspondingly in block 15 of Fig. 1B) to implement other analogous compression techniques. Such block transforms include the Karhunen-Loeve, Fourier, Hadamard and Haar transforms, etc.

A further standard, known as the MPEG standard developed by the Moving Picture Experts Group, has also become a popular form of video image compression. This technique is used in particular to compress moving pictures and audio. The MPEG standard also utilises the Discrete Cosine Transform for the compression of images. This technique compresses moving pictures or video by taking advantage of the redundant information contained between frames by predicting motion from one frame to another.

Yet another conventional compression technique is fractal compression. This involves carrying out a sequence of transformations on the input data that cause the data to contract to an attractor, or fixed point, as is well-known in the art. The attractor approximates the data, and the technique thereby takes advantage of self-similarity in the image. In this technique, it is the transformations that are stored to compress the data. That is, to reconstruct the input data, the known transformations, which have a specific nature and content, are applied in an iterative manner to obtain the attractor, which is a close approximation of the original input data.

A significant disadvantage of this conventional technique is the requirement to extensively search the input data over possible domains to minimise the error in representing a given range block. Even though this time consuming and complex search is carried out, fractal compression techniques typically are not able to exactly represent the input data. Other forms of representation of data and methods of compression are known. Each form of compression has its own performance trade-offs with a number of different performance factors being relevant, including compression factors, time to compress and decompress, and memory utilisation in compression and decompression, etc.

Although the discussion will hereinafter be continued in relation to images, it will be readily apparent to those skilled in the art that the present invention is not limited to images and is readily applicable to other forms of data. Furthermore, although two dimensional data is referred to in the following description, higher dimensional data sets can easily be accommodated by those skilled in the art with analogous methods.

Summary of the Invention

In accordance with a first aspect of the invention, there is provided a method of hierarchically representing data, wherein the data comprises an MxN array of datapoints, M and N each being an integer greater than one, the method comprising the steps of: sub-band coding the data to provide a plurality of coefficient sets, each coefficient set having a different scale; and interscale coding the plurality of coefficient sets at least once to obtain a set of coefficient vectors and residual vectors to represent the image data.

Preferably, the interscale coding comprises the steps of primary interscale coding the plurality of coefficient sets, and secondary interscale coding the primary interscale coded plurality of coefficient sets. The interscale coding represents higher-order coefficients as linear combinations of lower-order coefficients to provide the set of coefficient vectors and the residual vectors. The step of sub-band coding comprises base image coding, and the base image coding is applied at least twice and can be selected from the group consisting of a wavelet transform, a Haar transform, a Fourier transform, and a Sample-Difference transform. The method can further comprise the steps of quantising the set of coefficient vectors and the residual vectors, and encoding the quantised set of coefficient vectors and the residual vectors. Optionally, the step of encoding comprises the further steps of run-length encoding the set of coefficient vectors and residual vectors, and entropy encoding the run-length encoded vectors, wherein the entropy encoding is selected from the group consisting of Huffman coding and arithmetic coding.

In accordance with a second aspect of the invention, there is provided an apparatus for hierarchically representing data, wherein the data comprises an MxN array of datapoints, M and N each being an integer greater than one, the apparatus comprising: means for sub-band coding the data to provide a plurality of coefficient sets, each coefficient set having a different scale; and means for interscale coding the plurality of coefficient sets at least once to obtain a set of coefficient vectors and residual vectors to represent the image data.

In accordance with a third aspect of the invention, there is provided a method of compressing digital image data, the method comprising the steps of: iteratively base image coding the image data to provide a plurality of coefficient sub-images, each sub-image having a different scale; interscale coding the plurality of coefficient sub-images to provide coefficient and residual vectors; quantising the coefficient and residual vectors; and encoding the quantised coefficient and residual vectors to provide hierarchically compressed image data. The step of interscale coding preferably further comprises the steps of primary interscale coding the plurality of coefficient sub-images, and secondary interscale coding the primary interscale coded coefficient sub-images. The step of base image coding can optionally comprise the steps of projecting blocks of the image data onto a number of basis vectors, interpolating a set of node values for a predetermined datapoint of each block, and orthogonalising blocks containing the predetermined datapoint of each block.

Alternatively, the step of interscale coding the plurality of coefficient sub-images comprises the further steps of predicting HL, LH and HH sub-images dependent upon a lowest LL sub-image to produce prediction kernels, and generating sub-image residuals dependent upon the HL, LH and HH sub-images and the prediction kernels, wherein primary interscale data comprise the lowest LL sub-image, the prediction kernels and the sub-image residuals.

In accordance with a fourth aspect of the invention, there is provided an apparatus for compressing digital image data, comprising: means for iteratively base image coding the image data to provide a plurality of coefficient sub-images, each sub-image having a different scale; means for interscale coding the plurality of coefficient sub-images to provide coefficient and residual vectors; means for quantising the coefficient and residual vectors; and means for encoding the quantised coefficient and residual vectors to provide hierarchically compressed image data.

Brief Description of the Drawings

Figs. 1A and 1B are block diagrams illustrating the JPEG compression and decompression techniques, respectively;

Fig. 2 is a block diagram illustrating parsing of input data;

Figs. 3A and 3B are diagrams illustrating an input image and a corresponding coefficient image;

Figs. 4A and 4B are diagrams illustrating base coding of the input image of Fig. 3A;

Fig. 5 is a block diagram illustrating scales of base encoding;

Fig. 6 is an overview of the base encoded input data of Fig. 3A;

Fig. 7 is a flow diagram illustrating the method of the first embodiment;

Fig. 8 is a flow diagram illustrating further steps of quantisation and encoding;

Fig. 9 is a tree diagram illustrating the hierarchy of levels shown in Figs. 5 and 6;

Fig. 10 is a diagram illustrating a second embodiment;

Fig. 11 is a flow diagram illustrating a third embodiment for encoding data;

Fig. 12 is a flow diagram illustrating a method of decoding data encoded using the method of Fig. 11; and

Fig. 13 is a block diagram of a general purpose computer that can be used to implement the embodiments of the invention.

Detailed Description

The embodiments of the invention provide an apparatus and a method for representing and/or encoding data using hierarchical transforms. In the following description, numerous specific details, such as block sizes, lossless encoding techniques, colour spaces, etc., are described in detail to provide a more thorough description of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the embodiments of the invention can be practised without these specific details. In other instances, well-known features have not been described in detail so as not to unnecessarily obscure the description of the embodiments of the invention.

The image or data is represented and/or encoded using hierarchical transforms that take advantage of the self similarity between images of different scales as will be described. Using a hierarchical encoding approach, the embodiments of the invention provide effective methods that are able to take advantage of local self similarity in input data, and are also able to exactly represent the input data, or image, where desired.

The hierarchical nature of the embodiments leads to a scale-based ordering of the data which also lends itself to selective quantisation on the basis of perceptual redundancy. Thus, appropriate quantisation of the resulting representation can achieve the desired compression in a controlled manner.

Image Representation

The class of transforms utilised in the embodiments of the invention allows for a number of variations. However, operation of the embodiments requires that input data 31, illustrated in Fig. 3A and generally denoted as image (P), be partitioned into a number of small blocks. For purposes of illustration only, the image (P) 31 consists of 64 pixels arranged in an 8x8 two-dimensional array. It will be apparent to a person skilled in the art that the input data can have various levels of resolution, e.g. 512x512, 640x480, 1024x768, 1280x1024, etc., without departing from the scope of the invention. The image (P) 31 is preferably partitioned into small blocks comprising 2x2 blocks of pixels.

The image (P) 31 includes a 2x2 block 34 comprising pixels $P_{0,0}$, $P_{0,1}$, $P_{0,2}$ and $P_{0,3}$, and a second such block 35 comprising pixels $P_{1,0}$, $P_{1,1}$, $P_{1,2}$ and $P_{1,3}$. In a similar manner, the remaining pixels of the image (P) 31 are partitioned into 2x2 blocks. The last 2x2 block 38 comprises pixels $P_{15,0}$ to $P_{15,3}$. The pixels contained in each of the small blocks 34, 35, ..., 38 are preferably processed as a single vector of pixels, e.g. $(P_{0,0}, P_{0,1}, P_{0,2}, P_{0,3})$. Thus, if each of the 2x2 blocks 34, 35, ..., 38 contained in the image (P) 31 is treated as an array of pixels, the image (P) 31 can in turn be represented by an array of these small blocks. Thus, the image (P) 31 can be defined by the following expression:

$$\{P_{k,i} \mid k = 0, \ldots, K-1;\ i = 0, \ldots, I-1\} \qquad (1)$$

where k represents the block number, i represents the pixel number within the particular block k, and the input image (P) 31 consists of K×I pixels. Each pixel of the image (P) 31, shown in Fig. 3A, is enumerated with corresponding subscripts k and i.
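The block-and-pixel indexing of expression (1) can be illustrated with the following short Python sketch. The raster ordering of blocks and the row-major ordering of pixels within each block are assumptions made for this example, since the ordering is fixed by Fig. 3A rather than by the text. The function name `partition_blocks` is hypothetical.

```python
def partition_blocks(image, block=2):
    """Return a list P where P[k][i] is pixel i of block k.

    Blocks are visited in raster order and pixels inside a block are
    listed row by row (an assumed ordering).
    """
    rows, cols = len(image), len(image[0])
    blocks = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            blocks.append([image[r + dr][c + dc]
                           for dr in range(block)
                           for dc in range(block)])
    return blocks

# An 8x8 test image whose pixel value encodes its raster position.
image = [[8 * r + c for c in range(8)] for r in range(8)]
P = partition_blocks(image)
```

For the 8x8 image used in the description this yields K = 16 blocks of I = 4 pixels each, so that `P[k][i]` plays the role of $P_{k,i}$.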

Each block 34, 35, ..., 38 of the image (P) 31 is expanded using a set of orthonormal basis vectors. That is, a block transform is performed on each 2x2 block. The nature of the basis vectors is variable and is not set by the overall embodiments of the invention. The basis vectors can be predetermined and fixed. Well-known transforms such as the Haar, Hadamard, Fourier, or Karhunen-Loeve transforms, etc., can be utilised. Alternatively, they can be derived from the input data or image in other ways. Non-orthonormal basis vectors can also be used, since the primary consideration is that the expansion produces a sub-band like encoding of the image (P) 31. The 2x2 block size is preferable in relation to the speed at which transformations can be applied to the block. However, it will be apparent to a person skilled in the art that the size of the small block can be varied.

As will be described, the subband coding does not have to result from a block partitioning of the image (P) 31. The only requirement in relation to inter-scale coding is that the transformed representation be in the form of a set of equal-sized subband images. In this way, the embodiments advantageously utilise the redundancy between scales by estimating the higher-order subbands in the decomposition from the low-order subbands. As will be described, wavelet transforms, for example, can provide the subband structure, but do not result from a block partitioning of the image (except in the case of the Haar basis). Any representation of the image which can be considered to be a set of equally sized coefficient images from which the image can be exactly reconstructed can serve as the Base Image Coding mechanism, described below. For example, it can be implemented using a more general wavelet basis in which the basis vectors for neighbouring blocks overlap spatially. A separable orthogonal wavelet basis can be used, such as the Daubechies wavelets, which provide four subbands from each scale analysis of the image (P) 31. The corresponding basis functions could have larger support than 2x2 pixels.

The embodiments of the invention can also be implemented using less complex schemes. For example, a 2x2 block can be used in which the four pixel values in the block act as the subband components (the pixel basis). This results in four similar images as the subbands. Thus, it cannot be considered to be a frequency subband representation. This works well in combination with the primary interscale coding, although quantisation artefacts tend to be localised.
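As an illustration of the pixel basis just described, the following sketch splits an image into four subimages, each taking one of the four pixel positions of every 2x2 block. The function name `pixel_basis_subbands` and the ordering of the four subimages are assumptions made for this example.

```python
def pixel_basis_subbands(image):
    """Split an image into four subimages, one per position in each 2x2 block.

    The returned order is top-left, top-right, bottom-left, bottom-right
    (an assumed ordering).
    """
    return [[row[dc::2] for row in image[dr::2]]
            for dr in (0, 1)
            for dc in (0, 1)]

image = [[4 * r + c for c in range(4)] for r in range(4)]
subbands = pixel_basis_subbands(image)
```

Each of the four subimages is a subsampled copy of the original, which is why they resemble one another and why this scheme is not a frequency subband representation.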

Another approach based on 2x2 blocks uses the top left pixel in the 2x2 block as the low frequency (LL) components, described below, and derives the other three components from the difference between the true pixel values and a smooth interpolation of the LL components. This "sample and difference" approach corresponds to a non-orthogonal basis expansion.

A Discrete Cosine Transform could also be used which will produce 8x8 subbands. The lowest frequency 4x4 subbands are then used to predict the other subbands. The lowest 4x4 subbands are then decoded and reblocked into 8x8 blocks that produce a new set of 8x8 subbands. This provides a subband hierarchy based on the DCT. Because redundancy is removed in the interscale coding step and some of the low frequency interblock redundancy is captured, this representation may be more compressive than the simple DCT.

Base Coding

The process of image decomposition that will be described is known as "Base Coding". In an original image (P) 31 shown in Fig. 3A, each pixel $P_{k,i}$ can be represented as follows:

$$P_{k,i} = \sum_{m=0}^{M-1} C_k^{(m)} B_{k,i}^{(m)} \qquad (2)$$

where M represents the number of basis vectors. Thus, each pixel $P_{k,i}$ is defined as the sum of weighted basis vectors in which the i-th component of the m-th basis vector for the k-th block has the value $B_{k,i}^{(m)}$. As the basis vectors are known, each pixel can be represented by a set of coefficients:

$$\{C_k^{(m)} \mid k = 0, \ldots, K-1\} \qquad (3)$$

where each coefficient is expressed for the corresponding m-th basis vector of the k-th block.

The coefficient image, or representation, (C) 41 shown in Fig. 3B has 64 coefficients $C_k^{(m)}$ that are organised into four sets of coefficients (i.e., $C^{(0)}$, $C^{(1)}$, $C^{(2)}$, $C^{(3)}$) 43-46 for corresponding basis vectors (m = 0, ..., 3). In particular, coefficient set 43 comprises 16 coefficients for the 0th basis vector (m = 0) for the 0th through 15th blocks (k = 0, ..., 15) of the input data (P) 31. Thus, the first coefficient $C_0^{(0)}$ contained in coefficient set 43 is the coefficient for the 0th basis vector of the 2x2 block 34, which comprises pixels $P_{0,0}$ to $P_{0,3}$ as shown in Fig. 3A. Similarly, the coefficient sets 44, 45 and 46 have the corresponding first, second and third basis vector coefficients $C_0^{(1)}$, $C_0^{(2)}$ and $C_0^{(3)}$ for the block 34 in the upper, left location of each coefficient set 44-46. For the 2x2 block (k = 0) 34, the coefficients $C_0^{(0)}$ to $C_0^{(3)}$ contained in coefficient sets 43-46, respectively, are determined using Equations (2) and (3) as shown in Fig. 4A.

The input image (P) 31 and the coefficient sets 43-46 are indicated by boxes in Figs. 4A and 4B, and the coefficient $C_0^{(m)}$ in each coefficient set 43-46 is indicated by a corresponding arrow (m = 0, ..., 3) extending from the 2x2 block 34. Coefficients $C_1^{(m)}$ for 2x2 block (k = 1) 35 are generated similarly as shown in Fig. 4B. This process is repeated until the entire input image 31 is expanded using the basis vectors.

The first basis vector (m = 0) of the four basis vectors (m = 0, ..., 3) is preferably chosen to represent the low frequency characteristics, or behaviour, of the input data, or image, (P) 31 shown in Fig. 3A. Thus, the resulting coefficient set, or subimage, 43 shown in Fig. 3B for the $C^{(0)}$ case is similar in characteristics to the original image (P) 31, but at one half of the scale. To simplify discussion of various levels in the hierarchy, an index σ can be defined to be $\sigma = -\log_2(\text{scale})$. The image (P) 31 can be indicated in this connection by σ = 0 and the coefficient image 41 by σ = 1. For the first level of the hierarchy (σ = 1), there are four sets of coefficients $C^{(0)}$, $C^{(1)}$, $C^{(2)}$ and $C^{(3)}$ in coefficient image 41.

The first of the four basis vectors (m) of a decomposition is chosen to capture the low frequency behaviour of the "image" that is being operated on (this can be either an original image or a coefficient image) so that the resulting coefficient image will be very similar to the original image, but at one half the scale. The coefficient set $C^{(0)}$ 43 captures the low frequency components, the low-low components (LL), of the image (P) 31 in both the x and y directions. The three remaining coefficient sets 44, 45 and 46 of coefficient image 41 contain the higher frequency details of the original image (P) 31. The higher frequency details consist of the high x-low y (HL), low x-high y (LH), and high x-high y (HH) frequency characteristics, respectively. To reconstruct the original image (P) 31 from the coefficient image 41, for each block, weight the basis blocks by the corresponding coefficients and add them, and repeat the process for all blocks.
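A minimal sketch of base coding and exact reconstruction is given below, assuming the orthonormal 2x2 Haar basis (one of the transforms named above) and a row-major ordering of the pixels within each block. The names `HAAR`, `base_code` and `base_decode` are hypothetical and chosen for this illustration only.

```python
# Orthonormal 2x2 Haar basis vectors, components ordered
# (top-left, top-right, bottom-left, bottom-right) -- an assumed ordering.
HAAR = [
    [0.5, 0.5, 0.5, 0.5],     # m = 0: LL (low x, low y)
    [0.5, -0.5, 0.5, -0.5],   # m = 1: HL (high x, low y)
    [0.5, 0.5, -0.5, -0.5],   # m = 2: LH (low x, high y)
    [0.5, -0.5, -0.5, 0.5],   # m = 3: HH (high x, high y)
]

def base_code(image):
    """Project every 2x2 block onto the basis; return four half-scale subimages."""
    rows, cols = len(image), len(image[0])
    subs = [[[0.0] * (cols // 2) for _ in range(rows // 2)] for _ in range(4)]
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            block = [image[r][c], image[r][c + 1],
                     image[r + 1][c], image[r + 1][c + 1]]
            for m in range(4):
                subs[m][r // 2][c // 2] = sum(b * p for b, p in zip(HAAR[m], block))
    return subs

def base_decode(subs):
    """Weight the basis vectors by the coefficients and add them, block by block."""
    rows, cols = len(subs[0]) * 2, len(subs[0][0]) * 2
    image = [[0.0] * cols for _ in range(rows)]
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            coeffs = [subs[m][r // 2][c // 2] for m in range(4)]
            block = [sum(coeffs[m] * HAAR[m][i] for m in range(4)) for i in range(4)]
            image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1] = block
    return image
```

With this basis a block of constant intensity produces a non-zero LL coefficient only, and decoding the four subimages returns the original pixel values up to floating-point rounding.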

For example, the pixels within block k of the original image (P) 31 of Fig. 3A are determined as follows:

$$P_{k,i} = \sum_{m=0}^{3} C_k^{(m)} B_{k,i}^{(m)}, \qquad i = 0, \ldots, 3 \qquad (4)$$

The coefficient sets 43-46, for a given basis vector m, shown in Fig. 3B can, in turn, be treated as images. That is, each coefficient image 43-46 can be further divided into small blocks (e.g. 2x2 blocks) and expanded using a set of basis vectors in the same manner as described above in relation to the original image (P) 31. Each coefficient image 43-46 is expressed as follows:

$$C_{k',i}^{(m)} = \sum_{n=0}^{3} C_{k'}^{(mn)} B_{k',i}^{(mn)} \qquad (5)$$

where k' is the block number for the small block in the coefficient image and mn indicates the basis vector (e.g., $B^{(00)}$). Expansion of such a coefficient image 41 using the basis vectors $B^{(mn)}$ produces four further coefficient images 53-56. These coefficient images 53-56 are one quarter the scale (σ = 2) of the original image (P) 31 shown in Fig. 3A.

The four sets of coefficients 53-56 shown in Fig. 5 are denoted with the index $C^{(0n)}$, where n is the respective basis vector. The coefficient set $C^{(00)}$ resulting from the first basis vector (n = 0) captures the low frequency x and y components of the coefficient image 43. The other coefficient sets 54-56 are utilised to capture the higher frequency components of the coefficient image 43. From $C^{(0)}$, four 1/4 scale (σ = 2) images $C^{(00)}$, $C^{(01)}$, $C^{(02)}$ and $C^{(03)}$ are derived, as well as $C^{(10)}$, $C^{(11)}$, $C^{(12)}$ and $C^{(13)}$, etc. In the next level of the hierarchy (σ = 3), the four images $C^{(000)}$, $C^{(001)}$, $C^{(002)}$ and $C^{(003)}$ are derived from image 61. If the residuals from the interscale coding of the other subbands contain redundant information, the Base Coding and Primary Interscale Coding can also be applied to those "images".
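The repeated decomposition of the low frequency subimage can be sketched as follows. The sketch assumes the orthonormal 2x2 Haar basis and, for brevity, decomposes only the LL subimage at each level, whereas the description allows any coefficient image to be decomposed further. The names `haar_split` and `decompose` are hypothetical.

```python
def haar_split(img):
    """One level of 2x2 Haar analysis; returns (LL, HL, LH, HH) at half scale."""
    h, w = len(img) // 2, len(img[0]) // 2
    ll = [[0.0] * w for _ in range(h)]
    hl = [[0.0] * w for _ in range(h)]
    lh = [[0.0] * w for _ in range(h)]
    hh = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            a, b = img[2 * r][2 * c], img[2 * r][2 * c + 1]
            d, e = img[2 * r + 1][2 * c], img[2 * r + 1][2 * c + 1]
            ll[r][c] = (a + b + d + e) / 2
            hl[r][c] = (a - b + d - e) / 2
            lh[r][c] = (a + b - d - e) / 2
            hh[r][c] = (a - b - d + e) / 2
    return ll, hl, lh, hh

def decompose(image, levels):
    """Repeatedly split the LL subimage; sigma labels the level in the hierarchy."""
    pyramid = []
    current = image
    for sigma in range(1, levels + 1):
        ll, hl, lh, hh = haar_split(current)
        pyramid.append({"sigma": sigma, "HL": hl, "LH": lh, "HH": hh})
        current = ll
    return current, pyramid

flat = [[1] * 8 for _ in range(8)]
lowest_ll, pyramid = decompose(flat, 3)
```

Each level halves the side length of the subimages, so an 8x8 input supports at most three such levels with 2x2 blocks.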

The above process can be repeatedly applied to each of the coefficient images. For example, Fig. 9 illustrates the decomposition process for the LL subband 43 of the coefficient image 41 (σ = 1) of Fig. 3B into four further levels of coefficient images 51, 61, 71 and 81 (σ = 2, ..., 5) in the manner described above in relation to Fig. 5. The hierarchical process of decomposition from the σ = 1 level to the σ = 5 level is illustrated in Fig. 6, wherein the decomposition of the coefficient images (41, 51, 61, 71, 81) is shown.

Primary Interscale Coding

After base coding the image (P) 31, the resulting data is interscale coded.

Examination of the various coefficient images (e.g. 53-56 of image 51) indicates that there is a strong resemblance between some of the coefficient images. For example, the image $C^{(01)}$ 54 often bears a strong resemblance to coefficient image $C^{(10)}$. Similarly, $C^{(02)}$ often bears a strong resemblance to $C^{(20)}$. Any two subimages having indices which are a permutation of each other tend to have some similarity between them.

Thus, there is clearly redundant information contained in the higher order coefficient images. As the number of coefficient images increases in an exponential fashion with increasing values of σ, the images can be encoded at a higher level into the hierarchy to increase the number of self-similar blocks.

For example, the higher order coefficient images $C^{(lmn)}$, where $l \neq 0$, can therefore be expressed as linear combinations of the lower order coefficient images $C^{(0jk)}$ as follows:

$$C^{(lmn)} = \sum_{j,k} a_{jk}^{(lmn)}\, C^{(0jk)} + R^{(lmn)} \qquad (6)$$

where $R^{(lmn)}$ is the residual with respect to the partial expansion of the coefficient image $C^{(lmn)}$. The expansion need not be complete and, for a given class of images, a small subset of the coefficient images $C^{(0jk)}$ can sufficiently represent a given coefficient image. When an adequate representation of the coefficient image $C^{(lmn)}$, where $l \neq 0$, is obtained, the residual $R^{(lmn)}$ has a much lower information content than the corresponding coefficient image $C^{(lmn)}$.
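The expression of a higher order coefficient image as a linear combination of lower order coefficient images plus a residual can be illustrated by the sketch below. It assumes that each coefficient image has been flattened into a list of numbers, that the weights are chosen by least squares, and that the predictor images are linearly independent. The text does not state how the weights are chosen, so the least-squares criterion and the names `solve` and `interscale_fit` are assumptions of this example.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def interscale_fit(target, predictors):
    """Least-squares weights a_j and residual R with target = sum_j a_j * predictors[j] + R."""
    gram = [[sum(p * q for p, q in zip(pi, pj)) for pj in predictors]
            for pi in predictors]
    rhs = [sum(p * t for p, t in zip(pi, target)) for pi in predictors]
    a = solve(gram, rhs)
    approx = [sum(a[j] * predictors[j][i] for j in range(len(a)))
              for i in range(len(target))]
    residual = [t - s for t, s in zip(target, approx)]
    return a, residual

# Two flattened lower order subimages and a target built from them.
low0 = [1.0, 0.0, 2.0, 1.0]
low1 = [0.0, 1.0, 1.0, 3.0]
high = [2.0, -1.0, 3.0, -1.0]   # equals 2*low0 - 1*low1
weights, residual = interscale_fit(high, [low0, low1])
```

When the target is well explained by the predictors, only the few weights and a near-zero residual need to be stored, which is the source of the reduction in information content described above.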

After representing the higher order subimages using a set of coefficients and their residuals, the lower-order subimages $C^{(0jk)}$ can be encoded down a further level in the hierarchy. The coefficient images derived from subimage $C^{(0jk)}$ are separated into the $C^{(00jk)}$ images and the $C^{(0lmn)}$ ($l \neq 0$) coefficient images. Therefore, the $C^{(0lmn)}$ ($l \neq 0$) coefficient images can be represented as follows:

$$C^{(0lmn)} = \sum_{j,k} a_{jk}^{(0lmn)}\, C^{(00jk)} + R^{(0lmn)} \qquad (7)$$

To effect greater compression, optionally an orthogonal basis for the $C^{(00jk)}$ images can be derived and Equation (7) can be expressed as an expansion on these basis vectors.

Secondary Interscale Coding

The number of coefficients $a$ obtained at this level is the same as the number of coefficients $a$ produced from the expansion of $C^{(lmn)}$ at the previous level of the Interscale Coding. Furthermore, at each level of the coding, the retained coefficient image is essentially a reduced copy of the original, since the coefficient image contains the low frequency components of its corresponding higher level coefficient image. The values of the coefficients $a$ at this level have some similarity to the coefficients $a$ from the previous level in the hierarchy. Therefore, a second level of Interscale Coding can produce an even more compact representation of the image.

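This can be achieved by exactly representing the lowest level of coefficients and then encoding the other levels in terms of the levels below them. For example, for the sets of coefficient vectors $a^{(0)}$, $a^{(1)}$, $a^{(2)}$, $a^{(3)}$, the coefficient vector $a^{(3)}$ can be exactly stored or otherwise remembered, and the following orthogonalisation process can be used to express the other vector quantities in relation to $a^{(3)}$:

$$a^{(2)} = \alpha_{23}\, a^{(3)} + \rho^{(2)} \tag{8}$$

where $\rho^{(2)}$ is the residual and the coefficient $\alpha_{23}$ is:

$$\alpha_{23} = \frac{a^{(2)} \cdot a^{(3)}}{a^{(3)} \cdot a^{(3)}} \tag{9}$$

Analogously, $a^{(1)}$ is expressed as follows:

$$a^{(1)} = \alpha_{13}\, a^{(3)} + \alpha_{12}\, \rho^{(2)} + \rho^{(1)} \tag{10}$$

where

$$\alpha_{13} = \frac{a^{(1)} \cdot a^{(3)}}{a^{(3)} \cdot a^{(3)}} \tag{11}$$

and

$$\alpha_{12} = \frac{\left(a^{(1)} - \alpha_{13}\, a^{(3)}\right) \cdot \rho^{(2)}}{\rho^{(2)} \cdot \rho^{(2)}} \tag{12}$$

Finally, $a^{(0)}$ is expressed as follows:

$$a^{(0)} = \alpha_{03}\, a^{(3)} + \alpha_{02}\, \rho^{(2)} + \alpha_{01}\, \rho^{(1)} + \rho^{(0)} \tag{13}$$

where

$$\alpha_{03} = \frac{a^{(0)} \cdot a^{(3)}}{a^{(3)} \cdot a^{(3)}} \tag{14}$$

$$\alpha_{02} = \frac{\left(a^{(0)} - \alpha_{03}\, a^{(3)}\right) \cdot \rho^{(2)}}{\rho^{(2)} \cdot \rho^{(2)}} \tag{15}$$

and

$$\alpha_{01} = \frac{\left(a^{(0)} - \alpha_{03}\, a^{(3)} - \alpha_{02}\, \rho^{(2)}\right) \cdot \rho^{(1)}}{\rho^{(1)} \cdot \rho^{(1)}} \tag{16}$$

In this form, assuming that there is some similarity between the original vectors, the vectors could be encoded as the coefficients $\alpha_{23}$, $\alpha_{13}$, $\alpha_{12}$, $\alpha_{03}$, $\alpha_{02}$, $\alpha_{01}$, and the more compactly representable residual vectors $\rho^{(2)}$, $\rho^{(1)}$ and $\rho^{(0)}$, together with the exactly stored vector $a^{(3)}$.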
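The orthogonalisation described above is a Gram-Schmidt style procedure, and can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it uses NumPy, four synthetic coefficient vectors that are deliberately similar to one another, and hypothetical function names (`secondary_interscale_encode`, `secondary_interscale_decode`); it also assumes that none of the residual vectors is exactly zero, so the divisions are well defined.

```python
import numpy as np

def secondary_interscale_encode(vectors):
    """vectors = [a0, a1, a2, a3]; returns (a3, alphas, rhos).

    alphas[(s, t)] is the scalar coefficient of level s on basis vector t,
    rhos[s] is the residual vector left for level s.
    """
    top = len(vectors) - 1
    basis = [np.asarray(vectors[top], dtype=float)]   # a3 is stored exactly
    alphas, rhos = {}, {}
    for s in range(top - 1, -1, -1):                  # s = 2, 1, 0
        rest = np.asarray(vectors[s]).astype(float)   # astype returns a copy
        for t, b in zip(range(top, s, -1), basis):    # t = 3, 2, ..., s + 1
            alphas[(s, t)] = float(rest @ b) / float(b @ b)
            rest -= alphas[(s, t)] * b
        rhos[s] = rest
        basis.append(rest)                            # rho_s joins the basis
    return basis[0], alphas, rhos

def secondary_interscale_decode(a3, alphas, rhos):
    """Inverse of the encoder: rebuilds [a0, a1, a2, a3]."""
    top = len(rhos)
    basis = {top: a3, **rhos}                         # basis vector per index
    out = []
    for s in range(top):
        v = rhos[s].copy()
        for t in range(s + 1, top + 1):
            v += alphas[(s, t)] * basis[t]
        out.append(v)
    return out + [a3]

# Synthetic, mutually similar coefficient vectors (assumed data).
rng = np.random.default_rng(0)
a3 = rng.standard_normal(16)
vectors = [s * a3 + 0.05 * rng.standard_normal(16) for s in (0.7, 0.8, 0.9)] + [a3]

top_vec, alphas, rhos = secondary_interscale_encode(vectors)
decoded = secondary_interscale_decode(top_vec, alphas, rhos)
```

With similar input vectors, the residuals are small compared with the vectors they replace, while the six scalar coefficients carry the shared structure.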

The embodiments of the invention are preferably implemented using a general-purpose computer, such as the one illustrated in the block diagram of Fig. 13. The general-purpose computer 1310 includes the computer 1330, a video display 1320, and input devices 1360, 1362. The computer 1330 preferably has a central processing unit 1332, which can be implemented as, for example, one or more INTEL (trademark) processors or compatible processors; one or more storage devices 1338 (e.g., a computer hard disc, a floppy disk, CD-ROM, magneto-optical disc, or the like); a video interface 1336; memory 1334, which can include random access memory (RAM) and/or read-only memory (ROM); and one or more input/output (I/O) interfaces 1340. The components 1332, 1334, 1336, 1338 and 1340 of the computer 1330 are typically connected to one another by a computer bus 1350, which typically comprises address, data and control buses, well known to those skilled in the art. Other computers and workstations can be used without departing from the scope and spirit of the invention.

The video interface 1336 provides video signals to the video display 1320 to provide visual output to a user. The user can interact with the computer 1330 using one or more input devices including a keyboard 1360 and a pointing device 1362 such as a mouse, which is coupled to the I/O interface 1340. It will be appreciated by a person skilled in the art that other input devices may be practiced with such a general-purpose computer.

First Embodiment

The three primary steps in representing and/or encoding input data, or an image, according to a first embodiment are shown in Fig. 7. The first step 90 involves representing the original image as a hierarchy of sub-band images and is referred to as Base Image Coding (BIC). The second step 92 represents each of the coefficient images, except for a small starting set, as a linear combination of the lower-order coefficient images plus a residual, if required, and is referred to as Primary Interscale Coding (PIC). In step 94, the coefficient vectors from each scale produced in step 92 (the Primary Interscale Coding) are represented and/or encoded to remove redundancy between the scales in these vectors to produce the represented output 95; this step is referred to as Secondary Interscale Coding (SIC). It will be apparent to a person skilled in the art that reconstruction of the input image can be implemented by performing inverse operations in reverse order to those described in relation to Fig. 7.

Preferably, the represented output 95 can be further compressed as shown in Fig. 8, in which the coefficient and residual vectors are first quantised in step 100. The quantised data is then optionally run-length encoded in step 102. In step 104, the run-length encoded data can optionally be entropy encoded to produce output data 106.
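The quantisation and run-length stages of Fig. 8 can be sketched in a few lines of Python. This is a hedged illustration, not the coder of the specification: the uniform quantiser, the step size, the `(value, run)` pair format and all function names are assumptions, and the entropy-encoding stage is represented only by an estimate of the ideal code length (the Shannon bound) rather than by an actual Huffman or arithmetic coder.

```python
import numpy as np
from collections import Counter
from math import log2

def quantise(values, step):
    """Uniform scalar quantiser: maps each value to an integer index."""
    return np.round(np.asarray(values, dtype=float) / step).astype(int)

def run_length_encode(symbols):
    """Collapse runs of equal symbols into (symbol, run_length) pairs."""
    runs = []
    for s in symbols:
        s = int(s)
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return [tuple(r) for r in runs]

def run_length_decode(runs):
    """Expand (symbol, run_length) pairs back into the symbol sequence."""
    return [s for s, n in runs for _ in range(n)]

def entropy_bits(symbols):
    """Ideal total code length in bits for an entropy coder (Shannon bound)."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c * log2(c / total) for c in counts.values())

# Assumed example data: a short residual vector and a step size of 0.5.
residual = [0.02, -0.01, 0.03, 0.49, 0.51, 0.0, 0.04, -1.2]
q = quantise(residual, step=0.5)
runs = run_length_encode(q)
bits = entropy_bits(runs)
```

Only the quantiser discards information; the run-length and entropy stages are lossless, so `run_length_decode(runs)` returns exactly the quantised indices.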

The term LL represents the LL subband, $C^{(0)}$, or the set of subbands derived from it. Similarly, HL, LH, and HH represent the corresponding coefficient sets. As will be described, Fig. 11 illustrates a further embodiment including a number of variations.

In decompression or reconstruction, the image representation contained in output data 106 can be decoded to give an exact representation of the image 31. The output data representation 106 is typically more compact than the original image 31 since careful use of entropy encoding methods in step 104 of Fig. 8 results in a smaller file size than that of the original image 31.

To effect greater compression, the residuals R obtained from the primary interscale coding step 92 of Fig. 7 can be quantised so as to take advantage of perceptual redundancy in the input image 31. The residuals $R^{(lmn)}$ may be stochastic and have little spatial structure. Therefore, the PIC step 92 acts to separate the structural and the stochastic components of the image 31. The stochastic component of the image 31 in the form of residual blocks may be replaced with a parametric model of the corresponding stochastic process. This may reduce the number of parameters if the resulting signal matches the stochastic part of the original input image 31 in a statistical sense. High perceptual fidelity may be achieved while obtaining desirable compression ratios.
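Replacing a stochastic residual block with a parametric model can be illustrated as follows. The sketch assumes an independent Gaussian model described by a mean and a standard deviation; the specification does not prescribe any particular stochastic model, and the function names and data here are hypothetical.

```python
import numpy as np

def fit_noise_model(residual):
    """Summarise a residual block by two parameters: mean and std."""
    residual = np.asarray(residual, dtype=float)
    return float(residual.mean()), float(residual.std())

def synthesise_residual(params, shape, seed=0):
    """Draw a new block that matches the model in a statistical sense."""
    mean, std = params
    return np.random.default_rng(seed).normal(mean, std, size=shape)

# Assumed example: a 64x64 residual block with no spatial structure.
residual = np.random.default_rng(1).normal(0.0, 3.0, size=(64, 64))
params = fit_noise_model(residual)                 # two numbers instead of 4096
replacement = synthesise_residual(params, residual.shape)
```

The synthesised block is not equal to the original residual sample by sample; it only shares its statistics, which is why this substitution is a lossy, perceptually motivated step.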

Furthermore, the structural components have some redundancy in the higher frequency (small scale) behaviour that is perceptually less important than the low frequency (large scale) behaviour. Therefore, this separation of the structural components of the image 31 on the basis of scale allows each scale to be quantised to a degree appropriate to its perceptual significance.

In further embodiments of the invention, the base image coding step 90 of Fig. 7 can be implemented using Haar block and sample-difference transformations, for example.

Example of First Embodiment

Using the embodiment of the invention illustrated in Fig. 7, an example illustrating this method will be described with reference to Fig. 6 in relation to a 512x512 pixel input image (P) 31. In step 90, the input image 31 is base image coded to produce scale 1. The LL subbands of each scale are in turn base image coded (i.e., σ = 2, ..., 5). The final representation of the original image (P) 31 has the form illustrated in Fig. 6 comprising coefficient images 41, 51, 61, 71 and 81. The shaded section 41 in the upper left corner is the set of coefficient images of the smallest scale (σ = 1) representation of the full image 31 in the PIC hierarchy.
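The recursive re-application of base image coding to the LL subband can be sketched with a Haar transform, which the specification names as one possible base image coding. The sketch below is a simplified illustration: it assumes a 2x2 orthonormal Haar transform producing four subbands per level, a particular sign and naming convention for HL and LH, and hypothetical function names; the block size and number of coefficient images used in the worked example of Fig. 6 may differ.

```python
import numpy as np

def haar_level(img):
    """One level of a 2x2 orthonormal Haar transform: (LL, HL, LH, HH)."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    hl = (a - b + c - d) / 2.0   # horizontal detail (naming is an assumption)
    lh = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, hl, lh, hh

def inverse_haar_level(ll, hl, lh, hh):
    """Exact inverse of haar_level."""
    h, w = ll.shape
    img = np.empty((2 * h, 2 * w))
    img[0::2, 0::2] = (ll + hl + lh + hh) / 2.0
    img[0::2, 1::2] = (ll - hl + lh - hh) / 2.0
    img[1::2, 0::2] = (ll + hl - lh - hh) / 2.0
    img[1::2, 1::2] = (ll - hl - lh + hh) / 2.0
    return img

def base_image_code(image, levels=5):
    """Re-apply the transform to the LL subband at each scale."""
    ll, pyramid = np.asarray(image, dtype=float), []
    for _ in range(levels):
        ll, hl, lh, hh = haar_level(ll)
        pyramid.append((hl, lh, hh))
    return ll, pyramid

image = np.random.default_rng(0).uniform(0, 255, size=(512, 512))
ll, pyramid = base_image_code(image, levels=5)

# Reconstruction: invert the levels in reverse order.
rebuilt = ll
for hl, lh, hh in reversed(pyramid):
    rebuilt = inverse_haar_level(rebuilt, hl, lh, hh)
```

For a 512x512 input and five scales, the subband side length halves at each scale (256, 128, 64, 32, 16), and the transform is exactly invertible.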

For each change of scale (i.e., σ = 2, ..., 5), the size of the sub-band image is reduced by a factor of 2, thereby yielding a hierarchy of sub-bands 55, 65, 75 and 85 in Fig. 6. Each sub-band is represented by 16 coefficients and a residual image. Each residual set has the same size as the corresponding sub-band 45, 55, 65, 75 or 85 but is more compactly representable. Thus a set of 768 coefficients $a^{(\sigma)}$ and block residuals R represents the higher-order coefficient images required to generate the next scale version of the image. This representation is obtained using the Primary Interscale Coding (step 92 of Fig. 7) and Secondary Interscale Coding (step 94).

After the three steps 90, 92 and 94 shown in Fig. 7 and described above are carried out, the image is represented (compressed/represented output 95) as an exact and invertible non-iterative transformation by the fourth level coefficient images $C^{(0mn)}$, the vectors $a^{(3)}$, $\rho^{(2)}$, $\rho^{(1)}$, $\rho^{(0)}$ and the PIC residual R. In some instances, this representation can be larger than the original image itself. However, this representation facilitates compression achieved by quantising the residual vectors and the coefficient vectors, and then run-length encoding and entropy encoding the results as shown in Fig. 8.

Second Embodiment

In a second embodiment, recursive sub-banding is performed on the image representation 31 as described above in relation to base image coding step 90 of Fig. 7. The original image 31 is divided into 2x2 pixel blocks, and each block is encoded by projecting it onto four basis vectors. The basis vector for the kth block, $B_k^{(0)}$, is formed by linearly interpolating a set of node values in the top left pixel of each block as shown in Fig. 10 to form a continuous linear spline (a linear spline is chosen for simplicity but other interpolations can be used without departing from the scope of the invention). The node values are chosen so that the residual between the original image block and the resulting interpolation is orthogonal to the interpolation. The spline thereby forms the first term in a basis expansion. In addition, the node value is chosen to minimise the L2 norm of the residuals in all of the blocks. An offset is added to the image (P) 31 to be encoded to ensure that the spline is strictly positive.

Once the spline is calculated, the other three terms in the orthogonal expansion are calculated by orthogonalising the four 2x2 blocks in the interpolation which contain the top left corner of the block as shown in Fig. 10. The values saved to represent the 2x2 block are the top left corner of the interpolation, and the projections of the 2x2 image block onto the other three orthogonal basis vectors 111-113 shown in Fig. 10 to represent and encode the data.
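The orthogonalisation of the four 2x2 blocks can be sketched with a Gram-Schmidt routine. The sketch below makes several assumptions that are not taken from the specification: the ordering of the four windows (Fig. 10 is not reproduced here), the use of a random strictly positive array in place of the actual spline interpolation, and the storing of all four projections rather than only the last three. All names are hypothetical.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise the given vectors in the order supplied."""
    basis = []
    for v in vectors:
        w = np.asarray(v, dtype=float).ravel().copy()
        for e in basis:
            w -= (w @ e) * e          # remove the component along e
        basis.append(w / np.linalg.norm(w))
    return basis

def windows_containing(interp, r, c):
    """The four 2x2 windows of `interp` that contain pixel (r, c).

    The block's own window comes first; the remaining order is an assumption.
    """
    offsets = [(0, 0), (0, -1), (-1, 0), (-1, -1)]
    return [interp[r + dr:r + dr + 2, c + dc:c + dc + 2] for dr, dc in offsets]

rng = np.random.default_rng(0)
interp = rng.uniform(1.0, 2.0, size=(6, 6))   # stand-in for the positive spline
block = rng.uniform(0.0, 255.0, size=(2, 2))  # stand-in for a 2x2 image block

basis = gram_schmidt(windows_containing(interp, 2, 2))
coeffs = [float(block.ravel() @ e) for e in basis]

# Four orthonormal vectors span all 2x2 blocks, so the expansion is exact.
rebuilt = sum(c * e for c, e in zip(coeffs, basis)).reshape(2, 2)
```

Because the basis is derived from the interpolated image rather than fixed in advance, a decoder that holds the same interpolation can regenerate the same basis without it being transmitted.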

To decode the represented/compressed data, the zeroth component image $C^{(0)}$ obtained is linearly interpolated to twice its original size. In turn, the following steps are performed on each block:

(a) Form the set of vectors illustrated in Fig. 10;

(b) Use the Gram-Schmidt orthogonalisation method in the above indicated order to orthogonalise the blocks;

(c) Multiply each of the first, second and third coefficients by the appropriate basis unit vector; and

(d) Add the resulting blocks to the interpolated block $C^{(0)}$ to reconstruct the original image data (P) 31.

Overlapping-Block Based Representation

The block-based partitioning, described above, which leads naturally to a class of subband representations of the image, can include wavelet basis expansions which are used for image analysis. Wavelets also provide a subband representation of an image and can be used as the Base Image Coding on which the interscale coding is constructed. The wavelet decomposition with overlapping basis wavelets can be represented with strictly non-overlapping blocks, in accordance with the above description. However, the regularity and convenience of the wavelet description is lost.

In order to more naturally accommodate the wavelet and more general subband decompositions, a regular grid of points on MxN pixel centres, referred to as the "block grid", is associated with the image. Each block grid point has associated with it an MxN block of pixels. A set of basis blocks (MN in number) is then determined which are linearly independent of each other, and of their own and the other blocks' translates on the grid of centres, and which have support of μxν pixels (where μ ≥ M and ν ≥ N).

The image can be exactly represented as a linear combination of these basis blocks and their translates on the block grid. If the (m,n)th basis block associated with the (k,l)th grid point is $B^{(m,n)}_{k,l}$, the representation of the (i,j)th image pixel associated with the (k,l)th block grid point, $P_{k,l,i,j}$, takes the form:

$$P_{k,l,i,j} = \sum_{k',l'} \sum_{m,n} C^{(m,n)}_{k',l'}\, B^{(m,n)}_{k',l',\,i+(k-k')M,\,j+(l-l')N} \tag{17}$$

where $C^{(m,n)}_{k,l}$ is the coefficient for the (m,n)th basis block associated with the (k,l)th block grid point. To allow for the more general case, the formal description permits the basis blocks to vary from block gridpoint to block gridpoint. This is not required for the matrix description, but would be required, for example, for image-derived expansions such as the BOSS transformation referred to above.
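The synthesis of an image from overlapping basis blocks placed on the block grid can be sketched as follows. This is an illustration under assumptions that go beyond the text: the basis blocks are the same at every grid point, each block's support is indexed from the origin of its own MxN block (so it overlaps only blocks to the right and below), and contributions falling outside the image are simply cropped. The function and variable names are hypothetical.

```python
import numpy as np

def synthesise(coeffs, basis, M, N):
    """Sum of translated basis blocks weighted by their coefficients.

    coeffs[k, l, m, n]: coefficient of basis block (m, n) at grid point (k, l).
    basis[m, n]: a (mu, nu) array with mu >= M and nu >= N.
    Returns an image of shape (K * M, L * N).
    """
    K, L = coeffs.shape[:2]
    mu, nu = basis.shape[2:]
    canvas = np.zeros((K * M + mu - M, L * N + nu - N))
    for gk in range(K):
        for gl in range(L):
            # Weighted sum over the MN basis blocks at this grid point.
            patch = np.tensordot(coeffs[gk, gl], basis, axes=([0, 1], [0, 1]))
            canvas[gk * M:gk * M + mu, gl * N:gl * N + nu] += patch
    return canvas[:K * M, :L * N]   # crop the overhang (boundary assumption)

rng = np.random.default_rng(0)
M = N = 2
coeffs = rng.standard_normal((3, 3, M, N))

# Non-overlapping special case: unit basis blocks with mu = M and nu = N.
unit_basis = np.eye(M * N).reshape(M, N, M, N)
image = synthesise(coeffs, unit_basis, M, N)

# Overlapping case: 3x3 support on a 2x2 grid spacing.
wide_basis = rng.standard_normal((M, N, 3, 3))
overlapped = synthesise(coeffs, wide_basis, M, N)
```

In the non-overlapping special case the coefficients are just the pixels of each block, which is the sense in which the overlapping formulation subsumes the block-based one.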

This representation is related to, and can subsume, the expression in Equation (2), in which image pixel i of block k is represented using the ith component of the mth basis vector for the kth block. The basis vectors are only associated with the block through being centred on one of the block grid points and can have support which extends outside the block of points associated with that grid point, thereby contributing to pixels in blocks other than that with which they are associated. The multiscale hierarchy is achieved, as described above, by simply reapplying the analysis to the coefficient subband image $C^{(0)}$ and subsequently to the $C^{(0)}$ subband image from that analysis and so on down the hierarchy of scales. If the basis functions are all mutually orthogonal, the analysis is simplified considerably.

Convolution-Based Primary Interscale Coding

Primary interscale coding, described above, captures most of the structural parts of the subbands, and thereby leaves decorrelated residuals. However, in some applications, the reconstructed image residual may still contain structural information.

In another embodiment, the representation of a subband can use shifted copies of the low-frequency subbands directly (with appropriate boundary handling), rather than further decomposing each of the LL, HL, LH and HH subbands (e.g., coefficient sets 43 to 46 shown in Fig. 3B) prior to the interscale coding. Thus, a single-level decomposition giving four subbands can be utilised and, in a number of applications, may provide optimal results. It will be apparent to a person skilled in the art that either the above-described primary interscale coding, or the convolution-based primary interscale coding, or a combination thereof, can be practised without departing from the scope of the present invention.

The prediction of the HL subband from the LL subband comprises finding the convolution kernel, which when applied to the LL subband with appropriate boundary handling provides a sufficient estimation of the HL subband. Similarly, this is also done for the LH and HH subbands.

For the mth subband, the coefficients of each subband are expressed as follows:

$$C^{(m)}_{ij} = \sum_{i',j'} C^{(0)}_{i-i',\,j-j'}\, K^{(m)}_{i'j'} + R^{(m)}_{ij} \tag{18}$$

for the kernel K which minimises the L2 norm of the residual R. The term $C^{(0)}$ represents the LL subband with the boundary extended in an appropriate way. Preferably, linear prediction of the column or row outside the boundary from several of the columns or rows near the boundary is used. This is necessary for reasonable prediction of pixels where the convolution kernel would extend outside the boundary of the subband.
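Finding the kernel that minimises the L2 norm of the residual is a linear least-squares problem, since the prediction is linear in the kernel entries. The Python sketch below illustrates this with NumPy. It is not the implementation of the specification: it assumes a small square kernel, replaces the preferred linear-prediction boundary extension with simple edge replication, and uses hypothetical function names and synthetic subbands.

```python
import numpy as np

def shifted_copies(ll, radius):
    """Design matrix whose columns are shifted copies of the LL subband."""
    padded = np.pad(ll, radius, mode="edge")   # boundary handling (assumption)
    h, w = ll.shape
    size = 2 * radius + 1
    cols = [padded[di:di + h, dj:dj + w].ravel()
            for di in range(size) for dj in range(size)]
    return np.stack(cols, axis=1)

def fit_prediction_kernel(ll, band, radius=1):
    """Kernel minimising the L2 norm of band - prediction, plus the residual.

    The kernel is stored in correlation order; flipping both axes gives the
    equivalent convolution kernel.
    """
    A = shifted_copies(ll, radius)
    k, *_ = np.linalg.lstsq(A, band.ravel(), rcond=None)
    residual = band - (A @ k).reshape(band.shape)
    size = 2 * radius + 1
    return k.reshape(size, size), residual

def predict(ll, kernel):
    """Apply a kernel (correlation order) to the boundary-extended LL subband."""
    radius = kernel.shape[0] // 2
    return (shifted_copies(ll, radius) @ kernel.ravel()).reshape(ll.shape)

# Synthetic check: build an "HL" subband from a known kernel, then recover it.
rng = np.random.default_rng(0)
ll = rng.standard_normal((16, 16))
k_true = rng.standard_normal((3, 3))
hl = predict(ll, k_true)

kernel, residual = fit_prediction_kernel(ll, hl, radius=1)
reconstructed = predict(ll, kernel) + residual
```

A decoder that holds the LL subband, the kernel and the residual can rebuild the subband as prediction plus residual, which mirrors the decoding flow described for this embodiment.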

A flow diagram illustrating this embodiment of the invention is illustrated in Fig. 11. An input image 31 is subband coded in step 120. The LL subband is provided to decision block 125. A check is made in decision block 125 to determine if the LL subband is at the bottom of the hierarchy. If decision block 125 returns false (No), execution continues by again applying image 31 to step 120. If decision block 125 returns true (Yes), the LL subband is provided to form convolution-based PIC data 128. The LL, HL, LH and HH subbands obtained in step 120 are provided to step 121 in which the HL, LH and HH subbands are predicted from the LL subband. The prediction kernels K determined in step 121 are provided to form convolution-based PIC data 128.

The HL, LH and HH subbands and the prediction kernels K are provided to step 122. The subband residuals $R^{(m)}_{ij}$ are calculated in step 122. The residuals are provided to form convolution-based PIC data 128. The convolution-based primary interscale coded data 128 consists of the lowest LL subband 128A, prediction kernels (K) 128B, and the subband residuals (R) 128C. The convolution-based PIC data 128 is provided to step 129. In step 129, the data is quantised. As described above, to effect greater compression, the data can be quantised in a manner to take advantage of perceptual redundancy in the input image 31. The quantised data is provided to step 130 in which it is entropy encoded to produce encoded image 132. Considerably less information is contained in the prediction kernels K than is the case for the coefficients provided by the primary interscale coding, described above in relation to Fig. 7. Consequently, while there is some redundancy in the prediction kernels K because of similarity across scales (σ), the secondary interscale coding of step 94 of Fig. 7 is not required in this case to effect better compression.

A flow diagram of the method for decoding data, encoded using the method of Fig. 11, is illustrated in Fig. 12. The encoded data 132 is entropy decoded in step 133. This produces the convolution-based PIC data 134 (128 in Fig. 11) containing the LL subband 134A, the prediction kernels (K) 134B and the subband residuals (R) 134C.

The LL subband 134A and the prediction kernels (K) 134B are provided to step 137. In step 137, the HL, LH and HH subbands are predicted using the LL subband 134A and the prediction kernels 134B. This produces the predicted HL, LH and HH subbands 138.

In step 140, the predicted HL, LH and HH subbands 138 and the subband residuals (R) 134C of the convolution-based PIC data 134 are added together to produce the reconstructed subbands 141. In step 142, the reconstructed subbands 141 are decoded. The next LL subband is provided to decision block 145. In decision block 145, a check is made to determine if the LL subband is at the top of the hierarchy. If decision block 145 returns false (No), execution continues at step 134. Otherwise, if decision block 145 returns true (Yes), the decoded image 146 is provided.

Other embodiments of the invention may utilise encoding of the HL, LH, and HH subbands (or the subbands derived from them) using the decoded, quantised LL subband (or subbands derived from it), rather than the original data available at the time of decoding.

The foregoing describes only a number of embodiments of the present invention, and modifications obvious to persons skilled in the art can be made thereto without departing from the spirit and scope of the invention.

Claims

1. A method of hierarchically representing data, wherein said data comprises an MxN array of datapoints, M and N each being an integer greater than one, said method comprising the steps of: sub-band coding said data to provide a plurality of coefficient sets, each coefficient set having a different scale; and interscale coding said plurality of coefficient sets at least once to obtain a set of coefficient vectors and residual vectors to represent said image data.

2. The method according to claim 1, wherein said interscale coding comprises the steps of: primary interscale coding said plurality of coefficient sets; and secondary interscale coding said primary interscale coded plurality of coefficient sets.

3. The method according to claim 1, wherein said step of sub-band coding comprises base image coding and said base image coding is applied at least twice.

4. The method according to claim 3, wherein said base image coding is selected from the group consisting of a wavelet transform, a Haar transform, a Fourier transform, and a Sample-Difference transform.

5. The method according to claim 1, wherein said interscale coding represents higher-order coefficients as linear combinations of lower-order coefficients to provide said set of coefficient vectors and said residual vectors.

6. The method according to claim 1, further comprising the steps of: quantising said set of coefficient vectors and said residual vectors; and encoding said quantised set of coefficient vectors and said residual vectors.

7. The method according to claim 6, wherein said step of encoding comprises the further steps of: run-length encoding said set of coefficient vectors and residual vectors; and entropy encoding said run-length encoded vectors, wherein said entropy encoding is selected from the group consisting of Huffman coding and arithmetic coding.

8. An apparatus for hierarchically representing data, wherein said data comprises an MxN array of datapoints, M and N each being an integer greater than one, said apparatus comprising: means for sub-band coding said data to provide a plurality of coefficient sets, each coefficient set having a different scale; and means for interscale coding said plurality of coefficient sets at least once to obtain a set of coefficient vectors and residual vectors to represent said image data.

9. The apparatus according to claim 8, wherein said means for interscale coding further comprises: means for primary interscale coding said plurality of coefficient sets; and means for secondary interscale coding said primary interscale coded plurality of coefficient sets.

10. The apparatus according to claim 8, wherein said means for sub-band coding comprises means for base image coding said data at least twice.

11. The apparatus according to claim 10, wherein means for said base image coding implements a transform selected from the group consisting of a wavelet transform, a Haar transform, a Fourier transform, and a Sample-Difference transform.

12. The apparatus according to claim 8, wherein said means for interscale coding represents higher-order coefficients as linear combinations of lower-order coefficients to provide said set of coefficient vectors and said residual vectors.

13. The apparatus according to claim 8, further comprising: means for quantising said set of coefficient vectors and said residual vectors; and means for encoding said quantised set of coefficient vectors and said residual vectors.

14. The apparatus according to claim 13, wherein said means for encoding comprises: means for run-length encoding said set of coefficient vectors and residual vectors; and means for entropy encoding said run-length encoded vectors, wherein said entropy encoding is selected from the group consisting of Huffman coding and arithmetic coding.

15. A method of compressing digital image data, said method comprising the steps of: iteratively base image coding said image data to provide a plurality of coefficient sub-images, each sub-image having a different scale; interscale coding said plurality of coefficient sub-images to provide coefficient and residual vectors; quantising said coefficient and residual vectors; and encoding said quantised coefficient and residual vectors to provide hierarchically compressed image data.

16. The method according to claim 15, wherein said step of interscale coding further comprises the steps of: primary interscale coding said plurality of coefficient sub-images; and secondary interscale coding said primary interscale coded coefficient sub-images.

17. The method according to claim 15, wherein said base image coding comprises the steps of: projecting blocks of said image data onto a number of basis vectors; interpolating a set of node values for a predetermined datapoint of each block; and orthogonalising blocks containing the predetermined datapoint of each block.

18. The method according to claim 15, wherein said step of interscale coding said plurality of coefficient sub-images comprises the further steps of: predicting HL, LH and HH sub-images dependent upon a lowest LL sub-image to produce prediction kernels; generating sub-image residuals dependent upon said HL, LH and HH sub-images and said prediction kernels; wherein primary interscale data comprise said lowest LL sub-image, said prediction kernels and said sub-image residuals.

19. An apparatus for compressing digital image data, comprising: means for iteratively base image coding said image data to provide a plurality of coefficient sub-images, each sub-image having a different scale; means for interscale coding said plurality of coefficient sub-images to provide coefficient and residual vectors; means for quantising said coefficient and residual vectors; and means for encoding said quantised coefficient and residual vectors to provide hierarchically compressed image data.

20. The apparatus according to claim 19, wherein said means for interscale coding further comprises: means for primary interscale coding said plurality of coefficient sub-images; and means for secondary interscale coding said primary interscale coded coefficient sub-images.

21. The apparatus according to claim 19, wherein said means for base image coding comprises: means for projecting blocks of said image data onto a number of basis vectors; means for interpolating a set of node values for a predetermined datapoint of each block; and means for orthogonalising blocks containing the predetermined datapoint of each block.

22. The apparatus according to claim 19, wherein said means for interscale coding said plurality of coefficient sub-images further comprises: means for predicting HL, LH and HH sub-images dependent upon a lowest LL sub-image to produce prediction kernels; means for generating sub-image residuals dependent upon said HL, LH and HH sub-images and said prediction kernels; wherein primary interscale data comprise said lowest LL sub-image, said prediction kernels and said sub-image residuals.