GB2401502A - Image data encoding and transmission - Google Patents


Info

Publication number
GB2401502A
Authority
GB
United Kingdom
Prior art keywords
prediction
pictures
coding
field
data
Prior art date
Legal status
Granted
Application number
GB0310495A
Other versions
GB2401502B (en)
GB2401502A8 (en)
GB0310495D0 (en)
Inventor
Thomas Davies
Graham Thomas
Philip Tudor
Timothy Borer
Current Assignee
British Broadcasting Corp
Original Assignee
British Broadcasting Corp
Priority date
Filing date
Publication date
Application filed by British Broadcasting Corp filed Critical British Broadcasting Corp
Priority to GB0310495A priority Critical patent/GB2401502B/en
Priority to GB0616384A priority patent/GB2437579B/en
Publication of GB0310495D0 publication Critical patent/GB0310495D0/en
Priority to PCT/GB2004/001987 priority patent/WO2004100556A2/en
Publication of GB2401502A publication Critical patent/GB2401502A/en
Publication of GB2401502A8 publication Critical patent/GB2401502A8/en
Application granted granted Critical
Publication of GB2401502B publication Critical patent/GB2401502B/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/189 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N 19/192 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding, the adaptation method, adaptation tool or adaptation type being iterative or recursive
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/124 Quantisation
    • H04N 19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/146 Data rate or code amount at the encoder output
    • H04N 19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N 19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N 19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N 19/63 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • H04N 7/32
    • H04N 19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N 19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Abstract

A method of encoding data having at least first, second and third portions includes the steps of forming an estimate of the accuracy of a prediction of the second portion based on the first portion, and selectively predicting the third portion based on the second portion and the estimate of accuracy. Also described are: a method of coding a picture which is partitioned into first and second portions, comprising predicting elements of the second portion and coding the difference between elements of the second portion and the predictions; a method of coding wherein elements of the second portion are predicted by horizontally displacing elements of the first portion, the portions preferably being first and second fields of an interlaced image; a method of hierarchical coding wherein quantisation parameters of a child coding level are selected according to the quantisation parameters of a corresponding parent level; and a method of coding a sequence of pictures comprising selecting a subset of the pictures, interpolating the remaining pictures from the subset, and transmitting the subset of pictures with adjustment information to enable the decoder to reconstruct corrected versions of the interpolated pictures.

Description

Data Processing

The present invention relates to coding of data, particularly image data, and particularly to motion video. However, aspects of the invention may be applied to other types of data.
A large number of methods for encoding data are known. Some methods use prediction of a portion of the data based on other portions of the data. For example, in encoding of motion video using MPEG, a video frame may be predicted based on the content of a preceding frame and estimated motion within the frame. It is then only necessary to code the difference between the prediction and the actual data. In sequences which can be easily predicted, the amount of difference information required for satisfactory reproduction may be significantly less than the original data, leading to beneficial compression.
However, a drawback is that, particularly for sequences which are difficult to predict, a significant amount of information may be needed to convey the information used in prediction (e.g. motion vectors), and the resulting predictions may still not be very accurate, so the saving between the original data and the difference information is relatively small. There may even be cases where the prediction gives a worse starting point for determining difference information than no prediction at all.
In summary, predictive methods have two main drawbacks. Firstly, the information required for a successful prediction may become comparable in volume to the data itself, thereby negating the benefit of prediction in data compression. Secondly, the prediction may fail significantly, leading to little or no saving in data. Our earlier UK Patent Application No. 0122526.7 seeks to alleviate the first problem by using prediction based on data that will be available to a recipient of the coded data, so that the prediction can be recreated by the recipient without having to communicate all of the parameters used in the prediction. However, there is still room for improvement on this technique.
It should be noted that, while predictive techniques have been described above in the context of inter-frame coding, where prediction is based on preceding frames of a video sequence (typically by predicting motion of objects between frames), predictive coding techniques can also be used within a single field or frame of video information (intra-frame prediction), based on information available elsewhere in the frame. Prediction can also be used with data which are not necessarily image data and which need not necessarily be structured, for example text. In many cases where data have inter-dependencies, prediction can be used to reduce the amount of information that needs to be transmitted to recreate the data. By way of further background, GB-A-2,195,062 discloses a predictive coding method for a television signal. Aspects of the present invention are generally concerned with predictive coding techniques.
It will be noted that video sequences may be distributed in essentially two ways. Firstly, an entire video sequence may be transmitted as an entity, for example as a computer file. In such a case, the entire sequence will be available to an entity attempting to display the sequence, and compression techniques may be developed based on complete knowledge of the sequence and relying on the complete coded sequence being available to a decoder in order to display the video. Such techniques may be highly efficient, but a drawback is that display of the sequence is only possible if the entire sequence is available, and this may require large or prohibitive amounts of memory. This approach is not generally applicable to real-time transmission of video sequences or to large sequences, as playback cannot occur until the entire file is received. In a second possibility, the video is compressed in such a way that a sequence can be displayed when only a portion is available, and generally it is only necessary to store a few frames of data at a time in order to decode the sequence; broadcast MPEG II is in this latter category, which will herein be termed a "serially accessible" video sequence. Another criterion is that the sequence is substantially 'randomly accessible', that is, the sequence can be accessed at regular intervals without requiring the sequence to be received from the start (although there may be delays before an accessible key point).
The present invention seeks to provide a coding method suitable for use in compressing video to provide a serially accessible sequence, and preferably a substantially randomly accessible sequence. However, the invention is applicable to other data, and the sequence can be accessed in a parallel fashion if desired.
Aspects of the invention are set out in the independent claims and preferred features are set out in the dependent claims.
In a first aspect, the invention provides a data processing method for processing a stream of data to be coded using prediction, the stream comprising at least first, second and third portions, in which the second portion may be coded using prediction based on the first portion and the third portion may be coded using prediction based on at least the second portion, the method comprising: forming an estimate of accuracy of a prediction of the second portion based on the first portion; and selectively predicting the third portion based on at least the second portion and the estimate of accuracy of the prediction of the second portion.
Pursuant to the invention it has been appreciated that the effectiveness of an earlier prediction often gives a good indication of the likely effectiveness of a subsequent prediction for many types of real-world data. It is very important to note that this method will not always give an optimum result. On the contrary, a much better "estimate" of the accuracy of prediction of the third portion can be obtained by actually performing the prediction, as the necessary information is available at the coder. In many cases, as might be expected, the estimate of accuracy based on the accuracy of the previous prediction may turn out to be wrong, for example around transitions in the data. However, surprisingly, pursuant to the invention it has been appreciated that the estimate of accuracy based on previous prediction is nonetheless found to be helpful overall. A particular advantage is that the estimate can be made with data that will be available to a decoder which is receiving the signal serially and has only received the first and second portions.
The third portion will typically comprise an individual coefficient but may comprise a set of coefficients. The method will typically be repeated a plurality of times for subsequent coefficients, wherein each coefficient is selectively predicted based on the success (actual or estimated) of a previous prediction.
The method may be applied iteratively, selectively predicting subsequent portions based on the success of preceding predictions. In repeating the method iteratively, the second and third portions of a first selective prediction may become respectively the first and second portions of a second selective prediction, to predict a new third portion.
Preferably the estimate is based only on data that will be available to a decoder which is receiving the signal serially.
The method may be applied to a third portion which comprises an individual coefficient (preferably) or to a set of coefficients.
Selectively predicting the third portion may comprise comparing the magnitude of the second portion (or elements thereof) to the magnitude of the difference between the second and first portions (or the difference between one or more elements of the second and first portions).
The third portion may be predicted only if the second portion is substantially greater than said difference between the second and first portions. The second portion may be required to be at least a given multiple of the said difference between the second and first portions. It has been found advantageous if said multiple is at least two, preferably at least about four, more preferably about 6.
In place of a linear multiplication, a non-linear scaling function may be applied which multiplies differing magnitudes by differing amounts. For example, a higher "standard of proof" (ratio of difference to coefficient value) may be required for prediction of small coefficients, on the basis that noise may be more significant; conversely, a higher standard may be applied to larger coefficients, on the basis that prediction using a large (wrong) absolute value may result in greater coding error than predicting based on a small value. A polynomial function may be used, with coefficients empirically determined for a given set of data - the coefficients may be pre-set, configurable or adaptively determined.
In practice, however, it is found that a simple linear scaling function is highly effective and simple to configure and implement.
In a simple but highly effective implementation, if the third portion is predicted, the second portion is used as a prediction.
Alternatively, prediction may be based on several preceding samples. Thus, where the third portion comprises a single coefficient, the decision whether or not to predict may be made on the basis of success of prediction of a second portion comprising a single coefficient but the prediction itself may be made on the basis of a plurality of coefficients, based on the (recent) history of the sample. The decision whether or not to predict may also be based on the success of prediction of a number of coefficients rather than a single coefficient and the second portion may differ in size from the third portion. Similarly, the first portion may comprise a number of coefficients from the recent history of the data.
Following selectively predicting, either the third portion or the difference between the third portion and a prediction thereof is preferably coded.
The data may be processed by a hierarchical coding scheme in which data is coded in successive layers, wherein the third portion comprises data from a higher layer and the second portion comprises corresponding data from at least one lower layer.
Selective prediction of the third portion may comprise a prediction based on the data preceding the third portion if a parameter multiplied by a measure of the magnitude of the difference between the second portion and a prediction of the second portion (based on the data preceding the second portion) is less than a measure of the magnitude of the second portion, and a default prediction otherwise. Preferably, where the third portion comprises one or more numerical coefficients, the default prediction is zero. For non-numerical coding, a default symbol may be used. Preferably the default is based on a set value for the expected mean of the samples (or may be based on the actual mean of the last predetermined number of samples, although if this number of samples is too high, this may introduce problems for decoder synchronisation). The parameter may have a value of at least two.
The parameter may be set to a fixed value. Alternatively, the value of the parameter is adjusted based on dynamic analysis of the data. The parameter may be adjusted based on at least one of: a) The prior inputs to the selective prediction method; b) The prior outputs of the selective prediction method; c) The prior outputs of the prediction based on the data preceding the portion to be coded; d) The prior outputs of the comparison.
If the adaptive algorithm adapts over too long a period, this may present problems for decoder synchronisation. However, this can be alleviated by restarting adaptation at periodic intervals and/or communicating a default adaptation context at intervals. Typically, where the method is used to code portions comprising pixels of a frame, adaptation may be re-started each frame.
However, it does not require much bit rate to signal with each frame a few coefficients which give an adaptation context, if desired, and this may provide some advantage in some applications.
The method of prediction may be dynamically adjusted based on the results of prediction for preceding data. The effectiveness of the selective prediction method may be compared with the effectiveness of the basic prediction method.
In a further but closely related aspect, the invention provides a method of selectively predicting a current portion of data based on preceding data, comprising selectively predicting a current portion of data x(i) in accordance with the formula:

P_s(S(i-1)) = P(S(i-1)) if λ·C(x(i-1) - P(S(i-2))) < C(x(i-1))
            = 0 otherwise

wherein P_s(S(i-1)) represents the selective prediction of the current portion of data based on at least a part of the set of data preceding the current portion; P(S(i-1)) represents a prediction of the current portion of data based on at least a part of the set of data preceding the current portion; P(S(i-2)) represents a prediction of the preceding portion of data based on at least a part of the set of data preceding the preceding portion; C represents a measure of cost of transmitting or storing the data; x(i-1) represents the preceding portion of data; and λ represents a parameter, normally at least two.
In place of a multiplication parameter, a scaling function may be applied, as mentioned above.
Each portion of data may comprise a single coefficient. Each prediction may simply comprise the preceding coefficient. The cost measure may simply comprise the magnitude of the coefficient. The cost measure preferably comprises a measure of the entropy of the coefficient (which relates to the "cost" (in terms of bits required) to encode it).
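By way of illustration, the selective prediction rule can be expressed in a few lines. The following Python sketch assumes the simplest concrete choices discussed above: single-coefficient portions, the preceding coefficient as the basic prediction, plain magnitude as the cost measure C, and a fixed parameter λ. The function names are illustrative, not taken from the patent.

```python
def selective_predict(x_prev2, x_prev, lam=6):
    """Return the prediction to subtract from the current coefficient:
    the previous coefficient if the previous prediction (here simply
    the coefficient before it) was accurate enough, else the default
    prediction 0.  Cost C is taken to be plain magnitude."""
    prev_prediction_error = abs(x_prev - x_prev2)   # C(x(i-1) - P(S(i-2)))
    if lam * prev_prediction_error < abs(x_prev):   # compare against C(x(i-1))
        return x_prev                               # P(S(i-1)): predict from history
    return 0                                        # default prediction


def encode_stream(xs, lam=6):
    """Replace each coefficient by its selective-prediction residual."""
    residuals = list(xs[:2])                        # first two portions sent as-is
    for i in range(2, len(xs)):
        residuals.append(xs[i] - selective_predict(xs[i - 2], xs[i - 1], lam))
    return residuals


def decode_stream(residuals, lam=6):
    """Invert encode_stream using only previously decoded data."""
    xs = list(residuals[:2])
    for i in range(2, len(residuals)):
        xs.append(residuals[i] + selective_predict(xs[i - 2], xs[i - 1], lam))
    return xs
```

Because the decision depends only on coefficients the decoder has already reconstructed, decode_stream repeats the encoder's choices exactly, with no side information.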
One or both of the cost measure, the method of prediction and the parameter may be adapted in response to successive portions of data.
Portions to be coded are preferably quantised prior to prediction; this facilitates the decoding process if quantisation is used.
Alternatively, portions to be coded may be unquantised and the residual data after selective prediction is quantised. In this case, preferably the coefficients used for prediction are reconstructed coefficients, on which inverse quantisation and prediction have been performed.
In the case of adaptive modification of prediction method or cost measure, adjustment is preferably based on reconstructed coefficients. This facilitates repeating the operation at a decoder.
In a further aspect, the invention provides a method of coding a video picture in which the picture is partitioned into at least two portions, the method comprising predicting elements of the second portion by displacing elements of the first portion and coding the differences between elements of the second portion and the respective predictions thereof.
Wavelet coding is known not to code diagonal edges well, and this method could be used to shuffle lines so that elements of the lines about the edges are aligned vertically by said displacing before filtering.
The two portions may correspond to first and second times, in which case displacing elements may comprise displacing elements based on an estimate of motion between the first and second times.
Preferably elements of the first portion are positioned at spatially different but substantially adjacent positions to elements of the second portion. Preferably the first and second portions are interleaved. The portions most conveniently comprise alternating picture lines, preferably vertically separated, but horizontal and other separation may be used additionally or alternatively.
In a preferred implementation, the first and second portions comprise respectively first and second fields of an interlaced video picture frame.
In a preferred implementation of this aspect, there is provided a method of coding an interlaced video picture frame comprising at least first and second fields, each field comprising a plurality of picture lines, the lines of the first field interleaving with lines of the second field, the first and second fields corresponding to respective first and second times, the method comprising: predicting elements of the second field based on the first field, wherein predicting comprises shifting elements of the first field along the direction of said picture lines based on an estimated component of motion along said lines between said first and second times, whereby elements of the second field are predicted based on elements of the first field which are estimated to be aligned;
coding the second field following prediction.
This is found to give an effective method of efficiently coding interlaced video: it is better than simple filtering, in that it takes advantage of the similarities between adjacent fields, but it does not require complex motion compensation to reconstruct a progressive picture.
Typically, the lines are horizontal and only horizontal motion is estimated.
Estimating horizontal motion only has the benefit that the motion estimation process can be greatly simplified, requiring little computational power. In particular, only a one-dimensional search is required to find a match, and this may be an order of magnitude faster than a traditional two-dimensional search for a motion vector. It may only be necessary to store a few lines of picture to perform the motion estimation; for example, if prediction is based simply on the preceding line, only two lines are required for the motion estimation, and this can be implemented efficiently in hardware if required.
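As an illustration of the one-dimensional search, the following sketch matches a short run of samples from the current field line against horizontally shifted positions in a reference line, using sum of absolute differences as the match criterion. The block size, search range and out-of-range penalty are illustrative assumptions, not values taken from the embodiment.

```python
def best_horizontal_shift(ref_line, cur_line, x, block=8, search=16):
    """One-dimensional motion search: find the horizontal shift that best
    matches a block of the current field line, starting at x, against the
    reference line, using the sum of absolute differences (SAD)."""
    width = len(cur_line)
    best_shift, best_sad = 0, float("inf")
    for shift in range(-search, search + 1):
        sad = 0
        for dx in range(block):
            cx, rx = x + dx, x + dx + shift
            if 0 <= cx < width and 0 <= rx < width:
                sad += abs(cur_line[cx] - ref_line[rx])
            else:
                sad += 255                     # penalise samples falling off the line
        if sad < best_sad:
            best_sad, best_shift = sad, shift
    return best_shift
```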
However, vertical motion may also be estimated. Conventional two-dimensional motion estimation may be used. Alternatively, vertical motion estimation may first be performed only to integer line accuracy, to index a vertical line to be used as the basis of horizontal motion estimation, and this two-stage process may be faster than a conventional two-dimensional estimate.
Elements of a field may be predicted using a single element from the other field as a basis for prediction, and this has numerous advantages in terms of algorithm simplicity and efficiency. However, multiple elements may be used, using long filters if desired (and these may extend in both the vertical and the horizontal direction if desired) and motion may be estimated to sub-pixel accuracy in both vertical and horizontal directions. In practice, improvements in prediction accuracy from the greater complexity may be outweighed by the complexity of the prediction method.
Advantageously, elements of the first field are used both as a basis for estimating a component of apparent motion between the first and second fields and as a basis for predicting values of elements of the second field.
In a closely related aspect, the invention provides a method of coding an interlaced video frame comprising first and second fields, the method comprising predicting elements of the second field based on elements of the first field which are displaced horizontally in accordance with estimated horizontal components of motion between the first and second fields and
coding the second field following prediction.
Only a single estimate of horizontal motion may be used for each picture line; this simplifies problems of multiple predictors pointing to the same pixel but gives effective results. Having multiple predictors may cause difficulties when coding the first field by the method described herein.
Each element of the second field is preferably associated with a predictor comprising an element of the first field from a corresponding picture line at a position along the corresponding line corresponding to the position along the line of the element of the second field to be predicted plus an offset based on estimated motion.
The corresponding picture line may simply be the line above. Alternatively, it may comprise another line indexed by a first stage of vertical motion estimation, as described above.
Advantageously, the interlaced picture is coded by a hierarchical or recursive coding method after prediction. Most preferably, the picture is coded by a wavelet coding method following prediction.
Preferably a difference between each element of the second field and a corresponding element of the first field is coded. Preferably the corresponding element of the first field comprises an element of the first field shifted based on a component of estimated motion. The difference may be used to provide a high-pass output of a wavelet coding filter, preferably wherein half the difference is coded. A low-pass output of the wavelet coding filter may comprise the corresponding element of the first field if the prediction predicts no elements on a line or, if one or more elements on the second line are predicted, the average of the corresponding element of the first field and the predicted element or elements of the second field.
Although this aspect works well with horizontal motion compensation alone, vertical motion compensation could also be incorporated. Each element preferably comprises an individual pixel. However, each element may comprise a group of pixels. The frame may be coded without further motion estimation. Further wavelet decomposition may be performed without further motion estimation.
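The following sketch is an assumption-laden simplification rather than the embodiment's filter: it shows how a Haar-like motion-compensated filter pair of the kind described above might operate on one pair of interleaved lines, predicting each second-field element from the first-field element displaced by the estimated horizontal shift.

```python
def field_wavelet_step(first_line, second_line, shift):
    """Motion-compensated Haar-like vertical filter for one line pair.
    Each second-field element is predicted by the first-field element
    displaced by the estimated horizontal shift; half the difference is
    the high-pass output and the average is the low-pass output."""
    width = len(second_line)
    low, high = [], []
    for x in range(width):
        rx = min(max(x + shift, 0), width - 1)   # clamp the shifted index at edges
        pred = first_line[rx]
        diff = second_line[x] - pred
        high.append(diff / 2)                    # half the difference is coded
        low.append(pred + diff / 2)              # average of predictor and element
    return low, high
```

The step is exactly invertible: the decoder recovers the predictor as low - high and the second-field element as low + high.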
In a further aspect, the invention provides a method of hierarchical coding of data in which data is coded by a coding scheme into at least parent and child coding levels, wherein parent and child levels are both quantised, the method comprising selecting quantisation parameters for at least a portion of the child coding level based on the quantisation parameters for at least a corresponding portion of the parent coding level.
In this way, coding of child level(s) is simplified and less information needs to be communicated to a decoder to reconstruct the child levels. A minor drawback in certain cases is that the child levels cannot be reconstructed independently from the parent levels, but this will typically not be problematic.
The coefficients in the parent coding level may be partitioned and the partitioning of the child level may be based on the partitioning of the parent level. Preferably, partitioning of the parent level is based on the quantised coefficients of the parent level. Preferably, the parent level is communicated to a decoder prior to the child level.
The principle is not limited to quantisation classes, but could be used for other coding parameters, for example for defining contexts for entropy coding. The classification could be used for defining quantisers as well as quantisation classes. If the quantiser values themselves are determined implicitly, this avoids the need to signal them to the decoder.
A useful development is to classify coefficients in the parent level into subsets with different variances or other statistical properties, for example into zero and non-zero coefficients, or into zero, small, and large coefficients. Local properties can also be taken into account, for example local masking so that a class of coefficients defined in this way can be quantised more heavily if the results are likely to be masked, or less heavily in a region of interest.
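A minimal sketch of such implicit classification follows: quantised parent coefficients are divided into zero, small and large classes, and each child coefficient position inherits a quantisation factor chosen according to the class of its parent. The thresholds and per-class scale factors here are invented for illustration; the point is that both encoder and decoder can derive them from the already-transmitted parent level, and the child band is assumed to be twice the parent's size in each dimension.

```python
def parent_class(parent_label, small_threshold=2):
    """Classify a quantised parent coefficient as zero, small or large."""
    m = abs(parent_label)
    if m == 0:
        return "zero"
    return "small" if m <= small_threshold else "large"


def child_quantisers(parent_band, child_w, child_h, base_qf):
    """Give each child-band position a quantisation factor according to
    the class of its parent (child (x, y) maps to parent (x//2, y//2)).
    The per-class scale factors are purely illustrative."""
    scale = {"zero": 2.0, "small": 1.0, "large": 0.5}
    return [[base_qf * scale[parent_class(parent_band[y // 2][x // 2])]
             for x in range(child_w)]
            for y in range(child_h)]
```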
Classification into subsets, although implicit, may be assisted by additional signalling from the encoder.
It is important to note that this aspect of the invention, although applied to video in the embodiment, can be applied to other signals, and can be applied to coding of multiple levels (with children and grandchildren) and to coding in which each parent has a number of child coefficients (for example in audio coding using wavelet transforms, in which each parent coefficient has two child coefficients). Similarly, although wavelet transforms are advantageous, other transforms in which parent-child relationships can be defined may be used, for example block transforms, lapped orthogonal transforms and others.
In a further aspect, the invention provides a method of coding a sequence of pictures comprising: selecting only a subset of the sequence of pictures for communication leaving a subset of residual pictures; interpolating from the subset of pictures for communication according to an interpolation algorithm to create at least one interpolated picture corresponding substantially to one of the subset of residual pictures; encoding adjustment information based on a difference between the at least one interpolated picture and the corresponding residual picture; communicating the subset of pictures for communication together with the adjustment information.
In this way, a higher frame rate can be recreated from a signal which is essentially a lower frame rate using interpolation at a decoder and then adjustment of the interpolated frames (which may require considerably less information to achieve than transmission of the frames by a conventional method).
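As a concrete sketch, suppose every second picture is dropped and each dropped picture is interpolated as the pixel-wise average of its two kept neighbours; the encoder runs the same interpolation the decoder will use and transmits only the residual adjustment. The averaging interpolator and frame layout are illustrative assumptions, not the embodiment's algorithm.

```python
def encode_sequence(frames):
    """Keep even-indexed frames; for each dropped frame send the residual
    against the interpolation the decoder will form (the mean of the two
    neighbouring kept frames).  Assumes an odd number of frames so every
    dropped frame has two neighbours; frames are 2-D lists of pixels."""
    kept = frames[::2]
    adjustments = [subtract_frames(frames[i],
                                   average_frames(frames[i - 1], frames[i + 1]))
                   for i in range(1, len(frames) - 1, 2)]
    return kept, adjustments


def decode_sequence(kept, adjustments):
    """Rebuild the full-rate sequence from kept frames plus adjustments."""
    out = []
    for i in range(len(kept) - 1):
        out.append(kept[i])
        interp = average_frames(kept[i], kept[i + 1])
        out.append(add_frames(interp, adjustments[i]))
    out.append(kept[-1])
    return out


def average_frames(a, b):
    return [[(pa + pb) // 2 for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def subtract_frames(a, b):
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def add_frames(a, b):
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
```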
Preferably the subset of pictures is encoded.
The sequence of pictures has a first frame rate and the subset of pictures preferably comprises a sequence of pictures having a lower frame rate than the first frame rate.
Pictures may be dropped (i.e. be selected as residual pictures) at substantially regular intervals to reduce the frame rate. Additionally or alternatively, pictures may be dropped based on picture information content; for example, low information content pictures may be dropped, so that little adjustment information is required to recreate them. Pictures may be dropped to control the output frame rate and/or to control the output bitrate. Both selection of pictures and coding parameters may be adjusted to control the output bitrate.
The method typically further comprises communicating the output to be received by a decoder arranged to perform interpolation in accordance with the interpolation algorithm.
A complementary aspect provides a method of reconstructing a sequence of pictures comprising: receiving a sequence of pictures to produce a sequence of received pictures; interpolating at least one picture from the received pictures according to a predetermined interpolation algorithm; receiving adjustment information and applying the adjustment information to the or each interpolated picture to modify the or each interpolated picture; and outputting a sequence of pictures comprising the received pictures and the modified interpolated pictures.
The method may include decoding the sequence of pictures received to provide received pictures.
In an aspect relating to the overall system, there is provided a method of communicating a sequence of pictures comprising: at a coder, selecting only a subset of the sequence of pictures for communication, leaving a subset of residual pictures; interpolating from the subset of pictures for communication according to an interpolation algorithm to create at least one interpolated picture corresponding substantially to one of the subset of residual pictures; encoding adjustment information based on a difference between the at least one interpolated picture and the corresponding residual picture; communicating the subset of pictures to a decoder together with the adjustment information; and at the decoder, receiving the subset of pictures to produce a sequence of received pictures; interpolating at least one picture from the received pictures according to the interpolation algorithm; receiving the adjustment information and applying the adjustment information to the or each interpolated picture to modify the or each interpolated picture; and outputting a sequence of pictures comprising the received pictures and the modified interpolated pictures.
In all the related aspects, the interpolation algorithm preferably comprises interpolating based only on information available for one or more pictures other than the picture to be interpolated. Pictures other than the picture to be interpolated may comprise pictures encoded using prediction based on motion vectors, for example according to an MPEG II or similar scheme. Alternatively, pictures may be coded according to other aspects of the invention. Where motion vectors are available, motion vectors for the other pictures are used in interpolation, and preferably motion vectors are communicated for pictures selected for communication but no motion vectors are communicated specifically for the interpolated pictures.
The adjustment information may include information concerning the timing of an interpolated frame to which the adjustment information relates. The adjustment information may include information concerning the motion between the interpolated frame and other communicated frames or residual frames. The adjustment information may include information describing global motion between the interpolated frame and other communicated frames or residual frames. The adjustment information may include information concerning the selection of parameters used for the control of, or modification of the operation of, the interpolation procedure. Adjustment information may be provided for all pictures to be interpolated. Some pictures may be interpolated without adjustment.
The invention extends to corresponding methods and apparatus for decoding.
The invention further provides a method of decoding a signal coded by a method according to anyone herein comprising reconstructing a portion of the signal to be reconstructed based on a received portion of data and a method of selective prediction of a subsequent portion corresponding to said method.
A method of decoding a signal coded according to the first aspect may comprise receiving and decoding first and second coded portions of data, selectively predicting a third portion of data based on the first and second portions and applying received coded data for the third portion to the selective prediction to reconstruct the third portion.
A method of decoding data coded hierarchically may comprise receiving data encoding a parent level and reconstructing the parent level and reconstructing the child level based on received data for the child level and re-using data used in the reconstruction of the parent level.
The invention extends to a computer program or computer program product for performing a method according to any preceding claim.
The invention further provides apparatus corresponding to all method aspects.
The invention provides a coder arranged to perform any coding method. The invention provides a decoder arranged to perform any decoding method.
Preferred features of method aspects may be applied to apparatus and program aspects and vice versa. The coding methods of each aspect may be combined with each other (as in the embodiment) or combined with other coding methods, as may their preferred features.
Embodiments will now be described, by way of example, with reference to the accompanying drawings, in which:

Figure 1 shows the overall hybrid encoder architecture;
Figure 2 shows rate-distortion curves for two signal components;
Figure 3 illustrates minimization of the Lagrangian cost function;
Figure 4 illustrates the frame coding architecture;
Figure 5 illustrates perfect reconstruction analysis and synthesis filter pairs;
Figure 6 illustrates wavelet transform frequency decomposition;
Figure 7 illustrates a 3-level wavelet transform of a library picture ("LENNA");
Figure 8 illustrates uniform and dead-zone quantisers, with mid-point reconstruction values;
Figure 9 is an entropy coding block diagram;
Figure 10 illustrates a unary encoding tree;
Figure 11 illustrates prediction of L1 and L2 frames when L1 frames are P frames;
Figure 12 illustrates prediction of L1 and L2 frames when L1 frames are also B frames;
Figure 13 illustrates overlapping blocks, in which the darker-shaded areas show overlapping areas;
Figure 14 illustrates sub-pixel motion-vector refinement;
Figure 15 shows the neighbouring vectors available in raster-scan order for local variance calculation;
Figure 16 illustrates macroblock splitting modes;
Figure 17 illustrates the motion vector entropy coding architecture;
Figure 18 demonstrates that data other than MB_SPLIT and MB_CBMODE is always associated with particular blocks, even if the relevant prediction unit is the sub-MB or MB itself;
Figure 19 illustrates how block data is scanned in raster order by MB and then in raster order within each MB;
Figure 20 shows that, for the purposes of prediction, values are deemed to be propagated within MBs or sub-MBs;
Figure 21 shows the aperture for a) block parameter prediction and b) MB parameter prediction, showing shaded blocks repeated in the NWMP scheme;
Figure 22 shows how the MV prediction residue values are deemed to be propagated just as the values themselves are;
Figure 23 illustrates exp-Golomb VLC coding;
Figure 24 illustrates the lifting scheme analysis and synthesis processes;
Figure 25 illustrates motion-compensated vertical filtering for wavelet decomposition of interlaced pictures;
Figure 26 illustrates a motion-compensated frame and its wavelet transform;
Figure 27 illustrates the parent-child relationship between pixels in subbands with the same orientation;
Figure 28 illustrates an error feedback predictor (based on a single-tap feedback delay);
Figure 29 illustrates an embodiment of adaptive prediction with a single-tap adaptation delay;
Figure 30 illustrates prediction according to an embodiment of the invention, with in-loop quantisation;
Figure 31 illustrates rearrangement of block transform coefficients into subbands;
Figure 32 illustrates parent-child relationships for subband transforms.
To assist in understanding the invention, a complete coding and decoding system will be explained. Certain components may be based on conventional components and so will not be described in detail, but it will be appreciated that the application in the context of this system may differ from conventional application. Although described in the context of a complete system for ease of understanding, it should be appreciated that the coder is essentially modular and, while there are certain synergies, features of the system may be provided independently of other features and in alternative combinations unless explicitly stated to be linked. As will become apparent, while certain components are novel per se, particularly in the adaptive prediction and interlace coding modules, other parts of the system comprise novel combinations of components which, while based on conventional components, combine in a novel manner to provide particular advantages, and these combinations may be provided as building blocks for an alternative coder.
Overall, the codec is based on a conventional motion-compensated hybrid codec. The coder has the architecture shown in Figure 1 below, whilst the decoder performs the inverse operations.
There are four main elements or modules to the coder:
    • Transform and scaling - this involves taking picture data and applying a transform (in this case the wavelet transform) and scaling and rounding the coefficients to perform quantisation;
    • Entropy coding - this is applied to quantised transform coefficients and to motion vector (MV) data and performs lossless compression on them;
    • Motion estimation (ME) - this involves finding matches for picture data from previously coded pictures, trading off accuracy with motion vector bit rate;
    • Motion compensation (MC) - this involves using the motion vectors to predict the current frame, in such a way as to minimise the cost of encoding the residual data.
The following sections describe these modules in more detail, after first describing the rate-distortion framework used throughout the encoder.
Rate-Distortion Optimisation (RDO)

To make good decisions in compression it is desirable to be able to trade off the number of bits used to encode some part of the signal being compressed against the error that is produced by using that number of bits (although a working coder can be made without using this principle). There is little point striving hard to compress one feature of the signal if a much more significant effect could be produced by compressing some other feature with fewer bits. In other words, it is desirable to distribute the bit rate to get the least possible distortion overall.
One basic explanation of how this can be done is the Principle of Equal Slopes, which states that the coding parameters should be selected so that the rate of change of distortion with respect to bit rate is the same for all parts of the system.
To explain why this is so, we consider two independent components of a signal.
They might be different blocks in a video frame, or different subbands in a wavelet transform. Compressing them at various rates using a selected coding technique tends to give curves like those shown in Figure 2. They show that at low rates, there is high distortion (or error) and at high rates there is low distortion, and there is generally a smooth curve between these points with a convex shape.
We assign B1 bits to component X and B2 bits to component Y and examine the slope of the rate-distortion curves at these points. At B1 the slope of X's distortion with respect to bit rate is much higher than the slope at B2, which measures the rate of change of Y's distortion with respect to bit rate. This is not the most efficient allocation of bits - consider increasing B1 by a small amount Δ to B1+Δ and decreasing B2 to B2-Δ. Then the total distortion has reduced even though the total bit rate hasn't changed, due to the disproportionately greater drop in the distortion of X. The conclusion is therefore that for a fixed total bit rate, the error or distortion is minimised by selecting bit rates for X and Y at which the rate-distortion curves have the same slope. Likewise, the problem can be reversed and, for a fixed level of distortion, the total bit rate can be minimised by finding points with the same slope.
Two issues arise in practice: firstly, how to find points on these curves with the same slope and, secondly, how to hit a fixed overall bit budget. The first question can be answered by Figure 3: the intercept of the tangent to the rate-distortion curve at the point (R0, D0) is the point R0 + λD0, where -λ is the slope at the point (R0, D0). Furthermore, it is the smallest value of R + λD for all values of (R, D) that lie on the curve. So in selecting, for example, a quantiser in a given block or subband, one minimises the value D(Q) + λR(Q) over all quantisers Q, where D(Q) is the error produced by quantising with Q and R(Q) is the rate implied.
In order to hit an overall bit budget (alternatively, distortion), one needs to iterate over values of the Lagrangian parameter λ in order to find the one that gives the right rate (alternatively, distortion). In practice, this iteration can be done in slow time given a reasonable encoding buffer size, and by modelling the overall rate-distortion curve based on the recent history of the encoder.
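A sketch of that slow-time iteration follows, assuming the achieved bit rate falls monotonically as λ rises. Here encode_at stands for whatever routine encodes (or models) the signal at a given λ and reports the achieved rate; the bisection bounds and iteration count are arbitrary assumptions.

```python
import math

def lambda_for_rate(encode_at, target_rate, lo=1e-4, hi=1e4, iters=30):
    """Bisect (in the log domain) the Lagrangian parameter so that the
    achieved rate hits a bit budget.  encode_at(lam) returns the rate
    obtained when encoding with that lambda; rate falls as lambda rises."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)        # geometric mid-point: lambda spans decades
        if encode_at(mid) > target_rate:
            lo = mid                    # still too many bits: push lambda up
        else:
            hi = mid                    # under budget: lambda can come down
    return math.sqrt(lo * hi)
```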
Rate-distortion optimization (RDO) is used throughout the video encoder of the embodiment and it has a very beneficial effect on performance (but is not essential to the invention). Although RDO is described mathematically, there is plenty of scope for a practical implementation to use empirical approximations, particularly to deal with the practical problems mentioned below.
1) There may be no common measure of distortion.
For example: quantising a high-frequency subband is less visually objectionable than quantising a low-frequency subband, in general. So there is no direct comparison of the significance of the distortion produced in one subband with that produced in another. This can be overcome by perceptual weighting, in which the noise in HF bands is downgraded according to an estimate of the Contrast Sensitivity Function (CSF) of the human eye, and this is what is preferably done in an embodiment. The problem even occurs in block-based coders, however, since quantisation noise can be successfully masked in some areas but not in others. Perceptual correction factors are therefore desirable in RDO in all types of coders.
2) Rate and distortion may not be directly measurable.
In practice, measuring rate and distortion for, e.g. every possible quantiser in a coding block or subband cannot mean actually encoding for every such quantiser and counting the bits and measuring MSE. However, what one can do is estimate the values using entropy calculations or assuming a statistical model and calculating, say, the variance. In this case, the R and D values may well be only roughly proportional to the true values, and an empirical factor(s) may be used to compensate for this effect in each component part of the encoder.
3) Components of the bitstream will be interdependent.
The model describes a situation where the different signals X and Y are fully independent. This is often not true in a hybrid video codec. For example, the rate at which reference frames are encoded affects how noisy the prediction from them will be, and so the quantisation in predicted frames depends on that in the reference frame. Even if elements of the bitstream are logically independent, perceptually they might not be. For example, with I-frame coding, each I-frame could be subject to RDO independently, but this might lead to objectionably large variations in quantisation noise between frames with low bit rates and rapidly changing content.
Frame coding

Frame coding will now be described. Both intra and inter frames are treated similarly in the codec. First they are wavelet transformed using separable wavelet filters. Then they are quantised using RDO quantisers, before prediction (in the case of intra frames) and entropy coding.
The architecture of a frame coding implementation is shown in Figure 4.
As can be seen, each wavelet subband is coded independently. This means that the encoding and decoding can be parallelised across all the subbands.
Wavelet transform

The discrete wavelet transform is described in many sources, including J. Goswami and A. Chan, 'Fundamentals of Wavelets: Theory, Algorithms and Applications', Wiley 1999. In the codec it plays the same role as the DCT in MPEG-2, although applied to a whole picture rather than a block of data, in decorrelating data in a roughly frequency-sensitive way, whilst having the advantage of preserving fine details better. In one dimension it consists of the iterated application of a complementary pair of half-band filters followed by subsampling by a factor of 2.
These filters are termed the analysis filters. Corresponding synthesis filters can undo the aliasing introduced by the critical sampling and perfectly reconstruct the input. This arrangement is illustrated in Figure 5. Clearly not just any pair of half-band filters can do this, and there is an extensive mathematical theory of wavelet filter banks. The filters split the signal into LF and HF parts; the wavelet transform then iteratively decomposes the LF component to produce an octave-band decomposition of the signal.
Applied to two-dimensional images, wavelet filters are normally applied in both vertical and horizontal directions to produce four so-called subbands termed Low-Low (LL), Low-High (LH), High-Low (HL) and High-High (HH). In the case of two dimensions, only the LL band is iteratively decomposed to obtain the decomposition of the two-dimensional spectrum shown in Fig. 6.
The number of samples in each resulting subband is as implied by Figure 6: the critical sampling ensures that after each decomposition the resulting bands all have one quarter of the samples of the input signal.
The choice of wavelet filters has an impact on compression performance, filters having to have both compact impulse response in order to reduce ringing artefacts and other properties in order to represent smooth areas compactly.
One criterion is the number of 'vanishing moments' of the high-pass filter. A filter has zeroth-order vanishing moments if it removes constant signals (any filter with a zero at DC does this); it has first-order moments if it removes linear signals; second-order moments if it removes quadratics; and so on. A filter has N vanishing moments if its z-transform has an Nth-order zero at unity. Suitable filters are the Daubechies (9,7) filters [M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, 'Image coding using wavelet transform', IEEE Trans. Image Processing, vol. 1, pp. 205-220, 1992], which if applied naively require an average of 8 multiplications per sample for the transform in both directions.
However, the so-called 'lifting scheme' [W. Sweldens, 'The lifting scheme: A custom-design construction of biorthogonal wavelets', Technical Report 1994:7, Industrial Mathematics Initiative, Department of Mathematics, University of South Carolina, 1994] allows wavelet filters to be factorised, and many implementations of these filters and similar exist using fewer operations (typical factorisations can reduce the number of multiplications by a factor of 3 or more).
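To illustrate the lifting idea, here is the simpler LeGall (5,3) integer wavelet written as a predict step followed by an update step. This is an illustration of the general technique, not the embodiment's filter: the (9,7) filter mentioned above factorises into four analogous lifting steps plus scaling, omitted here for brevity.

```python
def lift53_forward(x):
    """One level of the LeGall (5,3) wavelet computed by lifting.
    The predict step turns odd samples into high-pass coefficients; the
    update step turns even samples into low-pass coefficients.  Uses
    symmetric extension at the edges; len(x) is assumed even."""
    x = list(x)
    n = len(x)
    for i in range(1, n, 2):                     # predict (high-pass)
        right = x[i + 1] if i + 1 < n else x[i - 1]
        x[i] -= (x[i - 1] + right) // 2
    for i in range(0, n, 2):                     # update (low-pass)
        left = x[i - 1] if i > 0 else x[1]
        x[i] += (left + x[i + 1] + 2) // 4
    return x[0::2], x[1::2]                      # (low band, high band)


def lift53_inverse(low, high):
    """Undo lift53_forward exactly (integer lifting is perfectly invertible)."""
    n = 2 * len(low)
    x = [0] * n
    x[0::2], x[1::2] = low, high
    for i in range(0, n, 2):                     # undo update
        left = x[i - 1] if i > 0 else x[1]
        x[i] -= (left + x[i + 1] + 2) // 4
    for i in range(1, n, 2):                     # undo predict
        right = x[i + 1] if i + 1 < n else x[i - 1]
        x[i] += (x[i - 1] + right) // 2
    return x
```

Because each step modifies only odd or only even samples using the others, the inverse simply replays the steps in reverse order with the signs flipped, which is why integer lifting reconstructs the input perfectly.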
Quantisation

Having transformed the frame, each subband's coefficients are independently quantised using a so-called uniform dead-zone quantiser. A simple uniform quantiser is a division of the real line into equal-width bins, of size equal to the quantisation factor QF: the bins are numbered and a reconstruction value is selected for each bin. The bins consist of the intervals [(N-1/2)*QF, (N+1/2)*QF] for integers N, which are also the labels for the bins, and it is the labels that are subsequently encoded. The reconstruction value used in the decoder (and for local decoding in the encoder) can be any value in each of the bins. The obvious, but not necessarily the best, reconstruction value is the midpoint N*QF. See Figure 8a below.
A uniform dead-zone quantiser is slightly different in that the bin containing zero is twice as wide. So the bins consist of [-QF, QF], with a reconstruction value of 0, and other bins of the form [N*QF, (N+1)*QF] for N>0 and [(N-1)*QF, N*QF] for N<0, with reconstruction points somewhere in the intervals. The bin structure is shown in Figure 8b with mid-point reconstruction points.
The advantage of the dead-zone quantiser is two-fold. Firstly, it applies more severe quantisation to the smallest coefficients, which acts as a simple but effective de-noising operation. Secondly, it admits a very simple and efficient implementation. The quantised value N of a coefficient c is given by the formula:

N = ⌊c/QF⌋ if c > 0
  = -⌊|c|/QF⌋ otherwise

where the brackets ⌊ ⌋ mean that the remainder is to be discarded. The corresponding reconstructed value c' is given by:

c' = 0 if N = 0
   = (N+0.375)*QF if N > 0
   = (N-0.375)*QF if N < 0

A value of 0.5 might be the obvious reconstruction point, giving as it does the mid-point of the bin. Typically, however, the values of transformed coefficients in a wavelet subband have a distribution with mean very near zero which decays reasonably rapidly and uniformly for larger values (the Laplacian distribution, which has exponential decay, is often used to model wavelet coefficients). Values are therefore more likely to occur in the first half of a bin than in the second half, and the smaller value of 0.375 selected reflects this bias and gives better performance in practice. It is noted that a quantiser need not have uniformly distributed steps, and for a given set of data the optimum quantiser (in a rate-distortion sense) can be constructed by the so-called Lloyd-Max algorithm [R. Gray and D. Neuhoff, 'Quantization', IEEE Trans. Info. Theory, vol. 44, no. 6, October 1998].
However, for data with a Laplacian distribution, this turns out to be a uniform quantiser.
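The quantiser and reconstruction formulas above translate directly into code; the following sketch is a straightforward transcription using the 0.375 reconstruction offset.

```python
import math

def deadzone_quantise(c, qf):
    """Uniform dead-zone quantiser: the zero bin is twice as wide, so any
    coefficient with |c| < qf receives label 0 (a cheap de-noising)."""
    n = math.floor(abs(c) / qf)          # discard the remainder
    return n if c > 0 else -n


def deadzone_reconstruct(n, qf):
    """Reconstruct 0.375 of the way into the bin rather than at the
    mid-point, reflecting the bias of wavelet coefficients towards the
    low end of each bin."""
    if n == 0:
        return 0.0
    return (n + 0.375) * qf if n > 0 else (n - 0.375) * qf
```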
Lagrangian parameter control of subband quantisation

The optimum quantiser is chosen for each subband by computing the quantiser which minimises a Lagrangian combination of rate and distortion. Rate is measured via a zeroth-order entropy measure Ent(q) of the quantised symbols resulting from applying the quantisation factor q, calculated as a value of bits/pixel. Distortion is measured in terms of the perceptually-weighted mean-square-error, MSE(q), resulting from the difference between the original and the quantised coefficients. Hence the total measure for each quantiser q is:

MSE(q)/w + λ·Ent(q)

where w is the perceptual weight associated with the subband - higher frequencies having a larger weighting factor. The quantisers are incremented in quarter-powers of 2 - i.e. q is an integer approximation of 2^(n/4) for integers n. In other words, the quantisers represent the coefficient magnitudes to variable fractional-bit accuracies in quarter-bit increments.
Suitable perceptual weighting may be based on that disclosed by A. Watson et al., 'Visual thresholds for wavelet quantization error', SPIE Proc. Vol. 2657, Human Vision and Electronic Imaging, B. Rogowitz and J. Allebach eds., 1996. In an implementation, for ease of use, the Lagrangian parameter λ may be entered as a user-selectable quantisation parameter. It may be convenient for the actual value to be entered in log units, since then there will be a roughly linear relationship with the resulting PSNR. The larger the value, the lower the resulting bit rate, and vice versa.
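Putting the pieces together, a per-subband quantiser choice might look like the following sketch, with a zeroth-order entropy estimate standing in for the true rate and squared error divided by the perceptual weight w standing in for the perceptually weighted distortion. A plain uniform quantiser is used for simplicity; in practice the candidate list would be the quarter-power-of-2 series described above and the dead-zone quantiser would be substituted.

```python
import math
from collections import Counter

def entropy_bits_per_pixel(symbols):
    """Zeroth-order entropy Ent(q) of the quantised symbols, in bits/pixel."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def best_quantiser(coeffs, candidate_qfs, lam, weight=1.0):
    """Choose the quantisation factor minimising MSE(q)/w + lambda*Ent(q)
    for one subband."""
    best_qf, best_cost = None, float("inf")
    for qf in candidate_qfs:
        labels = [round(c / qf) for c in coeffs]   # crude uniform quantiser
        recon = [n * qf for n in labels]
        mse = sum((c - r) ** 2 for c, r in zip(coeffs, recon)) / len(coeffs)
        cost = mse / weight + lam * entropy_bits_per_pixel(labels)
        if cost < best_cost:
            best_cost, best_qf = cost, qf
    return best_qf
```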
Frame entropy coding

The entropy coding used in frame coding is based on four stages: coefficient prediction, binarisation, context modelling and adaptive arithmetic coding. All these stages are present for intra frame coding, but prediction is absent from inter-frame coding, since the picture has already largely been decorrelated by motion compensation.
The purpose of the first two stages is to provide a bitstream with easily analysable statistics that can be encoded using arithmetic coding which can adapt to those statistics to reflect local variations in picture characteristics.
Suitable binarisation and adaptive context modelling may be based on the Context-Adaptive Binary Arithmetic Coding (CABAC) used in various parts of the H.264/AVC standard [ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC, Draft standard, available at ftp://ftp.imtc-files.org/jvt-experts/], although implementation details may differ. The inventive adaptive prediction used herein, however, is quite different to conventional methods and will be described below in detail.
Prediction

The aim of the prediction stage is to remove any residual interdependencies between coefficients in the wavelet subbands, so that subsequent entropy coding can be applied as far as possible to decorrelated data. Prediction only applies to intra frames.
Residual interdependencies within subbands result in the main from edges in the original image. For example, a horizontal edge produces a horizontal line in the LH subbands, which have been low-pass filtered horizontally and high-pass filtered vertically. In this case, it is possible to predict each coefficient c(x,y) from its predecessor on the same line, c(x−1,y). So the value to be entropy coded could instead be:

d(x,y) = c(x,y) − c(x−1,y)

i.e. the stream is equivalent to DPCM followed by entropy coding if the coefficients are scanned in raster order.
For HL bands, the roles of x and y would be swapped, since vertical edges will introduce vertical dependencies in these bands. HL bands should therefore be scanned by column rather than by row.
Prediction in diagonal (HH) bands could proceed by using diagonally adjacent coefficients. However, there is little practical benefit gained with a convenient diagonal predictor and hence embodiments (including the present embodiment) may have no prediction in HH bands. The only remaining band is the DC band.
This band is a very low-resolution version of the original picture. In this band, the prediction operation is defined by:

d(x,y) = c(x,y) − (c(x−1,y) + c(x,y−1) + c(x−1,y−1))/3

If the subband is scanned in row (raster) or in column order then each of the predicting values is available to the decoder.
For pictures with low texture content, these prediction operations reduce bit-rate considerably. Unfortunately, for pictures with a large amount of texture, neighbouring coefficients are already decorrelated and the prediction actually increases bitrate. Most pictures are a mixture of these characteristics, and so the improvement is not clear-cut.
To deal with this problem, the embodiment makes the prediction adaptive, since edges and texture are local features. The approach taken is to predict if prediction would have worked for the previously scanned coefficient. For vertically-oriented subbands (i.e. LH bands) the rule is:

d(x,y) = c(x,y) − c(x−1,y) if λ·|c(x−1,y) − c(x−2,y)| < |c(x−1,y)|
       = c(x,y) otherwise

In one implementation, the best value of λ was determined experimentally to be 6 (preferably at least 2, more preferably at least 3 and preferably also no more than 10, values of 4-8 being particularly suitable). This implies that the desired threshold for the efficacy of prediction is rather high. Adaptation is applied to LH and HL bands but not to the DC band.
This technique resulted in bit-rate reductions of up to 30% for some pictures, and increased the bit-rate of other, highly textured, pictures by less than 2% (which represents an acceptable increase for such a picture - in most practical sequences a significant decrease can be expected).
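By way of illustration, the rule for an LH band might be sketched as follows (a Python sketch under stated assumptions: band is a 2-D array of quantised coefficients, raster-scanned, and the first two coefficients of each row, for which c(x−2,y) is unavailable, are simply left unpredicted):

    def predict_lh_band(band, lam=6):
        """Return the values to be entropy coded: residues where the
        adaptive rule chooses to predict, raw coefficients elsewhere."""
        out = [row[:] for row in band]
        for y, row in enumerate(band):
            for x in range(2, len(row)):
                # predict only if prediction would have worked for the
                # previously scanned coefficient
                if lam * abs(row[x - 1] - row[x - 2]) < abs(row[x - 1]):
                    out[y][x] = row[x] - row[x - 1]
        return out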
Adaptive prediction of wavelet coefficients

To start with some definitions, consider a stream of symbols x(i). These could be binary symbols (0 or 1) or integers or any series of elements from a discrete alphabet on which it is possible to perform arithmetic operations. The symbol stream x(i) could result, as in the case described above, from scanning coefficients from a wavelet subband transform of an image in a predetermined order, but many other examples are possible, such as a stream of characters in a text document.
Denote by S(i) the set of all symbols up to and including x(i):

S(i) = {x(k) : k ≤ i}

The symbols in S(i−1) are all available for predicting x(i). The result of predicting using prior symbols can be written:

y(i) = x(i) − P(S(i−1))

where P is some prediction operator. A traditional error-feedback predictor can adapt P on the basis of consideration of the values y(k) and x(k) for k < i - that is, on the actual prior outputs of the predictor and on the prior inputs. This is illustrated in Figure 28.
In distinction to this approach, the invention conditions a predictor on the basis of what would have been the best prediction to have made for prior symbols.
This distinction is illustrated in Figure 29.
The whole scheme can be thought of as deriving a new predictor P′ from a simpler predictor P, by switching between a prediction of zero and that given by P. In the example of a HL-filtered subband of a wavelet transform of an image, each wavelet coefficient is predicted by its immediate predecessor on the same line. If the symbols are scanned in raster order (i.e. line by line) then the predictor P is simply given in this case by P(S(i)) = x(i), since this is the previous symbol in scan order also. With these definitions, a new predictor P′ can be defined by, for example, performing a simple weighted comparison between predicted and non-predicted outputs:

P′(S(i−1)) = P(S(i−1)) if λ·|P(S(i−2)) − x(i−1)| < |x(i−1)|    (*)
           = 0 otherwise

Generalisations

1) The adaption mechanism can be combined with other means of adaption, such as error feedback, and its component parts can themselves be adapted.
For example, the new predictor P′ can be used as the predictor in an error-feedback scheme. By such means, all parts of the predictor - the subsidiary predictor P, the comparison means and the control means - can all be adapted. In particular, in the example given above, the weighting coefficient λ can be adapted. A great variety of signals are available for further adaptation:

a) The prior inputs to the prediction mechanism P′;
b) The prior outputs of the prediction mechanism P′;
c) The prior outputs of the subsidiary prediction mechanism P;
d) The prior outputs of the comparison mechanism.
Feedback elements can additionally be incorporated into the embodiment of the invention. In particular, it is possible to assess the past success of the new prediction mechanism P′ and compare it with the success of using the subsidiary predictor P, and use this to establish the reliability of the switch mechanism and so adjust either the control means or the comparison means.
2) The process can be iterated, and the new prediction operator P′ can be used as the subsidiary predictor to produce a further prediction operator P′′ by the same method.
3) Various methods can be used for performing the comparison and control means.
A simple generalization of formula (*) would be to define

P′(S(i−1)) = P(S(i−1)) if λ·C(P(S(i−2)) − x(i−1)) < C(x(i−1))
           = 0 otherwise

where C is any cost measure (such as an entropy measure) that may also be adapted over time, and λ is a parameter that may also be adapted over time.
4) In the case of coding a discrete signal in a compression system, quantisation may also be brought within the prediction loop.
In the example given of coding wavelet subbands, the wavelet coefficients were assumed to have all been quantised prior to prediction and entropy coding.
However, it is possible to use prediction on unquantised data and then quantise the prediction residuals, provided that:

a) the coefficients used for prediction are reconstructed coefficients, on which inverse quantisation and prediction have been performed;
b) any adaption is based only on reconstructed coefficients.
The adjustments required to the block diagram of Figure 29 are shown in Figure 30.
Binarization

Binarization is the process of transforming the multivalued coefficient symbols into bits. The resulting bitstream can then be arithmetic coded. The reason for doing this is that the efficiency of arithmetic coding depends on the accuracy of the statistics that the encoder uses. Wavelet coefficients cover a wide range but are heavily clustered around zero (except in the LL band). So the symbol space is large but most possible symbol values occur rarely or not at all.
Maintaining a probability model of all the possible values that could occur in a quantised subband consumes a lot of memory, and results in inaccurate estimates of probabilities since the sample space for most symbols is so small.
Where the arithmetic coding is adaptive, as it is here, this can impact on coding efficiency significantly.
The simplest way to binarize a symbol is directly: a symbol is encoded by encoding the constituent bits of the binary representation of its magnitude, followed by a sign bit. This is termed bit-plane coding. Modelling the resulting bitstream in order to code it efficiently is possible and has been done, but it is somewhat complicated. Each bit-plane has different statistics, and needs to be modelled separately, taking into account interdependencies between bit-planes.
Modelling many conditional probabilities is possible if complex, of course, but in practice tends to reintroduce the problem of small sample sizes.
Instead, the arithmetic coding scheme used in the codec of the embodiment binarizes coefficient magnitudes with so-called unary coding, subsequently encoding sign bits for non-zero magnitudes. Unary coding is a simple VLC in which every non-negative number N is mapped to N zeroes followed by a 1, as shown in Fig. 10.
Unary encoding is actually optimal for a Laplacian probability distribution where the probability of N occurring is 2^−(1+N). (This is because the length of each symbol is N+1 bits, which is precisely the optimal length −log2(p) predicted by information theory, where p is the probability of occurrence.) A different context, or set of contexts, can be defined for each bin or for a number of bins together, and the effect is to analyse the coefficient statistics in an iterative way. For example, the probability of zero occurring in Bin 1 is the probability that the coefficient magnitude is >0, whilst P(0 : Bin 2) is the probability that the magnitude is >1, and so on. The sparse statistics of the larger coefficients can be effectively combined by having a single context for all the larger bins, representing the tails of the distribution. The process will be further explained by example.
For example, we suppose one wished to encode the sequence: −3 0 1 0 −1. When binarized, the sequence to be encoded is:

0 0 0 1 0
1
0 1 1
1
0 1 0

The first 4 bits encode the magnitude, 3. The first bit is encoded using the statistics for Bin 1, the second using those for Bin 2 and so on. When a 1 is detected, the magnitude is decoded and a sign bit is expected. This is encoded using the sign context statistics; here it is 0 to signify a negative sign. The next bit must be a magnitude bit and is encoded using the Bin 1 contexts; since it is 1 the value is 0 and there is no need for a subsequent sign bit. The same principle is applied iteratively.
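The binarisation itself is easily sketched (Python, with the sign convention of the example above: 0 for negative, 1 for positive):

    def binarize(values):
        bits = []
        for v in values:
            bits.extend([0] * abs(v) + [1])   # unary magnitude: N zeroes then a 1
            if v != 0:
                bits.append(1 if v > 0 else 0)  # sign bit for non-zero magnitudes
        return bits

    # binarize([-3, 0, 1, 0, -1]) == [0,0,0,1,0, 1, 0,1,1, 1, 0,1,0]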
Context modelling

Suitable context modelling of this embodiment is based on the principle that small coefficients, particularly zero, are well predicted by their neighbours whilst large coefficients are not. Therefore the codec conditions the probabilities used by the arithmetic coder for coding bins 1 and 2 on the size of the neighbouring coefficients.
The reason for this approach is that, whereas prediction removes correlation between a coefficient and its neighbours, they may not be statistically independent even if they are uncorrelated: small and especially zero coefficients in wavelet subbands tend to clump together, located at points corresponding to smooth picture areas.
To compute the context, a value nhood_sum is calculated at each point (x,y) of each subband, as:

nhood_sum(x,y) = |c(x−1,y)| + |c(x,y−1)|

(NB: nhood_sum depends on the size of the neighbouring coefficients themselves, not on the neighbouring prediction residuals d(x,y) as defined above).
There are nine contexts used, which are as follows:

1. FRAME_SIGN_CTX - context for sign bits;
2. FRAME_BIN1a_CTX - context for Bin 1, 0 < nhood_sum < 3;
3. FRAME_BIN1b_CTX - context for Bin 1, nhood_sum ≥ 3;
4. FRAME_BIN1z_CTX - context for Bin 1, nhood_sum = 0;
5. FRAME_BIN2a_CTX - context for Bin 2, nhood_sum < 3;
6. FRAME_BIN2b_CTX - context for Bin 2, nhood_sum ≥ 3;
7. FRAME_BIN3_CTX - context for Bin 3;
8. FRAME_BIN4_CTX - context for Bin 4;
9. FRAME_BIN5plus_CTX - context for Bin 5 or more.

After binarization, a context is selected based on the value of nhood_sum and on the bin, and the probabilities for 0 and 1 that are maintained in the appropriate context will be fed to the arithmetic coding function along with the value itself to be coded.
In the example of the previous section, when coding the first value, −3, the encoder then checks the values of neighbouring coefficients and produces the value nhood_sum. Based on this value, a different statistical model (that is, a count of 1 and a count of zero) is used to code the first two bins. So the coder maintains, for example, the probabilities that Bin 1 is 0 or 1, given that the value of neighbouring coefficients is 0 - this is contained in FRAME_BIN1z_CTX.
These are fed to the arithmetic coding engine for encoding the bit in Bin 1, and the context probabilities are updated after encoding.
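A sketch of the context selection implied by this list (Python; a straightforward reading of the list above rather than the codec's actual code):

    def select_context(bin_number, nhood_sum):
        if bin_number == 1:
            if nhood_sum == 0:
                return 'FRAME_BIN1z_CTX'
            return 'FRAME_BIN1a_CTX' if nhood_sum < 3 else 'FRAME_BIN1b_CTX'
        if bin_number == 2:
            return 'FRAME_BIN2a_CTX' if nhood_sum < 3 else 'FRAME_BIN2b_CTX'
        if bin_number in (3, 4):
            return 'FRAME_BIN%d_CTX' % bin_number
        return 'FRAME_BIN5plus_CTX'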
Arithmetic coding

Examples of arithmetic coding are known, for example as disclosed in A. Moffat et al, 'Arithmetic coding revisited', ACM Transactions on Information Systems, 16(3), pp. 256-294, July 1998, so it will only be described briefly.
Conceptually, an arithmetic coder can be thought of as a progressive way of producing variable-length codes for entire sequences of symbols based on the probabilities of their constituent symbols. For example, if we know the probability of 0 and 1 in a binary sequence, we also know the probability of the sequence itself occurring. So if P(0)=0.2, P(1)=0.8 then P(11101111111011110111) = (0.2)^3 · (0.8)^17 = 1.8×10^−4 (assuming independent occurrences).
Information theory then says that optimal entropy coding of this sequence requires log2(1/P) = 12.4 bits. Arithmetic coding produces a code-word very close to this optimal length, and implementations can do so progressively, outputting bits when possible as more arrive. Using a look-up table VLC for 20-bit sequences is, by contrast, less practical, requiring several Mbytes storage.
VLCs for long sequences do exist, and depend on formulae for generating codewords on the fly. An example is the exp-Golomb codes used for the DC coefficient coding in Intra frames, described below. Adaptive VLCs also exist, based on a parameterised statistical model. In both cases, efficiency is only very high if the model used is accurate. Arithmetic coding can adapt to any statistics.
All arithmetic coding (AC) requires is estimates of the probabilities of symbols as they occur, and this is where context modelling fits in. Since AC can, in effect, assign a fractional number of bits to a symbol, it is very efficient for coding symbols with probabilities very close to 1, without the additional complication of run-length coding. The aim of context modelling is, in fact, to use information about the symbol stream to be encoded to produce accurate probabilities as close to 1 as possible. This is illustrated by example, considering a bit stream with P(0) = 0.5 = P(1). If we knew that the bit stream was actually composed of two bit streams (perhaps alternated), in one of which P(0)=0.25 and P(1)=0.75 and in the other P(1)=0.25 and P(0)=0.75, then we could feed these more refined probabilities to the arithmetic coder as appropriate. The result would be a bit rate saving of 19%.
The estimates are computed for each context simply by counting their occurrences. In order for the decoder to be in the same state as the encoder, these statistics cannot be updated until after a binary symbol has been encoded. This means that the contexts must be initialised with a count for both 0 and 1, which is used for encoding the first symbol in that context.
An additional source of redundancy lies in the local nature of the statistics. If the contexts are not refreshed periodically then later data has less influence in shaping the statistics than earlier data, resulting in bias, and local statistics are not exploited. A simple way to refresh the contexts is to pick a number, preferably at least about 1000 and preferably no more than about 10,000 (8092 is used in the codec), so that if the count of 0 plus the count of 1 for that context exceeds that number then these counts are halved. The effect is to maintain the probabilities to a reasonable level of accuracy, but to keep the influence of all coefficients roughly constant.
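The per-context counting and refresh might be sketched as follows (Python; the floor keeping each count at least 1 after halving is an added safeguard, not something stated above):

    class BinaryContext:
        def __init__(self, refresh_limit=8092):
            # initialised with a count for both 0 and 1 so that the first
            # symbol in the context can be encoded
            self.count = [1, 1]
            self.refresh_limit = refresh_limit

        def prob_zero(self):
            return self.count[0] / (self.count[0] + self.count[1])

        def update(self, bit):
            self.count[bit] += 1
            if self.count[0] + self.count[1] > self.refresh_limit:
                # halve the counts to keep the statistics local
                self.count[0] = max(1, self.count[0] // 2)
                self.count[1] = max(1, self.count[1] // 2)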
Motion estimation and motion compensation

Motion estimation will now be described.
GOP structures

Three types of frames are defined in the codec. Intra frames (I frames) are coded without reference to other frames in the sequence, whereas inter frames are predicted by motion estimation from other frames. Level 1 frames (L1 frames) are inter frames which are also used as temporal references for other frames. The embodiment provides for L1 frames to be either P frames (i.e. only forward predicted), or B frames (bi-directionally predicted). Level 2 frames (L2 frames) are conventional B frames which are not used as a temporal reference.
It is possible in either case to have more than one reference and indeed in the embodiment the references for P/L1 frames are (in temporal order) the previous I frame and the preceding L1 frame, if one exists (see Figure 11).
A practical embodiment may allow L1 frames to be B-frames. This might be useful in some applications, such as streaming, perhaps, where the key issue might not be the overall latency (which must always be at least one GOP) but the frequency of Intra frame updates. The references in this case are the (temporally) prior L1 frame and the succeeding I-frame, or the preceding and succeeding I-frames, if the first L1 frame is being coded (Figure 12). In certain cases, the second configuration may confer advantages.
Overlapped Block-based Motion Compensation

Motion compensation in the codec uses Overlapped Block-based Motion Compensation (OBMC) to avoid block-edge artefacts which would be expensive to code using wavelets; this is a known technique, disclosed in G. Heising, D. Marpe, H. Cycon and A. Petukhov, 'Wavelet-based very low bit-rate video coding using image warping and overlapped block motion compensation', IEE Proceedings Vision, Image and Signal Processing, vol. 148, no. 2, pp. 93-101, April 2001. The size of blocks can be varied with a desired degree of overlap selected: this is configurable within the codec. Although the codec need not be designed to be scalable, in the embodiment the size of blocks is the only non-scalable feature, and for lower resolution pictures, smaller blocks can easily be selected.
The OBMC scheme adopted in the embodiment is based on a separable Raised-Cosine mask. This acts as a weight function on the predicting block.
Given a pixel p=p(x,y,t) in frame t, p may fall within only one block or in up to four if it lies at the corner of a block (see Figure 13).
Each block that the pixel p is part of has a predicting block within the reference frame selected by motion estimation. The predictor p̂ for p is the weighted sum of all the corresponding pixels in the predicting blocks in frame t′, given by p(x−v_i, y−w_i, t′) for motion vectors (v_i, w_i). The Raised-Cosine mask has the necessary property that the sum of the weights will always be 1:

p̂(x,y,t) = Σ_i W_i · p(x − v_i, y − w_i, t′),  Σ_i W_i = 1

Although this may seem prima facie complicated, in a practical implementation the additional complexity is to apply the weighting mask to a predicting block before subtracting it from the picture, and this is feasible.
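A sketch of a separable raised-cosine weight mask (Python; the geometry - a flat interior with half-cosine ramps of length 'overlap' at each side, adjacent blocks overlapping by exactly the ramp length - is one plausible reading, with illustrative parameter names):

    import math

    def raised_cosine_ramp(overlap):
        # symmetric half-cosine ramp: ramp[i] + ramp[overlap - 1 - i] == 1,
        # so overlapping masks from adjacent blocks sum to 1
        return [0.5 * (1 - math.cos(math.pi * (i + 0.5) / overlap))
                for i in range(overlap)]

    def mask_2d(block_len, overlap):
        # requires block_len >= 2 * overlap
        ramp = raised_cosine_ramp(overlap)
        m = ramp + [1.0] * (block_len - 2 * overlap) + ramp[::-1]
        return [[a * b for b in m] for a in m]   # separable product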
Motion estimation

Motion estimation will now be described, beginning with hierarchical motion estimation.
Hierarchical motion estimation

The embodiment of the codec supports motion estimation to 1/8 pixel accuracy.
Motion estimation (ME) is by far the single largest computational load in encoding and in many cases it is impractical to perform brute-force motion vector searches over reasonable areas, especially for HD material. Various shortcuts are desirable and the technique adopted herein is a Rate-Distortion Optimised hierarchical search method.
In this method, integral pixel accuracy is produced by hierarchical ME, and then refined to sub-pixel accuracy. The hierarchical approach repeatedly downconverts both the current and the reference frame by a factor of two in both dimensions, four times in all. Motion vectors are estimated for blocks at each level and used as a guide for the next higher resolution. The block size remains constant (and the blocks will still overlap at all resolutions) so that at each level there are only a quarter as many blocks and each block has 4 children at the next higher resolution. The lower-resolution block's motion vector is then used as a guide vector to be refined by the children at the next highest level of resolution. At each resolution, block matching proceeds by searching in a small range around the guide vector for the best match using the RDO metric (which is described below).
Downconversion and search ranges in hierarchical ME

The hierarchical approach dramatically reduces the computational effort involved in motion estimation for an equivalent search range. However it has two significant risks that need to be mitigated in an implementation.
The first risk is that when there is a variety of motions in a sequence, the hierarchical method matches the motion to that of the largest objects at an early stage, and it is then impossible to escape to better matches at later stages because the search ranges are too small. To mitigate this, the codec does two things: firstly, it always searches around the zero vector (0,0) as well as around the guide vector - this allows it to track fast and slow-moving objects; secondly, it always searches a range of +/-2 pixels at each level - this means that an error of 1 pixel at the next lowest resolution can always be corrected.
The second risk is less apparent, but we have found it can still be a problem with very highly textured sequences. This is that the aliasing resulting from repeated downconversion with short (e.g. 1/2, 1/2) filters renders motion estimation difficult at lower levels, and can produce poor guide vectors. This is remedied in the codec by using a non-separable filter of the form:

      1 1
    1 2 2 1
    1 2 2 1
      1 1

(normalised by the sum of its coefficients), which reduces aliasing.
Sub-pixel refinement and upconversion

Sub-pixel refinement also operates hierarchically. Once pixel-accurate motion vectors have been determined, the reference picture is upconverted using a Nyquist windowed-sinc function filter. Each block has an associated vector (V0, W0) where V0 and W0 are multiples of 8 with respect to the upconverted reference. 1/2-pel accurate vectors are found by finding the best match out of (V0, W0) and its 8 neighbours:

(V0+4, W0+4), (V0, W0+4), (V0−4, W0+4), (V0+4, W0), (V0−4, W0), (V0+4, W0−4), (V0, W0−4), (V0−4, W0−4)

This in turn produces a new best vector (V1, W1), which provides a guide for 1/4-pel refinement, and so on.
The process is illustrated in Figure 14.
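One way to sketch the refinement loop (Python; block_cost is a hypothetical stand-in for the block-matching metric of the next section, and vectors are in units of 1/8 pel so that the step sizes 4, 2, 1 correspond to 1/2, 1/4 and 1/8 pel):

    def refine_vector(v0, w0, block_cost):
        best = (v0, w0)
        for step in (4, 2, 1):
            candidates = [(best[0] + dv, best[1] + dw)
                          for dv in (-step, 0, step)
                          for dw in (-step, 0, step)]
            best = min(candidates, key=lambda c: block_cost(*c))
        return best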
RDO motion estimation metric

The performance of motion estimation and motion-vector coding is critical to the performance of a practical video coding scheme, as motion vectors can often comprise the majority of the information used in the encoding of inter frames, particularly B-frames. With motion vectors at 1/4 or 1/8th pixel accuracy, a simple-minded strategy of finding the best match between frames can greatly inflate the resulting bitrate for little or no gain in quality. What is required is the ability to trade off the vector bitrate with prediction accuracy, and hence the bit rate required to code the residual picture and the eventual quality of that picture.
A simple but effective way we have found to do this is to incorporate a smoothing factor into the metric used for matching blocks. So the metric consists of a basic block matching metric, plus some constant times a measure of the local motion vector smoothness.
The basic block matching metric used is Sum of Absolute Differences with DC removal (DCSAD). Given two blocks X, Y of samples, this is given by:

DCSAD(X, Y) = Σ_{i,j} |x_{i,j} − y_{i,j} − DC(X, Y)|

where DC(X, Y) = (1/N)·Σ_{i,j} (x_{i,j} − y_{i,j}) is the average of the differences, N being the number of samples in a block.
DC-removal helps maintain motion vector accuracy in fades and lighting changes.
The smoothness measure used is the local variance between the selected motion vector and previously computed motion vectors. So if the blocks are estimated in raster-scan order then vectors for blocks to the left and above are available for calculating the local variance, as illustrated in Fig. 15.
The local variance is defined, if the vectors of the neighbouring blocks are V_j, by the recipe:

VAR(V) = Σ_{j∈N} ||V − V_j||, where ||W|| = W·W (inner product)

The total metric is a combination of these two metrics. Given a vector V which maps the current picture block X to a block Y = V(X) in the reference frame, the metric is given by:

M(X, Y) = DCSAD(X, Y) + λ·VAR(V)

The value λ is a coding parameter used to control the tradeoff between the smoothness of the motion vector field and the accuracy of the match. When λ is very large, the local variance dominates the calculation and the motion vector which gives the smallest metric is simply that which is closest to its neighbours.
When λ is very small, the metric is dominated by the SAD term, and so the best vector will simply be that which gives the best match for that block. For values in between, varying degrees of smoothness can be achieved.
The parameter λ can be set at the command-line of the encoder, but if omitted it is calculated as a multiple (currently 0.1) of the Lagrangian rate-control parameters for the L1 and L2 frames (see above), so that if the inter frames are compressed more heavily then smoother motion vector fields are derived.
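The metric is easily sketched (Python; blocks are flat lists of co-sited samples, and neighbour_vectors are the previously computed vectors of Figure 15):

    def dc_sad(block_x, block_y):
        diffs = [a - b for a, b in zip(block_x, block_y)]
        dc = sum(diffs) / len(diffs)            # average of the differences
        return sum(abs(d - dc) for d in diffs)

    def rdo_metric(block_x, block_y, v, neighbour_vectors, lam):
        # local variance of the candidate vector against its neighbours,
        # using the squared-norm (inner product) measure defined above
        var = sum((v[0] - u[0]) ** 2 + (v[1] - u[1]) ** 2
                  for u in neighbour_vectors)
        return dc_sad(block_x, block_y) + lam * var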
Macroblock structures and motion vector data

This section describes the macroblock structures which are used to introduce a degree of adaption into motion estimation by allowing the size of the blocks used to vary. The motion estimation stage of the encoding is organised by macroblock, and each combination of block size and prediction mode is tried using the RDO block-matching metric, and the best solution adopted macroblock by macroblock.
Macroblocks and variable-sized block matching

A macroblock consists of a 4x4 array of blocks, and there are three possible ways of splitting a MB, which are encoded in the MB variable MB_SPLIT:

MB_SPLIT=0: no split, a single MV per reference frame for the MB;
MB_SPLIT=1: split into four sub-macroblocks (sub-MBs), each a 2x2 array of blocks, one MV per reference frame per sub-MB;
MB_SPLIT=2: split into the 16 constituent blocks.
It is noted that, while still somewhat complex, the structures defined herein are considerably simpler than the 7 possible MB splittings, together with prediction from any two of up to 5 reference frames, and several special forms of motion vector prediction conventionally proposed in standard H.264.
The splitting mode is chosen by redoing motion estimation for the sub-MBs and the MB as a whole, again using the RDO metric described in the previous section, suitably scaled to take into account the different sizes of the blocks. At the same time, the best prediction mode (PREDMODE) is chosen. Four prediction modes are available for each prediction unit (block, sub-MB or MB):

MODE_NO_PRED: no prediction - intra coded;
MODE_REF1_ONLY: only predict from the first reference;
MODE_REF2_ONLY: only predict from the second reference (if one exists);
MODE_REF1AND2: bidirectional prediction.
So a different prediction mode can be chosen for each prediction unit of the MB. However, mode data itself incurs a cost in bit-rate. So a further MB parameter is defined, MB_CBMODE, which records whether a common block prediction mode is to be used for the MB:

MB_CBMODE=1: the same prediction mode is to be used for each prediction unit within the MB;
MB_CBMODE=0: the prediction modes of each block or sub-MB are different.
For example, if MB_SPLIT=1 then the MB is split into 4 sub-MBs, each itself combining four basic blocks. If MB_CBMODE=1 then if one of those sub-MBs has PREDMODE equal to MODE_REF1_ONLY, then all the sub-MBs have this prediction mode. Of course if MB_SPLIT is 0, then this is true by default - in this case MB_CBMODE need not be coded.
The result is a hierarchy of parameters: MB_SPLIT determines whether MB_CBMODE needs to be coded for a given macroblock; the MB parameters together determine which prediction units need to encode PREDMODE; and PREDMODE itself determines what motion vectors need to be present.
In motion estimation, an overall cost for each MB is computed, and compared for each legal combination of MB_SPLIT, MB_CBMODE and value of PREDMODE. Then the best combination is selected for coding.
Block data

Parameters other than MB_SPLIT and MB_CBMODE are termed block data, even though they may apply to blocks, sub-MBs or the MB itself depending on the value of the MB data. PREDMODE has already been described. The four remaining block parameters are:

REF1_x: horizontal component of the motion vector to the first reference frame;
REF1_y: vertical component of the motion vector to the first reference frame;
REF2_x: horizontal component of the motion vector to the second reference frame;
REF2_y: vertical component of the motion vector to the second reference frame.
Clearly not all of these values need be coded. If PREDMODE is MODE_REF1_ONLY then REF2_x and REF2_y will not be coded, for example.
Motion vector coding

Motion vector (MV) coding is important to the performance of video coding, especially for codecs with a high level of MV accuracy (1/4 or 1/8 pel). For this reason, MV coding and decoding is probably the most complicated part of the wavelet codec, since significant gains in efficiency can be made by choosing a good prediction and entropy coding structure. The basic format of the MV coding module is similar to the coding of quantised coefficients: it consists of prediction, followed by binarisation, context modelling and adaptive arithmetic coding (Figure 17).
Median prediction of motion vector data

Some of the conventions used herein will first be explained.
All the motion vector data is predicted from previously encoded data from nearest neighbours using a form of median prediction, which is described below. The two parameters MB_SPLIT and MB_CBMODE are deemed to comprise the MB data - all other parameters, even though (depending on the MB parameters) they may refer to blocks, sub-MBs or MBs, are deemed to be associated with particular blocks, and are termed block data. This allows for a consistent prediction and coding structure to be adopted.
Example. If MB_SPLIT=1 and MB_CBMODE=0 then the prediction units in a MB are sub-MBs. Nevertheless, the prediction mode and any motion vectors are associated with the top-left block of each sub-MB and values need not be coded for other blocks in the sub-MB.
Example. If MB_SPLIT=2 but MB_CBMODE=1 then the block parameter PREDMODE need only be coded for the top-left block in the MB. Motion vectors need to be coded for every block in the MB if PREDMODE is not equal to MODE_NO_PRED.
The second convention is that all MB data is scanned in raster order for encoding purposes. All block data is scanned first by MB in raster order, and then in raster order within each MB. That is, taking each MB in raster order, each block value which needs to be coded within that MB is coded in raster order (see Figure 19).
The third convention concerns the availability of values for prediction purposes when they may not be coded for every block. Since prediction will be based on neighbouring values, it is necessary to propagate values for the purposes of prediction when the MB modes have conspired to ensure that values are not required for every block.
Example. In Figure 20 below, we can see the effect of this. Suppose we are coding REF1_x. In the first MB, MB_SPLIT=0 and so at most only the top-left block needs a value, which can be predicted from values in previously coded MBs. As it happens, PREDMODE=MODE_REF1_ONLY and so a value is coded. The value v is then deemed to be applied to every block in the MB. In the next MB, MB_SPLIT=1 and MB_CBMODE=0, so the unit of prediction is the sub-MB. In the top-left sub-MB PREDMODE is, say, MODE_REF1AND2 and so a value x is coded for the top-left block of that sub-MB. It can be predicted from any available values in neighbouring blocks, and in particular the value v is available from the adjacent block.
Neighbour-weighted median prediction

Before entropy coding, the MV data is predicted in order to remove correlation, just as in Intra frame coding of wavelet subbands. In this case the prediction used is what I term neighbour-weighted median prediction (NWMP). This is median prediction from previously-coded neighbouring blocks/MBs, but with a bias towards those neighbours closest to the block being predicted. The aperture for the NWMP predictor is shown in Figure 21 below.
In many cases values are not available from all blocks in the aperture, for example if the prediction mode is different. In this case the blocks are merely excluded from consideration.
NWMP works by taking values associated with the nearest blocks, shown shaded in the diagram, and repeating them twice in the list of values from which the median is taken. The purpose of NWMP is that, in the event (for example) of a prediction mode change between a block and its neighbours, no predictor may be available from the nearest neighbours, and so more distant neighbours should be consulted. On the other hand, if the nearest neighbours' values are available then they should be given more weight. Hence a larger aperture with neighbour-weighting. NWMP is, however, complex, and empirical attempts to simplify it, while permissible in implementations, should consider carefully the effects on performance.
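The weighting-by-repetition at the heart of NWMP might be sketched as follows (Python; the aperture bookkeeping of Figure 21 is omitted, and returning 0 when no values are available is an assumption):

    import statistics

    def nwmp(nearest_values, further_values):
        # nearest neighbours are entered twice, biasing the median
        pool = nearest_values * 2 + further_values
        return statistics.median_low(pool) if pool else 0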
In the case of the MB data, the number of possible values is only 3 in the case of MB_SPLIT and 2 in the case of MB_CBMODE. The prediction therefore can use modulo arithmetic, and produces an unsigned prediction residue of 0, 1 or 2 in the first case and 0 or 1 in the second. All other predictions produce signed prediction residues. There are also only four prediction modes, so prediction modulo 4 could be used for PREDMODE; however, this number could increase if more potential reference frames are allowed. Median prediction for MB data and for PREDMODE may not seem prima facie intrinsically sensible, as there is no concept of order. However, in practice it actually works reasonably well, and this data comprises a small proportion of the MV data as a whole, so a unified approach has been maintained.
Entropy coding Entropy coding of the MV prediction residuals uses the same basic architecture as for wavelet coefficient coding: unary VLC binarization, followed by adaptive arithmetic coding with multiple context models. For MV coding there are many different types of data, and these have their own context models.
There are 24 motion vector data contexts in total. However, there are many components of the MV data, and the majority are not in play at any one time.
The contexts for MB_SPLIT residues are:

MB_CTX_SPLIT_BIN1: context for Bin 1 of the unary binarization;
MB_CTX_SPLIT_BIN2: context for Bin 2;
MB_CTX_SPLIT_BIN3: context for Bin 3.
There is only one context required for MB_CBMODE, since the value of residues is either 0 or 1:
MB_CTX_CBMODE
PREDMODE residues are more numerous - there are five:

MODE_CTX_BIN1: context for Bin 1 of the magnitude bits;
MODE_CTX_BIN2: context for Bin 2 of the magnitude bits;
MODE_CTX_BIN3: context for Bin 3 of the magnitude bits;
MODE_CTX_BIN4plus: context for the remaining bins of the magnitude bits;
MODE_CTX_SIGN: context for sign bits.
The remaining data is the actual motion vector values: REF1_x, REF1_y, REF2_x, REF2_y. As for wavelet coefficient coding, these values are contextualized on the basis of the size of their neighbours, although in this case it is the size of the neighbouring prediction residuals not the neighbouring values themselves. For the purposes of context modelling, residue values are assumed to be propagated in just the same way that the values themselves are - so after prediction, the propagated values corresponding to Figure 20 are illustrated in Figure 22 below.
If the neighbouring residues r(x−1,y) and r(x,y−1) are available then a value nhood_residue_size can be defined by:

nhood_residue_size(x,y) = |r(x−1,y)| + |r(x,y−1)|

If only one of these values is available, then nhood_residue_size is equal to twice its magnitude. Only Bin 1 is contextualised on nhood_residue_size - the contexts for horizontal vector components are (vertical components are similar):

VEC_CTX_X_BIN1a: Bin 1 context for magnitude bits, nhood_residue_size < 3;
VEC_CTX_X_BIN1b: Bin 1 context for magnitude bits, 3 ≤ nhood_residue_size ≤ 15;
VEC_CTX_X_BIN1c: Bin 1 context for magnitude bits, nhood_residue_size > 15 or r(x−1,y) and r(x,y−1) are not available.
The remaining horizontal vector contexts are:

VEC_CTX_X_BIN2: Bin 2 context for magnitude bits;
VEC_CTX_X_BIN3: Bin 3 context for magnitude bits;
VEC_CTX_X_BIN4: Bin 4 context for magnitude bits;
VEC_CTX_X_BIN5plus: context for the remaining bins of the magnitude bits;
VEC_CTX_X_SIGN: context for the sign bits.
The resulting compression is quite powerful, but rather complex; ideally, the number of contexts should be reduced.
DC coefficient coding

If good predictions are not available, then a block is deemed to be intra coded, and no subtraction is performed for that block in the motion compensation process. Intra blocks have quite different characteristics from other areas of the inter frame. In particular, their mean will be non-zero, whereas the mean value in other areas will be close to zero. Although the block overlaps will soften the edges of these areas, they will still be coded relatively inefficiently because of the transition in DC value. To deal with this, the codec subtracts the local DC from these areas and codes it separately.
The DC value is computed in the coder from the un-compensated original frame, although the coder is of course free to compute the value in any way. It is subtracted from the inter frame using the same raised-cosine windowing: intra blocks can therefore be considered as not being intra at all, but as predicted by weighted 'constant blocks'. Since PREDMODE can apply to blocks, sub-MBs or MBs themselves, depending on MB parameters, the same applies to DC coefficients. Hence DC removal is applied for variable-sized blocks and DC values are coded according to the MB types imposed by the MB parameters.
DC coding operates on the principle established herein of prediction followed by entropy coding. However, both prediction and entropy coding are simpler in this case.
As for MV data encoding, the DC coefficients are scanned by MB in raster order and in raster order within each MB, depending on the MB data and prediction modes. DC coefficients are predicted by some value dependent only on the previous DC value to be encoded. In particular, there is no attempt as yet at neighbourhood prediction. Instead, the prediction residue for DC(n), the nth DC coefficient so far coded, is given by:

R(n) = DC(n) − d·DC(n−1)

for some value d.
If the block coordinates of DC(n) are designated (x(n), y(n)) then the factor d is given by:

d = 8 / Max(|x(n) − x(n−1)| + |y(n) − y(n−1)|, 8)

so that the weight given to the previous DC value falls off with the distance between the blocks. The purpose of this arrangement is that DC blocks are generally very sparse.
There is no point in having a very complicated prediction structure from neighbours that don't, for the most part, have coded DC values anyway. The value of a prediction also recedes with distance. If the last DC block coded was a long way away then using it as a prediction will increase bit rate, not decrease it.
Entropy coding uses the exp-Golomb VLC for the magnitude bits, which is as shown in Fig. 23.
The rule is that there is a length prefix of N zeroes followed by a 1. Then there is an N-bit binary number b. If N > 0, the number represented is 2^N − 1 + b. Sign bits are uncoded.
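The code and its inverse follow directly from this rule (a Python sketch):

    def exp_golomb_encode(value):
        n = (value + 1).bit_length() - 1      # largest n with 2^n - 1 <= value
        b = value - (2 ** n - 1)              # n-bit remainder
        return [0] * n + [1] + [(b >> (n - 1 - i)) & 1 for i in range(n)]

    def exp_golomb_decode(bits):
        n = bits.index(1)                     # count the leading zeroes
        b = 0
        for bit in bits[n + 1: 2 * n + 1]:
            b = (b << 1) | bit
        return (2 ** n - 1) + b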
The method of DC coefficient coding may be empirically adjusted to improve its performance. However, the basic idea is effective and should give some gains on the most difficult sequences, where motion estimation fails.
Further developments

The following sections briefly sketch some further ideas in video coding that can be used to develop the above wavelet coding architecture.
Non-linear wavelets

Wavelet transforms can be understood in the context of a device called a lifting scheme. In the lifting scheme, a discrete signal is decomposed into two parts in three stages, shown in Fig. 24. A splitting stage does an initial reversible split, which could (for example) just be a split into alternate samples. A predict stage then uses one (the lower in the figure) of the resulting streams to predict the other, and this is then replaced by the difference between the sample and its predictor. Using this new sequence the other stream can then be modified by an update function. The result, provided that the original splitting process is reversible by some merge function, is itself clearly reversible by performing these operations in reverse order with addition replaced by subtraction.
If the split, predict and update processes are linear, then the result is a perfect-reconstruction filter bank, and hence wavelets. In fact, any orthogonal or biorthogonal wavelet transform can be constructed by iterating lifting schemes, optionally with additional gain multipliers and reversing the order of predict and update.
Example. Suppose the split function simply splits the signal x_n into even and odd samples, x_{2n} and x_{2n+1}. Suppose also that the predict operator is the identity P(z_n) = z_n, and the update operator simply divides by 2: U(z_n) = z_n/2. Then the resulting transform is (up to a gain factor in each stream) equal to the Haar wavelet filter bank (1/2, 1/2) and (1/2, −1/2).
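This example can be written directly as lifting code (a Python sketch, assuming an even-length signal):

    def haar_lift(signal):
        even, odd = signal[0::2], signal[1::2]
        detail = [o - e for o, e in zip(odd, even)]          # predict: identity
        smooth = [e + d / 2 for e, d in zip(even, detail)]   # update: half the residual
        return smooth, detail

    def haar_unlift(smooth, detail):
        even = [s - d / 2 for s, d in zip(smooth, detail)]   # undo update
        odd = [d + e for d, e in zip(detail, even)]          # undo predict
        out = []
        for e, o in zip(even, odd):
            out.extend([e, o])                               # merge
        return out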
Very complicated transforms can be built up from simple stages very easily, since each Split function can be iteratively composed of a whole lifting stage.
The update and predict operators need not be linear. If the initial split is a linear wavelet splitting, then the predict and update functions can be used to correct for the less welcome characteristics of the wavelet filters. This opens the way to non-linear wavelets.
On natural images, wavelet filters must perform a compromise. For relatively smooth areas of the picture, a large number of vanishing moments is desirable so that the signal can be represented compactly as being approximately polynomial. This requires a long filter, however. At edges within the image, long filters are a disadvantage and cause ringing when combined with quantisation. So it makes sense to vary the length of the filter according to the signal characteristics so that both smooth areas and edges can be represented compactly. Possible advantageous implementations include:

1) A splitting function derived from a short-length wavelet filter bank. So far experiments have been confined to (1/2, 1/2), (1/2, −1/2), although slightly longer filters may improve iterability.
2) An edge detector for switching prediction modes, operating on the lowpass filtered and subsampled signal.
3) A predict operator based on high-order polynomial interpolation away from the corners of detected edges, leaving values near the corners (where ringing occurs) unchanged.
There is no update operator in this scheme.
The effect is that information about the high-pass filtered subband is gleaned from the low-pass filtered subband by investigating edges, which are compact artefacts spatially but which spread energy over a wide range of frequencies.
This is then used to provide additional prediction of the high-pass band to reduce the number of coefficients that need coding.
In this case the predict operator is only non-linear by virtue of being switchable.
Median or stack filters might also prove useful.
Coding interlaced pictures using wavelets

Wavelets are generally used in video coding to transform the whole picture: although it is possible to use them as local block transforms, their coding efficiency tends to be reduced. However, in interlaced video, this appears to limit the application of the wavelet transform to coding either whole frames or whole fields. As a first attempt at interlaced video coding, coding frames or fields might be quite effective. However, interlace artefacts (vertical-temporal aliasing) arise through motion, and often local motion at that. We have therefore considered a motion-compensated wavelet decomposition.
The idea is that motion compensation is applied to the first stage of wavelet decomposition only, and then only in the vertical direction. MC is used to shift lines, or parts of lines, from the two fields to compensate for the motion of objects so that vertical filtering can then be applied. This process is illustrated in Figure 25.
This principle, although particularly helpful for interlace pictures may be applied to pictures which have been partitioned in other ways than conventional interlace splitting. The portions may correspond to different spatial subsets (e.g. alternate horizontal or vertical lines, chequerboard patterns etc) and the different portions may correspond to different times or the same time. Where the portions correspond to different times, motion vectors may be estimated but in other cases a simple displacement vector between matching portions which does not correspond to motion but simply to areas of correlation may be estimated. It will be appreciated that there are numerous possibilities for employing the technique; the embodiment will be described in the advantageous context of application to interlace however.
We note that it is known to perform motion-compensated filtering using perfect reconstruction filter banks along the temporal axis [S.-J. Choi and J. Woods, 'Motion-compensated 3-D subband coding of video', IEEE Trans. Image Processing, Vol 8 No 2, Feb 1999]. However, this is complex, whereas in the present embodiment we provide a motion-compensated vertical filter within the same frame; this feature may be provided independently.
The filtering process works as follows. Field 1 is used as a temporal reference for motion estimation of Field 2, but the only motion vectors considered are horizontal vectors shifting lines (or parts of lines) in Field 1 relative to the corresponding line in Field 2. Suppose that we consider the lines of the frame numbered in consecutive order from 0 with Field 1 and Field 2 lines alternating; suppose also for convenience that the top line is from Field 1, and so all Field 1 lines are even and all Field 2 lines odd (other orderings and numberings will be very similar in effect). Each sample x_{m,2n+1} on line 2n+1 of Field 2 is therefore supplied with a predictor x_{m+v(m,2n+1),2n} from Field 1. The decomposition high-pass output is:

(x_{m,2n+1} − x_{m+v(m,2n+1),2n})/2

The low-pass filter is slightly more complicated, since each pixel in Field 1 may predict none (if the pixel is concealed), one, or more than one of the pixels in
Field 2. The rule is that:
i) if x_{m,2n} predicts no pixels in Field 2 then the low-pass output is x_{m,2n};

ii) if x_{m,2n} predicts one or more pixels then the output is one half of x_{m,2n} plus half the average (possibly weighted) of all of those predicted pixels. So in the case that x_{m,2n} predicts just one pixel x_{p,2n+1}, the output is:

(x_{p,2n+1} + x_{m,2n})/2

If it predicts two pixels, x_{p,2n+1} and x_{q,2n+1}, then the output is:

((x_{p,2n+1} + x_{q,2n+1})/2 + x_{m,2n})/2

(Weighted averages might be used in the case that overlapped blocks were used for motion estimation and compensation.) To recover the original values, one reverses this process. If each point in Field 1 predicts at most one point in Field 2, this is simple, but it is a little more complex if it predicts more than one. Suppose that both x_{p,2n+1} and x_{q,2n+1} are predicted by x_{m,2n}. Then the high-pass outputs for (p,2n+1) and (q,2n+1) are:

(x_{p,2n+1} − x_{m,2n})/2 and (x_{q,2n+1} − x_{m,2n})/2

respectively.
The low-pass output for (m,2n) is ((x_{p,2n+1} + x_{q,2n+1})/2 + x_{m,2n})/2, and adding this to the average of the two high-pass outputs yields:

((x_{p,2n+1} + x_{q,2n+1})/2 + x_{m,2n})/2 + ((x_{p,2n+1} − x_{m,2n})/2 + (x_{q,2n+1} − x_{m,2n})/2)/2

which is (x_{p,2n+1} + x_{q,2n+1})/2. From this x_{m,2n}, and so x_{p,2n+1} and x_{q,2n+1}, can be determined.
The case of more than one pixel in Field 2 being predicted by a pixel in Field 1 can be avoided by having only a single motion vector per line, and this would be the simplest approach (which is surprisingly effective) for an initial implementation.
After this initial vertical decomposition, the rest of the wavelet decomposition can be performed without any further motion compensation.
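The simplest case - a single horizontal vector per line, so that each Field 1 sample predicts exactly one Field 2 sample - might be sketched as follows (Python; it assumes all shifted indices stay within the line, so real code would need the edge and multiple-prediction handling described above):

    def mc_vertical_haar(field1, field2, line_vectors):
        """field1, field2: lists of lines (lists of samples) of equal size;
        line_vectors: one horizontal shift per line pair."""
        low, high = [], []
        for f1, f2, v in zip(field1, field2, line_vectors):
            width = len(f1)
            # high-pass: Field 2 sample minus its shifted Field 1 predictor
            high.append([(f2[m] - f1[m + v]) / 2 for m in range(width)])
            # low-pass: Field 1 sample averaged with the one sample it predicts
            low.append([(f2[p - v] + f1[p]) / 2 for p in range(width)])
        return low, high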
Developments

The basic principle can be modified in several ways. For a start, longer filters could be used. The description given above is for motion-compensated (1/2, 1/2) and (1/2, −1/2) filters. However, any pair of half-band perfect reconstruction filters could be used. Suppose that g(k) and h(k) were such a pair, with g(k) the low-pass and h(k) the high-pass filter. The result of vertically filtering a picture by h(k) using motion compensation is given by using the real sample for each sample in Field 2 and using a motion-compensated sample for samples in Field 1. This means that the output at line 2n+1 is:

x′_{r,2n+1} = (h *_mc x)_{r,2n+1} = Σ_k h(2k)·x_{r,2n+1−2k} + Σ_k h(2k+1)·x_{r+v(r,2n−2k),2n−2k}    (*)

The low-pass filter could be constructed in a similar manner to that described above:

x′_{r,2n} = (g *_mc x)_{r,2n} = Σ_k g(2k)·x_{r,2n−2k} + Σ_k g(2k+1)·x̄_{r,2n−2k+1}

where x̄_{p,2q+1} denotes the average (possibly weighted) of all those samples x_{v,2q+1} in Field 2 whose motion-compensated predictor in Field 1 is x_{p,2q}. Aside from longer filters, vertical motion compensation could also be incorporated, and the formulae given can be generalized to this case: for example, (*) would be modified to be:

x′_{r,2n+1} = Σ_k h(2k)·x_{r,2n+1−2k} + Σ_k h(2k+1)·x_{r+v1(r,2n−2k),2n−2k+v2(r,2n−2k)}

where (v1, v2) is a two-component motion vector. The idea is not limited to the particular description given: other methods of performing motion-compensated perfect reconstruction filtering could be used.
Also, the roles of Field 1 and Field 2 can be interchanged in the previous description, as can the roles of even and odd lines. Although conventional interlaced video invariably has two fields, the techniques can be applied to more than two fields per frame were this considered desirable.
Coding of inter-frame residuals

Most frames in any video coding scheme are likely to be inter frames, and most of the bit rate is due to them also, given a reasonably long GOP. Their efficient coding is therefore very important, and in the current scheme there is probably more scope to reduce their bitrate than that of other components of the bitstream.
Inter frames can be very variable in character. When prediction is good, the predicted data consists just of low-level noise. With 1/4 or 1/8 pel accuracy, this noise is at a very low level indeed, and will almost certainly be quantised away.
Areas that are predicted poorly can be of two basic types: revealed/concealed areas and non-translational motion areas. It is likely that a good motion estimator will code the former areas as intra, so they will be picture-like in character. The latter areas may sometimes be matched with similar areas and sometimes not. If predicted, the residual can look quite random, although often very 'edgy' and certainly not white-noise-like.
The wavelet transform still does a good job of decorrelating these poorly predicted areas. However, the appropriate quantisation level in different parts of the picture may be quite different. In a basic but effective implementation, the coder defines a single quantiser for each subband. More quantisers could be defined by splitting each subband into blocks and this would be an advantageous next step.
A more complicated but potentially adaptive idea uses the parent-child relationship that wavelets can introduce. As can be seen from Figure 26 below, areas which are not well predicted tend to produce significant coefficients across all subbands, whereas areas that are well predicted tend to produce low-level, independent noise across all subbands.
The large coefficients tend to be organised in a parent-child arrangement, whereby the pixels corresponding to the same picture area with the same orientation (LH,HL or HH) are related, as illustrated in Figure 27.
Although parent and child pixels are likely to be decorrelated, they are not statistically independent, and it seems that the magnitude of a parent and its children are related. One way of exploiting this is to partition the coefficients in each subband (other than the DC band) according to size. This partition can then be inherited by the child subband via the parent-child relationship, and a different quantiser assigned to each partition. The important point is that if the partitioning process is performed on quantised coefficients in the parent subband, then it can be performed by the decoder in an identical fashion. There is then no need to signal the shape of the different partitions of the subbands to the decoder, and the only overhead, therefore, is signalling the additional quantisers.
Implicit classification and quantisation

As described above, the parent-child relationship available in wavelet-transformed data can be used to partition data implicitly for quantisation. When quantising a subband, the data in the parent subband can be classified by whatever means is desired into a number of sets and this classification then inherited by the child subband. Each subset of the band can then be given a separate quantiser. If the parent band has been encoded before the child band then the classification can be performed identically at the decoder. A different quantiser can then be selected by the encoder for each defined subset of the child band, and transmitted to the decoder, with no need to transmit the classification of child subband pixels. There are some important considerations:

1) The classification need not, or not only, be used for defining quantisation classes, but could also be used for defining contexts for entropy coding.
If a given set of coefficients has been quantised differently from another set of coefficients within the same subband, then it is likely that the statistical properties of the two sets will differ, and different statistical contexts would be appropriate for entropy coding of the two sets, especially for adaptive arithmetic coding. Even if the same quantiser were selected for all sets (i.e. the classification was not used for defining quantisation classes), a partition of coefficients in the parent subband (on the basis of coefficient statistics or other means) could be used to define different coefficient classes for entropy coding, and could prove useful.
In so-called zerotree coders [A. Said and W. Pearlman, 'A new, fast, and efficient image codec based on set partitioning in hierarchical trees', IEEE Trans. Circ. Syst. for Vid. Tech., Vol 6, No 3, June 1996], significance data - whether a coefficient is larger than a given threshold - is often coded conditionally on whether the parent coefficient is significant or not, but quantisation is not affected, and general-form partitioning into multiple sets has not been entertained.
2) One useful classification is to classify coefficients in the parent subband into subsets with different variances or other statistical properties.
In particular, one suitable simple classification is into zero and non-zero coefficients, or into zero, small, and large coefficients. Local properties can also be taken into account, for example local masking, so that a class of coefficients defined in this way can be quantised more heavily if the results are likely to be masked. Another case where this might be useful would be Region-of-Interest (ROI) coding, where coefficients corresponding to more important areas of the picture (such as human faces) might be less heavily quantised.
To facilitate the exploitation of local features, the classification into subsets, although implicit, could also be augmented, refined or amended with additional signalling from the encoder so that a finer classification could be obtained if the implicit one proved insufficient. For example, if coefficients are split into two classes, has_zero_parent and has_nonzero_parent, then the encoder could signal that has_nonzero_parent should be further split into, say, coefficients corresponding to central areas of a picture and those corresponding to peripheral areas. Certain coefficients could even be identified and moved from one class to another.
3) The principle is not limited to two-dimensional transforms. It can be applied to one-dimensional signals (for example in audio coding): in this case each parent coefficient has two child coefficients. It can also be applied to three-dimensional signals, for example a 3D wavelet transform, motion-compensated or not, of a video signal. In a 3D transform, each coefficient has 8 children in the child subband.
4) Wavelet transforms are not the only transforms for which parent-child relationships can be defined. Other transforms, for example block transforms, Lapped Orthogonal Transforms, wavelet packets and even adaptive so-called 'best-basis' transforms, can be given a parent-child relationship [Z. Xiong, K. Ramchandran and M.T. Orchard, 'Wavelet Packet Image Coding Using Space-Frequency Quantization', IEEE Trans. Image Proc., Vol. 7, No. 6, June 1998], and the method described above can be applied.
Block transforms, such as the LOT or the widely-used DCT, can be given a parent-child structure in a number of ways [T. Tran and T. Nguyen, 'A progressive transmission image coder using linear phase uniform filterbanks as block transforms', IEEE Trans. on Image Proc., Vol. 8, No. 11, Nov 1999].
To understand how this might be done, it is helpful to think of block transform coefficients as the outputs of critically-sampled perfect-reconstruction filterbanks, just like wavelet coefficients, only with all subbands having the same bandwidth and being subsampled by equal factors. This can be done by conceptually grouping coefficients corresponding to the same area of the picture. So the (i,j)-th coefficient of block (p,q) can instead be thought of as point (p,q) within frequency subband (i,j) - see Figure 31, in which a 4x4 array of 2x2 transform blocks can be considered as a 2x2 array of 4x4 subbands.
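As a concrete illustration of this regrouping, the following is a minimal sketch (our own, not taken from the specification; the function name and use of NumPy are assumptions) that rearranges a plane of BxB block-transform coefficients into the subband layout just described:

```python
import numpy as np

def blocks_to_subbands(coeffs: np.ndarray, B: int) -> np.ndarray:
    """Regroup BxB block-transform coefficients into subbands: the
    (i, j)-th coefficient of block (p, q) becomes point (p, q) of
    frequency subband (i, j)."""
    H, W = coeffs.shape
    P, Q = H // B, W // B
    blocks = coeffs.reshape(P, B, Q, B)        # axes: (p, i, q, j)
    subbands = blocks.transpose(1, 3, 0, 2)    # axes: (i, j, p, q)
    # Tile the B x B grid of P x Q subbands back into an H x W plane.
    return subbands.transpose(0, 2, 1, 3).reshape(H, W)

# Example matching Figure 31: a 4x4 array of 2x2 blocks is viewed
# as a 2x2 array of 4x4 subbands.
plane = np.arange(64).reshape(8, 8)
rearranged = blocks_to_subbands(plane, B=2)
```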
Given this rearrangement, Figure 32 shows two possible parent-child relationships, the first being a wavelet-like relationship. In this case a coefficient may have children in different bands.
5) The classification can be used for defining quantisers as well as quantisation classes.
An analysis of the statistics of the parent band can allow the quantiser values themselves to be determined implicitly also, avoiding the need to signal them to the decoder.
A representation of the partition and quantisation algorithm can be defined as follows. Given a subband B, let P(B) denote the parent subband to B. If c represents the coordinates of a coefficient in B, let p(c) in P(B) denote the parent coefficient coordinates; if v(c) denotes the value of the coefficient at c then let q(v(c)) denote the quantised value.
1. For every subband B, if B has a parent band that has already been encoded, do:
   1.1 Set P = P(B).
   1.2 Define sets P_1, ..., P_k such that
       a) P_i ∩ P_j = ∅ for all i ≠ j, and
       b) ∪_{i=1}^{k} P_i = P,
       according to a predefined rule based on the quantised coefficients q(v(c)), c ∈ P.
   1.3 Define sets B_1, ..., B_k by the rule c ∈ B_j if and only if p(c) ∈ P_j. For each set B_j, do:
       1.3.1 Pick a quantiser q_j for B_j.
       1.3.2 For each c ∈ B_j, quantise c by q_j.
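A minimal sketch of this algorithm follows (our own illustration, assuming a 2D wavelet with four children per parent and the simple zero/small/large classification mentioned above; the function names and the threshold value are hypothetical):

```python
import numpy as np

def classify_parent(parent_q: np.ndarray, small: int = 2) -> np.ndarray:
    """Partition quantised parent coefficients into classes 0/1/2
    (zero, small, large). Only quantised values are used, so the
    decoder can repeat the partition exactly."""
    cls = np.zeros(parent_q.shape, dtype=int)
    mag = np.abs(parent_q)
    cls[(mag > 0) & (mag <= small)] = 1
    cls[mag > small] = 2
    return cls

def quantise_child(child: np.ndarray, parent_q: np.ndarray, steps):
    """Quantise a child subband with one step size per inherited class.
    Child coefficient (y, x) inherits the class of parent (y//2, x//2);
    only the per-class step sizes need to be signalled."""
    cls = classify_parent(parent_q)
    ys, xs = np.indices(child.shape)
    inherited = cls[ys // 2, xs // 2]           # inherit the partition
    step = np.asarray(steps, dtype=float)[inherited]
    return np.round(child / step).astype(int)

# Example: three step sizes, one per class (zero / small / large parent).
# child_q = quantise_child(child_band, parent_band_q, steps=(8, 6, 4))
```

The decoder, having already reconstructed the parent band, reruns classify_parent on the same quantised values and so recovers the partition with no side information beyond the step sizes.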
This implicit adaptive quantisation can therefore be used to match the quantisation levels to the shape of the residual data. The approach has the drawback that subbands cannot then be decoded independently, and independent decoding is often useful; it is not always essential, however, and other methods of parallelisation may be employed.
Global motion compensation

This is a simple idea to implement and is suitable for pans, tracking shots and especially for camera rotations and zooms. In these cases it makes sense to remove the global motion by globally motion-compensating the reference frame before using it to encode translational motion. In the case of a zoom, the magnification factor can be extracted and used to expand or contract the reference frame to match the current frame; the remaining translational motion will be small, and both the motion vectors and the residual will be relatively easy to code.
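For the zoom case, a minimal sketch of rescaling the reference before ordinary translational motion compensation (our own illustration, using SciPy's ndimage.zoom as one possible resampler; this is not the specification's implementation and handles a single greyscale plane only):

```python
import numpy as np
from scipy.ndimage import zoom

def zoom_compensate(reference: np.ndarray, magnification: float) -> np.ndarray:
    """Rescale the reference frame by the estimated zoom factor, then
    crop or pad about the centre back to the original frame size."""
    scaled = zoom(reference, magnification, order=1)  # bilinear resample
    H, W = reference.shape
    h, w = scaled.shape
    y0, x0 = (h - H) // 2, (w - W) // 2
    out = np.zeros_like(reference)
    ys, xs = max(0, -y0), max(0, -x0)
    ye, xe = min(H, h - y0), min(W, w - x0)
    out[ys:ye, xs:xe] = scaled[ys + y0:ye + y0, xs + x0:xe + x0]
    return out
```

Translational block matching against the rescaled reference then has only a small residual motion to find.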
A more complicated idea was used in an early proposal for H.264. In this case, general warping parameters were defined for the reference frame and the resulting reference used for translational motion compensation. This process can help remove many local sources of prediction error, such as figures walking towards a camera, or objects rotating, which cannot be eliminated by translational motion modelling.
Proportional weighting for motion compensation

The bi-directional prediction performed within the coder uses a weighting of (0.5, 0.5) for each of the reference frames. One feature introduced into H.264 is the ability to apply different weights to reference frames, to compensate for fading and cross-fading effects. H.264 allows for a linear calculation of weights based on frame distance as well as direct calculation.
The effect of choosing different weights can be quite dramatic when there are cross-fades or major lighting changes. However, the problem is in detecting these occurrences and addressing them properly. Brute-force motion estimation, in particular, is likely to lead to errors even if a cross-fade has been detected.
The wavelet coder uses a hierarchical block-matching algorithm with DC removal. This means that motion estimation is improved for fades and cross-fades. It also gives an opportunity to detect these effects with very little complexity: since low-resolution versions of both the reference frame(s) and the current frame are available, a simple correlator can be used to determine the correlation between them. If there is a significant imbalance between the correlation factor between the first reference and the current frame and that between the second reference and the current frame, then the ratio of the correlation factors can be used to weight the references.
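A minimal sketch of such a correlator (our own illustration; the weighting rule shown, normalising the two correlation factors, is one plausible choice rather than the specification's):

```python
import numpy as np

def reference_weights(lowres_ref1, lowres_ref2, lowres_cur):
    """Derive bi-prediction weights from the correlation of each
    low-resolution reference (already produced by hierarchical block
    matching) with the low-resolution current frame."""
    def corr(a, b):
        a = a - a.mean()
        b = b - b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        return float((a * b).sum() / denom) if denom > 0 else 0.0

    c1 = corr(np.asarray(lowres_ref1, float), np.asarray(lowres_cur, float))
    c2 = corr(np.asarray(lowres_ref2, float), np.asarray(lowres_cur, float))
    total = c1 + c2
    if total <= 0:            # no usable imbalance: fall back to (0.5, 0.5)
        return 0.5, 0.5
    return c1 / total, c2 / total
```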
Frame interpolation

A frame interpolator can be used to increase the decoded picture frame rate. A commercially available tool (from Microsoft Windows Media Player), based on gradient-flow techniques rather than on the transmitted motion vector field, can run in real time on a moderate PC for up to CIF resolution, although it produces artefacts when the motion estimation is not adequate, for example a fly-over of a mountain in which the revealed area behind the mountain was not present in the preceding frame and had a different shape in the succeeding frame.
An inventive development, to mitigate the problems of the existing tool, is based on the fact that in most cases of Internet delivery, a higher frame-rate original is available to the encoder. A typical example would be standard definition material which had been converted to CIF at 25 frames/sec progressive, and then encoded at 12.5 fps by dropping frames. The function of the decoder frame interpolator is to recover the original 25Hz frame rate.
If this is the case, it would be possible to perform the same frame interpolation in both the encoder and the decoder, and for the encoder to compare the result with the actual frame and send a correction signal. One possible signal would be the difference, but we have appreciated that this might not necessarily be the best, because the interpolated frame, although plausible, might be quite different from the actual one - for example through a disparity between actual and predicted global motion. It would be better to send a correction only for areas which can be identified as revealed, and to compensate for any global motion disparity in correcting them.
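A minimal sketch of this shared-interpolation scheme (our own illustration; how the revealed-area mask is derived, for example from the motion field, is an assumption left outside the sketch):

```python
import numpy as np

def encode_correction(true_frame, interp_frame, revealed_mask):
    """Encoder side: both ends run the same interpolator, so only a
    correction is coded, and only where the mask marks revealed areas
    that the interpolator could not have predicted."""
    return (np.asarray(true_frame, float) - interp_frame) * revealed_mask

def apply_correction(interp_frame, correction):
    """Decoder side: add the received correction to the locally
    interpolated frame."""
    return interp_frame + correction
```

Since the correction is zero outside the revealed areas, it is cheap to code.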
There are a number of possible cases of adjusted interpolation, in which information is sent to aid interpolation, including but not limited to:

i) Sending a difference between the residual picture and the corresponding interpolated picture (ie the difference between true and interpolated pictures);

ii) Sending a difference between the suitably scaled motion vector field between members of the subset of pictures used in deriving the interpolated picture, and the measured motion vector field between the residual picture and members of the subset of pictures (ie the difference between true and interpolated motion);

iii) Sending motion vectors to adjust one or more of the members of the subset of pictures before using them for interpolation;

iv) Sending a set of guide motion vectors for use in the interpolation procedure to improve accuracy;

v) Sending parameters to otherwise control or improve the accuracy of the interpolation procedure, or to control its complexity.

Coder
In a preferred implementation, adjustable parameters may include one or more of the following:

- Lagrangian QP for I-frames
- Lagrangian QP for L1-frames
- Lagrangian QP for L2-frames
- Lagrangian parameter for motion estimation.
- Cycles per degree in video standard at assumed viewing distance for perceptual weighting.
- Luminance only coding
- Width of block used for OBMC, specified for luma file
- Height of block used for OBMC, specified for luma file
- Horizontal separation of blocks used for OBMC. Must be < block width for both luma and chroma, taking into account chroma subsampling.
- Vertical separation of blocks used for OBMC. Must be < block height for both luma and chroma, taking into account chroma subsampling.
- Number of L1 frames in GOP
- Separation of L1 frames from nearest I or L1 frame in GOP
- Specifying that L1 frames are forward-predicted only

Decoder

In a preferred implementation, all the relevant encoder adjustable parameters, and any other information necessary for successful decoding of the bitstream, will be conveyed to the decoder by means of suitably encoded header parameters embedded in the bitstream.
The parameters may include one or more of the following:

- Luminance only decoding
- Width of block used for OBMC, specified for luma file
- Height of block used for OBMC, specified for luma file
- Horizontal separation of blocks used for OBMC. Must be < xblen for both luma and chroma, taking into account chroma subsampling.
- Vertical separation of blocks used for OBMC. Must be < yblen for both luma and chroma, taking into account chroma subsampling.
- Width of luma picture
- Height of luma picture

Additional parameters may include:

- Number of frames to be decoded
- the option to specify 444 format chroma sampling
- the option to specify 422 format chroma sampling
- the option to specify 420 format chroma sampling
- the option to specify 411 format chroma sampling
- Number of L1 frames in GOP
- Separation of L1 frames from nearest I or L1 frame in GOP
- Specifying that L1 frames are forward-predicted only

It will be appreciated that modifications of detail may be made within the scope of the invention. As explained above, features may be provided independently or in other combinations.
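To make the coder and decoder parameter sets listed above concrete, a hedged sketch of how they might be gathered into a single structure follows (all field names and default values are hypothetical, not drawn from the specification):

```python
from dataclasses import dataclass

@dataclass
class CoderParams:
    """Illustrative container for the adjustable coder parameters
    listed above; a decoder would recover the equivalent values from
    header parameters embedded in the bitstream."""
    qp_i: float                    # Lagrangian QP for I-frames
    qp_l1: float                   # Lagrangian QP for L1-frames
    qp_l2: float                   # Lagrangian QP for L2-frames
    me_lambda: float               # Lagrangian parameter for motion estimation
    luma_only: bool = False        # luminance-only coding
    obmc_block_w: int = 12         # OBMC block width (luma)
    obmc_block_h: int = 12         # OBMC block height (luma)
    obmc_sep_x: int = 8            # horizontal block separation
    obmc_sep_y: int = 8            # vertical block separation
    num_l1_frames: int = 2         # number of L1 frames in GOP
    l1_separation: int = 3         # separation from nearest I or L1 frame
    l1_forward_only: bool = False  # L1 frames forward-predicted only

    def validate(self) -> None:
        # The separations must be less than the block dimensions
        # (chroma subsampling is not modelled in this sketch).
        assert self.obmc_sep_x < self.obmc_block_w
        assert self.obmc_sep_y < self.obmc_block_h
```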

Claims (103)

1. A data processing method for processing a stream of data to be coded using prediction, the stream comprising at least first, second and third portions, in which the second portion may be coded using prediction based on the first portion and the third portion may be coded using prediction based on at least the second portion, the method comprising: forming an estimate of accuracy of a prediction of the second portion based on the first portion; and selectively predicting the third portion based on at least the second portion and the estimate of accuracy of the prediction of the second portion.
2. A method according to Claim 1 wherein the estimate is based only on data that will be available to a decoder which is receiving the signal serially.
3. A method according to Claim 1 or 2 wherein the third portion comprises an individual coefficient.
4. A method according to Claim 1 or 2 wherein the third portion comprises a set of coefficients.
5. A method according to any preceding claim further comprising selectively predicting subsequent portions based on the success of preceding predictions.
6. A method according to Claim 5 wherein the second and third portions of a first selective prediction are used respectively as the first and second portions in a second selective prediction method, to predict a new third portion.
7. A method according to any preceding claim wherein selectively predicting the third portion comprises comparing the magnitude of the second portion to the magnitude of the difference between the second and first portions.
8. A method according to Claim 7 wherein the third portion is predicted only if the second portion is substantially greater than said magnitude of the difference between the second and first portions.
9. A method according to Claim 8 wherein the second portion is required to be at least a given multiple of the said magnitude of the difference between the second and first portions.
10. A method according to Claim 9 wherein said multiple is at least two, preferably at least about four, more preferably about 6.
11. A method according to Claim 8 wherein a scaling function is applied to one of a cost measure of the second portion and a cost measure of the difference between the second portion and a prediction thereof based on the first portion.
12. A method according to any preceding claim wherein each of the second and third portions is predicted based on a plurality of preceding elements.
13. A method according to any of Claims 1 to 11 wherein, if the third portion is predicted, the second portion is used as a prediction.
14. A method according to any preceding claim wherein, following selectively predicting, either the third portion or the difference between the third portion and a prediction thereof is coded.
15. A method according to any preceding claim wherein the data is processed by a hierarchical coding scheme in which data is coded in successive layers, wherein the third portion comprises data from a higher layer and the second portion comprises corresponding data from at least one lower layer.
16. A method according to any preceding claim wherein the selective prediction of the third portion comprises a prediction based on the data preceding the third portion if a scaling function applied to a cost measure of the magnitude of the difference between the second portion and a prediction of the second portion based on the data preceding the second portion is less than a cost measure of the magnitude of the second portion, and a default prediction otherwise.
17. A method according to Claim 16 wherein, where the third portion comprises one or more numerical coefficients, the default prediction is zero.
18. A method according to Claim 16 or 17 wherein the scaling function comprises multiplication by a parameter and preferably wherein the parameter has a value of at least two.
19. A method according to Claim 18 wherein the parameter is set to a fixed value.
20. A method according to Claim 18 wherein the value of the parameter is adjusted based on dynamic analysis of the data.
21. A method according to Claim 20 wherein the parameter is adjusted based on at least one of: a) The prior inputs to the selective prediction method; b) The prior outputs of the selective prediction method; c) The prior outputs of the prediction based on the data preceding the portion to be coded; d) The prior outputs of the comparison.
22. A method according to any preceding claim wherein the method of prediction is dynamically adjusted based on the results of prediction for preceding data.
23. A method according to Claim 22 wherein the effectiveness of the selective prediction method is compared with the effectiveness of the basic prediction method.
24. A method of selectively predicting a current portion of data based on preceding data comprising selectively predicting a current portion of data x(i) in accordance with the formula:

$$\tilde{P}(S(i-1)) = \begin{cases} P(S(i-1)) & \text{if } \lambda\, C\big(P(S(i-2)) - x(i-1)\big) < C\big(x(i-1)\big) \\ 0 & \text{otherwise} \end{cases}$$

wherein P̃(S(i-1)) represents the selective prediction of the current portion of data based on at least a part of the set of data preceding the current portion; P(S(i-1)) represents a prediction of the current portion of data based on at least a part of the set of data preceding the current portion; P(S(i-2)) represents a prediction of the preceding portion of data based on at least a part of the set of data preceding the preceding portion; C represents a measure of cost of transmitting or storing the data; λ represents a parameter; x(i-1) represents the preceding portion of data.
25. A method according to Claim 24 wherein each portion of data comprises a single coefficient.
26. A method according to Claim 24 wherein each portion of data comprises a group of coefficients.
27. A method according to Claim 25 wherein each prediction comprises the preceding coefficient.
28. A method according to Claim 24 or 25 wherein the cost measure comprises the magnitude of the coefficient.
29. A method according to Claim 21, 24 or 25 wherein the cost measure comprises a measure of the entropy of the coefficient or coefficients.
30. A method according to any of Claims 24 to 29 wherein at least one of the cost measure, the method of prediction and the parameter are adapted in response to successive portions of data.
31. A method according to any preceding claim wherein portions to be coded are quantised prior to prediction.
32. A method according to any of Claims 1 to 30 wherein portions to be coded are unquantised and the residual data after selective prediction is quantised.
33. A method according to Claim 32 wherein the coefficients used for prediction are reconstructed coefficients, on which inverse quantisation and prediction have been performed.
34. A method according to Claim 32 or 33 wherein, in the case of adaptive modification of prediction method or cost measure or prediction parameters, adjustment is based on reconstructed coefficients.
35. A method of coding a video picture in which the picture is partitioned into at least first and second portions, the method comprising predicting elements of the second portion by displacing elements of the first portion and coding the differences between elements of the second portion and the respective predictions thereof.
36. A method according to Claim 35 wherein displacing comprises displacing substantially diagonal edges to be more aligned vertically or horizontally.
37. A method according to Claim 35 or 36 wherein coding comprises coding by a wavelet coding method.
38. A method according to Claim 35 wherein the first and second portions correspond respectively to first and second times.
39. A method according to Claim 38 wherein displacing elements comprises displacing elements based on estimated motion between the first and second times.
40. A method according to any of Claims 35 to 39 wherein elements of the first portion are positioned at spatially different but substantially adjacent positions to elements of the second portion.
41. A method according to Claim 40 wherein the first and second portions are interleaved.
42. A method according to Claim 41 wherein the portions comprise alternating picture lines.
43. A method according to Claim 41 wherein the first and second portions comprise fields of an interlaced video picture frame.
44. A method of coding an interlaced video picture frame comprising at least first and second fields, each field comprising a plurality of picture lines, the lines of the first field interleaving with lines of the second field, the first and second fields corresponding to respective first and second times, the method comprising: predicting elements of the second field based on the first field, wherein predicting comprises shifting elements of the first field along the direction of said picture lines based on an estimated component of motion along said lines between said first and second times, whereby elements of the second field are predicted based on elements of the first field which are estimated to be aligned;
coding the second field following prediction.
45. A method according to Claim 42 or 44 wherein the lines are horizontal and only horizontal motion is estimated.
46. A method according to Claim 42 or 44 wherein motion is estimated in two directions.
47. A method according to Claim 46 wherein two-dimensional motion vectors are determined.
48. A method according to Claim 46 wherein motion is estimated separately for the two directions.
49. A method according to any of Claims 35 to 48 wherein a single element of the first portion or field is used to predict each element of the second
portion or field.
50. A method according to any of Claims 35 to 48 wherein a plurality of elements of the first portion or field is used to predict each element of the
second portion or field.
51. A method of coding an interlaced video picture frame comprising at least first and second fields, each field comprising a plurality of picture lines, the lines of the first field interleaving with lines of the second field, the first and second fields corresponding to respective first and second times, the method comprising: predicting elements of the second field based on the first field, wherein predicting comprises shifting elements of the first field along the direction of said picture lines but not perpendicular to said lines based on an estimated component of motion along said lines between said first and second times, whereby elements of the second field are predicted based on elements of the first field which are estimated to be aligned in one direction only;
coding the second field following prediction.
52. A method according to any of Claims 35 to 51 wherein elements of the first portion or field are used both as a basis for estimating a component of apparent motion between the first and second portions or fields and as a basis for predicting values of elements of the second portion or field.
53. A method of coding an interlaced video frame comprising first and second fields, the method comprising predicting elements of the second field based on elements of the first field which are displaced horizontally in accordance with estimated horizontal components of motion between the first and second fields and coding the second field following prediction.
54. A method according to any of Claims 43 to 53 wherein only a single estimate of horizontal motion is used for each picture line.
55. A method according to any of Claims 43 to 54 wherein each element of the second field is associated with a predictor comprising an element of the first field from a corresponding picture line at a position along the corresponding line corresponding to the position along the line of the element of the second field to be predicted plus an offset based on estimated motion.
56. A method according to any of Claims 43 to 55 wherein the picture is coded by a wavelet coding method following prediction.
57. A method according to any of Claims 43 to 56 wherein a difference between each element of the second field and a corresponding element of the
first field is coded.
58. A method according to Claim 57 wherein the corresponding element of the first field comprises an element of the first field shifted based on a component of estimated motion.
59. A method according to Claim 57 or 58 wherein said difference is used to provide a high-pass output of a wavelet coding filter, preferably wherein half the difference is coded.
60. A method according to Claim 59 wherein a low pass output of the wavelet coding filter comprises the corresponding component if the estimated prediction predicts no elements on a line or, if one or more elements on the second line are predicted, the average of the corresponding element of the first field and the predicted elements of the second field.
61. A method according to any of Claims 35 to 60 wherein each element comprises an individual pixel.
62. A method according to any of Claims 35 to 60 wherein each element comprises a group of pixels.
63. A method according to any of Claims 35 to 62 wherein the frame is coded without further motion estimation.
64. A method according to Claims 63 and 56 wherein further wavelet decomposition is performed without further motion estimation.
65. A method according to Claim 1 wherein the data is coded by a hierarchical coding method having at least parent and child coding levels and wherein the third portion comprises a portion of the child coding level and the second portion comprises a portion of the parent coding level.
66. A method of hierarchical coding of data in which data is coded by a coding scheme into at least parent and child coding levels, wherein parent and child levels are both quantised, the method comprising selecting quantisation parameters for at least a portion of the child coding level based on the quantisation parameters for at least a corresponding portion of the parent coding level.
67. A method according to Claim 66 wherein the coefficients in the parent coding level are partitioned and wherein the partitioning of the child level is based on the partitioning of the parent level.
68. A method according to Claim 67 wherein partitioning of the parent level is based on the quantised coefficients of the parent level.
69. A method according to Claim 68 wherein the parent level is communicated to a decoder prior to the child level.
70. A method of decoding a signal coded by a method according to any preceding claim comprising reconstructing a portion of the signal to be reconstructed based on a received portion of data and a method of selective prediction of a subsequent portion corresponding to said method according to any preceding claim.
71. A method according to Claim 70 of decoding a signal coded according to Claim 1, the method comprising receiving and decoding first and second coded portions of data, selectively predicting a third portion of data based on the first and second portions and applying received coded data for the third portion to the selective prediction to reconstruct the third portion.
72. A method according to Claim 70 of decoding data coded hierarchically according to Claim 66, the method comprising: receiving data encoding a parent level and reconstructing the parent level; reconstructing the child level based on received data for the child level and re-using data used in the reconstruction of the parent level.
73. A method of coding a sequence of pictures comprising: selecting only a subset of the sequence of pictures for communication leaving a subset of residual pictures; interpolating from the subset of pictures for communication according to an interpolation algorithm to create at least one interpolated picture corresponding substantially to one of the subset of residual pictures; encoding adjustment information based on a difference between the at least one interpolated picture and the corresponding residual picture; communicating the subset of pictures for communication together with the adjustment information.
74. A method according to Claim 73 wherein the subset of pictures is encoded.
75. A method according to Claim 73 or 74 wherein the sequence of pictures has a first frame rate and wherein the subset of pictures comprises a sequence of pictures having a lower frame rate than the first frame rate.
76. A method according to Claim 75 wherein pictures are dropped as residual pictures at substantially regular intervals to reduce the frame rate.
77. A method according to Claim 75 or 76 wherein pictures are dropped as residual pictures based on picture information content.
78. A method according to any of Claims 73 to 77 wherein pictures are dropped as residual to control the output frame rate.
79. A method according to any of Claims 73 to 78 wherein pictures are dropped as residual to control the output bitrate.
80. A method according to any of Claims 73 to 79 wherein both selection of pictures and coding parameters are adjusted to control the output bitrate.
81. A method according to any of Claims 73 to 80 further comprising communicating the output to be received by a decoder arranged to perform interpolation in accordance with the interpolation algorithm.
82. A method of reconstructing a sequence of pictures comprising: receiving a sequence of pictures to produce a sequence of received pictures; interpolating at least one picture from the received pictures according to a predetermined interpolation algorithm; receiving adjustment information and applying the adjustment information to the or each interpolated picture to modify the or each interpolated picture; outputting a sequence of pictures comprising the received pictures and the modified interpolated pictures.
83. A method according to Claim 82 including decoding the sequence of pictures received to provide received pictures.
84. A method of communicating a sequence of pictures comprising: at a coder, selecting only a subset of the sequence of pictures for communication leaving a subset of residual pictures; interpolating from the subset of pictures for communication according to an interpolation algorithm to create at least one interpolated picture corresponding substantially to one of the subset of residual pictures; encoding adjustment information based on a difference between the at least one interpolated picture and the corresponding residual picture; communicating the subset of pictures to a decoder together with the adjustment information; and at the decoder, receiving the subset of pictures to produce a sequence of received pictures; interpolating at least one picture from the received pictures according to the interpolation algorithm; receiving the adjustment information and applying the adjustment information to the or each interpolated picture to modify the or each interpolated picture; outputting a sequence of pictures comprising the received pictures and the modified interpolated pictures.
85. A method according to any of Claims 73 to 84 wherein the interpolation algorithm comprises interpolating based only on information available for one or more pictures other than the picture to be interpolated.
86. A method according to Claim 85 wherein the pictures other than the picture to be interpolated comprise pictures encoded using prediction based on motion vectors. - 86
87. A method according to Claim 86 wherein the motion vectors for the other pictures are used in interpolation.
88. A method according to Claim 87 wherein motion vectors are communicated for pictures selected for communication but wherein no motion vectors are communicated specifically for the interpolated pictures.
89. A method according to any of Claims 73 to 88 wherein the adjustment information includes information concerning the timing of an interpolated frame to which the adjustment information relates.
90. A method according to any of Claims 73 to 88 wherein adjustment information is provided for all pictures to be interpolated.
91. A method according to any of Claims 73 to 88 wherein pictures may be interpolated without adjustment.
92. A computer program or computer program product for performing a method according to any preceding claim.
93. A coder arranged to perform a method according to any of Claims 1 to 69 or 73 to 81.
94. A decoder arranged to perform a method according to any of Claims 70 to 72 or 82 to 83.
95. Apparatus substantially as any one herein described or as illustrated in the accompanying drawings.
96. A method substantially as any one herein described, with reference to the accompanying drawings.
97. A coder arranged to process a stream of data to be coded using prediction, the stream comprising at least first, second and third portions, in which the second portion may be coded using prediction based on the first portion and the third portion may be coded using prediction based on at least the second portion, the coder comprising: means for forming an estimate of accuracy of a prediction of the second portion based on the first portion; and means for selectively predicting the third portion based on at least the second portion and the estimate of accuracy of the prediction of the second portion.
98. A coder arranged to process a stream of data to be coded using prediction, the stream comprising at least first, second and third portions, in which the second portion may be coded using prediction based on the first portion and the third portion may be coded using prediction based on at least the second portion, the coder comprising: a prediction estimator arranged to form an estimate of accuracy of a prediction of the second portion based on the first portion; and a selective predictor arranged to selectively predict the third portion based on at least the second portion and the estimate of accuracy of the prediction of the second portion.
99. A coder for coding a video picture in which the picture is partitioned into at least first and second portions, the coder comprising means for predicting elements of the second portion by displacing elements of the first portion and means for coding the differences between elements of the second portion and the respective predictions thereof.
100. A coder for coding an interlaced video picture frame comprising at least first and second fields, each field comprising a plurality of picture lines, the lines of the first field interleaving with lines of the second field, the first and second fields corresponding to respective first and second times, the coder comprising: means for predicting elements of the second field based on the first field, wherein predicting comprises shifting elements of the first field along the direction of said picture lines based on an estimated component of motion along said lines between said first and second times, whereby elements of the second field are predicted based on elements of the first field which are estimated to be aligned in said direction; means for coding the second field following prediction.
101. A coder for coding an interlaced video picture frame comprising at least first and second fields, each field comprising a plurality of picture lines, the lines of the first field interleaving with lines of the second field, the first and second fields corresponding to respective first and second times, the coder comprising: a predictor for predicting elements of the second field based on the first field, wherein predicting comprises shifting elements of the first field along the direction of said picture lines based on an estimated component of motion along said lines between said first and second times, whereby elements of the second field are predicted based on elements of the first field which are estimated to be aligned in said direction; a coding module for coding the second field based on the output of said predictor.
102. A coder for coding a sequence of pictures comprising: means for selecting only a subset of the sequence of pictures for communication leaving a subset of residual pictures; an interpolator for interpolating from the subset of pictures for communication according to an interpolation algorithm to create at least one interpolated picture corresponding substantially to one of the subset of residual pictures; means for encoding adjustment information based on a difference between the at least one interpolated picture and the corresponding residual picture; means for communicating the subset of pictures for communication together with the adjustment information.
103. A coder according to Claim 102 including means for encoding the subset of pictures for communication.
GB0310495A 2003-05-07 2003-05-07 Data processing Expired - Fee Related GB2401502B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB0310495A GB2401502B (en) 2003-05-07 2003-05-07 Data processing
GB0616384A GB2437579B (en) 2003-05-07 2003-05-07 Data processing
PCT/GB2004/001987 WO2004100556A2 (en) 2003-05-07 2004-05-07 Data prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0310495A GB2401502B (en) 2003-05-07 2003-05-07 Data processing

Publications (4)

Publication Number Publication Date
GB0310495D0 GB0310495D0 (en) 2003-06-11
GB2401502A true GB2401502A (en) 2004-11-10
GB2401502A8 GB2401502A8 (en) 2005-04-14
GB2401502B GB2401502B (en) 2007-02-14

Family

ID=9957606

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0310495A Expired - Fee Related GB2401502B (en) 2003-05-07 2003-05-07 Data processing

Country Status (2)

Country Link
GB (1) GB2401502B (en)
WO (1) WO2004100556A2 (en)


Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7738554B2 (en) 2003-07-18 2010-06-15 Microsoft Corporation DC coefficient signaling at small quantization step sizes
US10554985B2 (en) 2003-07-18 2020-02-04 Microsoft Technology Licensing, Llc DC coefficient signaling at small quantization step sizes
US8711925B2 (en) 2006-05-05 2014-04-29 Microsoft Corporation Flexible quantization
US8897359B2 (en) 2008-06-03 2014-11-25 Microsoft Corporation Adaptive quantization for enhancement layer video coding
WO2010087808A1 (en) 2009-01-27 2010-08-05 Thomson Licensing Methods and apparatus for transform selection in video encoding and decoding
HUE039661T2 (en) 2009-09-10 2019-01-28 Guangdong Oppo Mobile Telecommunications Corp Ltd Speedup techniques for rate distortion optimized quantization
ES2870332T3 (en) 2010-10-20 2021-10-26 Guangdong Oppo Mobile Telecommunications Corp Ltd Optimization of error resistant rate distortion for image and video encoding
US9451287B2 (en) 2011-11-08 2016-09-20 Qualcomm Incorporated Context reduction for context adaptive binary arithmetic coding
US10740619B2 (en) * 2017-11-21 2020-08-11 Uber Technologies, Inc. Characterizing content with a predictive error representation
BR112021009911A2 (en) * 2018-12-29 2021-08-17 Huawei Technologies Co., Ltd. encoder, decoder and corresponding methods using compact mv storage
US11363262B1 (en) * 2020-12-14 2022-06-14 Google Llc Adaptive GOP structure using temporal dependencies likelihood
CN113592518A (en) * 2021-08-20 2021-11-02 杭州沃朴物联科技有限公司 Anti-counterfeiting traceability label generation and verification method based on coding rule and application

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB1329146A (en) * 1969-12-23 1973-09-05 Western Electric Co System for encoding video signals
GB2144301A (en) * 1983-07-26 1985-02-27 Nec Corp Decoder for a frame or field skipped t.v. signal
EP0397402A2 (en) * 1989-05-11 1990-11-14 Matsushita Electric Industrial Co., Ltd. Moving image signal encoding- and decoding apparatus
EP0425089A2 (en) * 1989-09-27 1991-05-02 AT&T Corp. Conditional motion compensated interpolation of digital motion video signals.
US5134478A (en) * 1991-02-19 1992-07-28 Intel Corporation Method and apparatus for compressing and decompressing a digital video signal using predicted and error images
US5153719A (en) * 1990-08-22 1992-10-06 U.S. Philips Corporation Method of detecting horizontal movements in the picture contents of a television signal
US5293229A (en) * 1992-03-27 1994-03-08 Matsushita Electric Corporation Of America Apparatus and method for processing groups of fields in a video data compression system
US5565921A (en) * 1993-03-16 1996-10-15 Olympus Optical Co., Ltd. Motion-adaptive image signal processing system
WO1997017797A2 (en) * 1995-10-25 1997-05-15 Sarnoff Corporation Apparatus and method for quadtree based variable block size motion estimation
US5703966A (en) * 1995-06-27 1997-12-30 Intel Corporation Block selection using motion estimation error
US6026195A (en) * 1997-03-07 2000-02-15 General Instrument Corporation Motion estimation and compensation of video object planes for interlaced digital video
US6208690B1 (en) * 1996-09-12 2001-03-27 Sharp Kabushiki Kaisha Method of motion-compensated interframe-prediction in a video-coding device
WO2001041156A1 (en) * 1999-12-01 2001-06-07 Ivast, Inc. Optimized bifs encoder
US20020130969A1 (en) * 2001-02-01 2002-09-19 Lg Electronics Inc. Motion-adaptive interpolation apparatus and method thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2144253C (en) * 1994-04-01 1999-09-21 Bruce F. Naylor System and method of generating compressed video graphics images
GB2379821A (en) * 2001-09-18 2003-03-19 British Broadcasting Corp Image compression method for providing a serially compressed sequence


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009034488A2 (en) * 2007-09-10 2009-03-19 Nxp B.V. Method an apparatus for line based vertical motion estimation and compensation
WO2009034487A2 (en) 2007-09-10 2009-03-19 Nxp B.V. Method and apparatus for motion estimation and motion compensation in video image data
WO2009034489A2 (en) * 2007-09-10 2009-03-19 Nxp B.V. Method and apparatus for motion estimation in video image data
WO2009034487A3 (en) * 2007-09-10 2009-04-30 Nxp Bv Method and apparatus for motion estimation and motion compensation in video image data
WO2009034489A3 (en) * 2007-09-10 2009-05-28 Nxp Bv Method and apparatus for motion estimation in video image data
WO2009034488A3 (en) * 2007-09-10 2009-07-23 Nxp Bv Method an apparatus for line based vertical motion estimation and compensation
WO2009034492A3 (en) * 2007-09-10 2009-08-13 Nxp Bv Method, apparatus, and system for line-based motion compensation in video image data
CN101836441B (en) * 2007-09-10 2012-03-21 Nxp股份有限公司 Method, apparatus, and system for line-based motion compensation in video image data
US8526502B2 (en) 2007-09-10 2013-09-03 Entropic Communications, Inc. Method and apparatus for line based vertical motion estimation and compensation
US9036082B2 (en) 2007-09-10 2015-05-19 Nxp, B.V. Method, apparatus, and system for line-based motion compensation in video image data

Also Published As

Publication number Publication date
WO2004100556A2 (en) 2004-11-18
GB2401502B (en) 2007-02-14
GB2401502A8 (en) 2005-04-14
GB0310495D0 (en) 2003-06-11
WO2004100556A3 (en) 2005-02-10


Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20140507